Searched hist:26309 (Results 1 - 5 of 5) sorted by relevance

/linux-master/tools/testing/selftests/bpf/
test_verifier.c    diff 19e2dbb7 Thu Dec 13 12:42:33 MST 2018 Alexei Starovoitov <ast@kernel.org> bpf: improve stacksafe state comparison

"if (old->allocated_stack > cur->allocated_stack)" check is too conservative.
In some cases the explored stack could have allocated more space,
but that stack space was not live.
The test case improves from 19 to 15 processed insns,
and the improvement on real programs is significant as well:

before after
bpf_lb-DLB_L3.o 1940 1831
bpf_lb-DLB_L4.o 3089 3029
bpf_lb-DUNKNOWN.o 1065 1064
bpf_lxc-DDROP_ALL.o 28052 26309
bpf_lxc-DUNKNOWN.o 35487 33517
bpf_netdev.o 10864 9713
bpf_overlay.o 6643 6184
bpf_lcx_jit.o 38437 37335
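
To illustrate the relaxed check, here is a minimal standalone sketch (simplified; the struct, the per-slot live[] flags and the 8-byte slot granularity are assumptions, not the kernel's actual stacksafe() code): slots that are dead in the explored state are ignored, so a state that allocated more stack can still prune the search.

#include <stdbool.h>
#include <stdio.h>

#define MAX_SLOTS 8	/* simplified: one flag per 8-byte stack slot */

struct sim_state {
	int allocated_stack;	/* bytes of tracked stack */
	bool live[MAX_SLOTS];	/* is this slot read later on? */
};

/* old, conservative rule: any extra allocation in the explored state
 * makes the states incomparable, even if the extra space is dead */
static bool stacksafe_old(const struct sim_state *old, const struct sim_state *cur)
{
	return old->allocated_stack <= cur->allocated_stack;
}

/* relaxed rule: only slots live in the explored state must be covered
 * by the current state; dead slots are ignored */
static bool stacksafe_relaxed(const struct sim_state *old, const struct sim_state *cur)
{
	for (int i = 0; i < MAX_SLOTS; i++) {
		if (!old->live[i])
			continue;
		if ((i + 1) * 8 > cur->allocated_stack)
			return false;
	}
	return true;
}

int main(void)
{
	/* explored state allocated more stack, but the extra slot is dead */
	struct sim_state old = { .allocated_stack = 16, .live = { true } };
	struct sim_state cur = { .allocated_stack = 8,  .live = { true } };

	printf("old rule:     %s\n", stacksafe_old(&old, &cur) ? "prune" : "re-verify");
	printf("relaxed rule: %s\n", stacksafe_relaxed(&old, &cur) ? "prune" : "re-verify");
	return 0;
}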

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Edward Cree <ecree@solarflare.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
/linux-master/tools/perf/
builtin-trace.c    diff 42052bea Thu Feb 12 20:32:45 MST 2015 Arnaldo Carvalho de Melo <acme@redhat.com> perf trace: Print thread info when following children

The default for 'trace workload' is to set perf_event_attr.inherit to 1,
i.e. to make it equivalent to 'strace -f workload', so we were ending up
with syscalls for multiple processes mixed up; fix it:

Before:

[root@ssdandy ~]# trace -e brk time usleep 1
0.071 ( 0.002 ms): brk( ) = 0x100e000
0.802 ( 0.001 ms): brk( ) = 0x1d99000
1.132 ( 0.003 ms): brk( ) = 0x1d99000
1.136 ( 0.003 ms): brk(brk: 0x1dba000) = 0x1dba000
1.140 ( 0.001 ms): brk( ) = 0x1dba000
0.00user 0.00system 0:00.00elapsed 63%CPU (0avgtext+0avgdata 528maxresident)k
0inputs+0outputs (0major+181minor)pagefaults 0swaps
[root@ssdandy ~]#

After:

[root@ssdandy ~]# trace -f -e brk time usleep 1
0.072 ( 0.002 ms): time/26308 brk( ) = 0x1e6e000
0.860 ( 0.001 ms): usleep/26309 brk( ) = 0xb91000
1.193 ( 0.003 ms): usleep/26309 brk( ) = 0xb91000
1.197 ( 0.003 ms): usleep/26309 brk(brk: 0xbb2000) = 0xbb2000
1.201 ( 0.001 ms): usleep/26309 brk( ) = 0xbb2000
0.00user 0.00system 0:00.00elapsed 0%CPU (0avgtext+0avgdata 524maxresident)k
0inputs+0outputs (0major+180minor)pagefaults 0swaps
[root@ssdandy ~]#

BTW: to achieve the 'strace workload' behaviour, i.e. without an explicit
'-f', one has to use --no-inherit.
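
A minimal sketch of the attribute this behaviour hinges on (illustrative only, not the actual builtin-trace.c code; the option handling here is an assumption): perf_event_attr.inherit controls whether children of the traced task are followed, which is what the implicit '-f' default and --no-inherit toggle.

#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <string.h>
#include <stdio.h>

/* perf_event_open has no glibc wrapper; the raw syscall is the usual way in */
static long perf_event_open(struct perf_event_attr *attr, pid_t pid, int cpu,
			    int group_fd, unsigned long flags)
{
	return syscall(SYS_perf_event_open, attr, pid, cpu, group_fd, flags);
}

int main(int argc, char **argv)
{
	struct perf_event_attr attr;
	int follow_children = 1;	/* like the tool's default */

	/* --no-inherit: restrict the event to the target task itself */
	if (argc > 1 && strcmp(argv[1], "--no-inherit") == 0)
		follow_children = 0;

	memset(&attr, 0, sizeof(attr));
	attr.type = PERF_TYPE_SOFTWARE;
	attr.config = PERF_COUNT_SW_TASK_CLOCK;
	attr.size = sizeof(attr);
	attr.inherit = follow_children;	/* children inherit the event */

	int fd = perf_event_open(&attr, 0 /* this task */, -1, -1, 0);
	if (fd < 0) {
		perror("perf_event_open");
		return 1;
	}
	printf("counting %s children\n", attr.inherit ? "including" : "excluding");
	close(fd);
	return 0;
}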

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: David Ahern <dsahern@gmail.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-2wu2d5n65msxoq1i7vtcaft2@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
/linux-master/include/linux/
bpf_verifier.h    diff 9f4686c4 Mon Apr 01 22:27:41 MDT 2019 Alexei Starovoitov <ast@kernel.org> bpf: improve verification speed by dropping states

Branch instructions, branch targets and calls in a bpf program are
the places where the verifier remembers states that led to successful
verification of the program.
These states are used to prune brute force program analysis.
For unprivileged programs there is a limit of 64 states per such
'branching' instruction (the maximum length is tracked by the
max_states_per_insn counter introduced in the previous patch).
Simply reducing this threshold to 32 or lower increases the insn_processed
metric to the point that small valid programs get rejected.
For root programs there is no limit, and cilium programs can have
a max_states_per_insn of 100 or higher.
Walking 100+ states multiplied by the number of 'branching' insns during
verification consumes a significant amount of cpu time.
It turned out that a simple LRU-like mechanism can be used to remove states
that are unlikely to be helpful in future search pruning.
This patch introduces hit_cnt and miss_cnt counters:
hit_cnt - this many times this state successfully pruned the search
miss_cnt - this many times this state was not equivalent to other states
(and that other states were added to state list)

The heuristic introduced in this patch is:
if (sl->miss_cnt > sl->hit_cnt * 3 + 3)
/* drop this state from future considerations */

Higher numbers increase max_states_per_insn (allow more states to be
considered for pruning) and slow down verification, but do not meaningfully
reduce the insn_processed metric.
Lower numbers drop too many states, and insn_processed increases too much.
Many different formulas were considered.
This one is simple and works well enough in practice.
(the analysis was done on selftests/progs/* and on cilium programs)

The end result is that this heuristic improves verification speed by 10 times.
Large synthetic programs that used to take a second or more now take
1/10 of a second.
In cases where max_states_per_insn used to be 100 or more, now it's ~10.

There is a slight increase in insn_processed for cilium progs:
before after
bpf_lb-DLB_L3.o 1831 1838
bpf_lb-DLB_L4.o 3029 3218
bpf_lb-DUNKNOWN.o 1064 1064
bpf_lxc-DDROP_ALL.o 26309 26935
bpf_lxc-DUNKNOWN.o 33517 34439
bpf_netdev.o 9713 9721
bpf_overlay.o 6184 6184
bpf_lcx_jit.o 37335 39389
And there is a 2-3x improvement in verification speed.
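
A small standalone model of the heuristic (the struct and the update function are simplifications, not the verifier's actual code): each saved state keeps hit/miss counters, and once misses outrun hits by the 3x+3 margin the state is dropped from the list.

#include <stdbool.h>
#include <stdio.h>

/* simplified stand-in for one entry in the verifier's per-insn state list */
struct state_entry {
	int hit_cnt;	/* times this state pruned the search */
	int miss_cnt;	/* times it failed to match a new state */
	bool dropped;
};

/* the heuristic from the commit message: states that keep missing and
 * rarely hit are unlikely to help future pruning, so stop keeping them */
static void update_state(struct state_entry *sl, bool was_equivalent)
{
	if (was_equivalent)
		sl->hit_cnt++;
	else
		sl->miss_cnt++;

	if (sl->miss_cnt > sl->hit_cnt * 3 + 3)
		sl->dropped = true;	/* drop from future considerations */
}

int main(void)
{
	struct state_entry sl = { 0 };

	/* one hit, then a run of misses: dropped once misses exceed 3*1 + 3 */
	update_state(&sl, true);
	for (int i = 0; i < 7 && !sl.dropped; i++)
		update_state(&sl, false);

	printf("hit=%d miss=%d dropped=%d\n", sl.hit_cnt, sl.miss_cnt, sl.dropped);
	return 0;
}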

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
/linux-master/drivers/block/drbd/
drbd_main.c    diff 27548088 Thu Nov 04 02:07:09 MDT 2021 Wu Bo <wubo40@huawei.com> drbd: Fix double free problem in drbd_create_device

In drbd_create_device(), the 'out_no_io_page' label already calls
blk_cleanup_disk() when returning on failure.

So remove the 'out_cleanup_disk' label to avoid double freeing the
disk pointer.
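
The shape of the bug, as a hedged standalone sketch (the stub names below are made up, not the drbd code): the error label that handles a later allocation failure already releases the disk, so keeping a second cleanup label for the same pointer would turn that error path into a double free.

#include <stdlib.h>

struct fake_disk { int unused; };

static struct fake_disk *alloc_disk_stub(void)
{
	return malloc(sizeof(struct fake_disk));
}

static void cleanup_disk_stub(struct fake_disk *d)
{
	free(d);	/* stands in for blk_cleanup_disk() */
}

static int create_device(void)
{
	struct fake_disk *disk = alloc_disk_stub();
	void *io_page;

	if (!disk)
		return -1;

	io_page = malloc(4096);
	if (!io_page)
		goto out_no_io_page;	/* this path already releases the disk */

	free(io_page);
	cleanup_disk_stub(disk);
	return 0;

out_no_io_page:
	cleanup_disk_stub(disk);
	return -1;
	/* the buggy version had an additional 'out_cleanup_disk' label below
	 * that called the cleanup again, freeing the same pointer twice */
}

int main(void)
{
	return create_device() ? 1 : 0;
}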

Fixes: e92ab4eda516 ("drbd: add error handling support for add_disk()")
Signed-off-by: Wu Bo <wubo40@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/1636013229-26309-1-git-send-email-wubo40@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
/linux-master/kernel/bpf/
verifier.c    diff 9f4686c4 Mon Apr 01 22:27:41 MDT 2019 Alexei Starovoitov <ast@kernel.org> bpf: improve verification speed by dropping states

Branch instructions, branch targets and calls in a bpf program are
the places where the verifier remembers states that led to successful
verification of the program.
These states are used to prune brute force program analysis.
For unprivileged programs there is a limit of 64 states per such
'branching' instruction (the maximum length is tracked by the
max_states_per_insn counter introduced in the previous patch).
Simply reducing this threshold to 32 or lower increases the insn_processed
metric to the point that small valid programs get rejected.
For root programs there is no limit, and cilium programs can have
a max_states_per_insn of 100 or higher.
Walking 100+ states multiplied by the number of 'branching' insns during
verification consumes a significant amount of cpu time.
It turned out that a simple LRU-like mechanism can be used to remove states
that are unlikely to be helpful in future search pruning.
This patch introduces hit_cnt and miss_cnt counters:
hit_cnt - this many times this state successfully pruned the search
miss_cnt - this many times this state was not equivalent to other states
(and that other states were added to state list)

The heuristic introduced in this patch is:
if (sl->miss_cnt > sl->hit_cnt * 3 + 3)
/* drop this state from future considerations */

Higher numbers increase max_states_per_insn (allow more states to be
considered for pruning) and slow down verification, but do not meaningfully
reduce the insn_processed metric.
Lower numbers drop too many states, and insn_processed increases too much.
Many different formulas were considered.
This one is simple and works well enough in practice.
(the analysis was done on selftests/progs/* and on cilium programs)

The end result is that this heuristic improves verification speed by 10 times.
Large synthetic programs that used to take a second or more now take
1/10 of a second.
In cases where max_states_per_insn used to be 100 or more, now it's ~10.

There is a slight increase in insn_processed for cilium progs:
before after
bpf_lb-DLB_L3.o 1831 1838
bpf_lb-DLB_L4.o 3029 3218
bpf_lb-DUNKNOWN.o 1064 1064
bpf_lxc-DDROP_ALL.o 26309 26935
bpf_lxc-DUNKNOWN.o 33517 34439
bpf_netdev.o 9713 9721
bpf_overlay.o 6184 6184
bpf_lcx_jit.o 37335 39389
And there is a 2-3x improvement in verification speed.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
diff 19e2dbb7 Thu Dec 13 12:42:33 MST 2018 Alexei Starovoitov <ast@kernel.org> bpf: improve stacksafe state comparison

"if (old->allocated_stack > cur->allocated_stack)" check is too conservative.
In some cases the explored stack could have allocated more space,
but that stack space was not live.
The test case improves from 19 to 15 processed insns,
and the improvement on real programs is significant as well:

before after
bpf_lb-DLB_L3.o 1940 1831
bpf_lb-DLB_L4.o 3089 3029
bpf_lb-DUNKNOWN.o 1065 1064
bpf_lxc-DDROP_ALL.o 28052 26309
bpf_lxc-DUNKNOWN.o 35487 33517
bpf_netdev.o 10864 9713
bpf_overlay.o 6643 6184
bpf_lcx_jit.o 38437 37335

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Edward Cree <ecree@solarflare.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

Completed in 1553 milliseconds