History log of /freebsd-current/sys/dev/hwpmc/hwpmc_logging.c
Revision Date Author Comments
# fdafd315 24-Nov-2023 Warner Losh <imp@FreeBSD.org>

sys: Automated cleanup of cdefs and other formatting

Apply the following automated changes to try to eliminate
no-longer-needed sys/cdefs.h includes as well as now-empty
blank lines in a row.

Remove /^#if.*\n#endif.*\n#include\s+<sys/cdefs.h>.*\n/
Remove /\n+#include\s+<sys/cdefs.h>.*\n+#if.*\n#endif.*\n+/
Remove /\n+#if.*\n#endif.*\n+/
Remove /^#if.*\n#endif.*\n/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/types.h>/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/param.h>/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/capsicum.h>/

Sponsored by: Netflix


# 685dc743 16-Aug-2023 Warner Losh <imp@FreeBSD.org>

sys: Remove $FreeBSD$: one-line .c pattern

Remove /^[\s*]*__FBSDID\("\$FreeBSD\$"\);?\s*\n/


# 94426d21 30-May-2023 Jessica Clarke <jrtc27@FreeBSD.org>

pmc: Rework PROCEXEC event to support PIEs

Currently the PROCEXEC event only reports a single address, entryaddr,
which is the entry point of the interpreter in the typical dynamic case,
and used solely to calculate the base address of the interpreter. For
PDEs this is fine, since the base address is known from the program
headers, but for PIEs the base address varies at run time based on where
the kernel chooses to load it, and so pmcstat has no way of knowing the
real address ranges for the executable. This was less of an issue in the
past since PIEs were rare, but now they're on by default on 64-bit
architectures it's more of a problem.

To solve this, pass through what was picked for et_dyn_addr by the
kernel, and use that as the offset for the executable's start address
just as is done for everything in the kernel. Since we're changing this
interface, sanitise the way we determine the interpreter's base address
by passing it through directly rather than indirectly via the entry
point and having to subtract off whatever the ELF header's e_entry is
(and anything that wants the entry point in future can still add that
back on as needed; this merely changes the interface to directly provide
the underlying variables involved).
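
The address arithmetic described above can be sketched as follows. This is a minimal illustration, not the hwpmc or pmcstat code: the function names are hypothetical, and it assumes the load offset passed through for a PDE is zero.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Old interface (as described above): only the interpreter's entry point
 * was reported, so its base had to be recovered by subtracting e_entry.
 */
uint64_t
sketch_interp_base_old(uint64_t entryaddr, uint64_t e_entry)
{
    assert(entryaddr >= e_entry);
    return (entryaddr - e_entry);
}

/*
 * New interface: the load offset chosen by the kernel is passed through
 * directly, so a link-time address maps to a run-time address by addition.
 * Assumption: dynaddr is 0 for a PDE, leaving the address unchanged.
 */
uint64_t
sketch_runtime_addr(uint64_t dynaddr, uint64_t vaddr)
{
    return (dynaddr + vaddr);
}
```

With the offset available, a tool can translate every program-header address of a PIE the same way, instead of having no usable base at all.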

This will be followed up by a bump to the pmc major version.

Reviewed by: jhb
Differential Revision: https://reviews.freebsd.org/D39595


# 4d846d26 10-May-2023 Warner Losh <imp@FreeBSD.org>

spdx: The BSD-2-Clause-FreeBSD identifier is obsolete, drop -FreeBSD

The SPDX folks have obsoleted the BSD-2-Clause-FreeBSD identifier. Catch
up to that fact and revert to their recommended match of BSD-2-Clause.

Discussed with: pfg
MFC After: 3 days
Sponsored by: Netflix


# 3a7c9fc0 05-May-2023 Mitchell Horne <mhorne@FreeBSD.org>

hwpmc_logging: less macro magic for type names

Provide the log type names in their entirety, rather than relying on the
macro to prepend the prefix. This improves their searchability; for
example, if I see PMCLOG_TYPE_PMCALLOCATE in libpmc I will now be able
to find where that is emitted in the kernel with a simple grep.
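
The searchability point can be illustrated with a small sketch. The enum values and the two EMIT macros are hypothetical stand-ins, not the macros in hwpmc_logging.c; only the PMCLOG_TYPE_ prefix is taken from the message above.

```c
#include <assert.h>

/* Hypothetical record types following the PMCLOG_TYPE_* naming. */
enum sketch_type {
    PMCLOG_TYPE_INITIALIZE = 1,
    PMCLOG_TYPE_PMCALLOCATE = 2
};

/*
 * Old style: the macro pastes the prefix on, so the full identifier
 * never appears at the call site and grep cannot find it.
 */
#define SKETCH_EMIT_PASTED(t)   ((int)PMCLOG_TYPE_ ## t)

/* New style: the caller spells out the whole name. */
#define SKETCH_EMIT_FULL(t)     ((int)(t))

/* Both spellings must resolve to the same record type. */
static_assert(SKETCH_EMIT_PASTED(PMCALLOCATE) ==
    SKETCH_EMIT_FULL(PMCLOG_TYPE_PMCALLOCATE),
    "pasted and full names differ");

int
sketch_old_style(void)
{
    return (SKETCH_EMIT_PASTED(PMCALLOCATE));
}

int
sketch_new_style(void)
{
    return (SKETCH_EMIT_FULL(PMCLOG_TYPE_PMCALLOCATE));
}
```

Only sketch_new_style() contains the literal string PMCLOG_TYPE_PMCALLOCATE, which is what makes a plain grep succeed.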

Reviewed by: jkoshy, emaste
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D39911


# cb6abe87 08-Sep-2022 Elliott Mitchell <ehem+freebsd@m5p.com>

hwpmc: purge EOL release compatibility


# ba95c556 21-Jul-2022 Dimitry Andric <dim@FreeBSD.org>

Adjust pmclog_{initialize,shutdown}() definitions to avoid clang 15 warning

With clang 15, the following -Werror warnings are produced:

sys/dev/hwpmc/hwpmc_logging.c:1228:18: error: a function declaration without a prototype is deprecated in all versions of C [-Werror,-Wstrict-prototypes]
pmclog_initialize()
^
void
sys/dev/hwpmc/hwpmc_logging.c:1277:16: error: a function declaration without a prototype is deprecated in all versions of C [-Werror,-Wstrict-prototypes]
pmclog_shutdown()
^
void

This is because pmclog_{initialize,shutdown}() are declared with (void)
argument lists, but defined with empty argument lists. Make the
definitions match the declarations.
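
A minimal sketch of the pattern, using hypothetical sketch_* names rather than the real pmclog functions: both the declaration and the definition carry an explicit (void) parameter list, which is the form -Wstrict-prototypes accepts.

```c
#include <assert.h>

static int sketch_state;

/* Declarations with an explicit (void) parameter list. */
void sketch_initialize(void);
void sketch_shutdown(void);
int sketch_is_active(void);

/*
 * Definitions repeat (void). Writing sketch_initialize() with empty
 * parentheses here is what would trigger the warning quoted above.
 */
void
sketch_initialize(void)
{
    sketch_state = 1;
}

void
sketch_shutdown(void)
{
    assert(sketch_state == 1);  /* shutdown only after initialize */
    sketch_state = 0;
}

int
sketch_is_active(void)
{
    return (sketch_state);
}
```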

MFC after: 3 days


# eff9ee7c 06-Jun-2022 Alexander Motin <mav@FreeBSD.org>

hwpmc: Increase thread priority while iterating CPUs.

This allows profiling of already running high-priority threads, which
otherwise blocked PMC management by preventing thread migration to their
CPUs, i.e. profiling could start only once the workload had completed.

While there, return the thread to its original CPU after iterating
the list. Otherwise all threads using PMC end up on the last CPU.

MFC after: 1 month


# 0939f965 28-Aug-2021 Piotr Pawel Stefaniak <pstef@FreeBSD.org>

Update a sysctl name to nbuffers_pcpu in hwpmc.4 and pmcstat.c

This change was missed in r333509 (e6b475e0af).

Differential Revision: https://reviews.freebsd.org/D31704
Reviewed by: mjg


# aee6e7dc 15-Jul-2021 Mateusz Guzik <mjg@FreeBSD.org>

hwpmc: mostly clean up cc --analyze

Sponsored by: Rubicon Communications, LLC ("Netgate")


# 9978bd99 30-Oct-2018 Mark Johnston <markj@FreeBSD.org>

Add malloc_domainset(9) and _domainset variants to other allocator KPIs.

Remove malloc_domain(9) and most other _domain KPIs added in r327900.
The new functions allow the caller to specify a general NUMA domain
selection policy, rather than specifically requesting an allocation from
a specific domain. The latter policy tends to interact poorly with
M_WAITOK, resulting in situations where a caller is blocked indefinitely
because the specified domain is depleted. Most existing consumers of
the _domain KPIs are converted to instead use a DOMAINSET_PREF() policy,
in which we fall back to other domains to satisfy the allocation
request.

This change also defines a set of DOMAINSET_FIXED() policies, which
only permit allocations from the specified domain.
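
The difference between the two policy families can be sketched in userland as below. Everything here is hypothetical (the counters, the enum, and sketch_alloc() are not the malloc_domainset(9) KPI), and the sketch returns -1 where a fixed policy combined with M_WAITOK would instead keep the caller waiting.

```c
#include <assert.h>

#define SKETCH_NDOMAINS 2

/* Hypothetical free-page counters: domain 0 is depleted. */
static int sketch_free[SKETCH_NDOMAINS] = { 0, 4 };

enum sketch_policy { SKETCH_FIXED, SKETCH_PREF };

/* Returns the domain that satisfied the request, or -1 if none could. */
int
sketch_alloc(enum sketch_policy policy, int domain)
{
    int d, i;

    assert(domain >= 0 && domain < SKETCH_NDOMAINS);
    if (sketch_free[domain] > 0) {
        sketch_free[domain]--;
        return (domain);
    }
    if (policy == SKETCH_FIXED)
        return (-1);        /* never leave the specified domain */
    /* Preferred policy: fall back to the remaining domains in turn. */
    for (i = 1; i < SKETCH_NDOMAINS; i++) {
        d = (domain + i) % SKETCH_NDOMAINS;
        if (sketch_free[d] > 0) {
            sketch_free[d]--;
            return (d);
        }
    }
    return (-1);
}
```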

Discussed with: gallatin, jeff
Reported and tested by: pho (previous version)
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D17418


# d9f1b8db 04-Oct-2018 Matt Macy <mmacy@FreeBSD.org>

hwpmc: Refactor sample ring buffer handling to fix races

Refactor sample ring buffer handling to make it more robust to
long-running callchain collection.

r338112 introduced a (now fixed) regression that exposed a number of race
conditions within the management of the sample buffers. This
simplifies the handling and moves the decision to overwrite a
callchain sample that has taken too long out of the NMI into the
hardclock handler. With this change the problem no longer shows up as a
ring corruption but as the code spending all of its time in callchain
collection.

- Makes the producer / consumer index incrementing monotonic, making it
easier (for me at least) to reason about.
- Moves the decision to overwrite a sample from NMI context to interrupt
context where we can enforce serialization.
- Puts a time limit on waiting to collect a user callchain - putting a
bound on head-of-line blocking causing samples to be dropped
- Removes the flush routine which was previously needed to purge
dangling references to the pmc from the sample buffers but now is only
a source of a race condition on unload.
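
The first point, monotonically increasing producer and consumer indices, can be sketched as below. This is a single-threaded illustration with hypothetical names, not the hwpmc sample buffer: it has no atomics, and when full it simply refuses the new sample rather than deciding whether to overwrite a stale one.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_NSAMPLES 4   /* power of two, so masking selects a slot */

struct sketch_ring {
    uint64_t prod;          /* total samples ever produced */
    uint64_t cons;          /* total samples ever consumed */
    int slot[SKETCH_NSAMPLES];
};

/* Returns 1 if the sample was stored, 0 if the ring was full. */
int
sketch_put(struct sketch_ring *r, int v)
{
    assert(r->prod >= r->cons);
    if (r->prod - r->cons == SKETCH_NSAMPLES)
        return (0);
    r->slot[r->prod & (SKETCH_NSAMPLES - 1)] = v;
    r->prod++;              /* only ever increases, never wraps to 0 */
    return (1);
}

/* Returns 1 and fills *v if a sample was available, 0 otherwise. */
int
sketch_get(struct sketch_ring *r, int *v)
{
    if (r->prod == r->cons)
        return (0);
    *v = r->slot[r->cons & (SKETCH_NSAMPLES - 1)];
    r->cons++;
    return (1);
}
```

Because the indices never wrap, prod - cons is always the number of outstanding samples, and full versus empty cannot be confused.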

Previously one could lock up or crash HEAD by running:
pmcstat -S inst_retired.any_p -T and then hitting ^C

After this change it is no longer possible.

PR: 231793
Reviewed by: markj@
Approved by: re (gjb@)
Differential Revision: https://reviews.freebsd.org/D17011


# 72ac73fa 06-Jul-2018 Matt Macy <mmacy@FreeBSD.org>

hwpmc: remove hacks to work around incorrect pc_domain


# 9616acde 06-Jun-2018 Matt Macy <mmacy@FreeBSD.org>

hwpmc: don't do EMIT64 on constant


# f992dd4b 06-Jun-2018 Matt Macy <mmacy@FreeBSD.org>

pmc: convert native to jsonl and track TSC value of samples

- add '-j' option to filter to enable converting native pmc
log format to json lines format to enable the use of scripts
and external tooling

% pmc filter -j pmc.log pmc.jsonl

- Record the tsc value in sampling interrupts as opposed to
recording nanotime when the sample is copied to a global log
in hardclock - potentially many milliseconds later.

- At initialize record the tsc_freq and the time of day to give
us an offset for translating the tsc values in callchain records
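
A minimal sketch of how a recorded frequency and time of day could translate a sample's TSC value into wall-clock time. The struct and function are hypothetical; in particular, the base_tsc field (the counter value at initialize time) is an assumption, since the message above only mentions tsc_freq and the time of day.

```c
#include <assert.h>
#include <stdint.h>

struct sketch_tsc_base {
    uint64_t tsc_freq;      /* counter ticks per second */
    uint64_t base_tsc;      /* assumed: counter value at initialize */
    uint64_t base_ns;       /* time of day at initialize, in ns */
};

/* Convert a sample's TSC value to nanoseconds since the epoch. */
uint64_t
sketch_tsc_to_ns(const struct sketch_tsc_base *b, uint64_t tsc)
{
    uint64_t delta, rem, secs;

    assert(b->tsc_freq != 0 && tsc >= b->base_tsc);
    delta = tsc - b->base_tsc;
    /* Split into whole seconds and remainder to limit overflow. */
    secs = delta / b->tsc_freq;
    rem = delta % b->tsc_freq;
    return (b->base_ns + secs * 1000000000ULL +
        rem * 1000000000ULL / b->tsc_freq);
}
```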


# b2ca2e50 05-Jun-2018 Matt Macy <mmacy@FreeBSD.org>

hwpmc: add summary command and further metadata extensions

metadata changes:
- log pmc sample rate with pmcallocate
- log proc flags with thread / process logging
to identify user vs kernel threads

fixes:
- use log cpuid to translate event id to event name

Implement rudimentary summary command to track sample
counts by thread and process name within a pmc log.

% make -j4 buildkernel >& /dev/null &
% sudo pmcstat -S unhalted_core_cycles -S llc-misses -O foo sleep 15
% pmc summary foo
cpu_clk_unhalted.thread_p_any:
idle: 138108207162
clang-6.0: 105336158004
sh: 72340108510
make: 8642012963
kernel: 7754011631
longest_lat_cache.miss:
clang-6.0: 87502625
sh: 40901227
make: 5500165
kernel: 3300099
awk: 2000060

% pmc summary -f ~/foo
idx: 278 name: cpu_clk_unhalted.thread_p_any rate: 2000003
idle: 69054
clang-6.0: 52668
sh: 36170
make: 4321
kernel: 3877
hwpmc: proc(7445): 3319
awk: 1289
xargs: 357
rand_harvestq: 181
mtree: 102
intr: 53
zfskern: 31
usb: 7
pagedaemon: 4
ntpd: 3
syslogd: 1
acpi_thermal: 1
logger: 1
syncer: 1
snmptrapd: 1
sleep: 1
idx: 17 name: longest_lat_cache.miss rate: 100003
clang-6.0: 875
sh: 409
make: 55
kernel: 33
awk: 20
hwpmc: proc(7445): 14
xargs: 9
idle: 8
intr: 3
zfskern: 2


# ebfaf69c 04-Jun-2018 Matt Macy <mmacy@FreeBSD.org>

hwpmc: log name->pid, name->tid mappings

By logging all threads and processes 'pmc filter'
can now filter on process or thread name, relieving
the user of the burden of determining which tid or
pid was which when the sample was taken.

% pmc filter -T if_io_tqg -P nginx pmc.log pmc-iflib.log

% pmc filter -x -T idle pmc.log pmc-noidle.log


# 07d80fd8 03-Jun-2018 Matt Macy <mmacy@FreeBSD.org>

hwpmc: ABI fixes
- increase pmc cpuid field from 8 to 12 bits
- add cpuid version string to initialize entry in the log
so that filter can identify which counter index an
event name maps to
- GC unused config flags
- make fixed counter assignment more robust as well as the
changes needed to be properly identified for filter


# 5de96e33 03-Jun-2018 Matt Macy <mmacy@FreeBSD.org>

hwpmc: support sampling both kernel and user stacks when interrupted in kernel

This adds the -U option to pmcstat which will attribute in-kernel samples
back to the user stack that invoked the system call. It is not the default,
because when looking at kernel profiles it is generally more desirable to
merge all instances of a given system call together.

Although heavily revised, this change is directly derived from D7350 by
Jonathan T. Looney.

Obtained from: jtl
Sponsored by: Juniper Networks, Limelight Networks


# 2ce69a4d 03-Jun-2018 Matt Macy <mmacy@FreeBSD.org>

hwpmc: ensure that mapin updates are synchronous


# 39446ce5 28-May-2018 Matt Macy <mmacy@FreeBSD.org>

hwpmc_logging.c: don't call wakeup_one with thread lock held, don't
malloc(M_WAITOK) in an epoch section


# 959826ca 26-May-2018 Matt Macy <mmacy@FreeBSD.org>

pmc(3)/hwpmc(4): update supported Intel processors to rely fully on the
vendor provided pmu-events tables and sundry cleanups.

The vendor pmu-events tables provide counter descriptions, default
sample rates, event, umask, and flag values for all the counter
configuration permutations. Using this gives us:

- much simpler kernel code for the MD component
- helpful long and short event descriptions
- simpler user code
- sample rates that won't overload the system

Update man page with newer sample types and remove unused sample type.


# 5506ceb8 26-May-2018 Matt Macy <mmacy@FreeBSD.org>

Revert r334242 "pmc(3)/hwpmc(4): update supported Intel processors to rely fully on the"
because of squash commit messages


# 49281356 26-May-2018 Matt Macy <mmacy@FreeBSD.org>

pmc(3)/hwpmc(4): update supported Intel processors to rely fully on the
vendor provided pmu-events tables and sundry cleanups.

The vendor pmu-events tables provide counter descriptions, default
sample rates, event, umask, and flag values for all the counter
configuration permutations. Using this gives us:

- much simpler kernel code for the MD component
- helpful long and short event descriptions
- simpler user code
- sample rates that won't overload the system

Update man page with newer sample types and remove unused sample type.

Squashed commit of the following:

commit 4459d43eff815bec08ccc5533dbe5de846f03128
Author: Matt Macy <mmacy@mattmacy.io>
Date: Sat May 26 00:06:31 2018 -0700

libpmc: fix pmu function signatures for non amd64

commit a2cb8bbc586c65d41f9b291430a2261ec67b59fe
Author: Matt Macy <mmacy@mattmacy.io>
Date: Fri May 25 22:38:11 2018 -0700

pmcstat: fix indentation of usage

commit f686954b15ff56a833ac80404898977cb80a265b
Author: Matt Macy <mmacy@mattmacy.io>
Date: Fri May 25 22:19:49 2018 -0700

pmclog(3): add callchain and pmcallocatedyn, remove pcsample

commit 73e13a0d2e9498c81c150d14d022050cee7511bb
Author: Matt Macy <mmacy@mattmacy.io>
Date: Fri May 25 22:19:00 2018 -0700

pmclog.h: GC pcsample field

commit 3e93ffd65da641fa657539dad3c48e281f8b5798
Author: Matt Macy <mmacy@mattmacy.io>
Date: Fri May 25 22:05:57 2018 -0700

hwpmc: make Intel core CPUs use external event tables

commit 634f5fae1e1644ac324003136c66cd9c619d1c93
Author: Matt Macy <mmacy@mattmacy.io>
Date: Fri May 25 22:00:06 2018 -0700

pmclog: update log record types, bump PMC_MAJOR
- explicitly make log record types a multiple of 8 bytes
- hook in pmu event types for pmc_allocate records
- remove references to the no longer used PCSAMPLE record

commit 83d84fcd2d65bdf6ddcb2e155a22f0cfa2a9c225
Author: Matt Macy <mmacy@mattmacy.io>
Date: Fri May 25 21:52:10 2018 -0700

libpmc: add support for having vendor table driven pmc_allocate

commit 9e6ad63c40c2fce8404847ace5078ca6cb33a736
Author: Matt Macy <mmacy@mattmacy.io>
Date: Fri May 25 19:11:33 2018 -0700

hwpmc_core: add accessors for EVSEL & UMASK, make IAP_UMASK useful to user

commit 859dceb93daa6419a48c794db99b6758e5b041c9
Author: Matt Macy <mmacy@mattmacy.io>
Date: Fri May 25 19:09:45 2018 -0700

pmcstat: update usage and man page as well as make -L consistent with pmccontrol

commit 79c7d8597e28c2eb13f5f9113e65ec2792ca57b1
Author: Matt Macy <mmacy@mattmacy.io>
Date: Fri May 25 18:07:03 2018 -0700

pmu_util: add support for all current intel event keywords

commit d8089c7f6a6c8527f38324252b1ffb47004694c6
Author: Matt Macy <mmacy@mattmacy.io>
Date: Fri May 25 17:45:00 2018 -0700

add description for new arguments

commit 058336740bab53c62ec88a3a026ea848cf3878c6
Author: Matt Macy <mmacy@mattmacy.io>
Date: Fri May 25 17:38:15 2018 -0700

libpmc: move pmu_events table and pmu_utils out of libpmcstat so that they can be used by pmc_allocate

commit 049b66b382e2f833c3f47bc8df9e750cb265709f
Author: Matt Macy <mmacy@mattmacy.io>
Date: Fri May 25 16:12:41 2018 -0700

pmcstat: hook pmu_events counter description utility routines in

commit f5e01e7b37a691dc045e1aa16b3ebdd162515de8
Author: Matt Macy <mmacy@mattmacy.io>
Date: Fri May 25 16:11:59 2018 -0700

pmu_events: add utility routines for listing counters and their descriptions

commit cba4d4f8907f772279f86f18f915e0d74d33ac56
Author: Matt Macy <mmacy@mattmacy.io>
Date: Fri May 25 16:09:50 2018 -0700

pmu-events: expand out skylake regex to simplify string matches


# a85289cf 23-May-2018 Matt Macy <mmacy@FreeBSD.org>

hwpmc: set threadid in callchain records - second part of r334108


# f2daab2c 20-May-2018 Matt Macy <mmacy@FreeBSD.org>

pmc: avoid potential race on shutdown

Clear shutdown flag first, conservatively allow 5ms for all hardclock consumers to
see flag before draining


# 102ccac2 14-May-2018 Matt Macy <mmacy@FreeBSD.org>

hwpmc: don't reference domain index with no memory backing it

On multi-socket the domain will be correctly set for a given CPU
regardless of whether or not NUMA is enabled.

Approved by: sbruno


# 0f00315c 13-May-2018 Matt Macy <mmacy@FreeBSD.org>

hwpmc: fix load/unload race and vm map LOR

- fix load/unload race by allocating the per-domain list structure at boot

- fix long extant vm map LOR by replacing pmc_sx sx_slock with global_epoch
to protect the liveness of elements of the pmc_ss_owners list

Reported by: pho
Approved by: sbruno


# f1401123 12-May-2018 Matt Macy <mmacy@FreeBSD.org>

hwpmc/epoch - don't reference domain if NUMA is not set

It appears that domain information is set correctly independent
of whether or not NUMA is defined. However, there is no memory
backing secondary domains leading to allocation failure.

Reported by: pho@, np@
Approved by: sbruno@


# d626a614 11-May-2018 Matt Macy <mmacy@FreeBSD.org>

hwpmc(9): clear remaining sample work for hardclock

- fix last minute change in 333509 whereby runcount references
to a pmc would remain, causing us to loop in pause forever

Approved by: sbruno


# e6b475e0 11-May-2018 Matt Macy <mmacy@FreeBSD.org>

hwpmc(9): Make pmclog buffer pcpu and update constants

On non-trivial SMP systems the contention on the pmc_owner mutex leads
to a substantial number of samples captured being from the pmc process
itself. This change a) makes buffers larger to avoid contention on the
global list b) makes the working sample buffer per cpu.

Run pmcstat in the background (default event rate of 64k):
pmcstat -S UNHALTED_CORE_CYCLES -O /dev/null sleep 600 &

Before:
make -j96 buildkernel -s >&/dev/null 3336.68s user 24684.10s system 7442% cpu 6:16.50 total

After:
make -j96 buildkernel -s >&/dev/null 2697.82s user 1347.35s system 6058% cpu 1:06.77 total

For more realistic overhead measurement set the sample rate for ~2 kHz
on a 2.1 GHz processor:
pmcstat -n 1050000 -S UNHALTED_CORE_CYCLES -O /dev/null sleep 6000 &

Collecting 10 samples of `make -j96 buildkernel` from each:

x before
+ after

real time:
N Min Max Median Avg Stddev
x 10 76.4 127.62 84.845 88.577 15.100031
+ 10 59.71 60.79 60.135 60.179 0.29957192
Difference at 95.0% confidence
-28.398 +/- 10.0344
-32.0602% +/- 7.69825%
(Student's t, pooled s = 10.6794)

system time:
N Min Max Median Avg Stddev
x 10 2277.96 6948.53 2949.47 3341.492 1385.2677
+ 10 1038.7 1081.06 1070.555 1064.017 15.85404
Difference at 95.0% confidence
-2277.47 +/- 920.425
-68.1574% +/- 8.77623%
(Student's t, pooled s = 979.596)

x no pmc
+ pmc running
real time:

HEAD:
N Min Max Median Avg Stddev
x 10 58.38 59.15 58.86 58.847 0.22504567
+ 10 76.4 127.62 84.845 88.577 15.100031
Difference at 95.0% confidence
29.73 +/- 10.0335
50.5208% +/- 17.0525%
(Student's t, pooled s = 10.6785)

patched:
N Min Max Median Avg Stddev
x 10 58.38 59.15 58.86 58.847 0.22504567
+ 10 59.71 60.79 60.135 60.179 0.29957192
Difference at 95.0% confidence
1.332 +/- 0.248939
2.2635% +/- 0.426506%
(Student's t, pooled s = 0.264942)

system time:

HEAD:
N Min Max Median Avg Stddev
x 10 1010.15 1073.31 1025.465 1031.524 18.135705
+ 10 2277.96 6948.53 2949.47 3341.492 1385.2677
Difference at 95.0% confidence
2309.97 +/- 920.443
223.937% +/- 89.3039%
(Student's t, pooled s = 979.616)

patched:
N Min Max Median Avg Stddev
x 10 1010.15 1073.31 1025.465 1031.524 18.135705
+ 10 1038.7 1081.06 1070.555 1064.017 15.85404
Difference at 95.0% confidence
32.493 +/- 16.0042
3.15% +/- 1.5794%
(Student's t, pooled s = 17.0331)
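
The figures in the first before/after real time table (averages 88.577 and 60.179 over 10 runs each) can be re-derived from the per-sample statistics. The sketch below is an illustration with hypothetical names, not the tool that produced the tables. It uses a hand-rolled square root so it builds without a math library, and the caller supplies the Student's t critical value (2.101 for 18 degrees of freedom at 95%, from a standard t table).

```c
#include <assert.h>

struct sketch_sample {
    int n;
    double avg;
    double stddev;
};

/* Newton's iteration for sqrt(x), x >= 0. */
static double
sketch_sqrt(double x)
{
    double g = x > 1.0 ? x : 1.0;
    int i;

    for (i = 0; i < 60; i++)
        g = 0.5 * (g + x / g);
    return (g);
}

/* Pooled standard deviation of two samples. */
double
sketch_pooled_s(struct sketch_sample a, struct sketch_sample b)
{
    double num;

    assert(a.n > 1 && b.n > 1);
    num = (a.n - 1) * a.stddev * a.stddev +
        (b.n - 1) * b.stddev * b.stddev;
    return (sketch_sqrt(num / (a.n + b.n - 2)));
}

/* Half-width of the confidence interval on the difference of means. */
double
sketch_halfwidth(struct sketch_sample a, struct sketch_sample b, double t)
{
    return (t * sketch_pooled_s(a, b) *
        sketch_sqrt(1.0 / a.n + 1.0 / b.n));
}
```

For those two samples this gives a pooled s of about 10.679 and a half-width of about 10.03 around the difference 60.179 - 88.577 = -28.398, matching the reported "-28.398 +/- 10.0344".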

Reviewed by: jeff@
Approved by: sbruno@
Differential Revision: https://reviews.freebsd.org/D15155


# cbd92ce6 09-May-2018 Matt Macy <mmacy@FreeBSD.org>

Eliminate the overhead of gratuitous repeated reinitialization of cap_rights

- Add macros to allow preinitialization of cap_rights_t.

- Convert most commonly used code paths to use preinitialized cap_rights_t.
A 3.6% speedup in fstat was measured with this change.

Reported by: mjg
Reviewed by: oshogbo
Approved by: sbruno
MFC after: 1 month


# 9f4f1d4d 17-Jan-2018 Fabien Thomas <fabient@FreeBSD.org>

Fix pmcstat exit from kernel introduced by r325275.
pmcstat request for close will generate a close event.
This event will be in turn received by pmcstat to close the file.

Reviewed by: kib
Tested by: pho
MFC after: 1 week
Sponsored by: Stormshield


# 718cf2cc 27-Nov-2017 Pedro F. Giffuni <pfg@FreeBSD.org>

sys/dev: further adoption of SPDX licensing ID tags.

Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
supersede or replace the license texts.


# f4dd123e 13-Nov-2017 Konstantin Belousov <kib@FreeBSD.org>

Do not leak PMC_PO_OWNS_LOGFILE on error.

Note that the PMCLOG_RESERVE_WITH_ERROR() macro contains a goto error;
statement and is executed after the flag is set.

Reported and tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week


# c9da2637 13-Nov-2017 Konstantin Belousov <kib@FreeBSD.org>

Style bug.

Sponsored by: The FreeBSD Foundation
MFC after: 3 days


# 20b555e1 01-Nov-2017 Konstantin Belousov <kib@FreeBSD.org>

Do not run pmclog_configure_log() without pmc_sx protection.

The r195005 unlocked pmc_sx before calling into pmclog_configure_log()
to avoid the LOR, but it allows flush or closelog to run in parallel
with the configuration, causing many failure modes.

Revert r195005. Pre-create the logging process, allowing it to run
after the set up succeeded, otherwise the process terminates itself.

Reported and tested by: pho
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D12882


# 1121a374 01-Nov-2017 Konstantin Belousov <kib@FreeBSD.org>

Be protective and check the po_file validity before dropping the ref.

Reported and tested by: pho
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
X-Differential revision: https://reviews.freebsd.org/D12882


# ea4d25f9 01-Nov-2017 Konstantin Belousov <kib@FreeBSD.org>

In hwpmc, do not double-close the logging file.

hwpmc(4) must not voluntarily call fo_close(), doing this causes
double-close of the file. It seems to almost avoid bad consequences
for pipes, but other types of files demonstrate random memory access.

To fix, remove fo_close() calls, which also do not provide the
declared wake-up of waiters consistently. Instead, send a signal to
the logger and configure the logger process to not block it. Since
logger never returns to userspace, the signal only causes termination
of the interruptible sleeps in fo_write().

Reported and tested by: pho
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
X-Differential revision: https://reviews.freebsd.org/D12882


# cec1957a 19-Sep-2016 Ed Maste <emaste@FreeBSD.org>

hwpmc: remove sys/capability.h backwards compatibility

The Capsicum header is installed as sys/capsicum.h in stable/10 as well.


# b01c40f1 09-Dec-2015 Randall Stewart <rrs@FreeBSD.org>

Fix the tunable in logging so that if it's pre-11 we have the proper
line so the tunable is present.

Sponsored by: Netflix Inc.


# 4a3690df 08-May-2015 John Baldwin <jhb@FreeBSD.org>

Convert hwpmc(4) debug printfs over to KTR.

Differential Revision: https://reviews.freebsd.org/D2487
Reviewed by: davide, emaste
MFC after: 2 weeks
Sponsored by: Norse Corp, Inc.


# 680f1afd 08-May-2015 John Baldwin <jhb@FreeBSD.org>

Move hwpmc(4) debugging code under a new HWPMC_DEBUG option instead of
the broader DEBUG option.

Reviewed by: emaste
MFC after: 2 weeks
Sponsored by: Norse Corp, Inc.


# de8d8ca4 11-Mar-2015 Randall Stewart <rrs@FreeBSD.org>

You need to have the capabilities and not skip it if you are
not on head.. otherwise the file pointer will be NULL and when
you try to do something with it you will crash. Make the #else
be the old capabilities, and then remove the erroneous ifdefs for
11.

MFC after: 1 week (with the other MFC I was going to do until the panic)


# d95b3509 13-Jan-2015 Randall Stewart <rrs@FreeBSD.org>

Update the hwpmc driver to have the new type HASWELL_XEON. Also
go back through HASWELL, IVY_BRIDGE, IVY_BRIDGE_XEON and SANDY_BRIDGE
to straighten out all the missing PMCs. We also add a new pmc tool
pmcstudy, this allows one to run the various formulas from
the documents "Using Intel Vtune Amplifier XE on XXX Generation platforms" for
IB/SB and Haswell. The tool also allows one to postulate your own
formulas with any of the various PMCs. At some point I will enhance
this to work with Brendan Gregg's flame-graphs so we can flamegraph
various PMC interactions. Note the manual page also needs some
work (lots of work) but gnn has committed to help me with that ;-)
Reviewed by: gnn
MFC after: 1 month
Sponsored by: Netflix Inc.


# af3b2549 27-Jun-2014 Hans Petter Selasky <hselasky@FreeBSD.org>

Pull in r267961 and r267973 again. Fix for issues reported will follow.


# 37a107a4 27-Jun-2014 Glen Barber <gjb@FreeBSD.org>

Revert r267961, r267973:

These changes prevent sysctl(8) from returning proper output,
such as:

1) no output from sysctl(8)
2) erroneously returning ENOMEM with tools like truss(1)
or uname(1)
truss: can not get etype: Cannot allocate memory


# 3da1cf1e 27-Jun-2014 Hans Petter Selasky <hselasky@FreeBSD.org>

Extend the meaning of the CTLFLAG_TUN flag to automatically check if
there is an environment variable which shall initialize the SYSCTL
during early boot. This works for all SYSCTL types both statically and
dynamically created ones, except for the SYSCTL NODE type and SYSCTLs
which belong to VNETs. A new flag, CTLFLAG_NOFETCH, has been added to
be used in the case a tunable sysctl has a custom initialisation
function allowing the sysctl to still be marked as a tunable. The
kernel SYSCTL API is mostly the same, with a few exceptions for some
special operations like iterating children of a static/extern SYSCTL
node. This operation should probably be made into a factored out
common macro, hence some device drivers use this. The reason for
changing the SYSCTL API was the need for a SYSCTL parent OID pointer
and not only the SYSCTL parent OID list pointer in order to quickly
generate the sysctl path. The motivation behind this patch is to avoid
parameter loading kludges inside the OFED driver subsystem. Instead of
adding special code to the OFED driver subsystem to post-load tunables
into dynamically created sysctls, we generalize this in the kernel.

Other changes:
- Corrected a possibly incorrect sysctl name from "hw.cbb.intr_mask"
to "hw.pcic.intr_mask".
- Removed redundant TUNABLE statements throughout the kernel.
- Some minor code rewrites in connection to removing not needed
TUNABLE statements.
- Added a missing SYSCTL_DECL().
- Wrapped two very long lines.
- Avoid malloc()/free() inside sysctl string handling, in case it is
called to initialize a sysctl from a tunable, hence malloc()/free() is
not ready when sysctls from the sysctl dataset are registered.
- Bumped FreeBSD version to indicate SYSCTL API change.

MFC after: 2 weeks
Sponsored by: Mellanox Technologies


# 4a144410 16-Mar-2014 Robert Watson <rwatson@FreeBSD.org>

Update kernel inclusions of capability.h to use capsicum.h instead; some
further refinement is required as some device drivers intended to be
portable over FreeBSD versions rely on __FreeBSD_version to decide whether
to include capability.h.

MFC after: 3 weeks


# 7008be5b 04-Sep-2013 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Change the cap_rights_t type from uint64_t to a structure that we can extend
in the future in a backward compatible (API and ABI) way.

The cap_rights_t represents capability rights. We used to use one bit to
represent one right, but we are running out of spare bits. Currently the new
structure provides place for 114 rights (so 50 more than the previous
cap_rights_t), but it is possible to grow the structure to hold at least 285
rights, although we can make it even larger if 285 rights won't be enough.

The structure definition looks like this:

struct cap_rights {
uint64_t cr_rights[CAP_RIGHTS_VERSION + 2];
};

The initial CAP_RIGHTS_VERSION is 0.

The top two bits in the first element of the cr_rights[] array contain total
number of elements in the array - 2. This means if those two bits are equal to
0, we have 2 array elements.

The top two bits in all remaining array elements should be 0.
The next five bits in all array elements contain array index. Only one bit is
used and bit position in this five-bits range defines array index. This means
there can be at most five array elements in the future.

To define new right the CAPRIGHT() macro must be used. The macro takes two
arguments - an array index and a bit to set, eg.

#define CAP_PDKILL CAPRIGHT(1, 0x0000000000000800ULL)

We still support aliases that combine few rights, but the rights have to belong
to the same array element, eg:

#define CAP_LOOKUP CAPRIGHT(0, 0x0000000000000400ULL)
#define CAP_FCHMOD CAPRIGHT(0, 0x0000000000002000ULL)

#define CAP_FCHMODAT (CAP_FCHMOD | CAP_LOOKUP)

There is new API to manage the new cap_rights_t structure:

cap_rights_t *cap_rights_init(cap_rights_t *rights, ...);
void cap_rights_set(cap_rights_t *rights, ...);
void cap_rights_clear(cap_rights_t *rights, ...);
bool cap_rights_is_set(const cap_rights_t *rights, ...);

bool cap_rights_is_valid(const cap_rights_t *rights);
void cap_rights_merge(cap_rights_t *dst, const cap_rights_t *src);
void cap_rights_remove(cap_rights_t *dst, const cap_rights_t *src);
bool cap_rights_contains(const cap_rights_t *big, const cap_rights_t *little);

Capability rights to the cap_rights_init(), cap_rights_set(),
cap_rights_clear() and cap_rights_is_set() functions are provided by
separating them with commas, eg:

cap_rights_t rights;

cap_rights_init(&rights, CAP_READ, CAP_WRITE, CAP_FSTAT);

There is no need to terminate the list of rights, as those functions are
actually macros that take care of the termination, eg:

#define cap_rights_set(rights, ...) \
__cap_rights_set((rights), __VA_ARGS__, 0ULL)
void __cap_rights_set(cap_rights_t *rights, ...);
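
The termination trick can be tried in isolation with the sketch below. The sketch_* names, the single-word rights struct, and the SKETCH_* right values are hypothetical simplifications, not the real cap_rights_t implementation. The only part taken from the message above is the shape of the macro, which appends a 0ULL terminator to the argument list.

```c
#include <assert.h>
#include <stdarg.h>

/* Simplified stand-in for the rights structure: one word only. */
struct sketch_rights {
    unsigned long long bits;
};

#define SKETCH_READ     0x1ULL
#define SKETCH_WRITE    0x2ULL
#define SKETCH_FSTAT    0x4ULL

/* Consumes unsigned long long arguments until the 0ULL terminator. */
void
sketch_rights_set_impl(struct sketch_rights *r, ...)
{
    va_list ap;
    unsigned long long right;

    assert(r != 0);
    va_start(ap, r);
    while ((right = va_arg(ap, unsigned long long)) != 0ULL)
        r->bits |= right;
    va_end(ap);
}

/* The macro supplies the terminator so callers never have to. */
#define sketch_rights_set(r, ...) \
    sketch_rights_set_impl((r), __VA_ARGS__, 0ULL)
```

Every right passed must have type unsigned long long, which is why the constants carry the ULL suffix.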

Thanks to using one bit as an array index we can assert in those functions that
there are no two rights belonging to different array elements provided
together. For example this is illegal and will be detected, because CAP_LOOKUP
belongs to element 0 and CAP_PDKILL to element 1:

cap_rights_init(&rights, CAP_LOOKUP | CAP_PDKILL);

Providing several rights that belong to the same array element this way is
correct, but is not advised. It should only be used for aliases definition.
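
The detection described above can be sketched as follows. The SKETCH_* macros are hypothetical; the shift of 57 is one reading of "the next five bits" below the top two bits of a 64-bit word, not a quotation of the real header. A combined value is accepted only if exactly one index bit is set.

```c
#include <assert.h>

/* One-hot array index in bits 57..61, right bit in the low bits. */
#define SKETCH_CAPRIGHT(idx, bit)   ((1ULL << (57 + (idx))) | (bit))
#define SKETCH_IDXBITS(right)       (((right) >> 57) & 0x1fULL)

#define SKETCH_LOOKUP   SKETCH_CAPRIGHT(0, 0x0000000000000400ULL)
#define SKETCH_FCHMOD   SKETCH_CAPRIGHT(0, 0x0000000000002000ULL)
#define SKETCH_PDKILL   SKETCH_CAPRIGHT(1, 0x0000000000000800ULL)

/* A right defined for element 1 carries index bit 1. */
static_assert(SKETCH_IDXBITS(SKETCH_PDKILL) == 0x2ULL,
    "unexpected index bits");

/* Returns 1 if all rights OR-ed into 'right' share one array element. */
int
sketch_single_element(unsigned long long right)
{
    unsigned long long idx = SKETCH_IDXBITS(right);

    /* Exactly one index bit set: nonzero and a power of two. */
    return (idx != 0 && (idx & (idx - 1)) == 0);
}
```

OR-ing rights from elements 0 and 1 sets two index bits, so the check fails, which is the condition the functions can assert on.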

This commit also breaks compatibility with some existing Capsicum system calls,
but I see no other way to do that. This should be fine as Capsicum is still
experimental and this change is not going to 9.x.

Sponsored by: The FreeBSD Foundation


# f5f9340b 28-Mar-2012 Fabien Thomas <fabient@FreeBSD.org>

Add software PMC support.

New kernel events can be added at various location for sampling or counting.
This will for example allow easy system profiling whatever the processor is
with known tools like pmcstat(8).

Simultaneous usage of software PMC and hardware PMC is possible, for example
looking at the lock acquire failure, page fault while sampling on
instructions.

Sponsored by: NETASQ
MFC after: 1 month


# 6f33c108 27-Mar-2012 Fabien Thomas <fabient@FreeBSD.org>

Fix random deadlock on pmcstat exit:
- Exit the thread when soft shutdown is requested
- Wakeup owner thread.

Reproduced/tested by looping pmcstat measurement:
pmcstat -S instructions -O/tmp/test ls

MFC after: 1 week


# dceed24a 18-Oct-2011 Fabien Thomas <fabient@FreeBSD.org>

Add a flush of the current PMC log buffer before displaying the next top.

As the underlying block is 4KB if the PMC throughput is low the measurement
will be reported on the next tick. pmcstat(8) use the modified flush API to
reclaim current buffer before displaying next top.

MFC after: 1 month


# 8451d0dd 16-Sep-2011 Kip Macy <kmacy@FreeBSD.org>

In order to maximize the re-usability of kernel code in user space this
patch modifies makesyscalls.sh to prefix all of the non-compatibility
calls (e.g. not linux_, freebsd32_) with sys_ and updates the kernel
entry points and all places in the code that use them. It also
fixes an additional name space collision between the kernel function
psignal and the libc function of the same name by renaming the kernel
psignal to kern_psignal(). By introducing this change now we will ease future
MFCs that change syscalls.

Reviewed by: rwatson
Approved by: re (bz)


# a9d2f8d8 10-Aug-2011 Robert Watson <rwatson@FreeBSD.org>

Second-to-last commit implementing Capsicum capabilities in the FreeBSD
kernel for FreeBSD 9.0:

Add a new capability mask argument to fget(9) and friends, allowing system
call code to declare what capabilities are required when an integer file
descriptor is converted into an in-kernel struct file *. With options
CAPABILITIES compiled into the kernel, this enforces capability
protection; without, this change is effectively a no-op.

Some cases require special handling, such as mmap(2), which must preserve
information about the maximum rights at the time of mapping in the memory
map so that they can later be enforced in mprotect(2) -- this is done by
narrowing the rights in the existing max_protection field used for similar
purposes with file permissions.

In namei(9), we assert that the code is not reached from within capability
mode, as we're not yet ready to enforce namespace capabilities there.
This will follow in a later commit.

Update two capability names: CAP_EVENT and CAP_KEVENT become
CAP_POST_KEVENT and CAP_POLL_KEVENT to more accurately indicate what they
represent.

Approved by: re (bz)
Submitted by: jonathan
Sponsored by: Google Inc


# a7d5f7eb 19-Oct-2010 Jamie Gritton <jamie@FreeBSD.org>

A new jail(8) with a configuration file, to replace the work currently done
by /etc/rc.d/jail.


# 44a4e1d8 29-Mar-2010 Fabien Thomas <fabient@FreeBSD.org>

MFC r205694:
Handling SIGPIPE will cause deadlock/crash.
Return an error immediately in case of hard shutdown.


# 662cf719 26-Mar-2010 Fabien Thomas <fabient@FreeBSD.org>

Handling SIGPIPE will cause deadlock/crash.
Return an error immediately in case of hard shutdown.

MFC after: 3 days


# 4e0c5d79 11-Mar-2010 Fabien Thomas <fabient@FreeBSD.org>

MFC r204878:
Change the way shutdown is handled for log file.

pmc_flush_logfile is now non-blocking and just asks the kernel
to shut down the file. From that point, no more data is
accepted by the log thread and when the last buffer is flushed
the file is closed.

This will remove a deadlock between pmcstat asking for
flush while it cannot flush the pipe itself.


# b44906e5 08-Mar-2010 Fabien Thomas <fabient@FreeBSD.org>

Change the way shutdown is handled for log file.

pmc_flush_logfile is now non-blocking and just asks the kernel
to shut down the file. From that point, no more data is
accepted by the log thread and when the last buffer is flushed
the file is closed.

This will remove a deadlock between pmcstat asking for
flush while it cannot flush the pipe itself.

MFC after: 3 days


# baa1e3c6 01-Dec-2009 Fabien Thomas <fabient@FreeBSD.org>

MFC 199763:
- fix a LOR between process lock and pmc thread mutex
- fix a system deadlock on process exit when the sample buffer
is full (pmclog_loop blocked in fo_write) and pmcstat exits.


# 5eaf27d8 24-Nov-2009 Fabien Thomas <fabient@FreeBSD.org>

- fix a LOR between process lock and pmc thread mutex
- fix a system deadlock on process exit when the sample buffer
is full (pmclog_loop blocked in fo_write) and pmcstat exits.

Reviewed by: jkoshy
MFC after: 3 weeks


# ca2d94be 25-Jun-2009 Attilio Rao <attilio@FreeBSD.org>

Fix a LOR between pmc_sx and proctree/allproc when creating a new thread
for the pmclog.

Reported by: Ryan Stone <rstone at sandvine dot com>
Tested by: Ryan Stone <rstone at sandvine dot com>
Sponsored by: Sandvine Incorporated


# 1ad08c6a 15-Dec-2008 Joseph Koshy <jkoshy@FreeBSD.org>

- Disambiguate a few panic messages.
- Style fixes: wrap long lines, parenthesize return values.


# cb239408 29-Nov-2008 Joseph Koshy <jkoshy@FreeBSD.org>

Improve a comment.


# 0cfab8dd 27-Nov-2008 Joseph Koshy <jkoshy@FreeBSD.org>

- Add support for PMCs in Intel CPUs of Family 6, model 0xE (Core Solo
and Core Duo), model 0xF (Core2), model 0x17 (Core2Extreme) and
model 0x1C (Atom).

In these CPUs, the actual numbers, kinds and widths of PMCs present
need to be queried at run time. Support for specific "architectural"
events also needs to be queried at run time.

Model 0xE CPUs support programmable PMCs, subsequent CPUs
additionally support "fixed-function" counters.

- Use event names that are close to vendor documentation, taking into
account that:
- events with identical semantics on two or more CPUs in this family
can have differing names in vendor documentation,
- identical vendor event names may map to differing events across
CPUs,
- each type of CPU supports a different subset of measurable
events.

Fixed-function and programmable counters both use the same vendor
names for events. The use of a class name prefix ("iaf-" or
"iap-" respectively) permits these to be distinguished.

- In libpmc, refactor pmc_name_of_event() into a public interface
and an internal helper function, for use by log handling code.

- Minor code tweaks: staticize a global, freshen a few comments.

Tested by: gnn


# 2ad65bf4 09-Nov-2008 Joseph Koshy <jkoshy@FreeBSD.org>

Style tweak.


# 1ede983c 23-Oct-2008 Dag-Erling Smørgrav <des@FreeBSD.org>

Retire the MALLOC and FREE macros. They are an abomination unto style(9).

MFC after: 3 months


# d7f03759 19-Oct-2008 Ulf Lilleengen <lulf@FreeBSD.org>

- Import the HEAD csup code which is the basis for the cvsmode work.


# d07f36b0 07-Dec-2007 Joseph Koshy <jkoshy@FreeBSD.org>

Kernel and hwpmc(4) support for callchain capture.

Sponsored by: FreeBSD Foundation and Google Inc.


# 3745c395 20-Oct-2007 Julian Elischer <julian@FreeBSD.org>

Rename the kthread_xxx (e.g. kthread_create()) calls
to kproc_xxx as they actually make whole processes.
This makes way for us to add REAL kthread_create() and friends
that actually make threads. It turns out that most of these
calls actually end up being moved back to the thread version
when it's added, but we need to make this cosmetic change first.

I'd LOVE to do this rename in 7.0 so that we can eventually MFC the
new kthread_xxx() calls.


# 382d30cd 19-Apr-2007 Joseph Koshy <jkoshy@FreeBSD.org>

Fix witness(4) warnings about mutex use.

Group mutexes used in hwpmc(4) into 3 "types" in the sense of
witness(4):

- leaf spin mutexes---only one of these should be held at a time,
so these mutexes are specified as belonging to a single witness
type "pmc-leaf".

- `struct pmc_owner' descriptors are protected by a spin mutex of
witness type "pmc-owner-proc". Since we call wakeup_one() while
holding these mutexes, the witness type of these mutexes needs
to dominate that of "sleepq chain" mutexes.

- logger threads use a sleep mutex, of type "pmc-sleep".

Submitted by: wkoszek (earlier patch)


# 49874f6e 25-Mar-2006 Joseph Koshy <jkoshy@FreeBSD.org>

MFP4: Support for profiling dynamically loaded objects.

Kernel changes:

Inform hwpmc of executable objects brought into the system by
kldload() and mmap(), and of their removal by kldunload() and
munmap(). A helper function linker_hwpmc_list_objects() has been
added to "sys/kern/kern_linker.c" and is used by hwpmc to retrieve
the list of currently loaded kernel modules.

The unused `MAPPINGCHANGE' event has been deprecated in favour
of separate `MAP_IN' and `MAP_OUT' events; this change reduces
space wastage in the log.

Bump the hwpmc's ABI version to "2.0.00". Teach hwpmc(4) to
handle the map change callbacks.

Change the default per-cpu sample buffer size to hold
32 samples (up from 16).

Increment __FreeBSD_version.

libpmc(3) changes:

Update libpmc(3) to deal with the new events in the log file; bring
the pmclog(3) manual page in sync with the code.

pmcstat(8) changes:

Introduce new options to pmcstat(8): "-r" (root fs path), "-M"
(mapfile name), "-q"/"-v" (verbosity control). Option "-k" now
takes a kernel directory as its argument but will also work with
the older invocation syntax.

Rework string handling in pmcstat(8) to use an opaque type for
interned strings. Clean up ELF parsing code and add support for
tracking dynamic object mappings reported by a v2.0.00 hwpmc(4).

Report statistics at the end of a log conversion run depending
on the requested verbosity level.

Reviewed by: jhb, dds (kernel parts of an earlier patch)
Tested by: gallatin (earlier patch)


# fc9a2b80 08-Mar-2006 Joseph Koshy <jkoshy@FreeBSD.org>

When a process is de-configuring a log file, also stop all of its
PMCs that require a log file to operate. This change should fix
PR 90269.

PR: kern/90269
MFC after: 1 week


# 342ed5d9 05-Dec-2005 Ruslan Ermilov <ru@FreeBSD.org>

Fix -Wundef warnings found when compiling i386 LINT, GENERIC and
custom kernels.


# fbf1556d 09-Jul-2005 Joseph Koshy <jkoshy@FreeBSD.org>

sys/dev/hwpmc/hwpmc_{amd,piv,ppro}.c:
- Update driver interrupt statistics correctly.

sys/sys/pmc.h, sys/dev/hwpmc/hwpmc_mod.c:
- Fix a bug affecting debug printfs.
- Move the 'stalled' flag from being in a bit in the
'pm_flags' field of a 'struct pmc' to a field of its own in the
same structure. This flag is updated from the NMI handler and
keeping it separate makes it easier to avoid races with other
parts of the code.

sys/dev/hwpmc/hwpmc_logging.c:
- Do arithmetic with 'uintptr_t' types rather than casting
to and from 'char *'.

Approved by: re (scottl)


# 15139246 30-Jun-2005 Joseph Koshy <jkoshy@FreeBSD.org>

MFP4:

- pmcstat(8) gprof output mode fixes:

lib/libpmc/pmclog.{c,h}, sys/sys/pmclog.h:
+ Add an 'is_usermode' field to the PMCLOG_PCSAMPLE event
+ Add an 'entryaddr' field to the PMCLOG_PROCEXEC event,
so that pmcstat(8) can determine where the runtime loader
/libexec/ld-elf.so.1 is getting loaded.

sys/kern/kern_exec.c:
+ Use a local struct to group the entry address of the image being
exec()'ed and the process credential changed flag to the exec
handling hook inside hwpmc(4).

usr.sbin/pmcstat/*:
+ Support "-k kernelpath", "-D sampledir".
+ Implement the ELF bits of 'gmon.out' profile generation in a new
file "pmcstat_log.c". Move all log related functions to this
file.
+ Move local definitions and prototypes to "pmcstat.h"

- Other bug fixes:
+ lib/libpmc/pmclog.c: correctly handle EOF in pmclog_read().
+ sys/dev/hwpmc_mod.c: unconditionally log a PROCEXIT event to all
attached PMCs when a process exits.
+ sys/sys/pmc.h: correct a function prototype.
+ Improve usage checks in pmcstat(8).

Approved by: re (blanket hwpmc)


# f263522a 09-Jun-2005 Joseph Koshy <jkoshy@FreeBSD.org>

MFP4:

- Implement sampling modes and logging support in hwpmc(4).

- Separate MI and MD parts of hwpmc(4) and allow sharing of
PMC implementations across different architectures.
Add support for P4 (EMT64) style PMCs to the amd64 code.

- New pmcstat(8) options: -E (exit time counts) -W (counts
every context switch), -R (print log file).

- pmc(3) API changes to improve our ability to keep ABI compatibility
in the future. Add more 'alias' names for commonly used events.

- bug fixes & documentation.