========
CPU load
========

Linux exports various bits of information via ``/proc/stat`` and
``/proc/uptime`` that userland tools, such as top(1), use to calculate
the average time the system spent in a particular state, for example::

    $ iostat
    Linux 2.6.18.3-exp (linmac)     02/20/2007

    avg-cpu:  %user   %nice %system %iowait  %steal   %idle
              10.01    0.00    2.92    5.44    0.00   81.63

    ...

Here the system reports that, over the default sampling period, it
spent 10.01% of the time doing work in user space and 2.92% in the
kernel, and was idle 81.63% of the time overall.
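
As a rough illustration of where such percentages come from, the sketch
below parses the aggregate ``cpu`` line of ``/proc/stat`` and computes the
idle share between two samples. The names ``cpu_sample``,
``parse_cpu_line``, ``total_ticks`` and ``idle_percent`` are made up for
this example, and the arithmetic is an assumption about how such tools
typically work, not a copy of what iostat or top(1) actually does.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* First eight counters of the aggregate "cpu" line, in clock ticks. */
struct cpu_sample {
	unsigned long long user, nice, system, idle;
	unsigned long long iowait, irq, softirq, steal;
};

/* Parse the aggregate "cpu ..." line; return 0 on success, -1 otherwise. */
static int parse_cpu_line(const char *line, struct cpu_sample *s)
{
	assert(line != NULL && s != NULL);

	/* Reject per-CPU lines such as "cpu0 ...". */
	if (strncmp(line, "cpu ", 4) != 0)
		return -1;

	int n = sscanf(line, "cpu %llu %llu %llu %llu %llu %llu %llu %llu",
		       &s->user, &s->nice, &s->system, &s->idle,
		       &s->iowait, &s->irq, &s->softirq, &s->steal);
	return n == 8 ? 0 : -1;
}

/* Sum of all counters in one sample. */
static unsigned long long total_ticks(const struct cpu_sample *s)
{
	return s->user + s->nice + s->system + s->idle +
	       s->iowait + s->irq + s->softirq + s->steal;
}

/* Share of the ticks between two samples that were charged to idle. */
static double idle_percent(const struct cpu_sample *before,
			   const struct cpu_sample *after)
{
	unsigned long long dt = total_ticks(after) - total_ticks(before);

	if (dt == 0)
		return 0.0;
	return 100.0 * (double)(after->idle - before->idle) / (double)dt;
}
```

A caller would read the first line of ``/proc/stat``, wait for the
sampling period, read it again, and pass both samples to
``idle_percent()``. The other columns follow the same pattern with a
different counter in the numerator.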

In most cases the ``/proc/stat`` information reflects reality quite
closely. However, because of how and when the kernel collects this
data, sometimes it cannot be trusted at all.

So how is this information collected?  Whenever a timer interrupt is
signalled, the kernel looks at what kind of task was running at that
moment and increments the counter that corresponds to this task's
kind/state.  The problem with this is that the system could have
switched between various states multiple times between two timer
interrupts, yet the counter is incremented only for the last state.
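
The accounting scheme described above can be pictured with a toy model.
This is not kernel code: ``cpu_acct``, ``switch_state`` and
``timer_tick`` are hypothetical names, and the three states are a
simplification chosen for the example.

```c
#include <assert.h>

enum cpu_state { CPU_USER, CPU_SYSTEM, CPU_IDLE, CPU_NSTATES };

struct cpu_acct {
	enum cpu_state current;            /* what the CPU is doing right now */
	unsigned long ticks[CPU_NSTATES];  /* one counter per state */
};

/* A state change records nothing; only the current state is updated. */
static void switch_state(struct cpu_acct *a, enum cpu_state s)
{
	assert(s < CPU_NSTATES);
	a->current = s;
}

/* Timer interrupt: charge the whole tick to whatever is running now. */
static void timer_tick(struct cpu_acct *a)
{
	a->ticks[a->current]++;
}
```

If the model passes through user, system and idle states before
``timer_tick()`` is called, only the idle counter grows, no matter how
long the other two states lasted.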


Example
-------

Imagine a system with one task that periodically burns cycles
in the following manner::

     time line between two timer interrupts
    |--------------------------------------|
     ^                                    ^
     |_ something begins working          |
                                          |_ something goes to sleep
                                         (only to be awakened quite soon)

In the above situation the system will be 0% loaded according to
``/proc/stat`` (since the timer interrupt will always happen when the
system is executing the idle handler), but in reality the load is
closer to 99%.
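
The gap between the two figures can be reproduced with a small
simulation. The sketch below is only a model under stated assumptions:
time is split into abstract units, the timer interrupt samples the last
unit of each period, and the names ``load`` and ``simulate`` are invented
for this example.

```c
#include <assert.h>

struct load {
	double sampled;  /* percent busy as seen by tick sampling */
	double actual;   /* percent busy in reality */
};

/*
 * In every period of `period` time units the task runs during
 * [start, start + busy) and is idle otherwise. The timer interrupt
 * is assumed to sample the last unit of each period.
 */
static struct load simulate(unsigned period, unsigned start,
			    unsigned busy, unsigned nticks)
{
	unsigned long sampled_busy = 0, actual_busy = 0;
	unsigned long total = (unsigned long)period * nticks;

	assert(period > 0 && nticks > 0 && start + busy <= period);

	for (unsigned long t = 0; t < total; t++) {
		unsigned phase = t % period;
		int running = phase >= start && phase < start + busy;

		actual_busy += running;
		/* Timer interrupt: only this instant is accounted. */
		if (phase == period - 1)
			sampled_busy += running;
	}

	return (struct load){
		.sampled = 100.0 * sampled_busy / nticks,
		.actual  = 100.0 * actual_busy / total,
	};
}
```

With a period of 100 units and a task that runs for the first 99 of
them, ``simulate(100, 0, 99, 1000)`` yields a sampled load of 0% against
an actual load of 99%. The opposite distortion is just as easy to
construct: a task that runs only during the last unit is sampled as 100%
busy.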
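
One can imagine many more situations where this behavior of the kernel
will lead to quite erratic information inside ``/proc/stat``. The
following program provokes the effect by spinning for about two thirds
of the measured interval between two ``SIGALRM`` signals and then
sleeping until the next one arrives::

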
	/* gcc -o hog smallhog.c */
	#include <time.h>
	#include <limits.h>
	#include <signal.h>
	#include <sys/time.h>
	#define HIST 10

	static volatile sig_atomic_t stop;

	/* SIGALRM handler: tell hog() to stop spinning. */
	static void sighandler(int signr)
	{
		(void) signr;
		stop = 1;
	}

	/* Spin until SIGALRM arrives or niters runs out; return what is left. */
	static unsigned long hog(unsigned long niters)
	{
		stop = 0;
		while (!stop && --niters);
		return niters;
	}

	int main(void)
	{
		int i;
		struct itimerval it = {
			.it_interval = { .tv_sec = 0, .tv_usec = 1 },
			.it_value    = { .tv_sec = 0, .tv_usec = 1 } };
		sigset_t set;
		unsigned long v[HIST];
		double tmp = 0.0;
		unsigned long n;
		signal(SIGALRM, &sighandler);
		setitimer(ITIMER_REAL, &it, NULL);

		/* Align with a timer signal, then count iterations per interval. */
		hog(ULONG_MAX);
		for (i = 0; i < HIST; ++i) v[i] = ULONG_MAX - hog(ULONG_MAX);
		for (i = 0; i < HIST; ++i) tmp += v[i];
		tmp /= HIST;
		/* Spin for about two thirds of the average interval. */
		n = tmp - (tmp / 3.0);

		sigemptyset(&set);
		sigaddset(&set, SIGALRM);

		/* Burn cycles, then sleep until the next SIGALRM. */
		for (;;) {
			hog(n);
			sigwait(&set, &i);
		}
		return 0;
	}


References
----------

- https://lore.kernel.org/r/loom.20070212T063225-663@post.gmane.org
- Documentation/filesystems/proc.rst (1.8)


Thanks
------

Con Kolivas, Pavel Machek
