History log of /linux-master/Documentation/scheduler/sched-bwc.rst
Revision Date Author Comments
# d56b699d 14-Aug-2023 Bjorn Helgaas <bhelgaas@google.com>

Documentation: Fix typos

Fix typos in Documentation.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://lore.kernel.org/r/20230814212822.193684-4-helgaas@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>


# ce881fc0 03-Dec-2021 Yanteng Si <siyanteng01@gmail.com>

docs/scheduler: fix typo and warning in sched-bwc

a) since d73df887b6b8 ("sched/fair: Add document for burstable CFS bandwidth")
[cpu.cfs_quota_us: the total available run-time within a period (in] shoud be removed,
let's delete it.

b) Add a period.

c) fix a build warning:

linux-next/Documentation/scheduler/sched-bwc.rst:243: WARNING: Inline emphasis
start-string without end-string.

Signed-off-by: Yanteng Si <siyanteng@loongson.cn>
Reviewed-by: Alex Shi <alexs@kernel.org>
Link: https://lore.kernel.org/r/163a4dde20b8c4b68d668977a668e281d18fcf92.1638517064.git.siyanteng@loongson.cn
Signed-off-by: Jonathan Corbet <corbet@lwn.net>


# d73df887 29-Aug-2021 Huaixin Chang <changhuaixin@linux.alibaba.com>

sched/fair: Add document for burstable CFS bandwidth

Basic description of usage and effect for CFS Bandwidth Control Burst.

Co-developed-by: Shanpei Chen <shanpeic@linux.alibaba.com>
Signed-off-by: Shanpei Chen <shanpeic@linux.alibaba.com>
Co-developed-by: Tianchen Ding <dtcccc@linux.alibaba.com>
Signed-off-by: Tianchen Ding <dtcccc@linux.alibaba.com>
Signed-off-by: Huaixin Chang <changhuaixin@linux.alibaba.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20210830032215.16302-3-changhuaixin@linux.alibaba.com


# 716c9d94 19-May-2021 Mauro Carvalho Chehab <mchehab+huawei@kernel.org>

docs: sched-bwc.rst: fix a typo on a doc name

cgroupv2.rst -> cgroup-v2.rst

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Link: https://lore.kernel.org/r/1dc0203bd7df375ef45832f0c88566e22c4138ff.1621413933.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>


# e5ba9ea6 19-Jan-2021 Kir Kolyshkin <kolyshkin@gmail.com>

docs/scheduler/sched-bwc: note/link cgroup v2

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20210120001824.385168-6-kolyshkin@gmail.com
Signed-off-by: Jonathan Corbet <corbet@lwn.net>


# 6c57c12d 19-Jan-2021 Kir Kolyshkin <kolyshkin@gmail.com>

docs/scheduler/sched-bwc: fix note rendering

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20210120001824.385168-4-kolyshkin@gmail.com
Signed-off-by: Jonathan Corbet <corbet@lwn.net>


# 7ebc7dc8 19-Jan-2021 Kir Kolyshkin <kolyshkin@gmail.com>

docs/scheduler/sched-bwc: formatting fix

Since commit d6a3b247627a3 these three lines are merged into one by the
RST processor, making it hard to read. Use bullet points to separate
the entries, like it's done in other similar places.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20210120001824.385168-2-kolyshkin@gmail.com
Signed-off-by: Jonathan Corbet <corbet@lwn.net>


# de53fd7a 23-Jul-2019 Dave Chiluk <chiluk+linux@indeed.com>

sched/fair: Fix low cpu usage with high throttling by removing expiration of cpu-local slices

It has been observed, that highly-threaded, non-cpu-bound applications
running under cpu.cfs_quota_us constraints can hit a high percentage of
periods throttled while simultaneously not consuming the allocated
amount of quota. This use case is typical of user-interactive non-cpu
bound applications, such as those running in kubernetes or mesos when
run on multiple cpu cores.

This has been root caused to cpu-local run queue being allocated per cpu
bandwidth slices, and then not fully using that slice within the period.
At which point the slice and quota expires. This expiration of unused
slice results in applications not being able to utilize the quota for
which they are allocated.

The non-expiration of per-cpu slices was recently fixed by
'commit 512ac999d275 ("sched/fair: Fix bandwidth timer clock drift
condition")'. Prior to that it appears that this had been broken since
at least 'commit 51f2176d74ac ("sched/fair: Fix unlocked reads of some
cfs_b->quota/period")' which was introduced in v3.16-rc1 in 2014. That
added the following conditional which resulted in slices never being
expired.

if (cfs_rq->runtime_expires != cfs_b->runtime_expires) {
/* extend local deadline, drift is bounded above by 2 ticks */
cfs_rq->runtime_expires += TICK_NSEC;

Because this was broken for nearly 5 years, and has recently been fixed
and is now being noticed by many users running kubernetes
(https://github.com/kubernetes/kubernetes/issues/67577) it is my opinion
that the mechanisms around expiring runtime should be removed
altogether.

This allows quota already allocated to per-cpu run-queues to live longer
than the period boundary. This allows threads on runqueues that do not
use much CPU to continue to use their remaining slice over a longer
period of time than cpu.cfs_period_us. However, this helps prevent the
above condition of hitting throttling while also not fully utilizing
your cpu quota.

This theoretically allows a machine to use slightly more than its
allotted quota in some periods. This overflow would be bounded by the
remaining quota left on each per-cpu runqueueu. This is typically no
more than min_cfs_rq_runtime=1ms per cpu. For CPU bound tasks this will
change nothing, as they should theoretically fully utilize all of their
quota in each period. For user-interactive tasks as described above this
provides a much better user/application experience as their cpu
utilization will more closely match the amount they requested when they
hit throttling. This means that cpu limits no longer strictly apply per
period for non-cpu bound applications, but that they are still accurate
over longer timeframes.

This greatly improves performance of high-thread-count, non-cpu bound
applications with low cfs_quota_us allocation on high-core-count
machines. In the case of an artificial testcase (10ms/100ms of quota on
80 CPU machine), this commit resulted in almost 30x performance
improvement, while still maintaining correct cpu quota restrictions.
That testcase is available at https://github.com/indeedeng/fibtest.

Fixes: 512ac999d275 ("sched/fair: Fix bandwidth timer clock drift condition")
Signed-off-by: Dave Chiluk <chiluk+linux@indeed.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Phil Auld <pauld@redhat.com>
Reviewed-by: Ben Segall <bsegall@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: John Hammond <jhammond@indeed.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kyle Anderson <kwa@yelp.com>
Cc: Gabriel Munos <gmunoz@netflix.com>
Cc: Peter Oskolkov <posk@posk.io>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Brendan Gregg <bgregg@netflix.com>
Link: https://lkml.kernel.org/r/1563900266-19734-2-git-send-email-chiluk+linux@indeed.com


# d6a3b247 12-Jun-2019 Mauro Carvalho Chehab <mchehab+samsung@kernel.org>

docs: scheduler: convert docs to ReST and rename to *.rst

In order to prepare to add them to the Kernel API book,
convert the files to ReST format.

The conversion is actually:
- add blank lines and identation in order to identify paragraphs;
- fix tables markups;
- add some lists markups;
- mark literal blocks;
- adjust title markups.

At its new index.rst, let's add a :orphan: while this is not linked to
the main index.rst file, in order to avoid build warnings.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>