Basics: Period

Let's start with an example:

If period is 500ms and quota is  250ms, the group will get
0.5 CPU worth of runtime every 500ms.

# echo 250000 > cpu.cfs_quota_us /* quota = 250ms */
# echo 500000 > cpu.cfs_period_us /* period = 500ms */

My question: what would be different if we replaced the setting above with the one below?

If period is 2000ms and quota is 1000ms, the group will get
0.5 CPU worth of runtime every 2000ms.

# echo 1000000 > cpu.cfs_quota_us /* quota = 1000ms */
# echo 2000000 > cpu.cfs_period_us /* period = 2000ms */

Both settings give this resource group 0.5 CPU worth of resources, so what is the difference between a period of 500ms and one of 2000ms?

By design, within each period the group can receive at most quota worth of CPU time. If its work is not finished within the quota, it gets throttled and has to wait for the next period to continue.

This article [ref] says:

  • A larger period gives better throughput but worse RT (response time)
    • Larger periods will improve throughput at the expense of latency, since the scheduler will be able to sustain a cpu-bound workload for longer.
  • A smaller period gives better RT but worse throughput

I don't fully understand this.

The RT part I can roughly understand:

"Better RT" here does not mean a better average; it means smaller variance, so users perceive less jitter.


An interesting analogy: suppose we use a cgroup to limit a high-definition video player. With a very small period, the stutter you perceive is frame-by-frame; with a very large period, you get a few seconds of smooth playback, then a few seconds of freeze, then smooth playback again, and so on. Either way, you finish the "stuttery" movie in the same total time.

But what does "better throughput" mean? Throughput is the amount of work processed per unit time, and "better" usually means more work gets done. Yet the analogy above shows that no matter how the period changes, the average amount of work processed per unit time stays the same. My understanding is this:

"Better throughput" rests on a premise: requests do not arrive uniformly (they may follow a normal distribution, a Poisson distribution, etc.), so the system alternates between busy and idle. When the period is larger, the system can better tolerate traffic bursts.

The tension between the whole and the individual

It was said above that a smaller period gives better RT, but that refers to the overall RT. For an individual task, the opposite is true:

Suppose a Java program needs 300ms of execution time to serve one request.

  • If quota = 250ms and period = 500ms, the Java program's RT is 250 + 250 + 50 = 550ms
    • Explanation: the program needs two periods to finish. In the first period it runs for 250ms, then sits throttled for 250ms. Once the next period starts, it runs for the remaining 50ms.
  • If quota = 1000ms and period = 2000ms, the Java program's RT is 300ms
    • Explanation: the program finishes within a single period.
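
The arithmetic above can be folded into a small closed-form model (my own sketch, not from the article; it assumes the task starts at a period boundary with a full quota available and is the only runnable task in its group):

```python
def completion_time_ms(task_ms: int, quota_ms: int, period_ms: int) -> int:
    """Wall-clock time for a CPU-bound task needing task_ms of CPU time,
    under CFS bandwidth control with the given quota and period.
    Simplified model: the task starts at a period boundary with a full
    quota and nothing else runs in the group."""
    # Number of periods in which the task burns its entire quota
    # and is then throttled until the next period starts.
    throttled_periods = (task_ms - 1) // quota_ms
    return throttled_periods * period_ms + (task_ms - throttled_periods * quota_ms)

print(completion_time_ms(300, 250, 500))    # 550
print(completion_time_ms(300, 1000, 2000))  # 300
```

The same model also illustrates the burst-tolerance point from earlier: a sudden 600ms chunk of work drains in completion_time_ms(600, 250, 500) = 1100ms, but in completion_time_ms(600, 1000, 2000) = 600ms.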

Suppose that in a system, most tasks have an RT within one quota while a few tasks exceed it. How can we minimize the scheduler's impact on the latter?

Improvement: burst

The burst concept can be explained with a vacation analogy (from Alibaba Cloud's documentation): you get 10 days of annual leave per year. A family emergency comes up and you need 30 days off, so you borrow the leave from the next two years, and take no vacation in those years. How much leave can you borrow at most? That is what cpu.cfs_burst_us controls.

In the web domain, traffic is mostly steady and uniform, but occasionally there are bursts of expensive requests, and the RT of such requests is often stretched considerably by the quota limit.

To overcome this RT jitter on bursty long-tail traffic, we have to look at how CFS Bandwidth Control is implemented; its documentation introduces the concept of a burst.

CPU time is divided into periods of cpu.cfs_period_us each. When a burst of traffic hits a resource group, the group is allowed to exceed its quota in the current period and then pay the excess back in later periods. Over the long run, the overall limit still holds.
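
A minimal sketch of that accounting, assuming burst-capable kernels roughly follow `__refill_cfs_bandwidth_runtime` in kernel/sched/fair.c (the real code handles many more cases):

```python
def refill_runtime_us(runtime_us: int, quota_us: int, burst_us: int) -> int:
    """Runtime available to a group right after a period boundary.
    Simplified model: the kernel tops the group's runtime up by quota,
    capping the accumulated balance at quota + burst. runtime_us may be
    negative if the group overran (borrowed) in the previous period."""
    return min(runtime_us + quota_us, quota_us + burst_us)

# A group that used nothing last period banks up to `burst` extra:
print(refill_runtime_us(100_000, 100_000, 50_000))  # capped at 150000
# A group that borrowed 30ms starts the new period 30ms short:
print(refill_runtime_us(-30_000, 100_000, 50_000))  # 70000
```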

This feature borrows time now against our future underrun, at the cost of increased interference against the other system users. All nicely bounded.

So how many periods does it take to pay the excess back? And what if later periods also keep exceeding the quota? On newer kernels, cpu.cfs_burst_us caps the burst.
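
On kernels that support it, the configuration looks like the earlier snippets (the values here are my own illustration; the knob and interface follow the sched-bwc documentation):

```shell
# echo 100000 > cpu.cfs_quota_us  /* quota  = 100ms */
# echo 100000 > cpu.cfs_period_us /* period = 100ms */
# echo 50000  > cpu.cfs_burst_us  /* burst  = 50ms of extra headroom */
```

On older kernels the cpu.cfs_burst_us file does not exist and the write simply fails.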

*** Burst feature ***

This feature borrows time now against our future underrun, at the cost of increased interference against the other system users. All nicely bounded.

Traditional (UP-EDF) bandwidth control is something like:

(U = Sum u_i) <= 1

This guarantees both that every deadline is met and that the system is stable. After all, if U were > 1, then for every second of walltime, we'd have to run more than a second of program time, and obviously miss our deadline, but the next deadline will be further out still, there is never time to catch up, unbounded fail.

The burst feature observes that a workload doesn't always execute the full quota; this enables one to describe u_i as a statistical distribution.

For example, have u_i = {x,e}_i, where x is the p(95) and x+e p(100) (the traditional Worst Case Execution Time, WCET). This effectively allows u to be smaller, increasing the efficiency (we can pack more tasks in the system), but at the cost of missing deadlines when all the odds line up. However, it does maintain stability, since every overrun must be paired with an underrun as long as our x is above the average.

That is, suppose we have 2 tasks, both specify a p(95) value, then we have a p(95)*p(95) = 90.25% chance both tasks are within their quota and everything is good. At the same time we have a p(5)p(5) = 0.25% chance both tasks will exceed their quota at the same time (guaranteed deadline fail). Somewhere in between there’s a threshold where one exceeds and the other doesn’t underrun enough to compensate; this depends on the specific CDFs.

At the same time, we can say that the worst case deadline miss, will be Sum e_i; that is, there is a bounded tardiness (under the assumption that x+e is indeed WCET).

The interference when using burst is valued by the possibilities for missing the deadline and the average WCET. Test results showed that when there are many cgroups, or the CPU is under-utilized, the interference is limited. More details are shown in: https://lore.kernel.org/lkml/5371BD36-55AE-4F71-B9D7-B86DC32E3D2B@linux.alibaba.com/
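
The p(95) arithmetic in the quoted text is easy to verify (a quick check, assuming the two tasks are independent):

```python
p = 0.95  # probability one task stays within its quota, i.e. p(95)

both_within = p * p         # both tasks within their quota
both_exceed = (1 - p) ** 2  # both exceed their quota at the same time

print(f"{both_within:.4%}")  # 90.2500%
print(f"{both_exceed:.4%}")  # 0.2500%
```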

Supplementary material

The burst feature is still evolving, which you can see by comparing two versions of the documentation:
https://www.kernel.org/doc/html/v5.13/scheduler/sched-bwc.html
https://www.kernel.org/doc/html/latest/scheduler/sched-bwc.html
Alibaba's regular kernel (3.10.0-327.ali2010.rc7.alios7.x86_64) does not have cpu.cfs_burst_us yet.
Its burst capability is also rather limited: it can only borrow.
