linux/kernel
Xunlei Pang 512ac999d2 sched/fair: Fix bandwidth timer clock drift condition
I noticed that cgroup task groups constantly get throttled even
if they have low CPU usage, this causes some jitters on the response
time to some of our business containers when enabling CPU quotas.

It's very simple to reproduce:

  mkdir /sys/fs/cgroup/cpu/test
  cd /sys/fs/cgroup/cpu/test
  echo 100000 > cpu.cfs_quota_us
  echo $$ > tasks

then repeat:

  cat cpu.stat | grep nr_throttled  # nr_throttled will increase steadily

After some analysis, we found that cfs_rq::runtime_remaining will
be cleared by expire_cfs_rq_runtime() due to two equal but stale
"cfs_{b|q}->runtime_expires" after period timer is re-armed.

The current condition to judge clock drift in expire_cfs_rq_runtime()
is wrong, the two runtime_expires are actually the same when clock
drift happens, so this condtion can never hit. The orginal design was
correctly done by this commit:

  a9cf55b286 ("sched: Expire invalid runtime")

... but was changed to be the current implementation due to its locking bug.

This patch introduces another way, it adds a new field in both structures
cfs_rq and cfs_bandwidth to record the expiration update sequence, and
uses them to figure out if clock drift happens (true if they are equal).

Signed-off-by: Xunlei Pang <xlpang@linux.alibaba.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Ben Segall <bsegall@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 51f2176d74 ("sched/fair: Fix unlocked reads of some cfs_b->quota/period")
Link: http://lkml.kernel.org/r/20180620101834.24455-1-xlpang@linux.alibaba.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-07-03 09:17:29 +02:00
..
bpf Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf 2018-06-17 07:54:24 +09:00
cgroup docs: Fix some broken references 2018-06-15 18:10:01 -03:00
configs kconfig: tinyconfig: remove stale stack protector fixups 2018-06-15 07:15:28 +09:00
debug treewide: kzalloc() -> kcalloc() 2018-06-12 16:19:22 -07:00
dma swiotlb: export swiotlb_dma_ops 2018-06-28 14:00:40 +02:00
events perf/core: Move inline keyword at the beginning of declaration 2018-06-27 09:55:58 +02:00
gcov gcov: remove CONFIG_GCOV_FORMAT_AUTODETECT 2018-06-08 18:56:02 +09:00
irq genirq/debugfs: Add missing IRQCHIP_SUPPORTS_LEVEL_MSI debug 2018-06-22 14:22:00 +02:00
livepatch livepatch: Allow to call a custom callback when freeing shadow variables 2018-04-17 13:42:48 +02:00
locking Merge branch 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2018-06-24 19:36:16 +08:00
power fix a series of Documentation/ broken file name references 2018-06-15 18:10:01 -03:00
printk Printk changes for 4.18 2018-06-06 16:04:55 -07:00
rcu treewide: Use array_size() in vmalloc() 2018-06-12 16:19:22 -07:00
sched sched/fair: Fix bandwidth timer clock drift condition 2018-07-03 09:17:29 +02:00
time time: Make sure jiffies_to_msecs() preserves non-zero time periods 2018-06-22 17:48:36 +02:00
trace tracing: Use swap macro in update_max_tr 2018-06-21 15:12:43 -04:00
.gitignore
Kconfig.freezer
Kconfig.hz
Kconfig.locks
Kconfig.preempt
Makefile dma-mapping: move all DMA mapping code to kernel/dma 2018-06-14 08:50:37 +02:00
acct.c kernel/acct.c: fix the acct->needcheck check in check_free_space() 2018-01-04 16:45:09 -08:00
async.c kernel/async.c: revert "async: simplify lowest_in_progress()" 2018-02-06 18:32:44 -08:00
audit.c audit: use inline function to get audit context 2018-05-14 17:24:18 -04:00
audit.h audit: track the owner of the command mutex ourselves 2018-02-23 11:22:22 -05:00
audit_fsnotify.c fsnotify: add fsnotify_add_inode_mark() wrappers 2018-05-18 14:58:22 +02:00
audit_tree.c fsnotify: add fsnotify_add_inode_mark() wrappers 2018-05-18 14:58:22 +02:00
audit_watch.c \n 2018-06-17 05:06:18 +09:00
auditfilter.c audit: use existing session info function 2018-05-18 15:47:54 -04:00
auditsc.c audit: Fix wrong task in comparison of session ID 2018-05-21 14:27:43 -04:00
backtracetest.c
bounds.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
capability.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
compat.c Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2018-06-04 20:27:54 -07:00
configs.c
context_tracking.c
cpu.c cpu/hotplug: Fix unused function warning 2018-03-15 20:34:40 +01:00
cpu_pm.c PM / CPU: replace raw_notifier with atomic_notifier 2017-07-31 13:09:49 +02:00
crash_core.c mm: split page_type out from _mapcount 2018-06-07 17:34:37 -07:00
crash_dump.c
cred.c
delayacct.c delayacct: Use raw_spinlocks 2018-04-27 14:34:51 +02:00
dma.c proc: introduce proc_create_single{,_data} 2018-05-16 07:23:35 +02:00
elfcore.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
exec_domain.c proc: introduce proc_create_single{,_data} 2018-05-16 07:23:35 +02:00
exit.c kernel: use kernel_wait4() instead of sys_wait4() 2018-04-02 20:14:51 +02:00
extable.c extable: Make init_kernel_text() global 2018-02-21 16:54:06 +01:00
fail_function.c treewide: kmalloc() -> kmalloc_array() 2018-06-12 16:19:22 -07:00
fork.c mm: check for SIGKILL inside dup_mmap() loop 2018-06-15 07:55:24 +09:00
freezer.c
futex.c pids: introduce find_get_task_by_vpid() helper 2018-02-06 18:32:46 -08:00
futex_compat.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
groups.c kernel: make groups_sort calling a responsibility group_info allocators 2017-12-14 16:00:49 -08:00
hung_task.c kernel/hung_task.c: show all hung tasks before panic 2018-06-07 17:34:39 -07:00
iomem.c memremap: split devm_memremap_pages() and memremap() infrastructure 2018-05-15 23:08:33 -07:00
irq_work.c irq/work: Improve the flag definitions 2018-01-08 19:43:15 +01:00
jump_label.c jump_label: Disable jump labels in __exit code 2018-03-20 08:57:17 +01:00
kallsyms.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk 2018-02-01 13:36:15 -08:00
kcmp.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
kcov.c sched/core / kcov: avoid kcov_area during task switch 2018-06-15 07:55:24 +09:00
kexec.c kexec: call do_kexec_load() in compat syscall directly 2018-04-02 20:15:01 +02:00
kexec_core.c kexec: yield to scheduler when loading kimage segments 2018-06-15 07:55:24 +09:00
kexec_file.c treewide: Use array_size() in vzalloc() 2018-06-12 16:19:22 -07:00
kexec_internal.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
kmod.c kmod: move #ifdef CONFIG_MODULES wrapper to Makefile 2017-09-08 18:26:51 -07:00
kprobes.c kprobes: Fix random address output of blacklist file 2018-04-25 10:27:56 -04:00
ksysfs.c kexec: move vmcoreinfo out of the kernel's .bss section 2017-07-12 16:25:59 -07:00
kthread.c kthread: Allow kthread_park() on a parked kthread 2018-05-25 08:03:51 +02:00
latencytop.c
memremap.c mm: introduce MEMORY_DEVICE_FS_DAX and CONFIG_DEV_PAGEMAP_OPS 2018-05-22 06:59:39 -07:00
module-internal.h
module.c Modules updates for v4.18 2018-06-16 07:36:39 +09:00
module_signing.c
notifier.c
nsproxy.c
padata.c padata: add SPDX identifier 2018-01-05 18:43:00 +11:00
panic.c Kbuild: rename CC_STACKPROTECTOR[_STRONG] config variables 2018-06-14 12:21:18 +09:00
params.c kernel/params.c: downgrade warning for unsafe parameters 2018-04-11 10:28:37 -07:00
pid.c xarray: add the xa_lock to the radix_tree_root 2018-04-11 10:28:39 -07:00
pid_namespace.c Merge branch 'userns-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace 2018-04-03 19:15:32 -07:00
profile.c
ptrace.c pids: introduce find_get_task_by_vpid() helper 2018-02-06 18:32:46 -08:00
range.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
reboot.c kernel/reboot.c: add devm_register_reboot_notifier() 2017-11-17 16:10:04 -08:00
relay.c kernel/relay.c: change return type to vm_fault_t 2018-06-15 07:55:24 +09:00
resource.c libnvdimm for 4.18 2018-06-08 17:21:52 -07:00
rseq.c rseq: Avoid infinite recursion when delivering SIGSEGV 2018-06-22 19:04:22 +02:00
seccomp.c audit/stable-4.18 PR 20180605 2018-06-06 16:34:00 -07:00
signal.c signal: Remove no longer required irqsave/restore 2018-06-10 06:14:01 +02:00
smp.c smp/core: Use lockdep to assert IRQs are disabled/enabled 2017-11-08 11:13:50 +01:00
smpboot.c watchdog/core, powerpc: Lock cpus across reconfiguration 2017-10-04 10:53:54 +02:00
smpboot.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
softirq.c softirq: Reorder trace_softirqs_on to prevent lockdep splat 2018-06-21 15:12:43 -04:00
stacktrace.c
stop_machine.c Linux 4.17-rc5 2018-05-15 08:10:50 +02:00
sys.c mm: introduce arg_lock to protect arg_start|end and env_start|end in mm_struct 2018-06-07 17:34:34 -07:00
sys_ni.c Merge branch 'core-rseq-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2018-06-10 10:17:09 -07:00
sysctl.c treewide: kzalloc() -> kcalloc() 2018-06-12 16:19:22 -07:00
sysctl_binary.c staging: irda: remove remaining remants of irda code removal 2018-04-16 11:26:49 +02:00
task_work.c locking/barriers: Convert users of lockless_dereference() to READ_ONCE() 2017-12-17 13:57:15 +01:00
taskstats.c pids: introduce find_get_task_by_vpid() helper 2018-02-06 18:32:46 -08:00
test_kprobes.c kprobes: Disable the jprobes test code 2017-10-20 11:02:54 +02:00
torture.c rcu: Rename cond_resched_rcu_qs() to cond_resched_tasks_rcu_qs() 2018-05-15 10:27:29 -07:00
tracepoint.c tracepoints: Fix the descriptions of tracepoint_probe_register{_prio} 2018-05-28 12:49:51 -04:00
tsacct.c
ucount.c headers: untangle kmemleak.h from mm.h 2018-04-05 21:36:27 -07:00
uid16.c fs: add do_fchownat(), ksys_fchown() helpers and ksys_{,l}chown() wrappers 2018-04-02 20:15:59 +02:00
uid16.h kernel: provide ksys_*() wrappers for syscalls called by kernel/uid16.c 2018-04-02 20:15:30 +02:00
umh.c umh: fix race condition 2018-06-07 16:56:28 -04:00
up.c smp: Avoid using two cache lines for struct call_single_data 2017-08-29 15:14:38 +02:00
user-return-notifier.c
user.c efivarfs: Limit the rate for non-root to read files 2018-02-22 10:21:02 -08:00
user_namespace.c treewide: kmalloc() -> kmalloc_array() 2018-06-12 16:19:22 -07:00
utsname.c uts: create "struct uts_namespace" from kmem_cache 2018-04-11 10:28:35 -07:00
utsname_sysctl.c
watchdog.c Merge branch 'linus' into sched/core, to pick up fixes 2017-11-08 10:17:15 +01:00
watchdog_hld.c Merge branch 'linus' into core/urgent, to pick up dependent commits 2017-11-04 08:53:04 +01:00
workqueue.c treewide: kzalloc() -> kcalloc() 2018-06-12 16:19:22 -07:00
workqueue_internal.h workqueue: Set worker->desc to workqueue name by default 2018-05-18 08:47:13 -07:00