0888f06ac9
Fernando Lopez-Lezcano reported frequent scheduling latencies and audio
xruns starting at the 2.6.18-rt kernel, and those problems persisted all
until current -rt kernels. The latencies were serious and unjustified by
system load, often in the milliseconds range.
After a patient and heroic multi-month effort of Fernando, where he
tested dozens of kernels, tried various configs, boot options,
test-patches of mine and provided latency traces of those incidents, the
following 'smoking gun' trace was captured by him:
_------=> CPU#
/ _-----=> irqs-off
| / _----=> need-resched
|| / _---=> hardirq/softirq
||| / _--=> preempt-depth
|||| /
||||| delay
cmd pid ||||| time | caller
\ / ||||| \ | /
IRQ_19-1479 1D..1 0us : __trace_start_sched_wakeup (try_to_wake_up)
IRQ_19-1479 1D..1 0us : __trace_start_sched_wakeup <<...>-5856> (37 0)
IRQ_19-1479 1D..1 0us : __trace_start_sched_wakeup (c01262ba 0 0)
IRQ_19-1479 1D..1 0us : resched_task (try_to_wake_up)
IRQ_19-1479 1D..1 0us : __spin_unlock_irqrestore (try_to_wake_up)
...
<idle>-0 1...1 11us!: default_idle (cpu_idle)
...
<idle>-0 0Dn.1 602us : smp_apic_timer_interrupt (c0103baf 1 0)
...
<...>-5856 0D..2 618us : __switch_to (__schedule)
<...>-5856 0D..2 618us : __schedule <<idle>-0> (20 162)
<...>-5856 0D..2 619us : __spin_unlock_irq (__schedule)
<...>-5856 0...1 619us : trace_stop_sched_switched (__schedule)
<...>-5856 0D..1 619us : trace_stop_sched_switched <<...>-5856> (37 0)
what is visible in this trace is that CPU#1 ran try_to_wake_up() for
PID:5856, it placed PID:5856 on CPU#0's runqueue and ran resched_task()
for CPU#0. But it decided to not send an IPI that no CPU - due to
TS_POLLING. But CPU#0 never woke up after its NEED_RESCHED bit was set,
and only rescheduled to PID:5856 upon the next lapic timer IRQ. The
result was a 600+ usecs latency and a missed wakeup!
the bug turned out to be an idle-wakeup bug introduced into the mainline
kernel this summer via an optimization in the x86_64 tree:
commit
|
||
---|---|---|
.. | ||
acpi | ||
cpufreq | ||
aperture.c | ||
apic.c | ||
asm-offsets.c | ||
audit.c | ||
crash_dump.c | ||
crash.c | ||
e820.c | ||
early_printk.c | ||
early-quirks.c | ||
entry.S | ||
functionlist | ||
genapic_cluster.c | ||
genapic_flat.c | ||
genapic.c | ||
head64.c | ||
head.S | ||
i387.c | ||
i8259.c | ||
init_task.c | ||
io_apic.c | ||
ioport.c | ||
irq.c | ||
k8.c | ||
kprobes.c | ||
ldt.c | ||
machine_kexec.c | ||
Makefile | ||
mce_amd.c | ||
mce_intel.c | ||
mce.c | ||
module.c | ||
mpparse.c | ||
nmi.c | ||
pci-calgary.c | ||
pci-dma.c | ||
pci-gart.c | ||
pci-nommu.c | ||
pci-swiotlb.c | ||
pmtimer.c | ||
process.c | ||
ptrace.c | ||
reboot.c | ||
relocate_kernel.S | ||
setup64.c | ||
setup.c | ||
signal.c | ||
smp.c | ||
smpboot.c | ||
stacktrace.c | ||
suspend_asm.S | ||
suspend.c | ||
sys_x86_64.c | ||
syscall.c | ||
tce.c | ||
time.c | ||
trampoline.S | ||
traps.c | ||
vmlinux.lds.S | ||
vsmp.c | ||
vsyscall.c | ||
x8664_ksyms.c |