qemu-e2k

Emilio G. Cota	37b995f6e7	target-i386: remove helper_lock()	2016-10-26 08:29:01 -07:00

It's been superseded by the atomic helpers.

The use of the atomic helpers provides a significant performance and scalability improvement. Below is the result of running the atomic_add-test microbenchmark with:

  $ x86_64-linux-user/qemu-x86_64 tests/atomic_add-bench -o 5000000 -r $r -n $n

, where $n is the number of threads and $r is the allowed range for the additions.

The scenarios measured are:
- atomic: implements x86' ADDL with the atomic_add helper (i.e. this patchset)
- cmpxchg: implement x86' ADDL with a TCG loop using the cmpxchg helper
- master: before this patchset

Results sorted in ascending range, i.e. descending degree of contention. Y axis is Throughput in Mops/s. Tests are run on an AMD machine with 64 Opteron 6376 cores.

[ASCII plots not reproduced. Each plots throughput (Mops/s) against number of threads (0 to 60) for the series atomic (E), cmpxchg (H) and master (N):]
- atomic_add-bench: 5000000 ops/thread, [0,1] range
- atomic_add-bench: 5000000 ops/thread, [0,2] range
- atomic_add-bench: 5000000 ops/thread, [0,8] range
- atomic_add-bench: 5000000 ops/thread, [0,128] range
- atomic_add-bench: 5000000 ops/thread, [0,1024] range

hi-res: http://imgur.com/a/fMRmq

For master I stopped measuring master after 8 threads, because there is little point in measuring the well-known performance collapse of a contended lock.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <1467054136-10430-21-git-send-email-cota@braap.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
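The three scenarios compared in the commit message can be sketched in plain C11. This is an illustration only, not QEMU code: the function names (add_fetch_atomic, add_fetch_cmpxchg, add_fetch_locked) and the demo_lock spinlock are hypothetical, and the real implementation emits TCG ops and helper calls rather than calling <stdatomic.h> directly.

```c
#include <stdatomic.h>
#include <stdint.h>

/* "atomic" scenario: one atomic read-modify-write, no retry loop. */
static uint32_t add_fetch_atomic(_Atomic uint32_t *p, uint32_t v)
{
    /* atomic_fetch_add returns the old value; add v to get the new one. */
    return atomic_fetch_add(p, v) + v;
}

/* "cmpxchg" scenario: load, compute, then retry until the CAS succeeds. */
static uint32_t add_fetch_cmpxchg(_Atomic uint32_t *p, uint32_t v)
{
    uint32_t old = atomic_load(p);
    while (!atomic_compare_exchange_weak(p, &old, old + v)) {
        /* On failure, 'old' now holds the current value; try again. */
    }
    return old + v;
}

/* "master"-like scenario (assumed shape): one global lock around a
 * plain, non-atomic add. Every thread serializes on demo_lock. */
static atomic_flag demo_lock = ATOMIC_FLAG_INIT;

static uint32_t add_fetch_locked(uint32_t *p, uint32_t v)
{
    while (atomic_flag_test_and_set_explicit(&demo_lock,
                                             memory_order_acquire)) {
        /* spin until the lock is released */
    }
    uint32_t result = (*p += v);
    atomic_flag_clear_explicit(&demo_lock, memory_order_release);
    return result;
}
```

Under contention the cmpxchg variant may retry many times and the locked variant serializes all threads. That is consistent with the ordering the commit message reports, though this sketch itself measures nothing.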
..
arch_dump.c	x86: Clean up includes	2016-01-29 15:07:22 +00:00
arch_memory_mapping.c	x86: Clean up includes	2016-01-29 15:07:22 +00:00
bpt_helper.c	cpu-exec: Rename cpu_resume_from_signal() to cpu_loop_exit_noexc()	2016-06-09 15:55:02 +01:00
cc_helper_template.h	target-i386: Implement BLSR, BLSMSK, BLSI	2013-02-18 15:52:05 -08:00
cc_helper.c	target-i386: Perform set/reset_inhibit_irq inline	2016-02-13 07:59:59 +11:00
cpu-qom.h	exec: call cpu_exec_exit() from a CPU unrealize common function	2016-10-24 17:29:16 -02:00
cpu.c	exec: call cpu_exec_exit() from a CPU unrealize common function	2016-10-24 17:29:16 -02:00
cpu.h	pc: apic_common: Extend APIC ID property to 32bit	2016-10-24 17:29:15 -02:00
excp_helper.c	cpu: move exec-all.h inclusion out of cpu.h	2016-05-19 16:42:29 +02:00
fpu_helper.c	target-i386: Use struct X86XSaveArea in fpu_helper.c	2016-09-19 15:34:35 -03:00
gdbstub.c	qemu-common: push cpu.h inclusion out of qemu-common.h	2016-05-19 16:42:29 +02:00
helper.c	cpus: pass CPUState to run_on_cpu helpers	2016-09-27 11:57:29 +02:00
helper.h	target-i386: remove helper_lock()	2016-10-26 08:29:01 -07:00
hyperv.c	event-notifier: Add "is_external" parameter	2016-04-22 16:43:56 +02:00
hyperv.h	Clean up header guards that don't match their file name	2016-07-12 16:19:16 +02:00
int_helper.c	cpu: move exec-all.h inclusion out of cpu.h	2016-05-19 16:42:29 +02:00
kvm_i386.h	pc: kvm_apic: Pass APIC ID depending on xAPIC/x2APIC mode	2016-10-24 17:29:15 -02:00
kvm-stub.c	intel_iommu: reject broken EIM	2016-10-17 15:44:49 -02:00
kvm.c	pc: kvm_apic: Pass APIC ID depending on xAPIC/x2APIC mode	2016-10-24 17:29:15 -02:00
machine.c	target-i386: kvm: Add basic Intel LMCE support	2016-07-07 15:25:16 -03:00
Makefile.objs	target-i386: Enable control registers for MPX	2016-02-13 07:59:59 +11:00
mem_helper.c	target-i386: remove helper_lock()	2016-10-26 08:29:01 -07:00
misc_helper.c	cpu: move exec-all.h inclusion out of cpu.h	2016-05-19 16:42:29 +02:00
monitor.c	hmp: fix qemu crash due to ioapic state dump w/ split irqchip	2016-10-04 17:16:15 +01:00
mpx_helper.c	cpu: move exec-all.h inclusion out of cpu.h	2016-05-19 16:42:29 +02:00
ops_sse_header.h	target-i386: Rename struct XMMReg to ZMMReg	2016-01-21 12:47:15 -02:00
ops_sse.h	target-i386: Rename XMM_[BWLSDQ] helpers to ZMM_*	2016-01-21 12:47:16 -02:00
seg_helper.c	target-i386: Fixed syscall posssible segfault	2016-09-14 22:52:44 +02:00
shift_helper_template.h	target-i386: compute eflags outside rcl/rcr helper	2013-02-18 15:03:56 -08:00
smm_helper.c	target-i386: Enable control registers for MPX	2016-02-13 07:59:59 +11:00
svm_helper.c	cpu: move exec-all.h inclusion out of cpu.h	2016-05-19 16:42:29 +02:00
svm.h	Clean up ill-advised or unusual header guards	2016-07-12 16:20:46 +02:00
TODO	target-i386: fix {min,max}{pd,ps,sd,ss} SSE2 instructions	2012-01-11 09:55:28 +01:00
trace-events	trace-events: fix first line comment in trace-events	2016-08-12 10:36:01 +01:00
translate.c	target-i386: remove helper_lock()	2016-10-26 08:29:01 -07:00