qemu-e2k/target-i386
Emilio G. Cota 37b995f6e7 target-i386: remove helper_lock()
It's been superseded by the atomic helpers.

The use of the atomic helpers provides a significant performance and scalability
improvement. Below is the result of running the atomic_add-test microbenchmark with:
 $ x86_64-linux-user/qemu-x86_64 tests/atomic_add-bench -o 5000000 -r $r -n $n
, where $n is the number of threads and $r is the allowed range for the additions.

The scenarios measured are:
- atomic: implements x86' ADDL with the atomic_add helper (i.e. this patchset)
- cmpxchg: implement x86' ADDL with a TCG loop using the cmpxchg helper
- master: before this patchset

Results sorted in ascending range, i.e. descending degree of contention.
Y axis is Throughput in Mops/s. Tests are run on an AMD machine with 64
Opteron 6376 cores.

                atomic_add-bench: 5000000 ops/thread, [0,1] range

  25 ++---------+----------+---------+----------+----------+----------+---++
     + atomic +-E--+       +         +          +          +          +    |
     |cmpxchg +-H--+                                                       |
  20 +Emaster +-N--+                                                      ++
     ||                                                                    |
     |++                                                                   |
     ||                                                                    |
  15 +++                                                                  ++
     |N|                                                                   |
     |+|                                                                   |
  10 ++|                                                                  ++
     |+|+                                                                  |
     | |    -+E+------        +++  ---+E+------+E+------+E+-----+E+------+E|
     |+E+E+- +++     +E+------+E+--                                        |
   5 ++|+                                                                 ++
     |+N+H+---                                 +++                         |
     ++++N+--+H++----+++   +  +++  --++H+------+H+------+H++----+H+---+--- |
   0 ++---------+-----H----+---H-----+----------+----------+----------+---H+
     0          10         20        30         40         50         60
                                Number of threads

                atomic_add-bench: 5000000 ops/thread, [0,2] range

  25 ++---------+----------+---------+----------+----------+----------+---++
     ++atomic +-E--+       +         +          +          +          +    |
     |cmpxchg +-H--+                                                       |
  20 ++master +-N--+                                                      ++
     |E|                                                                   |
     |++                                                                   |
     ||E                                                                   |
  15 ++|                                                                  ++
     |N||                                                                  |
     |+||                                   ---+E+------+E+-----+E+------+E|
  10 ++| |        ---+E+------+E+-----+E+---                    +++      +++
     ||H+E+--+E+--                                                         |
     |+++++                                                                |
     | ||                                                                  |
   5 ++|+H+--                                  +++                        ++
     |+N+    -                              ---+H+------+H+------          |
     +  +N+--+H++----+H+---+--+H+----++H+---    +          +    +H+---+--+H|
   0 ++---------+----------+---------+----------+----------+----------+---++
     0          10         20        30         40         50         60
                                Number of threads

                atomic_add-bench: 5000000 ops/thread, [0,8] range

  40 ++---------+----------+---------+----------+----------+----------+---++
     ++atomic +-E--+       +         +          +          +          +    |
  35 +cmpxchg +-H--+                                                      ++
     | master +-N--+               ---+E+------+E+------+E+-----+E+------+E|
  30 ++|                   ---+E+--   +++                                 ++
     | |            -+E+---                                                |
  25 ++E        ---- +++                                                  ++
     |+++++ -+E+                                                           |
  20 +E+ E-- +++                                                          ++
     |H|+++                                                                |
     |+|                                       +H+-------                  |
  15 ++H+                                   ---+++      +H+------         ++
     |N++H+--                         +++---                    +H+------++|
  10 ++ +++  -       +++           ---+H+                       +++      +H+
     | |     +H+-----+H+------+H+--                                        |
   5 ++|                      +++                                         ++
     ++N+N+--+N++          +         +          +          +          +    |
   0 ++---------+----------+---------+----------+----------+----------+---++
     0          10         20        30         40         50         60
                                Number of threads

               atomic_add-bench: 5000000 ops/thread, [0,128] range

  160 ++---------+---------+----------+---------+----------+----------+---++
      + atomic +-E--+      +          +         +          +          +    |
  140 +cmpxchg +-H--+                          +++      +++               ++
      | master +-N--+                           E--------E------+E+------++|
  120 ++                                      --|        |      +++       E+
      |                                     -- +++      +++              ++|
  100 ++                                   -                              ++
      |                                +++-                     +++      ++|
   80 ++                              -+E+    -+H+------+H+------H--------++
      |                           ----    ----                  +++       H|
      |            ---+E+-----+E+-  ---+H+                               ++|
   60 ++     +E+---   +++  ---+H+---                                      ++
      |    --+++   ---+H+--                                                |
   40 ++ +E+-+H+---                                                       ++
      |  +H+                                                               |
   20 +EE+                                                                ++
      +N+        +         +          +         +          +          +    |
    0 ++N-N---N--+---------+----------+---------+----------+----------+---++
      0          10        20         30        40         50         60
                                Number of threads

              atomic_add-bench: 5000000 ops/thread, [0,1024] range

  350 ++---------+---------+----------+---------+----------+----------+---++
      + atomic +-E--+      +          +         +          +          +    |
  300 +cmpxchg +-H--+                                                    +++
      | master +-N--+                                           +++       ||
      |                                                 +++      |    ----E|
  250 ++                                                 |   ----E----    ++
      |                                              ----E---    |    ---+H|
  200 ++                                      -+E+---   +++  ---+H+---    ++
      |                                   ----         -+H+--              |
      |                                +E+     +++ ---- +++                |
  150 ++                            ---+++  ---+H+-                       ++
      |                          ---  -+H+--                               |
  100 ++                   ---+E+ ---- +++                                ++
      |      +++   ---+E+-----+H+-                                         |
      |     -+E+------+H+--                                                |
   50 ++ +E+                                                              ++
      +EE+       +         +          +         +          +          +    |
    0 ++N-N---N--+---------+----------+---------+----------+----------+---++
      0          10        20         30        40         50         60
                                Number of threads

  hi-res: http://imgur.com/a/fMRmq

For master I stopped measuring master after 8 threads, because there is little
point in measuring the well-known performance collapse of a contended lock.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <1467054136-10430-21-git-send-email-cota@braap.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2016-10-26 08:29:01 -07:00
..
Makefile.objs target-i386: Enable control registers for MPX 2016-02-13 07:59:59 +11:00
TODO target-i386: fix {min,max}{pd,ps,sd,ss} SSE2 instructions 2012-01-11 09:55:28 +01:00
arch_dump.c x86: Clean up includes 2016-01-29 15:07:22 +00:00
arch_memory_mapping.c x86: Clean up includes 2016-01-29 15:07:22 +00:00
bpt_helper.c cpu-exec: Rename cpu_resume_from_signal() to cpu_loop_exit_noexc() 2016-06-09 15:55:02 +01:00
cc_helper.c target-i386: Perform set/reset_inhibit_irq inline 2016-02-13 07:59:59 +11:00
cc_helper_template.h target-i386: Implement BLSR, BLSMSK, BLSI 2013-02-18 15:52:05 -08:00
cpu-qom.h exec: call cpu_exec_exit() from a CPU unrealize common function 2016-10-24 17:29:16 -02:00
cpu.c exec: call cpu_exec_exit() from a CPU unrealize common function 2016-10-24 17:29:16 -02:00
cpu.h pc: apic_common: Extend APIC ID property to 32bit 2016-10-24 17:29:15 -02:00
excp_helper.c cpu: move exec-all.h inclusion out of cpu.h 2016-05-19 16:42:29 +02:00
fpu_helper.c target-i386: Use struct X86XSaveArea in fpu_helper.c 2016-09-19 15:34:35 -03:00
gdbstub.c qemu-common: push cpu.h inclusion out of qemu-common.h 2016-05-19 16:42:29 +02:00
helper.c cpus: pass CPUState to run_on_cpu helpers 2016-09-27 11:57:29 +02:00
helper.h target-i386: remove helper_lock() 2016-10-26 08:29:01 -07:00
hyperv.c event-notifier: Add "is_external" parameter 2016-04-22 16:43:56 +02:00
hyperv.h Clean up header guards that don't match their file name 2016-07-12 16:19:16 +02:00
int_helper.c cpu: move exec-all.h inclusion out of cpu.h 2016-05-19 16:42:29 +02:00
kvm-stub.c intel_iommu: reject broken EIM 2016-10-17 15:44:49 -02:00
kvm.c pc: kvm_apic: Pass APIC ID depending on xAPIC/x2APIC mode 2016-10-24 17:29:15 -02:00
kvm_i386.h pc: kvm_apic: Pass APIC ID depending on xAPIC/x2APIC mode 2016-10-24 17:29:15 -02:00
machine.c target-i386: kvm: Add basic Intel LMCE support 2016-07-07 15:25:16 -03:00
mem_helper.c target-i386: remove helper_lock() 2016-10-26 08:29:01 -07:00
misc_helper.c cpu: move exec-all.h inclusion out of cpu.h 2016-05-19 16:42:29 +02:00
monitor.c hmp: fix qemu crash due to ioapic state dump w/ split irqchip 2016-10-04 17:16:15 +01:00
mpx_helper.c cpu: move exec-all.h inclusion out of cpu.h 2016-05-19 16:42:29 +02:00
ops_sse.h target-i386: Rename XMM_[BWLSDQ] helpers to ZMM_* 2016-01-21 12:47:16 -02:00
ops_sse_header.h target-i386: Rename struct XMMReg to ZMMReg 2016-01-21 12:47:15 -02:00
seg_helper.c target-i386: Fixed syscall posssible segfault 2016-09-14 22:52:44 +02:00
shift_helper_template.h target-i386: compute eflags outside rcl/rcr helper 2013-02-18 15:03:56 -08:00
smm_helper.c target-i386: Enable control registers for MPX 2016-02-13 07:59:59 +11:00
svm.h Clean up ill-advised or unusual header guards 2016-07-12 16:20:46 +02:00
svm_helper.c cpu: move exec-all.h inclusion out of cpu.h 2016-05-19 16:42:29 +02:00
trace-events trace-events: fix first line comment in trace-events 2016-08-12 10:36:01 +01:00
translate.c target-i386: remove helper_lock() 2016-10-26 08:29:01 -07:00