Commit Graph

275 Commits

Author SHA1 Message Date
Richard Henderson
cae5d53b9e tcg: Remove TCG_TARGET_HAS_cmp_vec
The cmp_vec opcode is mandatory; this symbol is unused.

Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2020-10-08 05:57:32 -05:00
Richard Henderson
f80d09b599 tcg/i386: Fix dupi for avx2 32-bit hosts
The previous change wrongly stated that 32-bit avx2 should have
used VPBROADCASTW.  But that's a 16-bit broadcast and we want a
32-bit broadcast.

Fixes: 7b60ef3264
Cc: qemu-stable@nongnu.org
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2020-10-08 05:57:32 -05:00
Richard Henderson
74a117906b tcg: Remove TCG_CT_REG
This wasn't actually used for anything, really.  All variable
operands must accept registers, and which are indicated by the
set in TCGArgConstraint.regs.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2020-10-08 05:57:32 -05:00
Richard Henderson
9be0d08019 tcg: Drop union from TCGArgConstraint
The union is unused; let "regs" appear in the main structure
without the "u.regs" wrapping.

Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2020-10-08 05:57:32 -05:00
Stefan Hajnoczi
d73415a315 qemu/atomic.h: rename atomic_ to qatomic_
clang's C11 atomic_fetch_*() functions only take a C11 atomic type
pointer argument. QEMU uses direct types (int, etc) and this causes a
compiler error when a QEMU code calls these functions in a source file
that also included <stdatomic.h> via a system header file:

  $ CC=clang CXX=clang++ ./configure ... && make
  ../util/async.c:79:17: error: address argument to atomic operation must be a pointer to _Atomic type ('unsigned int *' invalid)

Avoid using atomic_*() names in QEMU's atomic.h since that namespace is
used by <stdatomic.h>. Prefix QEMU's APIs with 'q' so that atomic.h
and <stdatomic.h> can co-exist. I checked /usr/include on my machine and
searched GitHub for existing "qatomic_" users but there seem to be none.

This patch was generated using:

  $ git grep -h -o '\<atomic\(64\)\?_[a-z0-9_]\+' include/qemu/atomic.h | \
    sort -u >/tmp/changed_identifiers
  $ for identifier in $(</tmp/changed_identifiers); do
        sed -i "s%\<$identifier\>%q$identifier%g" \
            $(git grep -I -l "\<$identifier\>")
    done

I manually fixed line-wrap issues and misaligned rST tables.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20200923105646.47864-1-stefanha@redhat.com>
2020-09-23 16:07:44 +01:00
Paolo Bonzini
139c1837db meson: rename included C source files to .c.inc
With Makefiles that have automatically generated dependencies, you
generated includes are set as dependencies of the Makefile, so that they
are built before everything else and they are available when first
building the .c files.

Alternatively you can use a fine-grained dependency, e.g.

        target/arm/translate.o: target/arm/decode-neon-shared.inc.c

With Meson you have only one choice and it is a third option, namely
"build at the beginning of the corresponding target"; the way you
express it is to list the includes in the sources of that target.

The problem is that Meson decides if something is a source vs. a
generated include by looking at the extension: '.c', '.cc', '.m', '.C'
are sources, while everything else is considered an include---including
'.inc.c'.

Use '.c.inc' to avoid this, as it is consistent with our other convention
of using '.rst.inc' for included reStructuredText files.  The editorconfig
file is adjusted.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-08-21 06:18:30 -04:00
Richard Henderson
885b1706df tcg/i386: Implement INDEX_op_rotl{i,s,v}_vec
For immediates, we must continue the special casing of 8-bit
elements.  The other element sizes and shift types are trivially
implemented with shifts.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2020-06-02 08:42:37 -07:00
Richard Henderson
23850a74af tcg: Implement gvec support for rotate by scalar
No host backend support yet, but the interfaces for rotls
are in place.  Only implement left-rotate for now, as the
only known use of vector rotate by scalar is s390x, so any
right-rotate would be unused and untestable.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2020-06-02 08:42:37 -07:00
Richard Henderson
5d0ceda902 tcg: Implement gvec support for rotate by vector
No host backend support yet, but the interfaces for rotlv
and rotrv are in place.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
v3: Drop the generic expansion from rot to shift; we can do better
    for each backend, and then this code becomes unused.
2020-06-02 08:42:37 -07:00
Richard Henderson
b0f7e7444c tcg: Implement gvec support for rotate by immediate
No host backend support yet, but the interfaces for rotli
are in place.  Canonicalize immediate rotate to the left,
based on a survey of architectures, but provide both left
and right shift interfaces to the translators.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2020-06-02 08:42:37 -07:00
Richard Henderson
cce743abbf tcg/i386: Fix %r12 guest_base initialization
When %gs cannot be used, we use register offset addressing.
This path is almost never used, so it was clearly not tested.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20200406174803.8192-1-richard.henderson@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
2020-04-07 16:19:49 +01:00
Richard Henderson
e20cb81d9c tcg/i386: Fix INDEX_op_dup2_vec
We were only constructing the 64-bit element, and not
replicating the 64-bit element across the rest of the vector.

Cc: qemu-stable@nongnu.org
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2020-03-30 11:16:17 -07:00
Richard Henderson
312b426fea tcg/i386: Bound shift count expanding sari_vec
A given RISU testcase for SVE can produce

tcg-op-vec.c:511: do_shifti: Assertion `i >= 0 && i < (8 << vece)' failed.

because expand_vec_sari gave a shift count of 32 to a MO_32
vector shift.

In 44f1441dbe, we changed from direct expansion of vector opcodes
to re-use of the tcg expanders.  So while the comment correctly notes
that the hw will handle such a shift count, we now have to take our
own sanity checks into account.  Which is easy in this particular case.

Fixes: 44f1441dbe
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2020-03-17 08:41:07 -07:00
Philippe Mathieu-Daudé
2b434dd127 tcg: Search includes in the parent source directory
All the *.inc.c files included by tcg/$TARGET/tcg-target.inc.c
are in tcg/, their parent directory. To simplify the preprocessor
search path, include the relative parent path: '..'.

Patch created mechanically by running:

  $ for x in tcg-pool.inc.c tcg-ldst.inc.c; do \
    sed -i "s,#include \"$x\",#include \"../$x\"," \
      $(git grep -l "#include \"$x\""); \
    done

Acked-by: David Gibson <david@gibson.dropbear.id.au> (ppc parts)
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Stefan Weil <sw@weilnetz.de>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20200101112303.20724-3-philmd@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2020-01-15 15:13:10 -10:00
Philippe Mathieu-Daudé
dcb32f1d8f tcg: Search includes from the project root source directory
We currently search both the root and the tcg/ directories for tcg
files:

  $ git grep '#include "tcg/' | wc -l
  28

  $ git grep '#include "tcg[^/]' | wc -l
  94

To simplify the preprocessor search path, unify by expliciting the
tcg/ directory.

Patch created mechanically by running:

  $ for x in \
      tcg.h tcg-mo.h tcg-op.h tcg-opc.h \
      tcg-op-gvec.h tcg-gvec-desc.h; do \
    sed -i "s,#include \"$x\",#include \"tcg/$x\"," \
      $(git grep -l "#include \"$x\""); \
    done

Acked-by: David Gibson <david@gibson.dropbear.id.au> (ppc parts)
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Stefan Weil <sw@weilnetz.de>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20200101112303.20724-2-philmd@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2020-01-15 15:13:10 -10:00
Peter Maydell
2029bf7e52 tcg/i386/tcg-target.opc.h: Add copyright/license
Add the copyright/license boilerplate for tcg/i386/tcg-target.opc.h.
This file has had only one commit, 770c2fc7bb, by
a Linaro engineer.
The license is MIT, since that's what the rest of tcg/i386/ is.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20191025155848.17362-3-peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-11-11 15:11:21 +01:00
Tony Nguyen
14776ab5a1 tcg: TCGMemOp is now accelerator independent MemOp
Preparation for collapsing the two byte swaps, adjust_endianness and
handle_bswap, along the I/O path.

Target dependant attributes are conditionalized upon NEED_CPU_H.

Signed-off-by: Tony Nguyen <tony.nguyen@bt.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Acked-by: Cornelia Huck <cohuck@redhat.com>
Message-Id: <81d9cd7d7f5aaadfa772d6c48ecee834e9cf7882.1566466906.git.tony.nguyen@bt.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-09-03 08:30:38 -07:00
Richard Henderson
269bd5d8f6 cpu: Move the softmmu tlb to CPUNegativeOffsetState
We have for some time had code within the tcg backends to
handle large positive offsets from env.  This move makes
sure that need not happen.  Indeed, we are able to assert
at build time that simple offsets suffice for all hosts.

Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-06-10 07:03:42 -07:00
Richard Henderson
a40ec84ee2 tcg: Create struct CPUTLB
Move all softmmu tlb data into this structure.  Arrange the
members so that we are able to place mask+table together and
at a smaller absolute offset from ENV.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Acked-by: Alistair Francis <alistair.francis@wdc.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-06-10 07:03:34 -07:00
Richard Henderson
11e2bfef79 tcg/i386: Use MOVDQA for TCG_TYPE_V128 load/store
This instruction raises #GP, aka SIGSEGV, if the effective address
is not aligned to 16-bytes.

We have assertions in tcg-op-gvec.c that the offset from ENV is
aligned, for vector types <= V128.  But the offset itself does not
validate that the final pointer is aligned -- one must also remember
to use the QEMU_ALIGNED() attribute on the vector member within ENV.

PowerPC Altivec has vector load/store instructions that silently
discard the low 4 bits of the address, making alignment mistakes
difficult to discover.  Aid that by making the most popular host
visibly signal the error.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-05-22 15:09:43 -04:00
Richard Henderson
ebcfb91abe tcg/i386: Use umin/umax in expanding unsigned compare
Using umin(a, b) == a as an expansion for TCG_COND_LEU is a
better alternative to (a - INT_MIN) <= (b - INT_MIN).

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-05-22 15:09:43 -04:00
Richard Henderson
3ec3538a45 tcg/i386: Remove expansion for missing minmax
This is now handled by code within tcg-op-vec.c.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-05-22 15:09:43 -04:00
Richard Henderson
904c5e1967 tcg/i386: Support vector comparison select value
We already had backend support for this feature.  Expand the new
cmpsel opcode using vpblendb.  The combination allows us to avoid
an extra NOT for some comparison codes.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-05-22 15:09:43 -04:00
Richard Henderson
f75da2988e tcg: Add support for vector compare select
Perform a per-element conditional move.  This combination operation is
easier to implement on some host vector units than plain cmp+bitsel.
Omit the usual gvec interface, as this is intended to be used by
target-specific gvec expansion call-backs.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-05-22 15:09:43 -04:00
Richard Henderson
38dc12947e tcg: Add support for vector bitwise select
This operation performs d = (b & a) | (c & ~a), and is present
on a majority of host vector units.  Include gvec expanders.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-05-22 15:09:43 -04:00
Richard Henderson
7b60ef3264 tcg/i386: Fix dupi/dupm for avx1 and 32-bit hosts
The VBROADCASTSD instruction only allows %ymm registers as destination.
Rather than forcing VEX.L and writing to the entire 256-bit register,
revert to using MOVDDUP with an %xmm register.  This is sufficient for
an avx1 host since we do not support TCG_TYPE_V256 for that case.

Also fix the 32-bit avx2, which should have used VPBROADCASTW.

Fixes: 1e262b49b5
Tested-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reported-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-05-22 15:09:43 -04:00
Richard Henderson
18f9b65f1a tcg/i386: Support vector absolute value
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-05-13 22:52:08 +00:00
Richard Henderson
bcefc90208 tcg: Add support for vector absolute value
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-05-13 22:52:08 +00:00
Richard Henderson
0a8d7a3bf5 tcg/i386: Support vector scalar shift opcodes
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-05-13 22:52:08 +00:00
Richard Henderson
a2ce146a06 tcg/i386: Support vector variable shift opcodes
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-05-13 22:52:08 +00:00
Richard Henderson
37ee55a081 tcg: Add INDEX_op_dupm_vec
Allow the backend to expand dup from memory directly, instead of
forcing the value into a temp first.  This is especially important
if integer/vector register moves do not exist.

Note that officially tcg_out_dupm_vec is allowed to fail.
If it did, we could fix this up relatively easily:

  VECE == 32/64:
    Load the value into a vector register, then dup.
    Both of these must work.

  VECE == 8/16:
    If the value happens to be at an offset such that an aligned
    load would place the desired value in the least significant
    end of the register, go ahead and load w/garbage in high bits.

    Load the value w/INDEX_op_ld{8,16}_i32.
    Attempt a move directly to vector reg, which may fail.
    Store the value into the backing store for OTS.
    Load the value into the vector reg w/TCG_TYPE_I32, which must work.
    Duplicate from the vector reg into itself, which must work.

All of which is well and good, except that all supported
hosts can support dupm for all vece, so all of the failure
paths would be dead code and untestable.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-05-13 22:52:08 +00:00
Richard Henderson
1e262b49b5 tcg/i386: Implement tcg_out_dupm_vec
At the same time, improve tcg_out_dupi_vec wrt broadcast
from the constant pool.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-05-13 14:44:03 -07:00
Richard Henderson
d6ecb4a978 tcg: Add tcg_out_dupm_vec to the backend interface
Currently stubbed out in all backends that support vectors.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-05-13 14:44:03 -07:00
Richard Henderson
bab1671f0f tcg: Manually expand INDEX_op_dup_vec
This case is similar to INDEX_op_mov_* in that we need to do
different things depending on the current location of the source.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
v3: Added some commentary to the tcg_reg_alloc_* functions.
2019-05-13 14:44:03 -07:00
Richard Henderson
e7632cfa8b tcg: Promote tcg_out_{dup,dupi}_vec to backend interface
The i386 backend already has these functions, and the aarch64 backend
could easily split out one.  Nothing is done with these functions yet,
but this will aid register allocation of INDEX_op_dup_vec in a later patch.

Adjust the aarch64 tcg_out_dupi_vec signature to match the new interface.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-05-13 14:44:03 -07:00
Richard Henderson
78113e83e0 tcg: Return bool success from tcg_out_mov
This patch merely changes the interface, aborting on all failures,
of which there are currently none.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-05-13 14:44:03 -07:00
Richard Henderson
aeee05f53a tcg: Restart TB generation after out-of-line ldst overflow
This is part c of relocation overflow handling.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-04-24 13:05:28 -07:00
Richard Henderson
c6fb8c0cf7 tcg/i386: Support INDEX_op_extract2_{i32,i64}
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-04-24 13:04:33 -07:00
Richard Henderson
fce1296f13 tcg: Add INDEX_op_extract2_{i32,i64}
This will let backends implement the double-word shift operation.

Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-04-24 13:04:33 -07:00
Mark Cave-Ayland
3115584d39 tcg/i386: fix unsigned vector saturating arithmetic
Due to a cut/paste error in the original implementation, the unsigned
vector saturating arithmetic was erroneously being calculated as signed
vector saturating arithmetic.

Fixes: 8ffafbcec2 ("tcg/i386: Implement vector saturating arithmetic")
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Message-Id: <20190207224258.426-1-mark.cave-ayland@ilande.co.uk>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-02-11 08:52:44 -08:00
Richard Henderson
e77c89fb08 cputlb: Remove static tlb sizing
Now that all tcg backends support TCG_TARGET_IMPLEMENTS_DYN_TLB,
remove the define and the old code.

Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-01-28 07:04:35 -08:00
Emilio G. Cota
54eaf40b8f tcg/i386: enable dynamic TLB sizing
As the following experiments show, this series is a net perf gain,
particularly for memory-heavy workloads. Experiments are run on an
Intel(R) Xeon(R) Gold 6142 CPU @ 2.60GHz.

1. System boot + shudown, debian aarch64:

- Before (v3.1.0):
 Performance counter stats for './die.sh v3.1.0' (10 runs):

       9019.797015      task-clock (msec)         #    0.993 CPUs utilized            ( +-  0.23% )
    29,910,312,379      cycles                    #    3.316 GHz                      ( +-  0.14% )
    54,699,252,014      instructions              #    1.83  insn per cycle           ( +-  0.08% )
    10,061,951,686      branches                  # 1115.541 M/sec                    ( +-  0.08% )
       172,966,530      branch-misses             #    1.72% of all branches          ( +-  0.07% )

       9.084039051 seconds time elapsed                                          ( +-  0.23% )

- After:
 Performance counter stats for './die.sh tlb-dyn-v5' (10 runs):

       8624.084842      task-clock (msec)         #    0.993 CPUs utilized            ( +-  0.23% )
    28,556,123,404      cycles                    #    3.311 GHz                      ( +-  0.13% )
    51,755,089,512      instructions              #    1.81  insn per cycle           ( +-  0.05% )
     9,526,513,946      branches                  # 1104.641 M/sec                    ( +-  0.05% )
       166,578,509      branch-misses             #    1.75% of all branches          ( +-  0.19% )

       8.680540350 seconds time elapsed                                          ( +-  0.24% )

That is, a 4.4% perf increase.

2. System boot + shutdown, ubuntu 18.04 x86_64:

- Before (v3.1.0):
      56100.574751      task-clock (msec)         #    1.016 CPUs utilized            ( +-  4.81% )
   200,745,466,128      cycles                    #    3.578 GHz                      ( +-  5.24% )
   431,949,100,608      instructions              #    2.15  insn per cycle           ( +-  5.65% )
    77,502,383,330      branches                  # 1381.490 M/sec                    ( +-  6.18% )
       844,681,191      branch-misses             #    1.09% of all branches          ( +-  3.82% )

      55.221556378 seconds time elapsed                                          ( +-  5.01% )

- After:
      56603.419540      task-clock (msec)         #    1.019 CPUs utilized            ( +- 10.19% )
   202,217,930,479      cycles                    #    3.573 GHz                      ( +- 10.69% )
   439,336,291,626      instructions              #    2.17  insn per cycle           ( +- 14.14% )
    80,538,357,447      branches                  # 1422.853 M/sec                    ( +- 16.09% )
       776,321,622      branch-misses             #    0.96% of all branches          ( +-  3.77% )

      55.549661409 seconds time elapsed                                          ( +- 10.44% )

No improvement (within noise range). Note that for this workload,
increasing the time window too much can lead to perf degradation,
since it flushes the TLB *very* frequently.

3. x86_64 SPEC06int:

           x86_64-softmmu speedup vs. v3.1.0 for SPEC06int (test set)
            Host: Intel(R) Xeon(R) Gold 6142 CPU @ 2.60GHz (Skylake)

5.5 +------------------------------------------------------------------------+
    |                   +-+                                                  |
  5 |-+.................+-+...............................tlb-dyn-v5.......+-|
    |                   * *                                                  |
4.5 |-+.................*.*................................................+-|
    |                   * *                                                  |
  4 |-+.................*.*................................................+-|
    |                   * *                                                  |
3.5 |-+.................*.*................................................+-|
    |                   * *                                                  |
  3 |-+......+-+*.......*.*................................................+-|
    |        *  *       * *                                                  |
2.5 |-+......*..*.......*.*.................................+-+*...........+-|
    |        *  *       * *                                 *  *             |
  2 |-+......*..*.......*.*.................................*..*...........+-|
    |        *  *       * *                                 *  *  +-+        |
1.5 |-+......*..*.......*.*.................................*..*.*+-+.*+-+.+-|
    |        *  * *+-+  * *  +-+       *+-+  +-+       +-+  *  * *  * *  *   |
  1 |++++-+*+*++*+*++*++*+*++*+*+++-+*+*+-++*+-++++-++++-+++*++*+*++*+*++*+++|
    |   *  * *  * *  *  * *  * *  *  * *  * *  *  * *  * *  *  * *  * *  *   |
0.5 +------------------------------------------------------------------------+
  400.perlb401.bzip403.g429445.g456.hm462.libq464.h471.omn47483.xalancbgeomean
  png: https://imgur.com/YRF90f7

That is, a 1.51x average speedup over the baseline, with a max speedup
of 5.17x.

Here's a different look at the SPEC06int results, using KVM as the baseline:

             x86_64-softmmu slowdown vs. KVM for SPEC06int (test set)
             Host: Intel(R) Xeon(R) Gold 6142 CPU @ 2.60GHz (Skylake)

25 +---------------------------------------------------------------------------+
   |                   +-+                                        +-+          |
   |                   * *                             +-+      v3.1.0         |
   |                   * *                             +-+  tlb-dyn-v5         |
   |                   * *                             * *        +-+          |
20 |-+.................*.*.............................*.+-+......*.*........+-|
   |                   * *                             * # #      * *          |
   |        +-+        * *                             * # #      * *          |
   |        * *        * *                             * # #      * *          |
15 |-+......*.*........*.*.............................*.#.#......*.+-+......+-|
   |        * *        * *                             * # #      * #|#        |
   |        * *        * *        +-+                  * # #      * +-+        |
   |        * *  +-+   * *        ++-+       +-+       * # #      * # # +-+    |
   |        * *  +-+   * *        * ##       *|   +-+  * # #      * # # +-+    |
10 |-+......*.*..*.+-+.*.*........*.##.......++-+.*.+-+*.#.#......*.#.#.*.*..+-|
   |        * *  * +-+ * *        * ## +-+   *# # * # #* # # +-+  * # # * *    |
   |        * *  * # # * *  +-+   * ## * +-+ *# # * # #* # # * *  * # # *+-+   |
   |        * *  * # # * *  * +-+ * ## * # # *# # * # #* # # * *  * # # * ##   |
 5 |-+......*.+-+*.#.#.*.*..*.#.#.*.##.*.#.#.*#.#.*.#.#*.#.#.*.*..*.#.#.*.##.+-|
   |        * # #* # # * +-+* # # * ## * # # *# # * # #* # # * *  * # # * ##   |
   |        * # #* # # * # #* # # * ## * # # *# # * # #* # # * +-+* # # * ##   |
   |   ++-+ * # #* # # * # #* # # * ## * # # *# # * # #* # # * # #* # # * ##   |
   |+++*#+#+*+#+#*+#+#+*+#+#*+#+#+*+##+*+#+#+*#+#+*+#+#*+#+#+*+#+#*+#+#+*+##+++|
 0 +---------------------------------------------------------------------------+
 400.perlbe401.bzi403.gc429445.go456.h462.libqu464.h471.omne4483.xalancbmgeomean
  png: https://imgur.com/YzAMNEV

After this series, we bring down the average SPEC06int slowdown vs KVM
from 11.47x to 7.58x.

Tested-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <20190116170114.26802-4-cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-01-28 07:03:34 -08:00
Emilio G. Cota
86e1eff8bc tcg: introduce dynamic TLB sizing
Disabled in all TCG backends for now.

Tested-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <20190116170114.26802-3-cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-01-28 07:03:34 -08:00
Richard Henderson
bc37faf4cb tcg/i386: Implement vector minmax arithmetic
The avx instruction set does not directly provide MO_64.
We can still implement 64-bit with comparison and vpblendvb.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-01-28 07:03:34 -08:00
Richard Henderson
8ffafbcec2 tcg/i386: Implement vector saturating arithmetic
Only MO_8 and MO_16 are implemented, since that's all the
instruction set provides.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-01-28 07:03:34 -08:00
Richard Henderson
44f1441dbe tcg/i386: Split subroutines out of tcg_expand_vec_op
This routine was becoming too large.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-01-28 07:03:34 -08:00
Richard Henderson
dd0a0fcdd8 tcg: Add opcodes for vector minmax arithmetic
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-01-28 07:03:34 -08:00
Richard Henderson
8afaf05066 tcg: Add opcodes for vector saturated arithmetic
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-01-28 07:03:34 -08:00
Paolo Bonzini
7d37435bd5 avoid TABs in files that only contain a few
Most files that have TABs only contain a handful of them.  Change
them to spaces so that we don't confuse people.

disas, standard-headers, linux-headers and libdecnumber are imported
from other projects and probably should be exempted from the check.
Outside those, after this patch the following files still contain both
8-space and TAB sequences at the beginning of the line.  Many of them
have a majority of TABs, or were initially committed with all tabs.

    bsd-user/i386/target_syscall.h
    bsd-user/x86_64/target_syscall.h
    crypto/aes.c
    hw/audio/fmopl.c
    hw/audio/fmopl.h
    hw/block/tc58128.c
    hw/display/cirrus_vga.c
    hw/display/xenfb.c
    hw/dma/etraxfs_dma.c
    hw/intc/sh_intc.c
    hw/misc/mst_fpga.c
    hw/net/pcnet.c
    hw/sh4/sh7750.c
    hw/timer/m48t59.c
    hw/timer/sh_timer.c
    include/crypto/aes.h
    include/disas/bfd.h
    include/hw/sh4/sh.h
    libdecnumber/decNumber.c
    linux-headers/asm-generic/unistd.h
    linux-headers/linux/kvm.h
    linux-user/alpha/target_syscall.h
    linux-user/arm/nwfpe/double_cpdo.c
    linux-user/arm/nwfpe/fpa11_cpdt.c
    linux-user/arm/nwfpe/fpa11_cprt.c
    linux-user/arm/nwfpe/fpa11.h
    linux-user/flat.h
    linux-user/flatload.c
    linux-user/i386/target_syscall.h
    linux-user/ppc/target_syscall.h
    linux-user/sparc/target_syscall.h
    linux-user/syscall.c
    linux-user/syscall_defs.h
    linux-user/x86_64/target_syscall.h
    slirp/cksum.c
    slirp/if.c
    slirp/ip.h
    slirp/ip_icmp.c
    slirp/ip_icmp.h
    slirp/ip_input.c
    slirp/ip_output.c
    slirp/mbuf.c
    slirp/misc.c
    slirp/sbuf.c
    slirp/socket.c
    slirp/socket.h
    slirp/tcp_input.c
    slirp/tcpip.h
    slirp/tcp_output.c
    slirp/tcp_subr.c
    slirp/tcp_timer.c
    slirp/tftp.c
    slirp/udp.c
    slirp/udp.h
    target/cris/cpu.h
    target/cris/mmu.c
    target/cris/op_helper.c
    target/sh4/helper.c
    target/sh4/op_helper.c
    target/sh4/translate.c
    tcg/sparc/tcg-target.inc.c
    tests/tcg/cris/check_addo.c
    tests/tcg/cris/check_moveq.c
    tests/tcg/cris/check_swap.c
    tests/tcg/multiarch/test-mmap.c
    ui/vnc-enc-hextile-template.h
    ui/vnc-enc-zywrle.h
    util/envlist.c
    util/readline.c

The following have only TABs:

    bsd-user/i386/target_signal.h
    bsd-user/sparc64/target_signal.h
    bsd-user/sparc64/target_syscall.h
    bsd-user/sparc/target_signal.h
    bsd-user/sparc/target_syscall.h
    bsd-user/x86_64/target_signal.h
    crypto/desrfb.c
    hw/audio/intel-hda-defs.h
    hw/core/uboot_image.h
    hw/sh4/sh7750_regnames.c
    hw/sh4/sh7750_regs.h
    include/hw/cris/etraxfs_dma.h
    linux-user/alpha/termbits.h
    linux-user/arm/nwfpe/fpopcode.h
    linux-user/arm/nwfpe/fpsr.h
    linux-user/arm/syscall_nr.h
    linux-user/arm/target_signal.h
    linux-user/cris/target_signal.h
    linux-user/i386/target_signal.h
    linux-user/linux_loop.h
    linux-user/m68k/target_signal.h
    linux-user/microblaze/target_signal.h
    linux-user/mips64/target_signal.h
    linux-user/mips/target_signal.h
    linux-user/mips/target_syscall.h
    linux-user/mips/termbits.h
    linux-user/ppc/target_signal.h
    linux-user/sh4/target_signal.h
    linux-user/sh4/termbits.h
    linux-user/sparc64/target_syscall.h
    linux-user/sparc/target_signal.h
    linux-user/x86_64/target_signal.h
    linux-user/x86_64/termbits.h
    pc-bios/optionrom/optionrom.h
    slirp/mbuf.h
    slirp/misc.h
    slirp/sbuf.h
    slirp/tcp.h
    slirp/tcp_timer.h
    slirp/tcp_var.h
    target/i386/svm.h
    target/sparc/asi.h
    target/xtensa/core-dc232b/xtensa-modules.inc.c
    target/xtensa/core-dc233c/xtensa-modules.inc.c
    target/xtensa/core-de212/core-isa.h
    target/xtensa/core-de212/xtensa-modules.inc.c
    target/xtensa/core-fsf/xtensa-modules.inc.c
    target/xtensa/core-sample_controller/core-isa.h
    target/xtensa/core-sample_controller/xtensa-modules.inc.c
    target/xtensa/core-test_kc705_be/core-isa.h
    target/xtensa/core-test_kc705_be/xtensa-modules.inc.c
    tests/tcg/cris/check_abs.c
    tests/tcg/cris/check_addc.c
    tests/tcg/cris/check_addcm.c
    tests/tcg/cris/check_addoq.c
    tests/tcg/cris/check_bound.c
    tests/tcg/cris/check_ftag.c
    tests/tcg/cris/check_int64.c
    tests/tcg/cris/check_lz.c
    tests/tcg/cris/check_openpf5.c
    tests/tcg/cris/check_sigalrm.c
    tests/tcg/cris/crisutils.h
    tests/tcg/cris/sys.c
    tests/tcg/i386/test-i386-ssse3.c
    ui/vgafont.h

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20181213223737.11793-3-pbonzini@redhat.com>
Reviewed-by: Aleksandar Markovic <amarkovic@wavecomp.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
Acked-by: Richard Henderson <richard.henderson@linaro.org>
Acked-by: Eric Blake <eblake@redhat.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Stefan Markovic <smarkovic@wavecomp.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-01-11 15:46:56 +01:00
Richard Henderson
e1dcf3529d tcg: Add TCG_TARGET_HAS_MEMORY_BSWAP
For now, defined universally as true, since we previously required
backends to implement swapped memory operations.  Future patches
may now remove that support where it is onerous.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-12-17 06:04:44 +03:00