Commit Graph

2090 Commits

Author SHA1 Message Date
Richard Henderson
f75da2988e tcg: Add support for vector compare select
Perform a per-element conditional move.  This combination operation is
easier to implement on some host vector units than plain cmp+bitsel.
Omit the usual gvec interface, as this is intended to be used by
target-specific gvec expansion call-backs.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-05-22 15:09:43 -04:00
Richard Henderson
38dc12947e tcg: Add support for vector bitwise select
This operation performs d = (b & a) | (c & ~a), and is present
on a majority of host vector units.  Include gvec expanders.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-05-22 15:09:43 -04:00
Richard Henderson
532ba368a1 tcg: Fix missing checks and clears in tcg_gen_gvec_dup_mem
The paths through tcg_gen_dup_mem_vec and through MO_128 were
missing the check_size_align.  The path through MO_128 was also
missing the expand_clr.  This last was not visible because the
only user is ARM SVE, which would set oprsz == maxsz, and not
require the clear.

Fix by adding the check_size_align and using do_dup directly
instead of duplicating the check in tcg_gen_gvec_dup_{i32,i64}.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-05-22 15:09:43 -04:00
Richard Henderson
7b60ef3264 tcg/i386: Fix dupi/dupm for avx1 and 32-bit hosts
The VBROADCASTSD instruction only allows %ymm registers as destination.
Rather than forcing VEX.L and writing to the entire 256-bit register,
revert to using MOVDDUP with an %xmm register.  This is sufficient for
an avx1 host since we do not support TCG_TYPE_V256 for that case.

Also fix the 32-bit avx2, which should have used VPBROADCASTW.

Fixes: 1e262b49b5
Tested-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reported-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-05-22 15:09:43 -04:00
Richard Henderson
a7b6d286cf tcg/aarch64: Do not advertise minmax for MO_64
The min/max instructions are not available for 64-bit elements.

Fixes: 93f332a503
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-05-13 22:52:08 +00:00
Richard Henderson
a456394ae5 tcg/aarch64: Support vector absolute value
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-05-13 22:52:08 +00:00
Richard Henderson
18f9b65f1a tcg/i386: Support vector absolute value
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-05-13 22:52:08 +00:00
Richard Henderson
bcefc90208 tcg: Add support for vector absolute value
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-05-13 22:52:08 +00:00
Richard Henderson
ff1f11f7f8 tcg: Add support for integer absolute value
Remove a function of the same name from target/arm/.
Use a branchless implementation of abs gleaned from gcc.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-05-13 22:52:08 +00:00
Richard Henderson
0a8d7a3bf5 tcg/i386: Support vector scalar shift opcodes
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-05-13 22:52:08 +00:00
Richard Henderson
b4578cd91c tcg: Add gvec expanders for vector shift by scalar
Allow expansion either via shift by scalar or by replicating
the scalar for shift by vector.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
v3: Use a private structure for do_gvec_shifts.
2019-05-13 22:52:08 +00:00
Richard Henderson
79525dfd08 tcg/aarch64: Support vector variable shift opcodes
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-05-13 22:52:08 +00:00
Richard Henderson
a2ce146a06 tcg/i386: Support vector variable shift opcodes
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-05-13 22:52:08 +00:00
Richard Henderson
5ee5c14cac tcg: Add gvec expanders for variable shift
The gvec expanders perform a modulo on the shift count.  If the target
requires alternate behaviour, then it cannot use the generic gvec
expanders anyway, and will have to have its own custom code.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-05-13 22:52:08 +00:00
Richard Henderson
37ee55a081 tcg: Add INDEX_op_dupm_vec
Allow the backend to expand dup from memory directly, instead of
forcing the value into a temp first.  This is especially important
if integer/vector register moves do not exist.

Note that officially tcg_out_dupm_vec is allowed to fail.
If it did, we could fix this up relatively easily:

  VECE == 32/64:
    Load the value into a vector register, then dup.
    Both of these must work.

  VECE == 8/16:
    If the value happens to be at an offset such that an aligned
    load would place the desired value in the least significant
    end of the register, go ahead and load w/garbage in high bits.

    Load the value w/INDEX_op_ld{8,16}_i32.
    Attempt a move directly to vector reg, which may fail.
    Store the value into the backing store for OTS.
    Load the value into the vector reg w/TCG_TYPE_I32, which must work.
    Duplicate from the vector reg into itself, which must work.

All of which is well and good, except that all supported
hosts can support dupm for all vece, so all of the failure
paths would be dead code and untestable.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-05-13 22:52:08 +00:00
Richard Henderson
f23e5e15ed tcg/aarch64: Implement tcg_out_dupm_vec
The LD1R instruction does all the work.  Note that the only
useful addressing mode is a base register with no offset.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-05-13 22:50:35 +00:00
Richard Henderson
1e262b49b5 tcg/i386: Implement tcg_out_dupm_vec
At the same time, improve tcg_out_dupi_vec wrt broadcast
from the constant pool.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-05-13 14:44:03 -07:00
Richard Henderson
d6ecb4a978 tcg: Add tcg_out_dupm_vec to the backend interface
Currently stubbed out in all backends that support vectors.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-05-13 14:44:03 -07:00
Richard Henderson
bab1671f0f tcg: Manually expand INDEX_op_dup_vec
This case is similar to INDEX_op_mov_* in that we need to do
different things depending on the current location of the source.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
v3: Added some commentary to the tcg_reg_alloc_* functions.
2019-05-13 14:44:03 -07:00
Richard Henderson
e7632cfa8b tcg: Promote tcg_out_{dup,dupi}_vec to backend interface
The i386 backend already has these functions, and the aarch64 backend
could easily split out one.  Nothing is done with these functions yet,
but this will aid register allocation of INDEX_op_dup_vec in a later patch.

Adjust the aarch64 tcg_out_dupi_vec signature to match the new interface.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-05-13 14:44:03 -07:00
Richard Henderson
240c08d099 tcg: Support cross-class moves without instruction support
PowerPC Altivec does not support direct moves between vector registers
and general registers.  So when tcg_out_mov fails, we can use the
backing memory for the temporary to perform the move.

Acked-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-05-13 14:44:03 -07:00
Richard Henderson
78113e83e0 tcg: Return bool success from tcg_out_mov
This patch merely changes the interface, aborting on all failures,
of which there are currently none.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-05-13 14:44:03 -07:00
Richard Henderson
c16f52b2c5 tcg/arm: Use tcg_out_mov_reg in tcg_out_mov
We have a function that takes an additional condition parameter
over the standard backend interface.  It already takes care of
eliding no-op moves.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-05-13 14:44:03 -07:00
Richard Henderson
d63e3b6e69 tcg: Assert fixed_reg is read-only
The only fixed_reg is cpu_env, and it should not be modified
during any TB.  Therefore code that tries to special-case moves
into a fixed_reg is dead.  Remove it.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-05-13 14:44:03 -07:00
Richard Henderson
53229a7703 tcg: Specify optional vector requirements with a list
Replace the single opcode in .opc with a null-terminated
array in .opt_opc.  We still require that all opcodes be
used with the same .vece.

Validate the contents of this list with CONFIG_DEBUG_TCG.
All tcg_gen_*_vec functions will check any list active
during .fniv expansion.  Swap the active list in and out
as we expand other opcodes, or take control away from the
front-end function.

Convert all existing vector aware front ends.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-05-13 14:44:03 -07:00
Richard Henderson
ce27c5d1a3 tcg: Allow add_vec, sub_vec, neg_vec, not_vec to be expanded
PowerPC Altivec does not support add and subtract of 64-bit elements.
Prepare for that configuration by not assuming the operation is
universally supported.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-05-13 14:44:03 -07:00
Richard Henderson
ac383dde33 tcg: Do not recreate INDEX_op_neg_vec unless supported
Use tcg_can_emit_vec_op instead of just TCG_TARGET_HAS_neg_vec,
so that we check the type and vece for the actual operation.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-05-13 14:26:28 -07:00
David Hildenbrand
e1227bb6e5 tcg: Implement tcg_gen_gvec_3i()
Let's add tcg_gen_gvec_3i(), similar to tcg_gen_gvec_2i(), however
without introducing "gen_helper_gvec_3i *fnoi", as it isn't needed
for now.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20190416185301.25344-2-david@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-05-13 14:26:28 -07:00
Richard Henderson
b4b82d7e9c tcg/arm: Restrict constant pool displacement to 12 bits
This will not necessarily restrict the size of the TB, since for v7
the majority of constant pool usage is for calls from the out-of-line
ldst code, which is already at the end of the TB.  But this does
allow us to save one insn per reference on the off-chance.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-04-25 10:39:39 -07:00
Richard Henderson
a7cdaf710f tcg/ppc: Allow the constant pool to overflow at 32k
There is no point in coding for a 2GB offset when the max TB size
is already limited to 64k.  If we further restrict to 32k then we
can eliminate the extra ADDIS instruction.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-04-24 13:05:28 -07:00
Richard Henderson
aeee05f53a tcg: Restart TB generation after out-of-line ldst overflow
This is part c of relocation overflow handling.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-04-24 13:05:28 -07:00
Richard Henderson
1768987b73 tcg: Restart TB generation after constant pool overflow
This is part b of relocation overflow handling.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-04-24 13:05:28 -07:00
Richard Henderson
7ecd02a06f tcg: Restart TB generation after relocation overflow
If the TB generates too much code, such that backend relocations
overflow, try again with a smaller TB.  In support of this, move
relocation processing from a random place within tcg_out_op, in
the handling of branch opcodes, to a new function at the end of
tcg_gen_code.

This is not a complete solution, as there are additional relocs
generated for out-of-line ldst handling and constant pools.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-04-24 13:05:21 -07:00
Richard Henderson
6e6c4efed9 tcg: Restart after TB code generation overflow
If a TB generates too much code, try again with fewer insns.

Fixes: https://bugs.launchpad.net/bugs/1824853
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-04-24 13:04:33 -07:00
Richard Henderson
464c2969d5 tcg/aarch64: Support INDEX_op_extract2_{i32,i64}
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-04-24 13:04:33 -07:00
Richard Henderson
3b832d67a9 tcg/arm: Support INDEX_op_extract2_i32
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-04-24 13:04:33 -07:00
Richard Henderson
c6fb8c0cf7 tcg/i386: Support INDEX_op_extract2_{i32,i64}
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-04-24 13:04:33 -07:00
Richard Henderson
b0a6056719 tcg: Use extract2 in tcg_gen_deposit_{i32,i64}
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-04-24 13:04:33 -07:00
Richard Henderson
02616bad6f tcg: Use deposit and extract2 in tcg_gen_shifti_i64
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-04-24 13:04:33 -07:00
Richard Henderson
fce1296f13 tcg: Add INDEX_op_extract2_{i32,i64}
This will let backends implement the double-word shift operation.

Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-04-24 13:04:33 -07:00
David Hildenbrand
2089fcc9e7 tcg: Implement tcg_gen_extract2_{i32,i64}
Will be helpful for s390x. Input 128 bit and output 64 bit only,
which is sufficient for now.

Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20190225154204.26751-1-david@redhat.com>
[rth: Add matching tcg_gen_extract2_i32.]
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-04-24 13:04:33 -07:00
Markus Armbruster
3de2faa9a8 tcg: Simplify how dump_exec_info() prints
dump_exec_info() takes an fprintf()-like callback and a FILE * to pass
to it.

Its only caller hmp_info_jit() passes monitor_fprintf() and the
current monitor cast to FILE *.  monitor_fprintf() casts it right
back, and is otherwise identical to monitor_printf().  The
type-punning is ugly.

Drop the callback, and call qemu_printf() instead.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20190417191805.28198-5-armbru@redhat.com>
2019-04-18 22:18:59 +02:00
Markus Armbruster
d4c51a0af3 tcg: Simplify how dump_opcount_info() prints
dump_opcount_info() takes an fprintf()-like callback and a FILE * to
pass to it.

Its only caller hmp_info_opcount() passes monitor_fprintf() and the
current monitor cast to FILE *.  monitor_fprintf() casts it right
back, and is otherwise identical to monitor_printf().  The
type-punning is ugly.

Drop the callback, and call qemu_printf() instead.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20190417191805.28198-4-armbru@redhat.com>
2019-04-18 22:18:59 +02:00
Richard Henderson
9e564a1dde tcg: Remove TODO file
The last update to this file was 9 years ago.  In the meantime,
4 of the 6 ideas have actually been completed.  The lat two do
not actually make sense anymore.

Suggested-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-02-21 10:22:24 -08:00
Mark Cave-Ayland
3115584d39 tcg/i386: fix unsigned vector saturating arithmetic
Due to a cut/paste error in the original implementation, the unsigned
vector saturating arithmetic was erroneously being calculated as signed
vector saturating arithmetic.

Fixes: 8ffafbcec2 ("tcg/i386: Implement vector saturating arithmetic")
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Message-Id: <20190207224258.426-1-mark.cave-ayland@ilande.co.uk>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-02-11 08:52:44 -08:00
Richard Henderson
bef16ab4e6 tcg: Diagnose referenced labels that have not been emitted
Currently, a jump to a label that is not defined anywhere will
be emitted not be relocated.  This results in a jump to a random
jump target.  With tcg debugging, print a diagnostic to the -d op
file and abort.

This could help debug or detect errors like
c2d9644e6d ("target/arm: Fix crash on conditional instruction in an IT block")

Reported-by: Roman Kapl <code@rkapl.cz>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-02-11 08:52:44 -08:00
Thomas Huth
fb0343d5b4 tcg: Fix LGPL version number
It's either "GNU *Library* General Public version 2" or "GNU Lesser
General Public version *2.1*", but there was no "version 2.0" of the
"Lesser" library. So assume that version 2.1 is meant here.

Cc: Richard Henderson <rth@twiddle.net>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <1548252536-6242-5-git-send-email-thuth@redhat.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2019-01-30 11:01:52 +01:00
Richard Henderson
e77c89fb08 cputlb: Remove static tlb sizing
Now that all tcg backends support TCG_TARGET_IMPLEMENTS_DYN_TLB,
remove the define and the old code.

Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-01-28 07:04:35 -08:00
Richard Henderson
0a9a83d6bf tcg/tci: enable dynamic TLB sizing
This is automatic due to TCI using the other softtlb macros.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-01-28 07:04:10 -08:00
Richard Henderson
ac33373e0e tcg/mips: enable dynamic TLB sizing
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-01-28 07:04:10 -08:00
Richard Henderson
a31aa4ce00 tcg/mips: Fix tcg_out_qemu_ld_slow_path
Patch the branch after it has been emitted rather
than before it exists.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-01-28 07:04:10 -08:00
Richard Henderson
cd7d3cb7a2 tcg/arm: enable dynamic TLB sizing
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-01-28 07:04:10 -08:00
Richard Henderson
41b70f220b tcg/riscv: enable dynamic TLB sizing
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-01-28 07:04:24 -08:00
Richard Henderson
4f47e338f6 tcg/s390: enable dynamic TLB sizing
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-01-28 07:04:10 -08:00
Richard Henderson
17ff9f7801 tcg/sparc: enable dynamic TLB sizing
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-01-28 07:04:10 -08:00
Richard Henderson
644f591ab0 tcg/ppc: enable dynamic TLB sizing
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-01-28 07:04:10 -08:00
Richard Henderson
f7bcd96669 tcg/aarch64: enable dynamic TLB sizing
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-01-28 07:04:10 -08:00
Emilio G. Cota
54eaf40b8f tcg/i386: enable dynamic TLB sizing
As the following experiments show, this series is a net perf gain,
particularly for memory-heavy workloads. Experiments are run on an
Intel(R) Xeon(R) Gold 6142 CPU @ 2.60GHz.

1. System boot + shudown, debian aarch64:

- Before (v3.1.0):
 Performance counter stats for './die.sh v3.1.0' (10 runs):

       9019.797015      task-clock (msec)         #    0.993 CPUs utilized            ( +-  0.23% )
    29,910,312,379      cycles                    #    3.316 GHz                      ( +-  0.14% )
    54,699,252,014      instructions              #    1.83  insn per cycle           ( +-  0.08% )
    10,061,951,686      branches                  # 1115.541 M/sec                    ( +-  0.08% )
       172,966,530      branch-misses             #    1.72% of all branches          ( +-  0.07% )

       9.084039051 seconds time elapsed                                          ( +-  0.23% )

- After:
 Performance counter stats for './die.sh tlb-dyn-v5' (10 runs):

       8624.084842      task-clock (msec)         #    0.993 CPUs utilized            ( +-  0.23% )
    28,556,123,404      cycles                    #    3.311 GHz                      ( +-  0.13% )
    51,755,089,512      instructions              #    1.81  insn per cycle           ( +-  0.05% )
     9,526,513,946      branches                  # 1104.641 M/sec                    ( +-  0.05% )
       166,578,509      branch-misses             #    1.75% of all branches          ( +-  0.19% )

       8.680540350 seconds time elapsed                                          ( +-  0.24% )

That is, a 4.4% perf increase.

2. System boot + shutdown, ubuntu 18.04 x86_64:

- Before (v3.1.0):
      56100.574751      task-clock (msec)         #    1.016 CPUs utilized            ( +-  4.81% )
   200,745,466,128      cycles                    #    3.578 GHz                      ( +-  5.24% )
   431,949,100,608      instructions              #    2.15  insn per cycle           ( +-  5.65% )
    77,502,383,330      branches                  # 1381.490 M/sec                    ( +-  6.18% )
       844,681,191      branch-misses             #    1.09% of all branches          ( +-  3.82% )

      55.221556378 seconds time elapsed                                          ( +-  5.01% )

- After:
      56603.419540      task-clock (msec)         #    1.019 CPUs utilized            ( +- 10.19% )
   202,217,930,479      cycles                    #    3.573 GHz                      ( +- 10.69% )
   439,336,291,626      instructions              #    2.17  insn per cycle           ( +- 14.14% )
    80,538,357,447      branches                  # 1422.853 M/sec                    ( +- 16.09% )
       776,321,622      branch-misses             #    0.96% of all branches          ( +-  3.77% )

      55.549661409 seconds time elapsed                                          ( +- 10.44% )

No improvement (within noise range). Note that for this workload,
increasing the time window too much can lead to perf degradation,
since it flushes the TLB *very* frequently.

3. x86_64 SPEC06int:

           x86_64-softmmu speedup vs. v3.1.0 for SPEC06int (test set)
            Host: Intel(R) Xeon(R) Gold 6142 CPU @ 2.60GHz (Skylake)

5.5 +------------------------------------------------------------------------+
    |                   +-+                                                  |
  5 |-+.................+-+...............................tlb-dyn-v5.......+-|
    |                   * *                                                  |
4.5 |-+.................*.*................................................+-|
    |                   * *                                                  |
  4 |-+.................*.*................................................+-|
    |                   * *                                                  |
3.5 |-+.................*.*................................................+-|
    |                   * *                                                  |
  3 |-+......+-+*.......*.*................................................+-|
    |        *  *       * *                                                  |
2.5 |-+......*..*.......*.*.................................+-+*...........+-|
    |        *  *       * *                                 *  *             |
  2 |-+......*..*.......*.*.................................*..*...........+-|
    |        *  *       * *                                 *  *  +-+        |
1.5 |-+......*..*.......*.*.................................*..*.*+-+.*+-+.+-|
    |        *  * *+-+  * *  +-+       *+-+  +-+       +-+  *  * *  * *  *   |
  1 |++++-+*+*++*+*++*++*+*++*+*+++-+*+*+-++*+-++++-++++-+++*++*+*++*+*++*+++|
    |   *  * *  * *  *  * *  * *  *  * *  * *  *  * *  * *  *  * *  * *  *   |
0.5 +------------------------------------------------------------------------+
  400.perlb401.bzip403.g429445.g456.hm462.libq464.h471.omn47483.xalancbgeomean
  png: https://imgur.com/YRF90f7

That is, a 1.51x average speedup over the baseline, with a max speedup
of 5.17x.

Here's a different look at the SPEC06int results, using KVM as the baseline:

             x86_64-softmmu slowdown vs. KVM for SPEC06int (test set)
             Host: Intel(R) Xeon(R) Gold 6142 CPU @ 2.60GHz (Skylake)

25 +---------------------------------------------------------------------------+
   |                   +-+                                        +-+          |
   |                   * *                             +-+      v3.1.0         |
   |                   * *                             +-+  tlb-dyn-v5         |
   |                   * *                             * *        +-+          |
20 |-+.................*.*.............................*.+-+......*.*........+-|
   |                   * *                             * # #      * *          |
   |        +-+        * *                             * # #      * *          |
   |        * *        * *                             * # #      * *          |
15 |-+......*.*........*.*.............................*.#.#......*.+-+......+-|
   |        * *        * *                             * # #      * #|#        |
   |        * *        * *        +-+                  * # #      * +-+        |
   |        * *  +-+   * *        ++-+       +-+       * # #      * # # +-+    |
   |        * *  +-+   * *        * ##       *|   +-+  * # #      * # # +-+    |
10 |-+......*.*..*.+-+.*.*........*.##.......++-+.*.+-+*.#.#......*.#.#.*.*..+-|
   |        * *  * +-+ * *        * ## +-+   *# # * # #* # # +-+  * # # * *    |
   |        * *  * # # * *  +-+   * ## * +-+ *# # * # #* # # * *  * # # *+-+   |
   |        * *  * # # * *  * +-+ * ## * # # *# # * # #* # # * *  * # # * ##   |
 5 |-+......*.+-+*.#.#.*.*..*.#.#.*.##.*.#.#.*#.#.*.#.#*.#.#.*.*..*.#.#.*.##.+-|
   |        * # #* # # * +-+* # # * ## * # # *# # * # #* # # * *  * # # * ##   |
   |        * # #* # # * # #* # # * ## * # # *# # * # #* # # * +-+* # # * ##   |
   |   ++-+ * # #* # # * # #* # # * ## * # # *# # * # #* # # * # #* # # * ##   |
   |+++*#+#+*+#+#*+#+#+*+#+#*+#+#+*+##+*+#+#+*#+#+*+#+#*+#+#+*+#+#*+#+#+*+##+++|
 0 +---------------------------------------------------------------------------+
 400.perlbe401.bzi403.gc429445.go456.h462.libqu464.h471.omne4483.xalancbmgeomean
  png: https://imgur.com/YzAMNEV

After this series, we bring down the average SPEC06int slowdown vs KVM
from 11.47x to 7.58x.

Tested-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <20190116170114.26802-4-cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-01-28 07:03:34 -08:00
Emilio G. Cota
86e1eff8bc tcg: introduce dynamic TLB sizing
Disabled in all TCG backends for now.

Tested-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <20190116170114.26802-3-cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-01-28 07:03:34 -08:00
Richard Henderson
93f332a503 tcg/aarch64: Implement vector minmax arithmetic
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-01-28 07:03:34 -08:00
Richard Henderson
d32648d445 tcg/aarch64: Implement vector saturating arithmetic
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-01-28 07:03:34 -08:00
Richard Henderson
bc37faf4cb tcg/i386: Implement vector minmax arithmetic
The avx instruction set does not directly provide MO_64.
We can still implement 64-bit with comparison and vpblendvb.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-01-28 07:03:34 -08:00
Richard Henderson
8ffafbcec2 tcg/i386: Implement vector saturating arithmetic
Only MO_8 and MO_16 are implemented, since that's all the
instruction set provides.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-01-28 07:03:34 -08:00
Richard Henderson
44f1441dbe tcg/i386: Split subroutines out of tcg_expand_vec_op
This routine was becoming too large.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-01-28 07:03:34 -08:00
Richard Henderson
dd0a0fcdd8 tcg: Add opcodes for vector minmax arithmetic
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-01-28 07:03:34 -08:00
Richard Henderson
8afaf05066 tcg: Add opcodes for vector saturated arithmetic
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-01-28 07:03:34 -08:00
Richard Henderson
5d6acdd4a4 tcg: Add write_aofs to GVecGen4
This allows writing 2 output, 3 input operations.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-01-28 07:03:34 -08:00
Richard Henderson
f550805d83 tcg: Add gvec expanders for nand, nor, eqv
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-01-28 07:03:34 -08:00
Richard Henderson
9a9eda78e4 tcg: Add logical simplifications during gvec expand
We handle many of these during integer expansion, and the
rest of them during integer optimization.

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-01-28 07:03:34 -08:00
Paolo Bonzini
7d37435bd5 avoid TABs in files that only contain a few
Most files that have TABs only contain a handful of them.  Change
them to spaces so that we don't confuse people.

disas, standard-headers, linux-headers and libdecnumber are imported
from other projects and probably should be exempted from the check.
Outside those, after this patch the following files still contain both
8-space and TAB sequences at the beginning of the line.  Many of them
have a majority of TABs, or were initially committed with all tabs.

    bsd-user/i386/target_syscall.h
    bsd-user/x86_64/target_syscall.h
    crypto/aes.c
    hw/audio/fmopl.c
    hw/audio/fmopl.h
    hw/block/tc58128.c
    hw/display/cirrus_vga.c
    hw/display/xenfb.c
    hw/dma/etraxfs_dma.c
    hw/intc/sh_intc.c
    hw/misc/mst_fpga.c
    hw/net/pcnet.c
    hw/sh4/sh7750.c
    hw/timer/m48t59.c
    hw/timer/sh_timer.c
    include/crypto/aes.h
    include/disas/bfd.h
    include/hw/sh4/sh.h
    libdecnumber/decNumber.c
    linux-headers/asm-generic/unistd.h
    linux-headers/linux/kvm.h
    linux-user/alpha/target_syscall.h
    linux-user/arm/nwfpe/double_cpdo.c
    linux-user/arm/nwfpe/fpa11_cpdt.c
    linux-user/arm/nwfpe/fpa11_cprt.c
    linux-user/arm/nwfpe/fpa11.h
    linux-user/flat.h
    linux-user/flatload.c
    linux-user/i386/target_syscall.h
    linux-user/ppc/target_syscall.h
    linux-user/sparc/target_syscall.h
    linux-user/syscall.c
    linux-user/syscall_defs.h
    linux-user/x86_64/target_syscall.h
    slirp/cksum.c
    slirp/if.c
    slirp/ip.h
    slirp/ip_icmp.c
    slirp/ip_icmp.h
    slirp/ip_input.c
    slirp/ip_output.c
    slirp/mbuf.c
    slirp/misc.c
    slirp/sbuf.c
    slirp/socket.c
    slirp/socket.h
    slirp/tcp_input.c
    slirp/tcpip.h
    slirp/tcp_output.c
    slirp/tcp_subr.c
    slirp/tcp_timer.c
    slirp/tftp.c
    slirp/udp.c
    slirp/udp.h
    target/cris/cpu.h
    target/cris/mmu.c
    target/cris/op_helper.c
    target/sh4/helper.c
    target/sh4/op_helper.c
    target/sh4/translate.c
    tcg/sparc/tcg-target.inc.c
    tests/tcg/cris/check_addo.c
    tests/tcg/cris/check_moveq.c
    tests/tcg/cris/check_swap.c
    tests/tcg/multiarch/test-mmap.c
    ui/vnc-enc-hextile-template.h
    ui/vnc-enc-zywrle.h
    util/envlist.c
    util/readline.c

The following have only TABs:

    bsd-user/i386/target_signal.h
    bsd-user/sparc64/target_signal.h
    bsd-user/sparc64/target_syscall.h
    bsd-user/sparc/target_signal.h
    bsd-user/sparc/target_syscall.h
    bsd-user/x86_64/target_signal.h
    crypto/desrfb.c
    hw/audio/intel-hda-defs.h
    hw/core/uboot_image.h
    hw/sh4/sh7750_regnames.c
    hw/sh4/sh7750_regs.h
    include/hw/cris/etraxfs_dma.h
    linux-user/alpha/termbits.h
    linux-user/arm/nwfpe/fpopcode.h
    linux-user/arm/nwfpe/fpsr.h
    linux-user/arm/syscall_nr.h
    linux-user/arm/target_signal.h
    linux-user/cris/target_signal.h
    linux-user/i386/target_signal.h
    linux-user/linux_loop.h
    linux-user/m68k/target_signal.h
    linux-user/microblaze/target_signal.h
    linux-user/mips64/target_signal.h
    linux-user/mips/target_signal.h
    linux-user/mips/target_syscall.h
    linux-user/mips/termbits.h
    linux-user/ppc/target_signal.h
    linux-user/sh4/target_signal.h
    linux-user/sh4/termbits.h
    linux-user/sparc64/target_syscall.h
    linux-user/sparc/target_signal.h
    linux-user/x86_64/target_signal.h
    linux-user/x86_64/termbits.h
    pc-bios/optionrom/optionrom.h
    slirp/mbuf.h
    slirp/misc.h
    slirp/sbuf.h
    slirp/tcp.h
    slirp/tcp_timer.h
    slirp/tcp_var.h
    target/i386/svm.h
    target/sparc/asi.h
    target/xtensa/core-dc232b/xtensa-modules.inc.c
    target/xtensa/core-dc233c/xtensa-modules.inc.c
    target/xtensa/core-de212/core-isa.h
    target/xtensa/core-de212/xtensa-modules.inc.c
    target/xtensa/core-fsf/xtensa-modules.inc.c
    target/xtensa/core-sample_controller/core-isa.h
    target/xtensa/core-sample_controller/xtensa-modules.inc.c
    target/xtensa/core-test_kc705_be/core-isa.h
    target/xtensa/core-test_kc705_be/xtensa-modules.inc.c
    tests/tcg/cris/check_abs.c
    tests/tcg/cris/check_addc.c
    tests/tcg/cris/check_addcm.c
    tests/tcg/cris/check_addoq.c
    tests/tcg/cris/check_bound.c
    tests/tcg/cris/check_ftag.c
    tests/tcg/cris/check_int64.c
    tests/tcg/cris/check_lz.c
    tests/tcg/cris/check_openpf5.c
    tests/tcg/cris/check_sigalrm.c
    tests/tcg/cris/crisutils.h
    tests/tcg/cris/sys.c
    tests/tcg/i386/test-i386-ssse3.c
    ui/vgafont.h

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20181213223737.11793-3-pbonzini@redhat.com>
Reviewed-by: Aleksandar Markovic <amarkovic@wavecomp.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
Acked-by: Richard Henderson <richard.henderson@linaro.org>
Acked-by: Eric Blake <eblake@redhat.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Stefan Markovic <smarkovic@wavecomp.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-01-11 15:46:56 +01:00
Paolo Bonzini
eae3eb3e18 qemu/queue.h: simplify reverse access to QTAILQ
The new definition of QTAILQ does not require passing the headname,
remove it.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-01-11 15:46:55 +01:00
Paolo Bonzini
b58deb344d qemu/queue.h: leave head structs anonymous unless necessary
Most list head structs need not be given a name.  In most cases the
name is given just in case one is going to use QTAILQ_LAST, QTAILQ_PREV
or reverse iteration, but this does not apply to lists of other kinds,
and even for QTAILQ in practice this is only rarely needed.  In addition,
we will soon reimplement those macros completely so that they do not
need a name for the head struct.  So clean up everything, not giving a
name except in the rare case where it is necessary.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-01-11 15:46:55 +01:00
Richard Henderson
4250da1092 tcg: Improve call argument loading
Free the argument register only after we have verified that the
temporary is not already in that register.  This case is likely
now that we are back propagating the preferred register.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-12-26 06:58:43 +11:00
Richard Henderson
25f49c5f15 tcg: Record register preferences during liveness
With these preferences, we can arrange for function call arguments to
be computed into the proper registers instead of requiring extra moves.

Reviewed-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-12-26 06:58:35 +11:00
Richard Henderson
ae36a246ed tcg: Add TCG_OPF_BB_EXIT
Use this to notice the opcodes that exit the TB, which implies
that local temps are really dead and need not be synced.

Previously we so marked the true end of the TB, but that was
immediately overwritten by the la_bb_end invoked by any
TCG_OPF_BB_END opcode, like exit_tb.

Reviewed-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-12-26 06:58:20 +11:00
Richard Henderson
f65a061c39 tcg: Split out more subroutines from liveness_pass_1
Reviewed-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-12-26 06:58:17 +11:00
Richard Henderson
2616c80821 tcg: Rename and adjust liveness_pass_1 helpers
No need for a "tcg_" prefix for a static function; we already
have another "la_" prefix for indicating liveness analysis.
Pass in nb_globals and nb_temps, as we will already have them
in registers for other loops within the parent function.

Reviewed-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-12-26 06:58:09 +11:00
Richard Henderson
152c35aab4 tcg: Reindent parts of liveness_pass_1
There are two blocks of the form

    if (foo) {
        stuff1;
        goto bar;
    } else {
    baz:
        stuff2;
    }

which have unnecessary and confusing indentation.
Remove the else and unindent stuff2.

Reviewed-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-12-26 06:57:57 +11:00
Richard Henderson
1894f69a61 tcg: Dump register preference info with liveness
Reviewed-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-12-26 06:57:48 +11:00
Richard Henderson
d62816f2db tcg: Improve register allocation for matching constraints
Try harder to honor the output_pref.  When we're forced to allocate
a second register for the input, it does not need to use the input
constraint; that will be honored by the register we allocate for the
output and a move is already required.

Reviewed-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-12-26 06:57:40 +11:00
Richard Henderson
69e3706d2b tcg: Add output_pref to TCGOp
Allocate storage for, but do not yet fill in, per-opcode
preferences for the output operands.  Pass it in to the
register allocation routines for output operands.

Reviewed-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-12-26 06:57:37 +11:00
Richard Henderson
ba87719cd2 tcg: Add preferred_reg argument to tcg_reg_alloc_do_movi
Pass this through to temp_sync.

Reviewed-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-12-26 06:57:34 +11:00
Richard Henderson
98b4e186c1 tcg: Add preferred_reg argument to temp_sync
Pass this through to tcg_reg_alloc.

Reviewed-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-12-26 06:57:32 +11:00
Richard Henderson
b722452aef tcg: Add preferred_reg argument to temp_load
Pass this through to tcg_reg_alloc.

Reviewed-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-12-26 06:57:29 +11:00
Richard Henderson
b016486e7b tcg: Add preferred_reg argument to tcg_reg_alloc
This new argument will aid register allocation by indicating how
the temporary will be used in future.  If the preference cannot
be satisfied, fall back to the constraints of the current insn.

Short circuit the preference when it cannot be satisfied or if
it does not further constrain the operation.

With an eye toward optimizing function call sequences, optimize
for the preferred_reg set containing a single register.

For the moment, all users pass 0 for preference.

Reviewed-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-12-26 06:57:06 +11:00
Richard Henderson
b4fc67c7af tcg: Add reachable_code_pass
Delete trivially dead code that follows unconditional branches and
noreturn helpers.  These can occur either via optimization or via
the structure of a target's translator following an exception.

Reviewed-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-12-26 06:40:30 +11:00
Richard Henderson
d88a117eaa tcg: Reference count labels
Increment when adding branches, and decrement when removing them.

Reviewed-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-12-26 06:40:27 +11:00
Richard Henderson
15d7409260 tcg: Add TCG_CALL_NO_RETURN
Remember which helpers have been marked noreturn.

Reviewed-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-12-26 06:40:24 +11:00
Richard Henderson
3b50352b05 tcg: Renumber TCG_CALL_* flags
Previously, the low 4 bits were used for TCG_CALL_TYPE_MASK,
which was removed in 6a18ae2d29.

Reviewed-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-12-26 06:40:15 +11:00
Alistair Francis
7a5549f2ae tcg/riscv: Add the target init code
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Signed-off-by: Michael Clark <mjc@sifive.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <dd6e439ab81883974b8fd91f904f6de26ab5d697.1545246859.git.alistair.francis@wdc.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-12-26 06:40:02 +11:00
Alistair Francis
92c041c59b tcg/riscv: Add the prologue generation and register the JIT
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Signed-off-by: Michael Clark <mjc@sifive.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <c4d023127967a0217d8d1eabdf5de6c0e8f8c228.1545246859.git.alistair.francis@wdc.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-12-26 06:40:02 +11:00
Alistair Francis
bdf503819e tcg/riscv: Add the out op decoder
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Signed-off-by: Michael Clark <mjc@sifive.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <7c47f00cb4a9a777120456e0704b4076a5d943ab.1545246859.git.alistair.francis@wdc.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-12-26 06:40:02 +11:00
Alistair Francis
03a7d0213d tcg/riscv: Add direct load and store instructions
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Signed-off-by: Michael Clark <mjc@sifive.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <2e047a95c39c007c66cda024c095e29b0ac4c43e.1545246859.git.alistair.francis@wdc.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-12-26 06:40:02 +11:00
Alistair Francis
efbea94c76 tcg/riscv: Add slowpath load and store instructions
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Signed-off-by: Michael Clark <mjc@sifive.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <1a0a7e8f3347764f212c5efa5c07c9be17efdec6.1545246859.git.alistair.francis@wdc.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-12-26 06:40:02 +11:00
Alistair Francis
15840069e1 tcg/riscv: Add branch and jump instructions
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Signed-off-by: Michael Clark <mjc@sifive.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <c356657e627168d89cb5b012b7e21e4efbbe83f3.1545246859.git.alistair.francis@wdc.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-12-26 06:40:02 +11:00
Alistair Francis
28ca738e9d tcg/riscv: Add the add2 and sub2 instructions
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <5665a57809e32b35775e8e98fdab898853af37b8.1545246859.git.alistair.francis@wdc.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-12-26 06:40:02 +11:00
Alistair Francis
61535d4988 tcg/riscv: Add the out load and store instructions
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Signed-off-by: Michael Clark <mjc@sifive.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <d5d88ff29163788938368bbdbd18815d59cef6a0.1545246859.git.alistair.francis@wdc.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-12-26 06:40:02 +11:00
Alistair Francis
27fd64144b tcg/riscv: Add the extract instructions
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Signed-off-by: Michael Clark <mjc@sifive.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <c4d2afba46efefa9388cf3205fcedbb9a5fa411f.1545246859.git.alistair.francis@wdc.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-12-26 06:40:02 +11:00
Alistair Francis
6cd2eda39f tcg/riscv: Add the mov and movi instruction
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Signed-off-by: Michael Clark <mjc@sifive.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <bd6a45c73a67b77ddaa2fe590a6bb8ee422b9683.1545246859.git.alistair.francis@wdc.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-12-26 06:40:02 +11:00
Alistair Francis
dfa8e74f94 tcg/riscv: Add the relocation functions
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Signed-off-by: Michael Clark <mjc@sifive.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <6ac4f4b0d5ea93cb0ee9a3b8b47ee9f7b3711494.1545246859.git.alistair.francis@wdc.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-12-26 06:40:02 +11:00
Alistair Francis
bedf14e335 tcg/riscv: Add the instruction emitters
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Signed-off-by: Michael Clark <mjc@sifive.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <c740aca183675625bb9cf3ce7b9e8b9d431ca694.1545246859.git.alistair.francis@wdc.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-12-26 06:40:02 +11:00
Alistair Francis
54a9ce0f68 tcg/riscv: Add the immediate encoders
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Signed-off-by: Michael Clark <mjc@sifive.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <d54dc56303fd1b0d7ed53869de2dbb59b111c7ca.1545246859.git.alistair.francis@wdc.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-12-26 06:40:02 +11:00
Alistair Francis
8ce23a1312 tcg/riscv: Add support for the constraints
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Signed-off-by: Michael Clark <mjc@sifive.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <dba7315e4e20e879933f72d47ccf98f1cc612b8a.1545246859.git.alistair.francis@wdc.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-12-26 06:40:02 +11:00
Alistair Francis
505e75c592 tcg/riscv: Add the tcg target registers
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Signed-off-by: Michael Clark <mjc@sifive.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <6e43abaa64361d57b9bc9439820d0e7701f2d47e.1545246859.git.alistair.francis@wdc.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-12-26 06:40:02 +11:00
Alistair Francis
fb1f70f368 tcg/riscv: Add the tcg-target.h file
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Signed-off-by: Michael Clark <mjc@sifive.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <a135ee1a88cd7bd08993a519d4d654da27785254.1545246859.git.alistair.francis@wdc.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-12-26 06:40:02 +11:00
Emilio G. Cota
ac1043f6d6 tcg: Drop nargs from tcg_op_insert_{before,after}
It's unused since 75e8b9b7aa.

Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <20181209193749.12277-9-cota@braap.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-12-17 06:04:44 +03:00
Alistair Francis
161dec9d1b tcg/mips: Improve the add2/sub2 command to use TCG_TARGET_REG_BITS
Instead of hard coding 31 for the shift right use TCG_TARGET_REG_BITS - 1.

Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Message-Id: <7dfbddf7014a595150aa79011ddb342c3cc17ec3.1544648105.git.alistair.francis@wdc.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-12-17 06:04:44 +03:00
Richard Henderson
e1dcf3529d tcg: Add TCG_TARGET_HAS_MEMORY_BSWAP
For now, defined universally as true, since we previously required
backends to implement swapped memory operations.  Future patches
may now remove that support where it is onerous.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-12-17 06:04:44 +03:00
Richard Henderson
6498594c8e tcg/optimize: Optimize bswap
Somehow we forgot these operations, once upon a time.
This will allow immediate stores to have their bswap
optimized away.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-12-17 06:04:44 +03:00
Richard Henderson
9e821eab0a tcg: Clean up generic bswap64
Based on the only current user, Sparc:

New code uses 2 constants that take 2 insns to load from constant pool,
plus 13.  Old code used 6 constants that took 1 or 2 insns to create,
plus 21.  The result is a new total of 17 vs an old total of 29.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-12-17 06:04:44 +03:00
Richard Henderson
a686dc71d8 tcg: Clean up generic bswap32
Based on the only current user, Sparc:

New code uses 1 constant that takes 2 insns to create, plus 8.
Old code used 2 constants that took 2 insns to create, plus 9.
The result is a new total of 10 vs an old total of 13.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-12-17 06:04:44 +03:00
Richard Henderson
5785c17f31 tcg/i386: Add setup_guest_base_seg for FreeBSD
Reviewed-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-12-17 06:04:44 +03:00
Richard Henderson
913c2bddc2 tcg/i386: Precompute all guest_base parameters
These values are constant between all qemu_ld/st invocations;
there is no need to figure this out each time.  If we cannot
use a segment or an offset directly for guest_base, load the
value into a register in the prologue.

Reviewed-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-12-17 06:04:44 +03:00
Richard Henderson
4810d96f03 tcg/i386: Assume 32-bit values are zero-extended
We now have an invariant that all TCG_TYPE_I32 values are
zero-extended, which means that we do not need to extend
them again during qemu_ld/st, either explicitly via a separate
tcg_out_ext32u or implicitly via P_ADDR32.

Reviewed-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-12-17 06:04:44 +03:00
Richard Henderson
75478279a0 tcg/i386: Implement INDEX_op_extr{lh}_i64_i32 for 32-bit guests
This preserves the invariant that all TCG_TYPE_I32 values are
zero-extended in the 64-bit host register.

Reviewed-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-12-17 06:04:44 +03:00
Richard Henderson
3dbc8c61de tcg/i386: Propagate is64 to tcg_out_qemu_ld_slow_path
This helps preserve the invariant that all TCG_TYPE_I32 values
are stored zero-extended in the 64-bit host registers.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-12-17 06:04:44 +03:00
Richard Henderson
1d21d95b61 tcg/i386: Propagate is64 to tcg_out_qemu_ld_direct
This helps preserve the invariant that all TCG_TYPE_I32 values
are stored zero-extended in the 64-bit host registers.

Reviewed-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-12-17 06:04:43 +03:00
Richard Henderson
55dfd8fedc tcg/s390x: Return false on failure from patch_reloc
This does require an extra two checks within the slow paths
to replace the assert that we're moving.  Also add two checks
within existing functions that lacked any kind of assert for
out of range branch.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-12-17 06:04:43 +03:00
Richard Henderson
d513290351 tcg/ppc: Return false on failure from patch_reloc
The reloc_pc{14,24}_val routines retain their asserts.
Use these directly within the slow paths.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-12-17 06:04:43 +03:00
Richard Henderson
43fabd30e2 tcg/arm: Return false on failure from patch_reloc
This does require an extra two checks within the slow paths
to replace the assert that we're moving.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-12-17 06:04:43 +03:00
Richard Henderson
214bfe83d5 tcg/aarch64: Return false on failure from patch_reloc
This does require an extra two checks within the slow paths
to replace the assert that we're moving.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-12-17 06:04:43 +03:00
Richard Henderson
bec3afd5fc tcg/i386: Return false on failure from patch_reloc
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-12-17 06:04:43 +03:00
Richard Henderson
6ac1778676 tcg: Return success from patch_reloc
This will move the assert for success from within (subroutines of)
patch_reloc into the callers.  It will also let new code do something
different when a relocation is out of range.

For the moment, all backends are trivially converted to return true.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-12-17 06:04:43 +03:00
Richard Henderson
8c1b079279 tcg/mips: Remove retranslation code
There is no longer a need for preserving branch offset operands,
as we no longer re-translate.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-12-17 06:04:43 +03:00
Richard Henderson
791645f022 tcg/sparc: Remove retranslation code
There is no longer a need for preserving branch offset operands,
as we no longer re-translate.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-12-17 06:04:43 +03:00
Richard Henderson
3661612fc3 tcg/s390: Remove retranslation code
There is no longer a need for preserving branch offset operands,
as we no longer re-translate.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-12-17 06:04:43 +03:00
Richard Henderson
f9c7246faa tcg/ppc: Fold away "noaddr" branch routines
There is no longer a need for preserving branch offset operands,
as we no longer re-translate.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-12-17 06:04:43 +03:00
Richard Henderson
37ee93a974 tcg/arm: Fold away "noaddr" branch routines
There are one use apiece for these.  There is no longer a need for
preserving branch offset operands, as we no longer re-translate.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-12-17 06:04:43 +03:00
Richard Henderson
2672ccc7ee tcg/arm: Remove reloc_pc24_atomic
It is unused since 3fb53fb4d1.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-12-17 06:04:43 +03:00
Richard Henderson
733589b338 tcg/aarch64: Fold away "noaddr" branch routines
There are one use apiece for these.  There is no longer a need for
preserving branch offset operands, as we no longer re-translate.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-12-17 06:04:43 +03:00
Richard Henderson
90d6cb7811 tcg/aarch64: Remove reloc_pc26_atomic
It is unused since b68686bd4b.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-12-17 06:04:43 +03:00
Richard Henderson
66c0285df4 tcg/i386: Move TCG_REG_CALL_STACK from define to enum
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-12-17 06:04:43 +03:00
Richard Henderson
5740d9f714 tcg/i386: Always use %ebp for TCG_AREG0
For x86_64, this can remove a REX prefix resulting in smaller code
when manipulating globals of type i32, as we move them between backing
store via cpu_env, aka TCG_AREG0.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-12-17 06:04:43 +03:00
Richard Henderson
f6823cbe37 target/sparc: Remove the constant pool
Partially reverts ab20bdc116.  The 14-bit displacement that we
allowed to reach the constant pool is not always sufficient.
Retain the tb-relative addressing, as that is how most return
values from the tb are computed.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-12-17 06:04:42 +03:00
Thomas Huth
6fa2cef205 tcg/tcg.h: Remove GCC check for tcg_debug_assert() macro
Both GCC v4.8 and Clang v3.4 (our minimum versions) support
__builtin_unreachable(), so we can remove the version check here now.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Thomas Huth <thuth@redhat.com>
2018-12-12 10:01:12 +01:00
Peter Maydell
a7ce790a02 tcg/tcg-op.h: Add multiple include guard
The tcg-op.h header was missing the usual guard against multiple
inclusion; add it.

(Spotted by lgtm.com's static analyzer.)

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20181108125256.30986-1-peter.maydell@linaro.org
2018-11-08 15:15:32 +00:00
Richard Henderson
e6cd4bb59b tcg: Split CONFIG_ATOMIC128
GCC7+ will no longer advertise support for 16-byte __atomic operations
if only cmpxchg is supported, as for x86_64.  Fortunately, x86_64 still
has support for __sync_compare_and_swap_16 and we can make use of that.
AArch64 does not have, nor ever has had such support, so open-code it.

Reviewed-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-10-18 19:46:36 -07:00
Emilio G. Cota
72fd2efbbd tcg: distribute tcg_time into TCG contexts
When we implemented per-vCPU TCG contexts, we forgot to also
distribute the tcg_time counter, which has remained as a global
accessed without any serialization, leading to potentially missed
counts.

Fix it by distributing the field over the TCG contexts, embedding
it into TCGProfile with a field called "cpu_exec_time", which is more
descriptive than "tcg_time". Add a function to query this value
directly, and for completeness, fill in the field in
tcg_profile_snapshot, even though its callers do not use it.

Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <20181010144853.13005-5-cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-10-18 18:58:10 -07:00
Emilio G. Cota
dd1d7da23b tcg: plug holes in struct TCGProfile
This plugs two 4-byte holes in 64-bit.

Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <20181010144853.13005-4-cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-10-18 18:58:10 -07:00
Emilio G. Cota
c1f543b739 tcg: fix use of uninitialized variable under CONFIG_PROFILER
We forgot to initialize n in commit 15fa08f845 ("tcg: Dynamically
allocate TCGOps", 2017-12-29).

Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <20181010144853.13005-3-cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-10-18 18:58:10 -07:00
Richard Henderson
d7f425fdea tcg: Implement CPU_LOG_TB_NOCHAIN during expansion
Rather than test NOCHAIN before linking, do not emit the
goto_tb opcode at all.  We already do this for goto_ptr.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-10-18 18:58:10 -07:00
Roman Kapl
93bf9a4273 tcg/i386: fix vector operations on 32-bit hosts
The TCG backend uses LOWREGMASK to get the low 3 bits of register numbers.
This was defined as no-op for 32-bit x86, with the assumption that we have
eight registers anyway. This assumption is not true once we have xmm regs.

Since LOWREGMASK was a no-op, xmm register indidices were wrong in opcodes
and have overflown into other opcode fields, wreaking havoc.

To trigger these problems, you can try running the "movi d8, #0x0" AArch64
instruction on 32-bit x86. "vpxor %xmm0, %xmm0, %xmm0" should be generated,
but instead TCG generated "vpxor %xmm0, %xmm0, %xmm2".

Fixes: 770c2fc7bb ("Add vector operations")
Signed-off-by: Roman Kapl <rka@sysgo.com>
Message-Id: <20180824131734.18557-1-rka@sysgo.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-09-26 09:02:51 -07:00
Richard Henderson
1fb57da72a tcg/optimize: Do not skip default processing of dup_vec
If we do not opimize away dup_vec, we must mark its output as changed.

Fixes: 170ba88f45
Reported-by: Laurent Desnogues <laurent.desnogues@gmail.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Laurent Desnogues <laurent.desnogues@gmail.com>
Tested-by: Laurent Desnogues <laurent.desnogues@gmail.com>
Message-id: 20180805233258.31892-1-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2018-08-06 14:57:48 +01:00
Richard Henderson
672189cd58 tcg/i386: Mark xmm registers call-clobbered
When host vector registers and operations were introduced, I failed
to mark the registers call clobbered as required by the ABI.

Fixes: 770c2fc7bb
Cc: qemu-stable@nongnu.org
Reported-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-07-23 09:21:14 -07:00
Alex Bennée
e65a5f227d tcg/aarch64: limit mul_vec size
In AdvSIMD we can only do 32x32 integer multiples although SVE is
capable of larger 64 bit multiples. As a result we can end up
generating invalid opcodes. Fix this by only reprting we can emit
mul vector ops if the size is small enough.

Fixes a crash on:

  sve-all-short-v8.3+sve@vq3/insn_mul_z_zi___INC.risu.bin

When running on AArch64 hardware.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20180719154248.29669-1-alex.bennee@linaro.org>
[rth: Removed the tcg_debug_assert -- there are plenty of other
cases that we do not diagnose within the insn encoding helpers.]
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-07-19 09:07:31 -07:00
Richard Henderson
499748d768 tcg: Restrict check_size_impl to multiples of the line size
Normally this is automatic in the size restrictions that are placed
on vector sizes coming from the implementation.  However, for the
legitimate size tuple [oprsz=8, maxsz=32], we need to clear the final
24 bytes of the vector register.  Without this check, do_dup selects
TCG_TYPE_V128 and clears only 16 bytes.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Message-id: 20180705191929.30773-2-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2018-07-09 14:51:34 +01:00
Richard Henderson
9f75462065 tcg: Reduce max TB opcode count
Also, assert that we don't overflow any of two different offsets into
the TB. Both unwind and goto_tb both record a uint16_t for later use.

This fixes an arm-softmmu test case utilizing NEON in which there is
a TB generated that runs to 7800 opcodes, and compiles to 96k on an
x86_64 host.  This overflows the 16-bit offset in which we record the
goto_tb reset offset.  Because of that overflow, we install a jump
destination that goes to neverland.  Boom.

With this reduced op count, the same TB compiles to about 48k for
aarch64, ppc64le, and x86_64 hosts, and neither assertion fires.

Cc: qemu-stable@nongnu.org
Reported-by: "Jason A. Donenfeld" <Jason@zx2c4.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-06-15 09:39:53 -10:00
Emilio G. Cota
0ac20318ce tcg: remove tb_lock
Use mmap_lock in user-mode to protect TCG state and the page descriptors.
In !user-mode, each vCPU has its own TCG state, so no locks needed.
Per-page locks are used to protect the page descriptors.

Per-TB locks are used in both modes to protect TB jumps.

Some notes:

- tb_lock is removed from notdirty_mem_write by passing a
  locked page_collection to tb_invalidate_phys_page_fast.

- tcg_tb_lookup/remove/insert/etc have their own internal lock(s),
  so there is no need to further serialize access to them.

- do_tb_flush is run in a safe async context, meaning no other
  vCPU threads are running. Therefore acquiring mmap_lock there
  is just to please tools such as thread sanitizer.

- Not visible in the diff, but tb_invalidate_phys_page already
  has an assert_memory_lock.

- cpu_io_recompile is !user-only, so no mmap_lock there.

- Added mmap_unlock()'s before all siglongjmp's that could
  be called in user-mode while mmap_lock is held.
  + Added an assert for !have_mmap_lock() after returning from
    the longjmp in cpu_exec, just like we do in cpu_exec_step_atomic.

Performance numbers before/after:

Host: AMD Opteron(tm) Processor 6376

                 ubuntu 17.04 ppc64 bootup+shutdown time

  700 +-+--+----+------+------------+-----------+------------*--+-+
      |    +    +      +            +           +           *B    |
      |         before ***B***                            ** *    |
      |tb lock removal ###D###                         ***        |
  600 +-+                                           ***         +-+
      |                                           **         #    |
      |                                        *B*          #D    |
      |                                     *** *         ##      |
  500 +-+                                ***           ###      +-+
      |                             * ***           ###           |
      |                            *B*          # ##              |
      |                          ** *          #D#                |
  400 +-+                      **            ##                 +-+
      |                      **           ###                     |
      |                    **           ##                        |
      |                  **         # ##                          |
  300 +-+  *           B*          #D#                          +-+
      |    B         ***        ###                               |
      |    *       **       ####                                  |
      |     *   ***      ###                                      |
  200 +-+   B  *B     #D#                                       +-+
      |     #B* *   ## #                                          |
      |     #*    ##                                              |
      |    + D##D#     +            +           +            +    |
  100 +-+--+----+------+------------+-----------+------------+--+-+
           1    8      16      Guest CPUs       48           64
  png: https://imgur.com/HwmBHXe

              debian jessie aarch64 bootup+shutdown time

  90 +-+--+-----+-----+------------+------------+------------+--+-+
     |    +     +     +            +            +            +    |
     |         before ***B***                                B    |
  80 +tb lock removal ###D###                              **D  +-+
     |                                                   **###    |
     |                                                 **##       |
  70 +-+                                             ** #       +-+
     |                                             ** ##          |
     |                                           **  #            |
  60 +-+                                       *B  ##           +-+
     |                                       **  ##               |
     |                                    ***  #D                 |
  50 +-+                               ***   ##                 +-+
     |                             * **   ###                     |
     |                           **B*  ###                        |
  40 +-+                     ****  # ##                         +-+
     |                   ****     #D#                             |
     |             ***B**      ###                                |
  30 +-+    B***B**        ####                                 +-+
     |    B *   *     # ###                                       |
     |     B       ###D#                                          |
  20 +-+   D  ##D##                                             +-+
     |      D#                                                    |
     |    +     +     +            +            +            +    |
  10 +-+--+-----+-----+------------+------------+------------+--+-+
          1     8     16      Guest CPUs        48           64
  png: https://imgur.com/iGpGFtv

The gains are high for 4-8 CPUs. Beyond that point, however, unrelated
lock contention significantly hurts scalability.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-06-15 08:18:48 -10:00
Emilio G. Cota
128ed2278c tcg: move tb_ctx.tb_phys_invalidate_count to tcg_ctx
Thereby making it per-TCGContext. Once we remove tb_lock, this will
avoid an atomic increment every time a TB is invalidated.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-06-15 07:42:55 -10:00
Emilio G. Cota
be2cdc5e35 tcg: track TBs with per-region BST's
This paves the way for enabling scalable parallel generation of TCG code.

Instead of tracking TBs with a single binary search tree (BST), use a
BST for each TCG region, protecting it with a lock. This is as scalable
as it gets, since each TCG thread operates on a separate region.

The core of this change is the introduction of struct tcg_region_tree,
which contains a pointer to a GTree and an associated lock to serialize
accesses to it. We then allocate an array of tcg_region_tree's, adding
the appropriate padding to avoid false sharing based on
qemu_dcache_linesize.

Given a tc_ptr, we first find the corresponding region_tree. This
is done by special-casing the first and last regions first, since they
might be of size != region.size; otherwise we just divide the offset
by region.stride. I was worried about this division (several dozen
cycles of latency), but profiling shows that this is not a fast path.
Note that region.stride is not required to be a power of two; it
is only required to be a multiple of the host's page size.

Note that with this design we can also provide consistent snapshots
about all region trees at once; for instance, tcg_tb_foreach
acquires/releases all region_tree locks before/after iterating over them.
For this reason we now drop tb_lock in dump_exec_info().

As an alternative I considered implementing a concurrent BST, but this
can be tricky to get right, offers no consistent snapshots of the BST,
and performance and scalability-wise I don't think it could ever beat
having separate GTrees, given that our workload is insert-mostly (all
concurrent BST designs I've seen focus, understandably, on making
lookups fast, which comes at the expense of convoluted, non-wait-free
insertions/removals).

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-06-15 07:42:55 -10:00
John Arbuckle
1019242af1 tcg/i386: Use byte form of xgetbv instruction
The assembler in most versions of Mac OS X is pretty old and does not
support the xgetbv instruction.  To go around this problem, the raw
encoding of the instruction is used instead.

Signed-off-by: John Arbuckle <programmingkidx@gmail.com>
Message-Id: <20180604215102.11002-1-programmingkidx@gmail.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-06-15 07:42:55 -10:00
Peter Maydell
163670542f tcg-next queue
-----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJbEdLqAAoJEGTfOOivfiFfQaEH/Rq96S5bo94495KmRJY9e/jw
 lV321YYI7nx7sHtViG/B3iTkvnxzZPWcc7XbBMxyV5xmMQ/5zjS/ynZPFyy/cYRn
 zLM4W0SJ38EqhHTZpkkvw9Nle8UbNWKm5PgND2TyE4hmeuQ98OrQ6Y1GvP4MFpXs
 uQErbmMjYHMq7thbfCO6ulJjjEliRy3AJ2C3fCCCUgBQrJt6JeqbGr/Zzi2y88M9
 IhoK8RbJiWT2O5Tl95q2NOQvr11WbFlu/K0nuaVgbfTwd2tp3ygmRKPpeZ24qA52
 qtwgcIjWHHkkC5s1qaP8oW4FtoMQZdsaOwSOPw0ZBnG+VA7P/h33fWr9f5SistA=
 =UVdE
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/rth/tags/tcg-next-pull-request' into staging

tcg-next queue

# gpg: Signature made Sat 02 Jun 2018 00:12:42 BST
# gpg:                using RSA key 64DF38E8AF7E215F
# gpg: Good signature from "Richard Henderson <richard.henderson@linaro.org>"
# Primary key fingerprint: 7A48 1E78 868B 4DB6 A85A  05C0 64DF 38E8 AF7E 215F

* remotes/rth/tags/tcg-next-pull-request:
  tcg: Pass tb and index to tcg_gen_exit_tb separately

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2018-06-04 11:28:31 +01:00
Richard Henderson
07ea28b418 tcg: Pass tb and index to tcg_gen_exit_tb separately
Do the cast to uintptr_t within the helper, so that the compiler
can type check the pointer argument.  We can also do some more
sanity checking of the index argument.

Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-06-01 15:15:27 -07:00
Philippe Mathieu-Daudé
23c11b04dc target: Do not include "exec/exec-all.h" if it is not necessary
Code change produced with:
    $ git grep '#include "exec/exec-all.h"' | \
      cut -d: -f-1 | \
      xargs egrep -L "(cpu_address_space_init|cpu_loop_|tlb_|tb_|GETPC|singlestep|TranslationBlock)" | \
      xargs sed -i.bak '/#include "exec\/exec-all.h"/d'

Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20180528232719.4721-10-f4bug@amsat.org>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-06-01 14:15:10 +02:00
Emilio G. Cota
1d34982155 tcg: fix s/compliment/complement/ typos
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2018-05-20 08:25:23 +03:00
Peter Maydell
f5583c527f target-arm queue:
* hw/arm/iotkit.c: fix minor memory leak
  * softfloat: fix wrong-exception-flags bug for multiply-add corner case
  * arm: isolate and clean up DTB generation
  * implement Arm v8.1-Atomics extension
  * Fix some bugs and missing instructions in the v8.2-FP16 extension
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABCAAGBQJa9IUCAAoJEDwlJe0UNgzeEGMQAKKjVRzZ7MBgvxQj0FJSWhSP
 BZkATf3ktid255PRpIssBZiY9oM+uY6n+/IRozAGvfDBp9eQOkrZczZjfW5hpe0B
 YsQadtk5cUOXqQzRTegSMPOoMmz8f5GaGOk4R6AEXJEX+Rug/zbOn9Q8Yx7JTd7o
 yBvU1+fys3galSiB88cffA95B9fwGfLsM7rP6OC4yNdUBYwjHf3wtY53WsxtWqX9
 oX4keEiROQkrOfbSy9wYPZzu/0iRo8v35+7wIZhvNSlf02k6yJ7a+w0C4EQIRhWm
 5zciE+aMYr7nOGpj7AEJLrRekhwnD6Ppje6aUd15yrxfNRZkpk/FeECWnaOPDis7
 QNijx5Zqg6+GyItQKi5U4vFVReMj09OB7xDyAq77xDeBj4l3lg2DNkRfRhqQZAcv
 2r4EW+pfLNj76Ah1qtQ410fprw462Sopb6bHmeuFbf1QFbQvJ4CL1+7Jl3ExrDX4
 2+iQb4sQghWDxhDLfRSLxQ7K+bX+mNfGdFW8h+jPShD/+JY42dTKkFZEl4ghNgMD
 mpj8FrQuIkSMqnDmPfoTG5MVTMERacqPU7GGM7/fxudIkByO3zTiLxJ/E+Iy8HvX
 29xKoOBjKT5FJrwJABsN6VpA3EuyAARgQIZ/dd6N5GZdgn2KAIHuaI+RHFOesKFd
 dJGM6sdksnsAAz28aUEJ
 =uXY+
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/pmaydell/tags/pull-target-arm-20180510' into staging

target-arm queue:
 * hw/arm/iotkit.c: fix minor memory leak
 * softfloat: fix wrong-exception-flags bug for multiply-add corner case
 * arm: isolate and clean up DTB generation
 * implement Arm v8.1-Atomics extension
 * Fix some bugs and missing instructions in the v8.2-FP16 extension

# gpg: Signature made Thu 10 May 2018 18:44:34 BST
# gpg:                using RSA key 3C2525ED14360CDE
# gpg: Good signature from "Peter Maydell <peter.maydell@linaro.org>"
# gpg:                 aka "Peter Maydell <pmaydell@gmail.com>"
# gpg:                 aka "Peter Maydell <pmaydell@chiark.greenend.org.uk>"
# Primary key fingerprint: E1A5 C593 CD41 9DE2 8E83  15CF 3C25 25ED 1436 0CDE

* remotes/pmaydell/tags/pull-target-arm-20180510: (21 commits)
  target/arm: Clear SVE high bits for FMOV
  target/arm: Fix float16 to/from int16
  target/arm: Implement vector shifted FCVT for fp16
  target/arm: Implement vector shifted SCVF/UCVF for fp16
  target/arm: Enable ARM_FEATURE_V8_ATOMICS for user-only
  target/arm: Implement CAS and CASP
  target/arm: Fill in disas_ldst_atomic
  target/arm: Introduce ARM_FEATURE_V8_ATOMICS and initial decode
  target/riscv: Use new atomic min/max expanders
  tcg: Use GEN_ATOMIC_HELPER_FN for opposite endian atomic add
  tcg: Introduce atomic helpers for integer min/max
  target/xtensa: Use new min/max expanders
  target/arm: Use new min/max expanders
  tcg: Introduce helpers for integer min/max
  atomic.h: Work around gcc spurious "unused value" warning
  make sure that we aren't overwriting mc->get_hotplug_handler by accident
  arm/boot: split load_dtb() from arm_load_kernel()
  platform-bus-device: use device plug callback instead of machine_done notifier
  pc: simplify MachineClass::get_hotplug_handler handling
  softfloat: Handle default NaN mode after pickNaNMulAdd, not before
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

# Conflicts:
#	target/riscv/translate.c
2018-05-11 17:41:54 +01:00
Richard Henderson
5507c2bf35 tcg: Introduce atomic helpers for integer min/max
Given that this atomic operation will be used by both risc-v
and aarch64, let's not duplicate code across the two targets.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20180508151437.4232-5-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2018-05-10 18:10:57 +01:00
Richard Henderson
b87fb8cd9f tcg: Introduce helpers for integer min/max
These operations are re-invented by several targets so far.
Several supported hosts have insns for these, so place the
expanders out-of-line for a future introduction of tcg opcodes.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20180508151437.4232-2-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2018-05-10 18:10:57 +01:00
Richard Henderson
abebf92597 tcg: Limit the number of ops in a TB
In 6001f7729e we partially attempt to address the branch
displacement overflow caused by 15fa08f845.

However, gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqtbX.c
is a testcase that contains a TB so large as to overflow anyway.
The limit here of 8000 ops produces a maximum output TB size of
24112 bytes on a ppc64le host with that test case.  This is still
much less than the maximum forward branch distance of 32764 bytes.

Cc: qemu-stable@nongnu.org
Fixes: 15fa08f845 ("tcg: Dynamically allocate TCGOps")
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-05-09 08:30:57 -07:00
Peter Maydell
7eb30ef0ba tcg/i386: Fix dup_vec in non-AVX2 codepath
The VPUNPCKLD* instructions are all "non-destructive source",
indicated by "NDS" in the encoding string in the x86 ISA manual.
This means that they take two source operands, one of which is
encoded in the VEX.vvvv field. We were incorrectly treating them
as if they were destructive-source and passing 0 as the 'v'
argument of tcg_out_vex_modrm(). This meant we were always
using %xmm0 as one of the source operands, causing incorrect
results if the register allocator happened to want to use
something else. For instance the input AArch64 insn:
 DUP v26.16b, w21
which becomes TCG IR ops:
 dup_vec v128,e8,tmp2,x21
 st_vec v128,e8,tmp2,env,$0xa40
was assembled to:
0x607c568c:  c4 c1 7a 7e 86 e8 00 00  vmovq    0xe8(%r14), %xmm0
0x607c5694:  00
0x607c5695:  c5 f9 60 c8              vpunpcklbw %xmm0, %xmm0, %xmm1
0x607c5699:  c5 f9 61 c9              vpunpcklwd %xmm1, %xmm0, %xmm1
0x607c569d:  c5 f9 70 c9 00           vpshufd  $0, %xmm1, %xmm1
0x607c56a2:  c4 c1 7a 7f 8e 40 0a 00  vmovdqu  %xmm1, 0xa40(%r14)
0x607c56aa:  00

when the vpunpcklwd insn should be "%xmm1, %xmm1, %xmm1".
This resulted in our incorrectly setting the output vector to
q26=0000320000003200:0000320000003200
when given an input of x21 == 0000000002803200
rather than the expected all-zeroes.

Pass the correct source register number to tcg_out_vex_modrm()
for these insns.

Fixes: 770c2fc7bb
Cc: qemu-stable@nongnu.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20180504153431.5169-1-peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-05-09 08:30:57 -07:00
Laurent Vivier
6001f7729e tcg: workaround branch instruction overflow in tcg_out_qemu_ld/st
ppc64 uses a BC instruction to call the tcg_out_qemu_ld/st
slow path. BC instruction uses a relative address encoded
on 14 bits.

The slow path functions are added at the end of the generated
instructions buffer, in the reverse order of the callers.
So more we have slow path functions more the distance between
the caller (BC) and the function increases.

This patch changes the behavior to generate the functions in
the same order of the callers.

Cc: qemu-stable@nongnu.org
Fixes: 15fa08f845 ("tcg: Dynamically allocate TCGOps")
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Message-Id: <20180429235840.16659-1-lvivier@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-05-01 11:56:55 -07:00
Richard Henderson
5bfa803448 tcg: Improve TCGv_ptr support
Drop TCGV_PTR_TO_NAT and TCGV_NAT_TO_PTR internal macros.

Add tcg_temp_local_new_ptr, tcg_gen_brcondi_ptr, tcg_gen_ext_i32_ptr,
tcg_gen_trunc_i64_ptr, tcg_gen_extu_ptr_i64, tcg_gen_trunc_ptr_i32.

Use inlines instead of macros where possible.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-05-01 11:56:16 -07:00
Richard Henderson
9a938d86b0 tcg: Allow wider vectors for cmp and mul
In db432672, we allow wide inputs for operations such as add.
However, in 212be173 and 3774030a we didn't do the same for
compare and multiply.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-05-01 11:56:16 -07:00
Henry Wertz
3f814b8037 tcg/arm: Fix memory barrier encoding
I found with qemu 2.11.x or newer that I would get an illegal instruction
error running some Intel binaries on my ARM chromebook.  On investigation,
I found it was quitting on memory barriers.

qemu instruction:
mb $0x31
was translating as:
0x604050cc:  5bf07ff5  blpl     #0x600250a8

After patch it gives:
0x604050cc:  f57ff05b  dmb      ish

In short, I found INSN_DMB_ISH (memory barrier for ARMv7) appeared to be
correct based on online docs, but due to some endian-related shenanigans it
had to be byte-swapped to suit qemu; it appears INSN_DMB_MCR (memory
barrier for ARMv6) also should be byte swapped  (and this patch does so).
I have not checked for correctness of aarch64's barrier instruction.

Cc: qemu-stable@nongnu.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Henry Wertz <hwertz10@gmail.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-05-01 11:56:07 -07:00
Richard Henderson
d103021269 tcg: Document INDEX_mul[us]h_*
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-05-01 11:41:56 -07:00
Peter Maydell
161dfd1e7f tcg/mips: Handle large offsets from target env to tlb_table
The MIPS TCG target makes the assumption that the offset from the
target env pointer to the tlb_table is less than about 64K. This
used to be true, but gradual addition of features to the Arm
target means that it's no longer true there. This results in
the build-time assertion failing:

In file included from /home/pm215/qemu/include/qemu/osdep.h:36:0,
                 from /home/pm215/qemu/tcg/tcg.c:28:
/home/pm215/qemu/tcg/mips/tcg-target.inc.c: In function ‘tcg_out_tlb_load’:
/home/pm215/qemu/include/qemu/compiler.h:90:36: error: static assertion failed: "not expecting: offsetof(CPUArchState, tlb_table[NB_MMU_MODES - 1][1]) > 0x7ff0 + 0x7fff"
 #define QEMU_BUILD_BUG_MSG(x, msg) _Static_assert(!(x), msg)
                                    ^
/home/pm215/qemu/include/qemu/compiler.h:98:30: note: in expansion of macro ‘QEMU_BUILD_BUG_MSG’
 #define QEMU_BUILD_BUG_ON(x) QEMU_BUILD_BUG_MSG(x, "not expecting: " #x)
                              ^
/home/pm215/qemu/tcg/mips/tcg-target.inc.c:1236:9: note: in expansion of macro ‘QEMU_BUILD_BUG_ON’
         QEMU_BUILD_BUG_ON(offsetof(CPUArchState,
         ^
/home/pm215/qemu/rules.mak:66: recipe for target 'tcg/tcg.o' failed

An ideal long term approach would be to rearrange the CPU state
so that the tlb_table was not so far along it, but this is tricky
because it would move it from the "not cleared on CPU reset" part
of the struct to the "cleared on CPU reset" part. As a simple fix
for the 2.12 release, make the MIPS TCG target handle an arbitrary
offset by emitting more add instructions. This will mean an extra
instruction in the fastpath for TCG loads and stores for the
affected guests (currently just aarch64-softmmu).

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Message-id: 20180413142336.32163-1-peter.maydell@linaro.org
2018-04-16 11:51:37 +01:00
Richard Henderson
9743cd5736 tcg: Introduce tcg_set_insn_start_param
The parameters for tcg_gen_insn_start are target_ulong, which may be split
into two TCGArg parameters for storage in the opcode on 32-bit hosts.

Fixes the ARM target and its direct use of tcg_set_insn_param, which would
set the wrong argument in the 64-on-32 case.

Cc: qemu-stable@nongnu.org
Reported-by: alarson@ddci.com
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20180410003558.2470-1-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2018-04-10 13:02:26 +01:00
Richard Henderson
f2f1dde751 tcg: Mark muluh_i64 and mulsh_i64 as 64-bit ops
Failure to do so results in the tcg optimizer sign-extending
any constant fold from 32-bits.  This turns out to be visible
in the RISC-V testsuite using a host that emits these opcodes
(e.g. any non-x86_64).

Reported-by: Michael Clark <mjc@sifive.com>
Reviewed-by: Emilio G. Cota <cota@braap.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-03-28 12:45:16 +08:00
Richard Henderson
adb196cbd5 tcg: Add choose_vector_size
This unifies 5 copies of checks for supported vector size,
and in the process fixes a missing check in tcg_gen_gvec_2s.

This lead to an assertion failure for 64-bit vector multiply,
which is not available in the AVX instruction set.

Suggested-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-03-16 00:55:04 +08:00
Richard Henderson
7f34ed4bcd tcg/i386: Support INDEX_op_dup2_vec for -m32
Unknown why -m32 was passing with gcc but not clang; it should have
failed for both.  This would be used for tcg_gen_dup_i64_vec, and
visible with the right TB and an aarch64 guest.

Reported-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-03-16 00:55:04 +08:00
Richard Henderson
b2e3ae9452 tcg: Improve tcg_gen_muli_i32/i64
Convert multiplication by power of two to left shift.

Reviewed-by: Emilio G. Cota <cota@braap.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-03-16 00:55:04 +08:00
Richard Henderson
14e4c1e235 tcg/aarch64: Add vector operations
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-02-08 15:54:08 +00:00
Richard Henderson
770c2fc7bb tcg/i386: Add vector operations
The x86 vector instruction set is extremely irregular.  With newer
editions, Intel has filled in some of the blanks.  However, we don't
get many 64-bit operations until SSE4.2, introduced in 2009.

The subsequent edition was for AVX1, introduced in 2011, which added
three-operand addressing, and adjusts how all instructions should be
encoded.

Given the relatively narrow 2 year window between possible to support
and desirable to support, and to vastly simplify code maintainence,
I am only planning to support AVX1 and later cpus.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-02-08 15:54:08 +00:00
Richard Henderson
170ba88f45 tcg/optimize: Handle vector opcodes during optimize
Trivial move and constant propagation.  Some identity and constant
function folding, but nothing that requires knowledge of the size
of the vector element.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-02-08 15:54:06 +00:00
Richard Henderson
22fc352703 tcg: Add generic vector helpers with a scalar operand
Use dup to convert a non-constant scalar to a third vector.

Add addition, multiplication, and logical operations with an immediate.
Add addition, subtraction, multiplication, and logical operations with
a non-constant scalar.  Allow for the front-end to build operations in
which the scalar operand comes first.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-02-08 15:54:06 +00:00
Richard Henderson
f49b12c6e6 tcg: Add generic helpers for saturating arithmetic
No vector ops as yet.  SSE only has direct support for 8- and 16-bit
saturation; handling 32- and 64-bit saturation is much more expensive.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-02-08 15:54:06 +00:00
Richard Henderson
3774030a3e tcg: Add generic vector ops for multiplication
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-02-08 15:54:06 +00:00
Richard Henderson
212be173f0 tcg: Add generic vector ops for comparisons
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-02-08 15:54:05 +00:00
Richard Henderson
d0ec97967f tcg: Add generic vector ops for constant shifts
Opcodes are added for scalar and vector shifts, but considering the
varied semantics of these do not expose them to the front ends.  Do
go ahead and provide them in case they are needed for backend expansion.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-02-08 15:54:05 +00:00
Richard Henderson
db432672dc tcg: Add generic vector expanders
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-02-08 15:54:05 +00:00
Richard Henderson
474b2e8f0f tcg: Standardize integral arguments to expanders
Some functions use intN_t arguments, some use uintN_t, some just
used "unsigned".  To aid putting function pointers in tables, we
need consistency.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-02-08 15:54:05 +00:00
Richard Henderson
d2fd745fe8 tcg: Add types and basic operations for host vectors
Nothing uses or enables them yet.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-02-08 15:54:04 +00:00
Richard Henderson
da73a4abca tcg: Allow multiple word entries into the constant pool
This will be required for storing vector constants.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-02-08 15:53:34 +00:00
Richard Henderson
030ffe39dd tcg/ppc: Allow a 32-bit offset to the constant pool
We recently relaxed the limit of the number of opcodes that can
appear in a TranslationBlock.  In certain cases this has resulted
in relocation overflow.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-01-16 08:21:56 -08:00
Richard Henderson
4a64e0fd68 tcg/ppc: Support tlb offsets larger than 64k
AArch64 with SVE has an offset of 80k to the 8th TLB.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-01-16 08:21:56 -08:00
Richard Henderson
71f9cee9d0 tcg/arm: Support tlb offsets larger than 64k
AArch64 with SVE has an offset of 80k to the 8th TLB.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-01-16 08:21:56 -08:00
Richard Henderson
7170ac3313 tcg/arm: Fix double-word comparisons
The code sequence we were generating was only good for unsigned
comparisons.  For signed comparisions, use the sequence from gcc.

Fixes booting of ppc64 firmware, with a patch changing the code
sequence for ppc comparisons.

Tested-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-01-16 08:20:39 -08:00
Richard Henderson
1df3caa946 tcg: Allow 6 arguments to TCG helpers
We already handle this in the backends, and the lifetime datum
for the TCGOp is already large enough.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2017-12-29 12:43:40 -08:00
Richard Henderson
923ed17501 tcg: Add tcg_signed_cond
Complimenting the existing tcg_unsigned_cond.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2017-12-29 12:43:40 -08:00
Richard Henderson
cd9090aa9d tcg: Generalize TCGOp parameters
We had two fields specific to INDEX_op_call.  Rename these and
add some macros so that the fields may be reused for other opcodes.

Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2017-12-29 12:43:39 -08:00
Richard Henderson
15fa08f845 tcg: Dynamically allocate TCGOps
With no fixed array allocation, we can't overflow a buffer.
This will be important as optimizations related to host vectors
may expand the number of ops used.

Use QTAILQ to link the ops together.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2017-12-29 12:43:39 -08:00
Richard Henderson
f764718d0c tcg: Remove TCGV_UNUSED* and TCGV_IS_UNUSED*
These are now trivial sets and tests against NULL.  Unwrap.

Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2017-12-29 12:43:39 -08:00
Richard Henderson
ba2c747992 tcg/s390x: Use constant pool for prologue
Rather than have separate code only used for guest_base,
rely on a recent change to handle constant pool entries.

Cc: qemu-s390x@nongnu.org
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2017-11-03 09:33:45 +01:00
Richard Henderson
5b38ee3161 tcg: Allow constant pool entries in the prologue
Both ARMv6 and AArch64 currently may drop complex guest_base values
into the constant pool.  But generic code wasn't expecting that, and
the pool is not emitted.  Correct that.

Tested-by: Emilio G. Cota <cota@braap.org>
Tested-by: Laurent Desnogues <laurent.desnogues@gmail.com>
Reported-by: Laurent Desnogues <laurent.desnogues@gmail.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2017-11-03 09:33:45 +01:00
Richard Henderson
1c2adb958f tcg: Initialize cpu_env generically
This is identical for each target.  So, move the initialization to
common code.  Move the variable itself out of tcg_ctx and name it
cpu_env to minimize changes within targets.

This also means we can remove tcg_global_reg_new_{ptr,i32,i64},
since there are no longer global-register temps created by targets.

Reviewed-by: Emilio G. Cota <cota@braap.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2017-10-24 13:53:42 -07:00
Emilio G. Cota
3468b59e18 tcg: enable multiple TCG contexts in softmmu
This enables parallel TCG code generation. However, we do not take
advantage of it yet since tb_lock is still held during tb_gen_code.

In user-mode we use a single TCG context; see the documentation
added to tcg_region_init for the rationale.

Note that targets do not need any conversion: targets initialize a
TCGContext (e.g. defining TCG globals), and after this initialization
has finished, the context is cloned by the vCPU threads, each of
them keeping a separate copy.

TCG threads claim one entry in tcg_ctxs[] by atomically increasing
n_tcg_ctxs. Do not be too annoyed by the subsequent atomic_read's
of that variable and tcg_ctxs; they are there just to play nice with
analysis tools such as thread sanitizer.

Note that we do not allocate an array of contexts (we allocate
an array of pointers instead) because when tcg_context_init
is called, we do not know yet how many contexts we'll use since
the bool behind qemu_tcg_mttcg_enabled() isn't set yet.

Previous patches folded some TCG globals into TCGContext. The non-const
globals remaining are only set at init time, i.e. before the TCG
threads are spawned. Here is a list of these set-at-init-time globals
under tcg/:

Only written by tcg_context_init:
- indirect_reg_alloc_order
- tcg_op_defs
Only written by tcg_target_init (called from tcg_context_init):
- tcg_target_available_regs
- tcg_target_call_clobber_regs
- arm: arm_arch, use_idiv_instructions
- i386: have_cmov, have_bmi1, have_bmi2, have_lzcnt,
        have_movbe, have_popcnt
- mips: use_movnz_instructions, use_mips32_instructions,
        use_mips32r2_instructions, got_sigill (tcg_target_detect_isa)
- ppc: have_isa_2_06, have_isa_3_00, tb_ret_addr
- s390: tb_ret_addr, s390_facilities
- sparc: qemu_ld_trampoline, qemu_st_trampoline (build_trampolines),
         use_vis3_instructions

Only written by tcg_prologue_init:
- 'struct jit_code_entry one_entry'
- aarch64: tb_ret_addr
- arm: tb_ret_addr
- i386: tb_ret_addr, guest_base_flags
- ia64: tb_ret_addr
- mips: tb_ret_addr, bswap32_addr, bswap32u_addr, bswap64_addr

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2017-10-24 13:53:42 -07:00
Emilio G. Cota
e8feb96fcc tcg: introduce regions to split code_gen_buffer
This is groundwork for supporting multiple TCG contexts.

The naive solution here is to split code_gen_buffer statically
among the TCG threads; this however results in poor utilization
if translation needs are different across TCG threads.

What we do here is to add an extra layer of indirection, assigning
regions that act just like pages do in virtual memory allocation.
(BTW if you are wondering about the chosen naming, I did not want
to use blocks or pages because those are already heavily used in QEMU).

We use a global lock to serialize allocations as well as statistics
reporting (we now export the size of the used code_gen_buffer with
tcg_code_size()). Note that for the allocator we could just use
a counter and atomic_inc; however, that would complicate the gathering
of tcg_code_size()-like stats. So given that the region operations are
not a fast path, a lock seems the most reasonable choice.

The effectiveness of this approach is clear after seeing some numbers.
I used the bootup+shutdown of debian-arm with '-tb-size 80' as a benchmark.
Note that I'm evaluating this after enabling per-thread TCG (which
is done by a subsequent commit).

* -smp 1, 1 region (entire buffer):
    qemu: flush code_size=83885014 nb_tbs=154739 avg_tb_size=357
    qemu: flush code_size=83884902 nb_tbs=153136 avg_tb_size=363
    qemu: flush code_size=83885014 nb_tbs=152777 avg_tb_size=364
    qemu: flush code_size=83884950 nb_tbs=150057 avg_tb_size=373
    qemu: flush code_size=83884998 nb_tbs=150234 avg_tb_size=373
    qemu: flush code_size=83885014 nb_tbs=154009 avg_tb_size=360
    qemu: flush code_size=83885014 nb_tbs=151007 avg_tb_size=370
    qemu: flush code_size=83885014 nb_tbs=151816 avg_tb_size=367

That is, 8 flushes.

* -smp 8, 32 regions (80/32 MB per region) [i.e. this patch]:

    qemu: flush code_size=76328008 nb_tbs=141040 avg_tb_size=356
    qemu: flush code_size=75366534 nb_tbs=138000 avg_tb_size=361
    qemu: flush code_size=76864546 nb_tbs=140653 avg_tb_size=361
    qemu: flush code_size=76309084 nb_tbs=135945 avg_tb_size=375
    qemu: flush code_size=74581856 nb_tbs=132909 avg_tb_size=375
    qemu: flush code_size=73927256 nb_tbs=135616 avg_tb_size=360
    qemu: flush code_size=78629426 nb_tbs=142896 avg_tb_size=365
    qemu: flush code_size=76667052 nb_tbs=138508 avg_tb_size=368

Again, 8 flushes. Note how buffer utilization is not 100%, but it
is close. Smaller region sizes would yield higher utilization,
but we want region allocation to be rare (it acquires a lock), so
we do not want to go too small.

* -smp 8, static partitioning of 8 regions (10 MB per region):
    qemu: flush code_size=21936504 nb_tbs=40570 avg_tb_size=354
    qemu: flush code_size=11472174 nb_tbs=20633 avg_tb_size=370
    qemu: flush code_size=11603976 nb_tbs=21059 avg_tb_size=365
    qemu: flush code_size=23254872 nb_tbs=41243 avg_tb_size=377
    qemu: flush code_size=28289496 nb_tbs=52057 avg_tb_size=358
    qemu: flush code_size=43605160 nb_tbs=78896 avg_tb_size=367
    qemu: flush code_size=45166552 nb_tbs=82158 avg_tb_size=364
    qemu: flush code_size=63289640 nb_tbs=116494 avg_tb_size=358
    qemu: flush code_size=51389960 nb_tbs=93937 avg_tb_size=362
    qemu: flush code_size=59665928 nb_tbs=107063 avg_tb_size=372
    qemu: flush code_size=38380824 nb_tbs=68597 avg_tb_size=374
    qemu: flush code_size=44884568 nb_tbs=79901 avg_tb_size=376
    qemu: flush code_size=50782632 nb_tbs=90681 avg_tb_size=374
    qemu: flush code_size=39848888 nb_tbs=71433 avg_tb_size=372
    qemu: flush code_size=64708840 nb_tbs=119052 avg_tb_size=359
    qemu: flush code_size=49830008 nb_tbs=90992 avg_tb_size=362
    qemu: flush code_size=68372408 nb_tbs=123442 avg_tb_size=368
    qemu: flush code_size=33555560 nb_tbs=59514 avg_tb_size=378
    qemu: flush code_size=44748344 nb_tbs=80974 avg_tb_size=367
    qemu: flush code_size=37104248 nb_tbs=67609 avg_tb_size=364

That is, 20 flushes. Note how a static partitioning approach uses
the code buffer poorly, leading to many unnecessary flushes.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2017-10-24 13:53:42 -07:00
Emilio G. Cota
34184b0718 tcg: allocate optimizer temps with tcg_malloc
Groundwork for supporting multiple TCG contexts.

While at it, also allocate temps_used directly as a bitmap of the
required size, instead of using a bitmap of TCG_MAX_TEMPS via
TCGTempSet.

Performance-wise we lose about 1.12% in a translation-heavy workload
such as booting+shutting down debian-arm:

Performance counter stats for 'taskset -c 0 arm-softmmu/qemu-system-arm \
	-machine type=virt -nographic -smp 1 -m 4096 \
	-netdev user,id=unet,hostfwd=tcp::2222-:22 \
	-device virtio-net-device,netdev=unet \
	-drive file=die-on-boot.qcow2,id=myblock,index=0,if=none \
	-device virtio-blk-device,drive=myblock \
	-kernel kernel.img -append console=ttyAMA0 root=/dev/vda1 \
	-name arm,debug-threads=on -smp 1' (10 runs):

             exec time (s)  Relative slowdown wrt original (%)
---------------------------------------------------------------
 original     20.213321616                                  0.
 tcg_malloc   20.441130078                           1.1270214
 TCGContext   20.477846517                           1.3086662
 g_malloc     20.780527895                           2.8061013

The other two alternatives shown in the table are:
- TCGContext: embed temps[TCG_MAX_TEMPS] and TCGTempSet used_temps
  in TCGContext. This is simple enough but it isn't faster than using
  tcg_malloc; moreover, it wastes memory.
- g_malloc: allocate/deallocate both temps and used_temps every time
  tcg_optimize is executed.

Suggested-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2017-10-24 13:53:42 -07:00
Emilio G. Cota
c3fac1138e tcg: distribute profiling counters across TCGContext's
This is groundwork for supporting multiple TCG contexts.

To avoid scalability issues when profiling info is enabled, this patch
makes the profiling info counters distributed via the following changes:

1) Consolidate profile info into its own struct, TCGProfile, which
   TCGContext also includes. Note that tcg_table_op_count is brought
   into TCGProfile after dropping the tcg_ prefix.
2) Iterate over the TCG contexts in the system to obtain the total counts.

This change also requires updating the accessors to TCGProfile fields to
use atomic_read/set whenever there may be conflicting accesses (as defined
in C11) to them.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2017-10-24 13:53:42 -07:00
Emilio G. Cota
df2cce2968 tcg: introduce **tcg_ctxs to keep track of all TCGContext's
Groundwork for supporting multiple TCG contexts.

Note that having n_tcg_ctxs is unnecessary. However, it is
convenient to have it, since it will simplify iterating over the
array: we'll have just a for loop instead of having to iterate
over a NULL-terminated array (which would require n+1 elems)
or having to check with ifdef's for usermode/softmmu.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2017-10-24 13:53:42 -07:00
Emilio G. Cota
26689780f8 gen-icount: fold exitreq_label into TCGContext
Groundwork for supporting multiple TCG contexts.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2017-10-24 13:53:42 -07:00
Emilio G. Cota
b1311c4acf tcg: define tcg_init_ctx and make tcg_ctx a pointer
Groundwork for supporting multiple TCG contexts.

The core of this patch is this change to tcg/tcg.h:

> -extern TCGContext tcg_ctx;
> +extern TCGContext tcg_init_ctx;
> +extern TCGContext *tcg_ctx;

Note that for now we set *tcg_ctx to whatever TCGContext is passed
to tcg_context_init -- in this case &tcg_init_ctx.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2017-10-24 13:53:42 -07:00
Emilio G. Cota
44ded3d048 tcg: take tb_ctx out of TCGContext
Groundwork for supporting multiple TCG contexts.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2017-10-24 13:53:42 -07:00
Emilio G. Cota
e82d5a2460 tcg: check CF_PARALLEL instead of parallel_cpus
Thereby decoupling the resulting translated code from the current state
of the system.

The tb->cflags field is not passed to tcg generation functions. So
we add a field to TCGContext, storing there a copy of tb->cflags.

Most architectures have <= 32 registers, which results in a 4-byte hole
in TCGContext. Use this hole for the new field.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2017-10-24 13:53:42 -07:00
Emilio G. Cota
4e2ca83e71 tcg: define CF_PARALLEL and use it for TB hashing along with CF_COUNT_MASK
This will enable us to decouple code translation from the value
of parallel_cpus at any given time. It will also help us minimize
TB flushes when generating code via EXCP_ATOMIC.

Note that the declaration of parallel_cpus is brought to exec-all.h
to be able to define there the "curr_cflags" inline.

Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2017-10-24 13:53:41 -07:00
Richard Henderson
e89b28a635 tcg: Use offsets not indices for TCGv_*
Using the offset of a temporary, relative to TCGContext, rather than
its index means that we don't use 0.  That leaves offset 0 free for
a NULL representation without having to leave index 0 unused.

Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2017-10-24 13:53:36 -07:00
Richard Henderson
11f4e8f8bf tcg: Remove TCGV_EQUAL*
When we used structures for TCGv_*, we needed a macro in order to
perform a comparison.  Now that we use pointers, this is just clutter.

Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2017-10-24 21:50:15 +02:00
Richard Henderson
dc41aa7d34 tcg: Remove GET_TCGV_* and MAKE_TCGV_*
The GET and MAKE functions weren't really specific enough.
We now have a full complement of functions that convert exactly
between temporaries, arguments, tcgv pointers, and indices.

The target/sparc change is also a bug fix, which would have affected
a host that defines TCG_TARGET_HAS_extr[lh]_i64_i32, i.e. MIPS64.

Reviewed-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2017-10-24 21:49:30 +02:00
Richard Henderson
085272b35e tcg: Introduce temp_tcgv_{i32,i64,ptr}
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2017-10-24 21:48:59 +02:00
Richard Henderson
ae8b75dc6e tcg: Introduce tcgv_{i32,i64,ptr}_{arg,temp}
Transform TCGv_* to an "argument" or a temporary.
For now, an argument is simply the temporary index.

Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2017-10-24 21:47:46 +02:00
Richard Henderson
960c50e077 tcg: Push tcg_ctx into tcg_gen_callN
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2017-10-24 21:47:29 +02:00
Richard Henderson
b7e8b17a77 tcg: Push tcg_ctx into generator functions
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2017-10-24 21:45:07 +02:00
Richard Henderson
6349039d0b tcg: Use per-temp state data in optimize
While we're touching many of the lines anyway, adjust the naming
of the functions to better distinguish when "TCGArg" vs "TCGTemp"
should be used.

Reviewed-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-10-24 21:45:07 +02:00
Richard Henderson
54534d7cfd tcg: Remove unused TCG_CALL_DUMMY_TCGV
Reviewed-by: Emilio G. Cota <cota@braap.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-10-24 21:45:07 +02:00
Richard Henderson
2272e4a791 tcg: Change temp_allocate_frame arg to TCGTemp
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-10-24 21:44:52 +02:00
Richard Henderson
ac3b88911e tcg: Avoid loops against variable bounds
Copy s->nb_globals or s->nb_temps to a local variable for the purposes
of iteration.  This should allow the compiler to use low-overhead
looping constructs on some hosts.

Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Emilio G. Cota <cota@braap.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-10-24 21:44:34 +02:00
Richard Henderson
b83eabeac0 tcg: Use per-temp state data in liveness
This avoids having to allocate external memory for each temporary.

Reviewed-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-10-24 21:44:34 +02:00
Richard Henderson
1807f4c400 tcg: Introduce temp_arg, export temp_idx
At the same time, drop the TCGContext argument and use tcg_ctx instead.

Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-10-24 21:44:12 +02:00
Richard Henderson
c6c7d84df8 tcg: Return NULL temp for TCG_CALL_DUMMY_ARG
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-10-24 21:44:02 +02:00
Richard Henderson
fa477d2547 tcg: Add temp_global bit to TCGTemp
This avoids needing to test the index of a temp against nb_globals.

Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-10-24 21:43:50 +02:00
Richard Henderson
434391390b tcg: Introduce arg_temp
Reviewed-by: Emilio G. Cota <cota@braap.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-10-24 21:43:36 +02:00
Richard Henderson
dd18629201 tcg: Propagate TCGOp down to allocators
Reviewed-by: Emilio G. Cota <cota@braap.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-10-24 21:34:47 +02:00
Richard Henderson
efee3746fa tcg: Propagate args to op->args in tcg.c
Reviewed-by: Emilio G. Cota <cota@braap.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-10-24 21:34:47 +02:00
Richard Henderson
acd937019b tcg: Propagate args to op->args in optimizer
Reviewed-by: Emilio G. Cota <cota@braap.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-10-24 21:34:47 +02:00
Richard Henderson
75e8b9b7aa tcg: Merge opcode arguments into TCGOp
Rather than have a separate buffer of 10*max_ops entries,
give each opcode 10 entries.  The result is actually a bit
smaller and should have slightly more cache locality.

Reviewed-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-10-24 21:34:47 +02:00
Jiang Biao
8df8d529ed tcg/mips: delete commented out extern keyword.
Delete commented out extern keyword on link_error().

Signed-off-by: Jiang Biao <jiang.biao2@zte.com.cn>
Message-Id: <1506762042-32145-1-git-send-email-jiang.biao2@zte.com.cn>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2017-10-10 09:45:01 -07:00
Emilio G. Cota
a505785cd2 tcg: define TCG_HIGHWATER
Will come in handy very soon.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2017-10-10 09:45:00 -07:00
Emilio G. Cota
619205fd1f tcg: take .helpers out of TCGContext
Groundwork for supporting multiple TCG contexts.

The hash table becomes read-only after it is filled in,
so we can save space by keeping just a global pointer to it.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2017-10-10 07:37:10 -07:00
Emilio G. Cota
5e75150cdf tci: move tci_regs to tcg_qemu_tb_exec's stack
Groundwork for supporting multiple TCG contexts.

Compile-tested for all targets on an x86_64 host.

Suggested-by: Richard Henderson <rth@twiddle.net>
Acked-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2017-10-10 07:37:10 -07:00
Emilio G. Cota
e7e168f413 exec-all: extract tb->tc_* into a separate struct tc_tb
In preparation for adding tc.size to be able to keep track of
TB's using the binary search tree implementation from glib.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2017-10-10 07:37:10 -07:00
Emilio G. Cota
7f11636dbe tcg: remove addr argument from lookup_tb_ptr
It is unlikely that we will ever want to call this helper passing
an argument other than the current PC. So just remove the argument,
and use the pc we already get from cpu_get_tb_cpu_state.

This change paves the way to having a common "tb_lookup" function.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2017-10-10 07:37:10 -07:00
Emilio G. Cota
d453ec7825 tcg/mips: constify tcg_target_callee_save_regs
Reviewed-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2017-10-10 07:37:10 -07:00
Emilio G. Cota
e268f4c036 tcg/i386: constify tcg_target_callee_save_regs
Reviewed-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2017-10-10 07:37:10 -07:00
Richard Henderson
89b2e37e65 tcg/mips: Fully convert tcg_target_op_def
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2017-09-17 06:52:19 -07:00
Richard Henderson
9be44a16c2 tcg/sparc: Fully convert tcg_target_op_def
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2017-09-17 06:52:19 -07:00
Richard Henderson
6cb3658a04 tcg/ppc: Fully convert tcg_target_op_def
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2017-09-17 06:52:19 -07:00
Richard Henderson
7536b82d28 tcg/arm: Fully convert tcg_target_op_def
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2017-09-17 06:52:19 -07:00
Richard Henderson
1897cc2eb8 tcg/aarch64: Fully convert tcg_target_op_def
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2017-09-17 06:52:19 -07:00
Richard Henderson
80a8b9a910 tcg: Fix types in tcg_regset_{set,reset}_reg
There was a potential problem here with an ILP32 host
with 64 host registers.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2017-09-17 06:52:19 -07:00
Richard Henderson
f46934df66 tcg: Remove tcg_regset_set32
It's not even clear what the interface REG and VAL32 were supposed to mean.
All uses had REG = 0 and VAL32 was the bitset assigned to the destination.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2017-09-17 06:52:19 -07:00
Richard Henderson
07ddf036fa tcg: Remove tcg_regset_{or,and,andnot,not}
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2017-09-17 06:52:19 -07:00
Richard Henderson
d21369f5fb tcg: Remove tcg_regset_set
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2017-09-17 06:52:19 -07:00
Richard Henderson
ccb1bb66ea tcg: Remove tcg_regset_clear
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2017-09-17 06:52:19 -07:00
Richard Henderson
be0f34b584 tcg: Add tcg_op_supported
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2017-09-17 06:52:19 -07:00
Philippe Mathieu-Daudé
61a3f8f6c0 accel/tcg: move tcg-runtime to accel/tcg/
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20170911213328.9701-4-f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2017-09-17 06:52:19 -07:00
Philippe Mathieu-Daudé
ba026602a6 tcg/ppc: disable atomic write check on ppc32
This fixes building for ppc64 on ppc32 (changed in 5964fca8a1):

tcg/ppc/tcg-target.inc.c: In function 'tb_target_set_jmp_target':
include/qemu/compiler.h:86:30: error: static assertion failed: \
  "not expecting: sizeof(*(uint64_t *)jmp_addr) > ATOMIC_REG_SIZE"
	QEMU_BUILD_BUG_ON(sizeof(*ptr) > ATOMIC_REG_SIZE); \
	^
tcg/ppc/tcg-target.inc.c:1377:9: note: in expansion of macro 'atomic_set'
	atomic_set((uint64_t *)jmp_addr, pair);
	^

Suggested-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20170911204936.5020-1-f4bug@amsat.org>
[rth: Added commentary requested by pmm.]
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2017-09-17 06:52:19 -07:00
Philippe Mathieu-Daudé
04ef33052c tcg/tci: do not use ldst label (never implemented)
changed in 659ef5cbb8, this fixes building with --enable-tcg-interpreter:

/home/travis/build/qemu/qemu/tcg/tcg.c:116:14: error: ‘tcg_out_ldst_finalize’ used but never defined [-Werror]
 static bool tcg_out_ldst_finalize(TCGContext *s);
              ^

Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Stefan Weil <sw@weilnetz.de>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20170911022839.23231-1-f4bug@amsat.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-09-11 19:24:05 +01:00
Richard Henderson
53c89efd02 tcg/ppc: Use constant pool for movi
Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-09-07 11:57:35 -07:00
Richard Henderson
77bfc7c0b4 tcg/ppc: Look for shifted constants
Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-09-07 11:57:35 -07:00
Richard Henderson
5964fca8a1 tcg/ppc: Change TCG_REG_RA to TCG_REG_TB
At this point the conversion is a wash.  Loading of TB+ofs is
smaller, but the actual return address from exit_tb is larger.
There are a few more insns required to transition between TBs.

But the expectation is that accesses to the constant pool will
on the whole be smaller.

Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-09-07 11:57:35 -07:00
Richard Henderson
afe74dbd6a tcg/arm: Use constant pool for call
Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-09-07 11:57:35 -07:00
Richard Henderson
880ad9626c tcg/arm: Use constant pool for movi
Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-09-07 11:57:35 -07:00
Richard Henderson
2a8ab93c6b tcg/arm: Extract INSN_NOP
We'll want this for tcg_out_nop_fill.

Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-09-07 11:57:35 -07:00
Richard Henderson
1507061637 tcg/arm: Code rearrangement
Move constants before all of the functions.
Move tcg_out_<format> functions before all
of the others.  No functional change.

Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-09-07 11:57:35 -07:00
Richard Henderson
95ede84f4d tcg/arm: Tighten tlb indexing offset test
We are not going to use ldrd for loading the comparator
for 32-bit guests, so don't limit cmp_off to 8 bits then.
This eliminates one insn in the tlb load for some guests.

Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-09-07 11:57:35 -07:00
Richard Henderson
647ab96aaf tcg/arm: Improve tlb load for armv7
Use UBFX to avoid limitation on CPU_TLB_BITS.  Since we're dropping
the initial shift, we need to replace the page masking.  We can use
MOVW+BIC to do this without shifting.  The result is the same size
as the armv6 path with one less conditional instruction.

Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-09-07 11:57:35 -07:00
Richard Henderson
e9823b4c33 tcg/sparc: Use constant pool for movi
Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-09-07 11:57:35 -07:00
Richard Henderson
ab20bdc116 tcg/sparc: Introduce TCG_REG_TB
Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-09-07 11:57:35 -07:00
Richard Henderson
55129955e9 tcg/aarch64: Use constant pool for movi
Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-09-07 11:57:35 -07:00
Richard Henderson
a534bb15f3 tcg/s390: Use constant pool for cmpi
Also use CHI/CGHI for 16-bit signed constants.

Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-09-07 11:57:35 -07:00
Richard Henderson
5bf67a9217 tcg/s390: Use constant pool for xori
Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-09-07 11:57:35 -07:00
Richard Henderson
4046d9ca04 tcg/s390: Use constant pool for ori
Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-09-07 11:57:35 -07:00
Richard Henderson
bdcd5d1926 tcg/s390: Use constant pool for andi
Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-09-07 11:57:35 -07:00
Richard Henderson
28eef8aaec tcg/s390: Use constant pool for movi
Split out maybe_out_small_movi for use with other operations
that want to add to the constant pool.

Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-09-07 11:57:35 -07:00
Richard Henderson
e692a3492d tcg/s390: Fix sign of patch_reloc addend
We were passing in -2 instead of +2, but then ignoring
the actual contents of addend in the calculation.

Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-09-07 11:57:35 -07:00
Richard Henderson
829e1376d9 tcg/s390: Introduce TCG_REG_TB
Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-09-07 11:57:35 -07:00
Richard Henderson
4e45f23943 tcg/i386: Store out-of-range call targets in constant pool
Already it saves 2 bytes per call, but also the constant pool
entry may well be shared across multiple calls.

Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-09-07 11:57:35 -07:00
Richard Henderson
57a269469d tcg: Infrastructure for managing constant pools
A new shared header tcg-pool.inc.c adds new_pool_label,
for registering a tcg_target_ulong to be emitted after
the generated code, plus relocation data to install a
pointer to the data.

A new pointer is added to the TCGContext, so that we
dump the constant pool as data, not code.

Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-09-07 11:57:35 -07:00
Richard Henderson
659ef5cbb8 tcg: Rearrange ldst label tracking
Dispense with TCGBackendData, as it has never been used for more than
holding a single pointer.  Use a define in the cpu/tcg-target.h to
signal requirement for TCGLabelQemuLdst, so that we can drop the no-op
tcg-be-null.h stubs.  Rename tcg-be-ldst.h to tcg-ldst.inc.c.

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-09-07 11:57:35 -07:00
Richard Henderson
a858339336 tcg: Move USE_DIRECT_JUMP discriminator to tcg/cpu/tcg-target.h
Replace the USE_DIRECT_JUMP ifdef with a TCG_TARGET_HAS_direct_jump
boolean test.  Replace the tb_set_jmp_target1 ifdef with an unconditional
function tb_target_set_jmp_target.

While we're touching all backends, add a parameter for tb->tc_ptr;
we're going to need it shortly for some backends.

Move tb_set_jmp_target and tb_add_jump from exec-all.h to cpu-exec.c.

This opens the possibility for TCG_TARGET_HAS_direct_jump to be
a runtime decision -- based on host cpu capabilities, the size of
code_gen_buffer, or a future debugging switch.

Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-09-07 11:57:34 -07:00
Richard Henderson
cda4a338c4 tcg/tci: Add TCG_TARGET_DEFAULT_MO
Missed being added as part of 71650df7b0.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-09-07 18:57:34 +01:00
Richard Henderson
4609190b5f tcg/s390: Use slbgr for setcond le and leu
Acked-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-09-06 07:24:46 -07:00
Richard Henderson
7af525af01 tcg/s390: Use load-on-condition-2 facility
This allows LOAD HALFWORD IMMEDIATE ON CONDITION,
eliminating one insn in some common cases.

Acked-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-09-06 07:24:41 -07:00
Richard Henderson
c2097136ad tcg/s390: Use distinct-operands facility
This allows using a 3-operand insn form for some arithmetic,
logicals and shifts.

Acked-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-09-06 07:24:38 -07:00
Richard Henderson
e42349cbd6 tcg/s390: Merge ori+xori facilities check to tcg_target_op_def
Acked-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-09-06 07:24:35 -07:00
Richard Henderson
ba18b07dc6 tcg/s390: Merge add2i facilities check to tcg_target_op_def
Acked-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-09-06 07:24:33 -07:00
Richard Henderson
a8f0269e9e tcg/s390: Merge muli facilities check to tcg_target_op_def
Acked-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-09-06 07:24:31 -07:00
Richard Henderson
07952d9570 tcg/s390: Merge cmpi facilities check to tcg_target_op_def
Acked-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-09-06 07:24:28 -07:00
Richard Henderson
9b5500b697 tcg/s390: Fully convert tcg_target_op_def
Use a switch instead of searching a table.

Acked-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-09-06 07:22:24 -07:00
Pranith Kumar
b32dc3370a tcg: Implement implicit ordering semantics
Currently, we cannot use mttcg for running strong memory model guests
on weak memory model hosts due to missing ordering semantics.

We implicitly generate fence instructions for stronger guests if an
ordering mismatch is detected. We generate fences only for the orders
for which fence instructions are necessary, for example a fence is not
necessary between a store and a subsequent load on x86 since its
absence in the guest binary tells that ordering need not be
ensured. Also note that if we find multiple subsequent fence
instructions in the generated IR, we combine them in the TCG
optimization pass.

This patch allows us to boot an x86 guest on ARM64 hosts using mttcg.

Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
Message-Id: <20170829063313.10237-4-bobby.prani@gmail.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2017-09-05 13:41:46 -07:00
Pranith Kumar
71650df7b0 tcg: Add tcg target default memory ordering
Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
Message-Id: <20170829063313.10237-3-bobby.prani@gmail.com>
[rth: Dropped ia64 hunk]
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2017-09-05 12:56:40 -07:00
Richard Henderson
a46c1244a0 tcg: Remove support for ia64 as host
We threatened to remove ia64 as host in v2.9.0.  Its time has now come.

There are still some usages of defined(__ia64__) throughout the source
code that would be triggered if one were to enable TCI on an ia64 host.
Leave those alone for now.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2017-09-05 12:39:25 -07:00
Richard Henderson
13aaef678e tcg: Increase minimum alignment from tcg_malloc to 8
For a 64-bit ILP32 host, aligning to sizeof(long) is not enough.
Guess the minimum for any host is 8, as that covers uint64_t.
Qemu doesn't use a host long double or host vectors, except in
extremely limited circumstances.

Fixes a bus error for a sparc v8plus host.

Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-08-03 11:00:30 -07:00
Richard Henderson
ca671de8af tcg/arm: Fix runtime overalignment test
Patch 85aa80813d changed the IF emitting the TST instruction,
but failed to change the ?: converting CMP to CMPEQ, so the
result of the TST is ignored.

Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-08-03 10:56:44 -07:00
Philippe Mathieu-Daudé
b208ac07ea docs: fix broken paths to docs/devel/atomics.txt
With the move of some docs/ to docs/devel/ on ac06724a71,
a couple of references were not updated.

Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2017-07-31 13:12:47 +03:00
Richard Henderson
5dd8990841 util: Introduce include/qemu/cpuid.h
Clang 3.9 passes the CONFIG_AVX2_OPT configure test.  However, the
supplied <cpuid.h> does not contain the bit_AVX2 define that we use
when detecting whether the routine can be enabled.

Introduce a qemu-specific header that uses the compiler's definition
of __cpuid et al, but supplies any missing bit_* definitions needed.
This avoids introducing any extra ifdefs to util/bufferiszero.c, and
allows quite a few to be removed from tcg/i386/tcg-target.inc.c.

Signed-off-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20170719044018.18063-1-rth@twiddle.net
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-07-24 12:42:55 +01:00
Philippe Mathieu-Daudé
797ed66d29 tcg/tci: enable bswap16_i64
Altough correctly implemented, bswap16_i64() never got tested/executed so the
safety TODO() statement was never removed.

Since it got now tested the TODO() can be removed.

while running Alex Bennée's image aarch64-linux-3.15rc2-buildroot.img:

Trace 0x7fa1904b0890 [0: ffffffc00036cd04]
----------------
IN:
0xffffffc00036cd24:  5ac00694      rev16 w20, w20

OP:
 ---- ffffffc00036cd24 0000000000000000 0000000000000000
 ext32u_i64 tmp3,x20
 ext16u_i64 tmp2,tmp3
 bswap16_i64 x20,tmp2
 movi_i64 tmp4,$0x10
 shr_i64 tmp2,tmp3,tmp4
 ext16u_i64 tmp2,tmp2
 bswap16_i64 tmp2,tmp2
 deposit_i64 x20,x20,tmp2,$0x10,$0x10

Linking TBs 0x7fa1904b0890 [ffffffc00036cd04] index 0 -> 0x7fa1904b0aa0 [ffffffc00036cd24]
Trace 0x7fa1904b0aa0 [0: ffffffc00036cd24]
TODO qemu/tci.c:1049: tcg_qemu_tb_exec()
qemu/tci.c:1049: tcg fatal error
Aborted

Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Jaroslaw Pelczar <j.pelczar@samsung.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Weil <sw@weilnetz.de>
Message-Id: <20170718045540.16322-11-f4bug@amsat.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-07-19 14:45:16 -07:00
Jiang Biao
4df9cac57f tcg/mips: reserve a register for the guest_base.
Reserve a register for the guest_base using ppc code for reference.
By doing so, we do not have to recompute it for every memory load.

Signed-off-by: Jiang Biao <jiang.biao2@zte.com.cn>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Message-Id: <1499677934-2249-1-git-send-email-jiang.biao2@zte.com.cn>
2017-07-19 14:45:15 -07:00
Lluís Vilanova
61a67f71dd exec: [tcg] Use different TBs according to the vCPU's dynamic tracing state
Every vCPU now uses a separate set of TBs for each set of dynamic
tracing event state values. Each set of TBs can be used by any number of
vCPUs to maximize TB reuse when vCPUs have the same tracing state.

This feature is later used by tracetool to optimize tracing of guest
code events.

The maximum number of TB sets is defined as 2^E, where E is the number
of events that have the 'vcpu' property (their state is stored in
CPUState->trace_dstate).

For this to work, a change on the dynamic tracing state of a vCPU will
force it to flush its virtual TB cache (which is only indexed by
address), and fall back to the physical TB cache (which now contains the
vCPU's dynamic tracing state as part of the hashing function).

Signed-off-by: Lluís Vilanova <vilanova@ac.upc.edu>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-id: 149915775266.6295.10060144081246467690.stgit@frigg.lan
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-07-17 13:11:05 +01:00
Jiang Biao
8b8d768f19 tcg/mips: Bugfix for crash when running program with qemu-i386.
When running a helloworld program with qemu-i386 in linux-user
mode on Loongson 3A3000, it will crash. This patch fix the bug.

Signed-off-by: Jiang Biao <jiang.biao2@zte.com.cn>
Message-Id: <1499669979-25904-1-git-send-email-jiang.biao2@zte.com.cn>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-07-09 21:11:38 -10:00
Pranith Kumar
2acee8b2b5 tcg/aarch64: Enable indirect jump path using LDR (literal)
This patch enables the indirect jump path using an LDR (literal)
instruction. It will be interesting to test and see which performs
better among the two paths.

CC: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
Message-Id: <20170630143614.31059-3-bobby.prani@gmail.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-07-09 21:10:23 -10:00
Pranith Kumar
b68686bd4b tcg/aarch64: Use ADRP+ADD to compute target address
We use ADRP+ADD to compute the target address for goto_tb. This patch
introduces the NOP instruction which is used to align the above
instruction pair so that we can use one atomic instruction to patch
the destination offsets.

CC: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
Message-Id: <20170630143614.31059-2-bobby.prani@gmail.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-07-09 21:10:23 -10:00
Pranith Kumar
23b7aa1d2a tcg/aarch64: Introduce and use long branch to register
We can use a branch to register instruction for exit_tb for offsets
greater than 128MB.

CC: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
Message-Id: <20170630143614.31059-1-bobby.prani@gmail.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-07-09 21:10:23 -10:00
Paolo Bonzini
beeaef55e4 tcg: move tb_lock out of translate-all.h
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-07-04 16:01:16 +02:00
Peter Maydell
db7a99cdc1 Queued TCG patches
-----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJZSBP2AAoJEK0ScMxN0CebnyMH/1ZiDhYiqCD7PYfk4/Y7Db+h
 MNKNozrWKyChWQp1RzwWqcBaIzbuMZkDYn8dfS419PNtFRNoYtHjhYvjSTfcrxS0
 U8dGOoqQUHCr/jlyIDUE4y5+aFA9R/1Ih5IQv+QCi5QNXcfeST8zcYF+ImuikP6C
 7heIc7dE9kXdA8ycWJ39kYErHK9qEJbvDx6dxMPmb4cM36U239Zb9so985TXULlQ
 LoHrDpOCBzCbsICBE8iP2RKDvcwENIx21Dwv+9gW/NqR+nRdKcxhTjKEodkS8gl/
 UxMxM/TjIPQOLLUhdck5DFgIgBgQWHRqPMJKqt466I0JlXvSpifmWxckWzslXLc=
 =R+em
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/rth/tags/pull-tcg-20170619' into staging

Queued TCG patches

# gpg: Signature made Mon 19 Jun 2017 19:12:06 BST
# gpg:                using RSA key 0xAD1270CC4DD0279B
# gpg: Good signature from "Richard Henderson <rth7680@gmail.com>"
# gpg:                 aka "Richard Henderson <rth@redhat.com>"
# gpg:                 aka "Richard Henderson <rth@twiddle.net>"
# Primary key fingerprint: 9CB1 8DDA F8E8 49AD 2AFC  16A4 AD12 70CC 4DD0 279B

* remotes/rth/tags/pull-tcg-20170619:
  target/arm: Exit after clearing aarch64 interrupt mask
  target/s390x: Exit after changing PSW mask
  target/alpha: Use tcg_gen_lookup_and_goto_ptr
  tcg: Increase hit rate of lookup_tb_ptr
  tcg/arm: Use ldr (literal) for goto_tb
  tcg/arm: Try pc-relative addresses for movi
  tcg/arm: Remove limit on code buffer size
  tcg/arm: Use indirect branch for goto_tb
  tcg/aarch64: Use ADR in tcg_out_movi
  translate-all: consolidate tb init in tb_gen_code
  tcg: allocate TB structs before the corresponding translated code
  util: add cacheinfo

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-06-22 10:25:03 +01:00
Richard Henderson
308714e6bc tcg/arm: Use ldr (literal) for goto_tb
The new placement of the TB means that we can use one insn
to load the goto_tb destination directly from the TB.

Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-06-19 11:10:59 -07:00
Richard Henderson
9c39b94f14 tcg/arm: Try pc-relative addresses for movi
Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-06-19 11:10:59 -07:00
Richard Henderson
3fb53fb4d1 tcg/arm: Use indirect branch for goto_tb
Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-06-19 11:10:59 -07:00
Richard Henderson
cc74d332ff tcg/aarch64: Use ADR in tcg_out_movi
The new placement of the TB means that we can use one insn
to load the return value for exit_tb returning the TB pointer.

Tested-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-06-19 11:10:59 -07:00
Emilio G. Cota
6e3b2bfd6a tcg: allocate TB structs before the corresponding translated code
Allocating an arbitrarily-sized array of tbs results in either
(a) a lot of memory wasted or (b) unnecessary flushes of the code
cache when we run out of TB structs in the array.

An obvious solution would be to just malloc a TB struct when needed,
and keep the TB array as an array of pointers (recall that tb_find_pc()
needs the TB array to run in O(log n)).

Perhaps a better solution, which is implemented in this patch, is to
allocate TB's right before the translated code they describe. This
results in some memory waste due to padding to have code and TBs in
separate cache lines--for instance, I measured 4.7% of padding in the
used portion of code_gen_buffer when booting aarch64 Linux on a
host with 64-byte cache lines. However, it can allow for optimizations
in some host architectures, since TCG backends could safely assume that
the TB and the corresponding translated code are very close to each
other in memory. See this message by rth for a detailed explanation:

  https://lists.gnu.org/archive/html/qemu-devel/2017-03/msg05172.html
  Subject: Re: GSoC 2017 Proposal: TCG performance enhancements
  Message-ID: <1e67644b-4b30-887e-d329-1848e94c9484@twiddle.net>

Suggested-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Pranith Kumar <bobby.prani@gmail.com>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <1496790745-314-3-git-send-email-cota@braap.org>
[rth: Simplify the arithmetic in tcg_tb_alloc]
Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-06-19 11:10:59 -07:00
Emilio G. Cota
b255b2c8a5 util: add cacheinfo
Add helpers to gather cache info from the host at init-time.

For now, only export the host's I/D cache line sizes, which we
will use to improve cache locality to avoid false sharing.

Suggested-by: Richard Henderson <rth@twiddle.net>
Suggested-by: Geert Martin Ijewski <gm.ijewski@web.de>
Tested-by:    Geert Martin Ijewski <gm.ijewski@web.de>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <1496794624-4083-1-git-send-email-cota@braap.org>
[rth: Move all implementations from tcg/ppc/]
Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-06-19 11:10:59 -07:00
Yang Zhong
244f144134 tcg: move tcg backend files into accel/tcg/
move tcg-runtime.c, translate-all.(ch) and translate-common.c into
accel/tcg/ subdirectory and updated related trace-events file.

Signed-off-by: Yang Zhong <yang.zhong@intel.com>
Message-Id: <1496383606-18060-4-git-send-email-yang.zhong@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-06-15 11:04:06 +02:00
Aurelien Jarno
5786e0683c tcg/mips: implement goto_ptr
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Message-Id: <20170430145254.25616-2-aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-06-05 09:25:42 -07:00
Richard Henderson
085c648bef tcg/arm: Implement goto_ptr
Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-06-05 09:25:42 -07:00
Richard Henderson
702a947484 tcg/arm: Clarify tcg_out_bx for arm4 host
In theory this would re-enable usage of QEMU on an armv4 host.
Whether this is worthwhile is debatable -- we've been unconditionally
issuing the armv5t BX instruction in the prologue since 2011 without
complaint.  Possibly we should simply require an armv6 host.

Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-06-05 09:25:42 -07:00
Richard Henderson
46644483ca tcg/s390: Implement goto_ptr
Tested-by: Aurelien Jarno <aurelien@aurel32.net>
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-06-05 09:25:42 -07:00
Richard Henderson
38f81dc593 tcg/sparc: Implement goto_ptr
Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-06-05 09:25:42 -07:00
Richard Henderson
b19f0c2e7d tcg/aarch64: Implement goto_ptr
Measurements:

                      SPECint06 (test set), x86_64-linux-user. Host: APM 64-bit ARMv8 (Atlas/A57) @ 2.4 GHz

 1.45x +-+-------------------------------------------------------------------------------------------------------------+-+
       |                                      *****                                                                      |
       |      +++                             *   *                                                    +goto-ptr         |
  1.4x +-+...*****............................*...*....................................................................+-+
       |     *+++*                            *   *                            +++                                       |
 1.35x +-+...*...*............................*...*...........................*****....................................+-+
       |     *   *                            *   *                           *+++*                                      |
       |     *   *                            *   *                           *   *                                      |
  1.3x +-+...*...*............................*...*...........................*...*....................................+-+
       |     *   *                            *   *                           *   *                                      |
       |     *   *                            *   *                           *   *                    *****             |
 1.25x +-+...*...*...........*****............*...*...........................*...*............*****...*...*...........+-+
       |     *   *           *   *            *   *                           *   *            *+++*   *   *             |
  1.2x +-+...*...*...........*...*............*...*...........................*...*............*...*...*...*...........+-+
       |     *   *           *   *            *   *                           *   *            *   *   *   *             |
       |     *   *           *   *            *   *                           *   *            *   *   *   *   *****     |
 1.15x +-+...*...*...........*...*............*...*...........................*...*............*...*...*...*...*...*...+-+
       |     *   *           *   *            *   *                           *   *    +++     *   *   *   *   *   *     |
       |     *   *           *   *            *   *                           *   *   *****    *   *   *   *   *   *     |
  1.1x +-+...*...*...........*...*....*****...*...*...*****...................*...*...*...*....*...*...*...*...*...*...+-+
       |     *   *           *   *    *   *   *   *   *   *                   *   *   *   *    *   *   *   *   *   *     |
 1.05x +-+...*...*...........*...*....*...*...*...*...*...*...................*...*...*...*....*...*...*...*...*...*...+-+
       |     *   *   *****   *   *    *   *   *   *   *   *                   *   *   *   *    *   *   *   *   *   *     |
       |     *   *   *   *   *   *    *   *   *   *   *   *   *****   *****   *   *   *   *    *   *   *   *   *   *     |
    1x +-+---*****---*****---*****----*****---*****---*****---*****---*****---*****---*****----*****---*****---*****---+-+
          astar   bzip2     gcc    gobmk h264ref   hmmlibquantum     mcf omnetpperlbench    sjenxalancbmk   hmean
  png: http://imgur.com/en9HE8L

Tested-by: Emilio G. Cota <cota@braap.org>
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-06-05 09:25:42 -07:00
Richard Henderson
0c240785a8 tcg/ppc: Implement goto_ptr
Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-06-05 09:25:42 -07:00
Emilio G. Cota
5cb4ef80f6 tcg/i386: implement goto_ptr
Suggested-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <1493263764-18657-6-git-send-email-cota@braap.org>
[rth: Reuse goto_ptr epilogue for exit_tb 0.]
Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-06-05 09:25:42 -07:00
Emilio G. Cota
cedbcb0152 tcg: Introduce goto_ptr opcode and tcg_gen_lookup_and_goto_ptr
Instead of exporting goto_ptr directly to TCG frontends, export
tcg_gen_lookup_and_goto_ptr(), which calls goto_ptr with the pointer
returned by the lookup_tb_ptr() helper. This is the only use case
we have for goto_ptr and lookup_tb_ptr, so having this function is
very convenient. Furthermore, it trivially allows us to avoid calling
the lookup helper if goto_ptr is not implemented by the backend.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <1493263764-18657-2-git-send-email-cota@braap.org>
Message-Id: <1493263764-18657-3-git-send-email-cota@braap.org>
Message-Id: <1493263764-18657-4-git-send-email-cota@braap.org>
Message-Id: <1493263764-18657-5-git-send-email-cota@braap.org>
[rth: Squashed 4 related commits.]
Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-06-05 09:25:42 -07:00
Aurelien Jarno
2f5a5f5774 tcg/mips: fix field extraction opcode
The "msb" argument should correspond to (len - 1).

Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2017-05-06 12:48:53 +02:00
Richard Henderson
79b1af9062 tcg: Initialize return value after exit_atomic
Users of tcg_gen_atomic_cmpxchg and do_atomic_op rightfully utilize
the output.  Even though this code is dead, it gets translated, and
without the initialization we encounter a tcg_error.

Reported-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
Tested-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
Tested-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-04-26 19:26:11 +02:00
Peter Maydell
fa54abb8c2 Drop QEMU_GNUC_PREREQ() checks for gcc older than 4.1
We already require gcc 4.1 or newer (for the atomic
support), so the fallback codepaths for older gcc
versions than that are now dead code and we can
just delete them.

NB: clang reports itself as gcc 4.2 (regardless of
clang version), so clang won't be using the fallbacks
either.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
2017-04-20 18:33:33 +01:00
Peter Maydell
5c32be5baf tcg/sparc: Zero extend address argument to ld/st helpers
The C store helper functions take the address argument as a
target_ulong type; if this is 32 bit but the host is 64 bit
then the SPARC calling convention requires that the caller
must zero extend the value. We weren't doing this, which
meant we could pass values to the caller with high bits set
and QEMU would crash if it was compiled with optimizations.
In particular, the i386 BIOS would not start.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 1490871151-29029-3-git-send-email-peter.maydell@linaro.org
Reviewed-by: Richard Henderson <rth@twiddle.net>
2017-04-03 12:59:37 +01:00
Peter Maydell
709a340d67 tcg/sparc: Zero extend data argument to store helpers
The C store helper functions take the data argument as a uint8_t,
uint16_t, etc depending on the store size. The SPARC calling
convention requires that data types smaller than the register
size must be extended by the caller. We weren't doing this,
which meant that if QEMU was compiled with optimizations enabled
we could end up storing incorrect values to guest memory.
(In particular the i386 guest BIOS would crash on startup.)

Add code to the trampolines that call the store helpers to
do the zero extension as required.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 1490871151-29029-2-git-send-email-peter.maydell@linaro.org
Reviewed-by: Richard Henderson <rth@twiddle.net>
2017-04-03 12:59:14 +01:00
Paolo Bonzini
30f3dda24b Merge branch 'icount-update' into HEAD
Merge the original development branch due to breakage caused by the
MTTCG merge.

Conflicts:
	cpu-exec.c
	translate-common.c

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-03-03 16:39:18 +01:00
Pranith Kumar
dc1eccd661 aarch64: Change ext type to TCGType to fix warnings
To fix the following warnings:

In file included from /users/pranith/qemu/tcg/tcg.c:255:
/users/pranith/qemu/tcg/aarch64/tcg-target.inc.c:879:24: warning: implicit conversion from enumeration type 'TCGMemOp' (aka 'enum TCGMemOp') to different enumeration type 'TCGType' (aka 'enum TCGType')
      [-Wenum-conversion]
        tcg_out_cmp(s, ext, a, b, b_const);
        ~~~~~~~~~~~    ^~~
/users/pranith/qemu/tcg/aarch64/tcg-target.inc.c:893:36: warning: implicit conversion from enumeration type 'TCGMemOp' (aka 'enum TCGMemOp') to different enumeration type 'TCGType' (aka 'enum TCGType')
      [-Wenum-conversion]
        tcg_out_insn(s, 3201, CBZ, ext, a, offset);
        ~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~
/users/pranith/qemu/tcg/aarch64/tcg-target.inc.c:389:65: note: expanded from macro 'tcg_out_insn'
    glue(tcg_out_insn_,FMT)(S, glue(glue(glue(I,FMT),_),OP), ## __VA_ARGS__)
                                                                ^
/users/pranith/qemu/tcg/aarch64/tcg-target.inc.c:895:37: warning: implicit conversion from enumeration type 'TCGMemOp' (aka 'enum TCGMemOp') to different enumeration type 'TCGType' (aka 'enum TCGType')
      [-Wenum-conversion]
        tcg_out_insn(s, 3201, CBNZ, ext, a, offset);
        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~
/users/pranith/qemu/tcg/aarch64/tcg-target.inc.c:389:65: note: expanded from macro 'tcg_out_insn'
    glue(tcg_out_insn_,FMT)(S, glue(glue(glue(I,FMT),_),OP), ## __VA_ARGS__)
                                                                ^
/users/pranith/qemu/tcg/aarch64/tcg-target.inc.c:1610:27: warning: implicit conversion from enumeration type 'TCGType' (aka 'enum TCGType') to different enumeration type 'TCGMemOp' (aka 'enum TCGMemOp')
      [-Wenum-conversion]
        tcg_out_brcond(s, ext, a2, a0, a1, const_args[1], arg_label(args[3]));
        ~~~~~~~~~~~~~~    ^~~

Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
Message-Id: <20170217154311.13920-1-bobby.prani@gmail.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-03-01 08:28:06 +11:00
Alex Bennée
ca759f9e38 tcg: enable MTTCG by default for ARM on x86 hosts
This enables the multi-threaded system emulation by default for ARMv7
and ARMv8 guests using the x86_64 TCG backend. This is because on the
guest side:

  - The ARM translate.c/translate-64.c have been converted to
    - use MTTCG safe atomic primitives
    - emit the appropriate barrier ops
  - The ARM machine has been updated to
    - hold the BQL when modifying shared cross-vCPU state
    - defer powerctl changes to async safe work

All the host backends support the barrier and atomic primitives but
need to provide same-or-better support for normal load/store
operations.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Acked-by: Peter Maydell <peter.maydell@linaro.org>
Tested-by: Pranith Kumar <bobby.prani@gmail.com>
Reviewed-by: Pranith Kumar <bobby.prani@gmail.com>
2017-02-24 10:32:46 +00:00
KONRAD Frederic
8d4e9146b3 tcg: add options for enabling MTTCG
We know there will be cases where MTTCG won't work until additional work
is done in the front/back ends to support. It will however be useful to
be able to turn it on.

As a result MTTCG will default to off unless the combination is
supported. However the user can turn it on for the sake of testing.

Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com>
[AJB: move to -accel tcg,thread=multi|single, defaults]
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
2017-02-24 10:32:45 +00:00
Alex Bennée
2093714314 tcg: move TCG_MO/BAR types into own file
We'll be using the memory ordering definitions to define values for
both the host and guest. To avoid fighting with circular header
dependencies just move these types into their own minimal header.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
2017-02-24 10:32:45 +00:00
Paolo Bonzini
1aab16c28a cpu-exec: unify icount_decr and tcg_exit_req
The icount interrupt flag and tcg_exit_req serve almost the same
purpose, let's make them completely the same.

The former TB_EXIT_REQUESTED and TB_EXIT_ICOUNT_EXPIRED cases are
unified, since we can distinguish them from the value of the
interrupt flag.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-02-22 14:56:34 +01:00
Stefan Weil
77e217d1bf tci: Remove invalid assertions
tb_jmp_insn_offset and tb_jmp_reset_offset are pointers
and cannot be used with ARRAY_SIZE.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Message-id: 20170202195601.11286-1-sw@weilnetz.de
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-02-03 11:38:55 +00:00
Richard Henderson
39f099ec9d tcg/i386: Always use TZCNT when available
I think this is cleaner than sometimes using BSF.

Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-01-17 12:02:08 -08:00
Richard Henderson
9bf38308f6 Revert "tcg/i386: Rely on undefined/undocumented behaviour of BSF/BSR"
This reverts commit 4ac7691073.

This fixes
  http://lists.nongnu.org/archive/html/qemu-devel/2017-01/msg03062.html

While I think we could get away with relying on the undocumented
behaviour, the tcg constraint system isn't powerful enough to
properly describe the required (non-)overlap conditions.

Reported-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-01-17 11:59:13 -08:00
Richard Henderson
8cf9a3d3f7 tcg/aarch64: Fix tcg_out_movi
There were some patterns, like 0x0000_ffff_ffff_00ff, for which we
would select to begin a multi-insn sequence with MOVN, but would
fail to set the 0x0000 lane back from 0xffff.

Signed-off-by: Richard Henderson <rth@twiddle.net>
Message-Id: <20161207180727.6286-3-rth@twiddle.net>
2017-01-13 11:47:29 -08:00
Richard Henderson
b1eb20da62 tcg/aarch64: Fix addsub2 for 0+C
When al == xzr, we cannot use addi/subi because that encodes xsp.
Force a zero into the temp register for that (rare) case.

Signed-off-by: Richard Henderson <rth@twiddle.net>
Message-Id: <20161207180727.6286-2-rth@twiddle.net>
2017-01-13 11:46:27 -08:00
Richard Henderson
a32b6ae897 tcg/s390: Fix merge error with facilities
The variable was renamed s390_facilities.

Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-01-13 09:30:40 -08:00
Richard Henderson
993508e43e tcg/i386: Handle ctpop opcode
Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-01-10 08:49:59 -08:00
Richard Henderson
33e75fb9c8 tcg/ppc: Handle ctpop opcode
Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-01-10 08:49:59 -08:00
Richard Henderson
14e99210f6 tcg: Use ctpop to generate ctz if needed
Particularly when andc is also available, this is two insns
shorter than using clz to compute ctz.

Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-01-10 08:49:59 -08:00
Richard Henderson
a768e4e992 tcg: Add opcode for ctpop
The number of actual invocations of ctpop itself does not warrent
an opcode, but it is very helpful for POWER7 to use in generating
an expansion for ctz.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-01-10 08:48:56 -08:00
Richard Henderson
086920c2c8 tcg: Add helpers for clrsb
The number of actual invocations does not warrent an opcode,
and the backends generating it.  But at least we can eliminate
redundant helpers.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-01-10 08:47:48 -08:00
Richard Henderson
4ac7691073 tcg/i386: Rely on undefined/undocumented behaviour of BSF/BSR
The ISA manual documents the output is undefined if the input was zero.

However, we document in target-i386 that the behavior of real silicon
is to preserve the contents of the output register.  We also mention
that there are real applications that depend on this.  That this is
baked into silicon is mentioned as a potential cause for some false
sharing behaviour wrt lzcnt/tzcnt.

Taking advantage of this allows us to save 2 insns in the normal case,
and 4 insns for i686 emulating a 64-bit clz.

Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-01-10 08:47:48 -08:00
Richard Henderson
bbf25f90ba tcg/i386: Handle ctz and clz opcodes
Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-01-10 08:47:48 -08:00
Richard Henderson
6a5aed4bdc tcg/i386: Allow bmi2 shiftx to have non-matching operands
Previously we could not have different constraints for different ISA levels,
which prevented us from eliding the matching constraint for shifts.

We do now have to make sure that the operands match for constant shifts.
We can also handle some small left shifts via lea.

Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-01-10 08:47:48 -08:00
Richard Henderson
42d5b51492 tcg/i386: Hoist common arguments in tcg_out_op
Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-01-10 08:47:48 -08:00
Richard Henderson
cd26449a50 tcg/i386: Fuly convert tcg_target_op_def
Use a switch instead of searching a table.  Share constraints between
32-bit and 64-bit, when at all possible.

Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-01-10 08:47:48 -08:00
Richard Henderson
ce411066f4 tcg/s390: Handle clz opcode
Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-01-10 08:47:48 -08:00
Richard Henderson
2a1d9d41ae tcg/mips: Handle clz opcode
Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-01-10 08:47:48 -08:00
Richard Henderson
cc0fec8a4d tcg/arm: Handle ctz and clz opcodes
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-01-10 08:06:11 -08:00
Richard Henderson
53c76c1990 tcg/aarch64: Handle ctz and clz opcodes
Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-01-10 08:06:11 -08:00
Richard Henderson
d0b07481fa tcg/ppc: Handle ctz and clz opcodes
Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-01-10 08:06:11 -08:00
Richard Henderson
0e28d0063b tcg: Add clz and ctz opcodes
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-01-10 08:06:11 -08:00
Richard Henderson
17280ff4a5 tcg: Allow an operand to be matching or a constant
This allows an output operand to match an input operand
only when the input operand needs a register.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-01-10 08:06:11 -08:00
Richard Henderson
069ea736b5 tcg: Pass the opcode width to target_parse_constraint
This will let us choose how to interpret a given constraint
depending on whether the opcode is 32- or 64-bit.  Which will
let us share more constraint combinations between opcodes.

At the same time, change the interface to return the advanced
pointer instead of passing it in/out by reference.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-01-10 08:06:11 -08:00
Richard Henderson
f69d277ece tcg: Transition flat op_defs array to a target callback
This will allow the target to tailor the constraints to the
auto-detected ISA extensions.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-01-10 08:06:11 -08:00
Richard Henderson
82790a8709 tcg: Add markup for output requires new register
This is the same concept as, and same markup as, the
early clobber markup in gcc.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-01-10 08:06:11 -08:00
Richard Henderson
333b21b809 tcg/optimize: Fold movcond 0/1 into setcond
Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-01-10 08:06:10 -08:00
Richard Henderson
752b1be947 tcg/s390: Support deposit into zero
Since we can no longer use matching constraints, this does
mean we must handle that data movement by hand.

Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-01-10 08:06:10 -08:00