Now that CONFIG_TCG_PASS_AREG0 has been removed, it's easier to get
an optimal code for the load/store functions.
First swap the two registers used in tcg_out_tlb_load() so that the
address end-up in the second register instead of the first one. Adjust
tcg_out_qemu_ld() and tcg_out_qemu_st() to respectively call
tcg_out_qemu_ld_direct() and tcg_out_qemu_st_direct() with the correct
registers. Then replace the register shifting by direct load of the
arguments.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
We had a hack for arm and sparc, allocating code_gen_prologue to a
special section. Which, honestly does no good under certain cases.
We've already got limits on code_gen_buffer_size to ensure that all
TBs can use direct branches between themselves; reuse this limit to
ensure the prologue is also reachable.
As a bonus, we get to avoid marking a page of the main executable's
data segment as executable.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Like add2, do operand ordering, constant folding, and dead operand
elimination. The latter happens about 15% of all mulu2 during an
x86_64 bios boot.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
When x86_64 guest is not in 64-bit mode, the high-part of the 64-bit
add is dead. When the host is 32-bit, we can simplify to 32-bit
arithmetic.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
We can re-use these for implementing double-word folding.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
This saves a whole lot of repetitive code sequences.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Reduces code duplication and prefers
movcond d, c1, c2, const, s
to
movcond d, c1, c2, s, const
It also prefers
add r, r, c
over
add r, c, r
when both inputs are known constants. This doesn't matter for true add, as
we will fully constant fold that. But it matters for a follow-on patch using
this routine for add2 which may not be fully foldable.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Note that in the general reg=reg,reg case we're restricted
to 16-bit insertions. This makes it easy to allow "any"
constant as input, as post-truncation it will fit into the
constant load insn for which we have room in the bundle.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
It is possible to slightly optimize the TLB access code, by replacing
the movi + and instructions by a deposit instruction.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Remove suboptimal register shifting in qemu_ld/st ops, introduced at the
CONFIG_TCG_PASS_AREG0 time.
As mem_idx is now loaded in register R58/R59 for the slow path, we have
to make sure to do it last, to not add additional register constraints.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Implement movcond_i32/64 on ia64 hosts. It is not possible to have
immediate compare arguments without adding a new bundle, but it is
possible to have 22-bit immediate value arguments.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Use stack instead of temp_buf array in CPUState for TCG temps.
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Implement movcond_i32 for ARM, as the sequence
mov dst, v2 (implicitly done by the tcg common code)
cmp c1, c2
movCC dst, v1
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
The code to emit either an immediate cmp or a register cmp insn is
duplicated in several places; factor it out into its own function.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Now that we're always sparcv9, we can not bother using Bicc for
32-bit branches and BPcc for 64-bit branches and instead always
use BPcc.
New interfaces allow less direct use of tcg_out32 and raw numbers
inside the qemu_ld/st routines.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
We must care not to clobber the high parts before we consume them.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
The set of comparisons that can immediately use the carry are LTU/GEU,
not LTU/LEU. Don't swap operands when we need a temp register; the
register may already be in use from setcond2.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
The datalo variable is still live in the miss path. Use another
when reconstructing the full data value.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Like brcond2, use tcg_high_cond. Use movcc instead of branches.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
GUEST_BASE support is now supported by all TCG backends, and is
now mandatory. Drop the now-pointless TCG_TARGET_HAS_GUEST_BASE
define (set by every backend) and the error if it is unset.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
The pointer entry 'temps' always refers to the array entry 'static_temps'.
Removing the pointer and renaming 'static_temps' to 'temps' reduces the
size of TCGContext (4 or 8 byte) and allows better code generation.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
* 'trivial-patches' of git://github.com/stefanha/qemu:
versatilepb: Use symbolic indices for ARM PIC
qdev: kill bogus comment
qemu-barrier: Fix compiler version check for future gcc versions
hw: Add missing 'static' attribute for QEMUMachine
cleanup useless return sentence
qemu-sockets: Fix compiler warning (regression for MinGW)
vnc: Fix spelling (hellmen -> hellman) in comment
slirp: Fix spelling in comment (enought -> enough, insure -> ensure)
tcg/arm: Use tcg_out_mov_reg rather than inline equivalent code
cpu: Add missing 'static' attribute to qemu_global_mutex
configure: Support empty target list (--target-list=)
hw: Fix return value check for bdrv_read, bdrv_write
The table that was recently added for hppa is generally usable.
And with the renumbering of the TCG_COND constants it's not too
difficult to compute rather than have a table.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
There are several cases that can be handled easier inside both
translators and code generators if we have out-of-band values
for conditions. It's easy enough to handle ALWAYS and NEVER in
the natural way inside the tcg middle-end.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Before we rearrange the TCG_COND enumeration, add a predicate for
the (single) use of comparisons vs TCGCond.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
The TCG jmp operation doesn't really make sense in the QEMU context, it
is unused, it is not implemented by some targets, and it is wrongly
implemented by some others.
This patch simply removes it.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Acked-by: Blue Swirl <blauwirbel@gmail.com>
Acked-by: Stefan Weil<sw@weilnetz.de>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Use the recently introduced tcg_out_mov_reg() function rather than
the equivalent inline code.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Stefan Hajnoczi <stefanha@gmail.com>
Support for helper functions with 5 arguments was missing
in the code generator and in the interpreter.
There is no need to pass the constant TCG_AREG0 from the
code generator to the interpreter. Remove that code for
the INDEX_op_qemu_st* opcodes.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
The movcond_i32 op has to be protected with TCG_TARGET_HAS_movcond_i32
to fix the build with -march < i686.
Thanks to Richard Henderson for the hint.
Reported-by: Alex Barcelo <abarcelo@ac.upc.edu>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
When movcond_i32 is available we can further reduce the generated
op count from 12 to 6, and the generated code size on i686 from
88 to 74 bytes.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Avoiding 64-bit arithmetic (outside of the compare) reduces the
generated op count from 15 to 12, and the generated code size on
i686 from 105 to 88 bytes.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Checking that we don't try for idx != [01] is trivial. Checking
that we don't issue more than one of any index requires a tad
more data and some ifdefs protecting that new variable.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Given these are constants, checking once here means everything
after can assume they're correct.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Like the C assert macro, except only enabled for CONFIG_DEBUG_TCG,
and without having to set _NDEBUG and disable all other asserts at
the same time.
The use of __builtin_unreachable (when available) gives the compiler
the same information, which may (or may not) help it optimize better.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
For tcg_gen_concat_i32_i64 we only use deposit if the host supports it.
For tcg_gen_concat32_i64 even if the host does not, as we get identical
code before and after.
Note that this relies on the ANDI -> EXTU patch for the identity claim.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Note that xori_i64 failed to perform even the minimal
optimizations promised by the README.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Note that andi_i64 failed to perform even the minimal
optimizations promised by the README.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
The README file documented the operand ordering of the tcg_gen_*
functions. Since we're documenting opcodes here, use the true
operand ordering.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Cc: malc <av1474@comtv.ru>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Fix the MIPS32(R2) cpu detection so that it also works with
-march=octeon. Thanks to Andrew Pinski for the hint.
Cc: Andrew Pinski <apinski@cavium.com>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
* 'tcg-sparc' of git://repo.or.cz/qemu/rth:
tcg-sparc: Preserve branch destinations during retranslation
tcg-sparc: Fix and enable direct TB chaining.
tcg-sparc: Add %g/%o registers to alloc_order
tcg-sparc: Use defines for temporaries.
tcg-sparc: Mask shift immediates to avoid illegal insns.
tcg-sparc: Clean up cruft stemming from attempts to use global registers.
tcg-sparc: Change AREG0 in generated code to %i0.
tcg-sparc: Support GUEST_BASE.
tcg-sparc: Fix qemu_ld/st to handle 32-bit host.
tcg-sparc: Assume v9 cpu always, i.e. force v8plus in 32-bit mode.
tcg-sparc: Don't MAP_FIXED on top of the program
tcg-sparc: Fix ADDX opcode.
tcg-sparc: Hack in qemu_ld/st64 for 32-bit.
linux-user: Use memcpy in get_user/put_user.
The TCG targets no longer need individual implementations.
Since commit 6a18ae2d29,
'flags' is no longer used in tcg_target_get_call_iarg_regs_count.
The remaining tcg_target_get_call_iarg_regs_count is trivial and only
called once. Therefore the patch eliminates it completely.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
32 bit x86 hosts don't need registers for helper function arguments
because they use the default stack based calling convention.
Removing the registers allows simpler code for function
tcg_target_get_call_iarg_regs_count.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
While 64 bit hosts use the first three registers which are also used
as function input parameters, 32 bit hosts use TCG_REG_EAX and
TCG_REG_EDX which are not used in parameter passing.
After defining new register macros for the registers used in L
constraint, the patch replaces most occurrences of
tcg_target_call_iarg_regs[0], tcg_target_call_iarg_regs[1] and
tcg_target_call_iarg_regs[2] by those new macros.
tcg_target_call_iarg_regs remains unchanged when it is used for input
arguments (only with 64 bit hosts) before tcg_out_calli.
A comment related to those registers was fixed, too.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
[aurel32: build fix on i386, small optimization for i386 in the prologue]
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
TCG uses 6 registers for function arguments on 64 bit Linux hosts,
but only 4 registers on W64 hosts.
Commit 2999a0b200 increased the number
of arguments for some important helper functions from 4 to 5
which triggered a bug for W64 hosts: QEMU aborts when executing
helper_lcall_real in the guest's BIOS because function
tcg_target_get_call_iarg_regs_count always returned 6.
As W64 has only 4 registers for arguments, the 5th argument must be
passed on the stack using a correct stack offset.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Commit 25c4d9cc changed all TCGOpcode enums to be available, so we don't
need to #ifdef #endif the one that are available only on some targets.
This makes the code easier to read.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
The "op a, a, b" form is better handled on non-RISC host than the "op
a, b, a" form, so swap the arguments to this form when possible, and
when b is not a constant.
This reduces the number of generated instructions by a tiny bit.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
When both argument of brcond/movcond/setcond are the same or when one
of the two values is a constant equal to zero, it's possible to do
further optimizations.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Now that it's possible to detect copies, we can optimize the case
the "op r, a, a => movi r, 0". This helps in the computation of
overflow flags when one of the two args is 0.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Now that we can easily detect all copies, we can optimize the
"op r, a, a => mov r, a" case a bit more.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
It is possible to due copy propagation for all operations, even the one
that have side effects or clobber arguments (it only concerns input
arguments). That said, the call operation should be handled differently
due to the variable number of arguments.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
The copy propagation pass tries to keep track what is a copy of what
and what has copy of what, and in addition it keep a circular list of
of all the copies. Unfortunately this doesn't fully work: a mov from
a temp which has a state "COPY" changed it into a state "HAS_COPY".
Later when this temp is used again, it is considered has not having
copy and thus no propagation is done.
This patch fixes that by removing the hiearchy between copies, and thus
only keeping a "COPY" state both meaning "is a copy" and "has a copy".
The decision of which copy to use is deferred to the actual temp
replacement. At this stage there is not one best choice to do, but only
better choices than others. For doing the best choice the operation
would have to be parsed in reversed to know if a temp is going to be
used later or not. That what is done by the liveness analysis. At this
stage it is known that globals will be always live, that local temps
will be dead at the end of the translation block, and that the temps
will be dead at the end of the basic block. This means that this stage
should try to replace temps by local temps or globals and local temps
by globals.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
The copy propagation doesn't check the types of the temps during copy
propagation. However TCG is using the mov_i32 for the i64 to i32
conversion and thus the two are not equivalent.
With this patch tcg_opt_gen_mov() doesn't consider two temps of
different type as copies anymore.
So far it seems the optimization was not aggressive enough to trigger
this bug, but it will be triggered later in this series once the copy
propagation is improved.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
TCG_TEMP_ANY has no different meaning than TCG_TEMP_UNDEF, so use
the later instead.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
movcond operation can be implemented on MIPS32 Release 2 using the MOVN,
MOVZ, SLT and SLTU instructions.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
deposit operations can be optimized on MIPS32 Release 2 using the INS
instruction.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
rotr operations can be optimized on MIPS32 Release 2 using the ROTR and
ROTRV instructions. Also implemented rotl operations by subtracting the
shift from 32.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
bswap operations can be optimized on MIPS32 Release 2 using the ROTR,
WSBH and SEH instructions. We can't use the non-R2 code to implement the
ops due to registers constraints, so don't define the corresponding
TCG_TARGET_HAS_bswap* values.
Also bswap16* operations are supposed to be called with the 16 high bits
zeroed. This is the case everywhere (including for TCG by definition)
except when called from the store helper. Remove the AND instructions from
bswap16* and move it there.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
MIPS has some conditional branch instructions when comparing with zero.
Use them.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Use stack instead of temp_buf array in CPUState for TCG
temps.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Instead of int, use the correct TCGArg and TCGReg type: TCGReg when
representing a TCG target register, TCGArg when representing the latter
or a constant.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Recent versions of GCC emit warnings when compiling user mode targets.
Kill them by reordering a bit the #ifdef.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
The 'Z' constraint has been introduced to map the zero register. However
when the op also accept a constant, there is no point to accept the zero
register in addition.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
The xtensa-test image generates a sra_i32 with count 0x40.
Whether this is accident of tcg constant propagation or
originating directly from the instruction stream is immaterial.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Don't use -ffixed-gN. Don't link statically. Don't save/restore
AREG0 around calls. Don't allocate space on the stack for AREG0 save.
Signed-off-by: Richard Henderson <rth@twiddle.net>
At the same time, split out the tlb load logic to a new function.
Fixes the cases of two data registers and two address registers.
Fixes the signature of, and adds missing, qemu_ld/st opcodes.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Current code doesn't actually work in 32-bit mode at all. Since
no one really noticed, drop the complication of v7 and v8 cpus.
Eliminate the --sparc_cpu configure option and standardize macro
testing on TCG_TARGET_REG_BITS / HOST_LONG_BITS
Signed-off-by: Richard Henderson <rth@twiddle.net>
The CONFIG_TCG_PASS_AREG0 code for calling ld/st helpers
was not respecting the ABI requirement for 64-bit values
being aligned in registers.
Mirror the ARM port in use of helper functions to marshal
arguments into the correct registers.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Neither of these functions were performing double-word
compares properly.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Commit 6375e09e changed the type of TranslationBlock.tb_next,
but failed to change the type of TCGContext.tb_next.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Andreas Färber <afaerber@suse.de>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
While swapping constants to the second operand, swap
sources matching destinations to the first operand.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>