With the linked list scheme we need not leave nops in the stream
that we need to process later.
Reviewed-by: Bastian Koppelmann <kbastian@mail.uni-paderborn.de>
Signed-off-by: Richard Henderson <rth@twiddle.net>
The previous setup required ops and args to be completely sequential,
and was error prone when it came to both iteration and optimization.
Reviewed-by: Bastian Koppelmann <kbastian@mail.uni-paderborn.de>
Signed-off-by: Richard Henderson <rth@twiddle.net>
The method by which we count the number of ops emitted
is going to change. Abstract that away into some inlines.
Reviewed-by: Bastian Koppelmann <kbastian@mail.uni-paderborn.de>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Some of these functions are really quite large. We have a number of
things that ought to be circularly dependent, but we duplicated code
to break that chain for the inlines.
This saved 25% of the code size of one of the translators I examined.
Reviewed-by: Bastian Koppelmann <kbastian@mail.uni-paderborn.de>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Currently 'info jit' outputs half of the information to monitor and the
rest to qemu log. Dumping opcode counts to monitor as a part of 'info
jit' command doesn't sound useful. Add new monitor command 'info
opcount' that only dumps opcode counters.
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Instead of using structures, which imply some amount of overhead
on certain ABIs, use pointer types.
This actually reduces the size of the binaries vs a NON-debug
build on ppc64 and x86_64, due to a reduction in the number of
sign-extension insns.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Unify pieces of cpu-all.h, exec-all.h, softmmu_exec.h and tcg/tcg.h
into a single new header file with all helpers.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Rather than special casing them, use the standard mechanisms
for tcg helper generation.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Avoid allocating a tcg temporary to hold the constant address,
and instead place it directly into the op_call arguments.
At the same time, convert to the newly introduced tcg_out_call
backend function, rather than invoking tcg_out_op for the call.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Now that all backends do define TCG_TARGET_INSN_UNIT_SIZE,
remove the fallback definition.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
To be defined by the tcg backend based on the elemental unit of the ISA.
During the transition, allow TCG_TARGET_INSN_UNIT_SIZE to be undefined,
which allows us to default tcg_insn_unit to the current uint8_t.
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Instead require either mulu2_i32 or muluh_i32. The code in tcg-op.h
already supports looking for both. Previous incomplete conversion?
Signed-off-by: Richard Henderson <rth@twiddle.net>
We have macros for marking TCGv values as unused, checking if they
are unused and comparing them to each other. However these only exist
for TCGv_i32 and TCGv_i64; add them for TCGv_ptr as well.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
We previously allocated 32-bits per temp for the next_free_temp entry.
We now allocate 4 bits per temp across the 4 bitmaps.
Using a linked list meant that if a translator is tweeked, resulting in
temps being freed in a different order, that would have follow-on effects
throughout the TB. Always allocating the lowest free temp means that
follow-on effects are minimized, which can make it easier to diff output
when debugging the translators.
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Step three in the transition: helpers not tied to the target
"default" endianness. To be used when the guest uses a memory
operation with non-default endianness.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Slightly changes the interface, in that we now return name
instead of a TCGHelperInfo structure, which goes away.
Reviewed-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Richard Henderson <rth@twiddle.net>
The _cmmu helpers can be moved to exec-all.h. The helpers that are
used from TCG will shortly need access to tcg_target_long so move
their declarations into tcg.h.
This requires minor include adjustments to all TCG backends.
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
There are several hosts for which it would be useful to use the
available 64-bit registers in a 32-bit pointer environment.
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Use them in places where mulu2 and muls2 are used.
Optimize mulx2 with dead low part to mulxh.
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Expand the definition of "not present" to include "should not be present".
This means we can simplify the logic surrounding the generic tcg opcodes
for which the host backend ought not be providing definitions.
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
There are several hosts with only a "div" insn. Remainder is computed
manually from the quotient and inputs. We can do this generically.
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Fix some of the nasty TCG race conditions and crashes by implementing
cpu_exit() as setting a flag which is checked at the start of each TB.
This avoids crashes if a thread or signal handler calls cpu_exit()
while the execution thread is itself modifying the TB graph (which
may happen in system emulation mode as well as in linux-user mode
with a multithreaded guest binary).
This fixes the crashes seen in LP:668799; however there are another
class of crashes described in LP:1098729 which stem from the fact
that in linux-user with a multithreaded guest all threads will
use and modify the same global TCG date structures (including the
generated code buffer) without any kind of locking. This means that
multithreaded guest binaries are still in the "unsupported"
category.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Document tcg_qemu_tb_exec(). In particular, its return value is a
combination of a pointer to the next translation block and some
extra information in the low two bits. Provide some #defines for
the values passed in these bits to improve code clarity.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
It's worth to clean-up translation blocks variables and move them
into one context as was suggested by Swirl.
Also if we use this context directly inside tcg_ctx, then it
speeds up code generation a bit.
Signed-off-by: Evgeny Voevodin <evgenyvoevodin@gmail.com>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Add optimized TCG qemu_ld/st generation which locates the code of TLB miss
cases at the end of a block after generating the other IRs.
Currently, this optimization supports only i386 and x86_64 hosts.
Signed-off-by: Yeongkyoon Lee <yeongkyoon.lee@samsung.com>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>