All of the helpers with the explicit big/little endian option
require the return address as a parameter. Acquire this via
a trampoline.
Move the load of areg0 into the trampoline.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Pass address registers explicitly, rather than as indicies of args[].
It's two argument registers either way. Use more TCGReg as appropriate.
Signed-off-by: Richard Henderson <rth@twiddle.net>
We were computing the full address into %o0 and then not using it.
Adjust some of the computation to rely less on having to pull immediate
values into registers.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Cleaning up the implementation of tcg_out_movi at the same time.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Claudio Fontana <claudio.fontana@huawei.com>
Tested-by: Claudio Fontana <claudio.fontana@huawei.com>
Clean up multiply at the same time.
For remainder, generic code will produce mul+sub,
whereas we can implement with msub.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Claudio Fontana <claudio.fontana@huawei.com>
Tested-by: Claudio Fontana <claudio.fontana@huawei.com>
Also tidy the implementation of ubfm, sbfm, extr in order to share code.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Claudio Fontana <claudio.fontana@huawei.com>
Tested-by: Claudio Fontana <claudio.fontana@huawei.com>
Handle a simplified set of logical immediates for the moment.
The way gcc and binutils do it, with 52k worth of tables, and
a binary search depth of log2(5334) = 13, seems slow for the
most common cases.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Claudio Fontana <claudio.fontana@huawei.com>
Tested-by: Claudio Fontana <claudio.fontana@huawei.com>
Avoid the magic numbers in the current implementation.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Claudio Fontana <claudio.fontana@huawei.com>
Tested-by: Claudio Fontana <claudio.fontana@huawei.com>
This merges the implementation of tcg_out_addi and tcg_out_subi.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Claudio Fontana <claudio.fontana@huawei.com>
Tested-by: Claudio Fontana <claudio.fontana@huawei.com>
Converting the add/sub (3.5.2) and logical shifted (3.5.10) instruction
groups to the new scheme.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Claudio Fontana <claudio.fontana@huawei.com>
Tested-by: Claudio Fontana <claudio.fontana@huawei.com>
Commit 023261ef85 failed to remove a
nop that's no longer required.
Reviewed-by: Claudio Fontana <claudio.fontana@huawei.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
At first glance the code appears to be using 1's compliment encoding,
a-la AArch32. Except that the constant is "off", creating a complicated
split field 2's compliment encoding.
Much clearer to just use a normal mask and shift.
Reviewed-by: Claudio Fontana <claudio.fontana@huawei.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
It was unused. Let's not overcomplicate things before we need them.
Reviewed-by: Claudio Fontana <claudio.fontana@huawei.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
This reduces the code size of the function significantly.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Claudio Fontana <claudio.fontana@huawei.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
We assert that the values for _I32 and _I64 are 0 and 1 respectively.
This will make a couple of functions declared by tcg.c cleaner.
Reviewed-by: Claudio Fontana <claudio.fontana@huawei.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Removed from other targets in 56bbc2f967.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Claudio Fontana <claudio.fontana@huawei.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Win32 doesn't have a cpuid.h, and MacOSX may have one but without
the __cpuid() function we use, which means that commit 9d2eec20
broke the build for those platforms. Fix this by tightening up
our configure cpuid.h check to test that the functions we need
are present, and adding some missing #ifdef guards in
tcg/i386/tcg-target.c.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
These three-operand shift instructions do not require the shift count
to be placed into ECX. This reduces the number of mov insns required,
with the mere addition of a new register constraint.
Don't attempt to get rid of the matching constraint, as that's impossible
to manipulate with just a new constraint. In addition, constant shifts
still need the matching constraint.
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Note that the optimizer cannot simplify ANDC X,Y,C to AND X,Y,~C
so we must handle constants in the implementation of andc.
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
These are not needed by users of tcg-target.h. No need to recompile
when we adjust them.
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Recognize 0 operand to andc, and -1 operands to and, orc, eqv.
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Like we already do for SUB and XOR.
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Given, of course, an appropriate constant. These could be generated
from the "canonical" operation for inversion on the guest, or via
other optimizations.
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
The shl_i32 op might set some bits of the unused 32 high bits of the
mask. Fix that by clearing the unused 32 high bits for all 32-bit ops
except load/store which operate on tl values.
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Known-zero bits optimization is a great idea that helps to generate more
optimized code. However the current implementation only works in very few
cases as the computed mask is not saved.
Fix this to make it really working.
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
32-bit versions of sar and shr ops should not propagate known-zero bits
from the unused 32 high bits. For sar it could even lead to wrong code
being generated.
Cc: qemu-stable@nongnu.org
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
It's this that should be subtracted from 0x20 when converting to a right rotate.
Cc: qemu-stable@nongnu.org
Signed-off-by: Huw Davies <huw@codeweavers.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
The second half register of a 64-bit temp on a 32-bit host
was allocated with the wrong base_type.
The base_type of the second half register is never checked,
but for consistency it should be the same as the first half.
Signed-off-by: Richard Henderson <rth@twiddle.net>