rotr_i32 calculates the amount to left shift and puts it into a
temporary, but then doesn't use it when doing the shift.
Cc: qemu-stable@nongnu.org
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
add2_i64 was adding the lower double word to the upper double word
of each input. Fix this so we add the lower double words, then the
upper double words with carry propagation.
Cc: qemu-stable@nongnu.org
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
If our input and output is in the same register, bswap64 tries to
undo a rotate of the input. This just ends up rotating the output.
Cc: qemu-stable@nongnu.org
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
The rldcl instruction doesn't have an sh field, so the minor opcode
is shifted 1 bit. We were using the XO30 macro which shifted the
minor opcode 2 bits.
Remove XO30 and add MD30 and MDS30 macros which match the
Power ISA categories.
Cc: qemu-stable@nongnu.org
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
# By Claudio Fontana (9) and others
# Via Peter Maydell
* pmaydell/tcg-aarch64.next:
MAINTAINERS: add tcg/aarch64 maintainer
configure: permit compilation on arm aarch64
tcg/aarch64: implement user mode qemu ld/st
user-exec.c: aarch64 initial implementation of cpu_signal_handler
tcg/aarch64: implement sign/zero extend operations
tcg/aarch64: implement byte swap operations
tcg/aarch64: implement AND/TEST immediate pattern
tcg/aarch64: improve arith shifted regs operations
tcg/aarch64: implement new TCG target for aarch64
include/elf.h: add aarch64 ELF machine and relocs
configure: Drop CONFIG_ATFILE test
linux-user: Drop direct use of openat etc syscalls
linux-user: Allow getdents to be provided by getdents64
Message-id: 1371052645-9006-1-git-send-email-peter.maydell@linaro.org
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
also put aarch64 in the list of archs that do not need an ldscript.
Signed-off-by: Jani Kokkoken <jani.kokkonen@huawei.com>
Signed-off-by: Claudio Fontana <claudio.fontana@huawei.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 51AF40EE.1000104@huawei.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
implement the optional sign/zero extend operations with the dedicated
aarch64 instructions.
Signed-off-by: Claudio Fontana <claudio.fontana@huawei.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Message-id: 51AC9A58.40502@huawei.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
implement the optional byte swap operations with the dedicated
aarch64 instructions.
Signed-off-by: Claudio Fontana <claudio.fontana@huawei.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Message-id: 51AC9A33.9050003@huawei.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
for arith operations, add SUBS, ANDS, ADDS and add a shift parameter
so that all arith instructions can make use of shifted registers.
Signed-off-by: Claudio Fontana <claudio.fontana@huawei.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Message-id: 51AC998B.7070506@huawei.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
add preliminary support for TCG target aarch64.
Signed-off-by: Claudio Fontana <claudio.fontana@huawei.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 51A5C596.3090108@huawei.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
We've got a compile-time check for the condition in exec/cpu-defs.h.
Reviewed-by: Andreas Färber <afaerber@suse.de>
Reviewed-by: liguang <lig.fnst@cn.fujitsu.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
When setcond2 is rewritten into setcond, the state of the destination
temp should be reset, so that a copy of the previous value is not
used instead of the result.
Reported-by: Michael Tokarev <mjt@tls.msk.ru>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Avoid the mini constant pool for armv7, and avoid replicating
the test for pre-v7.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Found by inspection, since the effect of the bug was simply to
send all memory ops through the slow path.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Move the slow path out of line, as the TODO's mention.
This allows the fast path to be unconditional, which can
speed up the fast path as well, depending on the core.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Work better with branch predition when we have movw+movt,
as the size of the code is the same. Perhaps re-evaluate
when we have a proper constant pool.
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
The schedule was fully serial, with no possibility for dual issue.
The old schedule had a minimal issue of 7 cycles; the new schedule
has a minimal issue of 5 cycles.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Share code between qemu_ld and qemu_st to process the tlb.
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Use even more primitive helper functions to avoid lots of duplicated code.
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Make the code more readable by only having one copy of the magic
numbers, swapping registers as needed prior to that. Speed the
compiler by not applying the rd == rn avoidance for v6 or later.
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
R12 is call clobbered, while R8 is call saved. This change
gives tcg one more call saved register for real data.
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
We get to re-use the _rIN and _rIK subroutines to handle the various
combinations of add vs sub. Fold the << 21 into the opcode enum values
so that we can explicitly add TO_CPSR as desired.
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
This greatly improves code generation for addition of small
negative constants.
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
This greatly improves the code we can produce for deposit
without armv7 support.
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
This makes it easier to verify changes to the code
generating the prologue.
[Aurelien: change the format from %i to %zu]
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
We were not allocating TCG_STATIC_CALL_ARGS_SIZE, so this meant that
any helper with more than 4 arguments would clobber the saved regs.
Realizing that we're supposed to have this memory pre-allocated means
we can clean up the tcg_out_arg functions, which were trying to do
more stack allocation.
Allocate stack memory for the TCG temporaries while we're at it.
Signed-off-by: Richard Henderson <rth@twiddle.net>
On 32-bit TCG targets, when emulating deposit_i64 with a mov_i32 +
deposit_i32, care should be taken to not overwrite the low part of
the second argument before the deposit when it is the same the
destination.
This fixes the shld instruction in qemu-system-x86_64, which in turns
fixes booting "system rescue CD version 2.8.0" on this target.
Reported-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
The TCG optimizer does great work when inserting constants, being able
to fold the open-coded deposit expansion to just an AND or an OR. Avoid
a bit the regression caused by having the deposit opcode by expanding
deposit of zero as an AND.
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Glibc 2.16 includes an easy way to get feature bits previously
buried in /proc or the program startup auxiliary vector. Use it.
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
There are a few simple special cases that should be handled first.
Break these out to subroutines to avoid code duplication.
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
It takes half the cycles to read one CR register instead of all 8.
This is a backward compatible addition to the ISA, so chips prior
to Power 2.00 spec will simply continue to read the entire CR register.
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Nothing else in the call chain ensures that these
constants don't have garbage in the high bits.
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
The optimization/bug being fixed is that tcg_out_cmp was not applying the
right type to loading a constant, in the case it can't be implemented
directly. Rather than recomputing the TCGType enum from the arch64 bool,
pass around the original TCGType throughout.
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
The mul_i32 pattern was loading non-16-bit constants into a register,
when we can get the middle-end to do that for us. The mul_i64 pattern
was not considering that MULLI takes 64-bit inputs.
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Since we have special code to handle and/or/xor with a constant,
apply the same to andc/orc/eqv with a constant.
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>