For Neon shifts by immediate and narrow, correctly handle the case
where the source registers and the destination registers overlap
(the second pass should use the original register contents, not the
results of the first pass).
This includes a refactoring to pull the size check outside the
loop rather than inside, since there is now very little common
code between the size == 3 and size != 3 case.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Pull the code which decodes narrowing operations as being either
signed/unsigned saturate or plain out into its own function.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Call the normal shift helpers instead of the rounding ones.
Signed-off-by: Christophe Lyon <christophe.lyon@st.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Move the implementation of the Neon VUZP unzip instruction from inline
code to helper functions. (At 50+ TCG ops it was well over the
recommended limit for coding inline.) The helper implementations also
give the correct answers where the inline implementation did not.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Move the implementation of the Neon VUZP unzip instruction from inline
code to helper functions. (At 50+ TCG ops it was well over the
recommended limit for coding inline.) The helper implementations also
fix the handling of the quadword version of the instruction.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
We handle Thumb Neon data processing instructions by converting them
into the equivalent ARM encoding, as the two are very close. However
the ARM encoding should have bit 28 set, not clear. This wasn't causing
any problems because we don't actually look at that bit during decode;
however it is better to do the conversion correctly to avoid problems
later if we add checks to UNDEF on SBZ/SBO bits.
Signed-off-by: Juha Riihimäki <juha.riihimaki@nokia.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
For VQDMLSL, negation has to occur after saturation, not before.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Refactor the handling of VQDMULL so that it is dealt with in
its own if() case rather than together with the accumulating
instructions.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Implement VMULL.P8 (the 32x32->64 version of the polynomial multiply
instruction).
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
The Neon half-precision conversion operations (VCVT.F16.F32 and
VCVT.F32.F16) use ARM standard floating-point arithmetic, unlike
the VFP versions (VCVTB and VCVTT).
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Fix bit mask used when widening the result of shift on narrow input.
Signed-off-by: Christophe Lyon <christophe.lyon@st.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
VQMOVUN does a signed-to-unsigned saturating conversion. This is
different from both the signed-to-signed and unsigned-to-unsigned
conversions already implemented, so we need a new set of helper
functions (neon_unarrow_sat*).
Signed-off-by: Juha Riihimäki <juha.riihimaki@nokia.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Refine the decoding of the Thumb preload and hint space, so we
UNDEF on the patterns that are supposed to UNDEF rather than NOP.
We also move the tests for this space earlier, so we don't emit
harmless but unnecessary address generation code for preload
hints (which by their nature are likely to be in hot code paths).
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Correct the decoding of the ARM preload and memory hint space,
by adding decoding of PLI, PLDW and the v7MP unallocated hint
space. This commit also corrects a slightly overexuberant
decoding of PLD(register) which was not checking that bit 4
was one.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
This patch fixes the errors reported by my tests in VSRA.
Signed-off-by: Christophe Lyon <christophe.lyon@st.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Fix the register and part of register we get the scalar from in
the various "multiply vector by scalar" ops (VMUL by scalar
and friends).
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
This patch fixes corner-case saturations, when the target range is
zero. It merely removes the guard against (sh == 0), and makes:
__ssat(0x87654321, 1) return 0xffffffff and set the saturation flag
__usat(0x87654321, 0) return 0 and set the saturation flag
Signed-off-by: Christophe Lyon <christophe.lyon@st.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Add support for logging the start of instructions in TCG
code debug dumps for ARM targets.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Edgar E. Iglesias <edgar.iglesias@gmail.com>
We were not correctly restoring the IT bits when resuming execution
after taking an unexpected exception in the middle of an IT block.
Fix this by tracking them along with PC changes and restoring in
gen_pc_load().
This fixes bug https://bugs.launchpad.net/qemu/+bug/581335
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Create a new function which does the common sequence of gen_set_condexec,
gen_set_pc_im, gen_exception, set is_jmp to DISAS_JUMP.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Remove a redundant call to gen_set_condexec() in the translation of Thumb
mode SWI. (SWI and WFI generate "exceptions" which happen after the
execution of the instruction, ie when PC and IT bits have updated.
So the condexec bits at this point are not correct. However, the code
that handles finishing the translation of the TB will write the correct
value of the condexec bits later, so the only effect was that a conditional
Thumb SWI would generate slightly worse code than necessary.)
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
When translating, get the user/priv state from the TB flags, not
the CPUState.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
When translating, the condexec bits for the TB are in the TB flags;
the CPUState condexec bits may be different.
This patch fixes https://bugs.launchpad.net/bugs/604872 where we might
segfault if we took an exception in the middle of a TB with an IT
block, because when we came to retranslate in cpu_restore_state()
the CPUState condexec bits would have advanced compared to the start
of the TB and we would generate different (wrong) code.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
The Thumb/ARM state for the TB being translated should come from
the TB flags, not the CPUState.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
When translating, the VFP vector length and stride for this TB are encoded
in the TB flags; the CPUState copies may be different and must not be used.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
When translating code, whether the VFP unit is enabled for this TB
is stored in a bit in the TB flags. Use this rather than incorrectly
reading the FPEXC from the CPUState passed to translation.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
When translating the SRS instruction, handle the "store registers
to stack of current mode" case in the helper function rather than
inline. This means the generated code does not make assumptions
about the current CPU mode which might not be valid when the TB
is executed later.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Fix errors in the decoding of ARM VQSHL/VQSHLU immediate forms,
including using the new VQSHLU helper functions where appropriate.
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
SMMLA and SMMLS are broken on both in normal and thumb mode, that is
both (different) implementations are wrong. They try to avoid a 64-bit
add for the rounding, which is not trivial if you want to support both
SMMLA and SMMLS with the same code.
The code below uses the same implementation for both modes, using the
code from the ARM manual. It also fixes the thumb decoding that was a
mix between normal and thumb mode.
This fixes the issues reported in
https://bugs.launchpad.net/qemu/+bug/629298
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
UMAAL should use unsigned multiply instead of signed.
This patch fixes this issue by handling UMAAL separately from
UMULL/UMLAL/SMULL/SMLAL as these instructions are different
enough. It also explicitly list instructions in case and catch
nonexistent instruction as illegal. Also fixes a few style issues.
This fixes the issues reported in
https://bugs.launchpad.net/qemu/+bug/696015
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Correct the arguments passed when generating neon qshl_{u,s}64()
helpers so that we use the correct registers.
Signed-off-by: Juha Riihimäki <juha.riihimaki@nokia.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
The translation of REVSH shifted the low byte 8 steps left before performing
an 8-bit sign extend, causing this part of the expression to alwas be 0.
Reported-by: Johan Bengtsson <teofrastius@gmail.com>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Fix errors in the decoding of the Neon forms of fixed-point VCVT:
* fixed-point VCVT is op 14 and 15, not 15 and 16
* the fbits immediate field was being misinterpreted
* the sense of the to_fixed bit was inverted
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Nathan Froyd <froydnj@codesourcery.com>
Correct the decoding of source and destination registers
for the VFP forms of the VCVT instructions which convert
between floating point and integer or fixed-point.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Nathan Froyd <froydnj@codesourcery.com>
Correct ldrexd and strexd code to always read and write the
high word of the 64-bit value from addr+4.
Also make ldrexd and strexd agree that for a 64 bit value the
address in env->exclusive_addr is that of the low word.
This fixes the issues reported in
https://bugs.launchpad.net/qemu/+bug/670883
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Nathan Froyd <froydnj@codesourcery.com>
Refine check on bkpt so that smc and undefined instruction encodings are
handled as an undefined instruction and trap.
Signed-off-by: Adam Lackorzynski <adam@os.inf.tu-dresden.de>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Nathan Froyd <froydnj@codesourcery.com>
The thumb2 decoder contained a mixup between the bit controlling
doubling and the bit controlling if the operation was an add or a sub.
Signed-off-by: Johan Bengtsson <teofrastius@gmail.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Nathan Froyd <froydnj@codesourcery.com>
The PKHxx instructions were not recognized by the thumb2 decoder. The
solution provided in this changeset is identical to the arm-mode
implementation.
Signed-off-by: Johan Bengtsson <teofrastius@gmail.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Nathan Froyd <froydnj@codesourcery.com>
fprintf_function uses format checking with GCC_FMT_ATTR.
Format errors were fixed in
* target-i386/helper.c
* target-mips/translate.c
* target-ppc/translate.c
Cc: Blue Swirl <blauwirbel@gmail.com>
Signed-off-by: Stefan Weil <weil@mail.berlios.de>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
When combining multiple values as part of a NEON array load, do explcit
shift/or rather than using gen_bfi. This voids redundant mask
operations.
Signed-off-by: Paul Brook <paul@codesourcery.com>
This patch fixes few resource leaks in the iwmmxt disassemble.
Signed-off-by: Lars Munch <lars@segv.dk>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Do not try to insert a conditional jump over next instruction when the
condition code is AL as this will trigger an internal error.
Signed-off-by: Johan Bengtsson <teofrastius@gmail.com>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>