Commit Graph

27 Commits

Author SHA1 Message Date
Peter Maydell 8fc9d8918c target/arm: Convert integer-to-float insns to decodetree
Convert the VCVT integer-to-float instructions to decodetree.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2019-06-13 15:14:06 +01:00
Peter Maydell 6ed7e49c36 target/arm: Convert double-single precision conversion insns to decodetree
Convert the VCVT double/single precision conversion insns to decodetree.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2019-06-13 15:14:06 +01:00
Peter Maydell e25155f55d target/arm: Convert VFP round insns to decodetree
Convert the VFP round-to-integer instructions VRINTR, VRINTZ and
VRINTX to decodetree.

These instructions were only introduced as part of the "VFP misc"
additions in v8A, so we check this. The old decoder's implementation
was incorrectly providing them even for v7A CPUs.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2019-06-13 15:14:06 +01:00
Peter Maydell cdfd14e86a target/arm: Convert the VCVT-to-f16 insns to decodetree
Convert the VCVTT and VCVTB instructions which convert from
f32 and f64 to f16 to decodetree.

Since we're no longer constrained to the old decoder's style
using cpu_F0s and cpu_F0d we can perform a direct 16 bit
store of the right half of the input single-precision register
rather than doing a load/modify/store sequence on the full
32 bits.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2019-06-13 15:14:06 +01:00
Peter Maydell b623d803dd target/arm: Convert the VCVT-from-f16 insns to decodetree
Convert the VCVTT, VCVTB instructions that deal with conversion
from half-precision floats to f32 or 64 to decodetree.

Since we're no longer constrained to the old decoder's style
using cpu_F0s and cpu_F0d we can perform a direct 16 bit
load of the right half of the input single-precision register
rather than loading the full 32 bits and then doing a
separate shift or sign-extension.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2019-06-13 15:14:06 +01:00
Peter Maydell 386bba2368 target/arm: Convert VFP comparison insns to decodetree
Convert the VFP comparison instructions to decodetree.

Note that comparison instructions should not honour the VFP
short-vector length and stride information: they are scalar-only
operations.  This applies to all the 2-operand instructions except
for VMOV, VABS, VNEG and VSQRT.  (In the old decoder this is
implemented via the "if (op == 15 && rn > 3) { veclen = 0; }" check.)

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2019-06-13 15:14:06 +01:00
Peter Maydell 17552b979e target/arm: Convert VMOV (register) to decodetree
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2019-06-13 15:14:05 +01:00
Peter Maydell b8474540cb target/arm: Convert VSQRT to decodetree
Convert the VSQRT instruction to decodetree.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2019-06-13 15:14:05 +01:00
Peter Maydell 1882651afd target/arm: Convert VNEG to decodetree
Convert the VNEG instruction to decodetree.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2019-06-13 15:14:05 +01:00
Peter Maydell 90287e22c9 target/arm: Convert VABS to decodetree
Convert the VFP VABS instruction to decodetree.

Unlike the 3-op versions, we don't pass fpst to the VFPGen2OpSPFn or
VFPGen2OpDPFn because none of the operations which use this format
and support short vectors will need it.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2019-06-13 15:14:05 +01:00
Peter Maydell b518c753f0 target/arm: Convert VMOV (imm) to decodetree
Convert the VFP VMOV (immediate) instruction to decodetree.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2019-06-13 15:14:05 +01:00
Peter Maydell d4893b01d2 target/arm: Convert VFP fused multiply-add insns to decodetree
Convert the VFP fused multiply-add instructions (VFNMA, VFNMS,
VFMA, VFMS) to decodetree.

Note that in the old decode structure we were implementing
these to honour the VFP vector stride/length. These instructions
were introduced in VFPv4, and in the v7A architecture they
are UNPREDICTABLE if the vector stride or length are non-zero.
In v8A they must UNDEF if stride or length are non-zero, like
all VFP instructions; we choose to UNDEF always.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2019-06-13 15:14:05 +01:00
Peter Maydell 519ee7ae31 target/arm: Convert VDIV to decodetree
Convert the VDIV instruction to decodetree.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2019-06-13 15:14:05 +01:00
Peter Maydell 8fec9a1192 target/arm: Convert VSUB to decodetree
Convert the VSUB instruction to decodetree.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2019-06-13 15:14:05 +01:00
Peter Maydell ce28b30371 target/arm: Convert VADD to decodetree
Convert the VADD instruction to decodetree.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2019-06-13 15:14:05 +01:00
Peter Maydell 43c4be1236 target/arm: Convert VNMUL to decodetree
Convert the VNMUL instruction to decodetree.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2019-06-13 15:14:05 +01:00
Peter Maydell 88c5188ced target/arm: Convert VMUL to decodetree
Convert the VMUL instruction to decodetree.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2019-06-13 15:14:05 +01:00
Peter Maydell 8a483533ad target/arm: Convert VFP VNMLA to decodetree
Convert the VFP VNMLA instruction to decodetree.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2019-06-13 15:14:05 +01:00
Peter Maydell c54a416cc6 target/arm: Convert VFP VNMLS to decodetree
Convert the VFP VNMLS instruction to decodetree.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2019-06-13 15:14:04 +01:00
Peter Maydell e7258280d4 target/arm: Convert VFP VMLS to decodetree
Convert the VFP VMLS instruction to decodetree.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2019-06-13 15:14:04 +01:00
Peter Maydell 266bd25c48 target/arm: Convert VFP VMLA to decodetree
Convert the VFP VMLA instruction to decodetree.

This is the first of the VFP 3-operand data processing instructions,
so we include in this patch the code which loops over the elements
for an old-style VFP vector operation. The existing code to do this
looping uses the deprecated cpu_F0s/F0d/F1s/F1d TCG globals; since
we are going to be converting instructions one at a time anyway
we can take the opportunity to make the new loop use TCG temporaries,
which means we can do that conversion one operation at a time
rather than needing to do it all in one go.

We include an UNDEF check which was missing in the old code:
short-vector operations (with stride or length non-zero) were
deprecated in v7A and must UNDEF in v8A, so if the MVFR0 FPShVec
field does not indicate that support for short vectors is present
we UNDEF the operations that would use them. (This is a change
of behaviour for Cortex-A7, Cortex-A15 and the v8 CPUs, which
previously were all incorrectly allowing short-vector operations.)

Note that the conversion fixes a bug in the old code for the
case of VFP short-vector "mixed scalar/vector operations". These
happen where the destination register is in a vector bank but
but the second operand is in a scalar bank. For example
  vmla.f64 d10, d1, d16   with length 2 stride 2
is equivalent to the pair of scalar operations
  vmla.f64 d10, d1, d16
  vmla.f64 d8, d3, d16
where the destination and first input register cycle through
their vector but the second input is scalar (d16). In the
old decoder the gen_vfp_F1_mul() operation uses cpu_F1{s,d}
as a temporary output for the multiply, which trashes the
second input operand. For the fully-scalar case (where we
never do a second iteration) and the fully-vector case
(where the loop loads the new second input operand) this
doesn't matter, but for the mixed scalar/vector case we
will end up using the wrong value for later loop iterations.
In the new code we use TCG temporaries and so avoid the bug.
This bug is present for all the multiply-accumulate insns
that operate on short vectors: VMLA, VMLS, VNMLA, VNMLS.

Note 2: the expression used to calculate the next register
number in the vector bank is not in fact correct; we leave
this behaviour unchanged from the old decoder and will
fix this bug later in the series.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2019-06-13 15:14:04 +01:00
Peter Maydell fa288de272 target/arm: Convert the VFP load/store multiple insns to decodetree
Convert the VFP load/store multiple insns to decodetree.
This includes tightening up the UNDEF checking for pre-VFPv3
CPUs which only have D0-D15 : they now UNDEF for any access
to D16-D31, not merely when the smallest register in the
transfer list is in D16-D31.

This conversion does not try to share code between the single
precision and the double precision versions; this looks a bit
duplicative of code, but it leaves the door open for a future
refactoring which gets rid of the use of the "F0" registers
by inlining the various functions like gen_vfp_ld() and
gen_mov_F0_reg() which are hiding "if (dp) { ... } else { ... }"
conditionalisation.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2019-06-13 15:14:04 +01:00
Peter Maydell 79b02a3b52 target/arm: Convert VFP VLDR and VSTR to decodetree
Convert the VFP single load/store insns VLDR and VSTR to decodetree.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2019-06-13 15:14:04 +01:00
Peter Maydell 81f681106e target/arm: Convert VFP two-register transfer insns to decodetree
Convert the VFP two-register transfer instructions to decodetree
(in the v8 Arm ARM these are the "Advanced SIMD and floating-point
64-bit move" encoding group).

Again, we expand out the sequences involving gen_vfp_msr() and
gen_msr_vfp().

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2019-06-13 15:14:04 +01:00
Peter Maydell a9ab50011a target/arm: Convert "single-precision" register moves to decodetree
Convert the "single-precision" register moves to decodetree:
 * VMSR
 * VMRS
 * VMOV between general purpose register and single precision

Note that the VMSR/VMRS conversions make our handling of
the "should this UNDEF?" checks consistent between the two
instructions:
 * VMSR to MVFR0, MVFR1, MVFR2 now UNDEF from EL0
   (previously was a nop)
 * VMSR to FPSID now UNDEFs from EL0 or if VFPv3 or better
   (previously was a nop)
 * VMSR to FPINST and FPINST2 now UNDEF if VFPv3 or better
   (previously would write to the register, which had no
   guest-visible effect because we always UNDEF reads)

We also tighten up the decode: we were previously underdecoding
some SBZ or SBO bits.

The conversion of VMOV_single includes the expansion out of the
gen_mov_F0_vreg()/gen_vfp_mrs() and gen_mov_vreg_F0()/gen_vfp_msr()
sequences into the simpler direct load/store of the TCG temp via
neon_{load,store}_reg32(): we know in the new function that we're
always single-precision, we don't need to use the old-and-deprecated
cpu_F0* TCG globals, and we don't happen to have the declaration of
gen_vfp_msr() and gen_vfp_mrs() at the point in the file where the
new function is.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2019-06-13 15:14:04 +01:00
Peter Maydell 9851ed9269 target/arm: Convert "double-precision" register moves to decodetree
Convert the "double-precision" register moves to decodetree:
this covers VMOV scalar-to-gpreg, VMOV gpreg-to-scalar and VDUP.

Note that the conversion process has tightened up a few of the
UNDEF encoding checks: we now correctly forbid:
 * VMOV-to-gpr with U:opc1:opc2 == 10x00 or x0x10
 * VMOV-from-gpr with opc1:opc2 == 0x10
 * VDUP with B:E == 11
 * VDUP with Q == 1 and Vn<0> == 1

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
The accesses of elements < 32 bits could be improved by doing
direct ld/st of the right size rather than 32-bit read-and-shift
or read-modify-write, but we leave this for later cleanup,
since this series is generally trying to stick to fixing
the decode.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2019-06-13 15:14:04 +01:00
Peter Maydell 78e138bc1f target/arm: Add stubs for AArch32 VFP decodetree
Add the infrastructure for building and invoking a decodetree decoder
for the AArch32 VFP encodings.  At the moment the new decoder covers
nothing, so we always fall back to the existing hand-written decode.

We need to have one decoder for the unconditional insns and one for
the conditional insns, as otherwise the patterns for conditional
insns would incorrectly match against the unconditional ones too.

Since translate.c is over 14,000 lines long and we're going to be
touching pretty much every line of the VFP code as part of the
decodetree conversion, we create a new translate-vfp.inc.c to hold
the code which deals with VFP in the new scheme.  It should be
possible to convert this into a standalone translation unit
eventually, but the conversion process will be much simpler if we
simply #include it midway through translate.c to start with.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2019-06-13 15:14:03 +01:00