Commit Graph

2285 Commits

Author SHA1 Message Date
Peter Maydell
1e35cd9166 target/arm: Implement MVE VADD (floating-point)
Implement the MVE VADD (floating-point) insn.  Handling of this is
similar to the 2-operand integer insns, except that we must take care
to only update the floating point exception status if the least
significant bit of the predicate mask for each element is active.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2021-09-01 11:08:16 +01:00
Peter Maydell
e784807cd2 target/arm: Do hflags rebuild in cpsr_write()
Currently we rely on all the callsites of cpsr_write() to rebuild the
cached hflags if they change one of the CPSR bits which we use as a
TB flag and cache in hflags.  This is a bit awkward when we want to
change the set of CPSR bits that we cache, because it means we need
to re-audit all the cpsr_write() callsites to see which flags they
are writing and whether they now need to rebuild the hflags.

Switch instead to making cpsr_write() call arm_rebuild_hflags()
itself if one of the bits being changed is a cached bit.

We don't do the rebuild for the CPSRWriteRaw write type, because that
kind of write is generally doing something special anyway.  For the
CPSRWriteRaw callsites in the KVM code and inbound migration we
definitely don't want to recalculate the hflags; the callsites in
boot.c and arm-powerctl.c have to do a rebuild-hflags call themselves
anyway because of other CPU state changes they make.

This allows us to drop explicit arm_rebuild_hflags() calls in a
couple of places where the only reason we needed to call it was the
CPSR write.

This fixes a bug where we were incorrectly failing to rebuild hflags
in the code path for a gdbstub write to CPSR, which meant that you
could make QEMU assert by breaking into a running guest, altering the
CPSR to change the value of, for example, CPSR.E, and then
continuing.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210817201843.3829-1-peter.maydell@linaro.org
2021-08-26 17:02:01 +01:00
Peter Maydell
8e228c9e4b target/arm: Implement HSTR.TJDBX
In v7A, the HSTR register has a TJDBX bit which traps NS EL0/EL1
access to the JOSCR and JMCR trivial Jazelle registers, and also BXJ.
Implement these traps. In v8A this HSTR bit doesn't exist, so don't
trap for v8A CPUs.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210816180305.20137-3-peter.maydell@linaro.org
2021-08-26 17:02:01 +01:00
Peter Maydell
cc7613bfaa target/arm: Implement HSTR.TTEE
In v7, the HSTR register has a TTEE bit which allows EL0/EL1 accesses
to the Thumb2EE TEECR and TEEHBR registers to be trapped to the
hypervisor. Implement these traps.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210816180305.20137-2-peter.maydell@linaro.org
2021-08-26 17:02:01 +01:00
Peter Maydell
49e7f191ca target/arm: Avoid assertion trying to use KVM and multiple ASes
KVM cannot support multiple address spaces per CPU; if you try to
create more than one then cpu_address_space_init() will assert.

In the Arm CPU realize function, detect the configurations which
would cause us to need more than one AS, and cleanly fail the
realize rather than blundering on into the assertion. This
turns this:
  $ qemu-system-aarch64  -enable-kvm -display none -cpu max -machine raspi3b
  qemu-system-aarch64: ../../softmmu/physmem.c:747: cpu_address_space_init: Assertion `asidx == 0 || !kvm_enabled()' failed.
  Aborted

into:
  $ qemu-system-aarch64  -enable-kvm -display none -machine raspi3b
  qemu-system-aarch64: Cannot enable KVM when guest CPU has EL3 enabled

and this:
  $ qemu-system-aarch64  -enable-kvm -display none -machine mps3-an524
  qemu-system-aarch64: ../../softmmu/physmem.c:747: cpu_address_space_init: Assertion `asidx == 0 || !kvm_enabled()' failed.
  Aborted

into:
  $ qemu-system-aarch64  -enable-kvm -display none -machine mps3-an524
  qemu-system-aarch64: Cannot enable KVM when using an M-profile guest CPU

Fixes: https://gitlab.com/qemu-project/qemu/-/issues/528
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20210816135842.25302-3-peter.maydell@linaro.org
2021-08-26 17:02:01 +01:00
Andrew Jones
022707e5d6 target/arm/cpu64: Validate sve vector lengths are supported
Future CPU types may specify which vector lengths are supported.
We can apply nearly the same logic to validate those lengths
as we do for KVM's supported vector lengths. We merge the code
where we can, but unfortunately can't completely merge it because
KVM requires all vector lengths, power-of-two or not, smaller than
the maximum enabled length to also be enabled. The architecture
only requires all the power-of-two lengths, though, so TCG will
only enforce that.

Signed-off-by: Andrew Jones <drjones@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210823160647.34028-5-drjones@redhat.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2021-08-26 17:01:59 +01:00
Andrew Jones
5b65e5abea target/arm/cpu64: Replace kvm_supported with sve_vq_supported
Now that we have an ARMCPU member sve_vq_supported we no longer
need the local kvm_supported bitmap for KVM's supported vector
lengths.

Signed-off-by: Andrew Jones <drjones@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210823160647.34028-4-drjones@redhat.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2021-08-26 17:01:59 +01:00
Andrew Jones
927703cc40 target/arm/kvm64: Ensure sve vls map is completely clear
bitmap_clear() only clears the given range. While the given
range should be sufficient in this case we might as well be
100% sure all bits are zeroed by using bitmap_zero().

Signed-off-by: Andrew Jones <drjones@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210823160647.34028-3-drjones@redhat.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2021-08-26 17:01:59 +01:00
Andrew Jones
5401b1e08d target/arm/cpu: Introduce sve_vq_supported bitmap
Allow CPUs that support SVE to specify which SVE vector lengths they
support by setting them in this bitmap. Currently only the 'max' and
'host' CPU types supports SVE and 'host' requires KVM which obtains
its supported bitmap from the host. So, we only need to initialize the
bitmap for 'max' with TCG. And, since 'max' should support all SVE
vector lengths we simply fill the bitmap. Future CPU types may have
less trivial maps though.

Signed-off-by: Andrew Jones <drjones@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210823160647.34028-2-drjones@redhat.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2021-08-26 17:01:59 +01:00
Hamza Mahfooz
dfa0d9b80e target/arm: kvm: use RCU_READ_LOCK_GUARD() in kvm_arch_fixup_msi_route()
As per commit 5626f8c6d4 ("rcu: Add automatically released rcu_read_lock
variants"), RCU_READ_LOCK_GUARD() should be used instead of
rcu_read_{un}lock().

Signed-off-by: Hamza Mahfooz <someguy@effective-light.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 20210727235201.11491-1-someguy@effective-light.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2021-08-25 10:48:50 +01:00
Peter Maydell
e534629296 target/arm: Implement M-profile trapping on division by zero
Unlike A-profile, for M-profile the UDIV and SDIV insns can be
configured to raise an exception on division by zero, using the CCR
DIV_0_TRP bit.

Implement support for setting this bit by making the helper functions
raise the appropriate exception.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210730151636.17254-3-peter.maydell@linaro.org
2021-08-25 10:48:50 +01:00
Peter Maydell
fc7a5038a6 target/arm: Re-indent sdiv and udiv helpers
We're about to make a code change to the sdiv and udiv helper
functions, so first fix their indentation and coding style.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210730151636.17254-2-peter.maydell@linaro.org
2021-08-25 10:48:50 +01:00
Peter Maydell
075e7e97e3 target/arm: Implement MVE interleaving loads/stores
Implement the MVE interleaving load/store functions VLD2, VLD4, VST2
and VST4.  VLD2 loads 16 bytes of data from memory and writes to 2
consecutive Qregs; VLD4 loads 16 bytes of data from memory and writes
to 4 consecutive Qregs.  The 'pattern' field in the encoding
determines the offset into memory which is accessed and also which
elements in the Qregs are written to.  (The intention is that a
sequence of four consecutive VLD4 with different pattern values
performs a complete de-interleaving load of 64 bytes into all
elements of the 4 Qregs.) VST2 and VST4 do the same, but for stores.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2021-08-25 10:48:50 +01:00
Peter Maydell
fac80f0856 target/arm: Implement MVE scatter-gather immediate forms
Implement the MVE VLDR/VSTR insns which do scatter-gather using base
addresses from Qm plus or minus an immediate offset (possibly with
writeback). Note that writeback is not predicated but it does have
to honour ECI state, so we have to add an eci_mask check to the
VSTR_SG macros (the VLDR_SG macros already needed this to be able
to distinguish "skip beat" from "set predicated element to 0").

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2021-08-25 10:48:50 +01:00
Peter Maydell
dc18628b18 target/arm: Implement MVE scatter-gather insns
Implement the MVE gather-loads and scatter-stores which
form the address by adding a base value from a scalar
register to an offset in each element of a vector.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2021-08-25 10:48:50 +01:00
Peter Maydell
0f31e37c7f target/arm: Implement MVE VCTP
Implement the MVE VCTP insn, which sets the VPR.P0 predicate bits so
as to predicate any element at index Rn or greater is predicated.  As
with VPNOT, this insn itself is predicable and subject to beatwise
execution.

The calculation of the mask is the same as is used to determine
ltpmask in mve_element_mask(), but we precalculate masklen in
generated code to avoid having to have 4 helpers specialized by size.

We put the decode line in with the low-overhead-loop insns in
t32.decode because it's logically part of that collection of insn
patterns, even though it is an MVE only insn.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2021-08-25 10:48:50 +01:00
Peter Maydell
fea3958fa1 target/arm: Implement MVE VPNOT
Implement the MVE VPNOT insn, which inverts the bits in VPR.P0
(subject to both predication and to beatwise execution).

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2021-08-25 10:48:50 +01:00
Peter Maydell
1241f148d5 target/arm: Implement MVE VMOV to/from 2 general-purpose registers
Implement the MVE VMOV forms that move data between 2 general-purpose
registers and 2 32-bit lanes in a vector register.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2021-08-25 10:48:50 +01:00
Peter Maydell
d5c571ea6d target/arm: Implement MVE VMAXA, VMINA
Implement the MVE VMAXA and VMINA insns, which take the absolute
value of the signed elements in the input vector and then accumulate
the unsigned max or min into the destination vector.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2021-08-25 10:48:50 +01:00
Peter Maydell
398e7cd3cd target/arm: Implement MVE VQABS, VQNEG
Implement the MVE 1-operand saturating operations VQABS and VQNEG.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2021-08-25 10:48:50 +01:00
Peter Maydell
8be9a25058 target/arm: Implement MVE saturating doubling multiply accumulates
Implement the MVE saturating doubling multiply accumulate insns
VQDMLAH, VQRDMLAH, VQDMLASH and VQRDMLASH.  These perform a multiply,
double, add the accumulator shifted by the element size, possibly
round, saturate to twice the element size, then take the high half of
the result.  The *MLAH insns do vector * scalar + vector, and the
*MLASH insns do vector * vector + scalar.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2021-08-25 10:48:50 +01:00
Peter Maydell
c69e34c6de target/arm: Implement MVE VMLA
Implement the MVE VMLA insn, which multiplies a vector by a scalar
and accumulates into another vector.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2021-08-25 10:48:50 +01:00
Peter Maydell
f0ffff5163 target/arm: Implement MVE VMLADAV and VMLSLDAV
Implement the MVE VMLADAV and VMLSLDAV insns.  Like the VMLALDAV and
VMLSLDAV insns already implemented, these accumulate multiplied
vector elements; but they accumulate a 32-bit result rather than a
64-bit one.

Note that these encodings overlap with what would be RdaHi=0b111 for
VMLALDAV, VMLSLDAV, VRMLALDAVH and VRMLSLDAVH.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2021-08-25 10:48:49 +01:00
Peter Maydell
640cdf20a2 target/arm: Rename MVEGenDualAccOpFn to MVEGenLongDualAccOpFn
The MVEGenDualAccOpFn is a bit misnamed, since it is used for
the "long dual accumulate" operations that use a 64-bit
accumulator. Rename it to MVEGenLongDualAccOpFn so we can
use the former name for the 32-bit accumulator insns.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2021-08-25 10:48:49 +01:00
Peter Maydell
54dc78a901 target/arm: Implement MVE narrowing moves
Implement the MVE narrowing move insns VMOVN, VQMOVN and VQMOVUN.
These take a double-width input, narrow it (possibly saturating) and
store the result to either the top or bottom half of the output
element.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2021-08-25 10:48:49 +01:00
Peter Maydell
7f061c0ab9 target/arm: Implement MVE VABAV
Implement the MVE VABAV insn, which computes absolute differences
between elements of two vectors and accumulates the result into
a general purpose register.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2021-08-25 10:48:49 +01:00
Peter Maydell
688ba4cf33 target/arm: Implement MVE integer min/max across vector
Implement the MVE integer min/max across vector insns
VMAXV, VMINV, VMAXAV and VMINAV, which find the maximum
from the vector elements and a general purpose register,
and store the maximum back into the general purpose
register.

These insns overlap with VRMLALDAVH (they use what would
be RdaHi=0b110).

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2021-08-25 10:48:49 +01:00
Peter Maydell
345910f8c1 target/arm: Move 'x' and 'a' bit definitions into vmlaldav formats
All the users of the vmlaldav formats have an 'x bit in bit 12 and an
'a' bit in bit 5; move these to the format rather than specifying them
in each insn pattern.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2021-08-25 10:48:49 +01:00
Peter Maydell
1b15a97d4c target/arm: Implement MVE shift-by-scalar
Implement the MVE instructions which perform shifts by a scalar.
These are VSHL T2, VRSHL T2, VQSHL T1 and VQRSHL T2.  They take the
shift amount in a general purpose register and shift every element in
the vector by that amount.

Mostly we can reuse the helper functions for shift-by-immediate; we
do need two new helpers for VQRSHL.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2021-08-25 10:48:49 +01:00
Peter Maydell
6b895bf8fb target/arm: Implement MVE VMLAS
Implement the MVE VMLAS insn, which multiplies a vector by a vector
and adds a scalar.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2021-08-25 10:48:49 +01:00
Peter Maydell
c386443b16 target/arm: Implement MVE VPSEL
Implement the MVE VPSEL insn, which sets each byte of the destination
vector Qd to the byte from either Qn or Qm depending on the value of
the corresponding bit in VPR.P0.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2021-08-25 10:48:49 +01:00
Peter Maydell
cce81873bc target/arm: Implement MVE integer vector-vs-scalar comparisons
Implement the MVE integer vector comparison instructions that compare
each element against a scalar from a general purpose register.  These
are "VCMP (vector)" encodings T4, T5 and T6 and "VPT (vector)"
encodings T4, T5 and T6.

We have to move the decodetree pattern for VPST, because it
overlaps with VCMP T4 with size = 0b11.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2021-08-25 10:48:49 +01:00
Peter Maydell
eff5d9a9bd target/arm: Implement MVE integer vector comparisons
Implement the MVE integer vector comparison instructions.  These are
"VCMP (vector)" encodings T1, T2 and T3, and "VPT (vector)" encodings
T1, T2 and T3.

These insns compare corresponding elements in each vector, and update
the VPR.P0 predicate bits with the results of the comparison.  VPT
also sets the VPR.MASK01 and VPR.MASK23 fields -- it is effectively
"VCMP then VPST".

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2021-08-25 10:48:49 +01:00
Peter Maydell
552517861c target/arm: Factor out gen_vpst()
Factor out the "generate code to update VPR.MASK01/MASK23" part of
trans_VPST(); we are going to want to reuse it for the VPT insns.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2021-08-25 10:48:49 +01:00
Peter Maydell
395b92d50e target/arm: Implement MVE incrementing/decrementing dup insns
Implement the MVE incrementing/decrementing dup insns VIDUP, VDDUP,
VIWDUP and VDWDUP.  These fill the elements of a vector with
successively incrementing values, starting at the offset specified in
a general purpose register.  The final value of the offset is written
back to this register.  The wrapping variants take a second general
purpose register which specifies the point where the count should
wrap back to 0.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2021-08-25 10:48:49 +01:00
Peter Maydell
c1bd78cb06 target/arm: Implement MVE VMULL (polynomial)
Implement the MVE VMULL (polynomial) insn.  Unlike Neon, this comes
in two flavours: 8x8->16 and a 16x16->32.  Also unlike Neon, the
inputs are in either the low or the high half of each double-width
element.

The assembler for this insn indicates the size with "P8" or "P16",
encoded into bit 28 as size = 0 or 1. We choose to follow the
same encoding as VQDMULL and decode this into a->size as MO_16
or MO_32 indicating the size of the result elements. This then
carries through to the helper function names where it then
matches up with the existing pmull_h() which does an 8x8->16
operation and a new pmull_w() which does the 16x16->32.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2021-08-25 10:48:49 +01:00
Peter Maydell
41704cc262 target/arm: Fix VLDRB/H/W for predicated elements
For vector loads, predicated elements are zeroed, instead of
retaining their previous values (as happens for most data
processing operations). This means we need to distinguish
"beat not executed due to ECI" (don't touch destination
element) from "beat executed but predicated out" (zero
destination element).

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2021-08-25 10:48:49 +01:00
Peter Maydell
e3152d02da target/arm: Fix VPT advance when ECI is non-zero
We were not paying attention to the ECI state when advancing the VPT
state.  Architecturally, VPT state advance happens for every beat
(see the pseudocode VPTAdvance()), so on every beat the 4 bits of
VPR.P0 corresponding to the current beat are inverted if required,
and at the end of beats 1 and 3 the VPR MASK fields are updated.
This means that if the ECI state says we should not be executing all
4 beats then we need to skip some of the updating of the VPR that we
currently do in mve_advance_vpt().

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2021-08-25 10:48:49 +01:00
Peter Maydell
e0d40070e1 target/arm: Factor out mve_eci_mask()
In some situations we need a mask telling us which parts of the
vector correspond to beats that are not being executed because of
ECI, separately from the combined "which bytes are predicated away"
mask.  Factor this mask calculation out of mve_element_mask() into
its own function.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2021-08-25 10:48:49 +01:00
Peter Maydell
3f4f1880c2 target/arm: Fix calculation of LTP mask when LR is 0
In mve_element_mask(), we calculate a mask for tail predication which
should have a number of 1 bits based on the value of LR.  However,
our MAKE_64BIT_MASK() macro has undefined behaviour when passed a
zero length.  Special case this to give the all-zeroes mask we
require.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2021-08-25 10:48:48 +01:00
Peter Maydell
fdcf2269c4 target/arm: Fix MVE 48-bit SQRSHRL for small right shifts
We got an edge case wrong in the 48-bit SQRSHRL implementation: if
the shift is to the right, although it always makes the result
smaller than the input value it might not be within the 48-bit range
the result is supposed to be if the input had some bits in [63..48]
set and the shift didn't bring all of those within the [47..0] range.

Handle this similarly to the way we already do for this case in
do_uqrshl48_d(): extend the calculated result from 48 bits,
and return that if not saturating or if it doesn't change the
result; otherwise fall through to return a saturated value.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2021-08-25 10:48:48 +01:00
Peter Maydell
95351aa76c target/arm: Fix 48-bit saturating shifts
In do_sqrshl48_d() and do_uqrshl48_d() we got some of the edge
cases wrong and failed to saturate correctly:

(1) In do_sqrshl48_d() we used the same code that do_shrshl_bhs()
does to obtain the saturated most-negative and most-positive 48-bit
signed values for the large-shift-left case.  This gives (1 << 47)
for saturate-to-most-negative, but we weren't sign-extending this
value to the 64-bit output as the pseudocode requires.

(2) For left shifts by less than 48, we copied the "8/16 bit" code
from do_sqrshl_bhs() and do_uqrshl_bhs().  This doesn't do the right
thing because it assumes the C type we're working with is at least
twice the number of bits we're saturating to (so that a shift left by
bits-1 can't shift anything off the top of the value).  This isn't
true for bits == 48, so we would incorrectly return 0 rather than the
most-positive value for situations like "shift (1 << 44) right by
20".  Instead check for saturation by doing the shift and signextend
and then testing whether shifting back left again gives the original
value.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2021-08-25 10:48:48 +01:00
Peter Maydell
a5e59e8dcb target/arm: Fix mask handling for MVE narrowing operations
In the MVE helpers for the narrowing operations (DO_VSHRN and
DO_VSHRN_SAT) we were using the wrong bits of the predicate mask for
the 'top' versions of the insn.  This is because the loop works over
the double-sized input elements and shifts the predicate mask by that
many bits each time, but when we write out the half-sized output we
must look at the mask bits for whichever half of the element we are
writing to.

Correct this by shifting the whole mask right by ESIZE bits for the
'top' insns.  This allows us also to simplify the saturation bit
checking (where we had noticed that we needed to look at a different
mask bit for the 'top' insn.)

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2021-08-25 10:48:48 +01:00
Peter Maydell
ed5a59d61f target/arm: Fix signed VADDV
A cut-and-paste error meant we handled signed VADDV like
unsigned VADDV; fix the type used.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2021-08-25 10:48:48 +01:00
Peter Maydell
c88ff88498 target/arm: Fix MVE VSLI by 0 and VSRI by <dt>
In the MVE shift-and-insert insns, we special case VSLI by 0
and VSRI by <dt>. VSRI by <dt> means "don't update the destination",
which is what we've implemented. However VSLI by 0 is "set
destination to the input", so we don't want to use the same
special-casing that we do for VSRI by <dt>.

Since the generic logic gives the right answer for a shift
by 0, just use that.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2021-08-25 10:48:48 +01:00
Peter Maydell
aa29190826 target/arm: Print MVE VPR in CPU dumps
Include the MVE VPR register value in the CPU dumps produced by
arm_cpu_dump_state() if we are printing FPU information. This
makes it easier to interpret debug logs when predication is
active.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2021-08-25 10:48:48 +01:00
Peter Maydell
9dacf0764b target/arm: Note that we handle VMOVL as a special case of VSHLL
Although the architecture doesn't define it as an alias, VMOVL
(vector move long) is encoded as a VSHLL with a zero shift.
Add a comment in the decode file noting that we handle VMOVL
as part of VSHLL.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2021-08-25 10:48:48 +01:00
Richard Henderson
b3d52804c5 target/arm: Add sve-default-vector-length cpu property
Mirror the behavour of /proc/sys/abi/sve_default_vector_length
under the real linux kernel.  We have no way of passing along
a real default across exec like the kernel can, but this is a
decent way of adjusting the startup vector length of a process.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/482
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20210723203344.968563-4-richard.henderson@linaro.org
[PMM: tweaked docs formatting, document -1 special-case,
 added fixup patch from RTH mentioning QEMU's maximum veclen.]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2021-07-27 10:57:40 +01:00
Richard Henderson
ce440581c1 target/arm: Export aarch64_sve_zcr_get_valid_len
Rename from sve_zcr_get_valid_len and make accessible
from outside of helper.c.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20210723203344.968563-3-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2021-07-27 10:57:40 +01:00
Richard Henderson
dc0bc8e785 target/arm: Correctly bound length in sve_zcr_get_valid_len
Currently, our only caller is sve_zcr_len_for_el, which has
already masked the length extracted from ZCR_ELx, so the
masking done here is a nop.  But we will shortly have uses
from other locations, where the length will be unmasked.

Saturate the length to ARM_MAX_VQ instead of truncating to
the low 4 bits.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20210723203344.968563-2-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2021-07-27 10:57:40 +01:00