Following the implementation of tcg_gen_gvec_3i, add a four-vector and
immediate operand expansion method.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
Message-Id: <20220225210936.1749575-34-matheus.ferst@eldorado.org.br>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
The Arm MVE VDUP implementation would like to be able to emit code to
duplicate a byte or halfword value into an i32. We have code to do
this already in tcg-op-gvec.c, so all we need to do is make the
functions global.
For consistency with other functions made available to the frontends:
* we rename to tcg_gen_dup_*
* we expose both the _i32 and _i64 forms
* we provide the #define for a _tl form
Suggested-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20210617121628.20116-10-peter.maydell@linaro.org
In f47db80cc0, we handled odd-sized tail clearing for
the case of hosts that have vector operations, but did
not handle the case of hosts that do not have vector ops.
This was ok until e2e7168a21, which changed the encoding
of simd_desc such that the odd sizes are impossible.
Add memset as a tcg helper, and use that for all out-of-line
byte stores to vectors. This includes, but is not limited to,
the tail clearing operation in question.
Cc: qemu-stable@nongnu.org
Buglink: https://bugs.launchpad.net/bugs/1907817
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
With larger vector sizes, it turns out oprsz == maxsz, and we only
need to represent mismatch for oprsz <= 32. We do, however, need
to represent larger oprsz and do so without reducing SIMD_DATA_BITS.
Reduce the size of the oprsz field and increase the maxsz field.
Steal the oprsz value of 24 to indicate equality with maxsz.
Tested-by: Frank Chang <frank.chang@sifive.com>
Reviewed-by: Frank Chang <frank.chang@sifive.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
We already support duplication of 128-bit blocks. This extends
that support to 256-bit blocks. This will be needed by SVE2.
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Do not store back to the exact memory from which we just loaded.
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
The fallback inline expansion for vectorized absolute value,
when the host doesn't support such an insn was flawed.
E.g. when a vector of bytes has all elements negative, mask
will be 0xffff_ffff_ffff_ffff. Subtracting mask only adds 1
to the low element instead of all elements becase -mask is 1
and not 0x0101_0101_0101_0101.
Signed-off-by: Stephen Long <steplong@quicinc.com>
Message-Id: <20200813161818.190-1-steplong@quicinc.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
No host backend support yet, but the interfaces for rotls
are in place. Only implement left-rotate for now, as the
only known use of vector rotate by scalar is s390x, so any
right-rotate would be unused and untestable.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
No host backend support yet, but the interfaces for rotlv
and rotrv are in place.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
v3: Drop the generic expansion from rot to shift; we can do better
for each backend, and then this code becomes unused.
No host backend support yet, but the interfaces for rotli
are in place. Canonicalize immediate rotate to the left,
based on a survey of architectures, but provide both left
and right shift interfaces to the translators.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
We have this same parameter for GVecGen2i, GVecGen3,
and GVecGen3i. This will make some SVE2 insns easier
to parameterize.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Better handling of non-power-of-2 tails as seen with Arm 8-byte
vector operations.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
These interfaces are now unused.
Reviewed-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Replace the outgoing interface.
Reviewed-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Add a version of tcg_gen_dup_* that takes both immediate and
a vector element size operand. This will replace the set of
tcg_gen_gvec_dup{8,16,32,64}i functions that encode the element
size within the function name.
Reviewed-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Extend the vector generator infrastructure to handle
5 vector arguments.
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Taylor Simpson <tsimpson@quicinc.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
We currently search both the root and the tcg/ directories for tcg
files:
$ git grep '#include "tcg/' | wc -l
28
$ git grep '#include "tcg[^/]' | wc -l
94
To simplify the preprocessor search path, unify by expliciting the
tcg/ directory.
Patch created mechanically by running:
$ for x in \
tcg.h tcg-mo.h tcg-op.h tcg-opc.h \
tcg-op-gvec.h tcg-gvec-desc.h; do \
sed -i "s,#include \"$x\",#include \"tcg/$x\"," \
$(git grep -l "#include \"$x\""); \
done
Acked-by: David Gibson <david@gibson.dropbear.id.au> (ppc parts)
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Stefan Weil <sw@weilnetz.de>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20200101112303.20724-2-philmd@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
In my "build everything" tree, changing qemu/main-loop.h triggers a
recompile of some 5600 out of 6600 objects (not counting tests and
objects that don't depend on qemu/osdep.h). It includes block/aio.h,
which in turn includes qemu/event_notifier.h, qemu/notify.h,
qemu/processor.h, qemu/qsp.h, qemu/queue.h, qemu/thread-posix.h,
qemu/thread.h, qemu/timer.h, and a few more.
Include qemu/main-loop.h only where it's needed. Touching it now
recompiles only some 1700 objects. For block/aio.h and
qemu/event_notifier.h, these numbers drop from 5600 to 2800. For the
others, they shrink only slightly.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20190812052359.30071-21-armbru@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
No header includes qemu-common.h after this commit, as prescribed by
qemu-common.h's file comment.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20190523143508.25387-5-armbru@redhat.com>
[Rebased with conflicts resolved automatically, except for
include/hw/arm/xlnx-zynqmp.h hw/arm/nrf51_soc.c hw/arm/msf2-soc.c
block/qcow2-refcount.c block/qcow2-cluster.c block/qcow2-cache.c
target/arm/cpu.h target/lm32/cpu.h target/m68k/cpu.h target/mips/cpu.h
target/moxie/cpu.h target/nios2/cpu.h target/openrisc/cpu.h
target/riscv/cpu.h target/tilegx/cpu.h target/tricore/cpu.h
target/unicore32/cpu.h target/xtensa/cpu.h; bsd-user/main.c and
net/tap-bsd.c fixed up]
This operation performs d = (b & a) | (c & ~a), and is present
on a majority of host vector units. Include gvec expanders.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
The paths through tcg_gen_dup_mem_vec and through MO_128 were
missing the check_size_align. The path through MO_128 was also
missing the expand_clr. This last was not visible because the
only user is ARM SVE, which would set oprsz == maxsz, and not
require the clear.
Fix by adding the check_size_align and using do_dup directly
instead of duplicating the check in tcg_gen_gvec_dup_{i32,i64}.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Allow expansion either via shift by scalar or by replicating
the scalar for shift by vector.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
v3: Use a private structure for do_gvec_shifts.
The gvec expanders perform a modulo on the shift count. If the target
requires alternate behaviour, then it cannot use the generic gvec
expanders anyway, and will have to have its own custom code.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Allow the backend to expand dup from memory directly, instead of
forcing the value into a temp first. This is especially important
if integer/vector register moves do not exist.
Note that officially tcg_out_dupm_vec is allowed to fail.
If it did, we could fix this up relatively easily:
VECE == 32/64:
Load the value into a vector register, then dup.
Both of these must work.
VECE == 8/16:
If the value happens to be at an offset such that an aligned
load would place the desired value in the least significant
end of the register, go ahead and load w/garbage in high bits.
Load the value w/INDEX_op_ld{8,16}_i32.
Attempt a move directly to vector reg, which may fail.
Store the value into the backing store for OTS.
Load the value into the vector reg w/TCG_TYPE_I32, which must work.
Duplicate from the vector reg into itself, which must work.
All of which is well and good, except that all supported
hosts can support dupm for all vece, so all of the failure
paths would be dead code and untestable.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Replace the single opcode in .opc with a null-terminated
array in .opt_opc. We still require that all opcodes be
used with the same .vece.
Validate the contents of this list with CONFIG_DEBUG_TCG.
All tcg_gen_*_vec functions will check any list active
during .fniv expansion. Swap the active list in and out
as we expand other opcodes, or take control away from the
front-end function.
Convert all existing vector aware front ends.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Let's add tcg_gen_gvec_3i(), similar to tcg_gen_gvec_2i(), however
without introducing "gen_helper_gvec_3i *fnoi", as it isn't needed
for now.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20190416185301.25344-2-david@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
It's either "GNU *Library* General Public version 2" or "GNU Lesser
General Public version *2.1*", but there was no "version 2.0" of the
"Lesser" library. So assume that version 2.1 is meant here.
Cc: Richard Henderson <rth@twiddle.net>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <1548252536-6242-5-git-send-email-thuth@redhat.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
We handle many of these during integer expansion, and the
rest of them during integer optimization.
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Normally this is automatic in the size restrictions that are placed
on vector sizes coming from the implementation. However, for the
legitimate size tuple [oprsz=8, maxsz=32], we need to clear the final
24 bytes of the vector register. Without this check, do_dup selects
TCG_TYPE_V128 and clears only 16 bytes.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Message-id: 20180705191929.30773-2-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
This unifies 5 copies of checks for supported vector size,
and in the process fixes a missing check in tcg_gen_gvec_2s.
This lead to an assertion failure for 64-bit vector multiply,
which is not available in the AVX instruction set.
Suggested-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Use dup to convert a non-constant scalar to a third vector.
Add addition, multiplication, and logical operations with an immediate.
Add addition, subtraction, multiplication, and logical operations with
a non-constant scalar. Allow for the front-end to build operations in
which the scalar operand comes first.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
No vector ops as yet. SSE only has direct support for 8- and 16-bit
saturation; handling 32- and 64-bit saturation is much more expensive.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Opcodes are added for scalar and vector shifts, but considering the
varied semantics of these do not expose them to the front ends. Do
go ahead and provide them in case they are needed for backend expansion.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>