The only issue with FMA instructions is that there are _a lot_ of them (30
opcodes, each of which comes in up to 4 versions depending on VEX.W and
VEX.L; a total of 96 possibilities). However, they can be implement with
only 6 helpers, two for scalar operations and four for packed operations.
(Scalar versions do not do any merging; they only affect the bottom 32
or 64 bits of the output operand. Therefore, there is no separate XMM
and YMM of the scalar helpers).
First, we can reduce the number of helpers to one third by passing four
operands (one output and three inputs); the reordering of which operands
go to the multiply and which go to the add is done in emit.c.
Second, the different instructions also dispatch to the same softfloat
function, so the flags for float32_muladd and float64_muladd are passed
in the helper as int arguments, with a little extra complication to
handle FMADDSUB and FMSUBADD.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
F16C only consists of two instructions, which are a bit peculiar
nevertheless.
First, they access only the low half of an YMM or XMM register for the
packed-half operand; the exact size still depends on the VEX.L flag.
This is similar to the existing avx_movx flag, but not exactly because
avx_movx is hardcoded to affect operand 2. To this end I added a "ph"
format name; it's possible to reuse this approach for the VPMOVSX and
VPMOVZX instructions, though that would also require adding two more
formats for the low-quarter and low-eighth of an operand.
Second, VCVTPS2PH is somewhat weird because it *stores* the result of
the instruction into memory rather than loading it.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Extracted from a patch by Paul Brook <paul@nowt.org>.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Extend the support to memory operands, and skip MMX instructions that
were introduced in SSE times, because they are now covered in test-mmx.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Adjust the test-avx.py generator to produce tests specifically for
MMX and 3DNow. Using a separate generator introduces some code
duplication, but is a simpler approach because of test-avx's extra
complexity to support 3- and 4-operand AVX instructions.
If needed, a common library can be introduced later.
While at it, for consistency move all the -cpu max rules to the
same place.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Tests for correct operation of most x86-64 SSE instructions.
It should cover all combinations of overlapping register and memory
operands on a set of random-ish data.
Results are bit-identical to an Intel i5-8500, with the exception of
the RCPSS and RSQRT approximations where the real CPU gives less accurate
results (the Intel spec allows relative errors up to 1.5 * 2^-12)
Signed-off-by: Paul Brook <paul@nowt.org>
Acked-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20220424220204.2493824-42-paul@nowt.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>