qemu-e2k

Author	SHA1	Message	Date
Paolo Bonzini	e582b629f0	target/i386: implement SHA instructions The implementation was validated with OpenSSL and with the test vectors in https://github.com/rust-lang/stdarch/blob/master/crates/core_arch/src/x86/sha.rs. The instructions provide a ~25% improvement on hashing a 64 MiB file: runtime goes down from 1.8 seconds to 1.4 seconds; instruction count on the host goes down from 5.8 billion to 4.8 billion with slightly better IPC too. Good job Intel. ;) Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2023-10-25 17:35:07 +02:00
Richard Henderson	7fcb505455	target/i386: Use clmul_64 Use generic routine for 64-bit carry-less multiply. Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2023-09-15 13:57:00 +00:00
Richard Henderson	44a0c4a8cc	target/i386: Use aesdec_ISB_ISR_IMC_AK This implements the AESDEC instruction. Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2023-07-08 07:30:18 +01:00
Richard Henderson	03cf414ec3	target/i386: Use aesenc_SB_SR_MC_AK This implements the AESENC instruction. Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2023-07-08 07:30:18 +01:00
Richard Henderson	5f40edb71e	target/i386: Use aesdec_IMC This implements the AESIMC instruction. Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2023-07-08 07:30:18 +01:00
Richard Henderson	00b5c7bde9	target/i386: Use aesdec_ISB_ISR_AK This implements the AESDECLAST instruction. Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2023-07-08 07:30:18 +01:00
Richard Henderson	cc648f5024	target/i386: Use aesenc_SB_SR_AK This implements the AESENCLAST instruction. Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2023-07-08 07:30:17 +01:00
Xinyu Li	056d649007	target/i386: fix avx2 instructions vzeroall and vpermdq vzeroall: xmm_regs should be used instead of xmm_t0 vpermdq: bit 3 and 7 of imm should be considered Signed-off-by: Xinyu Li <lixinyu20s@ict.ac.cn> Message-Id: <20230510145222.586487-1-lixinyu20s@ict.ac.cn> Cc: qemu-stable@nongnu.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2023-05-18 08:53:50 +02:00
Eric Auger	c0a6665c3c	target/i386: Remove compilation errors when -Werror=maybe-uninitialized To avoid compilation errors when -Werror=maybe-uninitialized is used, add a default case with g_assert_not_reached(). Otherwise with GCC 11.3.1 "cc (GCC) 11.3.1 20220421 (Red Hat 11.3.1-2)" we get: ../target/i386/ops_sse.h: In function ‘helper_vpermdq_ymm’: ../target/i386/ops_sse.h:2495:13: error: ‘r3’ may be used uninitialized in this function [-Werror=maybe-uninitialized] 2495 \| d->Q(3) = r3; \| ~~~~~~~~^~~~ ../target/i386/ops_sse.h:2494:13: error: ‘r2’ may be used uninitialized in this function [-Werror=maybe-uninitialized] 2494 \| d->Q(2) = r2; \| ~~~~~~~~^~~~ ../target/i386/ops_sse.h:2493:13: error: ‘r1’ may be used uninitialized in this function [-Werror=maybe-uninitialized] 2493 \| d->Q(1) = r1; \| ~~~~~~~~^~~~ ../target/i386/ops_sse.h:2492:13: error: ‘r0’ may be used uninitialized in this function [-Werror=maybe-uninitialized] 2492 \| d->Q(0) = r0; \| ~~~~~~~~^~~~ Signed-off-by: Eric Auger <eric.auger@redhat.com> Message-Id: <20221222140158.1260748-1-eric.auger@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2023-01-11 09:59:39 +01:00
Paolo Bonzini	2872b0f390	target/i386: implement FMA instructions The only issue with FMA instructions is that there are _a lot_ of them (30 opcodes, each of which comes in up to 4 versions depending on VEX.W and VEX.L; a total of 96 possibilities). However, they can be implement with only 6 helpers, two for scalar operations and four for packed operations. (Scalar versions do not do any merging; they only affect the bottom 32 or 64 bits of the output operand. Therefore, there is no separate XMM and YMM of the scalar helpers). First, we can reduce the number of helpers to one third by passing four operands (one output and three inputs); the reordering of which operands go to the multiply and which go to the add is done in emit.c. Second, the different instructions also dispatch to the same softfloat function, so the flags for float32_muladd and float64_muladd are passed in the helper as int arguments, with a little extra complication to handle FMADDSUB and FMSUBADD. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-10-22 09:05:54 +02:00
Paolo Bonzini	cf5ec6641e	target/i386: implement F16C instructions F16C only consists of two instructions, which are a bit peculiar nevertheless. First, they access only the low half of an YMM or XMM register for the packed-half operand; the exact size still depends on the VEX.L flag. This is similar to the existing avx_movx flag, but not exactly because avx_movx is hardcoded to affect operand 2. To this end I added a "ph" format name; it's possible to reuse this approach for the VPMOVSX and VPMOVZX instructions, though that would also require adding two more formats for the low-quarter and low-eighth of an operand. Second, VCVTPS2PH is somewhat weird because it stores the result of the instruction into memory rather than loading it. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-10-20 15:16:18 +02:00
Paolo Bonzini	314d3eff66	target/i386: introduce function to set rounding mode from FPCW or MXCSR bits VROUND, FSTCW and STMXCSR all have to perform the same conversion from x86 rounding modes to softfloat constants. Since the ISA is consistent on the meaning of the two-bit rounding modes, extract the common code into a wrapper for set_float_rounding_mode. Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-10-20 15:16:13 +02:00
Paolo Bonzini	653fad2497	target/i386: remove old SSE decoder With all SSE (and AVX!) instructions now implemented in disas_insn_new, it's possible to remove gen_sse, as well as the helpers for instructions that now use gvec. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-10-18 13:58:05 +02:00
Paolo Bonzini	7170a17ec3	target/i386: reimplement 0x0f 0x10-0x17, add AVX These are mostly moves, and yet are a total pain. The main issue is that: 1) some instructions are selected by mod==11 (register operand) vs. mod=00/01/10 (memory operand) 2) stores to memory are two-operand operations, while the 3-register and load-from-memory versions operate on the entire contents of the destination; this makes it easier to separate the gen_* function for the store case 3) it's inefficient to load into xmm_T0 only to move the value out again, so the gen_* function for the load case is separated too The manual also has various mistakes in the operands here, for example the store case of MOVHPS operates on a 128-bit source (albeit discarding the bottom 64 bits) and therefore should be Mq,Vdq rather than Mq,Vq. Likewise for the destination and source of MOVHLPS. VUNPCK?PS and VUNPCK?PD are the same as VUNPCK?DQ and VUNPCK?QDQ, but encoded as prefixes rather than separate operands. The helpers can be reused however. For MOVSLDUP, MOVSHDUP and MOVDDUP I chose to reimplement them as helpers. I named the helper for MOVDDUP "movdldup" in preparation for possible future introduction of MOVDHDUP and to clarify the similarity with MOVSLDUP. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-10-18 13:58:05 +02:00
Paolo Bonzini	16fc5726a6	target/i386: reimplement 0x0f 0x38, add AVX There are several special cases here: 1) extending moves have different widths for the helpers vs. for the memory loads, and the width for memory loads depends on VEX.L too. This is represented by X86_SPECIAL_AVXExtMov. 2) some instructions, such as variable-width shifts, select the vector element size via REX.W. 3) VSIB instructions (VGATHERxPy, VPGATHERxy) are also part of this group, and they have (among other things) two output operands. 3) the macros for 4-operand blends (which are under 0x0f 0x3a) have to be extended to support 2-operand blends. The 2-operand variant actually came a few years earlier, but it is clearer to implement them in the opposite order. X86_TYPE_WM, introduced earlier for unaligned loads, is reused for helpers that accept a Reg* but have a M argument. These three-byte opcodes also include AVX new instructions, for which the helpers were originally implemented by Paul Brook <paul@nowt.org>. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-10-18 13:58:05 +02:00
Paolo Bonzini	7906847768	target/i386: reimplement 0x0f 0x3a, add AVX The more complicated operations here are insertions and extractions. Otherwise, there are just more entries than usual because the PS/PD/SS/SD variations are encoded in the opcode rater than in the prefixes. These three-byte opcodes also include AVX new instructions, whose implementation in the helpers was originally done by Paul Brook <paul@nowt.org>. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-10-18 13:58:05 +02:00
Paolo Bonzini	a64eee3ab4	target/i386: clarify (un)signedness of immediates from 0F3Ah opcodes Three-byte opcodes from the 0F3Ah area all have an immediate byte which is usually unsigned. Clarify in the helper code that it is unsigned; the new decoder treats immediates as signed by default, and seeing an intN_t in the prototype might give the wrong impression that one can use decode->immediate directly. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-10-18 13:58:05 +02:00
Paolo Bonzini	b98f886c8f	target/i386: Introduce 256-bit vector helpers The new implementation of SSE will cover AVX from the get go, because all the work for the helper functions is already done. We just need to build them. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-10-18 13:58:04 +02:00
Paolo Bonzini	6e0cac782a	target/i386: implement additional AVX comparison operators The new implementation of SSE will cover AVX from the get go, so include the 24 extra comparison operators that are only available with the VEX prefix. Based on a patch by Paul Brook <paul@nowt.org>. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-10-18 13:58:04 +02:00
Paolo Bonzini	620f75566a	target/i386: provide 3-operand versions of unary scalar helpers Compared to Paul's implementation, the new decoder will use a different approach to implement AVX's merging of dst with src1 on scalar operations. Adjust the old SSE decoder to be compatible with new-style helpers. The affected instructions are CVTSx2Sx, ROUNDSx, RSQRTSx, SQRTSx, RCPSx. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-10-18 13:58:04 +02:00
Paolo Bonzini	1de9e7e612	target/i386: support operand merging in binary scalar helpers Compared to Paul's implementation, the new decoder will use a different approach to implement AVX's merging of dst with src1 on scalar operations. Adjust the helpers to provide this functionality. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-10-18 13:58:04 +02:00
Paolo Bonzini	f05f9789f5	target/i386: extend helpers to support VEX.V 3- and 4- operand encodings Add to the helpers all the operands that are needed to implement AVX. Extracted from a patch by Paul Brook <paul@nowt.org>. Message-Id: <20220424220204.2493824-26-paul@nowt.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-10-18 13:58:04 +02:00
Paolo Bonzini	ca4b1b43bc	target/i386: fix INSERTQ implementation INSERTQ is defined to not modify any bits in the lower 64 bits of the destination, other than the ones being replaced with bits from the source operand. QEMU instead is using unshifted bits from the source for those bits. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-09-19 15:16:00 +02:00
Paolo Bonzini	034668c329	target/i386: correctly mask SSE4a bit indices in register operands SSE4a instructions EXTRQ and INSERTQ have two bit index operands, that can be immediates or taken from an XMM register. In both cases, the fields are 6-bit wide and the top two bits in the byte are ignored. translate.c is doing that correctly for the immediate case, but not for the XMM case, so fix it. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-09-19 15:16:00 +02:00
Paul Brook	a64fc26919	target/i386: AVX+AES helpers prep Make the AES vector helpers AVX ready No functional changes to existing helpers Signed-off-by: Paul Brook <paul@nowt.org> Message-Id: <20220424220204.2493824-22-paul@nowt.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-09-01 20:16:33 +02:00
Paul Brook	5a09df21f7	target/i386: AVX pclmulqdq prep Make the pclmulqdq helper AVX ready Signed-off-by: Paul Brook <paul@nowt.org> Message-Id: <20220424220204.2493824-21-paul@nowt.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-09-01 20:16:33 +02:00
Paul Brook	0e29cea589	target/i386: Rewrite blendv helpers Rewrite the blendv helpers so that they can easily be extended to support the AVX encodings, which make all 4 arguments explicit. No functional changes to the existing helpers Signed-off-by: Paul Brook <paul@nowt.org> Message-Id: <20220424220204.2493824-20-paul@nowt.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-09-01 20:16:33 +02:00
Paul Brook	fd17264ad1	target/i386: Misc AVX helper prep Fixup various vector helpers that either trivially exten to 256 bit, or don't have 256 bit variants. No functional changes to existing helpers Signed-off-by: Paul Brook <paul@nowt.org> Message-Id: <20220424220204.2493824-19-paul@nowt.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-09-01 20:16:33 +02:00
Paul Brook	6567ffb4f2	target/i386: Destructive FP helpers for AVX Perpare the horizontal atithmetic vector helpers for AVX These currently use a dummy Reg typed variable to store the result then assign the whole register. This will cause 128 bit operations to corrupt the upper half of the register, so replace it with explicit temporaries and element assignments. Signed-off-by: Paul Brook <paul@nowt.org> Message-Id: <20220424220204.2493824-18-paul@nowt.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-09-01 20:16:33 +02:00
Paul Brook	6f218d6e99	target/i386: Dot product AVX helper prep Make the dpps and dppd helpers AVX-ready I can't see any obvious reason why dppd shouldn't work on 256 bit ymm registers, but both AMD and Intel agree that it's xmm only. Signed-off-by: Paul Brook <paul@nowt.org> Message-Id: <20220424220204.2493824-17-paul@nowt.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-09-01 20:16:33 +02:00
Paul Brook	cbf4ad5498	target/i386: reimplement AVX comparison helpers AVX includes an additional set of comparison predicates, some of which our softfloat implementation does not expose as separate functions. Rewrite the helpers in terms of floatN_compare for future extensibility. Signed-off-by: Paul Brook <paul@nowt.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20220424220204.2493824-24-paul@nowt.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-09-01 20:16:33 +02:00
Paul Brook	3403cafeee	target/i386: Floating point arithmetic helper AVX prep Prepare the "easy" floating point vector helpers for AVX No functional changes to existing helpers. Signed-off-by: Paul Brook <paul@nowt.org> Message-Id: <20220424220204.2493824-16-paul@nowt.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-09-01 20:16:33 +02:00
Paul Brook	d45b0de63d	target/i386: Destructive vector helpers for AVX These helpers need to take special care to avoid overwriting source values before the wole result has been calculated. Currently they use a dummy Reg typed variable to store the result then assign the whole register. This will cause 128 bit operations to corrupt the upper half of the register, so replace it with explicit temporaries and element assignments. Signed-off-by: Paul Brook <paul@nowt.org> Message-Id: <20220424220204.2493824-14-paul@nowt.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-09-01 20:16:33 +02:00
Paul Brook	e894bae8cb	target/i386: Misc integer AVX helper prep More preparatory work for AVX support in various integer vector helpers No functional changes to existing helpers. Signed-off-by: Paul Brook <paul@nowt.org> Message-Id: <20220424220204.2493824-13-paul@nowt.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-09-01 20:16:33 +02:00
Paul Brook	ee04a3c86d	target/i386: Rewrite simple integer vector helpers Rewrite the "simple" vector integer helpers in preperation for AVX support. While the current code is able to use the same prototype for unary (a = F(b)) and binary (a = F(b, c)) operations, future changes will cause them to diverge. No functional changes to existing helpers Signed-off-by: Paul Brook <paul@nowt.org> Message-Id: <20220424220204.2493824-12-paul@nowt.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-09-01 20:16:33 +02:00
Paul Brook	18592d2ec2	target/i386: Rewrite vector shift helper Rewrite the vector shift helpers in preperation for AVX support (3 operand form and 256 bit vectors). For now keep the existing two operand interface. No functional changes to existing helpers. Signed-off-by: Paul Brook <paul@nowt.org> Message-Id: <20220424220204.2493824-11-paul@nowt.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-09-01 20:16:33 +02:00
Paolo Bonzini	25bdec79c6	target/i386: rewrite destructive 3DNow operations Remove use of the MOVE macro, since it will be purged from MMX/SSE as well. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-09-01 20:16:33 +02:00
Paolo Bonzini	ce4fa29f94	target/i386: Add size suffix to vector FP helpers For AVX we're going to need both 128 bit (xmm) and 256 bit (ymm) variants of floating point helpers. Add the register type suffix to the existing PS and PD helpers (SS and SD variants are only valid on 128 bit vectors) No functional changes. Signed-off-by: Paul Brook <paul@nowt.org> Message-Id: <20220424220204.2493824-15-paul@nowt.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-09-01 20:16:33 +02:00
Paolo Bonzini	bf30ad8cef	target/i386: DPPS rounding fix The DPPS (Dot Product) instruction is defined to first sum pairs of intermediate results, then sum those values to get the final result. i.e. (A+B)+(C+D) We incrementally sum the results, i.e. ((A+B)+C)+D, which can result in incorrect rouding. For consistency, also change the variable names to the ones used in the Intel SDM and implement DPPD following the manual. Based on a patch by Paul Brook <paul@nowt.org>. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-09-01 08:37:04 +02:00
Paolo Bonzini	75046ad72e	target/i386: fix PHSUB* instructions with dest=src The computation must not overwrite neither the destination nor the source before the last element has been computed. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-09-01 08:37:04 +02:00
Paul Brook	d1da229ff1	i386: pcmpestr 64-bit sign extension bug The abs1 function in ops_sse.h only works sorrectly when the result fits in a signed int. This is fine most of the time because we're only dealing with byte sized values. However pcmp_elen helper function uses abs1 to calculate the absolute value of a cpu register. This incorrectly truncates to 32 bits, and will give the wrong anser for the most negative value. Fix by open coding the saturation check before taking the absolute value. Signed-off-by: Paul Brook <paul@nowt.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-04-28 08:51:56 +02:00
Paolo Bonzini	d22697dde0	target/i386: do not access beyond the low 128 bits of SSE registers The i386 target consolidates all vector registers so that instead of XMMReg, YMMReg and ZMMReg structs there is a single ZMMReg that can fit all of SSE, AVX and AVX512. When TCG copies data from and to the SSE registers, it uses the full 64-byte width. This is not a correctness issue because TCG never lets guest code see beyond the first 128 bits of the ZMM registers, however it causes uninitialized stack memory to make it to the CPU's migration stream. Fix it by only copying the low 16 bytes of the ZMMReg union into the destination register. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-04-13 18:59:52 +02:00
Chetan Pant	d9ff33ada7	x86 tcg cpus: Fix Lesser GPL version number There is no "version 2" of the "Lesser" General Public License. It is either "GPL version 2.0" or "Lesser GPL version 2.1". This patch replaces all occurrences of "Lesser GPL version 2" with "Lesser GPL version 2.1" in comment section. Signed-off-by: Chetan Pant <chetan4windows@gmail.com> Message-Id: <20201023122801.19514-1-chetan4windows@gmail.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Thomas Huth <thuth@redhat.com>	2020-11-15 16:41:42 +01:00
Joseph Myers	418b0f93d1	target/i386: fix IEEE SSE floating-point exception raising The SSE instruction implementations all fail to raise the expected IEEE floating-point exceptions because they do nothing to convert the exception state from the softfloat machinery into the exception flags in MXCSR. Fix this by adding such conversions. Unlike for x87, emulated SSE floating-point operations might be optimized using hardware floating point on the host, and so a different approach is taken that is compatible with such optimizations. The required invariant is that all exceptions set in env->sse_status (other than "denormal operand", for which the SSE semantics are different from those in the softfloat code) are ones that are set in the MXCSR; the emulated MXCSR is updated lazily when code reads MXCSR, while when code sets MXCSR, the exceptions in env->sse_status are set accordingly. A few instructions do not raise all the exceptions that would be raised by the softfloat code, and those instructions are made to save and restore the softfloat exception state accordingly. Nothing is done about "denormal operand"; setting that (only for the case when input denormals are not flushed to zero, the opposite of the logic in the softfloat code for such an exception) will require custom code for relevant instructions, or else architecture-specific conditionals in the softfloat code for when to set such an exception together with custom code for various SSE conversion and rounding instructions that do not set that exception. Nothing is done about trapping exceptions (for which there is minimal and largely broken support in QEMU's emulation in the x87 case and no support at all in the SSE case). Signed-off-by: Joseph Myers <joseph@codesourcery.com> Message-Id: <alpine.DEB.2.21.2006252358000.3832@digraph.polyomino.org.uk> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-07-10 18:02:17 -04:00
Joseph Myers	bc921b2711	target/i386: correct fix for pcmpxstrx substring search This corrects a bug introduced in my previous fix for SSE4.2 pcmpestri / pcmpestrm / pcmpistri / pcmpistrm substring search, commit `ae35eea7e4`. That commit fixed a bug that showed up in four GCC tests with one libc implementation. The tests in question generate random inputs to the intrinsics and compare results to a C implementation, but they only test 1024 possible random inputs, and when the tests use the cases of those instructions that work with word rather than byte inputs, it's easy to have problematic cases that show up much less frequently than that. Thus, testing with a different libc implementation, and so a different random number generator, showed up a problem with the previous patch. When investigating the previous test failures, I found the description of these instructions in the Intel manuals (starting from computing a 16x16 or 8x8 set of comparison results) confusing and hard to match up with the more optimized implementation in QEMU, and referred to AMD manuals which described the instructions in a different way. Those AMD descriptions are very explicit that the whole of the string being searched for must be found in the other operand, not running off the end of that operand; they say "If the prototype and the SUT are equal in length, the two strings must be identical for the comparison to be TRUE.". However, that statement is incorrect. In my previous commit message, I noted: The operation in this case is a search for a string (argument d to the helper) in another string (argument s to the helper); if a copy of d at a particular position would run off the end of s, the resulting output bit should be 0 whether or not the strings match in the region where they overlap, but the QEMU implementation was wrongly comparing only up to the point where s ends and counting it as a match if an initial segment of d matched a terminal segment of s. Here, "run off the end of s" means that some byte of d would overlap some byte outside of s; thus, if d has zero length, it is considered to match everywhere, including after the end of s. The description "some byte of d would overlap some byte outside of s" is accurate only when understood to refer to overlapping some byte within the 16-byte operand but at or after the zero terminator; it is valid to run over the end of s if the end of s is the end of the 16-byte operand. So the fix in the previous patch for the case of d being empty was correct, but the other part of that patch was not correct (as it never allowed partial matches even at the end of the 16-byte operand). Nor was the code before the previous patch correct for the case of d nonempty, as it would always have allowed partial matches at the end of s. Fix with a partial revert of my previous change, combined with inserting a check for the special case of s having maximum length to determine where it is necessary to check for matches. In the added test, test 1 is for the case of empty strings, which failed before my 2017 patch, test 2 is for the bug introduced by my 2017 patch and test 3 deals with the case where a match of an initial segment at the end of the string is not valid when the string ends before the end of the 16-byte operand (that is, the case that would be broken by a simple revert of the non-empty-string part of my 2017 patch). Signed-off-by: Joseph Myers <joseph@codesourcery.com> Message-Id: <alpine.DEB.2.21.2006121344290.9881@digraph.polyomino.org.uk> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-06-12 11:10:39 -04:00
Janne Grunau	2dfbea1a87	target/i386: fix phadd* with identical destination and source register Detected by asm test suite failures in dav1d (https://code.videolan.org/videolan/dav1d). Can be reproduced by `qemu-x86_64 -cpu core2duo ./tests/checkasm --test=mc_8bpc 1659890620`. Signed-off-by: Janne Grunau <j@jannau.net> Message-Id: <20200401225253.30745-1-j@jannau.net> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-06-10 12:09:42 -04:00
Richard Henderson	71bfd65c5f	softfloat: Name compare relation enum Give the previously unnamed enum a typedef name. Use it in the prototypes of compare functions. Use it to hold the results of the compare functions. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2020-05-19 08:41:45 -07:00
Peter Maydell	1e8a98b538	target/i386: Return 'indefinite integer value' for invalid SSE fp->int conversions The x86 architecture requires that all conversions from floating point to integer which raise the 'invalid' exception (infinities of both signs, NaN, and all values which don't fit in the destination integer) return what the x86 spec calls the "indefinite integer value", which is 0x8000_0000 for 32-bits or 0x8000_0000_0000_0000 for 64-bits. The softfloat functions return the more usual behaviour of positive overflows returning the maximum value that fits in the destination integer format and negative overflows returning the minimum value that fits. Wrap the softfloat functions in x86-specific versions which detect the 'invalid' condition and return the indefinite integer. Note that we don't use these wrappers for the 3DNow! pf2id and pf2iw instructions, which do return the minimum value that fits in an int32 if the input float is a large negative number. Fixes: https://bugs.launchpad.net/qemu/+bug/1815423 Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Message-Id: <20190805180332.10185-1-peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2019-08-20 17:26:20 +02:00
Joseph Myers	aa406feadf	target/i386: fix phminposuw in-place operation The SSE4.1 phminposuw instruction finds the minimum 16-bit element in the source vector, putting the value of that element in the low 16 bits of the destination vector, the index of that element in the next three bits and zeroing the rest of the destination. The helper for this operation fills the destination from high to low, meaning that when the source and destination are the same register, the minimum source element can be overwritten before it is copied to the destination. This patch fixes it to fill the destination from low to high instead, so the minimum source element is always copied first. This fixes one gcc test failure in my GCC 6-based testing (and so concludes the present sequence of patches, as I don't have any further gcc test failures left in that testing that I attribute to QEMU bugs). Signed-off-by: Joseph Myers <joseph@codesourcery.com> Message-Id: <alpine.DEB.2.20.1708111422580.11919@digraph.polyomino.org.uk> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2017-09-19 14:09:11 +02:00
Joseph Myers	ae35eea7e4	target/i386: fix pcmpxstrx substring search One of the cases of the SSE4.2 pcmpestri / pcmpestrm / pcmpistri / pcmpistrm instructions does a substring search. The implementation of this case in the pcmpxstrx helper is incorrect. The operation in this case is a search for a string (argument d to the helper) in another string (argument s to the helper); if a copy of d at a particular position would run off the end of s, the resulting output bit should be 0 whether or not the strings match in the region where they overlap, but the QEMU implementation was wrongly comparing only up to the point where s ends and counting it as a match if an initial segment of d matched a terminal segment of s. Here, "run off the end of s" means that some byte of d would overlap some byte outside of s; thus, if d has zero length, it is considered to match everywhere, including after the end of s. This patch fixes the implementation to correspond with the proper instruction semantics. This fixes four gcc test failures in my GCC 6-based testing. Signed-off-by: Joseph Myers <joseph@codesourcery.com> Message-Id: <alpine.DEB.2.20.1708102139310.8101@digraph.polyomino.org.uk> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2017-09-19 14:09:10 +02:00

1 2

54 Commits