qemu-e2k

Commit Graph

Author	SHA1	Message	Date
Paul Brook	d1da229ff1	i386: pcmpestr 64-bit sign extension bug The abs1 function in ops_sse.h only works sorrectly when the result fits in a signed int. This is fine most of the time because we're only dealing with byte sized values. However pcmp_elen helper function uses abs1 to calculate the absolute value of a cpu register. This incorrectly truncates to 32 bits, and will give the wrong anser for the most negative value. Fix by open coding the saturation check before taking the absolute value. Signed-off-by: Paul Brook <paul@nowt.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-04-28 08:51:56 +02:00
Paolo Bonzini	d22697dde0	target/i386: do not access beyond the low 128 bits of SSE registers The i386 target consolidates all vector registers so that instead of XMMReg, YMMReg and ZMMReg structs there is a single ZMMReg that can fit all of SSE, AVX and AVX512. When TCG copies data from and to the SSE registers, it uses the full 64-byte width. This is not a correctness issue because TCG never lets guest code see beyond the first 128 bits of the ZMM registers, however it causes uninitialized stack memory to make it to the CPU's migration stream. Fix it by only copying the low 16 bytes of the ZMMReg union into the destination register. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-04-13 18:59:52 +02:00
Chetan Pant	d9ff33ada7	x86 tcg cpus: Fix Lesser GPL version number There is no "version 2" of the "Lesser" General Public License. It is either "GPL version 2.0" or "Lesser GPL version 2.1". This patch replaces all occurrences of "Lesser GPL version 2" with "Lesser GPL version 2.1" in comment section. Signed-off-by: Chetan Pant <chetan4windows@gmail.com> Message-Id: <20201023122801.19514-1-chetan4windows@gmail.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Thomas Huth <thuth@redhat.com>	2020-11-15 16:41:42 +01:00
Joseph Myers	418b0f93d1	target/i386: fix IEEE SSE floating-point exception raising The SSE instruction implementations all fail to raise the expected IEEE floating-point exceptions because they do nothing to convert the exception state from the softfloat machinery into the exception flags in MXCSR. Fix this by adding such conversions. Unlike for x87, emulated SSE floating-point operations might be optimized using hardware floating point on the host, and so a different approach is taken that is compatible with such optimizations. The required invariant is that all exceptions set in env->sse_status (other than "denormal operand", for which the SSE semantics are different from those in the softfloat code) are ones that are set in the MXCSR; the emulated MXCSR is updated lazily when code reads MXCSR, while when code sets MXCSR, the exceptions in env->sse_status are set accordingly. A few instructions do not raise all the exceptions that would be raised by the softfloat code, and those instructions are made to save and restore the softfloat exception state accordingly. Nothing is done about "denormal operand"; setting that (only for the case when input denormals are not flushed to zero, the opposite of the logic in the softfloat code for such an exception) will require custom code for relevant instructions, or else architecture-specific conditionals in the softfloat code for when to set such an exception together with custom code for various SSE conversion and rounding instructions that do not set that exception. Nothing is done about trapping exceptions (for which there is minimal and largely broken support in QEMU's emulation in the x87 case and no support at all in the SSE case). Signed-off-by: Joseph Myers <joseph@codesourcery.com> Message-Id: <alpine.DEB.2.21.2006252358000.3832@digraph.polyomino.org.uk> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-07-10 18:02:17 -04:00
Joseph Myers	bc921b2711	target/i386: correct fix for pcmpxstrx substring search This corrects a bug introduced in my previous fix for SSE4.2 pcmpestri / pcmpestrm / pcmpistri / pcmpistrm substring search, commit `ae35eea7e4`. That commit fixed a bug that showed up in four GCC tests with one libc implementation. The tests in question generate random inputs to the intrinsics and compare results to a C implementation, but they only test 1024 possible random inputs, and when the tests use the cases of those instructions that work with word rather than byte inputs, it's easy to have problematic cases that show up much less frequently than that. Thus, testing with a different libc implementation, and so a different random number generator, showed up a problem with the previous patch. When investigating the previous test failures, I found the description of these instructions in the Intel manuals (starting from computing a 16x16 or 8x8 set of comparison results) confusing and hard to match up with the more optimized implementation in QEMU, and referred to AMD manuals which described the instructions in a different way. Those AMD descriptions are very explicit that the whole of the string being searched for must be found in the other operand, not running off the end of that operand; they say "If the prototype and the SUT are equal in length, the two strings must be identical for the comparison to be TRUE.". However, that statement is incorrect. In my previous commit message, I noted: The operation in this case is a search for a string (argument d to the helper) in another string (argument s to the helper); if a copy of d at a particular position would run off the end of s, the resulting output bit should be 0 whether or not the strings match in the region where they overlap, but the QEMU implementation was wrongly comparing only up to the point where s ends and counting it as a match if an initial segment of d matched a terminal segment of s. Here, "run off the end of s" means that some byte of d would overlap some byte outside of s; thus, if d has zero length, it is considered to match everywhere, including after the end of s. The description "some byte of d would overlap some byte outside of s" is accurate only when understood to refer to overlapping some byte within the 16-byte operand but at or after the zero terminator; it is valid to run over the end of s if the end of s is the end of the 16-byte operand. So the fix in the previous patch for the case of d being empty was correct, but the other part of that patch was not correct (as it never allowed partial matches even at the end of the 16-byte operand). Nor was the code before the previous patch correct for the case of d nonempty, as it would always have allowed partial matches at the end of s. Fix with a partial revert of my previous change, combined with inserting a check for the special case of s having maximum length to determine where it is necessary to check for matches. In the added test, test 1 is for the case of empty strings, which failed before my 2017 patch, test 2 is for the bug introduced by my 2017 patch and test 3 deals with the case where a match of an initial segment at the end of the string is not valid when the string ends before the end of the 16-byte operand (that is, the case that would be broken by a simple revert of the non-empty-string part of my 2017 patch). Signed-off-by: Joseph Myers <joseph@codesourcery.com> Message-Id: <alpine.DEB.2.21.2006121344290.9881@digraph.polyomino.org.uk> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-06-12 11:10:39 -04:00
Janne Grunau	2dfbea1a87	target/i386: fix phadd* with identical destination and source register Detected by asm test suite failures in dav1d (https://code.videolan.org/videolan/dav1d). Can be reproduced by `qemu-x86_64 -cpu core2duo ./tests/checkasm --test=mc_8bpc 1659890620`. Signed-off-by: Janne Grunau <j@jannau.net> Message-Id: <20200401225253.30745-1-j@jannau.net> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-06-10 12:09:42 -04:00
Richard Henderson	71bfd65c5f	softfloat: Name compare relation enum Give the previously unnamed enum a typedef name. Use it in the prototypes of compare functions. Use it to hold the results of the compare functions. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2020-05-19 08:41:45 -07:00
Peter Maydell	1e8a98b538	target/i386: Return 'indefinite integer value' for invalid SSE fp->int conversions The x86 architecture requires that all conversions from floating point to integer which raise the 'invalid' exception (infinities of both signs, NaN, and all values which don't fit in the destination integer) return what the x86 spec calls the "indefinite integer value", which is 0x8000_0000 for 32-bits or 0x8000_0000_0000_0000 for 64-bits. The softfloat functions return the more usual behaviour of positive overflows returning the maximum value that fits in the destination integer format and negative overflows returning the minimum value that fits. Wrap the softfloat functions in x86-specific versions which detect the 'invalid' condition and return the indefinite integer. Note that we don't use these wrappers for the 3DNow! pf2id and pf2iw instructions, which do return the minimum value that fits in an int32 if the input float is a large negative number. Fixes: https://bugs.launchpad.net/qemu/+bug/1815423 Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Message-Id: <20190805180332.10185-1-peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2019-08-20 17:26:20 +02:00
Joseph Myers	aa406feadf	target/i386: fix phminposuw in-place operation The SSE4.1 phminposuw instruction finds the minimum 16-bit element in the source vector, putting the value of that element in the low 16 bits of the destination vector, the index of that element in the next three bits and zeroing the rest of the destination. The helper for this operation fills the destination from high to low, meaning that when the source and destination are the same register, the minimum source element can be overwritten before it is copied to the destination. This patch fixes it to fill the destination from low to high instead, so the minimum source element is always copied first. This fixes one gcc test failure in my GCC 6-based testing (and so concludes the present sequence of patches, as I don't have any further gcc test failures left in that testing that I attribute to QEMU bugs). Signed-off-by: Joseph Myers <joseph@codesourcery.com> Message-Id: <alpine.DEB.2.20.1708111422580.11919@digraph.polyomino.org.uk> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2017-09-19 14:09:11 +02:00
Joseph Myers	ae35eea7e4	target/i386: fix pcmpxstrx substring search One of the cases of the SSE4.2 pcmpestri / pcmpestrm / pcmpistri / pcmpistrm instructions does a substring search. The implementation of this case in the pcmpxstrx helper is incorrect. The operation in this case is a search for a string (argument d to the helper) in another string (argument s to the helper); if a copy of d at a particular position would run off the end of s, the resulting output bit should be 0 whether or not the strings match in the region where they overlap, but the QEMU implementation was wrongly comparing only up to the point where s ends and counting it as a match if an initial segment of d matched a terminal segment of s. Here, "run off the end of s" means that some byte of d would overlap some byte outside of s; thus, if d has zero length, it is considered to match everywhere, including after the end of s. This patch fixes the implementation to correspond with the proper instruction semantics. This fixes four gcc test failures in my GCC 6-based testing. Signed-off-by: Joseph Myers <joseph@codesourcery.com> Message-Id: <alpine.DEB.2.20.1708102139310.8101@digraph.polyomino.org.uk> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2017-09-19 14:09:10 +02:00
Joseph Myers	80e1960621	target/i386: fix packusdw in-place operation The SSE4.1 packusdw instruction combines source and destination vectors of signed 32-bit integers into a single vector of unsigned 16-bit integers, with unsigned saturation. When the source and destination are the same register, this means each 32-bit element of that register is used twice as an input, to produce two of the 16-bit output elements, and so if the operation is carried out element-by-element in-place, no matter what the order in which it is applied to the elements, the first element's operation will overwrite some future input. The helper for packssdw avoids this issue by computing the result in a local temporary and copying it to the destination at the end; this patch fixes the packusdw helper to do likewise. This fixes three gcc test failures in my GCC 6-based testing. Signed-off-by: Joseph Myers <joseph@codesourcery.com> Message-Id: <alpine.DEB.2.20.1708100023050.9262@digraph.polyomino.org.uk> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2017-09-19 14:09:10 +02:00
Joseph Myers	c6a56c8e99	target/i386: fix pmovsx/pmovzx in-place operations The SSE4.1 pmovsx* and pmovzx* instructions take packed 1-byte, 2-byte or 4-byte inputs and sign-extend or zero-extend them to a wider vector output. The associated helpers for these instructions do the extension on each element in turn, starting with the lowest. If the input and output are the same register, this means that all the input elements after the first have been overwritten before they are read. This patch makes the helpers extend starting with the highest element, not the lowest, to avoid such overwriting. This fixes many GCC test failures (161 in the gcc testsuite in my GCC 6-based testing) when testing with a default CPU setting enabling those instructions. Signed-off-by: Joseph Myers <joseph@codesourcery.com> Message-Id: <alpine.DEB.2.20.1708082018390.23380@digraph.polyomino.org.uk> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2017-09-19 14:09:10 +02:00
Richard Henderson	4885c3c495	target-i386: Use ctpop helper Signed-off-by: Richard Henderson <rth@twiddle.net>	2017-01-10 08:49:59 -08:00
Thomas Huth	fcf5ef2ab5	Move target-* CPU file into a target/ folder We've currently got 18 architectures in QEMU, and thus 18 target-xxx folders in the root folder of the QEMU source tree. More architectures (e.g. RISC-V, AVR) are likely to be included soon, too, so the main folder of the QEMU sources slowly gets quite overcrowded with the target-xxx folders. To disburden the main folder a little bit, let's move the target-xxx folders into a dedicated target/ folder, so that target-xxx/ simply becomes target/xxx/ instead. Acked-by: Laurent Vivier <laurent@vivier.eu> [m68k part] Acked-by: Bastian Koppelmann <kbastian@mail.uni-paderborn.de> [tricore part] Acked-by: Michael Walle <michael@walle.cc> [lm32 part] Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com> [s390x part] Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> [s390x part] Acked-by: Eduardo Habkost <ehabkost@redhat.com> [i386 part] Acked-by: Artyom Tarasenko <atar4qemu@gmail.com> [sparc part] Acked-by: Richard Henderson <rth@twiddle.net> [alpha part] Acked-by: Max Filippov <jcmvbkbc@gmail.com> [xtensa part] Reviewed-by: David Gibson <david@gibson.dropbear.id.au> [ppc part] Acked-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com> [cris&microblaze part] Acked-by: Guan Xuetao <gxt@mprc.pku.edu.cn> [unicore32 part] Signed-off-by: Thomas Huth <thuth@redhat.com>	2016-12-20 21:52:12 +01:00

14 Commits