Commit Graph

168 Commits

Author SHA1 Message Date
H.J. Lu 70bc83b910 Load pointers into RAX_LP in strcmp-sse42.S 2012-05-15 09:59:31 -07:00
H.J. Lu 9bc0b730a6 Load cache sizes into R*_LP in memcpy-ssse3.S 2012-05-15 09:58:28 -07:00
H.J. Lu 6d2850e7f5 Load cache sizes into R*_LP in memcpy-ssse3-back.S 2012-05-15 09:56:17 -07:00
H.J. Lu 8a17f34979 Load cache size into R8_LP 2012-05-15 09:35:43 -07:00
Paul Eggert 59ba27a63a Replace FSF snail mail address with URLs. 2012-02-09 23:18:22 +00:00
Ulrich Drepper 08cf777f9e Really fix AVX tests 2012-01-26 09:45:54 -05:00
There is no problem with strcmp; it doesn't use the YMM registers.
The math routines might, since gcc may generate such code.
Introduce bit_YMM_USBALE and use it in the math routines.
Ulrich Drepper afc5ed09cb Reset bit_AVX in __cpu_features if OS support is missing 2012-01-26 07:45:14 -05:00
Liubov Dmitrieva 15db4de19d Fix overrun in destination buffer 2011-12-23 12:02:15 -05:00
Ulrich Drepper 370a7d88f7 WP fixes 2011-12-17 14:41:05 -05:00
Ulrich Drepper 1d3e4b618a Optimized wcschr and wcscpy for x86-64 and x86-32 2011-12-17 14:39:23 -05:00
Ulrich Drepper aff2453df7 Fix more warnings 2011-12-03 21:49:35 -05:00
Ulrich Drepper 34372fc6d3 Fix test of non-ASCII locales in x86-64 strcasecmp et.al. 2011-11-01 16:46:23 -04:00
Ulrich Drepper 52e4b9eb62 More cleanups of x86-64 strstr 2011-10-28 19:01:48 -04:00
Ulrich Drepper fd52bc6dc4 Clean up x86-64 strcasestr 2011-10-28 18:18:04 -04:00
Actually describe in the C code what is going on.
Ulrich Drepper e0016b11d6 Add AVX optimized versions for some x86-64 math functions 2011-10-25 21:34:55 -04:00
Ulrich Drepper 618280a192 Optimize x86-64 SSE4.2+ strcmp a bit more 2011-10-25 14:50:31 -04:00
Ulrich Drepper 09229f3e1b Fix WS 2011-10-23 14:57:28 -04:00
Liubov Dmitrieva ce7dd29f28 Optimized strnlen and wcscmp for x86-64 2011-10-23 14:56:04 -04:00
Ulrich Drepper c196fed8f0 Fix compilation problems in x86-64 init-arch 2011-10-21 20:47:20 -04:00
Ulrich Drepper ed72b6545f Check for FMA4 support and generate appropriate fma functions 2011-10-20 22:43:15 -04:00
Ulrich Drepper 8d4f46c613 Move fma routines to right place 2011-10-20 21:55:41 -04:00
Ulrich Drepper 855d156018 Optimize x86-64 rawmemchr and add test 2011-10-19 22:22:29 -04:00
Ulrich Drepper d9a4d2ab27 Add optimized str{,n}casecmp for AVX on x86-64 2011-10-19 12:42:38 -04:00
Ulrich Drepper 2d1f3a4db6 Fix WS 2011-10-15 11:11:12 -04:00
Liubov Dmitrieva be13f7bff6 Optimized memcmp and wmemcmp for x86-64 and x86-32 2011-10-15 11:10:08 -04:00
Liubov Dmitrieva 093ecf9299 Improve 64 bit memchr, memrchr, rawmemchr with SSE2 2011-10-07 11:49:10 -04:00
Ulrich Drepper ceaa0c5dc3 Move Atom-optimized code out of the way and together 2011-09-06 21:53:03 -04:00
Ulrich Drepper 6d18b67f4d Fix whitespaces 2011-09-05 21:42:12 -04:00
Liubov Dmitrieva a5f524e479 Add Atom-optimized strchr and strrchr for x86-64 2011-09-05 21:34:03 -04:00
Andreas Schwab 8c1a459f9a Fix inline strncat/strncmp on x86 2011-08-04 14:59:25 -04:00
Ulrich Drepper 21137f89c5 Fix overflow bug in optimized strncat for x86-64 2011-07-21 12:32:36 -04:00
Ulrich Drepper 8002999481 Fix whitespaces 2011-07-19 17:27:09 -04:00
Liubov Dmitrieva 99710781cc Improve 64 bit strcat functions with SSE2/SSSE3 2011-07-19 17:11:54 -04:00
H.J. Lu 8912479f9e Improved st{r,p}{,n}cpy for SSE2 and SSSE3 on x86-64 2011-06-24 15:14:22 -04:00
H.J. Lu 0b1cbaaef5 Optimized st{r,p}{,n}cpy for SSE2/SSSE3 on x86-32 2011-06-24 14:15:32 -04:00
H.J. Lu 3d29045b5e Assume Intel Core i3/i5/i7 processor if AVX is available 2011-06-03 07:01:25 -04:00
Mike Frysinger 4c559bcdf3 Fix static linking with checking x86/x86-64 memcpy. 2011-04-17 22:20:47 -04:00
H.J. Lu 0354e35501 Work around old buggy program which cannot cope with memcpy semantics. 2011-04-01 19:38:21 -04:00
H.J. Lu c97a1282a4 Handle page boundaries in x86 SSE4.2 strncmp. 2011-03-21 05:35:38 -04:00
Harsha Jagasia 7e4ba49cd3 Enable SSE2 memset for AMD's upcoming Orochi processor. 2011-03-04 23:30:08 -05:00
This patch enables SSE2 memset for AMD's upcoming Orochi processor.
It also fixes the following bug: when multiarch is enabled, for
misaligned blocks larger than 144 bytes, memset branches into the
integer code path depending on the value of the misalignment, even
if the startup code chose the SSE2 code path up front.
Roland McGrath a0bf67cca2 Fix some warning nits. 2011-02-04 10:53:51 -08:00
H.J. Lu 13b695749a Support Intel processor model 6 and model 0x2. 2010-11-12 03:48:52 -05:00
H.J. Lu 8ca52c6e3b Fix one exit path in x86-64 SSE4.2 str{,n}casecmp. 2010-11-10 03:05:37 -05:00
H.J. Lu ff02d5280b Use IFUNC on x86-64 memset 2010-11-08 03:41:34 -05:00
Richard Li dbf3a06904 Fix x86-64 strchr propagation of search byte into all bytes of SSE register 2010-10-25 14:13:17 -04:00
Jakub Jelinek 5e908464b9 Implement accurate fma. 2010-10-13 22:27:03 -04:00
Jakub Jelinek 9ff8d36f27 Correct implementation of fmaf. 2010-10-11 09:27:05 -04:00
Ulrich Drepper 015a4c6193 Re-enable all strncasecmp versions. 2010-09-20 20:18:00 -07:00
Ulrich Drepper 8ffcee4a04 Fix limit detection in x86-64 SSE2 strncasecmp. 2010-09-20 14:02:23 -07:00
Ulrich Drepper 9ea3de11f1 Move slow Atom code to separate section. 2010-08-26 22:17:03 -07:00
H.J. Lu 623aac7f84 Unroll x86-64 strlen 2010-08-26 22:09:34 -07:00
H.J. Lu b416a90085 Missing comma in last commit. 2010-08-26 13:18:46 -07:00
Roland McGrath 8b2b771538 Clean up warnings in new x86_64/multiarch code. 2010-08-25 12:13:08 -07:00
H.J. Lu e73015f2d6 Unroll 32bit SSE strlen and handle slow bsf 2010-08-25 10:07:37 -07:00
Ulrich Drepper 1cdfe7242f Add missing copyright year updates and pretty printing. 2010-08-24 11:42:19 -07:00
Richard Henderson 73f27d5e72 Clean up SSE variable shifts 2010-08-24 11:35:01 -07:00
Ulrich Drepper 9da4bb316f Fix two typos in x86-64 SSE4.2 strncasecmp implementation. 2010-08-19 09:20:44 -07:00
Ulrich Drepper 1feccb6caf Fix fourth parameter of SSE4.2 strcmp for x86-64. 2010-08-15 20:46:09 -07:00
Ulrich Drepper e9f82e0d1d Add optimized strncasecmp versions for x86-64. 2010-08-14 22:04:01 -07:00
Ulrich Drepper ca6bb004eb Fix x86-64 build without multiarch. 2010-08-14 14:56:32 -07:00
Ulrich Drepper 73507d3ae0 Add support for SSSE3 and SSE4.2 versions of strcasecmp on x86-64. 2010-07-31 21:41:09 -07:00
Ulrich Drepper 66f6765a47 Pretty printing x86-64 SSE4.2 strcmp. 2010-07-30 12:54:37 -07:00
Ulrich Drepper fe36dd025e Fix tolower operation in strcasestr. 2010-07-30 00:09:07 -07:00
Ulrich Drepper 880113d91e Avoid compiling unneeded file in ld.so. 2010-07-27 21:12:59 -07:00
Ulrich Drepper 8e96b93aa7 Speed up x86-64 strcasestr a bit more. 2010-07-24 08:34:44 -07:00
Using the new SSE4.2 instructions is cool but not really the fastest.
Some older SSE instructions can do the trick faster.
Andreas Schwab f6a31e0eb6 Add strcasestr-nonascii to i386 build 2010-07-21 07:26:18 -07:00
Ulrich Drepper d02dc4ba08 Fix non-ASCII case of SSE4.2 strcasestr. 2010-07-16 16:00:22 -07:00
Ulrich Drepper cc9f2e47a0 Speed up SSE4.2 strcasestr by avoiding indirect function call. 2010-07-16 15:37:38 -07:00
H.J. Lu 6fb8cbcb58 Improve 64bit memcpy/memmove for Atom, Core 2 and Core i7 2010-06-30 08:26:11 -07:00
This patch includes optimized 64bit memcpy/memmove for Atom, Core 2 and
Core i7.  It improves memcpy by up to 3X on Atom, up to 4X on Core 2 and
up to 1X on Core i7.  It also improves memmove by up to 3X on Atom, up to
4X on Core 2 and up to 2X on Core i7.
H.J. Lu 3c88fe1e3a Incorrect x86 CPU family and model check. 2010-05-27 11:14:18 -07:00
H.J. Lu df87f54923 Check DATA_CACHE_SIZE_HALF 2010-04-14 22:18:27 -07:00
H.J. Lu dd37cd1a12 Optimize x86-64 SSE4 memcmp for unaligned data. 2010-04-14 17:53:44 -07:00
H.J. Lu 404a6e3201 x86-64 SSE4 optimized memcmp 2010-04-14 00:12:53 -07:00
This is 64bit SSE4 optimized memcmp.  It improves memcmp by up to 3X
on Intel Core i7.
Ulrich Drepper bbbdd77809 Update x86-64 cpu multiarch selection header. 2010-04-13 19:17:10 -07:00
Ulrich Drepper 22f4f44b67 Fix concurrent handling of __cpu_features. 2010-04-04 00:25:46 -07:00
H.J. Lu 7d9335ecd7 Don't define __strpbrk_sse42 in static library 2010-03-24 12:16:24 -07:00
H.J. Lu 5a7af22fbb Unroll the loop in x86-64 SSE4.2 strlen. 2010-01-13 07:51:48 -08:00
H.J. Lu 3af48cbdfa Optimize 32bit memset/memcpy with SSE2/SSSE3. 2010-01-12 11:22:03 -08:00
H.J. Lu 2510d01ddb Define bit_SSE2 and index_SSE2. 2009-12-13 15:23:02 -08:00
H.J. Lu 51ddd2c01e Define bit_XXX and index_XXX. 2009-12-13 09:47:02 -08:00
This patch defines bit_XXX and index_XXX and uses them to check processor
features in assembly code.  It can prevent typos in processor feature
checks.
Ulrich Drepper 823bc6da65 Fix whitespaces. 2009-10-22 22:50:00 -07:00
H.J. Lu 001659f4d5 Implement SSE4.2 optimized strchr and strrchr. 2009-10-22 22:47:12 -07:00
Roland McGrath b0f3a2e43f Clean up unnecessary libc_hidden_builtin_def fiddling in x86 multiarch definitions. 2009-10-06 20:01:23 -07:00
Roland McGrath 9d6982d5d2 Clean up x86 multiarch HAS_FOO macros. 2009-10-06 19:59:03 -07:00
Jakub Jelinek 22bb992d51 Fix strstr/strcasestr/fma/fmaf on x86_64. 2009-09-02 19:43:04 -07:00
H.J. Lu 5a4eb7282e Remove ENABLE_SSSE3_ON_ATOM. 2009-08-28 14:54:46 -07:00
It turns out that SSSE3 isn't slow on Atom; the problem is bsf.  This
patch removes ENABLE_SSSE3_ON_ATOM.
Ulrich Drepper 8e436522e1 Move SSE4.2 functions together. 2009-08-08 09:38:32 -07:00
Ulrich Drepper 0fda545d5f Add SSSE3-optimized implementation of str{,n}cmp for x86-64. 2009-08-07 22:51:02 -07:00
Ulrich Drepper 57b378ac89 Avoid warning through fake initialization. 2009-08-07 16:19:54 -07:00
H.J. Lu 02cea47161 Add x86 32-bit SSE4.2 string functions. 2009-08-04 12:13:43 -07:00
This patch adds 32bit SSE4.2 string functions.  It uses -16L, which works
for both 32bit and 64bit long, instead of 0xfffffffffffffff0L.  Tested
on 32bit Core i7 and Core 2.
H.J. Lu 6f6f1215f6 Support multiarch for i686. 2009-07-31 11:53:35 -07:00
This patch adds multiarch support when configured for i686.  I modified
some x86-64 functions to support 32bit.  I will contribute 32bit SSE string
and memory functions later.
Ulrich Drepper 78c4ef475d Add support for x86-64 fma instruction. 2009-07-29 15:26:06 -07:00
Use it to implement fma and fmaf, if possible.
Ulrich Drepper 9a1d2d4555 Prepare use of IFUNC functions outside libc.so. 2009-07-29 15:22:28 -07:00
We use a callback function into libc.so to get access to the data
structure with the CPU feature information and have special versions
of the test macros which automatically use this function.
Ulrich Drepper e83c1a8a72 Refine testing for xmm/ymm register use in x86-64 ld.so. 2009-07-27 13:40:27 -07:00
The test now takes the callgraph into account.  Only code called
during runtime relocation is affected by the limitation.  We now
determine the affected object files as closely as possible from
the outside.  This made it possible to remove some of the
specializations of the string functions, since they are only used
in other code paths.
Ulrich Drepper 16d2ea4c82 Make sure no code in ld.so uses xmm/ymm registers on x86-64. 2009-07-26 16:10:00 -07:00
This patch introduces a test to make sure no function modifies the
xmm/ymm registers, with the exception of the auditing functions.

The test is probably too pessimistic: all code linked into ld.so
is checked.  Perhaps at some point only the callgraph starting from
_dl_fixup and _dl_profile_fixup will be checked, and we can start
using faster SSE-using functions in parts of ld.so.
H.J. Lu 7956a3d27c Add SSE2 support to str{,n}cmp for x86-64. 2009-07-26 13:32:28 -07:00
H.J. Lu 4e5b5821bf Some optimizations for x86-64 strcmp. 2009-07-25 19:15:14 -07:00
Ulrich Drepper 29e92fa5cd Optimize x86-64 SSE4.2 strcmp. 2009-07-25 12:02:47 -07:00
The file contained some code which was never used.  Don't compile it
in.
Ulrich Drepper d28797e426 Perform test for Atom x86-64 in central place and handle it. 2009-07-23 13:15:17 -07:00
There will be more than one function which, in multiarch mode, wants
to use SSSE3.  We should not test in each of them for Atoms with
slow SSSE3.  Instead, disable the SSSE3 bit in the startup code for
such machines.
Ulrich Drepper ae612b04cc Minor cleanups in x86-64 strstr. 2009-07-21 07:52:12 -07:00
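Commit 51ddd2c01e pairs each CPU feature bit (bit_XXX) with the location of the CPUID word that holds it (index_XXX), so a feature check cannot test the right bit in the wrong register. The C sketch below only illustrates that pairing: in glibc these macros are consumed from assembly, whereas here the struct `cpuid_regs`, the use of register member names as index_XXX values, and the `HAS_FEATURE` macro are hypothetical. The bit positions are the standard CPUID leaf 1 assignments (SSE2 in EDX bit 26, SSSE3 in ECX bit 9, SSE4.2 in ECX bit 20).

```c
#include <stdint.h>

/* Hypothetical: the four registers returned by CPUID leaf 1. */
struct cpuid_regs {
    uint32_t eax, ebx, ecx, edx;
};

/* Each feature bit is defined next to the register that holds it. */
#define bit_SSE2     (1u << 26)
#define index_SSE2   edx
#define bit_SSSE3    (1u << 9)
#define index_SSSE3  ecx
#define bit_SSE4_2   (1u << 20)
#define index_SSE4_2 ecx

/* HAS_FEATURE(r, SSE2) expands to (((r)->edx & (1u << 26)) != 0), so the
   register and the bit always come from the same pair of definitions. */
#define HAS_FEATURE(regs, name) (((regs)->index_##name & bit_##name) != 0)
```

Because the caller names the feature only once, a check such as testing the SSE2 bit against ECX simply cannot be written.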
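The note in commit 02cea47161 about -16L relies on two's complement: -16 has every bit set except the low four at any width, so the same constant yields 0xfffffff0 with a 32-bit long and 0xfffffffffffffff0 with a 64-bit long, while the literal 0xfffffffffffffff0L needs a 64-bit type. A minimal C sketch of the typical use, rounding an address down to a 16-byte boundary before an aligned SSE load; the helper names `align_down_16` and `misalignment_16` are hypothetical.

```c
#include <stdint.h>

/* Round addr down to the nearest multiple of 16.  Converting -16L to
   uintptr_t yields a mask with all bits set except the low four,
   whether long and pointers are 32 or 64 bits wide. */
static inline uintptr_t align_down_16(uintptr_t addr) {
    return addr & (uintptr_t)-16L;
}

/* Distance from the aligned-down address back up to addr (0..15). */
static inline unsigned misalignment_16(uintptr_t addr) {
    return (unsigned)(addr & 15u);
}
```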
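Commit d28797e426 moves the slow-SSSE3 check for Atom into the startup code: the SSSE3 bit is cleared once, and every multiarch selector then only tests the bit. The C sketch below shows that shape. The struct, the function names and the `impl` enum are hypothetical, and treating family 6, model 0x1c as the affected Atom is an assumption made for illustration.

```c
#include <stdint.h>

#define bit_SSSE3 (1u << 9) /* CPUID leaf 1, ECX bit 9 */

/* Hypothetical feature record, filled in once at startup. */
struct cpu_features {
    unsigned family, model;
    uint32_t leaf1_ecx;
};

enum impl { IMPL_SSE2, IMPL_SSSE3 };

/* Central place for the quirk: clear SSSE3 on CPUs where it is slow.
   Assumption: family 6, model 0x1c stands in for "Atom with slow SSSE3". */
static void init_cpu_features(struct cpu_features *f, unsigned family,
                              unsigned model, uint32_t leaf1_ecx) {
    f->family = family;
    f->model = model;
    f->leaf1_ecx = leaf1_ecx;
    if (family == 6 && model == 0x1c)
        f->leaf1_ecx &= ~bit_SSSE3;
}

/* Every selector just tests the bit; none repeats the Atom check. */
static enum impl select_impl(const struct cpu_features *f) {
    return (f->leaf1_ecx & bit_SSSE3) ? IMPL_SSSE3 : IMPL_SSE2;
}
```

Adding another SSSE3 function then needs no new CPU-model logic, which is the point the commit message makes.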
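Commit dbf3a06904 concerns replicating the search byte of strchr into every byte of an SSE register, so that one compare tests many bytes at once. The same idea can be sketched portably with a 64-bit word in place of the register (a SWAR analogue, not the SSE code itself); the function names are hypothetical. Converting through unsigned char first matters in C: a negative char that is sign-extended before the multiply produces the wrong pattern.

```c
#include <stdint.h>
#include <string.h>

#define ONES  0x0101010101010101ULL
#define HIGHS 0x8080808080808080ULL

/* Copy byte c into all eight bytes of a 64-bit word. */
static uint64_t broadcast_byte(unsigned char c) {
    return (uint64_t)c * ONES;
}

/* Nonzero iff at least one byte of w is zero. */
static int has_zero_byte(uint64_t w) {
    return ((w - ONES) & ~w & HIGHS) != 0;
}

/* Does the 8-byte chunk at p contain byte c?  XOR with the broadcast
   pattern turns every matching byte into zero. */
static int chunk_has_byte(const unsigned char *p, unsigned char c) {
    uint64_t w;
    memcpy(&w, p, sizeof w); /* alignment-safe load */
    return has_zero_byte(w ^ broadcast_byte(c));
}
```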
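Commit c97a1282a4 handles page boundaries in x86 SSE4.2 strncmp. The usual concern with such wide loads is that reading 16 bytes may run past the end of a string; that is harmless inside one page but can fault if the load spills into an unmapped page. A minimal check, assuming 4 KiB pages; the name `load_stays_in_page` and the constant `PAGE_SIZE_ASSUMED` are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE_ASSUMED 4096u /* assumption: 4 KiB pages */

/* True if an n-byte load starting at addr stays inside one page, so it
   cannot touch the next (possibly unmapped) page.  Expects n <= page size. */
static bool load_stays_in_page(uintptr_t addr, size_t n) {
    return (addr % PAGE_SIZE_ASSUMED) + n <= PAGE_SIZE_ASSUMED;
}
```

When the check fails, a routine of this kind would typically fall back to a slower path for the bytes up to the boundary.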