Commit Graph

149 Commits

Author SHA1 Message Date
Allan McRae 6f8e37ebf8 Update file name in x86_64 ifunc list
File name update missed in commit 584b18eb.
2013-12-16 13:00:39 +10:00
Ondřej Bílka 584b18eb4d Add strstr with unaligned loads. Fixes bug 12100.
The SSE4.2 version of strstr used the pcmpistr instruction, which is
quite inefficient. A faster way is to look for pairs of characters:
this uses SSE2, is faster than pcmpistr, and for real strings the
pairs we look for are relatively rare.

For linear time complexity we use the buy-or-rent technique, which
switches to the two-way algorithm when superlinear behaviour is
detected. (A simplified sketch of the pair search follows this entry.)
2013-12-14 20:08:13 +01:00
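
A minimal scalar sketch of the pair-of-characters idea described above. The function name is made up, and the real implementation compares many candidate positions at once with SSE2 vector loads; this version only shows the control flow:

    #include <string.h>

    /* Look for the first two bytes of the needle as a pair, then verify
       the rest of the candidate match.  The real code also bounds the
       work done per position ("buy or rent") and falls back to the
       two-way algorithm to keep the total time linear.  */
    static char *
    strstr_pair_sketch (const char *hs, const char *ne)
    {
      size_t ne_len = strlen (ne);
      if (ne_len == 0)
        return (char *) hs;
      if (ne_len == 1)
        return strchr (hs, ne[0]);

      for (; (hs = strchr (hs, ne[0])) != NULL; hs++)
        if (hs[1] == ne[1] && strncmp (hs + 2, ne + 2, ne_len - 2) == 0)
          return (char *) hs;
      return NULL;
    }
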
Ondřej Bílka e7044ea76b Use p2align instead of ALIGN 2013-10-08 15:46:48 +02:00
Ondřej Bílka dc1a95c730 Faster strrchr. 2013-09-26 19:23:01 +02:00
Ondřej Bílka 5905e7b3e2 Faster strchr implementation. 2013-09-11 17:07:38 +02:00
Ondřej Bílka 8f02859f17 Add unaligned strcmp. 2013-09-03 16:27:10 +02:00
Ondřej Bílka 382466e04e Fix typos. 2013-08-30 18:08:59 +02:00
Ondřej Bílka 0186c6e97e Fix rawmemchr regression on bulldozer. 2013-08-30 10:14:37 +02:00
Ondřej Bílka c0c3f78afb Fix typos. 2013-08-21 19:48:48 +02:00
Liubov Dmitrieva 6308fd9a46 Skip SSE4.2 versions on Intel Silvermont
SSE2/SSSE3 versions are faster than SSE4.2 versions on Intel Silvermont.
2013-06-28 15:31:40 -07:00
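
A hedged sketch of how an ifunc resolver can route around SSE4.2 on such CPUs. The implementation and predicate names below are placeholders standing in for glibc's multiarch machinery and its HAS_ARCH_FEATURE-style checks:

    /* Simplified ifunc dispatch (GCC/ELF).  The two implementations and
       the cpu_* predicates are assumed to be defined elsewhere; glibc's
       real resolvers read precomputed cpu feature bits instead of
       calling out, since they run very early during relocation.  */
    extern char *my_strstr_sse2 (const char *, const char *);
    extern char *my_strstr_sse42 (const char *, const char *);
    extern int cpu_has_sse4_2 (void);
    extern int cpu_slow_sse4_2 (void);  /* set on Silvermont-class cores */

    static void *
    my_strstr_resolver (void)
    {
      if (cpu_has_sse4_2 () && !cpu_slow_sse4_2 ())
        return (void *) my_strstr_sse42;
      return (void *) my_strstr_sse2;
    }

    char *my_strstr (const char *, const char *)
         __attribute__ ((ifunc ("my_strstr_resolver")));
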
Liubov Dmitrieva 11b8a0e1d7 Fix buffer overrun in x86_64 memcmp-ssse3.S 2013-06-26 12:31:51 -07:00
Liubov Dmitrieva d086fc7ba0 Set fast unaligned load flag for new Intel microarchitecture
I have a small patch for the new Intel Silvermont machines.

http://newsroom.intel.com/community/intel_newsroom/blog/2013/05/06/intel-launches-low-power-high-performance-silvermont-microarchitecture

I checked this on my machine and saw that the strcpy, ... unaligned
versions are faster than the ssse3 versions.
2013-06-14 20:46:15 +02:00
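
For reference, a sketch of the family/model decoding that such a check relies on, following the CPUID conventions from Intel's manual. Treat the exact model list as illustrative:

    #include <cpuid.h>

    /* Decode display family/model from CPUID leaf 1; the extended model
       bits are folded in for families 6 and 15.  */
    static int
    is_silvermont (void)
    {
      unsigned int eax, ebx, ecx, edx;
      if (!__get_cpuid (1, &eax, &ebx, &ecx, &edx))
        return 0;

      unsigned int family = (eax >> 8) & 0xf;
      unsigned int model = (eax >> 4) & 0xf;
      if (family == 0x6 || family == 0xf)
        model |= ((eax >> 16) & 0xf) << 4;  /* extended model bits */

      /* Silvermont-era models (illustrative list).  */
      return family == 0x6
             && (model == 0x37 || model == 0x4a || model == 0x4d
                 || model == 0x5a || model == 0x5d);
    }
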
Ondrej Bilka 2d48b41c8f Faster memcpy on x64.
We add a new memcpy version that uses unaligned loads, which are fast
on modern processors. This allows a second improvement: avoiding the
computed jump, which is a relatively expensive operation. (A sketch of
the overlapping-copy trick follows this entry.)

Tests available here:
http://kam.mff.cuni.cz/~ondra/memcpy_profile_result27_04_13.tar.bz2
2013-05-20 08:24:41 +02:00
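
A minimal sketch of the unaligned-load trick for one small size class, assuming SSE2 intrinsics (the function name is made up). Two overlapping unaligned copies absorb every length in the range, so no computed jump into a table of fixed-size copies is needed:

    #include <emmintrin.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Copy n bytes, 16 <= n <= 32: an unaligned 16-byte head copy plus
       an unaligned 16-byte tail copy that overlap in the middle.  */
    static void
    copy_16_to_32 (uint8_t *dst, const uint8_t *src, size_t n)
    {
      __m128i head = _mm_loadu_si128 ((const __m128i *) src);
      __m128i tail = _mm_loadu_si128 ((const __m128i *) (src + n - 16));
      _mm_storeu_si128 ((__m128i *) dst, head);
      _mm_storeu_si128 ((__m128i *) (dst + n - 16), tail);
    }
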
Ondrej Bilka 37bb363f03 Faster strlen on x64. 2013-03-18 07:39:12 +01:00
Ondrej Bilka 80f844c9d8 Remove Prefer_SSE_for_memop on x64 2013-03-11 15:39:08 +01:00
Ondrej Bilka 87bd9bc4bd Revert " * sysdeps/x86_64/strlen.S: Replace with new SSE2 based implementation"
This reverts commit b79188d717.
2013-03-06 22:27:18 +01:00
Ondrej Bilka b79188d717 * sysdeps/x86_64/strlen.S: Replace with new SSE2 based implementation
which is faster on all x86_64 architectures.
	Tested on AMD, Intel Nehalem, SNB, IVB.
2013-03-06 21:54:01 +01:00
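
A compact sketch of the pcmpeqb/pmovmskb technique such an implementation is built on (simplified; a tuned version unrolls the loop and checks several vectors per iteration):

    #include <emmintrin.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Align down to 16 bytes, then scan one aligned vector at a time.
       An aligned 16-byte load never crosses a page boundary, so the
       over-read around the string is safe.  */
    static size_t
    strlen_sse2_sketch (const char *s)
    {
      const __m128i zero = _mm_setzero_si128 ();
      const __m128i *p = (const __m128i *) ((uintptr_t) s & ~(uintptr_t) 15);

      /* Shift out mask bits for bytes that precede the string start.  */
      unsigned int mask =
        _mm_movemask_epi8 (_mm_cmpeq_epi8 (_mm_load_si128 (p), zero));
      mask >>= (uintptr_t) s & 15;
      if (mask != 0)
        return __builtin_ctz (mask);

      for (;;)
        {
          ++p;
          mask = _mm_movemask_epi8 (_mm_cmpeq_epi8 (_mm_load_si128 (p), zero));
          if (mask != 0)
            return ((const char *) p - s) + __builtin_ctz (mask);
        }
    }
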
Roland McGrath f1d70dad53 Remove lots of inline keywords. 2013-02-07 14:44:18 -08:00
H.J. Lu afec409af9 Change __x86_64 prefix in cache size to __x86 2013-01-05 16:00:38 -08:00
H.J. Lu 5d7dd1ca84 Add HAS_RTM 2013-01-03 09:38:20 -08:00
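
For context, RTM is reported in CPUID leaf 7 (subleaf 0), EBX bit 11; a minimal detection sketch:

    #include <cpuid.h>

    static int
    has_rtm (void)
    {
      unsigned int eax, ebx, ecx, edx;
      if (__get_cpuid_max (0, 0) < 7)
        return 0;
      __cpuid_count (7, 0, eax, ebx, ecx, edx);
      return (ebx >> 11) & 1;   /* RTM feature bit */
    }
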
Joseph Myers 568035b787 Update copyright notices with scripts/update-copyrights. 2013-01-02 19:05:09 +00:00
Pino Toscano 94558d30b1 test-multiarch: terminate printf output with newline 2012-11-22 11:34:03 +01:00
H.J. Lu f62c8abcfb Compile x86 rtld with -mno-sse -mno-mmx 2012-11-02 18:43:27 -07:00
H.J. Lu ac49ecaf9d Add x86-64 __libc_ifunc_impl_list 2012-10-11 16:41:12 -07:00
H.J. Lu 9a387d1f78 Use IFUNC memmove/memset in x86-64 bcopy/bzero
Also add separate tests for bcopy and bzero.
2012-10-11 13:58:16 -07:00
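
The wrapper side of this is tiny; a sketch with made-up names: bcopy is memmove with the pointer arguments swapped, so routing it through memmove lets it reuse whatever variant the ifunc machinery already selected instead of carrying its own dispatch:

    #include <string.h>

    /* bcopy (src, dst, n) == memmove (dst, src, n).  The compiler can
       turn this into a tail call straight into the ifunc-resolved
       memmove.  */
    void
    my_bcopy (const void *src, void *dst, size_t n)
    {
      memmove (dst, src, n);
    }
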
H.J. Lu 0569936773 Define HAS_FMA with bit_FMA_Usable 2012-10-02 05:05:17 -07:00
H.J. Lu 31ed415328 Don't define x86-64 __strncmp_ssse3 in libc.a 2012-09-27 07:43:03 -07:00
Roland McGrath 7312ca90dc Clean up x86_64/multiarch/strstr-c.c include order. 2012-08-15 11:38:57 -07:00
Roland McGrath 9a0a54864b Clean up x86_64/multiarch/memmove.c include order. 2012-08-15 11:26:02 -07:00
H.J. Lu f85fa27058 Avoid DWARF definition DIE on ifunc symbols 2012-08-09 16:04:37 -07:00
Carlos O'Donell 1a0994f535 BZ#14059: Fix AVX and FMA4 detection.
Fix AVX and FMA4 detection by following the guidelines
set out by Intel and AMD for detecting these features.
2012-05-17 06:59:28 -07:00
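
The guideline in question: the CPUID AVX bit alone is not sufficient, because the OS must also have enabled YMM state saving. A sketch of the full check (inline asm is used for xgetbv so no special compiler flags are needed):

    #include <cpuid.h>

    /* AVX is usable only if CPUID reports AVX (ECX bit 28) and OSXSAVE
       (ECX bit 27), and XGETBV confirms the OS saves both XMM (bit 1)
       and YMM (bit 2) state.  */
    static int
    avx_usable (void)
    {
      unsigned int eax, ebx, ecx, edx;
      if (!__get_cpuid (1, &eax, &ebx, &ecx, &edx))
        return 0;
      if ((ecx & (1u << 28)) == 0 || (ecx & (1u << 27)) == 0)
        return 0;

      unsigned int xcr0_lo, xcr0_hi;
      __asm__ ("xgetbv" : "=a" (xcr0_lo), "=d" (xcr0_hi) : "c" (0));
      return (xcr0_lo & 0x6) == 0x6;
    }
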
H.J. Lu 70bc83b910 Load pointers into RAX_LP in strcmp-sse42.S 2012-05-15 09:59:31 -07:00
H.J. Lu 9bc0b730a6 Load cache sizes into R*_LP in memcpy-ssse3.S 2012-05-15 09:58:28 -07:00
H.J. Lu 6d2850e7f5 Load cache sizes into R*_LP in memcpy-ssse3-back.S 2012-05-15 09:56:17 -07:00
H.J. Lu 8a17f34979 Load cache size into R8_LP 2012-05-15 09:35:43 -07:00
Paul Eggert 59ba27a63a Replace FSF snail mail address with URLs. 2012-02-09 23:18:22 +00:00
Ulrich Drepper 08cf777f9e Really fix AVX tests
There is no problem with strcmp; it doesn't use the YMM registers.
The math routines might, since gcc perhaps generates such code.
Introduce bit_YMM_Usable and use it in the math routines.
2012-01-26 09:45:54 -05:00
Ulrich Drepper afc5ed09cb Reset bit_AVX in __cpu_features if OS support is missing 2012-01-26 07:45:14 -05:00
Liubov Dmitrieva 15db4de19d Fix overrun in destination buffer 2011-12-23 12:02:15 -05:00
Ulrich Drepper 370a7d88f7 WP fixes 2011-12-17 14:41:05 -05:00
Ulrich Drepper 1d3e4b618a Optimized wcschr and wcscpy for x86-64 and x86-32 2011-12-17 14:39:23 -05:00
Ulrich Drepper aff2453df7 Fix more warnings 2011-12-03 21:49:35 -05:00
Ulrich Drepper 34372fc6d3 Fix test of non-ASCII locales in x86-64 strcasecmp et al. 2011-11-01 16:46:23 -04:00
Ulrich Drepper 52e4b9eb62 More cleanups of x86-64 strstr 2011-10-28 19:01:48 -04:00
Ulrich Drepper fd52bc6dc4 Clean up x86-64 strcasestr
Actually describe in the C code what is going on.
2011-10-28 18:18:04 -04:00
Ulrich Drepper e0016b11d6 Add AVX optimized versions for some x86-64 math functions 2011-10-25 21:34:55 -04:00
Ulrich Drepper 618280a192 Optimize x86-64 SSE4.2+ strcmp a bit more 2011-10-25 14:50:31 -04:00
Ulrich Drepper 09229f3e1b Fix WS 2011-10-23 14:57:28 -04:00
Liubov Dmitrieva ce7dd29f28 Optimized strnlen and wcscmp for x86-64 2011-10-23 14:56:04 -04:00
Ulrich Drepper c196fed8f0 Fix compilation problems in x86-64 init-arch 2011-10-21 20:47:20 -04:00