Commit Graph

16 Commits

Author SHA1 Message Date
H.J. Lu 2f5d20ac99 x86-64: Optimize memchr/rawmemchr/wmemchr with SSE2/AVX2
SSE2 memchr is extended to support wmemchr.  AVX2 memchr/rawmemchr/wmemchr
are added to search 32 bytes with a single vector compare instruction.
AVX2 memchr/rawmemchr/wmemchr are as fast as SSE2 memchr/rawmemchr/wmemchr
for small sizes and up to 1.5X faster for larger sizes on Haswell and
Skylake.  Select AVX2 memchr/rawmemchr/wmemchr on AVX2 machines where
vzeroupper is preferred and AVX unaligned load is fast.

NB: It uses TZCNT instead of BSF since TZCNT produces the same result
as BSF for non-zero input.  TZCNT is faster than BSF and is executed
as BSF if machine doesn't support TZCNT.

	* sysdeps/x86_64/memchr.S (MEMCHR): New.  Depending on if
	USE_AS_WMEMCHR is defined.
	(PCMPEQ): Likewise.
	(memchr): Renamed to ...
	(MEMCHR): This.  Support wmemchr if USE_AS_WMEMCHR is defined.
	Replace pcmpeqb with PCMPEQ.
	* sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Add
	memchr-sse2, rawmemchr-sse2, memchr-avx2, rawmemchr-avx2,
	wmemchr-sse4_1, wmemchr-avx2 and wmemchr-c.
	* sysdeps/x86_64/multiarch/ifunc-avx2.h: New file.
	* sysdeps/x86_64/multiarch/memchr-avx2.S: Likewise.
	* sysdeps/x86_64/multiarch/memchr-sse2.S: Likewise.
	* sysdeps/x86_64/multiarch/memchr.c: Likewise.
	* sysdeps/x86_64/multiarch/rawmemchr-avx2.S: Likewise.
	* sysdeps/x86_64/multiarch/rawmemchr-sse2.S: Likewise.
	* sysdeps/x86_64/multiarch/rawmemchr.c: Likewise.
	* sysdeps/x86_64/multiarch/wmemchr-avx2.S: Likewise.
	* sysdeps/x86_64/multiarch/wmemchr-sse2.S: Likewise.
	* sysdeps/x86_64/multiarch/wmemchr.c: Likewise.
	* sysdeps/x86_64/multiarch/ifunc-impl-list.c
	(__libc_ifunc_impl_list): Test __memchr_avx2, __memchr_sse2,
	__rawmemchr_avx2, __rawmemchr_sse2, __wmemchr_avx2 and
	__wmemchr_sse2.
2017-06-09 05:13:31 -07:00
H.J. Lu 4f26ef1b67 x86_64: Remove redundant REX bytes from memchr.S
By x86-64 specification, 32-bit destination registers are zero-extended
to 64 bits.  There is no need to use 64-bit registers when only the lower
32 bits are non-zero.

	* sysdeps/x86_64/memchr.S (MEMCHR): Use 32-bit registers for
	the lower 32 bits.
2017-05-30 12:39:14 -07:00
H.J. Lu 402bf06952 x86: Optimize SSE2 memchr overflow calculation
SSE2 memchr computes "edx + ecx - 16" where ecx is less than 16.  Use
"edx - (16 - ecx)", instead of satured math, to avoid possible addition
overflow.  This replaces

	add	%ecx, %edx
	sbb	%eax, %eax
	or	%eax, %edx
	sub	$16, %edx

with

	neg	%ecx
	add	$16, %ecx
	sub	%ecx, %edx

It is the same for x86_64, except for rcx/rdx, instead of ecx/edx.

	* sysdeps/i386/i686/multiarch/memchr-sse2.S (MEMCHR): Use
	"edx + ecx - 16" to avoid possible addition overflow.
	* sysdeps/x86_64/memchr.S (memchr): Likewise.
2017-05-19 10:48:45 -07:00
Joseph Myers bfff8b1bec Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
Adhemerval Zanella 3daef2c8ee Fix x86_64 memchr for large input sizes
Current optimized memchr for x86_64 does for input arguments pointers
module 64 in range of [49,63] if there is no searchr char in the rest
of 64-byte block a pointer addition which might overflow:

* sysdeps/x86_64/memchr.S

    77          .p2align 4
    78  L(unaligned_no_match):
    79          add     %rcx, %rdx

Add (uintptr_t)s % 16 to n in %rdx.

    80          sub     $16, %rdx
    81          jbe     L(return_null)

This patch fixes by adding a saturated math that sets a maximum pointer
value if it overflows (UINTPTR_MAX).

Checked on x86_64-linux-gnu and powerpc64-linux-gnu.

	[BZ# 19387]
	* sysdeps/x86_64/memchr.S (memchr): Avoid overflow in pointer
	addition.
	* string/test-memchr.c (do_test): Remove alignment limitation.
	(test_main): Add test that trigger BZ# 19387.
2016-12-27 10:50:41 -02:00
Joseph Myers f7a9f785e5 Update copyright dates with scripts/update-copyrights. 2016-01-04 16:05:18 +00:00
Joseph Myers b168057aaa Update copyright dates with scripts/update-copyrights. 2015-01-02 16:29:47 +00:00
Allan McRae d4697bc93d Update copyright notices with scripts/update-copyrights 2014-01-01 22:00:23 +10:00
Joseph Myers 568035b787 Update copyright notices with scripts/update-copyrights. 2013-01-02 19:05:09 +00:00
Paul Eggert 59ba27a63a Replace FSF snail mail address with URLs. 2012-02-09 23:18:22 +00:00
Liubov Dmitrieva 093ecf9299 Improve 64 bit memchr, memrchr, rawmemchr with SSE2 2011-10-07 11:49:10 -04:00
Jakub Jelinek fab8238de6 Fix x86-64 memchr for large lengths. 2009-06-16 10:23:31 -07:00
Ulrich Drepper 2221e33e5d * sysdeps/x86_64/memchr.S: Handle invalid buffer pointers when
count is zero.
2009-05-09 06:40:15 +00:00
Ulrich Drepper f140a0d53d * sysdeps/x86_64/rawmemchr.S: New file. 2009-04-10 07:57:20 +00:00
Ulrich Drepper ddba0f1700 * string/stratcliff.c (do_test): Add memchr tests..
* sysdeps/x86_64/memchr.S: Fix handling of end of buffer after
	first read quad word.
2009-04-07 14:53:04 +00:00
Ulrich Drepper 322e23db24 * sysdeps/x86_64/memchr.S: New file. 2009-04-07 06:36:33 +00:00