glibc/sysdeps
Leonardo Sandoval 1457016337 x86-64: Optimize strcmp/wcscmp and strncmp/wcsncmp with AVX2
Optimize x86-64 strcmp/wcscmp and strncmp/wcsncmp with AVX2, using vector
comparison as much as possible. Peak performance gains observed on a
Skylake machine: 9x, 3x, 2.5x and 5.5x for strcmp, strncmp, wcscmp and
wcsncmp, respectively. The benefit of the AVX2 functions grows with the
comparison length, except for strcmp, where the peak is observed at
length == 32 bytes. Select AVX2 strcmp/wcscmp on AVX2 machines where
vzeroupper is preferred and AVX unaligned load is fast.
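
The selection rule stated above can be sketched in C as follows. This is
only an illustration under stated assumptions, not the glibc selector:
struct cpu_caps, its fields, select_strcmp and the two stub functions are
hypothetical names introduced here, and both stubs simply call the C
library's strcmp.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the CPU feature bits a selector consults. */
struct cpu_caps {
    bool avx2_usable;              /* AVX2 can be used on this machine */
    bool avx_fast_unaligned_load;  /* unaligned AVX loads are fast */
    bool prefer_no_vzeroupper;     /* set when vzeroupper should be avoided */
};

typedef int (*strcmp_fn)(const char *, const char *);

/* Placeholder implementations: both defer to the C library's strcmp. */
static int strcmp_avx2_stub(const char *a, const char *b)
{
    return strcmp(a, b);
}

static int strcmp_baseline_stub(const char *a, const char *b)
{
    return strcmp(a, b);
}

/* Pick the AVX2 variant only when all three conditions hold:
   AVX2 is usable, unaligned AVX loads are fast, and vzeroupper is
   preferred (that is, prefer_no_vzeroupper is not set). */
static strcmp_fn select_strcmp(const struct cpu_caps *caps)
{
    assert(caps != NULL);
    if (caps->avx2_usable
        && caps->avx_fast_unaligned_load
        && !caps->prefer_no_vzeroupper)
        return strcmp_avx2_stub;
    return strcmp_baseline_stub;
}
```

A caller would fill in a struct cpu_caps once, call select_strcmp, and use
the returned function pointer for every later comparison, so the feature
checks are not repeated on each call.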

NB: The code uses TZCNT instead of BSF, since TZCNT produces the same
result as BSF for non-zero input.  TZCNT is faster than BSF and is
executed as BSF if the machine doesn't support TZCNT.
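
To illustrate why a trailing-zero count is useful here, the sketch below
emulates one 32-byte step in scalar C. It is an assumption-laden
illustration rather than the assembly from this commit: stop_mask32,
first_stop_index and compare_chunk32 are hypothetical helpers, the mask is
built with a plain loop instead of vector instructions, and
__builtin_ctz is the GCC/Clang builtin, whose result is undefined for a
zero argument. That matches the non-zero restriction in the note above:
the count is taken only after the mask is known to be non-zero.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar stand-in for a 32-byte vector compare: bit i is set when
   a[i] != b[i] or a[i] is the terminating NUL. */
static uint32_t stop_mask32(const unsigned char *a, const unsigned char *b)
{
    uint32_t mask = 0;
    for (unsigned i = 0; i < 32; i++)
        if (a[i] != b[i] || a[i] == 0)
            mask |= (uint32_t)1 << i;
    return mask;
}

/* Index of the lowest set bit.  The mask must be non-zero, because
   __builtin_ctz (like BSF) has no defined result for zero. */
static unsigned first_stop_index(uint32_t mask)
{
    assert(mask != 0);
    return (unsigned)__builtin_ctz(mask);
}

/* Returns 1 and stores a strcmp-style result in *result when this chunk
   decides the comparison; returns 0 when all 32 bytes are equal and
   non-NUL, so the caller should continue with the next chunk. */
static int compare_chunk32(const unsigned char *a, const unsigned char *b,
                           int *result)
{
    uint32_t mask = stop_mask32(a, b);
    if (mask == 0)
        return 0;
    unsigned i = first_stop_index(mask);
    *result = (int)a[i] - (int)b[i];
    return 1;
}
```

Both input buffers must hold at least 32 readable bytes. For two buffers
holding "hello world" and "hello there" padded with NULs, the lowest set
mask bit is at index 6, where 'w' and 't' differ.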

	* sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Add
	strcmp-avx2, strncmp-avx2, wcscmp-avx2, wcscmp-sse2, wcsncmp-avx2 and
	wcsncmp-sse2.
	* sysdeps/x86_64/multiarch/ifunc-impl-list.c
	(__libc_ifunc_impl_list): Add tests for __strcmp_avx2,
	__strncmp_avx2, __wcscmp_avx2, __wcsncmp_avx2, __wcscmp_sse2
	and __wcsncmp_sse2.
	* sysdeps/x86_64/multiarch/strcmp.c (OPTIMIZE (avx2)):
	(IFUNC_SELECTOR): Return OPTIMIZE (avx2) on AVX2 machines if
	AVX unaligned load is fast and vzeroupper is preferred.
	* sysdeps/x86_64/multiarch/strncmp.c: Likewise.
	* sysdeps/x86_64/multiarch/strcmp-avx2.S: New file.
	* sysdeps/x86_64/multiarch/strncmp-avx2.S: Likewise.
	* sysdeps/x86_64/multiarch/wcscmp-avx2.S: Likewise.
	* sysdeps/x86_64/multiarch/wcscmp-sse2.S: Likewise.
	* sysdeps/x86_64/multiarch/wcscmp.c: Likewise.
	* sysdeps/x86_64/multiarch/wcsncmp-avx2.S: Likewise.
	* sysdeps/x86_64/multiarch/wcsncmp-sse2.c: Likewise.
	* sysdeps/x86_64/multiarch/wcsncmp.c: Likewise.
	* sysdeps/x86_64/wcscmp.S (__wcscmp): Add alias only if __wcscmp
	is undefined.
2018-06-01 16:32:43 -05:00
aarch64 Remove sysdeps/aarch64/soft-fp directory. 2018-05-22 17:23:34 +00:00
alpha Remove sysdeps/alpha/soft-fp directory. 2018-05-23 17:29:20 +00:00
arm Drop fpregset unused symbol exposition 2018-04-20 01:27:13 +02:00
generic Remove sysdeps/generic/libcidn.abilist 2018-06-01 11:25:41 +02:00
gnu Increase robustness of internal dlopen() by using RTLD_NOW [BZ #22766] 2018-04-26 10:41:43 -03:00
hppa R_PARISC_TLS_DTPOFF32 reloc handling 2018-05-13 08:32:28 +09:30
htl hurd: Avoid exposing all <sched.h> symbols from sys/types.h 2018-04-19 20:24:36 +02:00
hurd hurd: Fix hurd installed headers test 2018-04-20 00:16:40 +02:00
i386 math: Update i686 ulps (--disable-multi-arch configuration) 2018-06-01 22:37:55 +02:00
ia64 elf: Unify symbol address run-time calculation [BZ #19818] 2018-04-04 23:09:37 +01:00
ieee754 Fix i686-linux-gnu build with GCC mainline. 2018-05-22 16:55:04 +00:00
init_array sysdeps/init_array: Add PREINIT_FUNCTION to crti.S 2018-01-29 10:22:26 -08:00
m68k Do not include math-barriers.h in math_private.h. 2018-05-11 15:11:38 +00:00
mach Add narrowing divide functions. 2018-05-17 00:40:52 +00:00
microblaze elf: Unify symbol address run-time calculation [BZ #19818] 2018-04-04 23:09:37 +01:00
mips Update MIPS libm-test-ulps. 2018-05-16 15:35:26 +00:00
nios2 Make powerpc-nofpu __sqrtsf2, __sqrtdf2 compat symbols (bug 18473). 2018-06-01 17:25:12 +00:00
nptl Fix comment typo 2018-05-08 14:59:13 +02:00
posix Switch IDNA implementation to libidn2 [BZ #19728] [BZ #19729] [BZ #22247] 2018-05-23 15:27:24 +02:00
powerpc Make powerpc-nofpu __sqrtsf2, __sqrtdf2 compat symbols (bug 18473). 2018-06-01 17:25:12 +00:00
pthread hurd: fix sigevent's sigev_notify_attributes field type 2018-04-19 21:43:44 +02:00
riscv elf: Unify symbol address run-time calculation [BZ #19818] 2018-04-04 23:09:37 +01:00
s390 nptl: Remove __ASSUME_PRIVATE_FUTEX 2018-05-17 04:25:10 -07:00
sh Remove sysdeps/sh/soft-fp directory. 2018-05-23 20:05:31 +00:00
sparc Remove sysdeps/sparc/sparc64/soft-fp directory. 2018-05-25 20:00:51 +00:00
unix Remove sysdeps/powerpc/soft-fp directory. 2018-05-24 22:02:32 +00:00
wordsize-32 Use libc_hidden_* for strtoumax (bug 15105). 2018-02-28 14:16:21 +00:00
wordsize-64 Use libc_hidden_* for strtoumax (bug 15105). 2018-02-28 14:16:21 +00:00
x86 x86-64: Check Prefer_FSRM in ifunc-memmove.h 2018-05-21 16:54:59 -07:00
x86_64 x86-64: Optimize strcmp/wcscmp and strncmp/wcsncmp with AVX2 2018-06-01 16:32:43 -05:00