Update strcasestr-power8 to use the power8 version of strnlen for
calculating the length.
Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.vnet.ibm.com>
This patch simplifies the Linux sigqueue implementation by assuming
__NR_rt_sigqueueinfo existence due to the minimum kernel requirement
(it pre-dates the Linux git inclusion for Linux 2.6.12).
Checked on x86_64-linux-gnu.
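As an illustration, here is a minimal standalone sketch of the simplified
logic, using the public syscall(2) interface instead of glibc's internal
syscall macros (the function name and structure are illustrative, not the
in-tree code):

#include <signal.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* With the minimum kernel requirement, rt_sigqueueinfo can be invoked
   unconditionally; no __NR_rt_sigqueueinfo fallback path is needed.  */
static int
sigqueue_sketch (pid_t pid, int sig, const union sigval val)
{
  siginfo_t info;
  memset (&info, 0, sizeof (info));
  info.si_signo = sig;
  info.si_code = SI_QUEUE;
  info.si_pid = getpid ();
  info.si_uid = getuid ();
  info.si_value = val;
  return syscall (SYS_rt_sigqueueinfo, pid, sig, &info);
}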
* sysdeps/unix/sysv/linux/sigqueue.c (__sigqueue): Assume
__NR_rt_sigqueueinfo.
Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: Zack Weinberg <zackw@panix.com>
This patch simplifies sig{timed}wait{info} by:
- Assuming __NR_rt_sigtimedwait existence on all architectures due to the
minimum kernel version requirement (it pre-dates the Linux git inclusion
for Linux 2.6.12).
- Calling __sigtimedwait from both sigwait and sigwaitinfo.
- Now that sigwait is based on an internal sigtimedwait call and is
present in both libc.so and libpthread.so, adding an external
private definition of __sigtimedwait for the libpthread.so call.
Checked on x86_64-linux-gnu.
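For illustration, the sigwait-on-sigtimedwait structure can be sketched with
the public API (the in-tree version calls the internal __sigtimedwait and
adds the cancellation handling mentioned in the ChangeLog):

#include <errno.h>
#include <signal.h>
#include <stddef.h>

static int
sigwait_sketch (const sigset_t *set, int *sig)
{
  int ret;
  do
    /* A null timeout makes sigtimedwait block indefinitely on Linux.  */
    ret = sigtimedwait (set, NULL, NULL);
  while (ret < 0 && errno == EINTR);
  if (ret < 0)
    return errno;
  *sig = ret;
  return 0;
}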
* sysdeps/unix/sysv/linux/Versions (libc) [GLIBC_PRIVATE]: Add
__sigtimedwait.
* sysdeps/unix/sysv/linux/sigtimedwait.c: Simplify includes and
assume __NR_rt_sigtimedwait.
* sysdeps/unix/sysv/linux/sigwait.c (__sigwait): Call __sigtimedwait
and add LIBC_CANCEL_HANDLED for cancellation marking.
* sysdeps/unix/sysv/linux/sigwaitinfo.c (__sigwaitinfo): Likewise.
Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: Zack Weinberg <zackw@panix.com>
This patch refactors the ARM memchr ifunc selector to a C implementation.
No functional change is expected, including in the ifunc resolution rules.
It also reorganizes the ifunc options code:
1. memchr_impl.S is renamed to memchr_neon.S and the multiple
compilation options (which route to the armv6t2/memchr one) are
removed. The code built when __ARM_NEON__ is defined is
also simplified.
2. A memchr_noneon variant is added (which was built as part of the
previous ifunc resolution) and includes the armv6t2 implementation
directly.
3. Same as 2. for the loader object.
Alongside the aforementioned changes, it also adds some cleanups:
- The internal memchr definition (__GI_memchr) is now a hidden
symbol.
- No need to create hidden definitions for the ifunc variants.
Checked on armv7-linux-gnueabihf and with a build for arm-linux-gnueabi,
arm-linux-gnueabihf with and without multiarch support and with both
GCC 7.1 and GCC mainline.
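For reference, the shape of such a C ifunc selector, sketched with the GCC
ifunc attribute (the in-tree file uses glibc's arm-ifunc.h helpers, so the
details below are illustrative only):

#include <stddef.h>
#include <sys/auxv.h>   /* HWCAP_ARM_NEON on ARM.  */

extern void *__memchr_neon (const void *, int, size_t);
extern void *__memchr_noneon (const void *, int, size_t);

/* On ARM the ifunc resolver receives AT_HWCAP as its first argument.  */
static void *
memchr_resolver (unsigned long int hwcap)
{
  return (hwcap & HWCAP_ARM_NEON)
         ? (void *) __memchr_neon : (void *) __memchr_noneon;
}

void *memchr (const void *, int, size_t)
     __attribute__ ((ifunc ("memchr_resolver")));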
* sysdeps/arm/armv7/multiarch/Makefile [$(subdir) = string]
(sysdeps_routines): Add memchr_noneon.
* sysdeps/arm/armv7/multiarch/ifunc-memchr.h: New file.
* sysdeps/arm/armv7/multiarch/memchr_noneon.S: Likewise.
* sysdeps/arm/armv7/multiarch/rtld-memchr.S: Likewise.
* sysdeps/arm/armv7/multiarch/memchr.S: Remove file.
* sysdeps/arm/armv7/multiarch/memchr.c: New file.
* sysdeps/arm/armv7/multiarch/memchr_impl.S: Move to ...
* sysdeps/arm/armv7/multiarch/memchr_neon.S: ... here.
Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
This patch refactors the ARM memcpy ifunc selector to a C implementation.
No functional change is expected, including in the ifunc resolution rules.
It also adds some cleanup:
- The internal memcpy definition (__GI_memcpy) is now a hidden
symbol.
- No need to create hidden definitions for the ifunc variants.
Checked on armv7-linux-gnueabihf and with a build for arm-linux-gnueabi,
arm-linux-gnueabihf with and without multiarch support, and with both
GCC 7.1 and GCC mainline. I also checked the possible multiarch
configurations that trigger the different memcpy
builds (__ARM_NEON__ && !__SOFT_FP__, !__ARM_NEON__ && !__SOFT_FP__, and
!__ARM_NEON__ && __SOFT_FP__).
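The selection logic itself is unchanged from the assembly selector; in C it
reduces to roughly the following sketch (variant names as in the ChangeLog
below, the arm-ifunc.h macro plumbing omitted):

#include <stddef.h>
#include <sys/auxv.h>   /* HWCAP_ARM_NEON, HWCAP_ARM_VFP on ARM.  */

extern void *__memcpy_neon (void *, const void *, size_t);
extern void *__memcpy_vfp (void *, const void *, size_t);
extern void *__memcpy_arm (void *, const void *, size_t);

static inline void *
select_memcpy (unsigned long int hwcap)
{
  if (hwcap & HWCAP_ARM_NEON)
    return (void *) __memcpy_neon;
  if (hwcap & HWCAP_ARM_VFP)
    return (void *) __memcpy_vfp;
  return (void *) __memcpy_arm;
}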
* sysdeps/arm/arm-ifunc.h: New file.
* sysdeps/arm/armv7/multiarch/ifunc-memcpy.h: Likewise.
* sysdeps/arm/armv7/multiarch/memcpy.c: Likewise.
* sysdeps/arm/armv7/multiarch/memcpy_arm.S: Likewise.
* sysdeps/arm/armv7/multiarch/rtld-memcpy.S: Likewise.
* sysdeps/arm/armv7/multiarch/memcpy_neon.S [!__ARM_NEON__]
(__memcpy_neon): Avoid creating a hidden alias.
* sysdeps/arm/armv7/multiarch/memcpy_vfp.S [!__ARM_NEON__]
(__memcpy_vfp): Likewise.
* sysdeps/arm/armv7/multiarch/Makefile [$(subdir) = string]
(sysdep_routines): Add memcpy_arm.
* sysdeps/arm/armv7/multiarch/memcpy.S: Remove file.
Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
The powerpc bits/floatn.h declares _Float128 support to be present
when the compiler supports it for powerpc64le. However, in the case
where -mlong-double-64 is used, __MATH_TG does not actually support
_Float128; it only supports _Float128 in the distinct-long-double
case.
This shows up as a build failure when building glibc mainline with GCC
mainline, given the recently added sanity check in math.h for
configurations supported by __MATH_TG, as the compat code for
-mlong-double-64 fails to build. However, the bug was logically
present before that change (including in 2.26), just less visible.
This patch fixes the build failure by declaring _Float128 to be
unsupported in that case. (Of course this can't actually stop users
calling the type-generic macros with _Float128 arguments with
-mlong-double-64, just as they could be called with other unsupported
types on other platforms, but perhaps makes it less likely by making
all the type-specific _Float128 interfaces invisible in that case.)
Tested compilation for powerpc64le with build-many-glibcs.py.
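The shape of the fix is roughly the following sketch (the surrounding
conditionals in the real header are more involved):

#include <bits/long-double.h>   /* May define __NO_LONG_DOUBLE_MATH.  */

#ifdef __NO_LONG_DOUBLE_MATH
/* With -mlong-double-64, __MATH_TG cannot dispatch on _Float128, so
   declare the type unsupported.  */
# define __HAVE_FLOAT128 0
#endif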
[BZ #22402]
* sysdeps/powerpc/bits/floatn.h: Include <bits/long-double.h>.
[__NO_LONG_DOUBLE_MATH] (__HAVE_FLOAT128): Define to 0.
Use the minimum line size of the cache hierarchy as reported in CTR_EL0.
See the comment within the code for the rationale.
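A sketch of the underlying register access (AArch64 only; per the
architecture, IminLine in CTR_EL0 bits [3:0] and DminLine in bits [19:16]
each hold the log2 of the minimum line size in 4-byte words):

#include <stdint.h>

static long int
min_linesize_from_ctr_el0 (int dcache)
{
  uint64_t ctr;
  __asm__ ("mrs %0, ctr_el0" : "=r" (ctr));
  unsigned int log2_words = dcache ? (ctr >> 16) & 0xf : ctr & 0xf;
  return 4L << log2_words;   /* Words are 4 bytes.  */
}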
* sysdeps/unix/sysv/linux/aarch64/sysconf.c: New file.
Remove some load/store instructions from the dynamic tlsdesc resolver
fast path. This gives around 20% faster tls access in dlopened shared
libraries (assuming glibc ran out of static tls space).
* sysdeps/aarch64/dl-tlsdesc.S (_dl_tlsdesc_dynamic): Optimize.
Lazy tlsdesc initialization is no longer used in the dynamic linker
so all related code can be removed.
* sysdeps/arm/dl-machine.h (elf_machine_runtime_setup): Remove
DT_TLSDESC_GOT initialization.
* sysdeps/arm/dl-tlsdesc.S (_dl_tlsdesc_lazy_resolver): Remove.
(_dl_tlsdesc_resolve_hold): Likewise.
* sysdeps/aarch64/dl-tlsdesc.h (_dl_tlsdesc_lazy_resolver): Remove.
(_dl_tlsdesc_resolve_hold): Likewise.
* sysdeps/aarch64/tlsdesc.c (_dl_tlsdesc_lazy_resolver_fixup): Remove.
(_dl_tlsdesc_resolve_hold_fixup): Likewise.
Follow up to
https://sourceware.org/ml/libc-alpha/2015-11/msg00272.html
Always do tls descriptor initialization at load time during relocation
processing (as if DF_BIND_NOW were set for the binary) to avoid barriers
at every tls access. This patch mimics bind-now semantics in the lazy
relocation code of the arm target (elf_machine_lazy_rel).
Ideally the static linker should be updated too to not emit tlsdesc
relocs in DT_REL*, so elf_machine_lazy_rel is not called on them at all.
[BZ #18572]
* sysdeps/arm/dl-machine.h (elf_machine_lazy_rel): Do symbol binding
non-lazily for R_ARM_TLS_DESC.
This patch reverts
commit 9c82da17b5
Author: Maciej W. Rozycki <macro@codesourcery.com>
Date: 2014-07-17 19:22:05 +0100
[BZ #17078] ARM: R_ARM_TLS_DESC prelinker support
This only implemented support for the lazy binding case (and thus
closed the bugzilla ticket prematurely), however tlsdesc on arm is
not correct with lazy binding because there is a data race between
the lazy initialization code and tlsdesc resolver functions.
Lazy initialization of tlsdesc entries will be removed from arm to
fix the data races and thus this half-finished prelinker support
is no longer useful.
[BZ #17078]
* sysdeps/arm/dl-machine.h (elf_machine_rela): Remove the
R_ARM_TLS_DESC case.
(elf_machine_lazy_rel): Remove the prelink check.
Always do TLS descriptor initialization at load time during relocation
processing to avoid barriers at every TLS access. In non-dlopened shared
libraries the overhead of tls access vs static global access is > 3x
bigger when lazy initialization is used (_dl_tlsdesc_return_lazy)
compared to bind-now (_dl_tlsdesc_return) so the barriers dominate tls
access performance.
TLSDESC relocs are in DT_JMPREL which are processed at load time using
elf_machine_lazy_rel which is only supposed to do lightweight
initialization using the DT_TLSDESC_PLT trampoline (the trampoline code
jumps to the entry point in DT_TLSDESC_GOT which does the lazy tlsdesc
initialization at runtime). This patch changes elf_machine_lazy_rel
in aarch64 to do the symbol binding and initialization as if DF_BIND_NOW
was set, so the non-lazy code path of elf/do-rel.h was replicated.
The static linker could be changed to emit TLSDESC relocs in DT_REL*,
which are processed non-lazily, but the goal of this patch is to always
guarantee bind-now semantics, even if the binary was produced with an
old linker, so the barriers can be dropped in tls descriptor functions.
After this change the synchronizing ldar instructions can be dropped
as well as the lazy initialization machinery including the DT_TLSDESC_GOT
setup.
I believe this should be done on all targets, including ones where no
barrier is needed for lazy initialization. There is very little gain in
optimizing for a large number of symbolic tlsdesc relocations, which is an
extremely uncommon case. And currently the tlsdesc entries are only
read-only protected with -z now, and some hardenings against writable
JUMPSLOT relocs don't work for TLSDESC, so they are a security hazard.
(But to fix that the static linker has to be changed.)
* sysdeps/aarch64/dl-machine.h (elf_machine_lazy_rel): Do symbol
binding and initialization non-lazily for R_AARCH64_TLSDESC.
Add a new header file, sysdeps/x86/sysdep.h, for common assembly code
macros between i386 and x86-64. Tested on i686 and x86-64. There are
no differences in outputs of "readelf -a" and "objdump -dw" on all glibc
shared objects before and after the patch.
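A simplified sketch of the kind of macros that move into the common header
(the real definitions also handle profiling and symbol decoration, so this
is only an approximation):

#define ALIGNARG(log2) 1 << log2
#define ASM_SIZE_DIRECTIVE(name) .size name, .-name;

#define ENTRY(name)                                                \
  .globl C_SYMBOL_NAME(name);                                      \
  .type C_SYMBOL_NAME(name), @function;                            \
  .align ALIGNARG(4);                                              \
  C_LABEL(name)                                                    \
  cfi_startproc;

#define END(name)                                                  \
  cfi_endproc;                                                     \
  ASM_SIZE_DIRECTIVE(name)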
* sysdeps/i386/sysdep.h: Include <sysdeps/x86/sysdep.h> instead
of <sysdeps/generic/sysdep.h>.
(ALIGNARG): Removed.
(ASM_SIZE_DIRECTIVE): Likewise.
(ENTRY): Likewise.
(END): Likewise.
(ENTRY_CHK): Likewise.
(END_CHK): Likewise.
(syscall_error): Likewise.
(mcount): Likewise.
(PSEUDO_END): Likewise.
(L): Likewise.
(atom_text_section): Likewise.
* sysdeps/x86/sysdep.h: New file.
* sysdeps/x86_64/sysdep.h: Include <sysdeps/x86/sysdep.h> instead
of <sysdeps/generic/sysdep.h>.
(ALIGNARG): Removed.
(ASM_SIZE_DIRECTIVE): Likewise.
(ENTRY): Likewise.
(END): Likewise.
(ENTRY_CHK): Likewise.
(END_CHK): Likewise.
(syscall_error): Likewise.
(mcount): Likewise.
(PSEUDO_END): Likewise.
(L): Likewise.
(atom_text_section): Likewise.
The sigprocmask.c, sigtimedwait.c, sigwait.c and sigwaitinfo.c files in
sysdeps/unix/sysv/linux include nptl-signals.h via nptl/pthreadP.h,
and so SIGCANCEL and SIGSETXID are defined unconditionally. But
later in the code there are checks for whether the symbols are defined,
which are useless. This patch removes the useless checks.
Checked on x86_64-linux-gnu.
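For example, the filtering of the internal signals becomes unconditional; a
standalone sketch (the SIGCANCEL/SIGSETXID values below are illustrative,
the real ones come from nptl-signals.h):

#include <signal.h>

#define SIGCANCEL 32   /* Illustrative; defined by nptl-signals.h.  */
#define SIGSETXID 33   /* Likewise.  */

static void
filter_internal_signals (sigset_t *set)
{
  /* Previously each call was wrapped in a dead "#ifdef" guard.  */
  sigdelset (set, SIGCANCEL);
  sigdelset (set, SIGSETXID);
}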
* sysdeps/unix/sysv/linux/sigprocmask.c: Remove useless #ifdefs.
* sysdeps/unix/sysv/linux/sigtimedwait.c: Likewise.
* sysdeps/unix/sysv/linux/sigwait.c: Likewise.
* sysdeps/unix/sysv/linux/sigwaitinfo.c: Likewise.
Signed-off-by: Yury Norov <ynorov@caviumnetworks.com>
Reviewed-by: Andreas Schwab <schwab@suse.de>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
ia64, s390-64, sparc64 and x86_64 host their own implementations of
sigpending() in corresponding files, but apart from a few comments they
are identical to the generic Linux file. This patch removes those files,
so the implementation of sigpending() is taken from
sysdeps/unix/sysv/linux for all ports.
Build-tested on x86_64.
* sysdeps/unix/sysv/linux/ia64/sigpending.c: Remove file.
* sysdeps/unix/sysv/linux/s390/s390-64/sigpending.c: Likewise.
* sysdeps/unix/sysv/linux/sparc/sparc64/sigpending.c: Likewise.
* sysdeps/unix/sysv/linux/x86_64/sigpending.c: Likewise.
Signed-off-by: Yury Norov <ynorov@caviumnetworks.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
This is another one where we'll be wanting the base symbols for
powerpc64le rather than just a power7 variant.
* sysdeps/powerpc/powerpc64/multiarch/strncase_l-power7.c: Include
string/strncase_l.c, not string/strncase.c.
(USE_IN_EXTENDED_LOCALE_MODEL): Don't define.
(libc_hidden_def): Redefine.
The routine being assembled here is strcasecmp_l, so ask for that via
__STRCMP and STRCMP defines. That change means tweaking the power7
override. Needed for later powerpc64le changes where we want the base
symbols, not just a power7 variant.
* sysdeps/powerpc/powerpc64/multiarch/strcasecmp_l-power7.S:
(__STRCMP, STRCMP, __strcasecmp_l): Define.
(__strcasecmp): Don't define.
These functions aren't used in ld.so at the moment since we don't have
strcmp or strncmp ifuncs for them there. Remove the ld.so bloat.
* sysdeps/powerpc/powerpc64/multiarch/strcmp-power8.S: Wrap in
IS_IN (libc).
* sysdeps/powerpc/powerpc64/multiarch/strcmp-power9.S: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/strncmp-power8.S: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/strncmp-power9.S: Likewise.
USE_AS_STPNCPY is defined by sysdeps/powerpc/powerpc64/power8/stpncpy.S,
included by this file.
* sysdeps/powerpc/powerpc64/multiarch/stpncpy-power8.S: Don't define
USE_AS_STPNCPY.
It seems to me that libc.a should not contain any of the __GI_
symbols, and certainly --enable-multi-arch ought to not add to the
list. At the end of this patch series we have the following in both
--enable-multi-arch and --disable-multi-arch libc.a:
0000000000000000 T __GI___readdir64
0000000000000000 T __GI___fxstatat64
0000000000000000 T __GI_getrlimit
0000000000000000 T __GI___getrlimit
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_isnan-ppc64.S (hidden_def):
Redefine only when SHARED.
i586 strcpy.S used a clever trick with LEA to implement a jump table:
	/* ECX has the last 2 bits of the address of source - 1.  */
	andl	$3, %ecx
	call	2f
2:	popl	%edx
	/* 0xb is the distance between 2: and 1:.  */
	leal	0xb(%edx,%ecx,8), %ecx
	jmp	*%ecx
	.align	8
1:	/* ECX == 0 */
	orb	(%esi), %al
	jz	L(end)
	stosb
	xorl	%eax, %eax
	incl	%esi
	/* ECX == 1 */
	orb	(%esi), %al
	jz	L(end)
	stosb
	xorl	%eax, %eax
	incl	%esi
	/* ECX == 2 */
	orb	(%esi), %al
	jz	L(end)
	stosb
	xorl	%eax, %eax
	incl	%esi
	/* ECX == 3 */
L(1):	movl	(%esi), %ecx
	leal	4(%esi),%esi
This fails if there are instruction length changes before L(1):. This
patch replaces it with conditional branches:
	cmpb	$2, %cl
	je	L(Src2)
	ja	L(Src3)
	cmpb	$1, %cl
	je	L(Src1)
L(Src0):
which have similar performance and work with any instruction lengths.
Tested on i586 and i686 with and without --disable-multi-arch.
[BZ #22353]
* sysdeps/i386/i586/strcpy.S (STRCPY): Use conditional branches.
(1): Renamed to ...
(L(Src0)): This.
(L(Src1)): New.
(L(Src2)): Likewise.
(L(1)): Renamed to ...
(L(Src3)): This.
POWER9 DD2.1 and earlier have an issue where some cache-inhibited
vector loads trap to the kernel, causing a performance degradation. To
handle this in memcpy and memmove, lvx/stvx is used for aligned
addresses instead of lxvd2x/stxvd2x.
Reference: https://patchwork.ozlabs.org/patch/814059/
* sysdeps/powerpc/powerpc64/power7/memcpy.S: Replace
lxvd2x/stxvd2x with lvx/stvx.
* sysdeps/powerpc/powerpc64/power7/memmove.S: Likewise.
Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.vnet.ibm.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
The glibc implementation of iseqsig relies on ordered comparison
operators raising the "invalid" exception for quiet NaN operands, with
a workaround on platforms where a GCC bug means that exception is not
raised. For x86, that bug has now been fixed for GCC 8, so this patch
disables the workaround in that case. If and when the corresponding
bugs for powerpc and s390 are fixed, the headers for those platforms
should of course be updated similarly.
Tested for x86_64 and x86, including with GCC mainline. Note that
other failures appear with GCC mainline because of spurious use of
ordered comparison instructions for unordered operations
<https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82692>.
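The change itself amounts to making the workaround conditional on the
compiler version, roughly as in this sketch of the header:

#include <features.h>   /* __GNUC_PREREQ.  */

#if __GNUC_PREREQ (8, 0)
/* GCC 8 raises "invalid" for ordered comparisons of quiet NaNs.  */
# define FIX_COMPARE_INVALID 0
#else
/* Work around the GCC bug on older compilers.  */
# define FIX_COMPARE_INVALID 1
#endif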
* sysdeps/x86/fpu/fix-fp-int-compare-invalid.h
(FIX_COMPARE_INVALID): Define to 0 if [__GNUC_PREREQ (8, 0)].
As shown in some buildbot issues on aarch64 and powerpc, calling
clone (VFORK) and waitpid (WNOHANG) does not guarantee the child
is ready to be collected. This patch changes the flags back to 0,
as before the fe05e1cb6d fix.
This change can lead to the scenario 4.3 described in that commit,
where the waitpid call can hang indefinitely. However this is
also a very unlikely and also undefined situation, where the
caller is trying to terminate a pid before posix_spawn
returns and the pid reuse race is triggered. I don't see how to
correctly handle this specific situation within posix_spawn.
Checked on x86_64-linux-gnu, aarch64-linux-gnu and
powerpc64-linux-gnu.
* sysdeps/unix/sysv/linux/spawni.c (__spawnix): Use 0 instead of
WNOHANG in waitpid call.
cfi info for stack adjust needs to be on the insn doing the adjust.
cfi describing register saves can be anywhere after the save insn but
before the reg is altered. Fewer locations with cfi result in smaller
cfi programs and possibly slightly faster exception handling. Thus
the LR cfi_offset move.
The idea behind adjusting sp after restoring regs is to break a
register dependency chain, in this case not using r1 immediately
after it is modified.
The missing LR cfi_restore meant that code after the blr,
unaligned_lt_16 and other labels, would have cfi that said LR was at
cfa+16, but that code is reached without LR being saved.
* sysdeps/powerpc/powerpc64/power8/strncpy.S: Move LR cfi.
Adjust stack after restoring regs. Add missing LR cfi_restore.
Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.vnet.ibm.com>
This patch moves the frame setup and teardown to immediately around
the single memset call, as has been done for power8. I've also
decreased FRAMESIZE to that needed to save the two callee-saved
registers used. Plus added cfi.
* sysdeps/powerpc/powerpc64/power7/strncpy.S: Decrease FRAMESIZE.
Move LR save and frame setup/teardown and LR restore to
immediately around memset call. Provide cfi.
Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.vnet.ibm.com>
This patch replaces i386 assembly versions of e_exp2f with generic
e_exp2f.c. For workload-spec2017.wrf, on Nehalem, it improves
performance by:
Before After Improvement
reciprocal-throughput 112.996 40.0454 182%
latency 126.581 54.4479 132%
On Skylake, it improves performance by:
Before After Improvement
reciprocal-throughput 113.14 39.447 186%
latency 136.068 55.684 144%
On IvyBridge with --disable-multi-arch, it improves performance by:
Before After Improvement
reciprocal-throughput 132.521 40.3759 228%
latency 145.791 58.4587 149%
* sysdeps/i386/fpu/e_exp2f.S: Removed.
* sysdeps/i386/fpu/w_exp2f.c: Likewise.
* sysdeps/i386/fpu/libm-test-ulps: Updated for generic e_exp2f.c.
* sysdeps/i386/i686/fpu/multiarch/libm-test-ulps: Likewise.
* sysdeps/i386/i686/fpu/multiarch/Makefile (libm-sysdep_routines):
Add e_exp2f-sse2.
(CFLAGS-e_exp2f-sse2.c): New.
* sysdeps/i386/i686/fpu/multiarch/e_exp2f-sse2.c: New file.
* sysdeps/i386/i686/fpu/multiarch/e_exp2f.c: Likewise.
The bits/floatn.h header currently only has defines relating to
_Float128. This patch adds defines relating to other _FloatN /
_FloatNx types.
The approach taken is to add defines for all _FloatN / _FloatNx types
known to GCC, and to put them in a common bits/floatn-common.h header
included at the end of all the individual bits/floatn.h headers. If
in future some defines become different for different glibc
configurations, they will move out into the separate bits/floatn.h
headers.
Some defines are expected always to be the same across glibc ports.
Corresponding defines are nevertheless put in this header. The intent
is that where there are conditionals (in headers or in non-installed
files) that can just repeat the same or nearly the same logic for each
floating-point type, they should do so, even if in fact the cases for
some types could be unconditionally present or absent because the same
conditionals are true or false for all glibc configurations. This
should make the glibc code with such conditionals easier to read,
because the reader can just see that the same conditionals are
repeated for each type, rather than seeing different conditionals for
different types and needing to reason, at each location with such
differences, why those differences are indeed correct there. (Cases
involving per-format rather than per-type logic are more likely still
to need differences in how they handle different types.)
Having such defines and conditionals also helps in incremental
preparation for adding _Float32 / _Float64 / _Float32x / _Float64x
function aliases. I intend subsequent patches to add such
conditionals corresponding to those already present for _Float128, as
well as making more architecture-specific function implementations use
common macros to define aliases in preparation for adding such _FloatN
/ _FloatNx aliases.
Tested for x86_64.
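To give the flavor of the new header, an excerpt-style sketch (the values
shown are the common-case ones; per-configuration headers may differ):

/* Defined to 1 if the type is supported by the compiler, 0 otherwise.  */
#define __HAVE_FLOAT32 1
#define __HAVE_FLOAT64 1
#define __HAVE_FLOAT32X 1

/* Constant-suffix macro, usable once the compiler knows the type.  */
#if __HAVE_FLOAT32
# define __f32(x) x##f32
#endif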
* bits/floatn-common.h: New file.
* math/Makefile (headers): Add bits/floatn-common.h.
* bits/floatn.h: Include <bits/floatn-common.h>.
* sysdeps/ia64/bits/floatn.h: Likewise.
* sysdeps/ieee754/ldbl-128/bits/floatn.h: Likewise.
* sysdeps/mips/ieee754/bits/floatn.h: Likewise.
* sysdeps/powerpc/bits/floatn.h: Likewise.
* sysdeps/x86/bits/floatn.h: Likewise.
As noted by Florian Weimer, the current Linux posix_spawn implementation
can trigger an assert if the auxiliary process is terminated before
actually setting the err member:
340 /* Child must set args.err to something non-negative - we rely on
341 the parent and child sharing VM. */
342 args.err = -1;
[...]
362 new_pid = CLONE (__spawni_child, STACK (stack, stack_size), stack_size,
363 CLONE_VM | CLONE_VFORK | SIGCHLD, &args);
364
365 if (new_pid > 0)
366 {
367 ec = args.err;
368 assert (ec >= 0);
Another possible issue is killing the child between setting the err and
actually calling execve. In this case the process will not run, but
posix_spawn also will not report any error:
269
270 args->err = 0;
271 args->exec (args->file, args->argv, args->envp);
As suggested by Andreas Schwab, this patch removes the faulty assert
and also handles any signal that happens before fork and execve as if
the spawn was successful (thus relaying the handling to the caller to
figure this out). Unlike Florian, I can not see why using
atomics to set err would help here; essentially the code runs
sequentially (due to CLONE_VFORK) and I think it would not be legal for
the compiler to evaluate ec without checking the new_pid result (thus
there is no need for a compiler barrier).
Summarizing the possible scenarios on posix_spawn execution, we
have:
1. For the default case with a successful execution, args.err will be 0,
the pid will not be collected, and it will be reported to the caller.
2. For the default failure case, args.err will be positive and the child
will be collected by the waitpid. An error will be reported to the
caller.
3. For the unlikely case where the process was terminated and not
collected by a caller signal handler, it will be reported as a successful
execution and not be collected by posix_spawn (since args.err will
be 0). The caller will need to actually handle this case.
4. For the unlikely case where the process was terminated and collected
by the caller, we have 3 other possible scenarios:
4.1. The auxiliary process was terminated with args.err equal to 0:
it will be handled as 1. (so it does not matter if we hit the pid
reuse race, since we won't possibly collect an unexpected
process).
4.2. The auxiliary process was terminated after execve (due to a failure
in calling it) and before setting args.err to -1: it will also
be handled as 1., but with the issue of not being able to report
possible execve failures to the caller.
4.3. The auxiliary process was terminated after args.err was set to -1:
this is the case where it is possible to hit the pid reuse
race, where we will need to collect the auxiliary pid but we
can not be sure it will be the expected one. I think for this
case we need to actually change waitpid to use WNOHANG to avoid
hanging indefinitely on the call and report an error to the caller,
since we can't differentiate between a default failure as in 2.
and a possible pid reuse race issue.
Checked on x86_64-linux-gnu.
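Schematically, the parent-side collection logic described by scenarios
1.-4. reduces to the following sketch (continuing the excerpts quoted
above; not the literal committed code):

  if (new_pid > 0)
    {
      ec = args.err;
      /* No assert: if the child was killed before reporting a status,
         ec still indicates success and the caller handles the rest.  */
      if (ec != 0)
        /* The child reported a failure (or was killed after setting
           args.err, scenario 4.3): collect it.  WNOHANG avoids hanging
           indefinitely on a possible pid reuse race.  */
        __waitpid (new_pid, &state, WNOHANG);
    }
  else
    ec = -new_pid;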
* sysdeps/unix/sysv/linux/spawni.c (__spawnix): Handle the case where
the auxiliary process is terminated by a signal before calling _exit
or execve.
In _dl_runtime_resolve, use fxsave/xsave/xsavec to preserve all vector,
mask and bound registers. It simplifies _dl_runtime_resolve and supports
different calling conventions. ld.so code size is reduced by more than
1 KB. However, using fxsave/xsave/xsavec takes a little bit more cycles
than saving and restoring vector and bound registers individually.
Latency for _dl_runtime_resolve to look up the function, foo, from one
shared library plus libc.so:
Before After Change
Westmere (SSE)/fxsave 345 866 151%
IvyBridge (AVX)/xsave 420 643 53%
Haswell (AVX)/xsave 713 1252 75%
Skylake (AVX+MPX)/xsavec 559 719 28%
Skylake (AVX512+MPX)/xsavec 145 272 87%
Ryzen (AVX)/xsavec 280 553 97%
This is the worst case, where the portion of time spent saving and
restoring registers is bigger than in the majority of cases. With the
smaller _dl_runtime_resolve code size, the overall performance impact is
negligible. On IvyBridge, differences in build and test time of binutils
with lazy-binding GCC and binutils are noise. On Westmere, differences in
bootstrap and "make check" time of GCC 7 with lazy-binding GCC and
binutils are also noise.
[BZ #21265]
* sysdeps/x86/cpu-features-offsets.sym (XSAVE_STATE_SIZE_OFFSET):
New.
* sysdeps/x86/cpu-features.c: Include <libc-pointer-arith.h>.
(get_common_indeces): Set xsave_state_size, xsave_state_full_size
and bit_arch_XSAVEC_Usable if needed.
(init_cpu_features): Remove bit_arch_Use_dl_runtime_resolve_slow
and bit_arch_Use_dl_runtime_resolve_opt.
* sysdeps/x86/cpu-features.h (bit_arch_Use_dl_runtime_resolve_opt):
Removed.
(bit_arch_Use_dl_runtime_resolve_slow): Likewise.
(bit_arch_Prefer_No_AVX512): Updated.
(bit_arch_MathVec_Prefer_No_AVX512): Likewise.
(bit_arch_XSAVEC_Usable): New.
(STATE_SAVE_OFFSET): Likewise.
(STATE_SAVE_MASK): Likewise.
[__ASSEMBLER__]: Include <cpu-features-offsets.h>.
(cpu_features): Add xsave_state_size and xsave_state_full_size.
(index_arch_Use_dl_runtime_resolve_opt): Removed.
(index_arch_Use_dl_runtime_resolve_slow): Likewise.
(index_arch_XSAVEC_Usable): New.
* sysdeps/x86/cpu-tunables.c (TUNABLE_CALLBACK (set_hwcaps)):
Support XSAVEC_Usable. Remove Use_dl_runtime_resolve_slow.
* sysdeps/x86_64/Makefile (tst-x86_64-1-ENV): New if tunables
is enabled.
* sysdeps/x86_64/dl-machine.h (elf_machine_runtime_setup):
Replace _dl_runtime_resolve_sse, _dl_runtime_resolve_avx,
_dl_runtime_resolve_avx_slow, _dl_runtime_resolve_avx_opt,
_dl_runtime_resolve_avx512 and _dl_runtime_resolve_avx512_opt
with _dl_runtime_resolve_fxsave, _dl_runtime_resolve_xsave and
_dl_runtime_resolve_xsavec.
* sysdeps/x86_64/dl-trampoline.S (DL_RUNTIME_UNALIGNED_VEC_SIZE):
Removed.
(DL_RUNTIME_RESOLVE_REALIGN_STACK): Check STATE_SAVE_ALIGNMENT
instead of VEC_SIZE.
(REGISTER_SAVE_BND0): Removed.
(REGISTER_SAVE_BND1): Likewise.
(REGISTER_SAVE_BND3): Likewise.
(REGISTER_SAVE_RAX): Always defined to 0.
(VMOV): Removed.
(_dl_runtime_resolve_avx): Likewise.
(_dl_runtime_resolve_avx_slow): Likewise.
(_dl_runtime_resolve_avx_opt): Likewise.
(_dl_runtime_resolve_avx512): Likewise.
(_dl_runtime_resolve_avx512_opt): Likewise.
(_dl_runtime_resolve_sse): Likewise.
(_dl_runtime_resolve_sse_vex): Likewise.
(USE_FXSAVE): New.
(_dl_runtime_resolve_fxsave): Likewise.
(USE_XSAVE): Likewise.
(_dl_runtime_resolve_xsave): Likewise.
(USE_XSAVEC): Likewise.
(_dl_runtime_resolve_xsavec): Likewise.
* sysdeps/x86_64/dl-trampoline.h (_dl_runtime_resolve_avx512):
Removed.
(_dl_runtime_resolve_avx512_opt): Likewise.
(_dl_runtime_resolve_avx): Likewise.
(_dl_runtime_resolve_avx_opt): Likewise.
(_dl_runtime_resolve_sse): Likewise.
(_dl_runtime_resolve_sse_vex): Likewise.
(_dl_runtime_resolve_fxsave): New.
(_dl_runtime_resolve_xsave): Likewise.
(_dl_runtime_resolve_xsavec): Likewise.
When --enable-static-pie is used to configure glibc, we need to use
_dl_relocate_static_pie to compute load address in static PIE.
* sysdeps/m68k/dl-machine.h (elf_machine_load_address): Use
_dl_relocate_static_pie instead of _dl_start to compute load
address in static PIE.