Commit Graph

140 Commits

Author SHA1 Message Date
Adhemerval Zanella eab380d8ec Move shared pthread definitions to common headers
This patch removes all the replicated pthread definition accross the
architectures and consolidates it on shared headers.  The new
organization is as follow:

  * Architecture specific definition (such as pthread types sizes) are
    place in the new pthreadtypes-arch.h header in arch specific path.

  * All shared structure definition are moved to a common NPTL header
    at sysdeps/nptl/bits/pthreadtypes.h (with now includes the arch
    specific one for internal definitions).

  * Also, for C11 future thread support, both mutex and condition
    definition are placed in a common header at
    sysdeps/nptl/bits/thread-shared-types.h.

It is also a refactor patch without expected functional changes.
Checked with a build for all major ABI (aarch64-linux-gnu, alpha-linux-gnu,
arm-linux-gnueabi, i386-linux-gnu, ia64-linux-gnu,
m68k-linux-gnu, microblaze-linux-gnu, mips{64}-linux-gnu, nios2-linux-gnu,
powerpc{64le}-linux-gnu, s390{x}-linux-gnu, sparc{64}-linux-gnu,
tile{pro,gx}-linux-gnu, and x86_64-linux-gnu).

	* posix/Makefile (headers): Add pthreadtypes-arch.h and
	thread-shared-types.h.
	* sysdeps/aarch64/nptl/bits/pthreadtypes-arch.h: New file: arch
	specific thread definition.
	* sysdeps/alpha/nptl/bits/pthreadtypes-arch.h: Likewise.
	* sysdeps/arm/nptl/bits/pthreadtypes-arch.h: Likewise.
	* sysdeps/hppa/nptl/bits/pthreadtypes-arch.h: Likewise.
	* sysdeps/ia64/nptl/bits/pthreadtypes-arch.h: Likewise.
	* sysdeps/m68k/nptl/bits/pthreadtypes-arch.h: Likewise.
	* sysdeps/microblaze/nptl/bits/pthreadtypes-arch.h: Likewise.
	* sysdeps/mips/nptl/bits/pthreadtypes-arch.h: Likewise.
	* sysdeps/nios2/nptl/bits/pthreadtypes-arch.h: Likewise.
	* sysdeps/powerpc/nptl/bits/pthreadtypes-arch.h: Likewise.
	* sysdeps/s390/nptl/bits/pthreadtypes-arch.h: Likewise.
	* sysdeps/sh/nptl/bits/pthreadtypes-arch.h: Likewise.
	* sysdeps/sparc/nptl/bits/pthreadtypes-arch.h: Likewise.
	* sysdeps/tile/nptl/bits/pthreadtypes-arch.h: Likewise.
	* sysdeps/x86/nptl/bits/pthreadtypes-arch.h: Likewise.
	* sysdeps/nptl/bits/thread-shared-types.h: New file: shared
	thread definition between POSIX and C11.
	* sysdeps/aarch64/nptl/bits/pthreadtypes.h.: Remove file.
	* sysdeps/alpha/nptl/bits/pthreadtypes.h: Likewise.
	* sysdeps/arm/nptl/bits/pthreadtypes.h: Likewise.
	* sysdeps/hppa/nptl/bits/pthreadtypes.h: Likewise.
	* sysdeps/m68k/nptl/bits/pthreadtypes.h: Likewise.
	* sysdeps/microblaze/nptl/bits/pthreadtypes.h: Likewise.
	* sysdeps/mips/nptl/bits/pthreadtypes.h: Likewise.
	* sysdeps/nios2/nptl/bits/pthreadtypes.h: Likewise.
	* sysdeps/ia64/nptl/bits/pthreadtypes.h: Likewise.
	* sysdeps/powerpc/nptl/bits/pthreadtypes.h: Likewise.
	* sysdeps/s390/nptl/bits/pthreadtypes.h: Likewise.
	* sysdeps/sh/nptl/bits/pthreadtypes.h: Likewise.
	* sysdeps/sparc/nptl/bits/pthreadtypes.h: Likewise.
	* sysdeps/tile/nptl/bits/pthreadtypes.h: Likewise.
	* sysdeps/x86/nptl/bits/pthreadtypes.h: Likewise.
	* sysdeps/nptl/bits/pthreadtypes.h: New file: common thread
	definitions shared across all architectures.
2017-05-09 17:49:17 -03:00
H.J. Lu 1432d38ea0 x86: Set dl_platform and dl_hwcap from CPU features [BZ #21391]
dl_platform and dl_hwcap are set from AT_PLATFORM and AT_HWCAP very
early during startup.  They are used by dynamic linker to determine
platform and build an array of hardware capability names, which are
added to search path when loading shared object.  dl_platform and
dl_hwcap are unused on x86-64.  On i386, i386, i486, i586 and i686
platforms were supported and only SSE2 capability was used.

On x86, usage of AT_PLATFORM and AT_HWCAP to determine platform and
processor capabilities is obsolete since all information is available
in dl_x86_cpu_features.  This patch sets dl_platform and dl_hwcap from
dl_x86_cpu_features in dynamic linker.  On i386, the available plaforms
are changed to i586 and i686 since i386 has been deprecated.  On x86-64,
the available plaforms are haswell, which is for Haswell class processors
with BMI1, BMI2, LZCNT, MOVBE, POPCNT, AVX2 and FMA, and xeon_phi, which
is for Xeon Phi class processors with AVX512F, AVX512CD, AVX512ER and
AVX512PF.  A capability, avx512_1, is also added to x86-64 for AVX512
ISAs: AVX512F, AVX512CD, AVX512BW, AVX512DQ and AVX512VL.

	[BZ #21391]
	* sysdeps/i386/dl-machine.h (dl_platform_init) [IS_IN (rtld)]:
	Only call init_cpu_features.
	[!IS_IN (rtld)]: Only set GLRO(dl_platform) to NULL if needed.
	* sysdeps/x86_64/dl-machine.h (dl_platform_init): Likewise.
	* sysdeps/i386/dl-procinfo.h: Removed.
	* sysdeps/unix/sysv/linux/i386/dl-procinfo.h: Don't include
	<sysdeps/i386/dl-procinfo.h> nor <ldsodefs.h>.  Include
	<sysdeps/x86/dl-procinfo.h>.
	(_dl_procinfo): Replace _DL_HWCAP_COUNT with 32.
	* sysdeps/unix/sysv/linux/x86_64/dl-procinfo.h [!IS_IN (ldconfig)]:
	Include <sysdeps/x86/dl-procinfo.h> instead of
	 <sysdeps/generic/dl-procinfo.h>.
	* sysdeps/x86/cpu-features.c: Include <dl-hwcap.h>.
	(init_cpu_features): Set dl_platform, dl_hwcap and dl_hwcap_mask.
	* sysdeps/x86/cpu-features.h (bit_cpu_LZCNT): New.
	(bit_cpu_MOVBE): Likewise.
	(bit_cpu_BMI1): Likewise.
	(bit_cpu_BMI2): Likewise.
	(index_cpu_BMI1): Likewise.
	(index_cpu_BMI2): Likewise.
	(index_cpu_LZCNT): Likewise.
	(index_cpu_MOVBE): Likewise.
	(index_cpu_POPCNT): Likewise.
	(reg_BMI1): Likewise.
	(reg_BMI2): Likewise.
	(reg_LZCNT): Likewise.
	(reg_MOVBE): Likewise.
	(reg_POPCNT): Likewise.
	* sysdeps/x86/dl-hwcap.h: New file.
	* sysdeps/x86/dl-procinfo.h: Likewise.
	* sysdeps/x86/dl-procinfo.c (_dl_x86_hwcap_flags): New.
	(_dl_x86_platforms): Likewise.
2017-05-03 13:44:35 -07:00
H.J. Lu 4cb334c4d6 x86: Use AVX2 memcpy/memset on Skylake server [BZ #21396]
On Skylake server, AVX512 load/store instructions in memcpy/memset may
lead to lower CPU turbo frequency in certain situations.  Use of AVX2
in memcpy/memset has been observed to have improved overall performance
in many workloads due to the higher frequency.

Since AVX512ER is unique to Xeon Phi, this patch sets Prefer_No_AVX512
if AVX512ER isn't available so that AVX2 versions of memcpy/memset are
used on Skylake server.

	[BZ #21396]
	* sysdeps/x86/cpu-features.c (init_cpu_features): Set
	Prefer_No_AVX512 if AVX512ER isn't available.
	* sysdeps/x86/cpu-features.h (bit_arch_Prefer_No_AVX512): New.
	(index_arch_Prefer_No_AVX512): Likewise.
	* sysdeps/x86_64/multiarch/memcpy.S (__new_memcpy): Don't use
	AVX512 version if Prefer_No_AVX512 is set.
	* sysdeps/x86_64/multiarch/memcpy_chk.S (__memcpy_chk):
	Likewise.
	* sysdeps/x86_64/multiarch/memmove.S (__libc_memmove): Likewise.
	* sysdeps/x86_64/multiarch/memmove_chk.S (__memmove_chk):
	Likewise.
	* sysdeps/x86_64/multiarch/mempcpy.S (__mempcpy): Likewise.
	* sysdeps/x86_64/multiarch/mempcpy_chk.S (__mempcpy_chk):
	Likewise.
	* sysdeps/x86_64/multiarch/memset.S (memset): Likewise.
	* sysdeps/x86_64/multiarch/memset_chk.S (__memset_chk):
	Likewise.
2017-04-18 14:01:45 -07:00
H.J. Lu 1c53cb49de x86: Set Prefer_No_VZEROUPPER if AVX512ER is available
AVX512ER won't be implemented in any Xeon processors and will be in
all Xeon Phi processors.  Don't check CPU model number when setting
Prefer_No_VZEROUPPER for Xeon Phi.  Instead, set Prefer_No_VZEROUPPER
if AVX512ER is available.  It works with current and future Xeon Phi
and non-Xeon Phi processors.

	* sysdeps/x86/cpu-features.c (init_cpu_features): Set
	Prefer_No_VZEROUPPER if AVX512ER is available.
	* sysdeps/x86/cpu-features.h
	(bit_cpu_AVX512PF): New.
	(bit_cpu_AVX512ER): Likewise.
	(bit_cpu_AVX512CD): Likewise.
	(bit_cpu_AVX512BW): Likewise.
	(bit_cpu_AVX512VL): Likewise.
	(index_cpu_AVX512PF): Likewise.
	(index_cpu_AVX512ER): Likewise.
	(index_cpu_AVX512CD): Likewise.
	(index_cpu_AVX512BW): Likewise.
	(index_cpu_AVX512VL): Likewise.
	(reg_AVX512PF): Likewise.
	(reg_AVX512ER): Likewise.
	(reg_AVX512CD): Likewise.
	(reg_AVX512BW): Likewise.
	(reg_AVX512VL): Likewise.
2017-04-18 08:27:32 -07:00
Adhemerval Zanella 38efe8c5a5 Consolidate pthreadtype.h placementConsolidate pthreadtype.h placement
This patch moves all arch specific pthreadtypes.h to a similar path
for all architectures (sysdeps/unix/sysv/<arch>/bits).  No functional
or build change is expected.  The idea is mainly to organize the
header placement for all architectures.

Checked with a build for all major ABI (aarch64-linux-gnu, alpha-linux-gnu,
arm-linux-gnueabi, i386-linux-gnu, ia64-linux-gnu,
m68k-linux-gnu, microblaze-linux-gnu [1], mips{64}-linux-gnu, nios2-linux-gnu,
powerpc{64le}-linux-gnu, s390{x}-linux-gnu, sparc{64}-linux-gnu,
tile{pro,gx}-linux-gnu, and x86_64-linux-gnu).

	* sysdeps/unix/sysv/linux/x86/Implies: New file.
	* sysdeps/unix/sysv/linux/alpha/bits/pthreadtypes.h: Move to ...
	* sysdeps/alpha/nptl/bits/pthreadtypes.h: ... here.
	* sysdeps/unix/sysv/linux/powerpc/bits/pthreadtypes.h: Move to ...
	* sysdeps/powerpc/nptl/bits/pthreadtypes.h: ... here.
	* sysdeps/x86/bits/pthreadtypes.h: Move to ...
	* sysdeps/x86/nptl/bits/pthreadtypes.h: ... here.
2017-04-10 17:33:10 -03:00
H.J. Lu fda19e0438 Add sysdeps/x86/dl-procinfo.c
Add sysdeps/x86/dl-procinfo.c for x86 version of processor capability
information to reduce duplication between i386 and x86_64 dl-procinfo.c.

	* sysdeps/i386/dl-procinfo.c: Include
	<sysdeps/x86/dl-procinfo.c>.
	* sysdeps/x86_64/dl-procinfo.c: Likewise.
	* sysdeps/x86/dl-procinfo.c: New file.
2017-04-10 12:01:45 -07:00
H.J. Lu bf7730194f Check if SSE is available with HAS_CPU_FEATURE
Similar to other CPU feature checks, check if SSE is available with
HAS_CPU_FEATURE.

	* sysdeps/i386/fpu/fclrexcpt.c (__feclearexcept): Use
	HAS_CPU_FEATURE to check for SSE.
	* sysdeps/i386/fpu/fedisblxcpt.c (fedisableexcept): Likewise.
	* sysdeps/i386/fpu/feenablxcpt.c (feenableexcept): Likewise.
	* sysdeps/i386/fpu/fegetenv.c (__fegetenv): Likewise.
	* sysdeps/i386/fpu/fegetmode.c (fegetmode): Likewise.
	* sysdeps/i386/fpu/feholdexcpt.c (__feholdexcept): Likewise.
	* sysdeps/i386/fpu/fesetenv.c (__fesetenv): Likewise.
	* sysdeps/i386/fpu/fesetmode.c (fesetmode): Likewise.
	* sysdeps/i386/fpu/fesetround.c (__fesetround): Likewise.
	* sysdeps/i386/fpu/feupdateenv.c (__feupdateenv): Likewise.
	* sysdeps/i386/fpu/fgetexcptflg.c (__fegetexceptflag): Likewise.
	* sysdeps/i386/fpu/fsetexcptflg.c (__fesetexceptflag): Likewise.
	* sysdeps/i386/fpu/ftestexcept.c (fetestexcept): Likewise.
	* sysdeps/i386/setfpucw.c (__setfpucw): Likewise.
	* sysdeps/x86/cpu-features.h (bit_cpu_SSE): New.
	(index_cpu_SSE): Likewise.
	(reg_SSE): Likewise.
2017-04-07 07:44:59 -07:00
H.J. Lu b170d2e7ab Use CPU_FEATURES_CPU_P to check if AVX is available
Don't use bit_cpu_AVX directly.

	* sysdeps/x86/cpu-features.c (init_cpu_features): Check AVX with
	CPU_FEATURES_CPU_P.
2017-03-17 11:38:13 -07:00
Joseph Myers 2072f5c34e Remove C++ namespace handling from glibc headers.
glibc headers include some code (not particularly consistent or
systematic) to put various declarations in C++ namespaces std and
__c99, if _GLIBCPP_USE_NAMESPACES is defined.

As noted in <https://gcc.gnu.org/ml/libstdc++/2017-03/msg00025.html>,
this macro was removed from libstdc++ in 2000.  I don't expect
compilation with such old versions of libstdc++ to work with current
glibc headers anyway (whereas old *binaries* are expected to stay
working with current glibc); this patch (which should be a no-op with
any libstdc++ version postdating that removal) removes all this code
from the glibc headers.

The begin-end-check.pl test, whose comments say it is about checking
these namespace macro calls, is also removed.  The code in that test
would have covered __BEGIN_DECLS / __END_DECLS as well, but if those
weren't properly matched it would show up with the
check-installed-headers-cxx tests, so I don't think there is an actual
use for keeping begin-end-check.pl with the namespace code removed.

Tested for x86_64 and x86 (testsuite, and that installed stripped
shared libraries are unchanged by the patch).

	* misc/sys/cdefs.h (__BEGIN_NAMESPACE_STD): Remove macro.
	(__END_NAMESPACE_STD): Likewise.
	(__USING_NAMESPACE_STD): Likewise.
	(__BEGIN_NAMESPACE_C99): Likewise.
	(__END_NAMESPACE_C99): Likewise.
	(__USING_NAMESPACE_C99): Likewise.
	* math/math.h (_Mdouble_BEGIN_NAMESPACE): Do not define and
	undefine macro.
	(_Mdouble_END_NAMESPACE): Likewise.
	* ctype/ctype.h: Do not handle C++ namespaces.
	* libio/bits/stdio-ldbl.h: Likewise.
	* libio/stdio.h: Likewise.
	* locale/locale.h: Likewise.
	* math/bits/mathcalls.h: Likewise.
	* setjmp/setjmp.h: Likewise.
	* signal/signal.h: Likewise.
	* stdlib/bits/stdlib-float.h: Likewise.
	* stdlib/bits/stdlib-ldbl.h: Likewise.
	* stdlib/stdlib.h: Likewise.
	* string/string.h: Likewise.
	* sysdeps/x86/fpu/bits/mathinline.h: Likewise.
	* time/bits/types/clock_t.h: Likewise.
	* time/bits/types/struct_tm.h: Likewise.
	* time/bits/types/time_t.h: Likewise.
	* time/time.h: Likewise.
	* wcsmbs/bits/wchar-ldbl.h: Likewise.
	* wcsmbs/uchar.h: Likewise.
	* wcsmbs/wchar.h: Likewise.
	[_GLIBCPP_USE_NAMESPACES] (wint_t): Remove conditional definition.
	* wctype/wctype.h: Do not handle C++ namespaces.
	* scripts/begin-end-check.pl: Remove.
	* Makefile (installed-headers): Likewise.
	(tests-special): Do not add $(objpfx)begin-end-check.out.
	($(objpfx)begin-end-check.out): Remove.
2017-03-16 13:31:57 +00:00
Joseph Myers ffe308e4fc Fix test-math-vector-sincos.h aliasing.
x86_64 libmvec tests have been failing to build lately with GCC
mainline with -Wuninitialized errors, and Markus Trippelsdorf traced
this to an aliasing issue
<https://sourceware.org/ml/libc-alpha/2017-03/msg00169.html>.

This patch fixes the aliasing issue, so that the vectors-of-pointers
are initialized using a union instead of pointer casts.  This also
fixes the testsuite build failures with GCC mainline.

Tested for x86_64 (full testsuite with GCC 6; testsuite build with GCC
mainline with build-many-glibcs.py).

	* sysdeps/x86/fpu/test-math-vector-sincos.h (INIT_VEC_PTRS_LOOP):
	Use a union when storing pointers.
	(VECTOR_WRAPPER_fFF_2): Do not take address of integer vector and
	cast result when passing to INIT_VEC_PTRS_LOOP.
	(VECTOR_WRAPPER_fFF_3): Likewise.
	(VECTOR_WRAPPER_fFF_4): Likewise.
2017-03-15 17:32:46 +00:00
H.J. Lu 52ac22365a Use index_cpu_RTM and reg_RTM to clear the bit_cpu_RTM bit
* sysdeps/x86/cpu-features.c (init_cpu_features): Use
	index_cpu_RTM and reg_RTM to clear the bit_cpu_RTM bit.
2017-02-17 11:53:26 -08:00
Torvald Riegel cc25c8b4c1 New pthread rwlock that is more scalable.
This replaces the pthread rwlock with a new implementation that uses a
more scalable algorithm (primarily through not using a critical section
anymore to make state changes).  The fast path for rdlock acquisition and
release is now basically a single atomic read-modify write or CAS and a few
branches.  See nptl/pthread_rwlock_common.c for details.

	* nptl/DESIGN-rwlock.txt: Remove.
	* nptl/lowlevelrwlock.sym: Remove.
	* nptl/Makefile: Add new tests.
	* nptl/pthread_rwlock_common.c: New file.  Contains the new rwlock.
	* nptl/pthreadP.h (PTHREAD_RWLOCK_PREFER_READER_P): Remove.
	(PTHREAD_RWLOCK_WRPHASE, PTHREAD_RWLOCK_WRLOCKED,
	PTHREAD_RWLOCK_RWAITING, PTHREAD_RWLOCK_READER_SHIFT,
	PTHREAD_RWLOCK_READER_OVERFLOW, PTHREAD_RWLOCK_WRHANDOVER,
	PTHREAD_RWLOCK_FUTEX_USED): New.
	* nptl/pthread_rwlock_init.c (__pthread_rwlock_init): Adapt to new
	implementation.
	* nptl/pthread_rwlock_rdlock.c (__pthread_rwlock_rdlock_slow): Remove.
	(__pthread_rwlock_rdlock): Adapt.
	* nptl/pthread_rwlock_timedrdlock.c
	(pthread_rwlock_timedrdlock): Adapt.
	* nptl/pthread_rwlock_timedwrlock.c
	(pthread_rwlock_timedwrlock): Adapt.
	* nptl/pthread_rwlock_trywrlock.c (pthread_rwlock_trywrlock): Adapt.
	* nptl/pthread_rwlock_tryrdlock.c (pthread_rwlock_tryrdlock): Adapt.
	* nptl/pthread_rwlock_unlock.c (pthread_rwlock_unlock): Adapt.
	* nptl/pthread_rwlock_wrlock.c (__pthread_rwlock_wrlock_slow): Remove.
	(__pthread_rwlock_wrlock): Adapt.
	* nptl/tst-rwlock10.c: Adapt.
	* nptl/tst-rwlock11.c: Adapt.
	* nptl/tst-rwlock17.c: New file.
	* nptl/tst-rwlock18.c: New file.
	* nptl/tst-rwlock19.c: New file.
	* nptl/tst-rwlock2b.c: New file.
	* nptl/tst-rwlock8.c: Adapt.
	* nptl/tst-rwlock9.c: Adapt.
	* sysdeps/aarch64/nptl/bits/pthreadtypes.h (pthread_rwlock_t): Adapt.
	* sysdeps/arm/nptl/bits/pthreadtypes.h (pthread_rwlock_t): Adapt.
	* sysdeps/hppa/nptl/bits/pthreadtypes.h (pthread_rwlock_t): Adapt.
	* sysdeps/ia64/nptl/bits/pthreadtypes.h (pthread_rwlock_t): Adapt.
	* sysdeps/m68k/nptl/bits/pthreadtypes.h (pthread_rwlock_t): Adapt.
	* sysdeps/microblaze/nptl/bits/pthreadtypes.h (pthread_rwlock_t): Adapt.
	* sysdeps/mips/nptl/bits/pthreadtypes.h (pthread_rwlock_t): Adapt.
	* sysdeps/nios2/nptl/bits/pthreadtypes.h (pthread_rwlock_t): Adapt.
	* sysdeps/s390/nptl/bits/pthreadtypes.h (pthread_rwlock_t): Adapt.
	* sysdeps/sh/nptl/bits/pthreadtypes.h (pthread_rwlock_t): Adapt.
	* sysdeps/sparc/nptl/bits/pthreadtypes.h (pthread_rwlock_t): Adapt.
	* sysdeps/tile/nptl/bits/pthreadtypes.h (pthread_rwlock_t): Adapt.
	* sysdeps/unix/sysv/linux/alpha/bits/pthreadtypes.h
	(pthread_rwlock_t): Adapt.
	* sysdeps/unix/sysv/linux/powerpc/bits/pthreadtypes.h
	(pthread_rwlock_t): Adapt.
	* sysdeps/x86/bits/pthreadtypes.h (pthread_rwlock_t): Adapt.
	* nptl/nptl-printers.py (): Adapt.
	* nptl/nptl_lock_constants.pysym: Adapt.
	* nptl/test-rwlock-printers.py: Adapt.
	* nptl/test-rwlockattr-printers.c: Adapt.
	* nptl/test-rwlockattr-printers.py: Adapt.
2017-01-10 11:50:17 +01:00
Joseph Myers bfff8b1bec Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
Torvald Riegel ed19993b5b New condvar implementation that provides stronger ordering guarantees.
This is a new implementation for condition variables, required
after http://austingroupbugs.net/view.php?id=609 to fix bug 13165.  In
essence, we need to be stricter in which waiters a signal or broadcast
is required to wake up; this couldn't be solved using the old algorithm.
ISO C++ made a similar clarification, so this also fixes a bug in
current libstdc++, for example.

We can't use the old algorithm anymore because futexes do not guarantee
to wake in FIFO order.  Thus, when we wake, we can't simply let any
waiter grab a signal, but we need to ensure that one of the waiters
happening before the signal is woken up.  This is something the previous
algorithm violated (see bug 13165).

There's another issue specific to condvars: ABA issues on the underlying
futexes.  Unlike mutexes that have just three states, or semaphores that
have no tokens or a limited number of them, the state of a condvar is
the *order* of the waiters.  A waiter on a semaphore can grab a token
whenever one is available; a condvar waiter must only consume a signal
if it is eligible to do so as determined by the relative order of the
waiter and the signal.
Therefore, this new algorithm maintains two groups of waiters: Those
eligible to consume signals (G1), and those that have to wait until
previous waiters have consumed signals (G2).  Once G1 is empty, G2
becomes the new G1.  64b counters are used to avoid ABA issues.

This condvar doesn't yet use a requeue optimization (ie, on a broadcast,
waking just one thread and requeueing all others on the futex of the
mutex supplied by the program).  I don't think doing the requeue is
necessarily the right approach (but I haven't done real measurements
yet):
* If a program expects to wake many threads at the same time and make
that scalable, a condvar isn't great anyway because of how it requires
waiters to operate mutually exclusive (due to the mutex usage).  Thus, a
thundering herd problem is a scalability problem with or without the
optimization.  Using something like a semaphore might be more
appropriate in such a case.
* The scalability problem is actually at the mutex side; the condvar
could help (and it tries to with the requeue optimization), but it
should be the mutex who decides how that is done, and whether it is done
at all.
* Forcing all but one waiter into the kernel-side wait queue of the
mutex prevents/avoids the use of lock elision on the mutex.  Thus, it
prevents the only cure against the underlying scalability problem
inherent to condvars.
* If condvars use short critical sections (ie, hold the mutex just to
check a binary flag or such), which they should do ideally, then forcing
all those waiter to proceed serially with kernel-based hand-off (ie,
futex ops in the mutex' contended state, via the futex wait queues) will
be less efficient than just letting a scalable mutex implementation take
care of it.  Our current mutex impl doesn't employ spinning at all, but
if critical sections are short, spinning can be much better.
* Doing the requeue stuff requires all waiters to always drive the mutex
into the contended state.  This leads to each waiter having to call
futex_wake after lock release, even if this wouldn't be necessary.

	[BZ #13165]
	* nptl/pthread_cond_broadcast.c (__pthread_cond_broadcast): Rewrite to
	use new algorithm.
	* nptl/pthread_cond_destroy.c (__pthread_cond_destroy): Likewise.
	* nptl/pthread_cond_init.c (__pthread_cond_init): Likewise.
	* nptl/pthread_cond_signal.c (__pthread_cond_signal): Likewise.
	* nptl/pthread_cond_wait.c (__pthread_cond_wait): Likewise.
	(__pthread_cond_timedwait): Move here from pthread_cond_timedwait.c.
	(__condvar_confirm_wakeup, __condvar_cancel_waiting,
	__condvar_cleanup_waiting, __condvar_dec_grefs,
	__pthread_cond_wait_common): New.
	(__condvar_cleanup): Remove.
	* npt/pthread_condattr_getclock.c (pthread_condattr_getclock): Adapt.
	* npt/pthread_condattr_setclock.c (pthread_condattr_setclock):
	Likewise.
	* npt/pthread_condattr_getpshared.c (pthread_condattr_getpshared):
	Likewise.
	* npt/pthread_condattr_init.c (pthread_condattr_init): Likewise.
	* nptl/tst-cond1.c: Add comment.
	* nptl/tst-cond20.c (do_test): Adapt.
	* nptl/tst-cond22.c (do_test): Likewise.
	* sysdeps/aarch64/nptl/bits/pthreadtypes.h (pthread_cond_t): Adapt
	structure.
	* sysdeps/arm/nptl/bits/pthreadtypes.h (pthread_cond_t): Likewise.
	* sysdeps/ia64/nptl/bits/pthreadtypes.h (pthread_cond_t): Likewise.
	* sysdeps/m68k/nptl/bits/pthreadtypes.h (pthread_cond_t): Likewise.
	* sysdeps/microblaze/nptl/bits/pthreadtypes.h (pthread_cond_t):
	Likewise.
	* sysdeps/mips/nptl/bits/pthreadtypes.h (pthread_cond_t): Likewise.
	* sysdeps/nios2/nptl/bits/pthreadtypes.h (pthread_cond_t): Likewise.
	* sysdeps/s390/nptl/bits/pthreadtypes.h (pthread_cond_t): Likewise.
	* sysdeps/sh/nptl/bits/pthreadtypes.h (pthread_cond_t): Likewise.
	* sysdeps/tile/nptl/bits/pthreadtypes.h (pthread_cond_t): Likewise.
	* sysdeps/unix/sysv/linux/alpha/bits/pthreadtypes.h (pthread_cond_t):
	Likewise.
	* sysdeps/unix/sysv/linux/powerpc/bits/pthreadtypes.h (pthread_cond_t):
	Likewise.
	* sysdeps/x86/bits/pthreadtypes.h (pthread_cond_t): Likewise.
	* sysdeps/nptl/internaltypes.h (COND_NWAITERS_SHIFT): Remove.
	(COND_CLOCK_BITS): Adapt.
	* sysdeps/nptl/pthread.h (PTHREAD_COND_INITIALIZER): Adapt.
	* nptl/pthreadP.h (__PTHREAD_COND_CLOCK_MONOTONIC_MASK,
	__PTHREAD_COND_SHARED_MASK): New.
	* nptl/nptl-printers.py (CLOCK_IDS): Remove.
	(ConditionVariablePrinter, ConditionVariableAttributesPrinter): Adapt.
	* nptl/nptl_lock_constants.pysym: Adapt.
	* nptl/test-cond-printers.py: Adapt.
	* sysdeps/unix/sysv/linux/hppa/internaltypes.h (cond_compat_clear,
	cond_compat_check_and_clear): Adapt.
	* sysdeps/unix/sysv/linux/hppa/pthread_cond_timedwait.c: Remove file ...
	* sysdeps/unix/sysv/linux/hppa/pthread_cond_wait.c
	(__pthread_cond_timedwait): ... and move here.
	* nptl/DESIGN-condvar.txt: Remove file.
	* nptl/lowlevelcond.sym: Likewise.
	* nptl/pthread_cond_timedwait.c: Likewise.
	* sysdeps/unix/sysv/linux/i386/i486/pthread_cond_broadcast.S: Likewise.
	* sysdeps/unix/sysv/linux/i386/i486/pthread_cond_signal.S: Likewise.
	* sysdeps/unix/sysv/linux/i386/i486/pthread_cond_timedwait.S: Likewise.
	* sysdeps/unix/sysv/linux/i386/i486/pthread_cond_wait.S: Likewise.
	* sysdeps/unix/sysv/linux/i386/i586/pthread_cond_broadcast.S: Likewise.
	* sysdeps/unix/sysv/linux/i386/i586/pthread_cond_signal.S: Likewise.
	* sysdeps/unix/sysv/linux/i386/i586/pthread_cond_timedwait.S: Likewise.
	* sysdeps/unix/sysv/linux/i386/i586/pthread_cond_wait.S: Likewise.
	* sysdeps/unix/sysv/linux/i386/i686/pthread_cond_broadcast.S: Likewise.
	* sysdeps/unix/sysv/linux/i386/i686/pthread_cond_signal.S: Likewise.
	* sysdeps/unix/sysv/linux/i386/i686/pthread_cond_timedwait.S: Likewise.
	* sysdeps/unix/sysv/linux/i386/i686/pthread_cond_wait.S: Likewise.
	* sysdeps/unix/sysv/linux/x86_64/pthread_cond_broadcast.S: Likewise.
	* sysdeps/unix/sysv/linux/x86_64/pthread_cond_signal.S: Likewise.
	* sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S: Likewise.
	* sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S: Likewise.
2016-12-31 14:56:47 +01:00
Andrew Senkevich 2702856bf4 Disable TSX on some Haswell processors.
Patch disables Intel TSX on some Haswell processors to avoid TSX
on kernels that weren't updated with the latest microcode package
(which disables broken feature by default).

    * sysdeps/x86/cpu-features.c (get_common_indeces): Add
    stepping identification.
    (init_cpu_features): Add handle of Haswell.
2016-12-19 14:15:57 +03:00
Joseph Myers 0acb8a2a85 Refactor long double information into bits/long-double.h.
Information about whether the ABI of long double is the same as that
of double is split between bits/mathdef.h and bits/wordsize.h.

When the ABIs are the same, bits/mathdef.h defines
__NO_LONG_DOUBLE_MATH.  In addition, in the case where the same glibc
binary supports both -mlong-double-64 and -mlong-double-128,
bits/wordsize.h defines __LONG_DOUBLE_MATH_OPTIONAL, along with
__NO_LONG_DOUBLE_MATH if this particular compilation is with
-mlong-double-64.

As part of the refactoring I proposed in
<https://sourceware.org/ml/libc-alpha/2016-11/msg00745.html>, this
patch puts all that information in a single header,
bits/long-double.h.  It is included from sys/cdefs.h alongside the
include of bits/wordsize.h, so other headers generally do not need to
include bits/long-double.h directly.

Previously, various bits/mathdef.h headers and bits/wordsize.h headers
had this long double information (including implicitly in some
bits/mathdef.h headers through not having the defines present in the
default version).  After the patch, it's all in six bits/long-double.h
headers.  Furthermore, most of those new headers are not
architecture-specific.  Architectures with optional long double all
use the ldbl-opt sysdeps directory, either in the order (ldbl-64-128,
ldbl-opt, ldbl-128) or (ldbl-128ibm, ldbl-opt).  Thus a generic header
for the case where long double = double, and headers in ldbl-128,
ldbl-96 and ldbl-opt, suffices to cover every architecture except for
cases where long double properties vary between different ABIs sharing
a set of installed headers; fortunately all the ldbl-opt cases share a
single compiler-predefined macro __LONG_DOUBLE_128__ that can be used
to tell whether this compilation is -mlong-double-64 or
-mlong-double-128.

The two cases where a set of headers is shared between ABIs with
different long double properties, MIPS (o32 has long double = double,
other ABIs use ldbl-128) and SPARC (32-bit has optional long double,
64-bit has required long double), need their own bits/long-double.h
headers.

As with bits/wordsize.h, multiple-include protection for this header
is generally implicit through the include guards on sys/cdefs.h, and
multiple inclusion is harmless in any case.  There is one subtlety:
the header must not define __LONG_DOUBLE_MATH_OPTIONAL if
__NO_LONG_DOUBLE_MATH was defined before its inclusion, because doing
so breaks how sysdeps/ieee754/ldbl-opt/nldbl-compat.h defines
__NO_LONG_DOUBLE_MATH itself before including system headers.  Subject
to keeping that working, it would be reasonable to move these macros
from defined/undefined #ifdef to always-defined 1/0 #if semantics, but
this patch does not attempt to do so, just rearranges where the macros
are defined.

After this patch, the only use of bits/mathdef.h is the alpha one for
modifying complex function ABIs for old GCC.  Thus, all versions of
the header other than the default and alpha versions are removed, as
is the include from math.h.

Tested for x86_64 and x86.  Also did compilation-only testing with
build-many-glibcs.py.

	* bits/long-double.h: New file.
	* sysdeps/ieee754/ldbl-128/bits/long-double.h: Likewise.
	* sysdeps/ieee754/ldbl-96/bits/long-double.h: Likewise.
	* sysdeps/ieee754/ldbl-opt/bits/long-double.h: Likewise.
	* sysdeps/mips/bits/long-double.h: Likewise.
	* sysdeps/unix/sysv/linux/sparc/bits/long-double.h: Likewise.
	* math/Makefile (headers): Add bits/long-double.h.
	* misc/sys/cdefs.h: Include <bits/long-double.h>.
	* stdlib/strtold.c: Include <bits/long-double.h> instead of
	<bits/wordsize.h>.
	* bits/mathdef.h [!_COMPLEX_H]: Do not allow inclusion.
	[!__NO_LONG_DOUBLE_MATH]: Remove conditional code.
	* math/math.h: Do not include <bits/mathdef.h>.
	* sysdeps/aarch64/bits/mathdef.h: Remove file.
	* sysdeps/alpha/bits/mathdef.h [!_COMPLEX_H]: Do not allow
	inclusion.
	* sysdeps/ia64/bits/mathdef.h: Remove file.
	* sysdeps/m68k/m680x0/bits/mathdef.h: Likewise.
	* sysdeps/mips/bits/mathdef.h: Likewise.
	* sysdeps/powerpc/bits/mathdef.h: Likewise.
	* sysdeps/s390/bits/mathdef.h: Likewise.
	* sysdeps/sparc/bits/mathdef.h: Likewise.
	* sysdeps/x86/bits/mathdef.h: Likewise.
	* sysdeps/s390/s390-32/bits/wordsize.h
	[!__NO_LONG_DOUBLE_MATH && !__LONG_DOUBLE_MATH_OPTIONAL]: Remove
	conditional code.
	* sysdeps/s390/s390-64/bits/wordsize.h
	[!__NO_LONG_DOUBLE_MATH && !__LONG_DOUBLE_MATH_OPTIONAL]:
	Likewise.
	* sysdeps/unix/sysv/linux/alpha/bits/wordsize.h
	[!__NO_LONG_DOUBLE_MATH && !__LONG_DOUBLE_MATH_OPTIONAL]:
	Likewise.
	* sysdeps/unix/sysv/linux/powerpc/bits/wordsize.h
	[!__NO_LONG_DOUBLE_MATH && !__LONG_DOUBLE_MATH_OPTIONAL]:
	Likewise.
	* sysdeps/unix/sysv/linux/sparc/bits/wordsize.h
	[!__NO_LONG_DOUBLE_MATH && !__LONG_DOUBLE_MATH_OPTIONAL]:
	Likewise.
2016-12-14 18:27:56 +00:00
Torvald Riegel ca6e601a9d Use C11-like atomics instead of plain memory accesses in x86 lock elision.
This uses atomic operations to access lock elision metadata that is accessed
concurrently (ie, adapt_count fields).  The size of the data is less than a
word but accessed only with atomic loads and stores; therefore, we add
support for shorter-size atomic load and stores too.

	* include/atomic.h (__atomic_check_size_ls): New.
	(atomic_load_relaxed, atomic_load_acquire, atomic_store_relaxed,
	atomic_store_release): Use it.
	* sysdeps/x86/elide.h (ACCESS_ONCE): Remove.
	(elision_adapt, ELIDE_LOCK): Use atomics.
	* sysdeps/unix/sysv/linux/x86/elision-lock.c (__lll_lock_elision): Use
	atomics and improve code comments.
	* sysdeps/unix/sysv/linux/x86/elision-trylock.c
	(__lll_trylock_elision): Likewise.
2016-12-05 16:19:43 +01:00
Joseph Myers b2491db6c8 Refactor FP_ILOGB* out of bits/mathdef.h.
Continuing the refactoring of bits/mathdef.h, this patch stops it
defining FP_ILOGB0 and FP_ILOGBNAN, moving the required information to
a new header bits/fp-logb.h.

There are only two possible values of each of those macros permitted
by ISO C.  TS 18661-1 adds corresponding macros for llogb, and their
values are required to correspond to those of the ilogb macros in the
obvious way.  Thus two boolean values - for which the same choices are
correct for most architectures - suffice to determine the value of all
these macros, and by defining macros for those boolean values in
bits/fp-logb.h we can then define the public FP_* macros in math.h and
avoid the present duplication of the associated feature test macro
logic.

This patch duly moves to bits/fp-logb.h defining __FP_LOGB0_IS_MIN and
__FP_LOGBNAN_IS_MIN.  Default definitions of those to 0 are correct
for both architectures, while ia64, m68k and x86 get their own
versions of bits/fp-logb.h to reflect their use of values different
from the defaults.

The patch renders many copies of bits/mathdef.h trivial (needed only
to avoid the default __NO_LONG_DOUBLE_MATH).  I'll revise
<https://sourceware.org/ml/libc-alpha/2016-11/msg00865.html>
accordingly so that it removes all bits/mathdef.h headers except the
default one and the alpha one, and arranges for the header to be
included only by complex.h as the only remaining use at that point
will be for the alpha ABI issues there.

Tested for x86_64 and x86.  Also did compile-only testing with
build-many-glibcs.py (using glibc sources from before the commit that
introduced many build failures with undefined __GI___sigsetjmp).

	* bits/fp-logb.h: New file.
	* sysdeps/ia64/bits/fp-logb.h: Likewise.
	* sysdeps/m68k/m680x0/bits/fp-logb.h: Likewise.
	* sysdeps/x86/bits/fp-logb.h: Likewise.
	* math/Makefile (headers): Add bits/fp-logb.h.
	* math/math.h: Include <bits/fp-logb.h>.
	[__USE_ISOC99] (FP_ILOGB0): Define based on __FP_LOGB0_IS_MIN.
	[__USE_ISOC99] (FP_ILOGBNAN): Define based on __FP_LOGBNAN_IS_MIN.
	* bits/mathdef.h (FP_ILOGB0): Remove.
	(FP_ILOGBNAN): Likewise.
	* sysdeps/aarch64/bits/mathdef.h (FP_ILOGB0): Likewise.
	(FP_ILOGBNAN): Likewise.
	* sysdeps/alpha/bits/mathdef.h (FP_ILOGB0): Likewise.
	(FP_ILOGBNAN): Likewise.
	* sysdeps/ia64/bits/mathdef.h (FP_ILOGB0): Likewise.
	(FP_ILOGBNAN): Likewise.
	* sysdeps/m68k/m680x0/bits/mathdef.h (FP_ILOGB0): Likewise.
	(FP_ILOGBNAN): Likewise.
	* sysdeps/mips/bits/mathdef.h (FP_ILOGB0): Likewise.
	(FP_ILOGBNAN): Likewise.
	* sysdeps/powerpc/bits/mathdef.h (FP_ILOGB0): Likewise.
	(FP_ILOGBNAN): Likewise.
	* sysdeps/s390/bits/mathdef.h (FP_ILOGB0): Likewise.
	(FP_ILOGBNAN): Likewise.
	* sysdeps/sparc/bits/mathdef.h (FP_ILOGB0): Likewise.
	(FP_ILOGBNAN): Likewise.
	* sysdeps/x86/bits/mathdef.h (FP_ILOGB0): Likewise.
	(FP_ILOGBNAN): Likewise.
2016-12-01 02:56:55 +00:00
Joseph Myers f11e220d2d Refactor FP_FAST_* into bits/fp-fast.h.
Continuing the refactoring of bits/mathdef.h, this patch moves the
FP_FAST_* definitions into a new bits/fp-fast.h header.  Currently
this is only for FP_FAST_FMA*, but in future it would be the
appropriate place for the FP_FAST_* macros from TS 18661-1 as well.

The generic bits/mathdef.h header defines these macros based on
whether the compiler defines __FP_FAST_*.  Most architecture-specific
headers, however, fail to do so, meaning that if the architecture (or
some particular processors) does in fact have fused operations, and
GCC knows to use them inline, the FP_FAST_* macros will still not be
defined.

By refactoring, this patch causes the generic version (based on
__FP_FAST_*) to be used in more cases, and so the macro definitions to
be more accurate.  Architectures that already defined some or all of
these macros other than based on the predefines have their own
versions of fp-fast.h, which are arranged so they define FP_FAST_* if
either the architecture-specific conditions are true or __FP_FAST_*
are defined.

After this refactoring, various bits/mathdef.h headers for
architectures with long double = double are semantically identical to
the generic version.  The patch removes those headers that are
redundant.  (In fact two of the four removed were already redundant
before this patch because they did use __FP_FAST_*.)

Tested for x86_64 and x86, and compilation-only with
build-many-glibcs.py.

	* bits/fp-fast.h: New file.
	* sysdeps/aarch64/bits/fp-fast.h: Likewise.
	* sysdeps/powerpc/bits/fp-fast.h: Likewise.
	* math/Makefile (headers): Add bits/fp-fast.h.
	* math/math.h: Include <bits/fp-fast.h>.
	* bits/mathdef.h (FP_FAST_FMA): Remove.
	(FP_FAST_FMAF): Likewise.
	(FP_FAST_FMAL): Likewise.
	* sysdeps/aarch64/bits/mathdef.h (FP_FAST_FMA): Likewise.
	(FP_FAST_FMAF): Likewise.
	* sysdeps/powerpc/bits/mathdef.h (FP_FAST_FMA): Likewise.
	(FP_FAST_FMAF): Likewise.
	* sysdeps/x86/bits/mathdef.h (FP_FAST_FMA): Likewise.
	(FP_FAST_FMAF): Likewise.
	(FP_FAST_FMAL): Likewise.
	* sysdeps/arm/bits/mathdef.h: Remove file.
	* sysdeps/hppa/fpu/bits/mathdef.h: Likewise.
	* sysdeps/sh/sh4/bits/mathdef.h: Likewise.
	* sysdeps/tile/bits/mathdef.h: Likewise.
2016-11-29 01:45:00 +00:00
Joseph Myers 93eb85ceb2 Refactor float_t, double_t information into bits/flt-eval-method.h.
At present, definitions of float_t and double_t are split among many
bits/mathdef.h headers.

For all but three architectures, these types are float and double.
Furthermore, if you assume __FLT_EVAL_METHOD__ to be defined, that
provides a more generic way of determining the correct values of these
typedefs.  Defining these typedefs more generally based on
__FLT_EVAL_METHOD__ was previously proposed by Paul Eggert in
<https://sourceware.org/ml/libc-alpha/2012-02/msg00002.html>.

This patch refactors things in the way I proposed in
<https://sourceware.org/ml/libc-alpha/2016-11/msg00745.html>.  A new
header bits/flt-eval-method.h defines a single macro,
__GLIBC_FLT_EVAL_METHOD, which is then used by math.h to define
float_t and double_t.  The default is based on __FLT_EVAL_METHOD__
(although actually a default to 0 would have the same effect for
current ports, because ports where values other than 0 or 16 are
possible all have their own headers).

To avoid changing the existing semantics in any case, including for
compilers not defining __FLT_EVAL_METHOD__, architecture-specific
files are then added for m68k, s390, x86 which replicate the existing
semantics.  At least with __FLT_EVAL_METHOD__ values possible with
GCC, there should be no change to the choices of float_t and double_t
for any supported configuration.

Architecture maintainer notes:

* m68k: sysdeps/m68k/m680x0/bits/flt-eval-method.h always defines
  __GLIBC_FLT_EVAL_METHOD to 2 to replicate the existing logic.  But
  actually GCC defines __FLT_EVAL_METHOD__ to 0 if TARGET_68040.  It
  might make sense to make the header prefer to base things on
  __FLT_EVAL_METHOD__ if defined, like the x86 version, and so make
  the choices of these types more accurate (with a NEWS entry as for
  the other changes to these types on particular architectures).

* s390: sysdeps/s390/bits/flt-eval-method.h always defines
  __GLIBC_FLT_EVAL_METHOD to 1 to replicate the existing logic.  As
  previously discussed, it might make sense in coordination with GCC
  to eliminate the historic mistake, avoid excess precision in the
  -fexcess-precision=standard case and make the typedefs match (with a
  NEWS entry, again).

Tested for x86-64 and x86.  Also did compilation-only testing with
build-many-glibcs.py.

	* bits/flt-eval-method.h: New file.
	* sysdeps/m68k/m680x0/bits/flt-eval-method.h: Likewise.
	* sysdeps/s390/bits/flt-eval-method.h: Likewise.
	* sysdeps/x86/bits/flt-eval-method.h: Likewise.
	* math/Makefile (headers): Add bits/flt-eval-method.h.
	* math/math.h: Include <bits/flt-eval-method.h>.
	[__USE_ISOC99] (float_t): Define based on __GLIBC_FLT_EVAL_METHOD.
	[__USE_ISOC99] (double_t): Likewise.
	* bits/mathdef.h (float_t): Remove.
	(double_t): Likewise.
	* sysdeps/aarch64/bits/mathdef.h (float_t): Likewise.
	(double_t): Likewise.
	* sysdeps/alpha/bits/mathdef.h (float_t): Likewise.
	(double_t): Likewise.
	* sysdeps/arm/bits/mathdef.h (float_t): Likewise.
	(double_t): Likewise.
	* sysdeps/hppa/fpu/bits/mathdef.h (float_t): Likewise.
	(double_t): Likewise.
	* sysdeps/ia64/bits/mathdef.h (float_t): Likewise.
	(double_t): Likewise.
	* sysdeps/m68k/m680x0/bits/mathdef.h (float_t): Likewise.
	(double_t): Likewise.
	* sysdeps/mips/bits/mathdef.h (float_t): Likewise.
	(double_t): Likewise.
	* sysdeps/powerpc/bits/mathdef.h (float_t): Likewise.
	(double_t): Likewise.
	* sysdeps/s390/bits/mathdef.h (float_t): Likewise.
	(double_t): Likewise.
	* sysdeps/sh/sh4/bits/mathdef.h (float_t): Likewise.
	(double_t): Likewise.
	* sysdeps/sparc/bits/mathdef.h (float_t): Likewise.
	(double_t): Likewise.
	* sysdeps/tile/bits/mathdef.h (float_t): Likewise.
	(double_t): Likewise.
	* sysdeps/x86/bits/mathdef.h (float_t): Likewise.
	(double_t): Likewise.
2016-11-24 18:44:50 +00:00
Joseph Myers 56ede9ed59 Fix x86_64 -mfpmath=387 float_t, double_t (bug 20787).
Bug 20787 reports that, while float_t and double_t for 32-bit x86
properly respect -mfpmath=sse, for x86_64 they fail to reflect
-mfpmath=387, which is valid if unusual and results in FLT_EVAL_METHOD
being 2.  This patch fixes the definitions to respect
__FLT_EVAL_METHOD__ in that case, arranging for the test that the
types correspond with FLT_EVAL_METHOD to be run with both -mfpmath=387
and -mfpmath=sse.

Note: this patch will also have the effect of making float_t and
double_t be long double for x86_64 with -mfpmath=sse+387, when
FLT_EVAL_METHOD is -1.  It seems reasonable for x86_64 to be
consistent with 32-bit x86 in this case (and that definition is
conservatively safe, in that it makes the types correspond to the
widest evaluation format that might be used).

Tested for x86-64 and x86.

	[BZ #20787]
	* sysdeps/x86/bits/mathdef.h (float_t): Do not define to float if
	[__x86_64__] when __FLT_EVAL_METHOD__ is nonzero.
	(double_t): Do not define to double if [__x86_64__] when
	__FLT_EVAL_METHOD__ is nonzero.
	* sysdeps/x86/fpu/test-flt-eval-method-387.c: New file.
	* sysdeps/x86/fpu/test-flt-eval-method-sse.c: Likewise.
	* sysdeps/x86/fpu/Makefile [$(subdir) = math] (tests): Add
	test-flt-eval-method-387 and test-flt-eval-method-sse.
	[$(subdir) = math] (CFLAGS-test-flt-eval-method-387.c): New
	variable.
	[$(subdir) = math] (CFLAGS-test-flt-eval-method-sse.c): Likewise.
2016-11-23 17:56:31 +00:00
Florian Weimer c74940f2a7 nptl: Document the reason why __kind in pthread_mutex_t is part of the ABI 2016-11-07 20:24:32 +01:00
Steve Ellcey d060cd002d Define wordsize.h macros everywhere
* bits/wordsize.h: Add documentation.
	* sysdeps/aarch64/bits/wordsize.h : New file
	* sysdeps/generic/stdint.h (PTRDIFF_MIN, PTRDIFF_MAX): Update
	definitions.
	(SIZE_MAX): Change ifdef to if in __WORDSIZE32_SIZE_ULONG check.
	* sysdeps/gnu/bits/utmp.h (__WORDSIZE_TIME64_COMPAT32): Check
	with #if instead of #ifdef.
	* sysdeps/gnu/bits/utmpx.h (__WORDSIZE_TIME64_COMPAT32): Ditto.
	* sysdeps/mips/bits/wordsize.h (__WORDSIZE32_SIZE_ULONG,
	__WORDSIZE32_PTRDIFF_LONG, __WORDSIZE_TIME64_COMPAT32):
	Add or change defines.
	* sysdeps/powerpc/powerpc32/bits/wordsize.h: Likewise.
	* sysdeps/powerpc/powerpc64/bits/wordsize.h: Likewise.
	* sysdeps/s390/s390-32/bits/wordsize.h: Likewise.
	* sysdeps/s390/s390-64/bits/wordsize.h: Likewise.
	* sysdeps/sparc/sparc32/bits/wordsize.h: Likewise.
	* sysdeps/sparc/sparc64/bits/wordsize.h: Likewise.
	* sysdeps/tile/tilegx/bits/wordsize.h: Likewise.
	* sysdeps/tile/tilepro/bits/wordsize.h: Likewise.
	* sysdeps/unix/sysv/linux/alpha/bits/wordsize.h: Likewise.
	* sysdeps/unix/sysv/linux/powerpc/bits/wordsize.h: Likewise.
	* sysdeps/unix/sysv/linux/sparc/bits/wordsize.h: Likewise.
	* sysdeps/wordsize-32/bits/wordsize.h: Likewise.
	* sysdeps/wordsize-64/bits/wordsize.h: Likewise.
	* sysdeps/x86/bits/wordsize.h: Likewise.
2016-11-04 09:37:44 -07:00
Carlos O'Donell b3d17c1cf2 Bug 20689: Fix FMA and AVX2 detection on Intel
In the Intel Architecture Instruction Set Extensions Programming
reference the recommended way to test for FMA in section
'2.2.1 Detection of FMA' is:

"Application Software must identify that hardware supports AVX as
explained in ... after that it must also detect support for FMA..."

We don't do that in glibc. We use osxsave to detect the use of xgetbv,
and after that we check for AVX and FMA orthogonally. It is conceivable
that you could have the AVX bit clear and the FMA bit in an undefined
state.

This commit fixes FMA and AVX2 detection to depend on usable AVX
as required by the recommended Intel sequences.

v1: https://www.sourceware.org/ml/libc-alpha/2016-10/msg00241.html
v2: https://www.sourceware.org/ml/libc-alpha/2016-10/msg00265.html
2016-10-17 19:39:54 -04:00
H.J. Lu 6a824767d8 X86: Don't assert on older Intel CPUs [BZ #20647]
Since the maximum CPUID level of older Intel CPUs is 1, change
handle_intel to return -1, instead of assert, when the maximum
CPUID level is less than 2.

	[BZ #20647]
	* sysdeps/x86/cacheinfo.c (handle_intel): Return -1 if the
	maximum CPUID level is less than 2.
2016-10-12 08:22:52 -07:00
Joseph Myers 1e7c8fcca5 Add iseqsig.
TS 18661-1 adds an iseqsig type-generic comparison macro to <math.h>.
This macro is like the == operator except that unordered operands
result in the "invalid" exception and errno being set to EDOM.

This patch implements this macro for glibc.  Given the need to set
errno, this is implemented with out-of-line functions __iseqsigf,
__iseqsig and __iseqsigl (of which the last only exists at all if long
double is ABI-distinct from double, so no function aliases or compat
support are needed).  The present patch ignores excess precision
issues; I intend to deal with those in a followup patch.  (Like
comparison operators, type-generic comparison macros should *not*
convert operands to their semantic types but should preserve excess
range and precision, meaning that for some argument types and values
of FLT_EVAL_METHOD, an underlying function should be called for a
wider type than that of the arguments.)

The underlying functions are implemented with the type-generic
template machinery.  Comparing x <= y && x >= y is sufficient in ISO C
to achieve an equality comparison with "invalid" raised for unordered
operands (and the results of those two comparisons can also be used to
tell whether errno needs to be set).  However, some architectures have
GCC bugs meaning that unordered comparison instructions are used
instead of ordered ones.  Thus, a mechanism is provided for
architectures to use an explicit call to feraiseexcept to raise
exceptions if required.  If your architecture has such a bug you
should add a fix-fp-int-compare-invalid.h header for it, with a
comment pointing to the relevant GCC bug report; if such a GCC bug is
fixed, that header's contents should have a __GNUC_PREREQ conditional
added so that the workaround can eventually be removed for that
architecture.

Tested for x86_64, x86, mips64, arm and powerpc.

	* math/math.h [__GLIBC_USE (IEC_60559_BFP_EXT)] (iseqsig): New
	macro.
	* math/bits/mathcalls.h [__GLIBC_USE (IEC_60559_BFP_EXT)]
	(__iseqsig): New declaration.
	* math/s_iseqsig_template.c: New file.
	* math/Versions (__iseqsigf): New libm symbol at version
	GLIBC_2.25.
	(__iseqsig): Likewise.
	(__iseqsigl): Likewise.
	* math/libm-test.inc (iseqsig_test_data): New array.
	(iseqsig_test): New function.
	(main): Call iseqsig_test.
	* math/Makefile (gen-libm-calls): Add s_iseqsigF.
	* manual/arith.texi (FP Comparison Functions): Document iseqsig.
	* manual/libm-err-tab.pl: Update comment on interfaces without
	ulps tabulated.
	* sysdeps/generic/fix-fp-int-compare-invalid.h: New file.
	* sysdeps/powerpc/fpu/fix-fp-int-compare-invalid.h: Likewise.
	* sysdeps/x86/fpu/fix-fp-int-compare-invalid.h: Likewise.
	* sysdeps/nacl/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/aarch64/libm.abilist: Likewise.
	* sysdeps/unix/sysv/linux/alpha/libm.abilist: Likewise.
	* sysdeps/unix/sysv/linux/arm/libm.abilist: Likewise.
	* sysdeps/unix/sysv/linux/hppa/libm.abilist: Likewise.
	* sysdeps/unix/sysv/linux/i386/libm.abilist: Likewise.
	* sysdeps/unix/sysv/linux/ia64/libm.abilist: Likewise.
	* sysdeps/unix/sysv/linux/m68k/coldfire/libm.abilist: Likewise.
	* sysdeps/unix/sysv/linux/m68k/m680x0/libm.abilist: Likewise.
	* sysdeps/unix/sysv/linux/microblaze/libm.abilist: Likewise.
	* sysdeps/unix/sysv/linux/mips/mips32/libm.abilist: Likewise.
	* sysdeps/unix/sysv/linux/mips/mips64/libm.abilist: Likewise.
	* sysdeps/unix/sysv/linux/nios2/libm.abilist: Likewise.
	* sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libm.abilist:
	Likewise.
	* sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libm.abilist:
	Likewise.
	* sysdeps/unix/sysv/linux/powerpc/powerpc64/libm-le.abilist:
	Likewise.
	* sysdeps/unix/sysv/linux/powerpc/powerpc64/libm.abilist:
	Likewise.
	* sysdeps/unix/sysv/linux/s390/s390-32/libm.abilist: Likewise.
	* sysdeps/unix/sysv/linux/s390/s390-64/libm.abilist: Likewise.
	* sysdeps/unix/sysv/linux/sh/libm.abilist: Likewise.
	* sysdeps/unix/sysv/linux/sparc/sparc32/libm.abilist: Likewise.
	* sysdeps/unix/sysv/linux/sparc/sparc64/libm.abilist: Likewise.
	* sysdeps/unix/sysv/linux/tile/tilegx/tilegx32/libm.abilist:
	Likewise.
	* sysdeps/unix/sysv/linux/tile/tilegx/tilegx64/libm.abilist:
	Likewise.
	* sysdeps/unix/sysv/linux/tile/tilepro/libm.abilist: Likewise.
	* sysdeps/unix/sysv/linux/x86_64/64/libm.abilist: Likewise.
	* sysdeps/unix/sysv/linux/x86_64/x32/libm.abilist: Likewise.
2016-10-06 22:19:38 +00:00
Zack Weinberg 4775578486 Installed header hygiene (BZ#20366): Test of installed headers.
This adds a test to ensure that the problems fixed in the last several
patches do not recur.  Each directory checks the headers that it
installs for two properties: first, each header must be compilable in
isolation, as both C and C++, under a representative combination of
language and library conformance levels; second, there is a blacklist
of identifiers that may not appear in any installed header, currently
consisting of the legacy BSD typedefs.  (There is an exemption for the
headers that define those typedefs, and for the RPC headers.  It may be
necessary to make this more sophisticated if we add more stuff to the
blacklist in the future.)

In order for this test to work correctly, every wrapper header
that actually defines something must guard those definitions with
 #ifndef _ISOMAC.  This is the existing mechanism used by the conform/
tests to tell wrapper headers not to define anything that the public
header wouldn't, and not to use anything from libc-symbols.h.  conform/
only cares for headers that we need to check for standards conformance,
whereas this test applies to *every* header.  (Headers in include/ that
are either installed directly, or are internal-use-only and do *not*
correspond to any installed header, are not affected.)

	* scripts/check-installed-headers.sh: New script.
	* Rules: In each directory that defines header files to be installed,
	run check-installed-headers.sh on them as a special test.
	* Makefile: Likewise for the headers installed at top level.

	* include/aliases.h, include/alloca.h, include/argz.h
	* include/arpa/nameser.h, include/arpa/nameser_compat.h
	* include/elf.h, include/envz.h, include/err.h
	* include/execinfo.h, include/fpu_control.h, include/getopt.h
	* include/gshadow.h, include/ifaddrs.h, include/libintl.h
	* include/link.h, include/malloc.h, include/mcheck.h
	* include/mntent.h, include/netinet/ether.h
	* include/nss.h, include/obstack.h, include/printf.h
	* include/pty.h, include/resolv.h, include/rpc/auth.h
	* include/rpc/auth_des.h, include/rpc/auth_unix.h
	* include/rpc/clnt.h, include/rpc/des_crypt.h
	* include/rpc/key_prot.h, include/rpc/netdb.h
	* include/rpc/pmap_clnt.h, include/rpc/pmap_prot.h
	* include/rpc/pmap_rmt.h, include/rpc/rpc.h
	* include/rpc/rpc_msg.h, include/rpc/svc.h
	* include/rpc/svc_auth.h, include/rpc/xdr.h
	* include/rpcsvc/nis_callback.h, include/rpcsvc/nislib.h
	* include/rpcsvc/yp.h, include/rpcsvc/ypclnt.h
	* include/rpcsvc/ypupd.h, include/shadow.h
	* include/stdio_ext.h, include/sys/epoll.h
	* include/sys/file.h, include/sys/gmon.h, include/sys/ioctl.h
	* include/sys/prctl.h, include/sys/profil.h
	* include/sys/statfs.h, include/sys/sysctl.h
	* include/sys/sysinfo.h, include/ttyent.h, include/utmp.h
	* sysdeps/arm/nacl/include/bits/setjmp.h
	* sysdeps/mips/include/sys/asm.h
	* sysdeps/unix/sysv/linux/include/sys/sysinfo.h
	* sysdeps/unix/sysv/linux/include/sys/timex.h
	* sysdeps/x86/fpu/include/bits/fenv.h:
	Add #ifndef _ISOMAC guard around internal declarations.
	Add multiple-inclusion guard if not already present.
2016-09-23 08:43:56 -04:00
Joseph Myers ec94343f59 Add femode_t functions.
TS 18661-1 defines a type femode_t to represent the set of dynamic
floating-point control modes (such as the rounding mode and trap
enablement modes), and functions fegetmode and fesetmode to manipulate
those modes (without affecting other state such as the raised
exception flags) and a corresponding macro FE_DFL_MODE.

This patch series implements those interfaces for glibc.  This first
patch adds the architecture-independent pieces, the x86 and x86_64
implementations, and the <bits/fenv.h> and ABI baseline updates for
all architectures so glibc keeps building and passing the ABI tests on
all architectures.  Subsequent patches add the fegetmode and fesetmode
implementations for other architectures.

femode_t is generally an integer type - the same type as fenv_t, or as
the single element of fenv_t where fenv_t is a structure containing a
single integer (or the single relevant element, where it has elements
for both status and control registers) - except where architecture
properties or consistency with the fenv_t implementation indicate
otherwise.  FE_DFL_MODE follows FE_DFL_ENV in whether it's a magic
pointer value (-1 cast to const femode_t *), a value that can be
distinguished from valid pointers by its high bits but otherwise
contains a representation of the desired register contents, or a
pointer to a constant variable (the powerpc case; __fe_dfl_mode is
added as an exported constant object, an alias to __fe_dfl_env).

Note that where architectures (that share a register between control
and status bits) gain definitions of new floating-point control or
status bits in future, the implementations of fesetmode for those
architectures may need updating (depending on whether the new bits are
control or status bits and what the implementation does with
previously unknown bits), just like existing implementations of
<fenv.h> functions that take care not to touch reserved bits may need
updating when the set of reserved bits changes.  (As any new bits are
outside the scope of ISO C, that's just a quality-of-implementation
issue for supporting them, not a conformance issue.)

As with fenv_t, femode_t should properly include any software DFP
rounding mode (and for both fenv_t and femode_t I'd consider that
fragment of DFP support appropriate for inclusion in glibc even in the
absence of the rest of libdfp; hardware DFP rounding modes should
already be included if the definitions of which bits are status /
control bits are correct).

Tested for x86_64, x86, mips64 (hard float, and soft float to test the
fallback version), arm (hard float) and powerpc (hard float, soft
float and e500).  Other architecture versions are untested.

	* math/fegetmode.c: New file.
	* math/fesetmode.c: Likewise.
	* sysdeps/i386/fpu/fegetmode.c: Likewise.
	* sysdeps/i386/fpu/fesetmode.c: Likewise.
	* sysdeps/x86_64/fpu/fegetmode.c: Likewise.
	* sysdeps/x86_64/fpu/fesetmode.c: Likewise.
	* math/fenv.h: Update comment on inclusion of <bits/fenv.h>.
	[__GLIBC_USE (IEC_60559_BFP_EXT)] (fegetmode): New function
	declaration.
	[__GLIBC_USE (IEC_60559_BFP_EXT)] (fesetmode): Likewise.
	* bits/fenv.h [__GLIBC_USE (IEC_60559_BFP_EXT)] (femode_t): New
	typedef.
	[__GLIBC_USE (IEC_60559_BFP_EXT)] (FE_DFL_MODE): New macro.
	* sysdeps/aarch64/bits/fenv.h [__GLIBC_USE (IEC_60559_BFP_EXT)]
	(femode_t): New typedef.
	[__GLIBC_USE (IEC_60559_BFP_EXT)] (FE_DFL_MODE): New macro.
	* sysdeps/alpha/fpu/bits/fenv.h [__GLIBC_USE (IEC_60559_BFP_EXT)]
	(femode_t): New typedef.
	[__GLIBC_USE (IEC_60559_BFP_EXT)] (FE_DFL_MODE): New macro.
	* sysdeps/arm/bits/fenv.h [__GLIBC_USE (IEC_60559_BFP_EXT)]
	(femode_t): New typedef.
	[__GLIBC_USE (IEC_60559_BFP_EXT)] (FE_DFL_MODE): New macro.
	* sysdeps/hppa/fpu/bits/fenv.h [__GLIBC_USE (IEC_60559_BFP_EXT)]
	(femode_t): New typedef.
	[__GLIBC_USE (IEC_60559_BFP_EXT)] (FE_DFL_MODE): New macro.
	* sysdeps/ia64/bits/fenv.h [__GLIBC_USE (IEC_60559_BFP_EXT)]
	(femode_t): New typedef.
	[__GLIBC_USE (IEC_60559_BFP_EXT)] (FE_DFL_MODE): New macro.
	* sysdeps/m68k/fpu/bits/fenv.h [__GLIBC_USE (IEC_60559_BFP_EXT)]
	(femode_t): New typedef.
	[__GLIBC_USE (IEC_60559_BFP_EXT)] (FE_DFL_MODE): New macro.
	* sysdeps/microblaze/bits/fenv.h [__GLIBC_USE (IEC_60559_BFP_EXT)]
	(femode_t): New typedef.
	[__GLIBC_USE (IEC_60559_BFP_EXT)] (FE_DFL_MODE): New macro.
	* sysdeps/mips/bits/fenv.h [__GLIBC_USE (IEC_60559_BFP_EXT)]
	(femode_t): New typedef.
	[__GLIBC_USE (IEC_60559_BFP_EXT)] (FE_DFL_MODE): New macro.
	* sysdeps/nios2/bits/fenv.h [__GLIBC_USE (IEC_60559_BFP_EXT)]
	(femode_t): New typedef.
	[__GLIBC_USE (IEC_60559_BFP_EXT)] (FE_DFL_MODE): New macro.
	* sysdeps/powerpc/bits/fenv.h [__GLIBC_USE (IEC_60559_BFP_EXT)]
	(femode_t): New typedef.
	[__GLIBC_USE (IEC_60559_BFP_EXT)] (__fe_dfl_mode): New variable
	declaration.
	[__GLIBC_USE (IEC_60559_BFP_EXT)] (FE_DFL_MODE): New macro.
	* sysdeps/s390/fpu/bits/fenv.h [__GLIBC_USE (IEC_60559_BFP_EXT)]
	(femode_t): New typedef.
	[__GLIBC_USE (IEC_60559_BFP_EXT)] (FE_DFL_MODE): New macro.
	* sysdeps/sh/bits/fenv.h [__GLIBC_USE (IEC_60559_BFP_EXT)]
	(femode_t): New typedef.
	[__GLIBC_USE (IEC_60559_BFP_EXT)] (FE_DFL_MODE): New macro.
	* sysdeps/sparc/fpu/bits/fenv.h [__GLIBC_USE (IEC_60559_BFP_EXT)]
	(femode_t): New typedef.
	[__GLIBC_USE (IEC_60559_BFP_EXT)] (FE_DFL_MODE): New macro.
	* sysdeps/tile/bits/fenv.h [__GLIBC_USE (IEC_60559_BFP_EXT)]
	(femode_t): New typedef.
	[__GLIBC_USE (IEC_60559_BFP_EXT)] (FE_DFL_MODE): New macro.
	* sysdeps/x86/fpu/bits/fenv.h [__GLIBC_USE (IEC_60559_BFP_EXT)]
	(femode_t): New typedef.
	[__GLIBC_USE (IEC_60559_BFP_EXT)] (FE_DFL_MODE): New macro.
	* manual/arith.texi (FE_DFL_MODE): Document macro.
	(fegetmode): Document function.
	(fesetmode): Likewise.
	* math/Versions (fegetmode): New libm symbol at version
	GLIBC_2.25.
	(fesetmode): Likewise.
	* math/Makefile (libm-support): Add fegetmode and fesetmode.
	(tests): Add test-femode and test-femode-traps.
	* math/test-femode-traps.c: New file.
	* math/test-femode.c: Likewise.
	* sysdeps/powerpc/fpu/fenv_const.c (__fe_dfl_mode): Declare as
	alias for __fe_dfl_env.
	* sysdeps/powerpc/nofpu/fenv_const.c (__fe_dfl_mode): Likewise.
	* sysdeps/powerpc/powerpc32/e500/nofpu/fenv_const.c
	(__fe_dfl_mode): Likewise.
	* sysdeps/powerpc/Versions (__fe_dfl_mode): New libm symbol at
	version GLIBC_2.25.
	* sysdeps/nacl/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/aarch64/libm.abilist: Likewise.
	* sysdeps/unix/sysv/linux/alpha/libm.abilist: Likewise.
	* sysdeps/unix/sysv/linux/arm/libm.abilist: Likewise.
	* sysdeps/unix/sysv/linux/hppa/libm.abilist: Likewise.
	* sysdeps/unix/sysv/linux/i386/libm.abilist: Likewise.
	* sysdeps/unix/sysv/linux/ia64/libm.abilist: Likewise.
	* sysdeps/unix/sysv/linux/m68k/coldfire/libm.abilist: Likewise.
	* sysdeps/unix/sysv/linux/m68k/m680x0/libm.abilist: Likewise.
	* sysdeps/unix/sysv/linux/microblaze/libm.abilist: Likewise.
	* sysdeps/unix/sysv/linux/mips/mips32/libm.abilist: Likewise.
	* sysdeps/unix/sysv/linux/mips/mips64/libm.abilist: Likewise.
	* sysdeps/unix/sysv/linux/nios2/libm.abilist: Likewise.
	* sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libm.abilist:
	Likewise.
	* sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libm.abilist:
	Likewise.
	* sysdeps/unix/sysv/linux/powerpc/powerpc64/libm-le.abilist:
	Likewise.
	* sysdeps/unix/sysv/linux/powerpc/powerpc64/libm.abilist:
	Likewise.
	* sysdeps/unix/sysv/linux/s390/s390-32/libm.abilist: Likewise.
	* sysdeps/unix/sysv/linux/s390/s390-64/libm.abilist: Likewise.
	* sysdeps/unix/sysv/linux/sh/libm.abilist: Likewise.
	* sysdeps/unix/sysv/linux/sparc/sparc32/libm.abilist: Likewise.
	* sysdeps/unix/sysv/linux/sparc/sparc64/libm.abilist: Likewise.
	* sysdeps/unix/sysv/linux/tile/tilegx/tilegx32/libm.abilist:
	Likewise.
	* sysdeps/unix/sysv/linux/tile/tilegx/tilegx64/libm.abilist:
	Likewise.
	* sysdeps/unix/sysv/linux/tile/tilepro/libm.abilist: Likewise.
	* sysdeps/unix/sysv/linux/x86_64/64/libm.abilist: Likewise.
	* sysdeps/unix/sysv/linux/x86_64/x32/libm.abilist: Likewise.
2016-09-07 16:40:09 +00:00
H.J. Lu fb0f7a6755 X86-64: Add _dl_runtime_resolve_avx[512]_{opt|slow} [BZ #20508]
There is transition penalty when SSE instructions are mixed with 256-bit
AVX or 512-bit AVX512 load instructions.  Since _dl_runtime_resolve_avx
and _dl_runtime_profile_avx512 save/restore 256-bit YMM/512-bit ZMM
registers, there is transition penalty when SSE instructions are used
with lazy binding on AVX and AVX512 processors.

To avoid SSE transition penalty, if only the lower 128 bits of the first
8 vector registers are non-zero, we can preserve %xmm0 - %xmm7 registers
with the zero upper bits.

For AVX and AVX512 processors which support XGETBV with ECX == 1, we can
use XGETBV with ECX == 1 to check if the upper 128 bits of YMM registers
or the upper 256 bits of ZMM registers are zero.  We can restore only the
non-zero portion of vector registers with AVX/AVX512 load instructions
which will zero-extend upper bits of vector registers.

This patch adds _dl_runtime_resolve_sse_vex which saves and restores
XMM registers with 128-bit AVX store/load instructions.  It is used to
preserve YMM/ZMM registers when only the lower 128 bits are non-zero.
_dl_runtime_resolve_avx_opt and _dl_runtime_resolve_avx512_opt are added
and used on AVX/AVX512 processors supporting XGETBV with ECX == 1 so
that we store and load only the non-zero portion of vector registers.
This avoids SSE transition penalty caused by _dl_runtime_resolve_avx and
_dl_runtime_profile_avx512 when only the lower 128 bits of vector
registers are used.

_dl_runtime_resolve_avx_slow is added and used for AVX processors which
don't support XGETBV with ECX == 1.  Since there is no SSE transition
penalty on AVX512 processors which don't support XGETBV with ECX == 1,
_dl_runtime_resolve_avx512_slow isn't provided.

	[BZ #20495]
	[BZ #20508]
	* sysdeps/x86/cpu-features.c (init_cpu_features): For Intel
	processors, set Use_dl_runtime_resolve_slow and set
	Use_dl_runtime_resolve_opt if XGETBV suports ECX == 1.
	* sysdeps/x86/cpu-features.h (bit_arch_Use_dl_runtime_resolve_opt):
	New.
	(bit_arch_Use_dl_runtime_resolve_slow): Likewise.
	(index_arch_Use_dl_runtime_resolve_opt): Likewise.
	(index_arch_Use_dl_runtime_resolve_slow): Likewise.
	* sysdeps/x86_64/dl-machine.h (elf_machine_runtime_setup): Use
	_dl_runtime_resolve_avx512_opt and _dl_runtime_resolve_avx_opt
	if Use_dl_runtime_resolve_opt is set.  Use
	_dl_runtime_resolve_slow if Use_dl_runtime_resolve_slow is set.
	* sysdeps/x86_64/dl-trampoline.S: Include <cpu-features.h>.
	(_dl_runtime_resolve_opt): New.  Defined for AVX and AVX512.
	(_dl_runtime_resolve): Add one for _dl_runtime_resolve_sse_vex.
	* sysdeps/x86_64/dl-trampoline.h (_dl_runtime_resolve_avx_slow):
	New.
	(_dl_runtime_resolve_opt): Likewise.
	(_dl_runtime_profile): Define only if _dl_runtime_profile is
	defined.
2016-09-06 08:51:07 -07:00
H.J. Lu a6f20b6763 X86: Change bit_YMM_state to (1 << 2)
All other state bits, except for bit_YMM_state, are defined as (1 << N).
This patch changes bit_YMM_state from (2 << 1) to (1 << 2).

	* sysdeps/x86/cpu-features.h (bit_YMM_state): Set to (1 << 2).
2016-08-19 13:32:34 -07:00
Andrew Senkevich ee2196bb67 Fixed wrong vector sincos/sincosf ABI to have it compatible with
current vector function declaration "#pragma omp declare simd notinbranch",
according to which vector sincos should have vector of pointers for second and
third parameters. It is fixed with implementation as wrapper to version
having second and third parameters as pointers.

    [BZ #20024]
    * sysdeps/x86/fpu/test-math-vector-sincos.h: New.
    * sysdeps/x86_64/fpu/multiarch/svml_d_sincos2_core_sse4.S: Fixed ABI
    of this implementation of vector function.
    * sysdeps/x86_64/fpu/multiarch/svml_d_sincos4_core_avx2.S: Likewise.
    * sysdeps/x86_64/fpu/multiarch/svml_d_sincos8_core_avx512.S: Likewise.
    * sysdeps/x86_64/fpu/multiarch/svml_s_sincosf16_core_avx512.S:
    Likewise.
    * sysdeps/x86_64/fpu/multiarch/svml_s_sincosf4_core_sse4.S: Likewise.
    * sysdeps/x86_64/fpu/multiarch/svml_s_sincosf8_core_avx2.S: Likewise.
    * sysdeps/x86_64/fpu/svml_d_sincos2_core.S: Likewise.
    * sysdeps/x86_64/fpu/svml_d_sincos4_core.S: Likewise.
    * sysdeps/x86_64/fpu/svml_d_sincos4_core_avx.S: Likewise.
    * sysdeps/x86_64/fpu/svml_d_sincos8_core.S: Likewise.
    * sysdeps/x86_64/fpu/svml_s_sincosf16_core.S: Likewise.
    * sysdeps/x86_64/fpu/svml_s_sincosf4_core.S: Likewise.
    * sysdeps/x86_64/fpu/svml_s_sincosf8_core.S: Likewise.
    * sysdeps/x86_64/fpu/svml_s_sincosf8_core_avx.S: Likewise.
    * sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c: Use another wrapper
    for testing vector sincos with fixed ABI.
    * sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c: Likewise.
    * sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c: Likewise.
    * sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c: Likewise.
    * sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c: Likewise.
    * sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c: Likewise.
    * sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c: Likewise.
    * sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c: Likewise.
    * sysdeps/x86_64/fpu/test-double-libmvec-sincos-avx.c: New test.
    * sysdeps/x86_64/fpu/test-double-libmvec-sincos-avx2.c: Likewise.
    * sysdeps/x86_64/fpu/test-double-libmvec-sincos-avx512.c: Likewise.
    * sysdeps/x86_64/fpu/test-double-libmvec-sincos.c: Likewise.
    * sysdeps/x86_64/fpu/test-float-libmvec-sincosf-avx.c: Likewise.
    * sysdeps/x86_64/fpu/test-float-libmvec-sincosf-avx2.c: Likewise.
    * sysdeps/x86_64/fpu/test-float-libmvec-sincosf-avx512.c: Likewise.
    * sysdeps/x86_64/fpu/test-float-libmvec-sincosf.c: Likewise.
    * sysdeps/x86_64/fpu/Makefile: Added new tests.
2016-07-01 14:15:38 +03:00
H.J. Lu 13efa86ece Check Prefer_ERMS in memmove/memcpy/mempcpy/memset
Although the Enhanced REP MOVSB/STOSB (ERMS) implementations of memmove,
memcpy, mempcpy and memset aren't used by the current processors, this
patch adds Prefer_ERMS check in memmove, memcpy, mempcpy and memset so
that they can be used in the future.

	* sysdeps/x86/cpu-features.h (bit_arch_Prefer_ERMS): New.
	(index_arch_Prefer_ERMS): Likewise.
	* sysdeps/x86_64/multiarch/memcpy.S (__new_memcpy): Return
	__memcpy_erms for Prefer_ERMS.
	* sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S
	(__memmove_erms): Enabled for libc.a.
	* ysdeps/x86_64/multiarch/memmove.S (__libc_memmove): Return
	__memmove_erms or Prefer_ERMS.
	* sysdeps/x86_64/multiarch/mempcpy.S (__mempcpy): Return
	__mempcpy_erms for Prefer_ERMS.
	* sysdeps/x86_64/multiarch/memset.S (memset): Return
	__memset_erms for Prefer_ERMS.
2016-06-30 07:58:11 -07:00
Andreas Schwab fea56491c4 Avoid array-bounds warning for strncat on i586 (bug 20260) 2016-06-29 17:15:40 +02:00
H.J. Lu 91655fc307 Check FMA after COMMON_CPUID_INDEX_80000001
Since the FMA4 bit is in COMMON_CPUID_INDEX_80000001 and FMA4 requires
AVX, determine if FMA4 is usable after COMMON_CPUID_INDEX_80000001 is
available and if AVX is usable.

	[BZ #20195]
	* sysdeps/x86/cpu-features.c (get_common_indeces): Move FMA4
	check to ...
	(init_cpu_features): Here.
2016-06-07 08:00:40 -07:00
H.J. Lu d6af2388f7 Count number of logical processors sharing L2 cache
For Intel processors, when there are both L2 and L3 caches, SMT level
type should be ued to count number of available logical processors
sharing L2 cache.  If there is only L2 cache, core level type should
be used to count number of available logical processors sharing L2
cache.  Number of available logical processors sharing L2 cache should
be used for non-inclusive L2 and L3 caches.

	* sysdeps/x86/cacheinfo.c (init_cacheinfo): Count number of
	available logical processors with SMT level type sharing L2
	cache for Intel processors.
2016-05-27 15:16:51 -07:00
H.J. Lu b7598b1b85 Remove special L2 cache case for Knights Landing
L2 cache is shared by 2 cores on Knights Landing, which has 4 threads
per core:

https://en.wikipedia.org/wiki/Xeon_Phi#Knights_Landing

So L2 cache is shared by 8 threads on Knights Landing as reported by
CPUID.  We should remove special L2 cache case for Knights Landing.

	[BZ #18185]
	* sysdeps/x86/cacheinfo.c (init_cacheinfo): Don't limit threads
	sharing L2 cache to 2 for Knights Landing.
2016-05-20 14:42:00 -07:00
H.J. Lu de71e0421b Correct Intel processor level type mask from CPUID
Intel CPUID with EAX == 11 returns:

ECX Bits 07 - 00: Level number. Same value in ECX input.
    Bits 15 - 08: Level type.
    ^^^^^^^^^^^^^^^^^^^^^^^^ This is level type.
    Bits 31 - 16: Reserved.

Intel processor level type mask should be 0xff00, not 0xff0.

	[BZ #20119]
	* sysdeps/x86/cacheinfo.c (init_cacheinfo): Correct Intel
	processor level type mask for CPUID with EAX == 11.
2016-05-19 10:02:36 -07:00
H.J. Lu 7c08d791ee Check the HTT bit before counting logical threads
Skip counting logical threads for Intel processors if the HTT bit is 0
which indicates there is only a single logical processor.

	* sysdeps/x86/cacheinfo.c (init_cacheinfo): Skip counting
	logical threads if the HTT bit is 0.
	* sysdeps/x86/cpu-features.h (bit_cpu_HTT): New.
	(index_cpu_HTT): Likewise.
	(reg_HTT): Likewise.
2016-05-19 09:09:00 -07:00
H.J. Lu 9e4ec3e816 Support non-inclusive caches on Intel processors
* sysdeps/x86/cacheinfo.c (init_cacheinfo): Check and support
	non-inclusive caches on Intel processors.
2016-05-13 07:18:35 -07:00
H.J. Lu 2a1f15b1a9 Remove x86 ifunc-defines.sym and rtld-global-offsets.sym
Merge x86 ifunc-defines.sym with x86 cpu-features-offsets.sym.  Remove
x86 ifunc-defines.sym and rtld-global-offsets.sym.  No code changes on
i686 and x86-64.

	* sysdeps/i386/i686/multiarch/Makefile (gen-as-const-headers):
	Remove ifunc-defines.sym.
	* sysdeps/x86_64/multiarch/Makefile (gen-as-const-headers):
	Likewise.
	* sysdeps/i386/i686/multiarch/ifunc-defines.sym: Removed.
	* sysdeps/x86/rtld-global-offsets.sym: Likewise.
	* sysdeps/x86_64/multiarch/ifunc-defines.sym: Likewise.
	* sysdeps/x86/Makefile (gen-as-const-headers): Remove
	rtld-global-offsets.sym.
	* sysdeps/x86_64/multiarch/ifunc-defines.sym: Merged with ...
	* sysdeps/x86/cpu-features-offsets.sym: This.
	* sysdeps/x86/cpu-features.h: Include <cpu-features-offsets.h>
	instead of <ifunc-defines.h> and <rtld-global-offsets.h>.
2016-05-11 05:51:39 -07:00
H.J. Lu a9558b49b3 Move sysdeps/x86_64/cacheinfo.c to sysdeps/x86
Move sysdeps/x86_64/cacheinfo.c to sysdeps/x86.  No code changes on x86
and x86_64.

	* sysdeps/i386/cacheinfo.c: Include <sysdeps/x86/cacheinfo.c>
	instead of <sysdeps/x86_64/cacheinfo.c>.
	* sysdeps/x86_64/cacheinfo.c: Moved to ...
	* sysdeps/x86/cacheinfo.c: Here.
2016-05-08 08:49:18 -07:00
H.J. Lu 2e2d9796da Detect Intel Goldmont and Airmont processors
Updated from the model numbers of Goldmont and Airmont processors in
Intel64 And IA-32 Processor Architectures Software Developer's Manual
Volume 3 Revision 058.

	* sysdeps/x86/cpu-features.c (init_cpu_features): Detect Intel
	Goldmont and Airmont processors.
2016-04-15 05:23:06 -07:00
H.J. Lu 27d3ce1467 Remove Fast_Copy_Backward from Intel Core processors
Intel Core i3, i5 and i7 processors have fast unaligned copy and
copy backward is ignored.  Remove Fast_Copy_Backward from Intel Core
processors to avoid confusion.

	* sysdeps/x86/cpu-features.c (init_cpu_features): Don't set
	bit_arch_Fast_Copy_Backward for Intel Core proessors.
2016-04-01 15:09:14 -07:00
H.J. Lu 0791f91dff Initial Enhanced REP MOVSB/STOSB (ERMS) support
The newer Intel processors support Enhanced REP MOVSB/STOSB (ERMS) which
has a feature bit in CPUID.  This patch adds the Enhanced REP MOVSB/STOSB
(ERMS) bit to x86 cpu-features.

	* sysdeps/x86/cpu-features.h (bit_cpu_ERMS): New.
	(index_cpu_ERMS): Likewise.
	(reg_ERMS): Likewise.
2016-03-28 19:23:31 -07:00
H.J. Lu e41b395523 [x86] Add a feature bit: Fast_Unaligned_Copy
On AMD processors, memcpy optimized with unaligned SSE load is
slower than emcpy optimized with aligned SSSE3 while other string
functions are faster with unaligned SSE load.  A feature bit,
Fast_Unaligned_Copy, is added to select memcpy optimized with
unaligned SSE load.

	[BZ #19583]
	* sysdeps/x86/cpu-features.c (init_cpu_features): Set
	Fast_Unaligned_Copy with Fast_Unaligned_Load for Intel
	processors.  Set Fast_Copy_Backward for AMD Excavator
	processors.
	* sysdeps/x86/cpu-features.h (bit_arch_Fast_Unaligned_Copy):
	New.
	(index_arch_Fast_Unaligned_Copy): Likewise.
	* sysdeps/x86_64/multiarch/memcpy.S (__new_memcpy): Check
	Fast_Unaligned_Copy instead of Fast_Unaligned_Load.
2016-03-28 04:40:03 -07:00
H.J. Lu f781a9e961 Set index_arch_AVX_Fast_Unaligned_Load only for Intel processors
Since only Intel processors with AVX2 have fast unaligned load, we
should set index_arch_AVX_Fast_Unaligned_Load only for Intel processors.

Move AVX, AVX2, AVX512, FMA and FMA4 detection into get_common_indeces
and call get_common_indeces for other processors.

Add CPU_FEATURES_CPU_P and CPU_FEATURES_ARCH_P to aoid loading
GLRO(dl_x86_cpu_features) in cpu-features.c.

	[BZ #19583]
	* sysdeps/x86/cpu-features.c (get_common_indeces): Remove
	inline.  Check family before setting family, model and
	extended_model.  Set AVX, AVX2, AVX512, FMA and FMA4 usable
	bits here.
	(init_cpu_features): Replace HAS_CPU_FEATURE and
	HAS_ARCH_FEATURE with CPU_FEATURES_CPU_P and
	CPU_FEATURES_ARCH_P.  Set index_arch_AVX_Fast_Unaligned_Load
	for Intel processors with usable AVX2.  Call get_common_indeces
	for other processors with family == NULL.
	* sysdeps/x86/cpu-features.h (CPU_FEATURES_CPU_P): New macro.
	(CPU_FEATURES_ARCH_P): Likewise.
	(HAS_CPU_FEATURE): Use CPU_FEATURES_CPU_P.
	(HAS_ARCH_FEATURE): Use CPU_FEATURES_ARCH_P.
2016-03-22 07:47:20 -07:00
H.J. Lu 6aa3e97e25 Add _arch_/_cpu_ to index_*/bit_* in x86 cpu-features.h
index_* and bit_* macros are used to access cpuid and feature arrays o
struct cpu_features.  It is very easy to use bits and indices of cpuid
array on feature array, especially in assembly codes.  For example,
sysdeps/i386/i686/multiarch/bcopy.S has

	HAS_CPU_FEATURE (Fast_Rep_String)

which should be

	HAS_ARCH_FEATURE (Fast_Rep_String)

We change index_* and bit_* to index_cpu_*/index_arch_* and
bit_cpu_*/bit_arch_* so that we can catch such error at build time.

	[BZ #19762]
	* sysdeps/unix/sysv/linux/x86_64/64/dl-librecon.h
	(EXTRA_LD_ENVVARS): Add _arch_ to index_*/bit_*.
	* sysdeps/x86/cpu-features.c (init_cpu_features): Likewise.
	* sysdeps/x86/cpu-features.h (bit_*): Renamed to ...
	(bit_arch_*): This for feature array.
	(bit_*): Renamed to ...
	(bit_cpu_*): This for cpu array.
	(index_*): Renamed to ...
	(index_arch_*): This for feature array.
	(index_*): Renamed to ...
	(index_cpu_*): This for cpu array.
	[__ASSEMBLER__] (HAS_FEATURE): Add and use field.
	[__ASSEMBLER__] (HAS_CPU_FEATURE)): Pass cpu to HAS_FEATURE.
	[__ASSEMBLER__] (HAS_ARCH_FEATURE)): Pass arch to HAS_FEATURE.
	[!__ASSEMBLER__] (HAS_CPU_FEATURE): Replace index_##name and
	bit_##name with index_cpu_##name and bit_cpu_##name.
	[!__ASSEMBLER__] (HAS_ARCH_FEATURE): Replace index_##name and
	bit_##name with index_arch_##name and bit_arch_##name.
2016-03-10 05:27:07 -08:00
H.J. Lu 2b35e48c0c Define _HAVE_STRING_ARCH_mempcpy to 1 for x86
Since x86 has an optimized mempcpy and GCC can inline mempcpy on x86,
define _HAVE_STRING_ARCH_mempcpy to 1 for x86.

	[BZ #19759]
	* sysdeps/x86/bits/string.h (_HAVE_STRING_ARCH_mempcpy): New.
2016-03-08 10:57:41 -08:00
H.J. Lu 16396c41de Add _STRING_INLINE_unaligned and string_private.h
As discussed in

https://sourceware.org/ml/libc-alpha/2015-10/msg00403.html

the setting of _STRING_ARCH_unaligned currently controls the external
GLIBC ABI as well as selecting the use of unaligned accesses withing
GLIBC.

Since _STRING_ARCH_unaligned was recently changed for AArch64, this
would potentially break the ABI in GLIBC 2.23, so split the uses and add
_STRING_INLINE_unaligned to select the string ABI. This setting must be
fixed for each target, while _STRING_ARCH_unaligned may be changed from
release to release.  _STRING_ARCH_unaligned is used unconditionally in
glibc.  But <bits/string.h>, which defines _STRING_ARCH_unaligned, isn't
included with -Os.  Since _STRING_ARCH_unaligned is internal to glibc and
may change between glibc releases, it should be made private to glibc.
_STRING_ARCH_unaligned should defined in the new string_private.h heade
file which is included unconditionally from internal <string.h> for glibc
build.

	[BZ #19462]
	* bits/string.h (_STRING_ARCH_unaligned): Renamed to ...
	(_STRING_INLINE_unaligned): This.
	* include/string.h: Include <string_private.h>.
	* string/bits/string2.h: Replace _STRING_ARCH_unaligned with
	_STRING_INLINE_unaligned.
	* sysdeps/aarch64/bits/string.h (_STRING_ARCH_unaligned): Removed.
	(_STRING_INLINE_unaligned): New.
	* sysdeps/aarch64/string_private.h: New file.
	* sysdeps/generic/string_private.h: Likewise.
	* sysdeps/m68k/m680x0/m68020/string_private.h: Likewise.
	* sysdeps/s390/string_private.h: Likewise.
	* sysdeps/x86/string_private.h: Likewise.
	* sysdeps/m68k/m680x0/m68020/bits/string.h
	(_STRING_ARCH_unaligned): Renamed to ...
	(_STRING_INLINE_unaligned): This.
	* sysdeps/s390/bits/string.h (_STRING_ARCH_unaligned): Renamed
	to ...
	(_STRING_INLINE_unaligned): This.
	* sysdeps/sparc/bits/string.h (_STRING_ARCH_unaligned): Renamed
	to ...
	(_STRING_INLINE_unaligned): This.
	* sysdeps/x86/bits/string.h (_STRING_ARCH_unaligned): Renamed
	to ...
	(_STRING_INLINE_unaligned): This.
2016-02-18 14:55:29 -02:00
Amit Pawar d7890e6947 Set index_Fast_Unaligned_Load for Excavator family CPUs
GLIBC benchtest testcases shows SSE2_Unaligned based implementations
are performing faster compare to SSE2 based implementations for
routines: strcmp, strcat, strncat, stpcpy, stpncpy, strcpy, strncpy
and strstr. Flag index_Fast_Unaligned_Load is set for Excavator family
0x15h CPU's. This makes SSE2_Unaligned based implementations as
default for these routines.

	[BZ #19467]
	* sysdeps/x86/cpu-features.c (init_cpu_features): Set
	index_Fast_Unaligned_Load flag for Excavator family CPUs.
2016-01-14 08:14:31 -08:00