Commit Graph

30009 Commits

Author SHA1 Message Date
Mike Frysinger ff01283962 localedata: CLDRv29: update LC_IDENTIFICATION language/territory fields
This updates all the territory fields based on CLDR v29 data.  Many of
them were obviously incorrect where people used a two letter code and
not the English name.
  aa_DJ: changing DJ to Djibouti
  aa_ER@saaho: changing ER to Eritrea
  aa_ER: changing ER to Eritrea
  aa_ET: changing ET to Ethiopia
  am_ET: changing ET to Ethiopia
  ar_LY: changing Libyan Arab Jamahiriya to Libya
  ar_SY: changing Syrian Arab Republic to Syria
  bo_CN: changing P.R. of China to China
  bs_BA: changing Bosnia and Herzegowina to Bosnia & Herzegovina
  byn_ER: changing ER to Eritrea
  ca_IT: changing Italy (L'Alguer) to Italy
  ce_RU: changing RUSSIAN FEDERATION to Russia
  cmn_TW: changing Republic of China to Taiwan
  cy_GB: changing Great Britain to United Kingdom
  de_LU@euro: changing Luxemburg to Luxembourg
  de_LU: changing Luxemburg to Luxembourg
  en_AG: changing Antigua and Barbuda to Antigua & Barbuda
  en_GB: changing Great Britain to United Kingdom
  en_HK: changing Hong Kong to Hong Kong SAR China
  en_US: changing USA to United States
  es_US: changing USA to United States
  fr_LU@euro: changing Luxemburg to Luxembourg
  fr_LU: changing Luxemburg to Luxembourg
  fy_DE: changing DE to Germany
  gd_GB: changing Great Britain to United Kingdom
  gez_ER@abegede: changing ER to Eritrea
  gez_ER: changing ER to Eritrea
  gez_ET@abegede: changing ET to Ethiopia
  gez_ET: changing ET to Ethiopia
  gv_GB: changing Britain to United Kingdom
  hak_TW: changing Republic of China to Taiwan
  iu_CA: changing CA to Canada
  ko_KR: changing Republic of Korea to South Korea
  kw_GB: changing Britain to United Kingdom
  li_BE: changing BE to Belgium
  li_NL: changing NL to Netherlands
  lzh_TW: changing Republic of China to Taiwan
  my_MM: changing Myanmar to Myanmar (Burma)
  nan_TW: changing Republic of China to Taiwan
  nds_DE: changing DE to Germany
  nds_NL: changing NL to Netherlands
  om_ET: changing ET to Ethiopia
  om_KE: changing KE to Kenya
  pap_AW: changing AW to Aruba
  pap_CW: changing CW to Curaçao
  pt_BR: changing Brasil to Brazil
  sid_ET: changing ET to Ethiopia
  sk_SK: changing Slovak to Slovakia
  so_DJ: changing DJ to Djibouti
  so_ET: changing ET to Ethiopia
  so_KE: changing KE to Kenya
  so_SO: changing SO to Somalia
  ti_ER: changing ER to Eritrea
  ti_ET: changing ET to Ethiopia
  tig_ER: changing ER to Eritrea
  tt_RU@iqtelif: changing Tatarstan, Russian Federation to Russia
  uk_UA: changing UA to Ukraine
  unm_US: changing USA to United States
  wal_ET: changing ET to Ethiopia
  yi_US: changing USA to United States
  yue_HK: changing Hong Kong to Hong Kong SAR China
  zh_CN: changing P.R. of China to China
  zh_HK: changing Hong Kong to Hong Kong SAR China
  zh_TW: changing Taiwan R.O.C. to Taiwan

This updates all the language fields based on CLDR v29 data.  Many of
them were obviously incorrect where people used a two letter code and
not the English name.
  aa_DJ: changing aa to Afar
  aa_ER: changing aa to Afar
  aa_ER@saaho: changing aa to Afar
  aa_ET: changing aa to Afar
  am_ET: changing am to Amharic
  az_AZ: changing Azeri to Azerbaijani
  bn_BD: changing Bengali/Bangla to Bengali
  byn_ER: changing byn to Blin
  de_AT: changing German to Austrian German
  de_CH: changing German to Swiss High German
  en_AU: changing English to Australian English
  en_CA: changing English to Canadian English
  en_GB: changing English to British English
  en_US: changing English to American English
  es_ES: changing Spanish to European Spanish
  es_MX: changing Spanish to Mexican Spanish
  ff_SN: changing ff to Fulah
  fr_CA: changing French to Canadian French
  fr_CH: changing French to Swiss French
  fur_IT: changing Furlan to Friulian
  fy_DE: changing fy to Western Frisian
  fy_NL: changing Frisian to Western Frisian
  gd_GB: changing Scots Gaelic to Scottish Gaelic
  gez_ER@abegede: changing gez to Geez
  gez_ER: changing gez to Geez
  gez_ET@abegede: changing gez to Geez
  gez_ET: changing gez to Geez
  gv_GB: changing Manx Gaelic to Manx
  ht_HT: changing Kreyol to Haitian Creole
  kl_GL: changing Greenlandic to Kalaallisut
  lg_UG: changing Luganda to Ganda
  li_BE: changing li to Limburgish
  li_NL: changing li to Limburgish
  nan_TW@latin: changing Minnan to Min Nan Chinese
  nb_NO: changing Norwegian, Bokmål to Norwegian Bokmål
  nds_DE: changing nds to Low German
  nds_NL: changing nds to Low Saxon
  niu_NU: changing Vagahau Niue (Niuean) to Niuean
  niu_NZ: changing Vagahau Niue (Niuean) to Niuean
  nl_BE: changing Dutch to Flemish
  nn_NO: changing Norwegian, Nynorsk to Norwegian Nynorsk
  nr_ZA: changing Southern Ndebele to South Ndebele
  om_ET: changing om to Oromo
  om_KE: changing om to Oromo
  or_IN: changing Odia to Oriya
  os_RU: changing Ossetian to Ossetic
  pap_AW: changing pap to Papiamento
  pap_CW: changing pap to Papiamento
  pa_PK: changing Punjabi (Shahmukhi) to Punjabi
  pt_BR: changing Portuguese to Brazilian Portuguese
  pt_PT: changing Portuguese to European Portuguese
  se_NO: changing Northern Saami to Northern Sami
  sid_ET: changing sid to Sidamo
  so_DJ: changing so to Somali
  so_ET: changing so to Somali
  so_KE: changing so to Somali
  so_SO: changing so to Somali
  st_ZA: changing Sotho to Southern Sotho
  sw_KE: changing sw to Swahili
  sw_TZ: changing sw to Swahili
  ti_ER: changing ti to Tigrinya
  ti_ET: changing ti to Tigrinya
  tig_ER: changing tig to Tigre
  uk_UA: changing uk to Ukrainian
  wal_ET: changing wal to Wolaytta
  yue_HK: changing Yue Chinese to Cantonese
2016-04-12 15:16:10 -04:00
Mike Frysinger 2e7a461328 localedata: LC_TIME.date_fmt: delete entries same as the default value
There's no real value in populating this field when it's the same as the
default POSIX setting, so drop it from most locales so it's clear what's
going on.
2016-04-12 14:03:56 -04:00
H.J. Lu a057f5f8cd X86-64: Use non-temporal store in memcpy on large data
The large memcpy micro benchmark in glibc shows that there is a
regression with large data on Haswell machine.  non-temporal store in
memcpy on large data can improve performance significantly.  This
patch adds a threshold to use non temporal store which is 6 times of
shared cache size.  When size is above the threshold, non temporal
store will be used, but avoid non-temporal store if there is overlap
between destination and source since destination may be in cache when
source is loaded.

For size below 8 vector register width, we load all data into registers
and store them together.  Only forward and backward loops, which move 4
vector registers at a time, are used to support overlapping addresses.
For forward loop, we load the last 4 vector register width of data and
the first vector register width of data into vector registers before the
loop and store them after the loop.  For backward loop, we load the first
4 vector register width of data and the last vector register width of
data into vector registers before the loop and store them after the loop.

	[BZ #19928]
	* sysdeps/x86_64/cacheinfo.c (__x86_shared_non_temporal_threshold):
	New.
	(init_cacheinfo): Set __x86_shared_non_temporal_threshold to 6
	times of shared cache size.
	* sysdeps/x86_64/multiarch/memmove-avx-unaligned-erms.S
	(VMOVNT): New.
	* sysdeps/x86_64/multiarch/memmove-avx512-unaligned-erms.S
	(VMOVNT): Likewise.
	* sysdeps/x86_64/multiarch/memmove-sse2-unaligned-erms.S
	(VMOVNT): Likewise.
	(VMOVU): Changed to movups for smaller code sizes.
	(VMOVA): Changed to movaps for smaller code sizes.
	* sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S: Update
	comments.
	(PREFETCH): New.
	(PREFETCH_SIZE): Likewise.
	(PREFETCHED_LOAD_SIZE): Likewise.
	(PREFETCH_ONE_SET): Likewise.
	Rewrite to use forward and backward loops, which move 4 vector
	registers at a time, to support overlapping addresses and use
	non temporal store if size is above the threshold and there is
	no overlap between destination and source.
2016-04-12 08:10:47 -07:00
Matthew Fortune b39d84adff VDSO support for MIPS
This patch adds support for using the implementations of gettimeofday()
and clock_gettime() provided by the kernel in the VDSO. The VDSO will
always provide clock_gettime() as CLOCK_{REALTIME,MONOTONIC}_COARSE can
be implemented regardless of platform. CLOCK_{REALTIME,MONOTONIC}, along
with gettimeofday(), are only implemented on platforms which make use of
either the CP0 count or GIC as their clocksource. On other platforms,
the VDSO does not provide the __vdso_gettimeofday symbol, as it is
never useful.

The VDSO functions return ENOSYS when they encounter an unsupported
request, in which case glibc should fall back to the standard syscall.

Tested with upstream kernel 4.5 and QEMU emulating Malta.

./vdsotest gettimeofday bench
gettimeofday: syscall: 1021 nsec/call
gettimeofday:    libc: 262 nsec/call
gettimeofday:    vdso: 174 nsec/call

	* sysdeps/unix/sysv/linux/mips/Makefile (sysdep_routines):
	Include dl-vdso.
	* sysdeps/unix/sysv/linux/mips/Versions: Add
	__vdso_clock_gettime.
	* sysdeps/unix/sysv/linux/mips/init-first.c: New file.
	* sysdeps/unix/sysv/linux/mips/libc-vdso.h: New file.
	* sysdeps/unix/sysv/linux/mips/mips32/sysdep.h:
	(INTERNAL_VSYSCALL_CALL): Define to be compatible with MIPS
	definitions of INTERNAL_SYSCALL_{ERROR_P,ERRNO}.
	(HAVE_CLOCK_GETTIME_VSYSCALL): Define.
	(HAVE_GETTIMEOFDAY_VSYSCALL): Define.
	* sysdeps/unix/sysv/linux/mips/mips64/n32/sysdep.h: Likewise.
	* sysdeps/unix/sysv/linux/mips/mips64/n64/sysdep.h: Likewise.
2016-04-12 11:05:13 +01:00
Adhemerval Zanella 071af4769f Consolidate pwrite/pwrite64 implementations
This patch consolidates all the pwrite/pwrite64 implementation for Linux
in only one (sysdeps/unix/sysv/linux/pwrite{64}.c).  It also removes the
syscall from the auto-generation using assembly macros.

For pwrite{64} offset argument placement the new SYSCALL_LL{64} macro
is used.  For pwrite ports that do not define __NR_pwrite will use
__NR_pwrite64 and for pwrite64 ports that dot define __NR_pwrite64 will
use __NR_pwrite for the syscall.

Checked on x86_64, x32, i386, aarch64, and ppc64le.

	* sysdeps/unix/sysv/linux/arm/pwrite.c: Remove file.
	* sysdeps/unix/sysv/linux/arm/pwrite64.c: Likewise.
	* sysdeps/unix/sysv/linux/generic/wordsize-32/pwrite.c: Likewise.
	* sysdeps/unix/sysv/linux/generic/wordsize-32/pwrite64.c: Likewise.
	* sysdeps/unix/sysv/linux/powerpc/powerpc32/pwrite.c: Likewise.
	* sysdeps/unix/sysv/linux/powerpc/powerpc32/pwrite64.c: Likewise.
	* sysdeps/unix/sysv/linux/wordsize-64/pwrite64.c: Likewise.
	* sysdeps/unix/sysv/linux/wordsize-64/syscalls.list (prite): Remove
	syscalls generation.
	* sysdeps/unix/sysv/linux/powerpc/powerpc32/sysdep.h
	[__NR_pwrite64] (__NR_write): Remove define.
	* sysdeps/unix/sysv/linux/powerpc/powerpc64/sysdep.h
	[__NR_pwrite64] (__NR_write): Remove define.
	* sysdeps/unix/sysv/linux/pwrite.c [__NR_pwrite64] (__NR_pwrite):
	Remove define.
	(__libc_pwrite): Use SYSCALL_LL macro on offset argument.
	* sysdeps/unix/sysv/linux/pwrite64.c [__NR_pwrite64] (__NR_pwrite):
	Remove define.
	(__libc_pwrite64): Use SYSCALL_LL64 macro on offset argument.
	* sysdeps/unix/sysv/linux/sh/pwrite.c: Rewrite using default
	Linux implementation as base.
	* sysdeps/unix/sysv/linux/sh/pwrite64.c: Likewise.
	* sysdeps/unix/sysv/linux/mips/pwrite.c: Likewise.
	* sysdeps/unix/sysv/linux/mips/pwrite64.c: Likewise.
2016-04-11 10:08:01 -03:00
Adhemerval Zanella 77a4fbd536 Consolidate pread/pread64 implementations
This patch consolidates all the pread/pread64 implementation for Linux
in only one (sysdeps/unix/sysv/linux/pread.c).  It also removes the
syscall from the auto-generation using assembly macros.

For pread{64} offset argument placement the new SYSCALL_LL{64} macro
is used.  For pread ports that do not define __NR_pread will use
__NR_pread64 and for pread64 ports that dot define __NR_pread64 will
use __NR_pread for the syscall.

Checked on x86_64, x32, i386, aarch64, and ppc64le.

	* sysdeps/unix/sysv/linux/arm/pread.c: Remove file.
	* sysdeps/unix/sysv/linux/arm/pread64.c: Likewise.
	* sysdeps/unix/sysv/linux/generic/wordsize-32/pread.c: Likewise.
	* sysdeps/unix/sysv/linux/generic/wordsize-32/pread64.c: Likewise.
	* sysdeps/unix/sysv/linux/powerpc/powerpc32/pread.c: Likewise,
	* sysdeps/unix/sysv/linux/powerpc/powerpc32/pread64.c: Likewise.
	* sysdeps/unix/sysv/linux/wordsize-64/pread64.c: Likewise.
	* sysdeps/unix/sysv/linux/wordsize-64/syscalls.list (pread): Remove
	syscall generation.
	* sysdeps/unix/sysv/linux/powerpc/powerpc32/sysdep.h
	[__NR_pread64] (__NR_pread): Remove define.
	* sysdeps/unix/sysv/linux/powerpc/powerpc64/sysdep.h:
	[__NR_pread64] (__NR_pread): Likewise.
	* sysdeps/unix/sysv/linux/pread.c [__NR_pread64] (__NR_pread): Remove
	define.
	(__libc_pread): Use SYSCALL_LL macro on offset argument.
	* sysdeps/unix/sysv/linux/pread64.c [__NR_pread64] (__NR_pread):
	Remove define.
	(__libc_pread64): Use SYSCALL_LL64 macro on offset argument.
	* sysdeps/unix/sysv/linux/sh/pread.c: Rewrite using default
	Linux implementation as base.
	* sysdeps/unix/sysv/linux/sh/pread64.c: Likewise.
	* sysdeps/unix/sysv/linux/mips/pread.c: Likewise.
	* sysdeps/unix/sysv/linux/mips/pread64.c: Likewise.
2016-04-11 10:08:01 -03:00
Adhemerval Zanella eeddfa91cb Consolidate off_t/off64_t syscall argument passing
This patch add three new macros (SYSCALL_LL, SYSCALL_LL64, and
__ASSUME_WORDSIZE64_ILP32) to use along with off_t and off64_t argument
syscalls.  The rationale for this change is:

1. Remove multiple implementations for the same syscall for different
   architectures (for instance, pread have 6 different implementations).

2. Also remove the requirement to use syscall wrappers for cancellable
   entrypoints.

The macro usage should be used along __ALIGNMENT_ARG to follow ABI constrains
for architecture where it applies.  For instance, pread can be rewritten as:

  return SYSCALL_CANCEL (pread, fd, buf, count,
                         __ALIGNMENT_ARG SYSCALL_LL (offset));

Another macro, SYSCALL_LL64, is provided for off64_t.  The macro
__ASSUME_WORDSIZE64_ILP32 is used by the ABI to define is uses 64-bit register
even if ABI is ILP32 (for instance x32 and mips64-n32).

The changes itself are not currently used in any implementation, so no
code change is expected.

	* sysdeps/unix/sysv/linux/generic/sysdep.h (__ALIGNMENT_ARG): Move
	definition.
	(__ALIGNMENT_COUNT): Likewise.
	* sysdeps/unix/sysv/linux/sysdep.h (__ALIGNMENT_ARG): To here.
	(__ALIGNMENT_COUNT): Likewise.
	(SYSCALL_LL): New define.
	(SYSCALL_LL64): Likewise.
	* sysdeps/unix/sysv/linux/mips/kernel-features.h:
	[_MIPS_SIM == _ABIO32] (__ASSUME_WORDSIZE64_ILP32): Define.
	* sysdeps/unix/sysv/linux/x86_64/kernel-features.h:
	[ILP32] (__ASUME_WORDSIZE64_ILP32): Likewise.
2016-04-11 10:07:53 -03:00
Adhemerval Zanella 482b2f87a8 Define __ASSUME_ALIGNED_REGISTER_PAIRS for missing ports
This patch defines __ASSUME_ALIGNED_REGISTER_PAIRS for the missing
ports that require 64-bit value (e.g., long long) to be aligned to
an even register pair in argument passing.

No code change is expected, tested with builds for powerpc32,
mips-o32, and armhf.

	* sysdeps/unix/sysv/linux/arm/kernel-features.h
	(__ASSUME_ALIGNED_REGISTER_PAIRS): Define.
	* sysdeps/unix/sysv/linux/mips/kernel-features.h
	[_MIPS_SIM == _ABIO32] (__ASSUME_ALIGNED_REGISTER_PAIRS): Likewise.
	* sysdeps/unix/sysv/linux/powerpc/kernel-features.h
	[!__powerpc64__] (__ASSUME_ALIGNED_REGISTER_PAIRS): Likewise.
2016-04-11 09:15:11 -03:00
Florian Weimer d29fb41f44 nss_dns: Fix assertion failure in _nss_dns_getcanonname_r [BZ #19865] 2016-04-11 10:55:43 +02:00
Florian Weimer eb68636fed Add missing bug number to ChangeLog 2016-04-11 10:54:05 +02:00
Samuel Thibault e1ef505659 Fix build with HAVE_AUX_VECTOR
* sysdeps/unix/sysv/linux/ldsodefs.h (HAVE_AUX_VECTOR): Define before
	including <ldsodefs.h>.
	* sysdeps/nacl/ldsodefs.h (HAVE_AUX_VECTOR): Likewise.
2016-04-11 10:27:25 +02:00
Samuel Thibault 0cdc5e930a Fix crash on getauxval call without HAVE_AUX_VECTOR
* sysdeps/generic/ldsodefs.h (struct rtld_global_ro)
	[!HAVE_AUX_VECTOR]: Do not define _dl_auxv field.
	* misc/getauxval.c (__getauxval) [!HAVE_AUX_VECTOR]: Do not go through
	GLRO(dl_auxv) list.
2016-04-10 23:58:43 +02:00
Nick Alcock 5057feffcc Allow overriding of CFLAGS as well as CPPFLAGS for rtld.
We need this to pass -fno-stack-protector to all the pieces of rtld in
non-elf/ directories.
2016-04-09 23:48:32 -04:00
Khem Raj 1a5d01e79e When disabling SSE, make sure -fpmath is not set to use SSE either
This fixes errors when we inject sse options through CFLAGS and now
that we have -Werror turned on by default this warning turns into an
error on x86:

$ gcc -m32 -march=core2 -mtune=core2 -msse3 -mfpmath=sse -x c /dev/null -S -mno-sse -mno-mmx
/dev/null:1:0: warning: SSE instruction set disabled, using 387 arithmetics

Where as:

$ gcc -m32 -march=core2 -mtune=core2 -msse3 -mfpmath=sse -x c /dev/null -S -mno-sse -mno-mmx -mfpmath=387

Generates no warnings.
2016-04-09 22:14:24 -04:00
Mike Frysinger ef9ec89760 localedata: CLDRv28: update LC_PAPER values
These locales should be using A4 paper size rather than US-Letter.
Update the copy points to match the others in the file.  All other
locales have been verified against the CLDR and hand checking.
2016-04-09 20:23:44 -04:00
Mike Frysinger b2d4456b33 configure: fix `test ==` usage
POSIX defines the = operator, but not ==.  Fix the few places where we
incorrectly used ==.
2016-04-09 20:05:13 -04:00
Mike Frysinger 20003c4988 localedata: iw_IL: delete old/deprecated locale [BZ #16137]
From the bug:
Obsolete locale.  The ISO-639 code for Hebrew was changed from 'iw'
to 'he' in 1989, according to Bruno Haible on libc-alpha 2003-09-01.

Reported-by: Chris Leonard <cjlhomeaddress@gmail.com>
2016-04-08 18:56:34 -04:00
Joseph Myers eb64b6d457 Fix limits.h NL_NMAX namespace (bug 19929).
bits/xopen_lim.h (included by limits.h if __USE_XOPEN) defines
NL_NMAX, but this constant was removed in the 2008 edition of POSIX so
should not be defined in that case.  This patch duly disables that
define for __USE_XOPEN2K8.  It remains enabled for __USE_GNU to avoid
affecting sysconf (_SC_NL_NMAX), the implementation of which uses
"#ifdef NL_NMAX".

Tested for x86_64 and x86 (testsuite, and that installed stripped
shared libraries are unchanged by the patch).

	[BZ #19929]
	* include/bits/xopen_lim.h (NL_NMAX): Do not define if
	[__USE_XOPEN2K8 && !__USE_GNU].
	* conform/Makefile (test-xfail-XOPEN2K8/limits.h/conform): Remove
	variable.
2016-04-08 22:52:51 +00:00
Mike FABIAN ed80f206f4 localedata: i18n: fix typos in tel_int_fmt
Adding the %t avoids a double space if the area code %a happens
to be empty.  There are countries without area codes.
2016-04-08 18:30:33 -04:00
Joseph Myers fb3227b95c Fix termios.h XCASE namespace (bug 19925).
bits/termios.h (various versions under sysdeps/unix/sysv/linux)
defines XCASE if defined __USE_MISC || defined __USE_XOPEN.  This
macro was removed in the 2001 edition of POSIX, and is not otherwise
reserved, so should not be defined for 2001 and later versions of
POSIX.  This patch fixes the conditions accordingly (leaving the macro
defined for __USE_MISC, so still in the default namespace).

Tested for x86_64 and x86 (testsuite, and that installed shared
libraries are unchanged by the patch).

	[BZ #19925]
	* sysdeps/unix/sysv/linux/alpha/bits/termios.h (XCASE): Do not
	define if [!__USE_MISC && __USE_XOPEN2K].
	* sysdeps/unix/sysv/linux/bits/termios.h (XCASE): Likewise.
	* sysdeps/unix/sysv/linux/mips/bits/termios.h (XCASE): Likewise.
	* sysdeps/unix/sysv/linux/powerpc/bits/termios.h (XCASE):
	Likewise.
	* sysdeps/unix/sysv/linux/sparc/bits/termios.h (XCASE): Likewise.
	* conform/Makefile (test-xfail-XOPEN2K/termios.h/conform): Remove
	variable.
	(test-xfail-XOPEN2K8/termios.h/conform): Likewise.
2016-04-08 18:16:09 +00:00
Paul E. Murphy 25dba0ad05 powerpc: Add optimized P8 strspn
This utilizes vectors and bitmasks.  For small needle, large
haystack, the performance improvement is upto 8x.  For short
strings (0-4B), the cost of computing the bitmask dominates,
and is a tad slower.
2016-04-07 15:51:28 -05:00
Florian Weimer 1d2a8245ff hsearch_r: Include <limits.h>
It is needed for UINT_MAX.
2016-04-07 13:48:00 +02:00
Florian Weimer c04af6068b scratch_buffer_set_array_size: Include <limits.h>
It is needed for CHAR_BIT.
2016-04-07 13:46:28 +02:00
H.J. Lu a7d1c51482 X86-64: Prepare memmove-vec-unaligned-erms.S
Prepare memmove-vec-unaligned-erms.S to make the SSE2 version as the
default memcpy, mempcpy and memmove.

	* sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S
	(MEMCPY_SYMBOL): New.
	(MEMPCPY_SYMBOL): Likewise.
	(MEMMOVE_CHK_SYMBOL): Likewise.
	Replace MEMMOVE_SYMBOL with MEMMOVE_CHK_SYMBOL on __mempcpy_chk
	symbols.  Replace MEMMOVE_SYMBOL with MEMPCPY_SYMBOL on
	__mempcpy symbols.  Provide alias for __memcpy_chk in libc.a.
	Provide alias for memcpy in libc.a and ld.so.
2016-04-06 10:19:16 -07:00
H.J. Lu 4af1bb06c5 X86-64: Prepare memset-vec-unaligned-erms.S
Prepare memset-vec-unaligned-erms.S to make the SSE2 version as the
default memset.

	* sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S
	(MEMSET_CHK_SYMBOL): New.  Define if not defined.
	(__bzero): Check VEC_SIZE == 16 instead of USE_MULTIARCH.
	Disabled fro now.
	Replace MEMSET_SYMBOL with MEMSET_CHK_SYMBOL on __memset_chk
	symbols.  Properly check USE_MULTIARCH on __memset symbols.
2016-04-06 09:10:35 -07:00
H.J. Lu a25322f4e8 Add memcpy/memmove/memset benchmarks with large data
Add memcpy, memmove and memset benchmarks with large data sizes.

	* benchtests/Makefile (string-benchset): Add memcpy-large,
	memmove-large and memset-large.
	* benchtests/bench-memcpy-large.c: New file.
	* benchtests/bench-memmove-large.c: Likewise.
	* benchtests/bench-memmove-large.c: Likewise.
	* benchtests/bench-string.h (TIMEOUT): Don't redefine.
2016-04-06 08:37:39 -07:00
Stefan Liebler aa7353ce5c Mention Bug in ChangeLog for S390: Save and restore fprs/vrs while resolving symbols.
The Bugzilla 19916 is added to the ChangeLog for
commit 4603c51ef7.
2016-04-06 15:21:00 +02:00
H.J. Lu ec0cac9a1f Force 32-bit displacement in memset-vec-unaligned-erms.S
* sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S: Force
	32-bit displacement to avoid long nop between instructions.
2016-04-05 05:21:19 -07:00
H.J. Lu 696ac77484 Add a comment in memset-sse2-unaligned-erms.S
* sysdeps/x86_64/multiarch/memset-sse2-unaligned-erms.S: Add
	a comment on VMOVU and VMOVA.
2016-04-05 05:19:18 -07:00
Florian Weimer 985fc132f2 strfmon_l: Use specified locale for number formatting [BZ #19633] 2016-04-04 15:18:13 +02:00
H.J. Lu 5cd7af016d Don't put SSE2/AVX/AVX512 memmove/memset in ld.so
Since memmove and memset in ld.so don't use IFUNC, don't put SSE2, AVX
and AVX512 memmove and memset in ld.so.

	* sysdeps/x86_64/multiarch/memmove-avx-unaligned-erms.S: Skip
	if not in libc.
	* sysdeps/x86_64/multiarch/memmove-avx512-unaligned-erms.S:
	Likewise.
	* sysdeps/x86_64/multiarch/memset-avx2-unaligned-erms.S:
	Likewise.
	* sysdeps/x86_64/multiarch/memset-avx512-unaligned-erms.S:
	Likewise.
2016-04-03 14:35:38 -07:00
H.J. Lu ea2785e96f Fix memmove-vec-unaligned-erms.S
__mempcpy_erms and __memmove_erms can't be placed between __memmove_chk
and __memmove it breaks __memmove_chk.

Don't check source == destination first since it is less common.

	* sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:
	(__mempcpy_erms, __memmove_erms): Moved before __mempcpy_chk
	with unaligned_erms.
	(__memmove_erms): Skip if source == destination.
	(__memmove_unaligned_erms): Don't check source == destination
	first.
2016-04-03 12:38:25 -07:00
H.J. Lu 27d3ce1467 Remove Fast_Copy_Backward from Intel Core processors
Intel Core i3, i5 and i7 processors have fast unaligned copy and
copy backward is ignored.  Remove Fast_Copy_Backward from Intel Core
processors to avoid confusion.

	* sysdeps/x86/cpu-features.c (init_cpu_features): Don't set
	bit_arch_Fast_Copy_Backward for Intel Core proessors.
2016-04-01 15:09:14 -07:00
Adhemerval Zanella 2e51bc3813 Use PTR_ALIGN_DOWN on strcspn and strspn
Tested on aarch64.

	* string/strcspn.c (strcspn): Use PTR_ALIGN_DOWN.
	* string/strspn.c (strspn): Likewise.
2016-04-01 18:33:03 -03:00
H.J. Lu 344303f3cf Test 64-byte alignment in memset benchtest
Add 64-byte alignment tests in memset benchtest for 64-byte vector
registers.

	* benchtests/bench-memset.c (do_test): Support 64-byte
	alignment.
	(test_main): Test 64-byte alignment.
2016-04-01 10:00:12 -07:00
H.J. Lu aea44bf61a Test 64-byte alignment in memmove benchtest
Add 64-byte alignment tests in memmove benchtest for 64-byte vector
registers.

	* benchtests/bench-memmove.c (test_main): Test 64-byte
	alignment.
2016-04-01 09:59:09 -07:00
H.J. Lu 32b28d24a1 Test 64-byte alignment in memcpy benchtest
Add 64-byte alignment tests in memcpy benchtest for 64-byte vector
registers.

	* benchtests/bench-memcpy.c (test_main): Test 64-byte alignment.
2016-04-01 09:57:53 -07:00
Adhemerval Zanella 528ffb3a04 Remove powerpc64 strspn, strcspn, and strpbrk implementation
This patch removes the powerpc64 optimized strspn, strcspn, and
strpbrk assembly implementation now that the default C one
implements the same strategy.  On internal glibc benchtests
current implementations shows similar performance with -O2.

Tested on powerpc64le (POWER8).

	* sysdeps/powerpc/powerpc64/strcspn.S: Remove file.
	* sysdeps/powerpc/powerpc64/strpbrk.S: Remove file.
	* sysdeps/powerpc/powerpc64/strspn.S: Remove file.
2016-04-01 10:44:45 -03:00
Adhemerval Zanella 282b71f07e Improve generic strpbrk performance
With now a faster strcspn implementation, it is faster to just use
it with some return tests than reimplementing strpbrk itself.
As for strcspn optimization, it is generally at least 10 times faster
than the existing implementation on bench-strspn on a few AArch64
implementations.

Also the string/bits/string2.h inlines make no longer sense, as current
implementation will already implement most of the optimizations.

Tested on x86_64, i386, and aarch64.

	* string/strpbrk.c (strpbrk): Rewrite function.
	* string/bits/string2.h (strpbrk): Use __builtin_strpbrk.
	(__strpbrk_c2): Likewise.
	(__strpbrk_c3): Likewise.
	* string/string-inlines.c
	[SHLIB_COMPAT(libc, GLIBC_2_1_1, GLIBC_2_24)] (__strpbrk_c2):
	Likewise.
	[SHLIB_COMPAT(libc, GLIBC_2_1_1, GLIBC_2_24)] (__strpbrk_c3):
	Likewise.
2016-04-01 10:44:45 -03:00
Adhemerval Zanella 91f3b75f47 Improve generic strspn performance
As for strcspn, this patch improves strspn performance using a much
faster algorithm.  It first constructs a 256-entry table based on
the accept string and then uses it as a lookup table for the
input string.  As for strcspn optimization, it is generally at least
10 times faster than the existing implementation on bench-strspn
on a few AArch64 implementations.

Also the string/bits/string2.h inlines make no longer sense, as current
implementation will already implement most of the optimizations.

Tested on x86_64, i686, and aarch64.

	* string/strspn.c (strcspn): Rewrite function.
	* string/bits/string2.h (strspn): Use __builtin_strcspn.
	(__strspn_c1): Remove inline function.
	(__strspn_c2): Likewise.
	(__strspn_c3): Likewise.
	* string/string-inlines.c
	[SHLIB_COMPAT(libc, GLIBC_2_1_1, GLIBC_2_24)] (__strspn_c1): Add
	compatibility symbol.
	[SHLIB_COMPAT(libc, GLIBC_2_1_1, GLIBC_2_24)] (__strspn_c2):
	Likewise.
	[SHLIB_COMPAT(libc, GLIBC_2_1_1, GLIBC_2_24)] (__strspn_c3):
	Likewise.
2016-04-01 10:44:44 -03:00
Wilco Dijkstra d3496c9f4f Improve generic strcspn performance
Improve strcspn performance using a much faster algorithm.  It is kept simple
so it works well on most targets.  It is generally at least 10 times faster
than the existing implementation on bench-strcspn on a few AArch64
implementations, and for some tests 100 times as fast (repeatedly calling
strchr on a small string is extremely slow...).

In fact the string/bits/string2.h inlines make no longer sense, as GCC
already uses strlen if reject is an empty string, strchrnul is 5 times as
fast as __strcspn_c1, while __strcspn_c2 and __strcspn_c3 are slower than
the strcspn main loop for large strings (though reject length 2-4 could be
special cased in the future to gain even more performance).

Tested on x86_64, i686, and aarch64.

	* string/Version (libc): Add GLIBC_2.24.
	* string/strcspn.c (strcspn): Rewrite function.
	* string/bits/string2.h (strcspn): Use __builtin_strcspn.
	(__strcspn_c1): Remove inline function.
	(__strcspn_c2): Likewise.
	(__strcspn_c3): Likewise.
	* string/string-inline.c
	[SHLIB_COMPAT(libc, GLIBC_2_1_1, GLIBC_2_24)] (__strcspn_c1): Add
	compatibility symbol.
	[SHLIB_COMPAT(libc, GLIBC_2_1_1, GLIBC_2_24)] (__strcspn_c2):
	Likewise.
	[SHLIB_COMPAT(libc, GLIBC_2_1_1, GLIBC_2_24)] (__strcspn_c3):
	Likewise.
	* sysdeps/i386/string-inlines.c: Include generic string-inlines.c.
2016-04-01 10:44:40 -03:00
Stefan Liebler d8a012c5c9 S390: Use ahi instead of aghi in 32bit _dl_runtime_resolve.
This patch uses ahi instead of aghi in 32bit _dl_runtime_resolve
to adjust the stack pointer. This is no functional change,
but a cosmetic one.

ChangeLog:

	* sysdeps/s390/s390-32/dl-trampoline.h (_dl_runtime_resolve):
	Use ahi instead of aghi to adjust stack pointer.
2016-04-01 10:42:54 +02:00
Paul E. Murphy 37a4c70bd4 Increase internal precision of ldbl-128ibm decimal printf [BZ #19853]
When the signs differ, the precision of the conversion sometimes
drops below 106 bits.  This strategy is identical to the
hexadecimal variant.

I've refactored tst-sprintf3 to enable testing a value with more
than 30 significant digits in order to demonstrate this failure
and its solution.

Additionally, this implicitly fixes a typo in the shift
quantities when subtracting from the high mantissa to compute
the difference.
2016-03-31 12:14:33 -05:00
H.J. Lu 830566307f Add x86-64 memset with unaligned store and rep stosb
Implement x86-64 memset with unaligned store and rep movsb.  Support
16-byte, 32-byte and 64-byte vector register sizes.  A single file
provides 2 implementations of memset, one with rep stosb and the other
without rep stosb.  They share the same codes when size is between 2
times of vector register size and REP_STOSB_THRESHOLD which defaults
to 2KB.

Key features:

1. Use overlapping store to avoid branch.
2. For size <= 4 times of vector register size, fully unroll the loop.
3. For size > 4 times of vector register size, store 4 times of vector
register size at a time.

	[BZ #19881]
	* sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Add
	memset-sse2-unaligned-erms, memset-avx2-unaligned-erms and
	memset-avx512-unaligned-erms.
	* sysdeps/x86_64/multiarch/ifunc-impl-list.c
	(__libc_ifunc_impl_list): Test __memset_chk_sse2_unaligned,
	__memset_chk_sse2_unaligned_erms, __memset_chk_avx2_unaligned,
	__memset_chk_avx2_unaligned_erms, __memset_chk_avx512_unaligned,
	__memset_chk_avx512_unaligned_erms, __memset_sse2_unaligned,
	__memset_sse2_unaligned_erms, __memset_erms,
	__memset_avx2_unaligned, __memset_avx2_unaligned_erms,
	__memset_avx512_unaligned_erms and __memset_avx512_unaligned.
	* sysdeps/x86_64/multiarch/memset-avx2-unaligned-erms.S: New
	file.
	* sysdeps/x86_64/multiarch/memset-avx512-unaligned-erms.S:
	Likewise.
	* sysdeps/x86_64/multiarch/memset-sse2-unaligned-erms.S:
	Likewise.
	* sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S:
	Likewise.
2016-03-31 10:06:07 -07:00
H.J. Lu 88b57b8ed4 Add x86-64 memmove with unaligned load/store and rep movsb
Implement x86-64 memmove with unaligned load/store and rep movsb.
Support 16-byte, 32-byte and 64-byte vector register sizes.  When
size <= 8 times of vector register size, there is no check for
address overlap bewteen source and destination.  Since overhead for
overlap check is small when size > 8 times of vector register size,
memcpy is an alias of memmove.

A single file provides 2 implementations of memmove, one with rep movsb
and the other without rep movsb.  They share the same codes when size is
between 2 times of vector register size and REP_MOVSB_THRESHOLD which
is 2KB for 16-byte vector register size and scaled up by large vector
register size.

Key features:

1. Use overlapping load and store to avoid branch.
2. For size <= 8 times of vector register size, load  all sources into
registers and store them together.
3. If there is no address overlap bewteen source and destination, copy
from both ends with 4 times of vector register size at a time.
4. If address of destination > address of source, backward copy 8 times
of vector register size at a time.
5. Otherwise, forward copy 8 times of vector register size at a time.
6. Use rep movsb only for forward copy.  Avoid slow backward rep movsb
by fallbacking to backward copy 8 times of vector register size at a
time.
7. Skip when address of destination == address of source.

	[BZ #19776]
	* sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Add
	memmove-sse2-unaligned-erms, memmove-avx-unaligned-erms and
	memmove-avx512-unaligned-erms.
	* sysdeps/x86_64/multiarch/ifunc-impl-list.c
	(__libc_ifunc_impl_list): Test
	__memmove_chk_avx512_unaligned_2,
	__memmove_chk_avx512_unaligned_erms,
	__memmove_chk_avx_unaligned_2, __memmove_chk_avx_unaligned_erms,
	__memmove_chk_sse2_unaligned_2,
	__memmove_chk_sse2_unaligned_erms, __memmove_avx_unaligned_2,
	__memmove_avx_unaligned_erms, __memmove_avx512_unaligned_2,
	__memmove_avx512_unaligned_erms, __memmove_erms,
	__memmove_sse2_unaligned_2, __memmove_sse2_unaligned_erms,
	__memcpy_chk_avx512_unaligned_2,
	__memcpy_chk_avx512_unaligned_erms,
	__memcpy_chk_avx_unaligned_2, __memcpy_chk_avx_unaligned_erms,
	__memcpy_chk_sse2_unaligned_2, __memcpy_chk_sse2_unaligned_erms,
	__memcpy_avx_unaligned_2, __memcpy_avx_unaligned_erms,
	__memcpy_avx512_unaligned_2, __memcpy_avx512_unaligned_erms,
	__memcpy_sse2_unaligned_2, __memcpy_sse2_unaligned_erms,
	__memcpy_erms, __mempcpy_chk_avx512_unaligned_2,
	__mempcpy_chk_avx512_unaligned_erms,
	__mempcpy_chk_avx_unaligned_2, __mempcpy_chk_avx_unaligned_erms,
	__mempcpy_chk_sse2_unaligned_2, __mempcpy_chk_sse2_unaligned_erms,
	__mempcpy_avx512_unaligned_2, __mempcpy_avx512_unaligned_erms,
	__mempcpy_avx_unaligned_2, __mempcpy_avx_unaligned_erms,
	__mempcpy_sse2_unaligned_2, __mempcpy_sse2_unaligned_erms and
	__mempcpy_erms.
	* sysdeps/x86_64/multiarch/memmove-avx-unaligned-erms.S: New
	file.
	* sysdeps/x86_64/multiarch/memmove-avx512-unaligned-erms.S:
	Likwise.
	* sysdeps/x86_64/multiarch/memmove-sse2-unaligned-erms.S:
	Likwise.
	* sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:
	Likwise.
2016-03-31 10:04:40 -07:00
Stefan Liebler 5cdd1989d1 S390: Extend structs La_s390_regs / La_s390_retval with vector-registers.
Starting with z13, vector registers can also occur as argument registers.
Thus the passed input/output register structs for
la_s390_[32|64]_gnu_plt[enter|exit] functions should reflect those new
registers. This patch extends these structs La_s390_regs and La_s390_retval
and adjusts _dl_runtime_profile() to handle those fields in case of
running on a z13 machine.

ChangeLog:

	* sysdeps/s390/bits/link.h: (La_s390_vr) New typedef.
	(La_s390_32_regs): Append vector register lr_v24-lr_v31.
	(La_s390_64_regs): Likewise.
	(La_s390_32_retval): Append vector register lrv_v24.
	(La_s390_64_retval): Likeweise.
	* sysdeps/s390/s390-32/dl-trampoline.h (_dl_runtime_profile):
	Handle extended structs La_s390_32_regs and La_s390_32_retval.
	* sysdeps/s390/s390-64/dl-trampoline.h (_dl_runtime_profile):
	Handle extended structs La_s390_64_regs and La_s390_64_retval.
2016-03-31 17:37:16 +02:00
Stefan Liebler 4603c51ef7 S390: Save and restore fprs/vrs while resolving symbols.
On s390, no fpr/vrs were saved while resolving a symbol
via _dl_runtime_resolve/_dl_runtime_profile.

According to the abi, the fpr-arguments are defined as call clobbered.
In leaf-functions, gcc 4.9 and newer can use fprs for saving/restoring gprs
instead of saving them to the stack.
If gcc do this in one of the resolver-functions, then the floating point
arguments of a library-function are invalid for the first library-function-call.
Thus, this patch saves/restores the fprs around the resolving code.

The same could occur for vector registers. Furthermore an ifunc-resolver
could also clobber the vector/floating point argument registers.
Thus this patch provides the further variants _dl_runtime_resolve_vx/
_dl_runtime_profile_vx, which are used if the kernel claims, that
we run on a machine with vector registers.

Furthermore, if _dl_runtime_profile calls _dl_call_pltexit,
the pointers to inregs-/outregs-structs were setup invalid.
Now they point to the correct location in the stack-frame.
Before branching back to the caller, the return values are now
restored instead of containing the return values of the
_dl_call_pltexit() call.
On s390-32, an endless loop occurs if _dl_call_pltexit() should be called.
Now, this code-path branches to this function instead of just after the
preceding basr-instruction.

ChangeLog:

	* sysdeps/s390/s390-32/dl-trampoline.S: Include dl-trampoline.h twice
	to create a non-vector/vector version for _dl_runtime_resolve and
	_dl_runtime_profile. Move implementation to ...
	* sysdeps/s390/s390-32/dl-trampoline.h: ... here.
	(_dl_runtime_resolve) Save and restore fpr/vrs.
	(_dl_runtime_profile) Save and restore vrs and fix some issues
	if _dl_call_pltexit is called.
	* sysdeps/s390/s390-32/dl-machine.h (elf_machine_runtime_setup):
	Choose the correct resolver function if running on a machine with vx.
	* sysdeps/s390/s390-64/dl-trampoline.S: Include dl-trampoline.h twice
	to create a non-vector/vector version for _dl_runtime_resolve and
	_dl_runtime_profile. Move implementation to ...
	* sysdeps/s390/s390-64/dl-trampoline.h: ... here.
	(_dl_runtime_resolve) Save and restore fpr/vrs.
	(_dl_runtime_profile) Save and restore vrs and fix some issues
	* sysdeps/s390/s390-64/dl-machine.h: (elf_machine_runtime_setup):
	Choose the correct resolver function if running on a machine with vx.
2016-03-31 17:37:16 +02:00
Adhemerval Zanella e91bd74658 Fix tst-dlsym-error build
This patch fixes the new test tst-dlsym-error build on aarch64
(and possible other architectures as well) due missing strchrnul
definition.

	* elf/tst-dlsym-error.c: Include <string.h> for strchrnul.
2016-03-31 10:51:51 -03:00
Florian Weimer 7d45c163d0 Report dlsym, dlvsym lookup errors using dlerror [BZ #19509]
* elf/dl-lookup.c (_dl_lookup_symbol_x): Report error even if
	skip_map != NULL.
	* elf/tst-dlsym-error.c: New file.
	* elf/Makefile (tests): Add tst-dlsym-error.
	(tst-dlsym-error): Link against libdl.
2016-03-31 11:26:55 +02:00
Joseph Myers 258ec8abc1 [microblaze] Remove __ASSUME_FUTIMESAT.
MicroBlaze has a special version of futimesat.c because it gained the
futimesat syscall later than other non-asm-generic architectures.  Now
the minimum kernel is recent enough that this syscall can always be
assumed to be present for MicroBlaze, so this patch removes the
special version and the __ASSUME_FUTIMESAT macro, resulting in the
sysdeps/unix/sysv/linux/futimesat.c version being used.

Untested.

	* sysdeps/unix/sysv/linux/microblaze/kernel-features.h
	(__ASSUME_FUTIMESAT): Remove macro.
	* sysdeps/unix/sysv/linux/microblaze/futimesat.c: Remove file.
2016-03-29 22:13:36 +00:00