Commit Graph

28715 Commits

Author SHA1 Message Date
Florian Weimer 2dd6ee79b1 posix_fallocate, posix_fallocate64 stub: Do not set errno
These functions return an error code.
2015-04-24 20:08:43 +02:00
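A minimal sketch of what "return an error code" means for callers of posix_fallocate, as in commit 2dd6ee79b1 above. The helper names reserve_space and reserve_space_message are hypothetical and not part of glibc; the point is only that the result is taken from the return value rather than from errno.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical helper (not part of glibc): reserve LEN bytes starting at
   offset 0 of FD.  Returns 0 on success or an error number on failure.  */
int reserve_space(int fd, off_t len)
{
  assert(len > 0);  /* a non-positive length is a caller bug in this sketch */

  /* posix_fallocate reports failure through its return value, so the
     result is used directly instead of reading errno afterwards.  */
  return posix_fallocate(fd, 0, len);
}

/* Turn the returned code into text; 0 means success.  */
const char *reserve_space_message(int err)
{
  return err == 0 ? "ok" : strerror(err);
}
```

A caller would typically write something like: int err = reserve_space (fd, 4096); if (err != 0) report reserve_space_message (err).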
Roland McGrath c25fec6f57 ARM: Define PI_STATIC_AND_HIDDEN. 2015-04-24 10:51:49 -07:00
Florian Weimer 42261ad731 Make time zone file parser more robust [BZ #17715] 2015-04-24 17:34:48 +02:00
Florian Weimer ed159672eb Do not build with -Winline
-Winline causes architecture- and optimization-dependent build failures
due to -Werror.  -Winline warns about inlining decisions based on
branch hints, in effect preventing the use of inline functions in
header files (because they might be called on unlikely branches, leading
to a decision not to inline).

The option was apparently added to the glibc build at a time when GCC
did not support the always_inline attribute.  With current GCC versions,
inlining failure for functions declared always_inline will receive a
warning under -Wattributes, which is enabled by default, so -Winline
appears unnecessary.
2015-04-24 17:06:39 +02:00
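For readers unfamiliar with the attribute mentioned in commit ed159672eb above, here is a small sketch of a header-style helper marked always_inline. It assumes a GCC-compatible compiler; the function names clamp_to_byte and scale_pixel are made up for illustration and do not come from glibc.

```c
#include <assert.h>

/* Assumption: GCC-compatible compiler.  With always_inline, a failure to
   inline is diagnosed for this function specifically, instead of relying
   on -Winline for every inline function in the build.  */
static inline __attribute__((always_inline)) int clamp_to_byte(int v)
{
  return v < 0 ? 0 : (v > 255 ? 255 : v);
}

/* Hypothetical caller: scales a pixel value by a percentage and clamps
   the result to the range 0..255.  */
int scale_pixel(int value, int percent)
{
  assert(percent >= 0);
  return clamp_to_byte(value * percent / 100);
}
```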
Stefan Liebler 8666ab5c42 S/390: Regenerate ULPs 2015-04-24 13:37:48 +02:00
Stefan Liebler f7fba80508 S/390: Get cache information via sysconf
This patch adds support to query cache information on s390
via sysconf() function - e.g. with _SC_LEVEL1_ICACHE_SIZE.
The attributes size, linesize and assoc can be queried
for cache level 1 - 4 via "extract cpu attribute" instruction,
which was first available with z10.

* NEWS: Mention sysconf() cache information support for s390.
* sysdeps/unix/sysv/linux/s390/sysconf.c: New File.
2015-04-24 13:37:39 +02:00
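A sketch of the kind of query commit f7fba80508 above describes, assuming a C library that provides the _SC_LEVEL1_ICACHE_* names (a glibc extension, not required by POSIX). The struct cache_info type and the read_l1_icache function are hypothetical names for this example. Values that the platform cannot report may come back as 0 or -1.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical container for the three attributes named in the commit.  */
struct cache_info
{
  long size;      /* total size in bytes */
  long linesize;  /* cache line size in bytes */
  long assoc;     /* associativity */
};

/* Fill OUT with level 1 instruction cache attributes.  Returns 0 if the
   sysconf names are available at compile time, -1 otherwise.  */
int read_l1_icache(struct cache_info *out)
{
  assert(out != NULL);
#ifdef _SC_LEVEL1_ICACHE_SIZE
  out->size = sysconf(_SC_LEVEL1_ICACHE_SIZE);
  out->linesize = sysconf(_SC_LEVEL1_ICACHE_LINESIZE);
  out->assoc = sysconf(_SC_LEVEL1_ICACHE_ASSOC);
  return 0;
#else
  out->size = out->linesize = out->assoc = -1;
  return -1;
#endif
}
```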
Wilco Dijkstra 92f2897953 Use __copysign rather than copysign. 2015-04-22 12:07:56 +00:00
Arjun Shankar 2959eda927 CVE-2015-1781: resolv/nss_dns/dns-host.c buffer overflow [BZ#18287] 2015-04-21 14:06:50 +02:00
Adhemerval Zanella 7bf8fb1042 libc-vdso.h place consolidation
This patch moves the libc-vdso.h internal header from the bits folder to
the default architecture one and also corrects the remaining includes in
the files.
2015-04-20 08:51:17 -03:00
Paul Eggert 03c1e456b0 Better fix for setenv (..., NULL, ...)
* stdlib/setenv.c (__add_to_environ):
Dump core quickly if setenv (..., NULL, ...) is called.
This time, do it the right way, and pacify GCC with a pragma.
2015-04-19 01:07:31 -07:00
Roland McGrath 2bd2cad9e8 Avoid confusing compiler with dynamically impossible statically invalid dereference in _dl_close_worker. 2015-04-17 14:29:40 -07:00
Roland McGrath 328c44c367 Fuller check for invalid NSID in _dl_open. 2015-04-17 12:11:58 -07:00
David S. Miller aa4980fc31 Sparc memchr/memcmp/strncmp fixes from Il'ya Malakhov.
[BZ #17825]
	* sysdeps/sparc/sparc64/memchr.S: Fix signedness handling of length.
	* sysdeps/sparc/sparc64/memcmp.S: Likewise.
	* sysdeps/sparc/sparc64/strncmp.S: Likewise.
2015-04-17 10:26:14 -07:00
Roland McGrath d1e44df1fa Add arm-nacl port. 2015-04-17 09:02:19 -07:00
David S. Miller f70925993a Convert sparc over to lowlevellock-futex.h
* sysdeps/unix/sysv/linux/sparc/lowlevellock.h: Make use of
	lowlevellock-futex.h
2015-04-16 13:05:41 -07:00
Chris Metcalf da6989f9a5 tile: Enable PI_STATIC_AND_HIDDEN
This does make ld.so very slightly larger (0.3%) and doesn't seem to
actually improve performance; in fact, my limited testing suggested a
slight (0.1%) performance decrease (running fork/exec of a no-op program
in a loop), but I didn't do enough testing to establish statistical
significance.

However, Roland agrees that it makes sense to switch tile to using
this path, since it's the more standard way.
2015-04-16 09:40:21 -04:00
Adhemerval Zanella fb78612a96 powerpc: Fix __wcschr static build
This patch fixes the static build for strftime, which uses __wcschr.
The current powerpc32 implementation defines __wcschr to be an alias of
__wcschr_ppc32, and it misses the correct alias for the static build.

It also changes the default wcschr.c logic so that an IFUNC implementation
should just define WCSCHR and undefine the required alias/internal
definitions.
2015-04-15 16:01:48 -03:00
David S. Miller a8b6a3a6c1 Rebuilt fresh sparc ULPS to get rid of removed tests.
* sysdeps/sparc/fpu/libm-test-ulps: Regenerate from scratch.
2015-04-15 09:42:54 -07:00
Stefan Liebler 920a0395ba Use correct signedness in wcsncmp
[BZ #18206]
	* wcsmbs/wcsncmp.c (wcsncmp): Compare as wchar_t, not wint_t.
	  Use signed comparison instead of subtraction to avoid
	  overflow bug.
	* localedata/tests-mbwc/tst_wcsncmp.c (tst_wcsncmp):
	  Take the sign of ret.
	* localedata/tests-mbwc/dat_wcsncmp.c (tst_wcsncmp_loc):
	  Do not expect precise return values. Only the sign matters.
	* wcsmbs/Makefile (strop-tests): Add wcsncmp.
	* wcsmbs/test-wcsncmp.c: New File.
	* string/test-strncmp.c: Add wcsncmp support.
2015-04-13 21:25:04 +02:00
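The ChangeLog entry for commit 920a0395ba above replaces subtraction with comparison. The following is an illustrative sketch of that idea, not the code from wcsmbs/wcsncmp.c; the name sketch_wcsncmp is hypothetical. Where wchar_t is a signed 32-bit type, computing c1 - c2 for values such as WCHAR_MIN and WCHAR_MAX overflows, while comparing the two values directly only ever yields a sign.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Illustrative only: compare at most N wide characters and return a
   negative, zero, or positive value.  The characters are compared as
   wchar_t with < instead of being subtracted, so no overflow can occur.  */
int sketch_wcsncmp(const wchar_t *s1, const wchar_t *s2, size_t n)
{
  assert(s1 != NULL && s2 != NULL);

  while (n-- > 0)
    {
      wchar_t c1 = *s1++;
      wchar_t c2 = *s2++;
      if (c1 != c2)
        return c1 < c2 ? -1 : 1;
      if (c1 == L'\0')
        break;  /* both strings ended at the same position */
    }
  return 0;
}
```

As in the updated tests, callers should rely only on the sign of the result, not on its exact value.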
Stefan Liebler de8aadd52c Set errno for log1p on pole/domain error.
According to bug 6792, errno is not set to ERANGE/EDOM
by calling log1p/log1pf/log1pl with x = -1 or x < -1.

This patch adds a wrapper which sets errno in those cases
and returns the value of the existing __log1p function.
The log1p is now an alias to the wrapper function
instead of __log1p.

The files in sysdeps reflect these changes.
The ia64 implementation sets errno by itself,
thus the wrapper-file is empty.

The libm-test is adjusted for log1p-tests to check errno.

	[BZ #6792]
	* math/w_log1p.c: New file.
	* math/w_log1pf.c: Likewise.
	* math/w_log1pl.c: Likewise.
	* math/Makefile (libm-calls): Add w_log1p.
	* math/s_log1pl.c (log1pl): Remove weak_alias.
	* sysdeps/i386/fpu/s_log1p.S (log1p): Likewise.
	* sysdeps/i386/fpu/s_log1pf.S (log1pf): Likewise.
	* sysdeps/i386/fpu/s_log1pl.S (log1pl): Likewise.
	* sysdeps/x86_64/fpu/s_log1pl.S (log1pl): Likewise.
	* sysdeps/ieee754/dbl-64/s_log1p.c (log1p): Likewise.
	[NO_LONG_DOUBLE] (log1pl): Likewise.
	* sysdeps/ieee754/flt-32/s_log1pf.c (log1pf): Likewise.
	* sysdeps/ieee754/ldbl-128/s_log1pl.c (log1pl): Likewise.
	* sysdeps/ieee754/ldbl-64-128/s_log1pl.c
	(log1p): Remove long_double_symbol.
	* sysdeps/ieee754/ldbl-128ibm/s_log1pl.c (log1pl): Likewise.
	* sysdeps/ieee754/ldbl-64-128/w_log1pl.c: New file.
	* sysdeps/ieee754/ldbl-128ibm/w_log1pl.c: Likewise.
	* sysdeps/m68k/m680x0/fpu/s_log1p.c: Define empty weak_alias to
	remove weak_alias for corresponding log1p function.
	* sysdeps/m68k/m680x0/fpu/s_log1pf.c: Likewise.
	* sysdeps/m68k/m680x0/fpu/s_log1pl.c: Likewise.
	* sysdeps/ia64/fpu/w_log1p.c: New file.
	* sysdeps/ia64/fpu/w_log1pf.c: Likewise.
	* sysdeps/ia64/fpu/w_log1pl.c: Likewise.
	* math/libm-test.inc (log1p_test_data):	Add errno expectations.
2015-04-13 21:19:27 +02:00
Stefan Liebler 7378b1f8f8 Update tst_mbrlen/tst_mbrtowc for mblen change
Commit 9781a37002 changed the expected result of mbrlen to -2 when
n=0 is passed. The initialization of tst_mbrlen_loc and tst_mbrtowc
should be updated accordingly.

	* tests-mbwc/dat_mbrlen.c (tst_mbrlen_loc): Change expected
	result to -2 in case of n == 0.
	* tests-mbwc/tst_mbrtowc.c (tst_mbrtowc): Check result against
	-2 instead of 0.
2015-04-10 15:45:53 -07:00
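A small probe for the behaviour the tests in commit 7378b1f8f8 above now expect, assuming a glibc that includes commit 9781a37002. The function name probe_zero_length is hypothetical. With n == 0 no bytes can be examined, so the call is expected to report an incomplete character, (size_t) -2, rather than 0.

```c
#include <assert.h>
#include <string.h>
#include <wchar.h>

/* Call mbrlen on S with a length of zero and a fresh conversion state,
   and return whatever the C library reports.  */
size_t probe_zero_length(const char *s)
{
  assert(s != NULL);

  mbstate_t state;
  memset(&state, 0, sizeof state);  /* initial conversion state */
  return mbrlen(s, 0, &state);
}
```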
Joseph Myers 5556d30cae Fix strtof decimal rounding close to half least subnormal (bug 18247).
Bug 18247 is an off-by-one error in strtof's determination of a
decimal exponent such that any value with that decimal exponent is at
most half the least subnormal and so the appropriate underflowing
value for the rounding mode can be determined with no
multiple-precision computations.  (Whether the value is in fact safe
despite the off-by-one depends on the floating-point format in
question.  It's wrong for float and for m68k ldbl-96 but not for other
supported formats.)  This patch corrects the computation of the
exponent in question to be safe in general, adding a comment
explaining the new computation.

Tested for x86_64.

	[BZ #18247]
	* stdlib/strtod_l.c (____STRTOF_INTERNAL): Decrease minimum
	decimal exponent by 1.
	* stdlib/tst-strtod-round-data: Add more tests.
	* stdlib/tst-strtod-round.c (tests): Regenerated.
2015-04-10 20:45:30 +00:00
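To make the region affected by commit 5556d30cae above concrete, the sketch below parses decimal strings with strtof and records whether a range error was reported. The inputs in the usage note are chosen for illustration and are not the test cases added by the commit; the helper name parse_float is hypothetical. The least positive subnormal float is 2^-149 (about 1.4e-45), so half of it is about 7.0e-46.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Parse TEXT as a float.  If RANGE_ERROR is non-null, store 1 there when
   strtof reported ERANGE and 0 otherwise.  */
float parse_float(const char *text, int *range_error)
{
  assert(text != NULL);

  errno = 0;
  float value = strtof(text, NULL);
  if (range_error != NULL)
    *range_error = (errno == ERANGE);
  return value;
}
```

In the default round-to-nearest mode, "1e-45" lies above half the least subnormal and is expected to round up to 2^-149, while "1e-46" lies below it and is expected to round to zero.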
Joseph Myers b3c66c534f Add more tests of clog and clog10.
This patch adds some randomly-generated tests of clog and clog10 that
are observed to increase ulps on x86_64.

Tested for x86_64 and x86 and ulps updated accordingly.

	* math/auto-libm-test-in: Add more tests of clog and clog10.
	* math/auto-libm-test-out: Regenerated.
	* sysdeps/i386/fpu/libm-test-ulps: Update.
	* sysdeps/x86_64/fpu/libm-test-ulps: Likewise.
2015-04-09 22:14:34 +00:00
Roland McGrath 8a257e2cb5 Omit libc-modules.h for all .v.i files. 2015-04-09 14:42:29 -07:00
Roland McGrath 054392910b Let non-add-on preconfigure scripts set libc_config_ok. 2015-04-09 13:55:11 -07:00
Roland McGrath b0b88abc1c Make test-skeleton.c grok TEST_DIRECT magic environment variable. 2015-04-09 11:15:17 -07:00
Florian Weimer 2902af1631 scratch_buffer: Suppress truncation warning on 32-bit 2015-04-09 17:12:42 +02:00
David S. Miller 23ebf74307 Update SPARC ulps.
* sysdeps/sparc/fpu/libm-test-ulps: Update.
2015-04-08 20:34:49 -07:00
Joseph Myers 787d22bce6 Add more tests of atanh.
This patch adds some randomly-generated tests of atanh that are
observed to increase ulps on x86_64.

Tested for x86_64 and x86 and ulps updated accordingly.

	* math/auto-libm-test-in: Add more tests of atanh.
	* math/auto-libm-test-out: Regenerated.
	* sysdeps/i386/fpu/libm-test-ulps: Update.
	* sysdeps/x86_64/fpu/libm-test-ulps: Likewise.
2015-04-08 21:13:35 +00:00
Joseph Myers 024bcc5106 Add more tests of atan.
This patch adds some randomly-generated tests of atan that are
observed to increase ulps on x86_64.

Tested for x86_64 and x86 and ulps updated accordingly.

	* math/auto-libm-test-in: Add more tests of atan.
	* math/auto-libm-test-out: Regenerated.
	* sysdeps/i386/fpu/libm-test-ulps: Update.
	* sysdeps/x86_64/fpu/libm-test-ulps: Likewise.
2015-04-08 21:00:03 +00:00
Florian Weimer 561052ad35 nscd_getgr_r: Use struct scratch_buffer instead of extend_alloca
The lack of alloca accounting means that the old code could run out of
stack space if multiple retries are needed.
2015-04-08 21:08:03 +02:00
Florian Weimer c6ee40da8b getnameinfo: Use struct scratch_buffer instead of extend_alloca
This patch adjusts the internal function nrl_domainname, too.
2015-04-08 21:07:44 +02:00
Florian Weimer 794a74af4d _nss_compat_initgroups_dyn: Use struct scratch_buffer instead of extend_alloca 2015-04-08 21:07:24 +02:00
Florian Weimer 866ba63b31 grp: Rewrite to use struct scratch_buffer instead of extend_alloca
grp/compat-initgroups.c is included from nscd/initgrcache.c, which is
why the #include directive has to be added there as well.
2015-04-08 21:07:03 +02:00
Florian Weimer 7b8399f479 pldd: Use struct scratch_buffer instead of extend_alloca 2015-04-08 21:06:49 +02:00
Joseph Myers da0cf658c6 Add more tests of cbrt.
This patch adds some randomly-generated tests of cbrt that are
observed to increase ulps on x86_64.

Tested for x86_64 and x86 and ulps updated accordingly.

	* math/auto-libm-test-in: Add more tests of cbrt.
	* math/auto-libm-test-out: Regenerated.
	* sysdeps/x86_64/fpu/libm-test-ulps: Update.
2015-04-08 17:56:15 +00:00
Joseph Myers 80352c01c1 Add more tests of cabs.
This patch adds some randomly-generated tests of cabs that are
observed to increase ulps on x86_64.

Tested for x86_64 and x86 and ulps updated accordingly.

	* math/auto-libm-test-in: Add more tests of cabs.
	* math/auto-libm-test-out: Regenerated.
	* sysdeps/i386/fpu/libm-test-ulps: Update.
	* sysdeps/x86_64/fpu/libm-test-ulps: Likewise.
2015-04-08 17:46:07 +00:00
Joseph Myers 8431838dde Fix dbl-64 atan2 in non-default rounding modes (bug 18210, bug 18211).
The dbl-64 implementation of atan2 does computations that expect to
run in round-to-nearest mode, and in other modes the errors can
accumulate to more than the maximum accepted 9ulp.  This patch makes
it use FE_TONEAREST internally, similar to other functions with such
issues.  Tests that previously produced large errors are added for
atan2 and the closely related carg, clog and clog10 functions.

Tested for x86_64 and x86 and ulps updated accordingly.

	[BZ #18210]
	[BZ #18211]
	* sysdeps/ieee754/dbl-64/e_atan2.c: Include <fenv.h>.
	(__ieee754_atan2): Set FE_TONEAREST mode for internal
	computations.
	* math/auto-libm-test-in: Add more tests of atan2, carg, clog and
	clog10.
	* math/auto-libm-test-out: Regenerated.
	* sysdeps/i386/fpu/libm-test-ulps: Update.
	* sysdeps/x86_64/fpu/libm-test-ulps: Likewise.
2015-04-08 17:32:17 +00:00
Joseph Myers ae63c7ebed Fix dbl-64 atan in non-default rounding modes (bug 18197).
The dbl-64 implementation of atan does computations that expect to run
in round-to-nearest mode, and in other modes the errors can accumulate
to more than the maximum accepted 9ulp.  This patch makes it use
FE_TONEAREST internally, similar to other functions with such issues.

Tested for x86_64 and x86; no ulps updates needed.

	[BZ #18197]
	* sysdeps/ieee754/dbl-64/s_atan.c: Include <fenv.h>.
	(atan): Set FE_TONEAREST mode for internal computations.
	* math/auto-libm-test-in: Add more tests of atan.
	* math/auto-libm-test-out: Regenerated.
2015-04-08 17:14:12 +00:00
James Cowgill d5856d06c3 [BZ #17930] MIPS: Define SHM_NORESERVE.
[BZ #17930]
	* sysdeps/unix/sysv/linux/mips/bits/shm.h (SHM_NORESERVE): Define.
2015-04-07 17:23:54 +00:00
Florian Weimer 72301304a5 scratch_buffer_grow_preserve: Add missing #include <string.h> 2015-04-07 17:46:58 +02:00
Florian Weimer cfcfd4614b Add struct scratch_buffer and its internal helper functions
These will be used from NSS modules, so they have to be exported.
2015-04-07 11:03:43 +02:00
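The scratch_buffer commits above replace extend_alloca, where repeated retries can keep consuming stack, with a buffer that starts in a fixed on-stack area and moves to the heap when it has to grow. The sketch below illustrates that idea only; struct sketch_buffer, its functions, and the 1024-byte inline size are assumptions made for this example and are not glibc's internal definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the idea behind struct scratch_buffer.  */
struct sketch_buffer
{
  void *data;              /* current storage: inline_space or heap */
  size_t length;           /* usable bytes at data */
  char inline_space[1024]; /* fixed on-stack storage used first */
};

static void sketch_buffer_init(struct sketch_buffer *b)
{
  b->data = b->inline_space;
  b->length = sizeof b->inline_space;
}

static void sketch_buffer_free(struct sketch_buffer *b)
{
  if (b->data != (void *) b->inline_space)
    free(b->data);
}

/* Double the capacity on the heap, discarding the old contents.
   On failure the buffer falls back to its inline storage.  */
static bool sketch_buffer_grow(struct sketch_buffer *b)
{
  size_t new_length = b->length * 2;
  sketch_buffer_free(b);
  void *p = new_length > b->length ? malloc(new_length) : NULL;
  if (p == NULL)
    {
      sketch_buffer_init(b);
      return false;
    }
  b->data = p;
  b->length = new_length;
  return true;
}

/* Retry loop in the style of callers that see "buffer too small":
   grow until NEEDED bytes fit, then report the capacity reached
   (0 on allocation failure).  Stack use stays fixed across retries.  */
size_t capacity_for(size_t needed)
{
  assert(needed > 0);

  struct sketch_buffer b;
  sketch_buffer_init(&b);
  while (b.length < needed)
    if (!sketch_buffer_grow(&b))
      {
        sketch_buffer_free(&b);
        return 0;
      }
  size_t reached = b.length;
  sketch_buffer_free(&b);
  return reached;
}
```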
Richard Henderson 9e8c0381bb math/test-fenvinline: Cast fe_exc to unsigned int before printing
On Alpha and IA-64, fexcept_t is unsigned long.  But all the values
fit within an int, so the cast is ok for printing.  All other hosts
use unsigned int or unsigned short already.
2015-04-06 10:43:59 -07:00
Richard Henderson 974c4a36d8 alpha: Update libm-test-ulps
Regenerated from scratch.
2015-04-06 10:38:16 -07:00
Richard Henderson cc47c82476 alpha: Unconditionally include dl-sysdep.h in sysdep.h
Fixes a -Wundef error wrt RTLD_PRIVATE_ERRNO.
2015-04-06 10:36:44 -07:00
Ondřej Bílka 9781a37002 Handle mblen return code when n is zero. 2015-04-03 15:47:12 +02:00
Florian Weimer 37d60d970c Define libc_max_align_t for internal use 2015-04-02 19:55:21 +02:00
Andreas Schwab b763f6ae85 aarch64: Increase MINSIGSTKSZ and SIGSTKSZ (bug 16850) 2015-04-02 12:18:11 +02:00
Mel Gorman c26efef979 malloc: Consistently apply trim_threshold to all heaps [BZ #17195]
Trimming heaps is a balance between saving memory and the system overhead
required to update page tables and discard allocated pages. The malloc
option M_TRIM_THRESHOLD is a tunable that users are meant to use to decide
where this balance point is but it is only applied to the main arena.

For scalability reasons, glibc malloc has per-thread heaps but these are
shrunk with madvise() if there is one page free at the top of the heap.
In some circumstances this can lead to high system overhead if a thread
has a control flow like

    while (data_to_process) {
        buf = malloc(large_size);
        do_stuff();
        free(buf);
    }

For a large size, the free() will call madvise (pagetable teardown, page
free and TLB flush) every time followed immediately by a malloc (fault,
kernel page alloc, zeroing and charge accounting). The kernel overhead
can dominate such a workload.

This patch allows the user to tune when madvise gets called by applying
the trim threshold to the per-thread heaps and using similar logic to the
main arena when deciding whether to shrink. Alternatively if the dynamic
brk/mmap threshold gets adjusted then the new values will be obeyed by
the per-thread heaps.

Bug 17195 was a test case motivated by a problem encountered in scientific
applications written in python that perform badly due to high page fault
overhead. The basic operation of such a program was posted by Julian Taylor
https://sourceware.org/ml/libc-alpha/2015-02/msg00373.html

With this patch applied, the overhead is eliminated. All numbers in this
report are in seconds and were recorded by running Julian's program 30
times.

pyarray
                                 glibc               madvise
                                  2.21                    v2
System  min             1.81 (  0.00%)        0.00 (100.00%)
System  mean            1.93 (  0.00%)        0.02 ( 99.20%)
System  stddev          0.06 (  0.00%)        0.01 ( 88.99%)
System  max             2.06 (  0.00%)        0.03 ( 98.54%)
Elapsed min             3.26 (  0.00%)        2.37 ( 27.30%)
Elapsed mean            3.39 (  0.00%)        2.41 ( 28.84%)
Elapsed stddev          0.14 (  0.00%)        0.02 ( 82.73%)
Elapsed max             4.05 (  0.00%)        2.47 ( 39.01%)

               glibc     madvise
                2.21          v2
User          141.86      142.28
System         57.94        0.60
Elapsed       102.02       72.66

Note that almost a minute's worth of system time is eliminated and the
program completes 28% faster on average.

To illustrate the problem without python this is a basic test-case for
the worst case scenario where every free is a madvise followed by an alloc

/* gcc bench-free.c -lpthread -o bench-free */
#include <pthread.h>
#include <stdlib.h>

static int num = 1024;

void __attribute__((noinline,noclone)) dostuff (void *p)
{
}

void *worker (void *data)
{
  int i;

  for (i = num; i--;)
    {
      void *m = malloc (48*4096);
      dostuff (m);
      free (m);
    }

  return NULL;
}

int main()
{
  int i;
  pthread_t t;
  void *ret;
  if (pthread_create (&t, NULL, worker, NULL))
    exit (2);
  if (pthread_join (t, &ret))
    exit (3);
  return 0;
}

Before the patch, this resulted in 1024 calls to madvise. With the patch applied,
madvise is called twice because the default trim threshold is high enough to avoid
this.

This is a more complex case where there is a mix of frees. It's simply a different worker
function for the test case above

void *worker (void *data)
{
  int i;
  int j = 0;
  void *free_index[num];

  for (i = num; i--;)
    {
      void *m = malloc ((i % 58) *4096);
      dostuff (m);
      if (i % 2 == 0) {
        free (m);
      } else {
        free_index[j++] = m;
      }
    }
  /* j is one past the last stored pointer, so step back before freeing. */
  for (j--; j >= 0; j--)
    {
      free(free_index[j]);
    }

  return NULL;
}

glibc 2.21 calls madvise 90305 times but with the patch applied, it's
called 13438. Increasing the trim threshold will decrease the number of
times it's called with the option of eliminating the overhead.

ebizzy is meant to generate a workload resembling common web application
server workloads. It is threaded with a large working set that at its core
has an allocation, do_stuff, free loop that also hits this case. The primary
metric of the benchmark is records processed per second. This is running on
my desktop which is a single socket machine with an I7-4770 and 8 cores.
Each thread count was run for 30 seconds. It was only run once as the
performance difference is so high that the variation is insignificant.

                glibc 2.21              patch
threads 1            10230              44114
threads 2            19153              84925
threads 4            34295             134569
threads 8            51007             183387

Note that the saving happens to be a coincidence as the size allocated
by ebizzy was less than the default threshold. If a different number of
chunks were specified then it may also be necessary to tune the threshold
to compensate.

This is roughly quadrupling the performance of this benchmark. The difference in
system CPU usage illustrates why.

ebizzy running 1 thread with glibc 2.21
10230 records/s 306904
real 30.00 s
user  7.47 s
sys  22.49 s

22.49 seconds was spent in the kernel for a workload running 30 seconds. With the
patch applied

ebizzy running 1 thread with patch applied
44126 records/s 1323792
real 30.00 s
user 29.97 s
sys   0.00 s

system CPU usage was zero with the patch applied. strace shows that glibc
running this workload calls madvise approximately 9000 times a second. With
the patch applied madvise was called twice during the workload (or 0.06
times per second).

2015-02-10  Mel Gorman  <mgorman@suse.de>

  [BZ #17195]
  * malloc/arena.c (free): Apply trim threshold to per-thread heaps
    as well as the main arena.
2015-04-02 12:14:14 +05:30
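The malloc commit c26efef979 above refers to M_TRIM_THRESHOLD as the tunable that decides when free memory is handed back to the kernel. As a hedged sketch, an application using glibc could raise it with mallopt from <malloc.h>; the wrapper name set_trim_threshold and the 64 MiB value in the usage note are arbitrary choices for illustration, not recommendations from the commit.

```c
#include <assert.h>
#include <limits.h>
#include <malloc.h>
#include <stddef.h>

/* Assumption: glibc, which declares mallopt and M_TRIM_THRESHOLD in
   <malloc.h>.  Returns 1 on success and 0 on error, as mallopt does.  */
int set_trim_threshold(size_t bytes)
{
  assert(bytes <= INT_MAX);  /* mallopt takes an int value */
  return mallopt(M_TRIM_THRESHOLD, (int) bytes);
}
```

For example, set_trim_threshold (64 * 1024 * 1024) asks malloc to keep up to 64 MiB of free memory at the top of a heap before trimming. With the patch described above, that threshold is applied to per-thread heaps as well as the main arena.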
H.J. Lu a3d9ab5070 Limit threads sharing L2 cache to 2 for SLM/KNL
Silvermont and Knights Landing have a modular system design with two cores
sharing an L2 cache.  If more than 2 cores are detected to share L2 cache,
it should be adjusted for Silvermont and Knights Landing.

	[BZ #18185]
	* sysdeps/x86_64/cacheinfo.c (init_cacheinfo): Limit threads
	sharing L2 cache to 2 for Silvermont/Knights Landing.
2015-03-31 13:18:10 -07:00