glibc

Commit Graph

Author	SHA1	Message	Date
Wilco Dijkstra	c3d466cba1	Remove slow paths from pow Remove the slow paths from pow. Like several other double precision math functions, pow is exactly rounded. This is not required from math functions and causes major overheads as it requires multiple fallbacks using higher precision arithmetic if a result is close to 0.5ULP. Ridiculous slowdowns of up to 100000x have been reported when the highest precision path triggers. All GLIBC math tests pass on AArch64 and x64 (with ULP of pow set to 1). The worst case error is ~0.506ULP. A simple test over a few hundred million values shows pow is 10% faster on average. This fixes BZ #13932. [BZ #13932] * sysdeps/ieee754/dbl-64/uexp.h (err_1): Remove. * benchtests/pow-inputs: Update comment for slow path cases. * manual/probes.texi (slowpow_p10): Delete removed probe. (slowpow_p10): Likewise. * math/Makefile: Remove halfulp.c and slowpow.c. * sysdeps/aarch64/libm-test-ulps: Set ULP of pow to 1. * sysdeps/generic/math_private.h (__exp1): Remove error argument. (__halfulp): Remove. (__slowpow): Remove. * sysdeps/i386/fpu/halfulp.c: Delete file. * sysdeps/i386/fpu/slowpow.c: Likewise. * sysdeps/ia64/fpu/halfulp.c: Likewise. * sysdeps/ia64/fpu/slowpow.c: Likewise. * sysdeps/ieee754/dbl-64/e_exp.c (__exp1): Remove error argument, improve comments and add error analysis. * sysdeps/ieee754/dbl-64/e_pow.c (__ieee754_pow): Add error analysis. (power1): Remove function: (log1): Remove error argument, add error analysis. (my_log2): Remove function. * sysdeps/ieee754/dbl-64/halfulp.c: Delete file. * sysdeps/ieee754/dbl-64/slowpow.c: Likewise. * sysdeps/m68k/m680x0/fpu/halfulp.c: Likewise. * sysdeps/m68k/m680x0/fpu/slowpow.c: Likewise. * sysdeps/powerpc/power4/fpu/Makefile: Remove CPPFLAGS-slowpow.c. * sysdeps/x86_64/fpu/libm-test-ulps: Set ULP of pow to 1. * sysdeps/x86_64/fpu/multiarch/Makefile: Remove slowpow-fma.c, slowpow-fma4.c, halfulp-fma.c, halfulp-fma4.c. * sysdeps/x86_64/fpu/multiarch/e_pow-fma.c (__slowpow): Remove define. * sysdeps/x86_64/fpu/multiarch/e_pow-fma4.c (__slowpow): Likewise. * sysdeps/x86_64/fpu/multiarch/halfulp-fma.c: Delete file. * sysdeps/x86_64/fpu/multiarch/halfulp-fma4.c: Likewise. * sysdeps/x86_64/fpu/multiarch/slowpow-fma.c: Likewise. * sysdeps/x86_64/fpu/multiarch/slowpow-fma4.c: Likewise.	2018-02-12 10:47:09 +00:00
Siddhesh Poyarekar	dfa1b402a0	Benchmark inputs for pow These inputs cover all normal processing paths for pow including all its slow paths.	2013-10-28 16:36:46 +05:30
Siddhesh Poyarekar	a357259bf8	Add more directives to benchmark input files This patch adds some more directives to the benchmark inputs file, moving functionality from the Makefile and making the code generation script a bit cleaner. The function argument and return types that were earlier added as variables in the makefile and passed to the script via command line arguments are now the 'args' and 'ret' directive respectively. 'args' should be a colon separated list of argument types (skipped if the function doesn't accept any arguments) and 'ret' should be the return type. Additionally, an 'includes' directive may have a comma separated list of headers to include in the source. For example, the pow input file now looks like this: 42.0, 42.0 1.0000000000000020, 1.5 I did this to unclutter the benchtests Makefile a bit and eventually eliminate dependency of the tests on the Makefile and have tests depend on their respective include files only.	2013-10-07 11:51:25 +05:30
Siddhesh Poyarekar	f0ee064b7d	Allow multiple input domains to be run in the same benchmark program Some math functions have distinct performance characteristics in specific domains of inputs, where some inputs return via a fast path while other inputs require multiple precision calculations, that too at different precision levels. The way to implement different domains was to have a separate source file and benchmark definition, resulting in separate programs. This clutters up the benchmark, so this change allows these domains to be consolidated into the same input file. To do this, the input file format is now enhanced to allow comments with a preceding # and directives with two # at the begining of a line. A directive that looks like: tells the benchmark generation script that what follows is a different domain of inputs. The value of the 'name' directive (in this case, foo) is used in the output. The two input domains are then executed sequentially and their results collated separately. with the above directive, there would be two lines in the result that look like: func(): .... func(foo): ...	2013-04-30 14:17:57 +05:30
Siddhesh Poyarekar	81f311c2ee	Add benchmark tests for slowpow and slowexp Separate benchmarks for the fast and slow implementations of pow and exp since measuring both together doesn't make sense. Adjust the iterations for pow and exp accordingly so that they run long enough for the measurements to be meaningful.	2013-04-02 17:45:45 +05:30
Siddhesh Poyarekar	8cfdb7e056	Framework for performance benchmarking of functions See benchtests/Makefile to know how to use it.	2013-03-15 12:30:03 +05:30

6 Commits