glibc/sysdeps/i386/fpu
Szabolcs Nagy 72aa623345 Optimized generic expf and exp2f with wrappers
Based on new expf and exp2f code from
https://github.com/ARM-software/optimized-routines/

with wrapper on aarch64:
expf reciprocal-throughput: 2.3x faster
expf latency: 1.7x faster
without wrapper on aarch64:
expf reciprocal-throughput: 3.3x faster
expf latency: 1.7x faster
without wrapper on aarch64:
exp2f reciprocal-throughput: 2.8x faster
exp2f latency: 1.3x faster
libm.so size on aarch64:
.text size: -152 bytes
.rodata size: -1740 bytes
expf/exp2f worst case nearest rounding error: 0.502 ulp
worst case non-nearest rounding error: 1 ulp

Error checks are inline and errno setting is in separate tail called
functions, but the wrappers are kept in this patch to handle the
_LIB_VERSION==_SVID_ case.  (So e.g. errno is set twice for expf calls
and once for __expf_finite calls on targets where the new code is used.)

Double precision arithmetics is used which is expected to be faster on
most targets (including soft-float) than using single precision and it
is easier to get good precision result with it.

Const data is kept in a separate translation unit which complicates
maintenance a bit, but is expected to give good code for literal loads
on most targets and allows sharing data across expf, exp2f and powf.
(This data is disabled on i386, m68k and ia64 which have their own
expf, exp2f and powf code.)

Some details may need target specific tweaks:
- best convert and round to int operation in the arg reduction may be
different across targets.
- code was optimized on fma target, optimal polynomial eval may be
different without fma.
- gcc does not always generate good code for fp bit representation
access via unions or it may be inherently slow on some targets.

The libm-test-ulps will need adjustment because..
- The argument reduction ideally uses nearest rounded rint, but that is
not efficient on most targets, so the polynomial can get evaluated on a
wider interval in non-nearest rounding mode making 1 ulp errors common
in that case.
- The polynomial is evaluated such that it may have 1 ulp error on
negative tiny inputs with upward rounding.

	* math/Makefile (type-float-routines): Add math_errf and e_exp2f_data.
	* sysdeps/aarch64/fpu/math_private.h (TOINT_INTRINSICS): Define.
	(roundtoint, converttoint): Likewise.
	* sysdeps/ieee754/flt-32/e_expf.c: New implementation.
	* sysdeps/ieee754/flt-32/e_exp2f.c: New implementation.
	* sysdeps/ieee754/flt-32/e_exp2f_data.c: New file.
	* sysdeps/ieee754/flt-32/math_config.h: New file.
	* sysdeps/ieee754/flt-32/math_errf.c: New file.
	* sysdeps/ieee754/flt-32/t_exp2f.h: Remove.
	* sysdeps/i386/fpu/e_exp2f_data.c: New file.
	* sysdeps/i386/fpu/math_errf.c: New file.
	* sysdeps/ia64/fpu/e_exp2f_data.c: New file.
	* sysdeps/ia64/fpu/math_errf.c: New file.
	* sysdeps/m68k/m680x0/fpu/e_exp2f_data.c: New file.
	* sysdeps/m68k/m680x0/fpu/math_errf.c: New file.
2017-09-25 10:44:39 +01:00
..
Implies Use x86_64 fpu/bits/fenv.h for i386 and x86_64 2012-06-06 10:13:19 -07:00
Versions
doasin.c
e_acos.S Fix x86 acos near 1 (bug 13942). 2012-04-30 18:56:39 +00:00
e_acosf.S Fix acos (-1) in round-downwards mode on x86 (bug 14034). 2012-04-30 09:38:06 +00:00
e_acosh.S Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
e_acoshf.S Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
e_acoshl.S Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
e_acosl.c Fix x86 acos near 1 (bug 13942). 2012-04-30 18:56:39 +00:00
e_asin.S Refactor i386 libm code forcing underflow exceptions. 2015-09-24 21:41:00 +00:00
e_asinf.S Refactor i386 libm code forcing underflow exceptions. 2015-09-24 21:41:00 +00:00
e_atan2.S Refactor i386 libm code forcing underflow exceptions. 2015-09-24 21:41:00 +00:00
e_atan2f.S Refactor i386 libm code forcing underflow exceptions. 2015-09-24 21:41:00 +00:00
e_atan2l.c Optimize libm 2011-10-12 11:27:51 -04:00
e_atanh.S Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
e_atanhf.S Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
e_atanhl.S Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
e_exp.S Avoid excess range in results from i386 exp, hypot, pow functions (bug 18980). 2015-09-18 21:53:22 +00:00
e_exp2.S Avoid excess range in results from i386 exp, hypot, pow functions (bug 18980). 2015-09-18 21:53:22 +00:00
e_exp2f.S Avoid excess range in results from i386 exp, hypot, pow functions (bug 18980). 2015-09-18 21:53:22 +00:00
e_exp2f_data.c Optimized generic expf and exp2f with wrappers 2017-09-25 10:44:39 +01:00
e_exp2l.S Refactor i386 libm code forcing underflow exceptions. 2015-09-24 21:41:00 +00:00
e_exp10.S Avoid excess range in results from i386 exp, hypot, pow functions (bug 18980). 2015-09-18 21:53:22 +00:00
e_exp10f.S Avoid excess range in results from i386 exp, hypot, pow functions (bug 18980). 2015-09-18 21:53:22 +00:00
e_exp10l.S Fix exp10 inaccuracy and exceptions (bugs 13884, 13914). 2012-05-06 18:23:44 +00:00
e_expf.S Avoid excess range in results from i386 exp, hypot, pow functions (bug 18980). 2015-09-18 21:53:22 +00:00
e_expl.S Fix i386/x86_64 expl, exp10l, expm1l for sNaN input (bug 20226). 2016-06-08 21:55:06 +00:00
e_fmod.S Optimize libm 2011-10-12 11:27:51 -04:00
e_fmodf.S Optimize libm 2011-10-12 11:27:51 -04:00
e_fmodl.c Optimize libm 2011-10-12 11:27:51 -04:00
e_hypot.S Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
e_hypotf.S Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
e_ilogb.S Remove useless __ilogb*_finite aliases 2012-04-18 00:40:13 +02:00
e_ilogbf.S Remove useless __ilogb*_finite aliases 2012-04-18 00:40:13 +02:00
e_ilogbl.S Remove useless __ilogb*_finite aliases 2012-04-18 00:40:13 +02:00
e_log.S Fix i386/x86_64 log* (1) zero sign for -ffinite-math-only (bug 19213). 2015-11-05 21:56:31 +00:00
e_log2.S Fix log2 (1) in round-downward mode (bug 17042). 2014-06-10 12:07:15 +00:00
e_log2f.S Fix log2 (1) in round-downward mode (bug 17042). 2014-06-10 12:07:15 +00:00
e_log2l.S Fix i386/x86_64 log2l (sNaN) (bug 20235). 2016-06-09 18:04:30 +00:00
e_log10.S Fix log10 (1) in round-downward mode (bug 16977). 2014-05-23 12:07:50 +00:00
e_log10f.S Fix log10 (1) in round-downward mode (bug 16977). 2014-05-23 12:07:50 +00:00
e_log10l.S Fix i386/x86_64 log10l (sNaN) (bug 20228). 2016-06-08 22:59:18 +00:00
e_logf.S Fix i386/x86_64 log* (1) zero sign for -ffinite-math-only (bug 19213). 2015-11-05 21:56:31 +00:00
e_logl.S Fix i386/x86_64 logl (sNaN) (bug 20227). 2016-06-08 22:24:06 +00:00
e_pow.S Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
e_powf.S Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
e_powl.S Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
e_rem_pio2.c
e_remainder.S Optimize libm 2011-10-12 11:27:51 -04:00
e_remainderf.S Optimize libm 2011-10-12 11:27:51 -04:00
e_remainderl.S Optimize libm 2011-10-12 11:27:51 -04:00
e_scalb.S Avoid excess range in results from i386 scalb functions (bug 18981). 2015-09-18 20:34:59 +00:00
e_scalbf.S Avoid excess range in results from i386 scalb functions (bug 18981). 2015-09-18 20:34:59 +00:00
e_scalbl.S Fix i386/x86_64 scalbl with sNaN input (bug 20296). 2016-06-23 22:17:41 +00:00
e_sqrt.S Fix x86 sqrt rounding (bug 14032). 2013-11-29 16:31:16 +00:00
e_sqrtf.S Optimize libm 2011-10-12 11:27:51 -04:00
e_sqrtl.c Optimize libm 2011-10-12 11:27:51 -04:00
fclrexcpt.c Check if SSE is available with HAS_CPU_FEATURE 2017-04-07 07:44:59 -07:00
fedisblxcpt.c Check if SSE is available with HAS_CPU_FEATURE 2017-04-07 07:44:59 -07:00
feenablxcpt.c Check if SSE is available with HAS_CPU_FEATURE 2017-04-07 07:44:59 -07:00
fegetenv.c Check if SSE is available with HAS_CPU_FEATURE 2017-04-07 07:44:59 -07:00
fegetexcept.c Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
fegetmode.c Check if SSE is available with HAS_CPU_FEATURE 2017-04-07 07:44:59 -07:00
fegetround.c Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
feholdexcpt.c Check if SSE is available with HAS_CPU_FEATURE 2017-04-07 07:44:59 -07:00
fenv_private.h Add float128 support for x86_64, x86. 2017-06-26 22:02:24 +00:00
fesetenv.c Check if SSE is available with HAS_CPU_FEATURE 2017-04-07 07:44:59 -07:00
fesetexcept.c Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
fesetmode.c Check if SSE is available with HAS_CPU_FEATURE 2017-04-07 07:44:59 -07:00
fesetround.c Check if SSE is available with HAS_CPU_FEATURE 2017-04-07 07:44:59 -07:00
feupdateenv.c Check if SSE is available with HAS_CPU_FEATURE 2017-04-07 07:44:59 -07:00
fgetexcptflg.c Check if SSE is available with HAS_CPU_FEATURE 2017-04-07 07:44:59 -07:00
fraiseexcpt.c Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
fsetexcptflg.c Check if SSE is available with HAS_CPU_FEATURE 2017-04-07 07:44:59 -07:00
ftestexcept.c Check if SSE is available with HAS_CPU_FEATURE 2017-04-07 07:44:59 -07:00
halfulp.c
i386-math-asm.h Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
libm-test-ulps Obsolete pow10 functions. 2017-09-01 21:13:18 +00:00
libm-test-ulps-name Do not hardcode platform names in manual/libm-err-tab.pl (bug 14139). 2016-11-04 16:49:06 +00:00
math-tests.h Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
math_errf.c Optimized generic expf and exp2f with wrappers 2017-09-25 10:44:39 +01:00
math_private.h Fix math_private.h multiple include guards. 2015-11-20 23:46:23 +00:00
mpatan.c
mpatan2.c
mpexp.c
mplog.c
mpsqrt.c
s_asinh.S Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
s_asinhf.S Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
s_asinhl.S Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
s_atan.S Refactor i386 libm code forcing underflow exceptions. 2015-09-24 21:41:00 +00:00
s_atanf.S Refactor i386 libm code forcing underflow exceptions. 2015-09-24 21:41:00 +00:00
s_atanl.c
s_cbrt.S Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
s_cbrtf.S Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
s_cbrtl.S Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
s_ceil.S Avoid "inexact" exceptions in i386/x86_64 ceil functions (bug 15479). 2016-06-27 17:24:30 +00:00
s_ceilf.S Avoid "inexact" exceptions in i386/x86_64 ceil functions (bug 15479). 2016-06-27 17:24:30 +00:00
s_ceill.S Avoid "inexact" exceptions in i386/x86_64 ceil functions (bug 15479). 2016-06-27 17:24:30 +00:00
s_copysign.S
s_copysignf.S
s_copysignl.S
s_expm1.S Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
s_expm1f.S Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
s_expm1l.S Fix x86/x86_64 expm1l inaccuracy and exceptions (bugs 13885, 13923). 2012-05-07 19:13:08 +00:00
s_fabs.S
s_fabsf.S
s_fabsl.S
s_fdim.c Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
s_finite.S
s_finitef.S
s_finitel.S
s_floor.S Avoid "inexact" exceptions in i386/x86_64 floor functions (bug 15479). 2016-06-27 17:25:47 +00:00
s_floorf.S Avoid "inexact" exceptions in i386/x86_64 floor functions (bug 15479). 2016-06-27 17:25:47 +00:00
s_floorl.S Avoid "inexact" exceptions in i386/x86_64 floor functions (bug 15479). 2016-06-27 17:25:47 +00:00
s_fmax.S Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
s_fmaxf.S Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
s_fmaxl.S Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
s_fmin.S Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
s_fminf.S Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
s_fminl.S Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
s_fpclassifyl.c Consistently use uintN_t not u_intN_t in libm. 2017-08-03 19:55:04 +00:00
s_frexp.S Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
s_frexpf.S Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
s_frexpl.S Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
s_isinfl.c Use <> for math.h and math_private.h everywhere. 2012-03-09 16:09:10 -08:00
s_isnanl.c Consistently use uintN_t not u_intN_t in libm. 2017-08-03 19:55:04 +00:00
s_llrint.S Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
s_llrintf.S Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
s_llrintl.S Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
s_log1p.S Refactor i386 libm code forcing underflow exceptions. 2015-09-24 21:41:00 +00:00
s_log1pf.S Refactor i386 libm code forcing underflow exceptions. 2015-09-24 21:41:00 +00:00
s_log1pl.S Fix i386/x86_64 log1pl (sNaN) (bug 20229). 2016-06-08 23:11:42 +00:00
s_logb.S
s_logbf.S
s_logbl.c
s_lrint.S Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
s_lrintf.S Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
s_lrintl.S Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
s_nearbyint.S Simplify x86 nearbyint functions. 2016-06-22 15:40:30 +00:00
s_nearbyintf.S Simplify x86 nearbyint functions. 2016-06-22 15:40:30 +00:00
s_nearbyintl.S Simplify x86 nearbyint functions. 2016-06-22 15:40:30 +00:00
s_nextafterl.c Consistently use uintN_t not u_intN_t in libm. 2017-08-03 19:55:04 +00:00
s_nexttoward.c Consistently use uintN_t not u_intN_t in libm. 2017-08-03 19:55:04 +00:00
s_nexttowardf.c Consistently use uintN_t not u_intN_t in libm. 2017-08-03 19:55:04 +00:00
s_remquo.S Remove remaining bounded-pointers support from i386 .S files. 2013-02-21 22:21:52 +00:00
s_remquof.S Remove remaining bounded-pointers support from i386 .S files. 2013-02-21 22:21:52 +00:00
s_remquol.S Remove remaining bounded-pointers support from i386 .S files. 2013-02-21 22:21:52 +00:00
s_rint.S
s_rintf.S
s_rintl.c
s_scalbln.c
s_scalblnf.c
s_scalblnl.c
s_scalbn.S Avoid excess range in results from i386 scalb functions (bug 18981). 2015-09-18 20:34:59 +00:00
s_scalbnf.S Avoid excess range in results from i386 scalb functions (bug 18981). 2015-09-18 20:34:59 +00:00
s_scalbnl.S Make scalbn set errno (bug 6803). 2015-09-16 21:11:00 +00:00
s_significand.S
s_significandf.S
s_significandl.c
s_trunc.S Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
s_truncf.S Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
s_truncl.S Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
slowexp.c
slowpow.c
t_exp.c
w_sqrt.c Prefer new libm function wrappers for !LIBM_SVID_COMPAT. 2017-09-05 23:35:55 +00:00
w_sqrt_compat.c Move wrappers to libm-compat-calls-auto 2017-01-04 16:25:04 -02:00