Based on new expf and exp2f code from
https://github.com/ARM-software/optimized-routines/

with wrapper on aarch64:
  expf reciprocal-throughput: 2.3x faster
  expf latency: 1.7x faster
without wrapper on aarch64:
  expf reciprocal-throughput: 3.3x faster
  expf latency: 1.7x faster
  exp2f reciprocal-throughput: 2.8x faster
  exp2f latency: 1.3x faster
libm.so size on aarch64:
  .text size: -152 bytes
  .rodata size: -1740 bytes
expf/exp2f worst case nearest rounding error: 0.502 ulp
worst case non-nearest rounding error: 1 ulp
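
For readers unfamiliar with the unit, the following is a minimal sketch of how an error in ulp can be computed for a single-precision result against a higher-precision reference. The function name ulp_errorf is hypothetical and is not part of this patch; the sketch assumes the reference value is finite, nonzero and within the normal float range.

```c
#include <math.h>

/* Error of the float result `got` relative to the reference `want`,
   measured in units of the float ulp at `want`.
   Assumption: `want` is finite, nonzero and in the normal float range.  */
double
ulp_errorf (float got, double want)
{
  int e;
  frexp (want, &e);                 /* |want| = m * 2^e with 0.5 <= m < 1.  */
  double ulp = ldexp (1.0, e - 24); /* float has 24 significand bits.  */
  return fabs ((double) got - want) / ulp;
}
```

A correctly rounded result in nearest rounding mode has an error of at most 0.5 ulp by this measure.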
Error checks are inline and errno setting is done in separate tail-called
functions, but the wrappers are kept in this patch to handle the
_LIB_VERSION==_SVID_ case. (So e.g. errno is set twice for expf calls
and once for __expf_finite calls on targets where the new code is used.)
Double precision arithmetic is used, which is expected to be faster on
most targets (including soft-float) than single precision, and it
makes a precise result easier to obtain.
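
To illustrate why evaluating in double helps, here is a minimal sketch of an exp2f computed entirely in double and rounded to float once at the end. It is not the algorithm in this patch: it uses a plain degree 7 Taylor polynomial instead of the table-based scheme, the names pow2i and sketch_exp2f are hypothetical, and it assumes a finite input with -126 <= x < 128 and no special-case handling.

```c
#include <stdint.h>
#include <string.h>

/* Build the double 2^k from its exponent field.
   Valid for -1022 <= k <= 1023.  */
static double
pow2i (int k)
{
  uint64_t bits = (uint64_t) (1023 + k) << 52;
  double d;
  memcpy (&d, &bits, sizeof d);
  return d;
}

/* Sketch only: assumes finite x with -126 <= x < 128.  */
float
sketch_exp2f (float x)
{
  double xd = x;
  /* Argument reduction: x = k + r with |r| <= 0.5.  */
  int k = (int) (xd < 0 ? xd - 0.5 : xd + 0.5);
  /* 2^r = e^t with t = r * ln 2, so |t| <= ~0.347.  */
  double t = (xd - k) * 0x1.62e42fefa39efp-1;
  /* Degree 7 Taylor polynomial of e^t in Horner form.  */
  double p = 1.0 + t * (1.0 + t * (1.0 / 2 + t * (1.0 / 6
             + t * (1.0 / 24 + t * (1.0 / 120 + t * (1.0 / 720
             + t * (1.0 / 5040)))))));
  /* The only single precision rounding happens here.  */
  return (float) (p * pow2i (k));
}
```

The intermediate rounding errors are far below a float ulp, so the final conversion dominates the error.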
Const data is kept in a separate translation unit which complicates
maintenance a bit, but is expected to give good code for literal loads
on most targets and allows sharing data across expf, exp2f and powf.
(This data is disabled on i386, m68k and ia64 which have their own
expf, exp2f and powf code.)
Some details may need target specific tweaks:
- best convert and round to int operation in the arg reduction may be
different across targets.
- code was optimized on fma target, optimal polynomial eval may be
different without fma.
- gcc does not always generate good code for fp bit representation
access via unions or it may be inherently slow on some targets.
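
The first and last points above can be sketched in portable C as follows. The names sketch_asuint, sketch_top12 and sketch_roundtoint are hypothetical and chosen for illustration. The rounding helper assumes nearest rounding mode, |x| < 2^51 and double arithmetic without excess precision; memcpy is shown as an alternative to a union for reading the bit representation.

```c
#include <stdint.h>
#include <string.h>

/* Bit representation of a float, read with memcpy instead of a union.  */
static inline uint32_t
sketch_asuint (float f)
{
  uint32_t u;
  memcpy (&u, &f, sizeof u);
  return u;
}

/* Sign and exponent bits plus the top 3 mantissa bits, useful for
   cheap range checks on the input.  */
static inline uint32_t
sketch_top12 (float x)
{
  return sketch_asuint (x) >> 20;
}

/* Round to the nearest integer (ties to even) by adding and subtracting
   a large constant.  Assumes nearest rounding mode and |x| < 2^51.
   A target with a fast round instruction would use that instead.  */
static inline double
sketch_roundtoint (double x)
{
  const double shift = 0x1.8p52;
  volatile double kd = x + shift; /* volatile: keep the add from being folded.  */
  return kd - shift;
}
```

In a non-nearest rounding mode the add-and-subtract trick rounds in the current direction, which widens the interval on which the polynomial is evaluated.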
The libm-test-ulps will need adjustment because:
- The argument reduction ideally uses nearest rounded rint, but that is
not efficient on most targets, so the polynomial can get evaluated on a
wider interval in non-nearest rounding mode making 1 ulp errors common
in that case.
- The polynomial is evaluated such that it may have 1 ulp error on
negative tiny inputs with upward rounding.
* math/Makefile (type-float-routines): Add math_errf and e_exp2f_data.
* sysdeps/aarch64/fpu/math_private.h (TOINT_INTRINSICS): Define.
(roundtoint, converttoint): Likewise.
* sysdeps/ieee754/flt-32/e_expf.c: New implementation.
* sysdeps/ieee754/flt-32/e_exp2f.c: New implementation.
* sysdeps/ieee754/flt-32/e_exp2f_data.c: New file.
* sysdeps/ieee754/flt-32/math_config.h: New file.
* sysdeps/ieee754/flt-32/math_errf.c: New file.
* sysdeps/ieee754/flt-32/t_exp2f.h: Remove.
* sysdeps/i386/fpu/e_exp2f_data.c: New file.
* sysdeps/i386/fpu/math_errf.c: New file.
* sysdeps/ia64/fpu/e_exp2f_data.c: New file.
* sysdeps/ia64/fpu/math_errf.c: New file.
* sysdeps/m68k/m680x0/fpu/e_exp2f_data.c: New file.
* sysdeps/m68k/m680x0/fpu/math_errf.c: New file.
This directory contains the sources of the GNU C Library.
See the file "version.h" for what release version you have.
The GNU C Library is the standard system C library for all GNU systems,
and is an important part of what makes up a GNU system. It provides the
system API for all programs written in C and C-compatible languages such
as C++ and Objective C; the runtime facilities of other programming
languages use the C library to access the underlying operating system.
In GNU/Linux systems, the C library works with the Linux kernel to
implement the operating system behavior seen by user applications.
In GNU/Hurd systems, it works with a microkernel and Hurd servers.
The GNU C Library implements much of the POSIX.1 functionality in the
GNU/Hurd system, using configurations i[4567]86-*-gnu. The current
GNU/Hurd support requires out-of-tree patches that will eventually be
incorporated into an official GNU C Library release.
When working with Linux kernels, this version of the GNU C Library
requires Linux kernel version 3.2 or later.
Also note that the shared version of the libgcc_s library must be
installed for the pthread library to work correctly.
The GNU C Library supports these configurations for using Linux kernels:
aarch64*-*-linux-gnu
alpha*-*-linux-gnu
arm-*-linux-gnueabi
hppa-*-linux-gnu Not currently functional without patches.
i[4567]86-*-linux-gnu
x86_64-*-linux-gnu Can build either x86_64 or x32
ia64-*-linux-gnu
m68k-*-linux-gnu
microblaze*-*-linux-gnu
mips-*-linux-gnu
mips64-*-linux-gnu
powerpc-*-linux-gnu Hardware or software floating point, BE only.
powerpc64*-*-linux-gnu Big-endian and little-endian.
s390-*-linux-gnu
s390x-*-linux-gnu
sh[34]-*-linux-gnu
sparc*-*-linux-gnu
sparc64*-*-linux-gnu
tilegx-*-linux-gnu
tilepro-*-linux-gnu
If you are interested in doing a port, please contact the glibc
maintainers; see http://www.gnu.org/software/libc/ for more
information.
See the file INSTALL to find out how to configure, build, and install
the GNU C Library. You might also consider reading the WWW pages for
the C library at http://www.gnu.org/software/libc/.
The GNU C Library is (almost) completely documented by the Texinfo manual
found in the `manual/' subdirectory. The manual is still being updated
and contains some known errors and omissions; we regret that we do not
have the resources to work on the manual as much as we would like. For
corrections to the manual, please file a bug in the `manual' component,
following the bug-reporting instructions below. Please be sure to check
the manual in the current development sources to see if your problem has
already been corrected.
Please see http://www.gnu.org/software/libc/bugs.html for bug reporting
information. We are now using the Bugzilla system to track all bug reports.
This web page gives detailed information on how to report bugs properly.
The GNU C Library is free software. See the file COPYING.LIB for copying
conditions, and LICENSES for notices about a few contributions that require
these additional notices to be distributed. License copyright years may be
listed using range notation, e.g., 1996-2015, indicating that every year in
the range, inclusive, is a copyrightable year that would otherwise be listed
individually.