Use VSQRT instruction for ARM sqrt (bug 20660).

This patch makes ARM sqrt and sqrtf use the VSQRT VFP square root
instruction when available, instead of much larger generic code for
computing square roots.

GCC normally inlines sqrt calls, leaving an out-of-line call only for
negative arguments where errno needs to be set.  Because the
benchtests do not use -fno-builtin, the sqrt benchmark therefore shows
no significant difference.  The out-of-line functions still matter,
though: libm makes many internal __ieee754_sqrt calls, which are *not*
inlined.  Some architectures define __ieee754_sqrt in their
math_private.h for that purpose, but ARM doesn't.  So improving
out-of-line sqrt performance is relevant to those other functions, if
not to most ordinary direct users of sqrt.  With the benchtests
changed to use -fno-builtin for the sqrt tests, typical performance
results before the change are ("max" varies wildly in any case):

    "duration": 9.88358e+09,
    "iterations": 4.8783e+07,
    "max": 457.764,
    "min": 183.105,
    "mean": 202.603

and after it are:

    "duration": 9.45663e+09,
    "iterations": 2.24385e+08,
    "max": 274.659,
    "min": 30.517,
    "mean": 42.1447

Tested for ARM (hard-float and soft-float).
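
As a quick check of the figures above, the following sketch recomputes
the per-call mean from "duration" and "iterations" and derives the
speedup.  It is not part of the patch: the struct and function names
are made up for illustration, and the only data used are the numbers
quoted in this message.

```c
/* Hypothetical record type; the fields mirror the JSON keys quoted in
   the commit message.  */
struct bench_result
{
  double duration;
  double iterations;
  double mean;
};

/* Figures copied verbatim from the "before" and "after" results.  */
static const struct bench_result before
  = { 9.88358e+09, 4.8783e+07, 202.603 };
static const struct bench_result after
  = { 9.45663e+09, 2.24385e+08, 42.1447 };

/* Recompute the mean cost of one call from the totals.  */
static double
mean_from_totals (const struct bench_result *r)
{
  return r->duration / r->iterations;
}

/* Ratio of the reported means: how many times faster the out-of-line
   sqrt is after the change.  */
static double
mean_speedup (void)
{
  return before.mean / after.mean;
}
```

Both recomputed means agree with the reported "mean" values, and
mean_speedup () comes out at roughly 4.8.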

	[BZ #20660]
	* sysdeps/arm/e_sqrt.c: New file.
	* sysdeps/arm/e_sqrtf.c: Likewise.
Joseph Myers 2016-10-20 23:24:44 +00:00
parent 05f3ed0a79
commit 0f04fc07f6
3 changed files with 96 additions and 0 deletions


@@ -1,3 +1,9 @@
2016-10-20  Joseph Myers  <joseph@codesourcery.com>

	[BZ #20660]
	* sysdeps/arm/e_sqrt.c: New file.
	* sysdeps/arm/e_sqrtf.c: Likewise.

2016-10-19  Joseph Myers  <joseph@codesourcery.com>

	[BZ #20718]

sysdeps/arm/e_sqrt.c Normal file

@@ -0,0 +1,45 @@
/* Compute square root for double. ARM version.
Copyright (C) 2016 Free Software Foundation, Inc.
This file is part of the GNU C Library.
The GNU C Library is free software; you can redistribute it and/or
modify it under the terms of the GNU Lesser General Public
License as published by the Free Software Foundation; either
version 2.1 of the License, or (at your option) any later version.
The GNU C Library is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
Lesser General Public License for more details.
You should have received a copy of the GNU Lesser General Public
License along with the GNU C Library; if not, see
<http://www.gnu.org/licenses/>. */
#ifdef __SOFTFP__

/* Use architecture-independent sqrt implementation. */
# include <sysdeps/ieee754/dbl-64/e_sqrt.c>

#else

/* Use VFP square root instruction. */
# include <math.h>
# include <sysdep.h>

double
__ieee754_sqrt (double x)
{
  double ret;
# if __ARM_ARCH >= 6
  asm ("vsqrt.f64 %P0, %P1" : "=w" (ret) : "w" (x));
# else
  /* As in GCC, for VFP9 Erratum 760019 avoid overwriting the
     input. */
  asm ("vsqrt.f64 %P0, %P1" : "=&w" (ret) : "w" (x));
# endif
  return ret;
}
strong_alias (__ieee754_sqrt, __sqrt_finite)

#endif

sysdeps/arm/e_sqrtf.c Normal file

@@ -0,0 +1,45 @@
/* Compute square root for float. ARM version.
Copyright (C) 2016 Free Software Foundation, Inc.
This file is part of the GNU C Library.
The GNU C Library is free software; you can redistribute it and/or
modify it under the terms of the GNU Lesser General Public
License as published by the Free Software Foundation; either
version 2.1 of the License, or (at your option) any later version.
The GNU C Library is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
Lesser General Public License for more details.
You should have received a copy of the GNU Lesser General Public
License along with the GNU C Library; if not, see
<http://www.gnu.org/licenses/>. */
#ifdef __SOFTFP__

/* Use architecture-independent sqrtf implementation. */
# include <sysdeps/ieee754/flt-32/e_sqrtf.c>

#else

/* Use VFP square root instruction. */
# include <math.h>
# include <sysdep.h>

float
__ieee754_sqrtf (float x)
{
  float ret;
# if __ARM_ARCH >= 6
  asm ("vsqrt.f32 %0, %1" : "=t" (ret) : "t" (x));
# else
  /* As in GCC, for VFP9 Erratum 760019 avoid overwriting the
     input. */
  asm ("vsqrt.f32 %0, %1" : "=&t" (ret) : "t" (x));
# endif
  return ret;
}
strong_alias (__ieee754_sqrtf, __sqrtf_finite)

#endif