aarch64: Add --params to control the number of recip steps [PR94154]

-mlow-precision-div hard-coded the number of iterations to 2 for double
and 1 for float.  This patch adds a --param for each type to control
the number.
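The old hard-coded counts fit a simple back-of-the-envelope model. Assume the initial reciprocal estimate (FRECPE on AArch64) is good to roughly 8 bits and that each Newton step roughly doubles the number of correct bits. Under that model, a float significand (24 bits) needs 2 steps and a double significand (53 bits) needs 3, which matches the full-precision counts in aarch64_emit_approx_div. Dropping one step for -mlow-precision-div gives the old 1 and 2. The C sketch below encodes only that model: EST_BITS, approx_bits and steps_for are hypothetical names, and the 8-bit starting precision is an assumption rather than a hardware guarantee.

```c
#include <assert.h>

/* Hypothetical model for illustration, not GCC code: the initial
   reciprocal estimate is assumed to be good to about EST_BITS bits,
   and each Newton-Raphson step is assumed to double the number of
   correct bits (quadratic convergence).  */
#define EST_BITS 8

/* Modeled number of correct bits after STEPS refinement steps.  */
static int
approx_bits (int steps)
{
  int bits = EST_BITS;
  for (int i = 0; i < steps; i++)
    bits *= 2;
  return bits;
}

/* Smallest number of steps whose modeled precision reaches
   TARGET_BITS, e.g. 24 for a float significand, 53 for a double.  */
static int
steps_for (int target_bits)
{
  assert (target_bits > 0);
  int steps = 0;
  while (approx_bits (steps) < target_bits)
    steps++;
  return steps;
}
```

With this model, steps_for (24) returns 2 and steps_for (53) returns 3; subtracting one step for low precision gives the old defaults of 1 for float and 2 for double.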

2020-03-13  Bu Le  <bule1@huawei.com>

gcc/
	PR target/94154
	* config/aarch64/aarch64.opt (-param=aarch64-float-recp-precision=)
	(-param=aarch64-double-recp-precision=): New options.
	* doc/invoke.texi: Document them.
	* config/aarch64/aarch64.c (aarch64_emit_approx_div): Use them
	instead of hard-coding the choice of 1 for float and 2 for double.
Bu Le 2020-03-12 22:39:12 +00:00 committed by Richard Sandiford
parent 3e6ab5cefa
commit dbf3dc7588
4 changed files with 34 additions and 3 deletions

gcc/ChangeLog

@@ -1,3 +1,12 @@
+2020-03-13  Bu Le  <bule1@huawei.com>
+
+	PR target/94154
+	* config/aarch64/aarch64.opt (-param=aarch64-float-recp-precision=)
+	(-param=aarch64-double-recp-precision=): New options.
+	* doc/invoke.texi: Document them.
+	* config/aarch64/aarch64.c (aarch64_emit_approx_div): Use them
+	instead of hard-coding the choice of 1 for float and 2 for double.
+
 2020-03-13  Eric Botcazou  <ebotcazou@adacore.com>
 
 	PR rtl-optimization/94119

gcc/config/aarch64/aarch64.c

@@ -12911,10 +12911,12 @@ aarch64_emit_approx_div (rtx quo, rtx num, rtx den)
   /* Iterate over the series twice for SF and thrice for DF. */
   int iterations = (GET_MODE_INNER (mode) == DFmode) ? 3 : 2;
 
-  /* Optionally iterate over the series once less for faster performance,
-     while sacrificing the accuracy. */
+  /* Optionally iterate over the series less for faster performance,
+     while sacrificing the accuracy. The default is 2 for DF and 1 for SF. */
   if (flag_mlow_precision_div)
-    iterations--;
+    iterations = (GET_MODE_INNER (mode) == DFmode
+                  ? aarch64_double_recp_precision
+                  : aarch64_float_recp_precision);
 
   /* Iterate over the series to calculate the approximate reciprocal. */
   rtx xtmp = gen_reg_rtx (mode);
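Stripped of the RTL generation, this hunk only changes how iterations is chosen. The C sketch below models that choice before and after the patch. struct recp_params, iterations_before and iterations_after are hypothetical names, and the two fields stand in for aarch64_float_recp_precision and aarch64_double_recp_precision. It shows that the default parameter values reproduce the old counts and that the parameters only matter when -mlow-precision-div is in effect.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the values read from
   --param=aarch64-float-recp-precision= and
   --param=aarch64-double-recp-precision= (defaults 1 and 2).  */
struct recp_params
{
  int float_steps;
  int double_steps;
};

/* Iteration count before the patch: 2 for SF, 3 for DF, one fewer
   under -mlow-precision-div.  */
static int
iterations_before (bool is_double, bool low_precision_div)
{
  int iterations = is_double ? 3 : 2;
  if (low_precision_div)
    iterations--;
  return iterations;
}

/* Iteration count after the patch: the low-precision path takes the
   count from the per-type parameter instead of subtracting one.  */
static int
iterations_after (bool is_double, bool low_precision_div,
                  struct recp_params p)
{
  int iterations = is_double ? 3 : 2;
  if (low_precision_div)
    iterations = is_double ? p.double_steps : p.float_steps;
  assert (iterations >= 1);
  return iterations;
}
```

For example, with float_steps set to 2, iterations_after (false, true, p) returns 2, so low-precision float division performs as many steps as the full-precision path.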

gcc/config/aarch64/aarch64.opt

@@ -262,3 +262,12 @@ Generate local calls to out-of-line atomic operations.
 -param=aarch64-sve-compare-costs=
 Target Joined UInteger Var(aarch64_sve_compare_costs) Init(1) IntegerRange(0, 1) Param
 When vectorizing for SVE, consider using unpacked vectors for smaller elements and use the cost model to pick the cheapest approach. Also use the cost model to choose between SVE and Advanced SIMD vectorization.
+
+-param=aarch64-float-recp-precision=
+Target Joined UInteger Var(aarch64_float_recp_precision) Init(1) IntegerRange(1, 5) Param
+The number of Newton iterations for calculating the reciprocal for float type. The precision of division is proportional to this param when division approximation is enabled. The default value is 1.
+
+-param=aarch64-double-recp-precision=
+Target Joined UInteger Var(aarch64_double_recp_precision) Init(2) IntegerRange(1, 5) Param
+The number of Newton iterations for calculating the reciprocal for double type. The precision of division is proportional to this param when division approximation is enabled. The default value is 2.

gcc/doc/invoke.texi

@@ -13179,6 +13179,17 @@ Also use the cost model to choose between SVE and Advanced SIMD vectorization.
 Using unpacked vectors includes storing smaller elements in larger
 containers and accessing elements with extending loads and truncating
 stores.
+
+@item aarch64-float-recp-precision
+The number of Newton iterations for calculating the reciprocal for float type.
+The precision of division is proportional to this param when division
+approximation is enabled. The default value is 1.
+
+@item aarch64-double-recp-precision
+The number of Newton iterations for calculating the reciprocal for double type.
+The precision of division is proportional to this param when division
+approximation is enabled. The default value is 2.
+
 @end table
 
 @end table
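The parameters trade speed for accuracy because each Newton-Raphson step x' = x * (2 - d * x) squares the relative error of the reciprocal, so the number of correct bits roughly doubles with each step. The C sketch below demonstrates this for float division. recip_estimate, approx_div and rel_error are hypothetical helpers; recip_estimate crudely imitates a hardware estimate by truncating the exact reciprocal to its top 8 fraction bits, which is an assumption and not the FRECPE algorithm, and the steps use separate multiplies rather than a fused reciprocal-step instruction.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a hardware reciprocal estimate: compute
   1/d and keep only the top 8 of the 23 fraction bits.  */
static float
recip_estimate (float d)
{
  float r = 1.0f / d;
  uint32_t bits;
  memcpy (&bits, &r, sizeof bits);
  bits &= ~(uint32_t) 0x7fff;   /* clear the low 15 fraction bits */
  memcpy (&r, &bits, sizeof bits);
  return r;
}

/* Approximate num / den as num * x, where x refines the estimate of
   1/den with STEPS Newton-Raphson steps: x' = x * (2 - den * x).  */
static float
approx_div (float num, float den, int steps)
{
  assert (steps >= 0);
  float x = recip_estimate (den);
  for (int i = 0; i < steps; i++)
    x = x * (2.0f - den * x);
  return num * x;
}

/* Relative error of approx_div against division done in double.  */
static double
rel_error (float num, float den, int steps)
{
  assert (num != 0.0f);
  double exact = (double) num / (double) den;
  double err = ((double) approx_div (num, den, steps) - exact) / exact;
  return err < 0 ? -err : err;
}
```

For den = 3.0f, this model gives a relative error of about 1e-3 with no refinement, about 1e-6 after one step, and float rounding level (below 1e-7) after two.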