From 70099a6acf5169eca55ef74549fb64de14e668f0 Mon Sep 17 00:00:00 2001 From: Jakub Jelinek Date: Mon, 15 Feb 2021 09:16:06 +0100 Subject: [PATCH] match.pd: Fix up A % (cast) (pow2cst << B) simplification [PR99079] The (mod @0 (convert?@3 (power_of_two_cand@1 @2))) simplification uses tree_nop_conversion_p (type, TREE_TYPE (@3)) condition, but I believe it doesn't check what it was meant to check. On convert?@3 TREE_TYPE (@3) is not the type of what it has been converted from, but what it has been converted to, which needs to be (because it is operand of normal binary operation) equal or compatible to type of the modulo result and first operand - type. I could fix that by using && tree_nop_conversion_p (type, TREE_TYPE (@1)) and be done with it, but actually most of the non-nop conversions are IMHO ok and so we would regress those optimizations. In particular, if we have say narrowing conversions (foo5 and foo6 in the new testcase), I think we are fine, either the shift of the power of two constant after narrowing conversion is still that power of two (or negation of that) and then it will still work, or the result of narrowing conversion is 0 and then we would have UB which we can ignore. Similarly, widening conversions where the shift result is unsigned are fine, or even widening conversions where the shift result is signed, but we sign extend to a signed wider divisor, the problematic case of INT_MIN will become x % (long long) INT_MIN and we can still optimize that to x & (long long) INT_MAX. What doesn't work is the case in the pr99079.c testcase, widening conversion of a signed shift result to wider unsigned divisor, where if the shift is negative, we end up with x % (unsigned long long) INT_MIN which is x % 0xffffffff80000000ULL where the divisor is not a power of two and we can't optimize that to x & 0x7fffffffULL. So, the patch rejects only the single problematic case. Furthermore, when the shift result is signed, we were introducing UB into a program which previously didn't have one (well, left shift into the sign bit is UB in some language/version pairs, but it is definitely valid in C++20 - wonder if I shouldn't move the gcc.c-torture/execute/pr99079.c testcase to g++.dg/torture/pr99079.C and use -std=c++20), by adding that subtraction of 1, x % (1 << 31) in C++20 is well defined, but x & ((1 << 31) - 1) triggers UB on the subtraction. So, the patch performs the subtraction in the unsigned type if it isn't wrapping. 2021-02-15 Jakub Jelinek PR tree-optimization/99079 * match.pd (A % (pow2pcst << N) -> A & ((pow2pcst << N) - 1)): Remove useless tree_nop_conversion_p (type, TREE_TYPE (@3)) check. Instead require both type and TREE_TYPE (@1) to be integral types and either type having smaller or equal precision, or TREE_TYPE (@1) being unsigned type, or type being signed type. If TREE_TYPE (@1) doesn't have wrapping overflow, perform the subtraction of one in unsigned type. * gcc.dg/fold-modpow2-2.c: New test. * gcc.c-torture/execute/pr99079.c: New test. --- gcc/match.pd | 23 ++++++--- gcc/testsuite/gcc.c-torture/execute/pr99079.c | 18 +++++++ gcc/testsuite/gcc.dg/fold-modpow2-2.c | 47 +++++++++++++++++++ 3 files changed, 82 insertions(+), 6 deletions(-) create mode 100644 gcc/testsuite/gcc.c-torture/execute/pr99079.c create mode 100644 gcc/testsuite/gcc.dg/fold-modpow2-2.c diff --git a/gcc/match.pd b/gcc/match.pd index 17c35ee470e..e14f69744d7 100644 --- a/gcc/match.pd +++ b/gcc/match.pd @@ -619,12 +619,23 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) (shift @0 (bit_and @1 (minus @2 { build_int_cst (TREE_TYPE (@2), 1); })))))) (simplify - (mod @0 (convert?@3 (power_of_two_cand@1 @2))) - (if ((TYPE_UNSIGNED (type) - || tree_expr_nonnegative_p (@0)) - && tree_nop_conversion_p (type, TREE_TYPE (@3)) - && integer_pow2p (@2) && tree_int_cst_sgn (@2) > 0) - (bit_and @0 (convert (minus @1 { build_int_cst (TREE_TYPE (@1), 1); })))))) + (mod @0 (convert? (power_of_two_cand@1 @2))) + (if ((TYPE_UNSIGNED (type) || tree_expr_nonnegative_p (@0)) + /* Allow any integral conversions of the divisor, except + conversion from narrower signed to wider unsigned type + where if @1 would be negative power of two, the divisor + would not be a power of two. */ + && INTEGRAL_TYPE_P (type) + && INTEGRAL_TYPE_P (TREE_TYPE (@1)) + && (TYPE_PRECISION (type) <= TYPE_PRECISION (TREE_TYPE (@1)) + || TYPE_UNSIGNED (TREE_TYPE (@1)) + || !TYPE_UNSIGNED (type)) + && integer_pow2p (@2) && tree_int_cst_sgn (@2) > 0) + (with { tree utype = TREE_TYPE (@1); + if (!TYPE_OVERFLOW_WRAPS (utype)) + utype = unsigned_type_for (utype); } + (bit_and @0 (convert (minus (convert:utype @1) + { build_one_cst (utype); }))))))) /* Simplify (unsigned t * 2)/2 -> unsigned t & 0x7FFFFFFF. */ (simplify diff --git a/gcc/testsuite/gcc.c-torture/execute/pr99079.c b/gcc/testsuite/gcc.c-torture/execute/pr99079.c new file mode 100644 index 00000000000..78bb2186650 --- /dev/null +++ b/gcc/testsuite/gcc.c-torture/execute/pr99079.c @@ -0,0 +1,18 @@ +/* PR tree-optimization/99079 */ + +__attribute__((noipa)) unsigned long long +foo (int x) +{ + unsigned long long s = 1 << x; + return 4897637220ULL % s; +} + +int +main () +{ + if (__SIZEOF_INT__ * __CHAR_BIT__ != 32) + return 0; + if (foo (31) != 4897637220ULL) + __builtin_abort (); + return 0; +} diff --git a/gcc/testsuite/gcc.dg/fold-modpow2-2.c b/gcc/testsuite/gcc.dg/fold-modpow2-2.c new file mode 100644 index 00000000000..803d5275476 --- /dev/null +++ b/gcc/testsuite/gcc.dg/fold-modpow2-2.c @@ -0,0 +1,47 @@ +/* PR tree-optimization/99079 */ +/* { dg-do compile { target { lp64 || ilp32 } } } */ +/* { dg-options "-O2 -fdump-tree-optimized" } */ + +unsigned int +foo1 (unsigned int a, unsigned int b) +{ + return a % (1 << b); +} + +int +foo2 (int b) +{ + return 371 % (1U << b); +} + +long long +foo3 (int b) +{ + return 371LL % (1U << b); +} + +unsigned long long +foo4 (unsigned long long a, int b) +{ + return a % (1U << b); +} + +unsigned +foo5 (unsigned a, int b) +{ + return a % (unsigned) (1ULL << b); +} + +int +foo6 (int b) +{ + return 371 % (int) (1ULL << b); +} + +long long +foo7 (int b) +{ + return 371LL % (1 << b); +} + +/* { dg-final { scan-tree-dump-not " % " "optimized" } } */