Try placing RTL folded constants in the constant pool.
My recent attempts to come up with a testcase for my patch to evaluate ss_plus in simplify-rtx.c, identified a missed optimization opportunity (that's potentially a long-time regression): The RTL optimizers no longer place constants in the constant pool. The motivating x86_64 example is the simple program: typedef char v8qi __attribute__ ((vector_size (8))); v8qi foo() { v8qi tx = { 1, 0, 0, 0, 0, 0, 0, 0 }; v8qi ty = { 2, 0, 0, 0, 0, 0, 0, 0 }; v8qi t = __builtin_ia32_paddsb(tx, ty); return t; } which (with my previous patch) currently results in: foo: movq .LC0(%rip), %xmm0 movq .LC1(%rip), %xmm1 paddsb %xmm1, %xmm0 ret even though the RTL contains the result in a REG_EQUAL note: (insn 7 6 12 2 (set (reg:V8QI 83) (ss_plus:V8QI (reg:V8QI 84) (reg:V8QI 85))) "ssaddqi3.c":7:12 1419 {*mmx_ssaddv8qi3} (expr_list:REG_DEAD (reg:V8QI 85) (expr_list:REG_DEAD (reg:V8QI 84) (expr_list:REG_EQUAL (const_vector:V8QI [ (const_int 3 [0x3]) (const_int 0 [0]) repeated x7 ]) (nil))))) Together with the patch below, GCC will now generate the much more sensible: foo: movq .LC2(%rip), %xmm0 ret My first approach was to look in cse.c (where the REG_EQUAL note gets added) and notice that the constant pool handling functionality has been unreachable for a while. A quick search for constant_pool_entries_cost shows that it's initialized to zero, but never set to a non-zero value, meaning that force_const_mem is never called. This functionality used to work way back in 2003, but has been lost over time: https://gcc.gnu.org/pipermail/gcc-patches/2003-October/116435.html The changes to cse.c below restore this functionality (placing suitable constants in the constant pool) with two significant refinements; (i) it only attempts to do this if the function already uses a constant pool (thanks to the availability of crtl->uses_constant_pool since 2003). (ii) it allows different constants (i.e. modes) to have different costs, so that floating point "doubles" and 64-bit, 128-bit, 256-bit and 512-bit vectors don't all have the share the same cost. Back in 2003, the assumption was that everything in a constant pool had the same cost, hence the global variable constant_pool_entries_cost. Although this is a useful CSE fix, it turns out that it doesn't cure my motivating problem above. CSE only considers a single instruction, so determines that it's cheaper to perform the ss_plus (COSTS_N_INSNS(1)) than read the result from the constant pool (COSTS_N_INSNS(2)). It's only when the other reads from the constant pool are also eliminated, that this transformation is a win. Hence a better place to perform this transformation is in combine, where after failing to "recog" the load of a suitable constant, it can retry after calling force_const_mem. This achieves the desired transformation and allows the backend insn_cost call-back to control whether or not using the constant pool is preferrable. Alas, it's rare to change code generation without affecting something in GCC's testsuite. On x86_64-pc-linux-gnu there were two families of new failures (and I'd predict similar benign fallout on other platforms). One failure was gcc.target/i386/387-12.c (aka PR target/26915), where the test is missing an explicit -m32 flag. On i686, it's very reasonable to materialize -1.0 using "fld1; fchs", but on x86_64-pc-linux-gnu we currently generate the awkward: testm1: fld1 fchs fstpl -8(%rsp) movsd -8(%rsp), %xmm0 ret which combine now very reasonably simplifies to just: testm1: movsd .LC3(%rip), %xmm0 ret The other class of x86_64-pc-linux-gnu failure was from materialization of vector constants using vpbroadcast (e.g. gcc.target/i386/pr90773-17.c) where the decision is finely balanced; the load of an integer register with an immediate constant, followed by a vpbroadcast is deemed to be COSTS_N_INSNS(2), whereas a load from the constant pool is also reported as COSTS_N_INSNS(2). My solution is to tweak the i386.c's rtx_costs so that all other things being equal, an instruction (sequence) that accesses memory is fractionally more expensive than one that doesn't. 2021-10-18 Roger Sayle <roger@nextmovesoftware.com> gcc/ChangeLog * combine.c (recog_for_combine): For an unrecognized move/set of a constant, try force_const_mem to place it in the constant pool. * cse.c (constant_pool_entries_cost, constant_pool_entries_regcost): Delete global variables (that are no longer assigned a cost value). (cse_insn): Simplify logic for deciding whether to place a folded constant in the constant pool using force_const_mem. (cse_main): Remove zero initialization of constant_pool_entries_cost and constant_pool_entries_regcost. * config/i386/i386.c (ix86_rtx_costs): Make memory accesses fractionally more expensive, when optimizing for speed. gcc/testsuite/ChangeLog * gcc.target/i386/387-12.c: Add explicit -m32 option.
This commit is contained in:
parent
815f15d338
commit
247c407c83
|
@ -11567,7 +11567,27 @@ recog_for_combine (rtx *pnewpat, rtx_insn *insn, rtx *pnotes)
|
|||
bool changed = false;
|
||||
|
||||
if (GET_CODE (pat) == SET)
|
||||
changed = change_zero_ext (pat);
|
||||
{
|
||||
/* For an unrecognized single set of a constant, try placing it in
|
||||
the constant pool, if this function already uses one. */
|
||||
rtx src = SET_SRC (pat);
|
||||
if (CONSTANT_P (src)
|
||||
&& !CONST_INT_P (src)
|
||||
&& crtl->uses_const_pool)
|
||||
{
|
||||
machine_mode mode = GET_MODE (src);
|
||||
if (mode == VOIDmode)
|
||||
mode = GET_MODE (SET_DEST (pat));
|
||||
src = force_const_mem (mode, src);
|
||||
if (src)
|
||||
{
|
||||
SUBST (SET_SRC (pat), src);
|
||||
changed = true;
|
||||
}
|
||||
}
|
||||
else
|
||||
changed = change_zero_ext (pat);
|
||||
}
|
||||
else if (GET_CODE (pat) == PARALLEL)
|
||||
{
|
||||
int i;
|
||||
|
|
|
@ -20801,6 +20801,13 @@ ix86_rtx_costs (rtx x, machine_mode mode, int outer_code_i, int opno,
|
|||
*total = cost->sse_op;
|
||||
return true;
|
||||
|
||||
case MEM:
|
||||
/* An insn that accesses memory is slightly more expensive
|
||||
than one that does not. */
|
||||
if (speed)
|
||||
*total += 1;
|
||||
return false;
|
||||
|
||||
default:
|
||||
return false;
|
||||
}
|
||||
|
|
48
gcc/cse.c
48
gcc/cse.c
|
@ -491,14 +491,6 @@ static struct table_elt *table[HASH_SIZE];
|
|||
|
||||
static struct table_elt *free_element_chain;
|
||||
|
||||
/* Set to the cost of a constant pool reference if one was found for a
|
||||
symbolic constant. If this was found, it means we should try to
|
||||
convert constants into constant pool entries if they don't fit in
|
||||
the insn. */
|
||||
|
||||
static int constant_pool_entries_cost;
|
||||
static int constant_pool_entries_regcost;
|
||||
|
||||
/* Trace a patch through the CFG. */
|
||||
|
||||
struct branch_path
|
||||
|
@ -4609,9 +4601,6 @@ cse_insn (rtx_insn *insn)
|
|||
int src_folded_regcost = MAX_COST;
|
||||
int src_related_regcost = MAX_COST;
|
||||
int src_elt_regcost = MAX_COST;
|
||||
/* Set nonzero if we need to call force_const_mem on with the
|
||||
contents of src_folded before using it. */
|
||||
int src_folded_force_flag = 0;
|
||||
scalar_int_mode int_mode;
|
||||
|
||||
dest = SET_DEST (sets[i].rtl);
|
||||
|
@ -5166,15 +5155,7 @@ cse_insn (rtx_insn *insn)
|
|||
src_related_cost, src_related_regcost) <= 0
|
||||
&& preferable (src_folded_cost, src_folded_regcost,
|
||||
src_elt_cost, src_elt_regcost) <= 0)
|
||||
{
|
||||
trial = src_folded, src_folded_cost = MAX_COST;
|
||||
if (src_folded_force_flag)
|
||||
{
|
||||
rtx forced = force_const_mem (mode, trial);
|
||||
if (forced)
|
||||
trial = forced;
|
||||
}
|
||||
}
|
||||
trial = src_folded, src_folded_cost = MAX_COST;
|
||||
else if (src
|
||||
&& preferable (src_cost, src_regcost,
|
||||
src_eqv_cost, src_eqv_regcost) <= 0
|
||||
|
@ -5361,23 +5342,24 @@ cse_insn (rtx_insn *insn)
|
|||
break;
|
||||
}
|
||||
|
||||
/* If we previously found constant pool entries for
|
||||
constants and this is a constant, try making a
|
||||
pool entry. Put it in src_folded unless we already have done
|
||||
this since that is where it likely came from. */
|
||||
/* If the current function uses a constant pool and this is a
|
||||
constant, try making a pool entry. Put it in src_folded
|
||||
unless we already have done this since that is where it
|
||||
likely came from. */
|
||||
|
||||
else if (constant_pool_entries_cost
|
||||
else if (crtl->uses_const_pool
|
||||
&& CONSTANT_P (trial)
|
||||
&& (src_folded == 0
|
||||
|| (!MEM_P (src_folded)
|
||||
&& ! src_folded_force_flag))
|
||||
&& !CONST_INT_P (trial)
|
||||
&& (src_folded == 0 || !MEM_P (src_folded))
|
||||
&& GET_MODE_CLASS (mode) != MODE_CC
|
||||
&& mode != VOIDmode)
|
||||
{
|
||||
src_folded_force_flag = 1;
|
||||
src_folded = trial;
|
||||
src_folded_cost = constant_pool_entries_cost;
|
||||
src_folded_regcost = constant_pool_entries_regcost;
|
||||
src_folded = force_const_mem (mode, trial);
|
||||
if (src_folded)
|
||||
{
|
||||
src_folded_cost = COST (src_folded, mode);
|
||||
src_folded_regcost = approx_reg_cost (src_folded);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
|
@ -6630,8 +6612,6 @@ cse_main (rtx_insn *f ATTRIBUTE_UNUSED, int nregs)
|
|||
cse_cfg_altered = false;
|
||||
cse_jumps_altered = false;
|
||||
recorded_label_ref = false;
|
||||
constant_pool_entries_cost = 0;
|
||||
constant_pool_entries_regcost = 0;
|
||||
ebb_data.path_size = 0;
|
||||
ebb_data.nsets = 0;
|
||||
rtl_hooks = cse_rtl_hooks;
|
||||
|
|
|
@ -1,6 +1,6 @@
|
|||
/* PR target/26915 */
|
||||
/* { dg-do compile } */
|
||||
/* { dg-options "-O -mfpmath=387 -mfancy-math-387" } */
|
||||
/* { dg-options "-O -m32 -mfpmath=387 -mfancy-math-387" } */
|
||||
|
||||
double testm0(void)
|
||||
{
|
||||
|
|
Loading…
Reference in New Issue