rtl-ssa: Reduce the amount of temporary memory needed [PR98863]

The rtl-ssa code uses an on-the-side IL and needs to build that IL
for each block and RTL insn.  I'd originally not used the classical
dominance frontier method for placing phis on the basis that it seemed
like more work in this context: we're having to visit everything in
an RPO walk anyway, so for non-backedge cases we can tell immediately
whether a phi node is needed.  We then speculatively created phis for
registers that are live across backedges and simplified them later.
This avoided having to walk most of the IL twice (once to build the
initial IL, and once to link uses to phis).

However, as shown in PR98863, this leads to excessive temporary
memory in extreme cases, since we had to record the value of
every live register on exit from every block.  In that PR,
there were many registers that were live (but unused) across
a large region of code.

This patch switches to the classical approach to placing phis, but tries
to use the existing DF defs information to avoid two walks of the IL.
We still use the previous approach for memory, since there is no
up-front information to indicate whether a block defines memory or not.
However, since memory is just treated as a single unified thing
(like for gimple vops), memory doesn't suffer from the same
scalability problems as registers.
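
For reference, the classical placement can be sketched as below.  This is
only a toy model, not the code in blocks.cc: toy_cfg, example_cfg,
dominance_frontiers and place_phis are hypothetical names, blocks are plain
integers, and the immediate dominators are assumed to be given.  The real
implementation also consults the DF information so that phis are only
created where a register is live, which the sketch does not attempt.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical toy CFG: blocks are small integers and block 0 is the entry.
struct toy_cfg
{
  std::vector<std::vector<int>> preds;  // preds[b] = predecessors of block b
  std::vector<int> idom;                // idom[b] = immediate dominator of b
};

// Example CFG with edges 0->1, 0->2, 1->3, 2->3 and a backedge 3->1.
// Block 0 immediately dominates every block (including itself here).
toy_cfg
example_cfg ()
{
  toy_cfg cfg;
  cfg.preds = { {}, { 0, 3 }, { 0 }, { 1, 2 } };
  cfg.idom = { 0, 0, 0, 0 };
  return cfg;
}

// Compute the dominance frontier of every block.  For each join block B,
// walk up the dominator tree from each predecessor until reaching B's
// immediate dominator; every block visited on the way has B in its frontier.
std::vector<std::set<int>>
dominance_frontiers (const toy_cfg &cfg)
{
  std::vector<std::set<int>> df (cfg.preds.size ());
  for (int b = 0; b < (int) cfg.preds.size (); ++b)
    if (cfg.preds[b].size () >= 2)
      for (int pred : cfg.preds[b])
        for (int runner = pred; runner != cfg.idom[b];
             runner = cfg.idom[runner])
          df[runner].insert (b);
  return df;
}

// Return the blocks that need a phi for a register defined in DEF_BLOCKS.
// A phi is itself a definition, so the frontier is iterated to a fixed point.
std::set<int>
place_phis (const std::vector<std::set<int>> &df,
            const std::set<int> &def_blocks)
{
  std::set<int> phi_blocks;
  std::vector<int> worklist (def_blocks.begin (), def_blocks.end ());
  while (!worklist.empty ())
    {
      int b = worklist.back ();
      worklist.pop_back ();
      for (int f : df[b])
        if (phi_blocks.insert (f).second)
          worklist.push_back (f);
    }
  return phi_blocks;
}
```

In the example CFG, a register defined only in block 2 needs a phi in
block 3 (the join) and, because that phi is a new definition that reaches
the backedge, another in block 1.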

With this change, fwprop no longer seems to be a memory-hog outlier
in the PR: the maximum RSS is similar with and without fwprop.

The PR also shows the problems inherent in using bitmap operations
involving the live-in and live-out sets, which in the testcase are
very large.  I've therefore tried to reduce those operations to the
bare minimum.

The patch also includes other compile-time optimisations motivated
by the PR; see the changelog for details.

I tried adding:

    for (int i = 0; i < 200; ++i)
      {
	crtl->ssa = new rtl_ssa::function_info (cfun);
	delete crtl->ssa;
      }

to fwprop.c to stress the code.  fwprop then took 35% of the compile
time for the problematic partition in the PR (measured on a release
build).  fwprop takes less than 0.5% of the compile time when running
normally.

The command:

  git diff 0b76990a9d75d97b84014e37519086b81824c307~ gcc/fwprop.c | \
    patch -p1 -R

still gives a working compiler that uses the old fwprop.c.  The compile
time with that version is very similar.

For a more reasonable testcase like optabs.ii at -O, I saw a 6.7%
compile time regression with the loop above added (i.e. creating
the info 201 times per pass instead of once per pass).  That goes
down to 4.8% with -O -g.  I can't measure a significant difference
with a normal compiler (no 200-iteration loop).

So I think that (as expected) the patch does make things a bit
slower in the normal case.  But like Richi says, peak memory usage
is harder for users to work around than slightly slower compile times.

gcc/
	PR rtl-optimization/98863
	* rtl-ssa/functions.h (function_info::bb_live_out_info): Delete.
	(function_info::build_info): Turn into a declaration, moving the
	definition to internals.h.
	(function_info::bb_walker): Declare.
	(function_info::create_reg_use): Likewise.
	(function_info::calculate_potential_phi_regs): Take a build_info
	parameter.
	(function_info::place_phis, function_info::create_ebbs): Declare.
	(function_info::calculate_ebb_live_in_for_debug): Likewise.
	(function_info::populate_backedge_phis): Delete.
	(function_info::start_block, function_info::end_block): Declare.
	(function_info::populate_phi_inputs): Delete.
	(function_info::m_potential_phi_regs): Move information to build_info.
	* rtl-ssa/internals.h: New file.
	(function_info::bb_phi_info): New class.
	(function_info::build_info): Moved from functions.h.
	Add a constructor and destructor.
	(function_info::build_info::ebb_use): Delete.
	(function_info::build_info::ebb_def): Likewise.
	(function_info::build_info::bb_live_out): Likewise.
	(function_info::build_info::tmp_ebb_live_in_for_debug): New variable.
	(function_info::build_info::potential_phi_regs): Likewise.
	(function_info::build_info::potential_phi_regs_for_debug): Likewise.
	(function_info::build_info::ebb_def_regs): Likewise.
	(function_info::build_info::bb_phis): Likewise.
	(function_info::build_info::bb_mem_live_out): Likewise.
	(function_info::build_info::bb_to_rpo): Likewise.
	(function_info::build_info::def_stack): Likewise.
	(function_info::build_info::old_def_stack_limit): Likewise.
	* rtl-ssa/internals.inl (function_info::build_info::record_reg_def):
	Remove the regno argument.  Push the previous definition onto the
	definition stack where necessary.
	* rtl-ssa/accesses.cc: Include internals.h.
	* rtl-ssa/changes.cc: Likewise.
	* rtl-ssa/blocks.cc: Likewise.
	(function_info::build_info::build_info): Define.
	(function_info::build_info::~build_info): Likewise.
	(function_info::bb_walker): New class.
	(function_info::bb_walker::bb_walker): Define.
	(function_info::add_live_out_use): Convert a logarithmic-complexity
	test into a linear one.  Allow the same definition to be passed
	multiple times.
	(function_info::calculate_potential_phi_regs): Moved from
	functions.cc.  Take a build_info parameter and store the
	information there instead.
	(function_info::place_phis): New function.
	(function_info::add_entry_block_defs): Update call to record_reg_def.
	(function_info::calculate_ebb_live_in_for_debug): New function.
	(function_info::add_phi_nodes): Use bb_phis to decide which
	registers need phi nodes and initialize ebb_def_regs accordingly.
	Do not add degenerate phis here.
	(function_info::add_artificial_accesses): Use create_reg_use.
	Assert that all definitions are listed in the DF LR sets.
	Update call to record_reg_def.
	(function_info::record_block_live_out): Record live-out register
	values in the phis of successor blocks.  Use the live-out set
	when processing the last block in an EBB, instead of always
	using the live-in sets of successor blocks.  AND the live sets
	with the set of registers that have been defined in the EBB,
	rather than with all potential phi registers.  Cope correctly
	with branches back to the start of the current EBB.
	(function_info::start_block): New function.
	(function_info::end_block): Likewise.
	(function_info::populate_phi_inputs): Likewise.
	(function_info::create_ebbs): Likewise.
	(function_info::process_all_blocks): Rewrite into a multi-phase
	process.
	* rtl-ssa/functions.cc: Include internals.h.
	(function_info::calculate_potential_phi_regs): Move to blocks.cc.
	(function_info::init_function_data): Remove caller.
	* rtl-ssa/insns.cc: Include internals.h.
	(function_info::create_reg_use): New function.  Lazily create any
	degenerate phis needed by the linear RPO view.
	(function_info::record_use): Use create_reg_use.  When processing
	debug uses, use potential_phi_regs and test it before checking
	whether the register is live on entry to the current EBB.  Lazily
	calculate ebb_live_in_for_debug.
	(function_info::record_call_clobbers): Update call to record_reg_def.
	(function_info::record_def): Likewise.
Richard Sandiford 2021-02-15 15:05:22 +00:00
parent 40f235b5f0
commit abe07a74bb
8 changed files with 735 additions and 474 deletions


@ -26,6 +26,7 @@
#include "rtl.h"
#include "df.h"
#include "rtl-ssa.h"
#include "rtl-ssa/internals.h"
#include "rtl-ssa/internals.inl"
using namespace rtl_ssa;

File diff suppressed because it is too large


@ -26,6 +26,7 @@
#include "rtl.h"
#include "df.h"
#include "rtl-ssa.h"
#include "rtl-ssa/internals.h"
#include "rtl-ssa/internals.inl"
#include "target.h"
#include "predict.h"

gcc/rtl-ssa/functions.cc

@ -26,6 +26,7 @@
#include "rtl.h"
#include "df.h"
#include "rtl-ssa.h"
#include "rtl-ssa/internals.h"
#include "rtl-ssa/internals.inl"
using namespace rtl_ssa;
@@ -74,23 +75,6 @@ function_info::print (pretty_printer *pp) const
     }
 }

-// Calculate m_potential_phi_regs.
-void
-function_info::calculate_potential_phi_regs ()
-{
-  auto *lr_info = DF_LR_BB_INFO (ENTRY_BLOCK_PTR_FOR_FN (m_fn));
-  for (unsigned int regno = 0; regno < m_num_regs; ++regno)
-    if (regno >= DF_REG_SIZE (DF)
-        // Exclude registers that have a single definition that dominates
-        // all uses. If the definition does not dominate all uses,
-        // the register will be exposed upwards to the entry block but
-        // will not be defined by the entry block.
-        || DF_REG_DEF_COUNT (regno) > 1
-        || (!bitmap_bit_p (&lr_info->def, regno)
-            && bitmap_bit_p (&lr_info->out, regno)))
-      bitmap_set_bit (m_potential_phi_regs, regno);
-}
-
 // Initialize all member variables in preparation for (re)building
 // SSA form from scratch.
 void
@@ -107,8 +91,6 @@ function_info::init_function_data ()
   m_last_insn = nullptr;
   m_last_nondebug_insn = nullptr;
   m_free_phis = nullptr;
-
-  calculate_potential_phi_regs ();
 }

 // The initial phase of the phi simplification process. The cumulative

gcc/rtl-ssa/functions.h

@ -176,81 +176,9 @@ public:
void print (pretty_printer *pp) const;
private:
// Information about the values that are live on exit from a basic block.
// This class is only used when constructing the SSA form, it isn't
// designed for being kept up-to-date.
class bb_live_out_info
{
public:
// REG_VALUES contains all the registers that live out from the block,
// in order of increasing register number. There are NUM_REG_VALUES
// in total. Registers do not appear here if their values are known
// to be completely undefined; in that sense, the information is
// closer to DF_LIVE than to DF_LR.
unsigned int num_reg_values;
set_info **reg_values;
// The memory value that is live on exit from the block.
set_info *mem_value;
};
// Information used while constructing the SSA form and discarded
// afterwards.
class build_info
{
public:
set_info *current_reg_value (unsigned int) const;
set_info *current_mem_value () const;
void record_reg_def (unsigned int, def_info *);
void record_mem_def (def_info *);
// The block that we're currently processing.
bb_info *current_bb;
// The EBB that contains CURRENT_BB.
ebb_info *current_ebb;
// Except for the local exception noted below:
//
// - If register R has been defined in the current EBB, LAST_ACCESS[R + 1]
// is the last definition of R in the EBB.
//
// - If register R is currently live but has not yet been defined
// in the EBB, LAST_ACCESS[R + 1] is the current value of R,
// or null if the register's value is completely undefined.
//
// - The contents are not meaningful for other registers.
//
// Similarly:
//
// - If the current EBB has defined memory, LAST_ACCESS[0] is the last
// definition of memory in the EBB.
//
// - Otherwise LAST_ACCESS[0] is the value of memory that is live on
// - entry to the EBB.
//
// The exception is that while building instructions, LAST_ACCESS[I]
// can temporarily be the use of regno I - 1 by that instruction.
access_info **last_access;
// A bitmap of registers that are live on entry to this EBB, with a tree
// view for quick lookup. Only used if MAY_HAVE_DEBUG_INSNS.
bitmap ebb_live_in_for_debug;
// A conservative superset of the registers that are used by
// instructions in CURRENT_EBB. That is, all used registers
// are in the set, but some unused registers might be too.
bitmap ebb_use;
// A similarly conservative superset of the registers that are defined
// by instructions in CURRENT_EBB.
bitmap ebb_def;
// BB_LIVE_OUT[BI] gives the live-out values for the basic block
// with index BI.
bb_live_out_info *bb_live_out;
};
class bb_phi_info;
class build_info;
class bb_walker;
// Return an RAII object that owns all objects allocated by
// allocate_temp during its lifetime.
@ -307,6 +235,7 @@ private:
void start_insn_accesses ();
void finish_insn_accesses (insn_info *);
use_info *create_reg_use (build_info &, insn_info *, resource_info);
void record_use (build_info &, insn_info *, rtx_obj_reference);
void record_call_clobbers (build_info &, insn_info *, rtx_call_insn *);
void record_def (build_info &, insn_info *, rtx_obj_reference);
@ -327,7 +256,6 @@ private:
bb_info *create_bb_info (basic_block);
void append_bb (bb_info *);
void calculate_potential_phi_regs ();
insn_info *add_placeholder_after (insn_info *);
void possibly_queue_changes (insn_change &);
@ -335,12 +263,18 @@ private:
void apply_changes_to_insn (insn_change &);
void init_function_data ();
void calculate_potential_phi_regs (build_info &);
void place_phis (build_info &);
void create_ebbs (build_info &);
void add_entry_block_defs (build_info &);
void calculate_ebb_live_in_for_debug (build_info &);
void add_phi_nodes (build_info &);
void add_artificial_accesses (build_info &, df_ref_flags);
void add_block_contents (build_info &);
void record_block_live_out (build_info &);
void populate_backedge_phis (build_info &);
void start_block (build_info &, bb_info *);
void end_block (build_info &, bb_info *);
void populate_phi_inputs (build_info &);
void process_all_blocks ();
void simplify_phi_setup (phi_info *, set_info **, bitmap);
@ -400,13 +334,6 @@ private:
auto_vec<access_info *> m_temp_defs;
auto_vec<access_info *> m_temp_uses;
// The set of registers that might need to have phis associated with them.
// Registers outside this set are known to have a single definition that
// dominates all uses.
//
// Before RA, about 5% of registers are typically in the set.
auto_bitmap m_potential_phi_regs;
// A list of phis that are no longer in use. Their uids are still unique
// and so can be recycled.
phi_info *m_free_phis;

gcc/rtl-ssa/insns.cc

@ -26,6 +26,7 @@
#include "rtl.h"
#include "df.h"
#include "rtl-ssa.h"
#include "rtl-ssa/internals.h"
#include "rtl-ssa/internals.inl"
#include "predict.h"
#include "print-rtl.h"
@@ -406,6 +407,33 @@ function_info::finish_insn_accesses (insn_info *insn)
   insn->set_accesses (static_cast<access_info **> (addr), num_defs, num_uses);
 }

+// Called while building SSA form using BI. Create and return a use of
+// register RESOURCE in INSN. Create a degenerate phi where necessary.
+use_info *
+function_info::create_reg_use (build_info &bi, insn_info *insn,
+                               resource_info resource)
+{
+  set_info *value = bi.current_reg_value (resource.regno);
+  if (value && value->ebb () != bi.current_ebb)
+    {
+      if (insn->is_debug_insn ())
+        value = look_through_degenerate_phi (value);
+      else if (bitmap_bit_p (bi.potential_phi_regs, resource.regno))
+        {
+          // VALUE is defined by a previous EBB and RESOURCE has multiple
+          // definitions. Create a degenerate phi in the current EBB
+          // so that all definitions and uses follow a linear RPO view;
+          // see rtl.texi for details.
+          access_info *inputs[] = { look_through_degenerate_phi (value) };
+          value = create_phi (bi.current_ebb, value->resource (), inputs, 1);
+          bi.record_reg_def (value);
+        }
+    }
+  auto *use = allocate<use_info> (insn, resource, value);
+  add_use (use);
+  return use;
+}
+
 // Called while building SSA form using BI. Record that INSN contains
 // read reference REF. If this requires new entries to be added to
 // INSN->uses (), add those entries to the list we're building in
@@ -450,18 +478,16 @@ function_info::record_use (build_info &bi, insn_info *insn,
           if (value->ebb () == bi.current_ebb)
             return true;

-          // If the register is live on entry to the EBB but not used
-          // within it, VALUE is the correct live-in value.
-          if (bitmap_bit_p (bi.ebb_live_in_for_debug, regno))
+          // Check if VALUE is the function's only definition of REGNO.
+          // (We already know that it dominates the use.)
+          if (!bitmap_bit_p (bi.potential_phi_regs, regno))
             return true;

-          // Check if VALUE is the function's only definition of REGNO
-          // and if it dominates the use.
-          if (regno != MEM_REGNO
-              && regno < DF_REG_SIZE (DF)
-              && DF_REG_DEF_COUNT (regno) == 1
-              && dominated_by_p (CDI_DOMINATORS, insn->bb ()->cfg_bb (),
-                                 value->bb ()->cfg_bb ()))
+          // If the register is live on entry to the EBB but not used
+          // within it, VALUE is the correct live-in value.
+          if (!bi.ebb_live_in_for_debug)
+            calculate_ebb_live_in_for_debug (bi);
+          if (bitmap_bit_p (bi.ebb_live_in_for_debug, regno))
             return true;

           // Punt for other cases.
@@ -470,8 +496,7 @@ function_info::record_use (build_info &bi, insn_info *insn,
       if (insn->is_debug_insn () && !value_is_valid ())
         value = nullptr;

-      use = allocate<use_info> (insn, resource_info { mode, regno }, value);
-      add_use (use);
+      use = create_reg_use (bi, insn, { mode, regno });
       m_temp_uses.safe_push (use);
       bi.last_access[ref.regno + 1] = use;
       use->record_reference (ref, true);
@@ -547,7 +572,7 @@ function_info::record_call_clobbers (build_info &bi, insn_info *insn,
           def->m_is_call_clobber = true;
           append_def (def);
           m_temp_defs.safe_push (def);
-          bi.last_access[regno + 1] = def;
+          bi.record_reg_def (def);
         }
     }
 }
@@ -599,7 +624,7 @@ function_info::record_def (build_info &bi, insn_info *insn,
   def->record_reference (ref, true);
   append_def (def);
   m_temp_defs.safe_push (def);
-  bi.last_access[ref.regno + 1] = def;
+  bi.record_reg_def (def);
 }

 // Called while building SSA form using BI. Add an insn_info for RTL
// Called while building SSA form using BI. Add an insn_info for RTL

gcc/rtl-ssa/internals.h (new file)

@ -0,0 +1,140 @@
// Definition of private classes for RTL SSA -*- C++ -*-
// Copyright (C) 2020-2021 Free Software Foundation, Inc.
//
// This file is part of GCC.
//
// GCC is free software; you can redistribute it and/or modify it under
// the terms of the GNU General Public License as published by the Free
// Software Foundation; either version 3, or (at your option) any later
// version.
//
// GCC is distributed in the hope that it will be useful, but WITHOUT ANY
// WARRANTY; without even the implied warranty of MERCHANTABILITY or
// FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
// for more details.
//
// You should have received a copy of the GNU General Public License
// along with GCC; see the file COPYING3. If not see
// <http://www.gnu.org/licenses/>.
namespace rtl_ssa {
// Information about a basic block's phi nodes. This class is only used when
// constructing the SSA form, it isn't meant to be kept up-to-date.
class function_info::bb_phi_info
{
public:
// The set of registers that need phi nodes.
bitmap_head regs;
// The number of registers in REGS.
unsigned int num_phis;
// The number of inputs to each phi node. Caching the information here
// is at best a minor optimisation, but it fills a 32-bit hole that would
// otherwise exist on 64-bit hosts.
unsigned int num_preds;
// An array of all the phi inputs for this block. It lists all inputs
// from the first incoming edge followed by all inputs for the next
// incoming edge, and so on. The inputs for a given edge are sorted
// by increasing register number.
set_info **inputs;
};
// Information used while constructing the SSA form and discarded
// afterwards.
class function_info::build_info
{
public:
build_info (unsigned int, unsigned int);
~build_info ();
set_info *current_reg_value (unsigned int) const;
set_info *current_mem_value () const;
void record_reg_def (def_info *);
void record_mem_def (def_info *);
// The block that we're currently processing.
bb_info *current_bb;
// The EBB that contains CURRENT_BB.
ebb_info *current_ebb;
// Except for the local exception noted below:
//
// - If register R has been defined in the current EBB, LAST_ACCESS[R + 1]
// is the last definition of R in the EBB.
//
// - Otherwise, if the current EBB is dominated by a definition of R,
// LAST_ACCESS[R + 1] is the nearest dominating definition.
//
// - Otherwise, LAST_ACCESS[R + 1] is null.
//
// Similarly:
//
// - If the current EBB has defined memory, LAST_ACCESS[0] is the last
// definition of memory in the EBB.
//
// - Otherwise LAST_ACCESS[0] is the value of memory that is live on
// - entry to the EBB.
//
// The exception is that while building instructions, LAST_ACCESS[I]
// can temporarily be the use of regno I - 1 by that instruction.
auto_vec<access_info *> last_access;
// A bitmap used to hold EBB_LIVE_IN_FOR_DEBUG.
auto_bitmap tmp_ebb_live_in_for_debug;
// If nonnull, a bitmap of registers that are live on entry to this EBB,
// with a tree view for quick lookup. This bitmap is calculated lazily
// and is only used if MAY_HAVE_DEBUG_INSNS.
bitmap ebb_live_in_for_debug;
// The set of registers that might need to have phis associated with them.
// Registers outside this set are known to have a single definition that
// dominates all uses.
//
// Before RA, about 5% of registers are typically in the set.
auto_sbitmap potential_phi_regs;
// A sparse bitmap representation of POTENTIAL_PHI_REGS. Only used if
// MAY_HAVE_DEBUG_INSNS.
auto_bitmap potential_phi_regs_for_debug;
// The set of registers that have been defined so far in the current EBB.
auto_bitmap ebb_def_regs;
// BB_PHIS[B] describes the phis for basic block B.
auto_vec<bb_phi_info> bb_phis;
// BB_MEM_LIVE_OUT[B] is the memory value that is live on exit from
// basic block B.
auto_vec<set_info *> bb_mem_live_out;
// BB_TO_RPO[B] gives the position of block B in a reverse postorder
// of the CFG. The RPO is a tweaked version of the one normally
// returned by pre_and_rev_post_order_compute, with all blocks in
// an EBB having consecutive positions.
auto_vec<int> bb_to_rpo;
// This stack is divided into sections, with one section for the
// current basic block and one section for each dominating block.
// Each element is a register definition.
//
// If the section for block B contains a definition D of a register R,
// then one of two things is true:
//
// - D occurs in B and no definition of R dominates B.
// - D dominates B and is the nearest dominating definition of R.
//
// The two cases are distinguished by the value of D->bb ().
auto_vec<def_info *> def_stack;
// The top of this stack records the start of the current block's
// section in DEF_STACK.
auto_vec<unsigned int> old_def_stack_limit;
};
}
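
The edge-major layout described for bb_phi_info::inputs can be illustrated
with the toy model below.  It is a sketch under assumptions rather than the
code in blocks.cc (whose diff is suppressed above): toy_phi_info,
input_index and phi_inputs are hypothetical names, REGS is modelled as a
sorted vector instead of a bitmap, and the phi inputs are plain integers
rather than set_info pointers.  The indexing formula is one reading of the
comment on INPUTS.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for bb_phi_info.
struct toy_phi_info
{
  // Registers that need phis, in increasing order (models the REGS bitmap).
  std::vector<unsigned int> regs;

  // The number of incoming edges, i.e. the number of inputs to each phi.
  unsigned int num_preds;

  // All inputs for edge 0 (in register order), then all inputs for edge 1,
  // and so on.  Holds num_preds * regs.size () entries.
  std::vector<int> inputs;
};

// Return the index in INFO.inputs of the value that arrives along incoming
// edge EDGE for the register at position REG_POS in INFO.regs.
inline std::size_t
input_index (const toy_phi_info &info, unsigned int edge,
             unsigned int reg_pos)
{
  return std::size_t (edge) * info.regs.size () + reg_pos;
}

// Gather the inputs of the phi for INFO.regs[REG_POS], one per incoming edge.
std::vector<int>
phi_inputs (const toy_phi_info &info, unsigned int reg_pos)
{
  std::vector<int> result;
  for (unsigned int edge = 0; edge < info.num_preds; ++edge)
    result.push_back (info.inputs[input_index (info, edge, reg_pos)]);
  return result;
}
```

With this layout, recording the live-out values of one predecessor writes a
single contiguous run of the array, while reading the inputs of one phi
strides through it by the number of phis.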

gcc/rtl-ssa/internals.inl

@@ -574,10 +574,20 @@ inline ebb_info::ebb_info (bb_info *first_bb, bb_info *last_bb)
 {
 }

-// Set the contents of last_access for register REGNO to DEF.
+// Record register definition DEF in last_access, pushing a definition
+// to def_stack where appropriate.
 inline void
-function_info::build_info::record_reg_def (unsigned int regno, def_info *def)
+function_info::build_info::record_reg_def (def_info *def)
 {
+  unsigned int regno = def->regno ();
+  auto *prev_dominating_def = safe_as_a<def_info *> (last_access[regno + 1]);
+  if (!prev_dominating_def)
+    // Indicate that DEF is the first dominating definition of REGNO.
+    def_stack.safe_push (def);
+  else if (prev_dominating_def->bb () != def->bb ())
+    // Record that PREV_DOMINATING_DEF was the dominating definition
+    // of REGNO on entry to the current block.
+    def_stack.safe_push (prev_dominating_def);
   last_access[regno + 1] = def;
 }