Kewen Lin
fbeead55e0
rs6000: Add load density heuristic
We noticed that SPEC2017 503.bwaves_r run time degrades by about 8% on P8 and P9 when vectorization is enabled at -O2 with fast-math (with the cheap vect cost model). Compared to -Ofast, the compiler doesn't do the loop interchange on the innermost loop, so it's not profitable to vectorize it.

Following Richi's comments [1], this patch takes a similar approach and overprices the vector construction fed by VMAT_ELEMENTWISE or VMAT_STRIDED_SLP. Instead of adding the extra cost immediately when costing the vector construction, it first records how many loads and vectorized statements there are in the given loop. Later, in rs6000_density_test (called by finish_cost), it computes the load density ratio against all vectorized statements, checks it against the corresponding thresholds DENSITY_LOAD_NUM_THRESHOLD and DENSITY_LOAD_PCT_THRESHOLD, and does the actual extra pricing only if both thresholds are exceeded.

Note that this new load density heuristic check is based on some fields in the target cost data which are updated as needed when scanning each add_stmt_cost entry, so it's independent of the existing code in rs6000_density_test, which needs to scan non_vect stmts. Since it checks the load stmt count against all vectorized stmts, it's a kind of density, so I put it in function rs6000_density_test. For the same reason, to keep it independent, I didn't put it as an else arm of the existing density threshold check hunk or before this hunk.

While investigating the -1.04% degradation from 526.blender_r on Power8, I noticed that the extra penalized cost of 320 on one single vector construction for mode V16QI is much exaggerated, which makes the final body cost unreliable. So this patch adds a maximum bound on the extra penalized cost for each vector construction statement.
Full SPEC2017 performance evaluation on Power8/Power9 with option combinations:

  * -O2 -ftree-vectorize {,-fvect-cost-model=very-cheap} {,-ffast-math}
  * {-O3, -Ofast} {,-funroll-loops}

The bwaves_r degradations on P8/P9 have been fixed; nothing else remarkable was observed. The Power10 -Ofast -funroll-loops run shows it's neutral, while the -O2 -ftree-vectorize run shows the bwaves_r degradation is fixed, as expected.

[1] https://gcc.gnu.org/pipermail/gcc-patches/2021-May/570076.html

gcc/ChangeLog:

	* config/rs6000/rs6000.c (struct rs6000_cost_data): New members
	nstmts, nloads and extra_ctor_cost.
	(rs6000_density_test): Add load density related heuristics.  Do
	extra costing on vector construction statements if need.
	(rs6000_init_cost): Init new members.
	(rs6000_update_target_cost_per_stmt): New function.
	(rs6000_add_stmt_cost): Factor vect_nonmem hunk out to function
	rs6000_update_target_cost_per_stmt and call it.
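The two-phase scheme described in the commit message (record counts per statement, then test the load density once at the end) can be sketched roughly as follows. This is a simplified, standalone illustration, not the code from rs6000.c: the struct name `cost_data_sketch`, both function names, the `body_cost` field, the `MAX_EXTRA_COST_PER_CTOR` name, and all three numeric values are assumptions chosen for the example. Only the member names nstmts, nloads and extra_ctor_cost and the two threshold names come from the commit message.

```c
#include <stdbool.h>

/* Placeholder values for illustration only; the real values live in
   rs6000.c and are not stated in the commit message.  */
#define DENSITY_LOAD_NUM_THRESHOLD 20   /* assumed value */
#define DENSITY_LOAD_PCT_THRESHOLD 45   /* assumed value */
#define MAX_EXTRA_COST_PER_CTOR    12   /* hypothetical name and value */

/* Simplified stand-in for the new members of struct rs6000_cost_data.  */
struct cost_data_sketch
{
  unsigned nstmts;           /* all vectorized statements seen so far */
  unsigned nloads;           /* vectorized loads seen so far */
  unsigned extra_ctor_cost;  /* deferred penalty for vector constructions */
  unsigned body_cost;        /* loop body cost accumulated elsewhere */
};

/* Phase 1: called once per vectorized statement.  CTOR_PENALTY is the raw
   penalty for a vector construction fed by element-wise or strided loads,
   or 0 if the statement needs no such construction.  The penalty is only
   recorded here, capped per statement, and not yet added to the cost.  */
static void
update_cost_per_stmt (struct cost_data_sketch *d, bool is_load,
                      unsigned ctor_penalty)
{
  d->nstmts++;
  if (is_load)
    d->nloads++;
  if (ctor_penalty > MAX_EXTRA_COST_PER_CTOR)
    ctor_penalty = MAX_EXTRA_COST_PER_CTOR;
  d->extra_ctor_cost += ctor_penalty;
}

/* Phase 2: called once when costing finishes.  Applies the deferred
   penalty only if both the load count and the load percentage exceed
   their thresholds.  Returns true if the penalty was applied.  */
static bool
load_density_test (struct cost_data_sketch *d)
{
  if (d->nstmts == 0)
    return false;
  unsigned load_pct = d->nloads * 100 / d->nstmts;
  if (d->nloads > DENSITY_LOAD_NUM_THRESHOLD
      && load_pct > DENSITY_LOAD_PCT_THRESHOLD)
    {
      d->body_cost += d->extra_ctor_cost;
      return true;
    }
  return false;
}
```

Deferring the penalty this way means a loop with only a few element-wise loads among many other vectorized statements keeps its original body cost, while the per-statement cap keeps a single wide construction (such as the V16QI case mentioned above) from dominating the total.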
This directory contains the GNU Compiler Collection (GCC).

The GNU Compiler Collection is free software. See the files whose names start with COPYING for copying permission. The manuals, and some of the runtime libraries, are under different terms; see the individual source files for details.

The directory INSTALL contains copies of the installation information as HTML and plain text. The source of this information is gcc/doc/install.texi. The installation information includes details of what is included in the GCC sources and what files GCC installs.

See the file gcc/doc/gcc.texi (together with other files that it includes) for usage and porting information. An online readable version of the manual is in the files gcc/doc/gcc.info*.

See http://gcc.gnu.org/bugs/ for how to report bugs usefully.

Copyright years on GCC source files may be listed using range notation, e.g., 1987-2012, indicating that every year in the range, inclusive, is a copyrightable year that could otherwise be listed individually.
Languages

C: 48%
Ada: 18.3%
C++: 14.1%
Go: 7%
GCC Machine Description: 4.6%
Other: 7.7%