Auto merge of #61268 - michaelwoerister:stabilize-pgo, r=alexcrichton

Stabilize support for Profile-guided Optimization

This PR makes profile-guided optimization available via the `-C profile-generate` / `-C profile-use` pair of command-line flags and adds end-user documentation for the feature to the [rustc book](https://doc.rust-lang.org/rustc/). The PR thus ticks the last two remaining checkboxes of the [stabilization tracking issue](https://github.com/rust-lang/rust/issues/59913).

From the tracking issue:
> Profile-guided optimization (PGO) is a common optimization technique for ahead-of-time compilers. It works by collecting data about a program's typical execution (e.g. probability of branches taken, typical runtime values of variables, etc) and then uses this information during program optimization for things like inlining decisions, machine code layout, or indirect call promotion.

If you are curious about how this can be used, there is a rendered version of the documentation this PR adds available [here](https://github.com/michaelwoerister/rust/blob/stabilize-pgo/src/doc/rustc/src/profile-guided-optimization.md).

r? @alexcrichton
cc @rust-lang/compiler
bors 2019-07-02 20:00:29 +00:00
commit 0beb2ba16a
17 changed files with 189 additions and 36 deletions

View File

@ -13,5 +13,6 @@
- [Targets](targets/index.md)
- [Built-in Targets](targets/built-in.md)
- [Custom Targets](targets/custom.md)
- [Profile-guided Optimization](profile-guided-optimization.md)
- [Linker-plugin based LTO](linker-plugin-lto.md)
- [Contributing to `rustc`](contributing.md)

View File

@ -214,3 +214,20 @@ This option lets you control what happens when the code panics.
## incremental
This flag allows you to enable incremental compilation.
## profile-generate
This flag allows for creating instrumented binaries that will collect
profiling data for use with profile-guided optimization (PGO). The flag takes
an optional argument which is the path to a directory into which the
instrumented binary will emit the collected data. See the chapter on
[profile-guided optimization](profile-guided-optimization.html) for more
information.
## profile-use
This flag specifies the profiling data file to be used for profile-guided
optimization (PGO). The flag takes a mandatory argument which is the path
to a valid `.profdata` file. See the chapter on
[profile-guided optimization](profile-guided-optimization.html) for more
information.

View File

@ -0,0 +1,136 @@
# Profile Guided Optimization
`rustc` supports doing profile-guided optimization (PGO).
This chapter describes what PGO is, what it is good for, and how it can be used.
## What Is Profile-Guided Optimization?
The basic concept of PGO is to collect data about the typical execution of
a program (e.g. which branches it is likely to take) and then use this data
to inform optimizations such as inlining, machine-code layout,
register allocation, etc.
There are different ways of collecting data about a program's execution.
One is to run the program inside a profiler (such as `perf`) and another
is to create an instrumented binary, that is, a binary that has data
collection built into it, and run that.
The latter usually provides more accurate data and it is also what is
supported by `rustc`.
## Usage
Generating a PGO-optimized program involves following a workflow with four steps:
1. Compile the program with instrumentation enabled
(e.g. `rustc -Cprofile-generate=/tmp/pgo-data main.rs`)
2. Run the instrumented program (e.g. `./main`) which generates a
`default_<id>.profraw` file
3. Convert the `.profraw` file into a `.profdata` file using
LLVM's `llvm-profdata` tool
4. Compile the program again, this time making use of the profiling data
(for example `rustc -Cprofile-use=merged.profdata main.rs`)
An instrumented program will create one or more `.profraw` files, one for each
instrumented binary. E.g. an instrumented executable that loads two instrumented
dynamic libraries at runtime will generate three `.profraw` files. Running an
instrumented binary multiple times, on the other hand, will re-use the
respective `.profraw` files, updating them in place.
These `.profraw` files have to be post-processed before they can be fed back
into the compiler. This is done by the `llvm-profdata` tool. This tool
is most easily installed via
```bash
rustup component add llvm-tools-preview
```
Note that installing the `llvm-tools-preview` component won't add
`llvm-profdata` to the `PATH`. Rather, the tool can be found in:
```bash
~/.rustup/toolchains/<toolchain>/lib/rustlib/<target-triple>/bin/
```
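As a rough illustration of that layout, the sketch below assembles the expected location from a toolchain directory and a target triple. The function name `llvm_profdata_path` is made up for this example and is not part of `rustc` or `rustup`. The inputs are assumed to come from something like `rustc --print sysroot` and the `host:` line of `rustc -vV`. On Windows the binary would additionally carry an `.exe` suffix.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper (not part of rustc or rustup): builds the path where
/// the `llvm-tools-preview` component is expected to place `llvm-profdata`.
fn llvm_profdata_path(sysroot: &Path, target_triple: &str) -> PathBuf {
    sysroot
        .join("lib")
        .join("rustlib")
        .join(target_triple)
        .join("bin")
        .join("llvm-profdata")
}

fn main() {
    // Example values only; substitute your own toolchain directory and triple.
    let sysroot = Path::new("/home/user/.rustup/toolchains/stable-x86_64-unknown-linux-gnu");
    let tool = llvm_profdata_path(sysroot, "x86_64-unknown-linux-gnu");
    println!("{}", tool.display());
}
```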
Alternatively, an `llvm-profdata` coming with a recent LLVM or Clang
version usually works too.
The `llvm-profdata` tool merges multiple `.profraw` files into a single
`.profdata` file that can then be fed back into the compiler via
`-Cprofile-use`:
```bash
# STEP 1: Compile the binary with instrumentation
rustc -Cprofile-generate=/tmp/pgo-data -O ./main.rs
# STEP 2: Run the binary a few times, maybe with common sets of args.
# Each run will create or update `.profraw` files in /tmp/pgo-data
./main mydata1.csv
./main mydata2.csv
./main mydata3.csv
# STEP 3: Merge and post-process all the `.profraw` files in /tmp/pgo-data
llvm-profdata merge -o ./merged.profdata /tmp/pgo-data
# STEP 4: Use the merged `.profdata` file during optimization. All `rustc`
# flags have to be the same.
rustc -Cprofile-use=./merged.profdata -O ./main.rs
```
### A Complete Cargo Workflow
Using this feature with Cargo works very similarly to using it with `rustc`
directly. Again, we generate an instrumented binary, run it to produce data,
merge the data, and feed it back into the compiler. Some things of note:
- We use the `RUSTFLAGS` environment variable in order to pass the PGO compiler
flags to the compilation of all crates in the program.
- We pass the `--target` flag to Cargo, which prevents the `RUSTFLAGS`
arguments from being passed to Cargo build scripts. We don't want the build
scripts to generate a bunch of `.profraw` files.
- We pass `--release` to Cargo because that's where PGO makes the most sense.
In theory, PGO can also be done on debug builds but there is little reason
to do so.
- It is recommended to use *absolute paths* for the argument of
`-Cprofile-generate` and `-Cprofile-use`. Cargo can invoke `rustc` with
varying working directories, so a relative path may not resolve to
the supplied `.profdata` file. With absolute paths this is not an issue.
- It is good practice to make sure that there is no left-over profiling data
from previous compilation sessions. Just deleting the directory is a simple
way of doing so (see `STEP 0` below).
This is what the entire workflow looks like:
```bash
# STEP 0: Make sure there is no left-over profiling data from previous runs
rm -rf /tmp/pgo-data
# STEP 1: Build the instrumented binaries
RUSTFLAGS="-Cprofile-generate=/tmp/pgo-data" \
cargo build --release --target=x86_64-unknown-linux-gnu
# STEP 2: Run the instrumented binaries with some typical data
./target/x86_64-unknown-linux-gnu/release/myprogram mydata1.csv
./target/x86_64-unknown-linux-gnu/release/myprogram mydata2.csv
./target/x86_64-unknown-linux-gnu/release/myprogram mydata3.csv
# STEP 3: Merge the `.profraw` files into a `.profdata` file
llvm-profdata merge -o /tmp/pgo-data/merged.profdata /tmp/pgo-data
# STEP 4: Use the `.profdata` file for guiding optimizations
RUSTFLAGS="-Cprofile-use=/tmp/pgo-data/merged.profdata" \
cargo build --release --target=x86_64-unknown-linux-gnu
```
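The recommendation to use absolute paths can be sketched in a few lines of Rust, for instance in a small build-driver program. This is only an illustration: `to_absolute` and `profile_use_rustflags` are hypothetical names, and the sketch assumes the relative path should be resolved against the directory the driver itself runs in.

```rust
use std::env;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical helper: resolves `path` against the current working directory
/// if it is relative, and returns it unchanged if it is already absolute.
fn to_absolute(path: &Path) -> io::Result<PathBuf> {
    if path.is_absolute() {
        Ok(path.to_path_buf())
    } else {
        Ok(env::current_dir()?.join(path))
    }
}

/// Builds the `RUSTFLAGS` value for the final, profile-using build.
fn profile_use_rustflags(profdata: &Path) -> io::Result<String> {
    Ok(format!("-Cprofile-use={}", to_absolute(profdata)?.display()))
}

fn main() -> io::Result<()> {
    // A relative path is anchored to the directory this program runs in.
    println!("RUSTFLAGS={}", profile_use_rustflags(Path::new("merged.profdata"))?);
    Ok(())
}
```

Because the flag value is fixed before Cargo starts, it no longer matters which working directory Cargo chooses for each `rustc` invocation.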
## Further Reading
`rustc`'s PGO support relies entirely on LLVM's implementation of the feature
and is equivalent to what Clang offers via the `-fprofile-generate` /
`-fprofile-use` flags. The [Profile Guided Optimization][clang-pgo] section
in Clang's documentation is therefore an interesting read for anyone who wants
to use PGO with Rust.
[clang-pgo]: https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization

View File

@ -1207,7 +1207,11 @@ options! {CodegenOptions, CodegenSetter, basic_codegen_options,
linker_plugin_lto: LinkerPluginLto = (LinkerPluginLto::Disabled,
parse_linker_plugin_lto, [TRACKED],
"generate build artifacts that are compatible with linker-based LTO."),
profile_generate: SwitchWithOptPath = (SwitchWithOptPath::Disabled,
parse_switch_with_opt_path, [TRACKED],
"compile the program with profiling instrumentation"),
profile_use: Option<PathBuf> = (None, parse_opt_pathbuf, [TRACKED],
"use the given `.profdata` file for profile-guided optimization"),
}
options! {DebuggingOptions, DebuggingSetter, basic_debugging_options,
@ -1379,11 +1383,6 @@ options! {DebuggingOptions, DebuggingSetter, basic_debugging_options,
"extra arguments to prepend to the linker invocation (space separated)"),
profile: bool = (false, parse_bool, [TRACKED],
"insert profiling code"),
pgo_gen: SwitchWithOptPath = (SwitchWithOptPath::Disabled,
parse_switch_with_opt_path, [TRACKED],
"Generate PGO profile data, to a given file, or to the default location if it's empty."),
pgo_use: Option<PathBuf> = (None, parse_opt_pathbuf, [TRACKED],
"Use PGO profile data from the given profile file."),
disable_instrumentation_preinliner: bool = (false, parse_bool, [TRACKED],
"Disable the instrumentation pre-inliner, useful for profiling / PGO."),
relro_level: Option<RelroLevel> = (None, parse_relro_level, [TRACKED],
@ -2036,13 +2035,6 @@ pub fn build_session_options_and_crate_config(
}
}
if debugging_opts.pgo_gen.enabled() && debugging_opts.pgo_use.is_some() {
early_error(
error_format,
"options `-Z pgo-gen` and `-Z pgo-use` are exclusive",
);
}
let mut output_types = BTreeMap::new();
if !debugging_opts.parse_only {
for list in matches.opt_strs("emit") {
@ -2154,6 +2146,13 @@ pub fn build_session_options_and_crate_config(
);
}
if cg.profile_generate.enabled() && cg.profile_use.is_some() {
early_error(
error_format,
"options `-C profile-generate` and `-C profile-use` are exclusive",
);
}
let mut prints = Vec::<PrintRequest>::new();
if cg.target_cpu.as_ref().map_or(false, |s| s == "help") {
prints.push(PrintRequest::TargetCPUs);
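
For readers unfamiliar with the option types, the exclusivity check in the hunk above can be modelled in isolation. The following is a minimal sketch, not the compiler's code: `SwitchWithOptPath` is a simplified stand-in that models only the variants visible in this diff, and `check_pgo_flags` is a hypothetical function that returns a `Result` where the real code calls `early_error`.

```rust
use std::path::PathBuf;

/// Simplified stand-in for rustc's `SwitchWithOptPath`; only the variants
/// visible in this diff are modelled.
#[derive(Clone, Debug, PartialEq)]
enum SwitchWithOptPath {
    Enabled(Option<PathBuf>),
    Disabled,
}

impl SwitchWithOptPath {
    fn enabled(&self) -> bool {
        matches!(self, SwitchWithOptPath::Enabled(_))
    }
}

/// Hypothetical free function mirroring the check above; the real code calls
/// `early_error` instead of returning a `Result`.
fn check_pgo_flags(
    profile_generate: &SwitchWithOptPath,
    profile_use: &Option<PathBuf>,
) -> Result<(), String> {
    if profile_generate.enabled() && profile_use.is_some() {
        return Err("options `-C profile-generate` and `-C profile-use` are exclusive".to_owned());
    }
    Ok(())
}

fn main() {
    let generate = SwitchWithOptPath::Enabled(None);
    let profile_use = Some(PathBuf::from("merged.profdata"));
    // Both flags at once are rejected.
    assert!(check_pgo_flags(&generate, &profile_use).is_err());
    // Either flag on its own is accepted.
    assert!(check_pgo_flags(&generate, &None).is_ok());
    assert!(check_pgo_flags(&SwitchWithOptPath::Disabled, &profile_use).is_ok());
}
```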

View File

@ -519,11 +519,11 @@ fn test_codegen_options_tracking_hash() {
assert!(reference.dep_tracking_hash() != opts.dep_tracking_hash());
opts = reference.clone();
opts.debugging_opts.pgo_gen = SwitchWithOptPath::Enabled(None);
opts.cg.profile_generate = SwitchWithOptPath::Enabled(None);
assert_ne!(reference.dep_tracking_hash(), opts.dep_tracking_hash());
opts = reference.clone();
opts.debugging_opts.pgo_use = Some(PathBuf::from("abc"));
opts.cg.profile_use = Some(PathBuf::from("abc"));
assert_ne!(reference.dep_tracking_hash(), opts.dep_tracking_hash());
opts = reference.clone();

View File

@ -1295,9 +1295,9 @@ fn validate_commandline_args_with_session_available(sess: &Session) {
// Make sure that any given profiling data actually exists so LLVM can't
// decide to silently skip PGO.
if let Some(ref path) = sess.opts.debugging_opts.pgo_use {
if let Some(ref path) = sess.opts.cg.profile_use {
if !path.exists() {
sess.err(&format!("File `{}` passed to `-Zpgo-use` does not exist.",
sess.err(&format!("File `{}` passed to `-C profile-use` does not exist.",
path.display()));
}
}
@ -1306,7 +1306,7 @@ fn validate_commandline_args_with_session_available(sess: &Session) {
// an error to combine the two for now. It always runs into an assertion
// if LLVM is built with assertions, but without assertions it sometimes
// does not crash and will probably generate a corrupted binary.
if sess.opts.debugging_opts.pgo_gen.enabled() &&
if sess.opts.cg.profile_generate.enabled() &&
sess.target.target.options.is_like_msvc &&
sess.panic_strategy() == PanicStrategy::Unwind {
sess.err("Profile-guided optimization does not yet work in conjunction \

View File

@ -102,8 +102,8 @@ pub fn set_probestack(cx: &CodegenCx<'ll, '_>, llfn: &'ll Value) {
return
}
// probestack doesn't play nice either with pgo-gen.
if cx.sess().opts.debugging_opts.pgo_gen.enabled() {
// probestack doesn't play nice either with `-C profile-generate`.
if cx.sess().opts.cg.profile_generate.enabled() {
return;
}

View File

@ -1179,7 +1179,7 @@ fn link_args<'a, B: ArchiveBuilder<'a>>(cmd: &mut dyn Linker,
cmd.build_static_executable();
}
if sess.opts.debugging_opts.pgo_gen.enabled() {
if sess.opts.cg.profile_generate.enabled() {
cmd.pgo_gen();
}

View File

@ -203,7 +203,7 @@ fn exported_symbols_provider_local<'tcx>(
}
}
if tcx.sess.opts.debugging_opts.pgo_gen.enabled() {
if tcx.sess.opts.cg.profile_generate.enabled() {
// These are weak symbols that point to the profile version and the
// profile name, which need to be treated as exported so LTO doesn't nix
// them.

View File

@ -423,8 +423,8 @@ pub fn start_async_codegen<B: ExtraBackendMethods>(
modules_config.passes.push("insert-gcov-profiling".to_owned())
}
modules_config.pgo_gen = sess.opts.debugging_opts.pgo_gen.clone();
modules_config.pgo_use = sess.opts.debugging_opts.pgo_use.clone();
modules_config.pgo_gen = sess.opts.cg.profile_generate.clone();
modules_config.pgo_use = sess.opts.cg.profile_use.clone();
modules_config.opt_level = Some(sess.opts.optimize);
modules_config.opt_size = Some(sess.opts.optimize);

View File

@ -868,7 +868,7 @@ impl<'a> CrateLoader<'a> {
fn inject_profiler_runtime(&mut self) {
if self.sess.opts.debugging_opts.profile ||
self.sess.opts.debugging_opts.pgo_gen.enabled()
self.sess.opts.cg.profile_generate.enabled()
{
info!("loading profiler");

View File

@ -1,8 +1,8 @@
// Test that `-Zpgo-gen` creates expected instrumentation artifacts in LLVM IR.
// Test that `-Cprofile-generate` creates expected instrumentation artifacts in LLVM IR.
// Compiling with `-Cpanic=abort` because PGO+unwinding isn't supported on all platforms.
// needs-profiler-support
// compile-flags: -Z pgo-gen -Ccodegen-units=1 -Cpanic=abort
// compile-flags: -Cprofile-generate -Ccodegen-units=1 -Cpanic=abort
// CHECK: @__llvm_profile_raw_version =
// CHECK: @__profc_{{.*}}pgo_instrumentation{{.*}}some_function{{.*}} = private global

View File

@ -21,7 +21,7 @@ all: cpp-executable rust-executable
cpp-executable:
$(RUSTC) -Clinker-plugin-lto=on \
-Zpgo-gen="$(TMPDIR)"/cpp-profdata \
-Cprofile-generate="$(TMPDIR)"/cpp-profdata \
-o "$(TMPDIR)"/librustlib-xlto.a \
$(COMMON_FLAGS) \
./rustlib.rs
@ -39,7 +39,7 @@ cpp-executable:
-o "$(TMPDIR)"/cpp-profdata/merged.profdata \
"$(TMPDIR)"/cpp-profdata/default_*.profraw
$(RUSTC) -Clinker-plugin-lto=on \
-Zpgo-use="$(TMPDIR)"/cpp-profdata/merged.profdata \
-Cprofile-use="$(TMPDIR)"/cpp-profdata/merged.profdata \
-o "$(TMPDIR)"/librustlib-xlto.a \
$(COMMON_FLAGS) \
./rustlib.rs
@ -57,7 +57,7 @@ rust-executable:
$(CLANG) ./clib.c -fprofile-generate="$(TMPDIR)"/rs-profdata -flto=thin -c -o $(TMPDIR)/clib.o -O3
(cd $(TMPDIR); $(AR) crus ./libxyz.a ./clib.o)
$(RUSTC) -Clinker-plugin-lto=on \
-Zpgo-gen="$(TMPDIR)"/rs-profdata \
-Cprofile-generate="$(TMPDIR)"/rs-profdata \
-L$(TMPDIR) \
$(COMMON_FLAGS) \
-Clinker=$(CLANG) \
@ -78,7 +78,7 @@ rust-executable:
rm "$(TMPDIR)"/libxyz.a
(cd $(TMPDIR); $(AR) crus ./libxyz.a ./clib.o)
$(RUSTC) -Clinker-plugin-lto=on \
-Zpgo-use="$(TMPDIR)"/rs-profdata/merged.profdata \
-Cprofile-use="$(TMPDIR)"/rs-profdata/merged.profdata \
-L$(TMPDIR) \
$(COMMON_FLAGS) \
-Clinker=$(CLANG) \

View File

@ -2,7 +2,7 @@
-include ../tools.mk
COMPILE_FLAGS=-Copt-level=3 -Clto=fat -Z pgo-gen="$(TMPDIR)"
COMPILE_FLAGS=-Copt-level=3 -Clto=fat -Cprofile-generate="$(TMPDIR)"
# LLVM doesn't yet support instrumenting binaries that use unwinding on MSVC:
# https://github.com/rust-lang/rust/issues/61002

View File

@ -2,7 +2,7 @@
-include ../tools.mk
COMPILE_FLAGS=-O -Ccodegen-units=1 -Z pgo-gen="$(TMPDIR)"
COMPILE_FLAGS=-O -Ccodegen-units=1 -Cprofile-generate="$(TMPDIR)"
# LLVM doesn't yet support instrumenting binaries that use unwinding on MSVC:
# https://github.com/rust-lang/rust/issues/61002

View File

@ -2,7 +2,7 @@
-include ../tools.mk
COMPILE_FLAGS=-g -Z pgo-gen="$(TMPDIR)"
COMPILE_FLAGS=-g -Cprofile-generate="$(TMPDIR)"
# LLVM doesn't yet support instrumenting binaries that use unwinding on MSVC:
# https://github.com/rust-lang/rust/issues/61002

View File

@ -33,7 +33,7 @@ endif
all:
# Compile the test program with instrumentation
$(RUSTC) $(COMMON_FLAGS) -Z pgo-gen="$(TMPDIR)" main.rs
$(RUSTC) $(COMMON_FLAGS) -Cprofile-generate="$(TMPDIR)" main.rs
# Run it in order to generate some profiling data
$(call RUN,main some-argument) || exit 1
# Postprocess the profiling data so it can be used by the compiler
@ -41,7 +41,7 @@ all:
-o "$(TMPDIR)"/merged.profdata \
"$(TMPDIR)"/default_*.profraw
# Compile the test program again, making use of the profiling data
$(RUSTC) $(COMMON_FLAGS) -Z pgo-use="$(TMPDIR)"/merged.profdata --emit=llvm-ir main.rs
$(RUSTC) $(COMMON_FLAGS) -Cprofile-use="$(TMPDIR)"/merged.profdata --emit=llvm-ir main.rs
# Check that the generated IR contains some things that we expect
#
# We feed the file into LLVM FileCheck tool *in reverse* so that we see the