diff --git a/src/doc/rustc/src/SUMMARY.md b/src/doc/rustc/src/SUMMARY.md
index 34708d1847f..3cda8d92797 100644
--- a/src/doc/rustc/src/SUMMARY.md
+++ b/src/doc/rustc/src/SUMMARY.md
@@ -13,5 +13,6 @@
 - [Targets](targets/index.md)
     - [Built-in Targets](targets/built-in.md)
     - [Custom Targets](targets/custom.md)
+- [Profile-guided Optimization](profile-guided-optimization.md)
 - [Linker-plugin based LTO](linker-plugin-lto.md)
 - [Contributing to `rustc`](contributing.md)
diff --git a/src/doc/rustc/src/codegen-options/index.md b/src/doc/rustc/src/codegen-options/index.md
index a616409d9a4..3773a778302 100644
--- a/src/doc/rustc/src/codegen-options/index.md
+++ b/src/doc/rustc/src/codegen-options/index.md
@@ -214,3 +214,20 @@ This option lets you control what happens when the code panics.
 ## incremental
 
 This flag allows you to enable incremental compilation.
+
+## profile-generate
+
+This flag allows for creating instrumented binaries that will collect
+profiling data for use with profile-guided optimization (PGO). The flag takes
+an optional argument which is the path to a directory into which the
+instrumented binary will emit the collected data. See the chapter on
+[profile-guided optimization](profile-guided-optimization.html) for more
+information.
+
+## profile-use
+
+This flag specifies the profiling data file to be used for profile-guided
+optimization (PGO). The flag takes a mandatory argument which is the path
+to a valid `.profdata` file. See the chapter on
+[profile-guided optimization](profile-guided-optimization.html) for more
+information.
diff --git a/src/doc/rustc/src/profile-guided-optimization.md b/src/doc/rustc/src/profile-guided-optimization.md
new file mode 100644
index 00000000000..38be07a6440
--- /dev/null
+++ b/src/doc/rustc/src/profile-guided-optimization.md
@@ -0,0 +1,136 @@
+# Profile Guided Optimization
+
+`rustc` supports doing profile-guided optimization (PGO).
+This chapter describes what PGO is, what it is good for, and how it can be used.
+
+## What Is Profile-Guided Optimization?
+
+The basic concept of PGO is to collect data about the typical execution of
+a program (e.g. which branches it is likely to take) and then use this data
+to inform optimizations such as inlining, machine-code layout,
+register allocation, etc.
+
+There are different ways of collecting data about a program's execution.
+One is to run the program inside a profiler (such as `perf`) and another
+is to create an instrumented binary, that is, a binary that has data
+collection built into it, and run that.
+The latter usually provides more accurate data and it is also what is
+supported by `rustc`.
+
+## Usage
+
+Generating a PGO-optimized program involves following a workflow with four steps:
+
+1. Compile the program with instrumentation enabled
+   (e.g. `rustc -Cprofile-generate=/tmp/pgo-data main.rs`)
+2. Run the instrumented program (e.g. `./main`) which generates a
+   `default_.profraw` file
+3. Convert the `.profraw` file into a `.profdata` file using
+   LLVM's `llvm-profdata` tool
+4. Compile the program again, this time making use of the profiling data
+   (for example `rustc -Cprofile-use=merged.profdata main.rs`)
+
+An instrumented program will create one or more `.profraw` files, one for each
+instrumented binary. E.g. an instrumented executable that loads two instrumented
+dynamic libraries at runtime will generate three `.profraw` files. Running an
+instrumented binary multiple times, on the other hand, will re-use the
+respective `.profraw` files, updating them in place.
+
+These `.profraw` files have to be post-processed before they can be fed back
+into the compiler. This is done by the `llvm-profdata` tool. This tool
+is most easily installed via
+
+```bash
+rustup component add llvm-tools-preview
+```
+
+Note that installing the `llvm-tools-preview` component won't add
+`llvm-profdata` to the `PATH`. Rather, the tool can be found in:
+
+```bash
+~/.rustup/toolchains//lib/rustlib//bin/
+```
+
+Alternatively, an `llvm-profdata` coming with a recent LLVM or Clang
+version usually works too.
+
+The `llvm-profdata` tool merges multiple `.profraw` files into a single
+`.profdata` file that can then be fed back into the compiler via
+`-Cprofile-use`:
+
+```bash
+# STEP 1: Compile the binary with instrumentation
+rustc -Cprofile-generate=/tmp/pgo-data -O ./main.rs
+
+# STEP 2: Run the binary a few times, maybe with common sets of args.
+#         Each run will create or update `.profraw` files in /tmp/pgo-data
+./main mydata1.csv
+./main mydata2.csv
+./main mydata3.csv
+
+# STEP 3: Merge and post-process all the `.profraw` files in /tmp/pgo-data
+llvm-profdata merge -o ./merged.profdata /tmp/pgo-data
+
+# STEP 4: Use the merged `.profdata` file during optimization. All `rustc`
+#         flags have to be the same.
+rustc -Cprofile-use=./merged.profdata -O ./main.rs
+```
+
+### A Complete Cargo Workflow
+
+Using this feature with Cargo works very similarly to using it with `rustc`
+directly. Again, we generate an instrumented binary, run it to produce data,
+merge the data, and feed it back into the compiler. Some things of note:
+
+- We use the `RUSTFLAGS` environment variable in order to pass the PGO compiler
+  flags to the compilation of all crates in the program.
+
+- We pass the `--target` flag to Cargo, which prevents the `RUSTFLAGS`
+  arguments from being passed to Cargo build scripts. We don't want the build
+  scripts to generate a bunch of `.profraw` files.
+
+- We pass `--release` to Cargo because that's where PGO makes the most sense.
+  In theory, PGO can also be done on debug builds but there is little reason
+  to do so.
+
+- It is recommended to use *absolute paths* for the argument of
+  `-Cprofile-generate` and `-Cprofile-use`. Cargo can invoke `rustc` with
+  varying working directories, meaning that `rustc` will not be able to find
+  the supplied `.profdata` file. With absolute paths this is not an issue.
+
+- It is good practice to make sure that there is no left-over profiling data
+  from previous compilation sessions. Just deleting the directory is a simple
+  way of doing so (see `STEP 0` below).
+
+This is what the entire workflow looks like:
+
+```bash
+# STEP 0: Make sure there is no left-over profiling data from previous runs
+rm -rf /tmp/pgo-data
+
+# STEP 1: Build the instrumented binaries
+RUSTFLAGS="-Cprofile-generate=/tmp/pgo-data" \
+    cargo build --release --target=x86_64-unknown-linux-gnu
+
+# STEP 2: Run the instrumented binaries with some typical data
+./target/x86_64-unknown-linux-gnu/release/myprogram mydata1.csv
+./target/x86_64-unknown-linux-gnu/release/myprogram mydata2.csv
+./target/x86_64-unknown-linux-gnu/release/myprogram mydata3.csv
+
+# STEP 3: Merge the `.profraw` files into a `.profdata` file
+llvm-profdata merge -o /tmp/pgo-data/merged.profdata /tmp/pgo-data
+
+# STEP 4: Use the `.profdata` file for guiding optimizations
+RUSTFLAGS="-Cprofile-use=/tmp/pgo-data/merged.profdata" \
+    cargo build --release --target=x86_64-unknown-linux-gnu
+```
+
+## Further Reading
+
+`rustc`'s PGO support relies entirely on LLVM's implementation of the feature
+and is equivalent to what Clang offers via the `-fprofile-generate` /
+`-fprofile-use` flags. The [Profile Guided Optimization][clang-pgo] section
+in Clang's documentation is therefore an interesting read for anyone who wants
+to use PGO with Rust.
+
+[clang-pgo]: https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization
diff --git a/src/librustc/session/config.rs b/src/librustc/session/config.rs
index 895f9c6d8fb..9f033262850 100644
--- a/src/librustc/session/config.rs
+++ b/src/librustc/session/config.rs
@@ -1207,7 +1207,11 @@ options! {CodegenOptions, CodegenSetter, basic_codegen_options,
     linker_plugin_lto: LinkerPluginLto = (LinkerPluginLto::Disabled,
         parse_linker_plugin_lto, [TRACKED],
         "generate build artifacts that are compatible with linker-based LTO."),
-
+    profile_generate: SwitchWithOptPath = (SwitchWithOptPath::Disabled,
+        parse_switch_with_opt_path, [TRACKED],
+        "compile the program with profiling instrumentation"),
+    profile_use: Option = (None, parse_opt_pathbuf, [TRACKED],
+        "use the given `.profdata` file for profile-guided optimization"),
 }
 
 options! {DebuggingOptions, DebuggingSetter, basic_debugging_options,
@@ -1379,11 +1383,6 @@ options! {DebuggingOptions, DebuggingSetter, basic_debugging_options,
         "extra arguments to prepend to the linker invocation (space separated)"),
     profile: bool = (false, parse_bool, [TRACKED],
                      "insert profiling code"),
-    pgo_gen: SwitchWithOptPath = (SwitchWithOptPath::Disabled,
-        parse_switch_with_opt_path, [TRACKED],
-        "Generate PGO profile data, to a given file, or to the default location if it's empty."),
-    pgo_use: Option = (None, parse_opt_pathbuf, [TRACKED],
-        "Use PGO profile data from the given profile file."),
     disable_instrumentation_preinliner: bool = (false, parse_bool, [TRACKED],
         "Disable the instrumentation pre-inliner, useful for profiling / PGO."),
     relro_level: Option = (None, parse_relro_level, [TRACKED],
@@ -2036,13 +2035,6 @@ pub fn build_session_options_and_crate_config(
         }
     }
 
-    if debugging_opts.pgo_gen.enabled() && debugging_opts.pgo_use.is_some() {
-        early_error(
-            error_format,
-            "options `-Z pgo-gen` and `-Z pgo-use` are exclusive",
-        );
-    }
-
     let mut output_types = BTreeMap::new();
     if !debugging_opts.parse_only {
         for list in matches.opt_strs("emit") {
@@ -2154,6 +2146,13 @@ pub fn build_session_options_and_crate_config(
         );
     }
 
+    if cg.profile_generate.enabled() && cg.profile_use.is_some() {
+        early_error(
+            error_format,
+            "options `-C profile-generate` and `-C profile-use` are exclusive",
+        );
+    }
+
     let mut prints = Vec::::new();
     if cg.target_cpu.as_ref().map_or(false, |s| s == "help") {
         prints.push(PrintRequest::TargetCPUs);
diff --git a/src/librustc/session/config/tests.rs b/src/librustc/session/config/tests.rs
index b8477f8dd17..3d6312548a4 100644
--- a/src/librustc/session/config/tests.rs
+++ b/src/librustc/session/config/tests.rs
@@ -519,11 +519,11 @@ fn test_codegen_options_tracking_hash() {
     assert!(reference.dep_tracking_hash() != opts.dep_tracking_hash());
 
     opts = reference.clone();
-    opts.debugging_opts.pgo_gen = SwitchWithOptPath::Enabled(None);
+    opts.cg.profile_generate = SwitchWithOptPath::Enabled(None);
     assert_ne!(reference.dep_tracking_hash(), opts.dep_tracking_hash());
 
     opts = reference.clone();
-    opts.debugging_opts.pgo_use = Some(PathBuf::from("abc"));
+    opts.cg.profile_use = Some(PathBuf::from("abc"));
     assert_ne!(reference.dep_tracking_hash(), opts.dep_tracking_hash());
 
     opts = reference.clone();
diff --git a/src/librustc/session/mod.rs b/src/librustc/session/mod.rs
index bb4ef2d7bd4..9486f353b3f 100644
--- a/src/librustc/session/mod.rs
+++ b/src/librustc/session/mod.rs
@@ -1295,9 +1295,9 @@ fn validate_commandline_args_with_session_available(sess: &Session) {
 
     // Make sure that any given profiling data actually exists so LLVM can't
     // decide to silently skip PGO.
-    if let Some(ref path) = sess.opts.debugging_opts.pgo_use {
+    if let Some(ref path) = sess.opts.cg.profile_use {
         if !path.exists() {
-            sess.err(&format!("File `{}` passed to `-Zpgo-use` does not exist.",
+            sess.err(&format!("File `{}` passed to `-C profile-use` does not exist.",
                               path.display()));
         }
     }
@@ -1306,7 +1306,7 @@ fn validate_commandline_args_with_session_available(sess: &Session) {
     // an error to combine the two for now. It always runs into an assertions
     // if LLVM is built with assertions, but without assertions it sometimes
     // does not crash and will probably generate a corrupted binary.
-    if sess.opts.debugging_opts.pgo_gen.enabled() &&
+    if sess.opts.cg.profile_generate.enabled() &&
        sess.target.target.options.is_like_msvc &&
        sess.panic_strategy() == PanicStrategy::Unwind {
         sess.err("Profile-guided optimization does not yet work in conjunction \
diff --git a/src/librustc_codegen_llvm/attributes.rs b/src/librustc_codegen_llvm/attributes.rs
index 4735588f29a..94abf1796d3 100644
--- a/src/librustc_codegen_llvm/attributes.rs
+++ b/src/librustc_codegen_llvm/attributes.rs
@@ -102,8 +102,8 @@ pub fn set_probestack(cx: &CodegenCx<'ll, '_>, llfn: &'ll Value) {
         return
     }
 
-    // probestack doesn't play nice either with pgo-gen.
-    if cx.sess().opts.debugging_opts.pgo_gen.enabled() {
+    // probestack doesn't play nice either with `-C profile-generate`.
+    if cx.sess().opts.cg.profile_generate.enabled() {
         return;
     }
 
diff --git a/src/librustc_codegen_ssa/back/link.rs b/src/librustc_codegen_ssa/back/link.rs
index 618e8b8699f..e3d297e7862 100644
--- a/src/librustc_codegen_ssa/back/link.rs
+++ b/src/librustc_codegen_ssa/back/link.rs
@@ -1179,7 +1179,7 @@ fn link_args<'a, B: ArchiveBuilder<'a>>(cmd: &mut dyn Linker,
         cmd.build_static_executable();
     }
 
-    if sess.opts.debugging_opts.pgo_gen.enabled() {
+    if sess.opts.cg.profile_generate.enabled() {
         cmd.pgo_gen();
     }
 
diff --git a/src/librustc_codegen_ssa/back/symbol_export.rs b/src/librustc_codegen_ssa/back/symbol_export.rs
index b9ee82f108a..3e0f030527f 100644
--- a/src/librustc_codegen_ssa/back/symbol_export.rs
+++ b/src/librustc_codegen_ssa/back/symbol_export.rs
@@ -203,7 +203,7 @@ fn exported_symbols_provider_local<'tcx>(
         }
     }
 
-    if tcx.sess.opts.debugging_opts.pgo_gen.enabled() {
+    if tcx.sess.opts.cg.profile_generate.enabled() {
         // These are weak symbols that point to the profile version and the
         // profile name, which need to be treated as exported so LTO doesn't nix
         // them.
diff --git a/src/librustc_codegen_ssa/back/write.rs b/src/librustc_codegen_ssa/back/write.rs
index 309187ca2ea..6364843d772 100644
--- a/src/librustc_codegen_ssa/back/write.rs
+++ b/src/librustc_codegen_ssa/back/write.rs
@@ -423,8 +423,8 @@ pub fn start_async_codegen(
         modules_config.passes.push("insert-gcov-profiling".to_owned())
     }
 
-    modules_config.pgo_gen = sess.opts.debugging_opts.pgo_gen.clone();
-    modules_config.pgo_use = sess.opts.debugging_opts.pgo_use.clone();
+    modules_config.pgo_gen = sess.opts.cg.profile_generate.clone();
+    modules_config.pgo_use = sess.opts.cg.profile_use.clone();
 
     modules_config.opt_level = Some(sess.opts.optimize);
     modules_config.opt_size = Some(sess.opts.optimize);
diff --git a/src/librustc_metadata/creader.rs b/src/librustc_metadata/creader.rs
index df0957254cc..2073b317939 100644
--- a/src/librustc_metadata/creader.rs
+++ b/src/librustc_metadata/creader.rs
@@ -868,7 +868,7 @@ impl<'a> CrateLoader<'a> {
 
     fn inject_profiler_runtime(&mut self) {
         if self.sess.opts.debugging_opts.profile ||
-           self.sess.opts.debugging_opts.pgo_gen.enabled()
+           self.sess.opts.cg.profile_generate.enabled()
         {
             info!("loading profiler");
 
diff --git a/src/test/codegen/pgo-instrumentation.rs b/src/test/codegen/pgo-instrumentation.rs
index e9436505886..8200cf4e016 100644
--- a/src/test/codegen/pgo-instrumentation.rs
+++ b/src/test/codegen/pgo-instrumentation.rs
@@ -1,8 +1,8 @@
-// Test that `-Zpgo-gen` creates expected instrumentation artifacts in LLVM IR.
+// Test that `-Cprofile-generate` creates expected instrumentation artifacts in LLVM IR.
 // Compiling with `-Cpanic=abort` because PGO+unwinding isn't supported on all platforms.
 
 // needs-profiler-support
-// compile-flags: -Z pgo-gen -Ccodegen-units=1 -Cpanic=abort
+// compile-flags: -Cprofile-generate -Ccodegen-units=1 -Cpanic=abort
 
 // CHECK: @__llvm_profile_raw_version =
 // CHECK: @__profc_{{.*}}pgo_instrumentation{{.*}}some_function{{.*}} = private global
diff --git a/src/test/run-make-fulldeps/cross-lang-lto-pgo-smoketest/Makefile b/src/test/run-make-fulldeps/cross-lang-lto-pgo-smoketest/Makefile
index 59a7d61892f..f8efeca5614 100644
--- a/src/test/run-make-fulldeps/cross-lang-lto-pgo-smoketest/Makefile
+++ b/src/test/run-make-fulldeps/cross-lang-lto-pgo-smoketest/Makefile
@@ -21,7 +21,7 @@ all: cpp-executable rust-executable
 
 cpp-executable:
 	$(RUSTC) -Clinker-plugin-lto=on \
-	         -Zpgo-gen="$(TMPDIR)"/cpp-profdata \
+	         -Cprofile-generate="$(TMPDIR)"/cpp-profdata \
 	         -o "$(TMPDIR)"/librustlib-xlto.a \
 	         $(COMMON_FLAGS) \
 	         ./rustlib.rs
@@ -39,7 +39,7 @@ cpp-executable:
 		-o "$(TMPDIR)"/cpp-profdata/merged.profdata \
 		"$(TMPDIR)"/cpp-profdata/default_*.profraw
 	$(RUSTC) -Clinker-plugin-lto=on \
-	         -Zpgo-use="$(TMPDIR)"/cpp-profdata/merged.profdata \
+	         -Cprofile-use="$(TMPDIR)"/cpp-profdata/merged.profdata \
 	         -o "$(TMPDIR)"/librustlib-xlto.a \
 	         $(COMMON_FLAGS) \
 	         ./rustlib.rs
@@ -57,7 +57,7 @@ rust-executable:
 	$(CLANG) ./clib.c -fprofile-generate="$(TMPDIR)"/rs-profdata -flto=thin -c -o $(TMPDIR)/clib.o -O3
 	(cd $(TMPDIR); $(AR) crus ./libxyz.a ./clib.o)
 	$(RUSTC) -Clinker-plugin-lto=on \
-	         -Zpgo-gen="$(TMPDIR)"/rs-profdata \
+	         -Cprofile-generate="$(TMPDIR)"/rs-profdata \
 	         -L$(TMPDIR) \
 	         $(COMMON_FLAGS) \
 	         -Clinker=$(CLANG) \
@@ -78,7 +78,7 @@ rust-executable:
 	rm "$(TMPDIR)"/libxyz.a
 	(cd $(TMPDIR); $(AR) crus ./libxyz.a ./clib.o)
 	$(RUSTC) -Clinker-plugin-lto=on \
-	         -Zpgo-use="$(TMPDIR)"/rs-profdata/merged.profdata \
+	         -Cprofile-use="$(TMPDIR)"/rs-profdata/merged.profdata \
 	         -L$(TMPDIR) \
 	         $(COMMON_FLAGS) \
 	         -Clinker=$(CLANG) \
diff --git a/src/test/run-make-fulldeps/pgo-gen-lto/Makefile b/src/test/run-make-fulldeps/pgo-gen-lto/Makefile
index 56f31434ade..6c70d951c35 100644
--- a/src/test/run-make-fulldeps/pgo-gen-lto/Makefile
+++ b/src/test/run-make-fulldeps/pgo-gen-lto/Makefile
@@ -2,7 +2,7 @@
 
 -include ../tools.mk
 
-COMPILE_FLAGS=-Copt-level=3 -Clto=fat -Z pgo-gen="$(TMPDIR)"
+COMPILE_FLAGS=-Copt-level=3 -Clto=fat -Cprofile-generate="$(TMPDIR)"
 
 # LLVM doesn't yet support instrumenting binaries that use unwinding on MSVC:
 # https://github.com/rust-lang/rust/issues/61002
diff --git a/src/test/run-make-fulldeps/pgo-gen-no-imp-symbols/Makefile b/src/test/run-make-fulldeps/pgo-gen-no-imp-symbols/Makefile
index bb86160d2df..3fbfeb09eb3 100644
--- a/src/test/run-make-fulldeps/pgo-gen-no-imp-symbols/Makefile
+++ b/src/test/run-make-fulldeps/pgo-gen-no-imp-symbols/Makefile
@@ -2,7 +2,7 @@
 
 -include ../tools.mk
 
-COMPILE_FLAGS=-O -Ccodegen-units=1 -Z pgo-gen="$(TMPDIR)"
+COMPILE_FLAGS=-O -Ccodegen-units=1 -Cprofile-generate="$(TMPDIR)"
 
 # LLVM doesn't yet support instrumenting binaries that use unwinding on MSVC:
 # https://github.com/rust-lang/rust/issues/61002
diff --git a/src/test/run-make-fulldeps/pgo-gen/Makefile b/src/test/run-make-fulldeps/pgo-gen/Makefile
index f0ab3b7d13d..3b66427c14c 100644
--- a/src/test/run-make-fulldeps/pgo-gen/Makefile
+++ b/src/test/run-make-fulldeps/pgo-gen/Makefile
@@ -2,7 +2,7 @@
 
 -include ../tools.mk
 
-COMPILE_FLAGS=-g -Z pgo-gen="$(TMPDIR)"
+COMPILE_FLAGS=-g -Cprofile-generate="$(TMPDIR)"
 
 # LLVM doesn't yet support instrumenting binaries that use unwinding on MSVC:
 # https://github.com/rust-lang/rust/issues/61002
diff --git a/src/test/run-make-fulldeps/pgo-use/Makefile b/src/test/run-make-fulldeps/pgo-use/Makefile
index 72c3c34ee37..61a73587759 100644
--- a/src/test/run-make-fulldeps/pgo-use/Makefile
+++ b/src/test/run-make-fulldeps/pgo-use/Makefile
@@ -33,7 +33,7 @@ endif
 
 all:
 	# Compile the test program with instrumentation
-	$(RUSTC) $(COMMON_FLAGS) -Z pgo-gen="$(TMPDIR)" main.rs
+	$(RUSTC) $(COMMON_FLAGS) -Cprofile-generate="$(TMPDIR)" main.rs
 	# Run it in order to generate some profiling data
 	$(call RUN,main some-argument) || exit 1
 	# Postprocess the profiling data so it can be used by the compiler
@@ -41,7 +41,7 @@ all:
 		-o "$(TMPDIR)"/merged.profdata \
 		"$(TMPDIR)"/default_*.profraw
 	# Compile the test program again, making use of the profiling data
-	$(RUSTC) $(COMMON_FLAGS) -Z pgo-use="$(TMPDIR)"/merged.profdata --emit=llvm-ir main.rs
+	$(RUSTC) $(COMMON_FLAGS) -Cprofile-use="$(TMPDIR)"/merged.profdata --emit=llvm-ir main.rs
 	# Check that the generate IR contains some things that we expect
 	#
 	# We feed the file into LLVM FileCheck tool *in reverse* so that we see the