Add cg_clif as optional codegen backend
Rustc_codegen_cranelift is an alternative codegen backend for rustc based on Cranelift. It has the potential to improve compilation times in debug mode. In my experience the compile time improvements over debug mode LLVM for a clean build are about 20-30% in most cases.
This PR adds cg_clif as optional codegen backend. By default it is only enabled for `./x.py check`. It can be enabled for `./x.py build` too by adding `cranelift` to the `rust.codegen-backends` array in `config.toml`.
MCP: https://github.com/rust-lang/compiler-team/issues/270
r? `@Mark-Simulacrum`
Allow creating a list of files shipped in a release
This PR adds the `BUILD_MANIFEST_SHIPPED_FILES_PATH` environment variable to `build-manifest`, which writes a list of all the files referenced in the manifest to the path defined in the variable. The use for this is for `promote-release` to prune files unused files before publishing a release.
This PR **does not implement any pruning**, it just adds support for it to be implemented in the future on `promote-release`'s side.
r? `@Mark-Simulacrum`
Optimise align_offset for stride=1 further
`stride == 1` case can be computed more efficiently through `-p (mod
a)`. That, then translates to a nice and short sequence of LLVM
instructions:
%address = ptrtoint i8* %p to i64
%negptr = sub i64 0, %address
%offset = and i64 %negptr, %a_minus_one
And produces pretty much ideal code-gen when this function is used in
isolation.
Typical use of this function will, however, involve use of
the result to offset a pointer, i.e.
%aligned = getelementptr inbounds i8, i8* %p, i64 %offset
This still looks very good, but LLVM does not really translate that to
what would be considered ideal machine code (on any target). For example
that's the codegen we obtain for an unknown alignment:
; x86_64
dec rsi
mov rax, rdi
neg rax
and rax, rsi
add rax, rdi
In particular negating a pointer is not something that’s going to be
optimised for in the design of CISC architectures like x86_64. They
are much better at offsetting pointers. And so we’d love to utilize this
ability and produce code that's more like this:
; x86_64
lea rax, [rsi + rdi - 1]
neg rsi
and rax, rsi
To achieve this we need to give LLVM an opportunity to apply its
various peep-hole optimisations that it does during DAG selection. In
particular, the `and` instruction appears to be a major inhibitor here.
We cannot, sadly, get rid of this load-bearing operation, but we can
reorder operations such that LLVM has more to work with around this
instruction.
One such ordering is proposed in #75579 and results in LLVM IR that
looks broadly like this:
; using add enables `lea` and similar CISCisms
%offset_ptr = add i64 %address, %a_minus_one
%mask = sub i64 0, %a
%masked = and i64 %offset_ptr, %mask
; can be folded with `gepi` that may follow
%offset = sub i64 %masked, %address
…and generates the intended x86_64 machine code.
One might also wonder how the increased amount of code would impact a
RISC target. Turns out not much:
; aarch64 previous ; aarch64 new
sub x8, x1, #1 add x8, x1, x0
neg x9, x0 sub x8, x8, #1
and x8, x9, x8 neg x9, x1
add x0, x0, x8 and x0, x8, x9
(and similarly for ppc, sparc, mips, riscv, etc)
The only target that seems to do worse is… wasm32.
Onto actual measurements – the best way to evaluate snipets like these
is to use llvm-mca. Much like Aarch64 assembly would allow to suspect,
there isn’t any performance difference to be found. Both snippets
execute in same number of cycles for the CPUs I tried. On x86_64,
we get throughput improvement of >50%!
Fixes#75579
Rollup of 10 pull requests
Successful merges:
- #74477 (`#[deny(unsafe_op_in_unsafe_fn)]` in sys/wasm)
- #77836 (transmute_copy: explain that alignment is handled correctly)
- #78126 (Properly define va_arg and va_list for aarch64-apple-darwin)
- #78137 (Initialize tracing subscriber in compiletest tool)
- #78161 (Add issue template link to IRLO)
- #78214 (Tweak match arm semicolon removal suggestion to account for futures)
- #78247 (Fix#78192)
- #78252 (Add codegen test for #45964)
- #78268 (Do not try to report on closures to avoid ICE)
- #78295 (Add some regression tests)
Failed merges:
r? `@ghost`
Tweak match arm semicolon removal suggestion to account for futures
* Tweak and extend "use `.await`" suggestions
* Suggest removal of semicolon on prior match arm
* Account for `impl Future` when suggesting semicolon removal
* Silence some errors when encountering `await foo()?` as can't be certain what the intent was
*Thanks to https://twitter.com/a_hoverbear/status/1318960787105353728 for pointing this out!*
Initialize tracing subscriber in compiletest tool
The logging in compiletest was migrated from log crate to a tracing, but
the initialization code was never changed, so logging is non-functional.
Initialize tracing subscriber using default settings.
Properly define va_arg and va_list for aarch64-apple-darwin
From [Apple][]:
> Because of these changes, the type `va_list` is an alias for `char*`,
> and not for the struct type in the generic procedure call standard.
With this change `/x.py test --stage 1 src/test/ui/abi/variadic-ffi`
passes.
Fixes#78092
[Apple]: https://developer.apple.com/documentation/xcode/writing_arm64_code_for_apple_platforms
transmute_copy: explain that alignment is handled correctly
The doc comment currently is somewhat misleading because if it actually transmuted `&T` to `&U`, a higher-aligned `U` would be problematic.
`#[deny(unsafe_op_in_unsafe_fn)]` in sys/wasm
This is part of #73904.
This encloses unsafe operations in unsafe fn in `libstd/sys/wasm`.
@rustbot modify labels: F-unsafe-block-in-unsafe-fn
Rollup of 8 pull requests
Successful merges:
- #77984 (Compute proper module parent during resolution)
- #78085 (MIR validation should check `SwitchInt` values are valid for the type)
- #78208 (replace `#[allow_internal_unstable]` with `#[rustc_allow_const_fn_unstable]` for `const fn`s)
- #78209 (Update `compiler_builtins` to 0.1.36)
- #78276 (Bump backtrace-rs to enable Mach-O support on iOS.)
- #78320 (Link to cargo's `build-std` feature instead of `xargo` in custom target docs)
- #78322 (BTreeMap: stop mistaking node::MIN_LEN for a node level constraint)
- #78326 (Split out statement attributes changes from #78306)
Failed merges:
r? `@ghost`
Split out statement attributes changes from #78306
This is the same as PR https://github.com/rust-lang/rust/pull/78306, but `unused_doc_comments` is modified to explicitly ignore statement items (which preserves the current behavior).
This shouldn't have any user-visible effects, so it can be landed without lang team discussion.
---------
When the 'early' and 'late' visitors visit an attribute target, they
activate any lint attributes (e.g. `#[allow]`) that apply to it.
This can affect warnings emitted on sibiling attributes. For example,
the following code does not produce an `unused_attributes` for
`#[inline]`, since the sibiling `#[allow(unused_attributes)]` suppressed
the warning.
```rust
trait Foo {
#[allow(unused_attributes)] #[inline] fn first();
#[inline] #[allow(unused_attributes)] fn second();
}
```
However, we do not do this for statements - instead, the lint attributes
only become active when we visit the struct nested inside `StmtKind`
(e.g. `Item`).
Currently, this is difficult to observe due to another issue - the
`HasAttrs` impl for `StmtKind` ignores attributes for `StmtKind::Item`.
As a result, the `unused_doc_comments` lint will never see attributes on
item statements.
This commit makes two interrelated fixes to the handling of inert
(non-proc-macro) attributes on statements:
* The `HasAttr` impl for `StmtKind` now returns attributes for
`StmtKind::Item`, treating it just like every other `StmtKind`
variant. The only place relying on the old behavior was macro
which has been updated to explicitly ignore attributes on item
statements. This allows the `unused_doc_comments` lint to fire for
item statements.
* The `early` and `late` lint visitors now activate lint attributes when
invoking the callback for `Stmt`. This ensures that a lint
attribute (e.g. `#[allow(unused_doc_comments)]`) can be applied to
sibiling attributes on an item statement.
For now, the `unused_doc_comments` lint is explicitly disabled on item
statements, which preserves the current behavior. The exact locatiosn
where this lint should fire are being discussed in PR #78306
BTreeMap: stop mistaking node::MIN_LEN for a node level constraint
Correcting #77612 that fell into the trap of assuming that node::MIN_LEN is an imposed minimum everywhere, and trying to make it much more clear it is an offered minimum at the node level.
r? @Mark-Simulacrum
Link to cargo's `build-std` feature instead of `xargo` in custom target docs
The `xargo` tool is in maintenance mode since 2018 and the `build-std` feature of cargo already works reasonably well for most use cases.
Bump backtrace-rs to enable Mach-O support on iOS.
Related to rust-lang/backtrace-rs#378. Fixes backtraces on iOS that were missing in Rust v1.47.0 after switching to gimli because it only enabled Mach-O support on macOS.
Update `compiler_builtins` to 0.1.36
So, the libc build with cargo's `build-std` feature emits a lot of warnings like:
```
warning: a method with this name may be added to the standard library in the future
--> /home/runner/.cargo/registry/src/github.com-1ecc6299db9ec823/compiler_builtins-0.1.35/src/int/udiv.rs:98:23
|
98 | q = n << (<$ty>::BITS - sr);
| ^^^^^^^^^^^
...
268 | udivmod_inner!(n, d, rem, u128)
| ------------------------------- in this macro invocation
|
= warning: once this method is added to the standard library, the ambiguity may cause an error or change in behavior!
= note: for more information, see issue #48919 <rust-lang/rust/issues/48919>
= help: call with fully qualified syntax `Int::BITS(...)` to keep using the current method
= help: add `#![feature(int_bits_const)]` to the crate attributes to enable `num::<impl u128>::BITS`
= note: this warning originates in a macro (in Nightly builds, run with -Z macro-backtrace for more info)
```
(You can find the full log in https://github.com/rust-lang/libc/runs/1283695796?check_suite_focus=true for example.)
0.1.36 contains https://github.com/rust-lang/compiler-builtins/pull/332 so this version should remove this warning.
cc https://github.com/rust-lang/libc/issues/1942
replace `#[allow_internal_unstable]` with `#[rustc_allow_const_fn_unstable]` for `const fn`s
`#[allow_internal_unstable]` is currently used to side-step feature gate and stability checks.
While it was originally only meant to be used only on macros, its use was expanded to `const fn`s.
This pr adds stricter checks for the usage of `#[allow_internal_unstable]` (only on macros) and introduces the `#[rustc_allow_const_fn_unstable]` attribute for usage on `const fn`s.
This pr does not change any of the functionality associated with the use of `#[allow_internal_unstable]` on macros or the usage of `#[rustc_allow_const_fn_unstable]` (instead of `#[allow_internal_unstable]`) on `const fn`s (see https://github.com/rust-lang/rust/issues/69399#issuecomment-712911540).
Note: The check for `#[rustc_allow_const_fn_unstable]` currently only validates that the attribute is used on a function, because I don't know how I would check if the function is a `const fn` at the place of the check. I therefore openend this as a 'draft pull request'.
Closesrust-lang/rust#69399
r? @oli-obk
Compute proper module parent during resolution
Fixes#75982
The direct parent of a module may not be a module
(e.g. `const _: () = { #[path = "foo.rs"] mod foo; };`).
To find the parent of a module for purposes of resolution, we need to
walk up the tree until we hit a module or a crate root.
perf: buffer SipHasher128
This is an attempt to improve Siphasher128 performance by buffering input. Although it reduces instruction count, I'm not confident the effect on wall times, or lack-thereof, is worth the change.
---
Additional notes not reflected in source comments:
* Implementation choices were guided by a combination of results from rustc-perf and micro-benchmarks, mostly the former.
* ~~I tried a couple of different struct layouts that might be more cache friendly with no obvious effect.~~ Update: a particular struct layout was chosen, but it's not critical to performance. See comments in source and discussion below.
* I suspect that buffering would be important to a SIMD-accelerated algorithm, but from what I've read and my own tests, SipHash does not seem very amenable to SIMD acceleration, at least by SSE.
fix def collector for impl trait
fixes#77329
We now consistently make `impl Trait` a hir owner, requiring some special casing for synthetic generic params.
r? `@eddyb`
Upgrade to measureme 9.0.0
I believe I did this correctly but there's still a reference to `measureme@0.7.1` coming from `rustc-ap-rustc_data_structures` and I'm not sure how to resolve that.
r? `@Mark-Simulacrum`
We'll also need to deploy the new version of the tools on perf.rlo.
stop promoting union field accesses in 'const'
Turns out that promotion of union field accesses is the only difference between "promotion in `const`/`static` bodies" and "explicit promotion". So if we can remove this, we have finally achieved what I thought to already be the case -- that the bodies of `const`/`static` initializers behave the same as explicit promotion contexts.
The reason we do not want to promote union field accesses is that they can introduce UB, i.e., they can go wrong. We want to [minimize the ways promoteds can fail to evaluate](https://github.com/rust-lang/const-eval/issues/53). Also this change makes things more consistent overall, removing a special case that was added without much consideration (as far as I can tell).
Cc `@rust-lang/wg-const-eval`