Add new #[target_feature = "..."] attribute.
This commit adds a new attribute that instructs the compiler to emit
target specific code for a single function. For example, the following
function is permitted to use instructions that are part of SSE 4.2:
#[target_feature = "+sse4.2"]
fn foo() { ... }
In particular, use of this attribute does not require setting the
-C target-feature or -C target-cpu options on rustc.
This attribute does not have any protections built into it. For example,
nothing stops one from calling the above `foo` function on hosts without
SSE 4.2 support. Doing so may result in a SIGILL.
I've also expanded the x86 target feature whitelist.
print option to dump target spec as JSON
This lets the user dump out the target spec that the compiler is using. This is useful to people defining their own target.json to compare it against existing targets or understand how different targets change internal settings. It is also potentially useful for Cargo to determine if something has changed with a target and it needs to rebuild things.
evaluate obligations in LIFO order during closure projection
This is an annoying gotcha with the projection cache's handling of
nested obligations.
Nested projection obligations enter the issue in this case:
```
DEBUG:rustc::traits::project: AssociatedTypeNormalizer: depth=3
normalized
<std::iter::Map<std::ops::Range<i32>,
[closure@not-a-recursion-error.rs:5:30: 5:53]> as
std::iter::IntoIterator>::Item to _#7t with 12 add'l obligations
```
Here the normalization result is the result of the nested impl
`<[closure@not-a-recursion-error.rs:5:30: 5:53] as FnMut(i32)>::Output`,
which is an additional obligation that is a part of "add'l obligations".
By itself, this is proper behaviour - the additional obligation is
returned, and the RFC 447 rules ensure that it is processed before the
output `#_7t` is used in any way.
However, the projection cache breaks this - it caches the
`<std::iter::Map<std::ops::Range<i32>,[closure@not-a-recursion-error.rs:5:30:
5:53]> as std::iter::IntoIterator>::Item = #_7t` resolution. Now
everybody else that attempts to look up the projection will just get
`#_7t` *without* any additional obligations. This obviously causes all
sorts of trouble (here a spurious `EvaluatedToAmbig` results in
specializations not being discarded
[here](9ca50bd4d5/src/librustc/traits/select.rs (L1705))).
The compiler works even with this projection cache gotcha because in most
cases during "one-pass evaluation". we tend to process obligations in LIFO
order - after an obligation is added to the cache, we process its nested
obligations before we do anything else (and if we have a cycle, we handle
it specifically) - which makes sure the inference variables are resolved
before they are used.
That "LIFO" order That was not done when projecting out of a closure, so
let's just fix that for the time being.
Fixes#38033.
Beta-nominating because regression.
r? @nikomatsakis
In LLVM 4.0, this enum becomes an actual type-safe enum, which breaks
all of the interfaces. Introduce our own copy of the bitflags that we
can then safely convert to the LLVM one.
This option provides the user the ability to dump the configuration that
is in use by rustc for the target they are building for.
Signed-off-by: Doug Goldstein <cardoe@cardoe.com>
[9/n] rustc: move type information out of AdtDef and TraitDef.
_This is part of a series ([prev](https://github.com/rust-lang/rust/pull/37688) | [next]()) of patches designed to rework rustc into an out-of-order on-demand pipeline model for both better feature support (e.g. [MIR-based](https://github.com/solson/miri) early constant evaluation) and incremental execution of compiler passes (e.g. type-checking), with beneficial consequences to IDE support as well.
If any motivation is unclear, please ask for additional PR description clarifications or code comments._
<hr>
Both `AdtDef` and `TraitDef` contained type information (field types, generics and predicates) which was required to create them, preventing their use before that type information exists, or in the case of field types, *mutation* was required, leading to a variance-magicking implementation of `ivar`s.
This PR takes that information out and the resulting cleaner setup could even eventually end up merged with HIR, because, just like `AssociatedItem` before it, there's no dependency on types anymore.
(With one exception, variant discriminants should probably be moved into their own map later.)
Fuchsia support for std::process via liblaunchpad.
Now we can launch processes on Fuchsia via the Rust standard library! ... Mostly.
Right now, ~5% of the time, reading the stdout/stderr off the pipes will fail. Some Magenta kernel people think it's probably a bug in Magenta's pipes. I wrote a unit test that demonstrates the issue in C, which I was told will expedite a fix. https://fuchsia-review.googlesource.com/#/c/15628/
Hopefully this can get merged once the issue is fixed :)
@raphlinus
limit the length of types in monomorphization
This adds the new insta-stable `#![type_size_limit]` crate attribute to control
the limit, and is obviously a [breaking-change] fixable by that.
Fixes#37311.
r? @nikomatsakis
The `visit_fn` code mutates its surrounding context. Between *items*,
this was saved/restored, but between impl items it was not. This meant
that we wound up with `CallSiteScope` entries with two parents (or
more!). As far as I can tell, this is harmless in actual type-checking,
since the regions you interact with are always from at most one of those
branches. But it can slow things down.
Before, the effect was limited, since it only applied to impl items
within an impl. After #37660, impl items are visisted all together at
the end, and hence this could create a very messed up
hierarchy. Isolating impl item properly solves both issues.
I cannot come up with a way to unit-test this; for posterity, however,
you can observe the messed up hierarchies with a test as simple as the
following, which would create a callsite scope with two parents both
before and after
```
struct Foo {
}
impl Foo {
fn bar(&self) -> usize {
22
}
fn baz(&self) -> usize {
22
}
}
fn main() { }
```
Fixes#37864.
the .git directory is modified by `bootstrap` when it updates this git
submodule; this triggered rebuilds every time `bootstrap` was called.
likely fixes#38094
This causes us to dump a bunch of has information to stdout that can be
useful in tracking down incremental compilation invalidations,
particularly across crates.
[LLVM 4.0] Don't assume llvm::StringRef is null terminated
StringRefs have a length and their contents are not usually null-terminated. The solution is to either copy the string data (in `rustc_llvm::diagnostic`) or take the size into account (in LLVMRustPrintPasses).
I couldn't trigger a bug caused by this (apparently all the strings returned in practice are actually null-terminated) but this is more correct and more future-proof.
cc #37609
rustdoc: get back missing crate-name when --playground-url is used
follow up PR #37763
r? @alexcrichton (since you r+ed to #37763 )
----
Edit: When `#![doc(html_playground_url="")]` is used, the current crate name is saved to `PLAYGROUND`, so rustdoc may generate `extern crate NAME;` into code snips automatically. But when `--playground-url` was introduced in PR #37763, I forgot saving crate name to `PLAYGROUND`. This PR fix that.
----
Update:
- add test
- unstable `--playground-url`
Add small-copy optimization for copy_from_slice
## Summary
During benchmarking, I found that one of my programs spent between 5 and 10 percent of the time doing memmoves. Ultimately I tracked these down to single-byte slices being copied with a memcopy. Doing a manual copy if the slice contains only one element can speed things up significantly. For my program, this reduced the running time by 20%.
## Background
I am optimizing a program that relies heavily on reading a single byte at a time. To avoid IO overhead, I read all data into a vector once, and then I use a `Cursor` around that vector to read from. During profiling, I noticed that `__memmove_avx_unaligned_erms` was hot, taking up 7.3% of the running time. It turns out that these were caused by calls to `Cursor::read()`, which calls `<&[u8] as Read>::read()`, which calls `&[T]::copy_from_slice()`, which calls `ptr::copy_nonoverlapping()`. This one is implemented as a memcopy. Copying a single byte with a memcopy is very wasteful, because (at least on my platform) it involves calling `memcpy` in libc. This is an indirect call when libc is linked dynamically, and furthermore `memcpy` is optimized for copying large amounts of data at the cost of a bit of overhead for small copies.
## Benchmarks
Before I made this change, `perf` reported the following for my program. I only included the relevant functions, and how they rank. (This is on a different machine than where I ran the original benchmarks. It has an older CPU, so `__memmove_sse2_unaligned_erms` is called instead of `__memmove_avx_unaligned_erms`.)
```
#3 5.47% bench_decode libc-2.24.so [.] __memmove_sse2_unaligned_erms
#5 1.67% bench_decode libc-2.24.so [.] memcpy@GLIBC_2.2.5
#6 1.51% bench_decode bench_decode [.] memcpy@plt
```
`memcpy` is eating up 8.65% of the total running time, and the overhead of dispatching to a specialized fast copy function (`memcpy@GLIBC` showing up) is clearly visible. The price of dynamic linking (`memcpy@plt` showing up) is visible too.
After this change, this is what `perf` reports:
```
#5 0.33% bench_decode libc-2.24.so [.] __memmove_sse2_unaligned_erms
#14 0.01% bench_decode libc-2.24.so [.] memcpy@GLIBC_2.2.5
```
Now only 0.34% of the running time is spent on memcopies. The dynamic linking overhead is not significant at all any more.
To add some more data, my program generates timing results for the operation in its main loop. These are the timings before and after the change:
| Time before | Time after | After/Before |
|---------------|---------------|--------------|
| 29.8 ± 0.8 ns | 23.6 ± 0.5 ns | 0.79 ± 0.03 |
The time is basically the total running time divided by a constant; the actual numbers are not important. This change reduced the total running time by 21% (much more than the original 9% spent on memmoves, likely because the CPU is stalling a lot less because data dependencies are more transparent). Of course YMMV and for most programs this will not matter at all. But when it does, the gains can be significant!
## Alternatives
* At first I implemented this in `io::Cursor`. I moved it to `&[T]::copy_from_slice()` instead, but this might be too intrusive, especially because it applies to all `T`, not just `u8`. To restrict this to `io::Read`, `<&[u8] as Read>::read()` is probably the best place.
* I tried copying bytes in a loop up to 64 or 8 bytes before calling `Read::read`, but both resulted in about a 20% slowdown instead of speedup.
This adds a CommandExt trait for Windows along with an implementation of it
for std::process::Command with methods to set the process creation flags that
are passed to CreateProcess.
Make core::fmt::Void a non-empty type.
Adding back this change that was removed from PR #36449 because it's a fix and because I immediately hit a problem with it again when I started implementing my fix for #12609.