incr.comp.: Make DepNode `Copy` and valid across compilation sessions
This PR moves `DepNode` to a representation that does not need retracing and thus simplifies comparing dep-graphs from different compilation sessions. The code also gets a lot simpler in many places, since we don't need the generic parameter on `DepNode` anymore. See https://github.com/rust-lang/rust/issues/42294 for details.
~~NOTE: Only the last commit of this is new, the rest is already reviewed in https://github.com/rust-lang/rust/pull/42504.~~
This PR is almost done but there are some things I still want to do:
- [x] Add some module-level documentation to `dep_node.rs`, explaining especially what the `define_dep_nodes!()` macro is about.
- [x] Do another pass over the dep-graph loading logic. I suspect that we can get rid of building the `edges` map and also use arrays instead of hash maps in some places.
cc @rust-lang/compiler
r? @nikomatsakis
rustdoc: Use `create_dir_all` to create output directory
Currently rustdoc will fail if passed `-o foo/doc` if the `foo`
directory doesn't exist.
Also remove unneeded `mkdir` as `create_dir_all` can now handle
concurrent invocations since #39799.
Explicate what "Rc" and "Arc" stand for.
A person on the weekly "Easy Questions" Reddit thread [was mystified by what `Arc`/`Rc` means](https://www.reddit.com/r/rust/comments/6dyud9/hey_rustaceans_got_an_easy_question_ask_here/did87ds/). Though this is explained in various places, it's not mentioned in the documentation directly.
This PR adds an explanation of the `Rc`/`Arc` acronyms to their respective documentations. There are two things I'm not sure of:
* Does "Rc" mean "Reference Count**er**" or "Reference Count**ed**"? ~~I went with the former.~~ *Edit:* I've changed this to use the latter alternative.
* Should this information be spelled out elsewhere, such as in the docs for the `rc` module?
core: allow messages in unimplemented!() macro
This makes `unimplemented!()` match `unreachable!()`, allowing a message and possible formatting to be provided to better explain what and/or why something is not implemented.
I've used this myself in hyper for a while, include the type and method name, to better help while prototyping new modules, like `unimplemented!("Conn::poll_complete")`, or `unimplemented!("Conn::poll; state={:?}", state)`.
speed up mem::swap
I would have thought that the mem::swap code didn't need an intermediate variable precisely because the pointers are guaranteed never to alias. And.. it doesn't! It seems that llvm will also auto-vectorize this case for large structs, but alas it doesn't seem to have all the aliasing info it needs and so will add redundant checks (and even not bother with autovectorizing for small types). Looks like a lot of performance could still be gained here, so this might be a good test case for future optimizer improvements.
Here are the current benchmarks for the simd version of mem::swap; the timings are in cycles (code below) measured with 10 iterations. The timings for sizes > 32 which are not a multiple of 8 tend to be ever so slightly faster in the old code, but not always. For large struct sizes (> 1024) the new code shows a marked improvement.
\* = latest commit
† = subtracted from other measurements
| arr_length | noop<sup>†</sup> | rust_stdlib | simd_u64x4\* | simd_u64x8
|------------------|------------|-------------------|-------------------|-------------------
8|80|90|90|90
16|72|177|177|177
24|32|76|76|76
32|68|188|112|188
40|32|80|60|80
48|32|84|56|84
56|32|108|72|108
64|32|108|72|76
72|80|350|220|230
80|80|350|220|230
88|80|420|270|270
96|80|420|270|270
104|80|500|320|320
112|80|490|320|320
120|72|528|342|342
128|48|360|234|234
136|72|987|387|387
144|80|1070|420|420
152|64|856|376|376
160|68|804|400|400
168|80|1060|520|520
176|80|1070|520|520
184|32|464|228|228
192|32|504|228|228
200|32|440|248|248
208|72|987|573|573
216|80|1464|220|220
224|48|852|450|450
232|72|1182|666|666
240|32|428|288|288
248|32|428|308|308
256|80|860|770|770
264|80|1130|820|820
272|80|1340|820|820
280|80|1220|870|870
288|72|1227|804|804
296|72|1356|849|849
Only emit one error for `use foo::self;`
Currently `use foo::self;` would emit both E0429 and E0432. This commit silence the latter one (assuming `foo` is a valid module).
Fixes#42559
Disentangle InferCtxt, MemCategorizationContext and ExprUseVisitor.
At some point in the past, `InferCtxt` started being used to replace an old "`Typer`" abstraction, which provided access to `TypeckTables` and had optionally type inference to account for.
That didn't play so nicely with the `'gcx`/`'tcx` split and I had to introduce `borrowck_fake_infer_ctxt`.
The situation wasn't great but it wasn't too painful inside `rustc` itself.
Recently I've found that method being used in clippy, which does need EUV (before we make it plausible to run lints on HAIR or MIR), and set out to separate inference from tables, for the sake of lint authors.
Also fixes#42435 to make it trivial to compute type layout or use EUV from lints.
The remaining uses of `TypeckTables` in `InferCtxt` are for closure kinds and signatures, used in trait selection and projection normalization. The solution there is likely to add them as bounds to `ParamEnv`.
r? @nikomatsakis
cc @mcarton @llogiq @Manishearth
Get LLVM to stop generating dead assembly in next_power_of_two
It turns out that LLVM can turn `@llvm.ctlz.i64(_, true)` into `@llvm.ctlz.i64(_, false)` ([`ctlz`](http://llvm.org/docs/LangRef.html#llvm-ctlz-intrinsic)) where valuable, but never does the opposite. That leads to some silly assembly getting generated in certain cases.
A contrived-but-clear example https://is.gd/VAIKuC:
```rust
fn foo(x:u64) -> u32 {
if x == 0 { return !0; }
x.leading_zeros()
}
```
Generates
```asm
testq %rdi, %rdi
je .LBB0_1
je .LBB0_3 ; <-- wha?
bsrq %rdi, %rax
xorq $63, %rax
retq
.LBB0_1:
movl $-1, %eax
retq
.LBB0_3:
movl $64, %eax ; <-- dead
retq
```
I noticed this in `next_power_of_two`, which without this PR generates the following:
```asm
cmpq $2, %rcx
jae .LBB1_2
movl $1, %eax
retq
.LBB1_2:
decq %rcx
je .LBB1_3
bsrq %rcx, %rcx
xorq $63, %rcx
jmp .LBB1_5
.LBB1_3:
movl $64, %ecx ; <-- dead
.LBB1_5:
movq $-1, %rax
shrq %cl, %rax
incq %rax
retq
```
And with this PR becomes
```asm
cmpq $2, %rcx
jae .LBB0_2
movl $1, %eax
retq
.LBB0_2:
decq %rcx
bsrq %rcx, %rcx
xorl $63, %ecx
movq $-1, %rax
shrq %cl, %rax
incq %rax
retq
```
Speed up expansion
This reduces duplication, thereby increasing expansion speed. Based on tests with rust-uinput, this produces a 29x performance win (440 seconds to 15 seconds). I want to land this first, since it's a minimal patch, but with more changes to the macro parsing I can get down to 12 seconds locally.
There is one FIXME added to the code that I'll keep for now since changing it will spread outward and increase the patch size, I think.
Fixes#37074.
r? @jseyfried
cc @oberien
Ignore variadic FFI test on AArch64
I've cross compiled Rust to `aarch64-linux-gnu`, and tried to run the compile-fail tests, but `variadic-ffi.rs` fails with the following error:
```
The ABI `"stdcall"` is not supported for the current target [E0570]
```
The test seems to be ignored on (32-bit) ARM, so I turned it off for AArch64 too.
Vec<T> is pronounced 'vec'
I've never heard it pronounced "vector". Is this an outdated recommendation?
(or have I been doing it wrong all this time)
r? @steveklabnik