Make [u8]::reverse() 5x faster
Since LLVM doesn't vectorize the loop for us, do unaligned reads of a larger type and use LLVM's bswap intrinsic to do the reversing of the actual bytes. cfg!-restricted to x86 and x86_64, as I assume it wouldn't help on things like ARMv5.
Also makes [u16]::reverse() a more modest 1.5x faster by loading/storing u32 and swapping the u16s with ROT16.
Thank you ptr::*_unaligned for making this easy :)
Benchmark results (from my i5-2500K):
```text
# Before
test slice::reverse_u8 ... bench: 273,836 ns/iter (+/- 15,592) = 3829 MB/s
test slice::reverse_u16 ... bench: 139,793 ns/iter (+/- 17,748) = 7500 MB/s
test slice::reverse_u32 ... bench: 74,997 ns/iter (+/- 5,130) = 13981 MB/s
test slice::reverse_u64 ... bench: 47,452 ns/iter (+/- 2,213) = 22097 MB/s
# After
test slice::reverse_u8 ... bench: 52,170 ns/iter (+/- 3,962) = 20099 MB/s
test slice::reverse_u16 ... bench: 93,330 ns/iter (+/- 4,412) = 11235 MB/s
test slice::reverse_u32 ... bench: 74,731 ns/iter (+/- 1,425) = 14031 MB/s
test slice::reverse_u64 ... bench: 47,556 ns/iter (+/- 3,025) = 22049 MB/s
```
If you're curious about the assembly, instead of doing this
```
movzx eax, byte ptr [rdi]
movzx ecx, byte ptr [rsi]
mov byte ptr [rdi], cl
mov byte ptr [rsi], al
```
it does this
```
mov rax, qword ptr [rdx]
mov rbx, qword ptr [r11 + rcx - 8]
bswap rbx
mov qword ptr [rdx], rbx
bswap rax
mov qword ptr [r11 + rcx - 8], rax
```
impl Clone for .split_whitespace()
Use custom closure structs for the predicates so that the iterator's
clone can simply be derived. This should also reduce virtual call
overhead by not using function pointers.
Fixes#41655
Bump cargo for rust-lang/cargo#4000
rust-lang/cargo#4000 recently landed, which fixes warnings about using `-Z` when `CARGO_INCREMENTAL` is set while running stable/beta builds. As #41751 has now landed, these warnings will turn to errors in the next release, so getting the cargo fix in place is necessary unless we want people confused about why they can no longer compile anything on stable/beta.
[Doc] improve `thread::Thread` and `thread::Builder` documentations
Part of #29378
- Adds information about the stack_size when using `Builder`. This might be considered too low level, but I assume that if someone wants to create their own builder instead of using `thread::spawn` they may be interested in that info.
- Updates the `thread::Thread` structure doc, mostly by explaining how to get one, the previous example was removed because it was not related to `thread::Thread`, but rather to `thread::Builder::name`.
Not much is present there, mostly because this API is not often used (the only method that seems useful is `unpark`, which is documented in #41809).
incr.comp.: Hash more pieces of crate metadata to detect changes there.
This PR adds incr. comp. hashes for non-`Entry` pieces of data in crate metadata.
The first part of it I like: `EntryBuilder` is refactored into the more generally applicable `IsolatedEncoder` which provides means of encoding something into metadata while also feeding the encoded data into an incr. comp. hash. We already did this for `Entry`, now we are doing it for various other pieces of data too, like the set of exported symbols and so on. The hashes generated there are persisted together with the per-`Entry` hashes and are also used for dep-graph dirtying the same way.
The second part of the PR I'm not entirely happy with: In order to make sure that we don't forget registering a read to the new `DepNodes` introduced here, I added the `Tracked<T>` struct. This struct wraps a value and requires a `DepNode` when accessing the wrapped value. This makes it harder to overlook adding read edges in the right places and works just fine.
However, crate metadata is already used in places where there is no `tcx` yet or even in places where no `cnum` has been assigned -- this makes it harder to apply this feature consistently or implement it ergonomically. The result is not too bad but there's a bit more code churn and a bit more opportunity to get something wrong than I would have liked. On the other hand, wrapping things in `Tracked<T>` already has revealed some bugs, so there's definitely some value in it.
This is still a work in progress:
- [x] I need to write some test cases.
- [x] Accessing the CodeMap should really be dependency tracked too, especially with the new path-remapping feature.
cc @nikomatsakis
dump-mir was causing cycles by invoking item-path-str at bad times
Workaround for now, but probably a better fix is to opt **in** to using the types for impls (if we do that at all; maybe filename/line is better).
Fixes#41697
Fixed argument inference for closures when coercing into 'fn'
This fixes https://github.com/rust-lang/rust/issues/41755. The tests `compile-fail/closure-no-fn.rs` and `compile-fail/issue-40000.rs` were modified. A new test `run-pass/closure_to_fn_coercion-expected-types.rs` was added
r? @nikomatsakis
@bors: r+ 38fe8d2 rollup
1) changed "long way into" to "long way toward"
2) changed "developer lives" to "developers' lives"
3) removed the "either... or..." format from second paragraph because there are more than 2 options
4) Minor revisions to paragraphs 3-6 to make them more consistent in format and to fix minor grammar issues.
Implement the illegal_floating_point_literal_pattern compat lint
Adds a future-compatibility lint for the [breaking-change] introduced by issue #41620 . cc issue #41255 .
rustc: treat const bodies like fn bodies in middle::region.
Allows `T::ASSOC_CONST` to be used without a `T: 'static` bound.
cc @rust-lang/compiler @rust-lang/lang
1) changed "long way into" to "long way toward"
2) changed "developer lives" to "developers' lives"
3) removed the "either... or..." format from second paragraph because there are more than 2 options
4) Minor revisions to paragraphs 3-6 to make them more consistent in format and to fix minor grammar issues.
Remove need for &format!(...) or &&"" dances in `span_label` calls
These were always a thorn in my eye. Note that this will monomorphize to two impls, one for `String` and one for `&str`. But I think that cost is worth the ergonomics at the call sites that can be seen throughout this PR.
Add support for Hexagon v60 HVX intrinsics
HVX is a SIMD coprocessor available on newer hexagon cores. It can be configured for 512 or 1024 bit registers, and some instructions use pairs of registers. It only does integer operations, but it probably has every integer operation you'd want for 8/16/32 bit elements.
There are a lot of intrinsics. The generator outputs 582 of them. I probably got some wrong. I did some scripting to make sure that every llvm intrinsic name exists, but intrinsic names provided for programs have only been compared by eye to Qualcomm's own names. 64/128 is also appended to the names to select between 512/1024 bit. The C intrinsics don't do this, but they only expose one set, selected at compile time.
The json specifying the intrinsics required a bit of duplication since I didn't see an easy way to specify combinations of signed/unsigned types (eg. u(8-16) and s(16-32)). I also didn't see an easy way to specify variants of instructions like saturating or rounding.
Basic multiplication and load/store tested on the hexagon simulator.
[DOC] Improve `thread::panicking` documentaion.
Part of #29378
Takes care of: `panicking` could use some more advice on when to use this.
I mays have done a poor choice of introducing `Mutex`s.
r? @steveklabnik
Point at fields that make the type recursive
On recursive types of infinite size, point at all the fields that make
the type recursive.
```rust
struct Foo {
bar: Bar,
}
struct Bar {
foo: Foo,
}
```
outputs
```
error[E0072]: recursive type `Foo` has infinite size
--> file.rs:1:1
1 | struct Foo {
| ^^^^^^^^^^ recursive type has infinite size
2 | bar: Bar,
| -------- recursive here
|
= help: insert indirection (e.g., a `Box`, `Rc`, or `&`) at some point to make `Foo` representable
error[E0072]: recursive type `Bar` has infinite size
--> file.rs:5:1
|
5 | struct Bar {
| ^^^^^^^^^^ recursive type has infinite size
6 | foo: Foo,
| -------- recursive here
|
= help: insert indirection (e.g., a `Box`, `Rc`, or `&`) at some point to make `Bar` representable
```
Allow # to appear in rustdoc code output.
"##" at the start of a trimmed rustdoc line is now cut to "#" and then
shown. If the user wanted to show "##", they can type "###".
I'm somewhat concerned about the potential implications for users, since this does make a potentially backwards-incompatible change. Previously, `##` had no special handling, and now we do change it. However, I'm not really sure what we can do here to improve this, and I can't think of any cases where `##` would likely be correct in a code block, though of course I could be wrong.
Fixes#41783.