This branch improves the performance of Ord and PartialOrd methods for slices compared to the iter-based implementation.
Based on the approach used in #26884.
In order to get rid of all range checks, the compiler needs to
explicitly see that the slices it iterates over are as long as the
loop variable upper bound.
This further improves the performance of slice comparison:
```
test u8_cmp ... bench: 4,761 ns/iter (+/- 1,203)
test u8_lt ... bench: 4,579 ns/iter (+/- 649)
test u8_partial_cmp ... bench: 4,768 ns/iter (+/- 761)
test u16_cmp ... bench: 4,607 ns/iter (+/- 580)
test u16_lt ... bench: 4,681 ns/iter (+/- 567)
test u16_partial_cmp ... bench: 4,607 ns/iter (+/- 967)
test u32_cmp ... bench: 4,448 ns/iter (+/- 891)
test u32_lt ... bench: 4,546 ns/iter (+/- 992)
test u32_partial_cmp ... bench: 4,415 ns/iter (+/- 646)
test u64_cmp ... bench: 4,380 ns/iter (+/- 1,184)
test u64_lt ... bench: 5,684 ns/iter (+/- 602)
test u64_partial_cmp ... bench: 4,663 ns/iter (+/- 1,158)
```
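As a minimal sketch of the reslicing idea (not the exact code in this branch; the name `slice_cmp` is made up for illustration), both slices are cut down to their common length before the counted loop, so the optimizer can see that every index is in bounds:

```rust
use std::cmp::Ordering;

/// Lexicographic comparison of two slices using a single loop counter.
/// `slice_cmp` is a hypothetical name used only for this sketch.
fn slice_cmp<T: Ord>(left: &[T], right: &[T]) -> Ordering {
    let len = left.len().min(right.len());

    // Reslice both inputs to the common length. Now `l.len() == r.len() == len`
    // is visible to the compiler, so indexing with `i < len` needs no range check.
    let l = &left[..len];
    let r = &right[..len];

    for i in 0..len {
        match l[i].cmp(&r[i]) {
            Ordering::Equal => (),
            non_eq => return non_eq,
        }
    }

    // The common prefix is equal, so the shorter slice orders first.
    left.len().cmp(&right.len())
}
```

Whether the range checks actually disappear depends on the optimizer, so this is best confirmed by looking at the generated code or by benchmarking.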
Reusing the same idea as in #26884, we can exploit the fact that the
length of slices is known, hence we can use a counted loop instead of
iterators, which means that we only need a single counter, instead of
having to increment and check one pointer for each iterator.
Using the generic implementation of the boolean comparison operators
(`lt`, `le`, `gt`, `ge`) provides further speedup for simple
types. This happens because the loop scans elements checking for
equality and dispatches to element comparison or length comparison
depending on the result of the prefix comparison.
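A rough sketch of that shape for `lt`, assuming a free function named `slice_lt` (a hypothetical name, not the one used in the patch): the loop only tests for equality, and the first mismatch or the lengths decide the result.

```rust
/// Sketch of `lt` for slices: scan the common prefix for the first
/// unequal pair, then dispatch to element or length comparison.
/// `slice_lt` is a hypothetical name used only for this sketch.
fn slice_lt<T: PartialOrd>(left: &[T], right: &[T]) -> bool {
    let len = left.len().min(right.len());
    let l = &left[..len];
    let r = &right[..len];

    for i in 0..len {
        // Only equality is checked while scanning the prefix.
        if l[i] != r[i] {
            // First difference found: the element comparison decides.
            return l[i] < r[i];
        }
    }

    // The whole common prefix is equal: the length comparison decides.
    left.len() < right.len()
}
```

The other operators (`le`, `gt`, `ge`) would follow the same pattern with the corresponding operator in the two `return` positions.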
```
test u8_cmp ... bench: 14,043 ns/iter (+/- 1,732)
test u8_lt ... bench: 16,156 ns/iter (+/- 1,864)
test u8_partial_cmp ... bench: 16,250 ns/iter (+/- 2,608)
test u16_cmp ... bench: 15,764 ns/iter (+/- 1,420)
test u16_lt ... bench: 19,833 ns/iter (+/- 2,826)
test u16_partial_cmp ... bench: 19,811 ns/iter (+/- 2,240)
test u32_cmp ... bench: 15,792 ns/iter (+/- 3,409)
test u32_lt ... bench: 18,577 ns/iter (+/- 2,075)
test u32_partial_cmp ... bench: 18,603 ns/iter (+/- 5,666)
test u64_cmp ... bench: 16,337 ns/iter (+/- 2,511)
test u64_lt ... bench: 18,074 ns/iter (+/- 7,914)
test u64_partial_cmp ... bench: 17,909 ns/iter (+/- 1,105)
```
```
test u8_cmp ... bench: 6,511 ns/iter (+/- 982)
test u8_lt ... bench: 6,671 ns/iter (+/- 919)
test u8_partial_cmp ... bench: 7,118 ns/iter (+/- 1,623)
test u16_cmp ... bench: 6,689 ns/iter (+/- 921)
test u16_lt ... bench: 6,712 ns/iter (+/- 947)
test u16_partial_cmp ... bench: 6,725 ns/iter (+/- 780)
test u32_cmp ... bench: 7,704 ns/iter (+/- 1,294)
test u32_lt ... bench: 7,611 ns/iter (+/- 3,062)
test u32_partial_cmp ... bench: 7,640 ns/iter (+/- 1,149)
test u64_cmp ... bench: 7,517 ns/iter (+/- 2,164)
test u64_lt ... bench: 7,579 ns/iter (+/- 1,048)
test u64_partial_cmp ... bench: 7,629 ns/iter (+/- 1,195)
```
Knowing the result of equality comparison can enable additional
optimizations in LLVM.
Additionally, this makes it obvious that `partial_cmp` on totally
ordered types cannot return `None`.
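One plausible reading of this, sketched for `u32` with free functions (the names `cmp_u32` and `partial_cmp_u32` are made up here and are not the actual `core` implementation): equality is tested first, and `partial_cmp` simply wraps the total order in `Some`.

```rust
use std::cmp::Ordering;

/// Total ordering that tests equality first, then `<`.
/// Hypothetical helper for illustration only.
fn cmp_u32(a: u32, b: u32) -> Ordering {
    if a == b {
        Ordering::Equal
    } else if a < b {
        Ordering::Less
    } else {
        Ordering::Greater
    }
}

/// For a totally ordered type the partial order is always defined,
/// so the result is unconditionally `Some`.
fn partial_cmp_u32(a: u32, b: u32) -> Option<Ordering> {
    Some(cmp_u32(a, b))
}
```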
New error style:
```
path.rs:4:6: 4:7 error: the trait `core::marker::Sized` is not implemented for the type `[u8]` [E0277]
path.rs:4 fn f(p: Path) {}
^
path.rs:4:6: 4:7 help: run `rustc --explain E0277` to see a detailed explanation
path.rs:4:6: 4:7 note: `[u8]` does not have a constant size known at compile-time
path.rs:4:6: 4:7 note: required because it appears within the type `std::sys::os_str::Slice`
path.rs:4:6: 4:7 note: required because it appears within the type `std::ffi::os_str::OsStr`
path.rs:4:6: 4:7 note: required because it appears within the type `std::path::Path`
path.rs:4:6: 4:7 note: all local variables must have a statically known size
path.rs:7:5: 7:36 error: the trait `core::marker::Send` is not implemented for the type `alloc::rc::Rc<()>` [E0277]
path.rs:7 foo::<BTreeMap<Rc<()>, Rc<()>>>();
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
path.rs:7:5: 7:36 help: run `rustc --explain E0277` to see a detailed explanation
path.rs:7:5: 7:36 note: `alloc::rc::Rc<()>` cannot be sent between threads safely
path.rs:7:5: 7:36 note: required because it appears within the type `collections::btree::node::Node<alloc::rc::Rc<()>, alloc::rc::Rc<()>>`
path.rs:7:5: 7:36 note: required because it appears within the type `collections::btree::map::BTreeMap<alloc::rc::Rc<()>, alloc::rc::Rc<()>>`
path.rs:7:5: 7:36 note: required by `foo`
error: aborting due to 2 previous errors
```
Fixes #21793
Fixes #23286
r? @nikomatsakis
The second commit in this PR will stop printing the macro definition site in backtraces, which cuts their length in half and increases readability (the definition site was only correct for local macros).
The third commit will not print an invocation if the last one printed occurred at the same place (span). This will make backtraces caused by a self-recursive macro much shorter.
(A possible alternative would be to capture the backtrace first, then limit it to a few frames at the start and end of the chain and print `...` in between. This would also work with multiple macros calling each other, which this PR does not address, although the backtrace will still be halved.)
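The span-based deduplication from the third commit can be sketched roughly as follows. This is a simplified model, not the compiler's code: `Span` and `expansion_notes` are stand-ins invented for this illustration.

```rust
/// Simplified stand-in for a source span (hypothetical, not the compiler's type).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Span {
    lo: u32,
    hi: u32,
}

/// Builds one note per macro invocation in the chain, skipping an
/// invocation when the last printed one occurred at the same span.
fn expansion_notes(chain: &[(Span, &str)]) -> Vec<String> {
    let mut notes = Vec::new();
    let mut last_printed: Option<Span> = None;

    for &(span, macro_name) in chain {
        if last_printed == Some(span) {
            // Same place as the previous note: a self-recursive macro.
            continue;
        }
        notes.push(format!(
            "{}..{} note: in this expansion of {}!",
            span.lo, span.hi, macro_name
        ));
        last_printed = Some(span);
    }
    notes
}
```

With a chain of fifteen recursive invocations at one span followed by the outer invocation, this yields two notes. Macros that call each other alternate between spans and are therefore not collapsed.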
Example:
```rust
macro_rules! m {
( 0 $($t:tt)* ) => ( m!($($t)*); );
() => ( fn main() {0} );
}
m!(0 0 0 0 0 0 0 0 0 0 0 0 0 0 0);
```
On a semi-recent nightly, this yields:
```
test.rs:3:21: 3:22 error: mismatched types:
expected `()`,
found `_`
(expected (),
found integral variable) [E0308]
test.rs:3 () => ( fn main() {0} );
^
test.rs:1:1: 4:2 note: in expansion of m!
test.rs:2:23: 2:34 note: expansion site
test.rs:1:1: 4:2 note: in expansion of m!
test.rs:2:23: 2:34 note: expansion site
test.rs:1:1: 4:2 note: in expansion of m!
test.rs:2:23: 2:34 note: expansion site
test.rs:1:1: 4:2 note: in expansion of m!
test.rs:2:23: 2:34 note: expansion site
test.rs:1:1: 4:2 note: in expansion of m!
test.rs:2:23: 2:34 note: expansion site
test.rs:1:1: 4:2 note: in expansion of m!
test.rs:2:23: 2:34 note: expansion site
test.rs:1:1: 4:2 note: in expansion of m!
test.rs:2:23: 2:34 note: expansion site
test.rs:1:1: 4:2 note: in expansion of m!
test.rs:2:23: 2:34 note: expansion site
test.rs:1:1: 4:2 note: in expansion of m!
test.rs:2:23: 2:34 note: expansion site
test.rs:1:1: 4:2 note: in expansion of m!
test.rs:2:23: 2:34 note: expansion site
test.rs:1:1: 4:2 note: in expansion of m!
test.rs:2:23: 2:34 note: expansion site
test.rs:1:1: 4:2 note: in expansion of m!
test.rs:2:23: 2:34 note: expansion site
test.rs:1:1: 4:2 note: in expansion of m!
test.rs:2:23: 2:34 note: expansion site
test.rs:1:1: 4:2 note: in expansion of m!
test.rs:2:23: 2:34 note: expansion site
test.rs:1:1: 4:2 note: in expansion of m!
test.rs:2:23: 2:34 note: expansion site
test.rs:1:1: 4:2 note: in expansion of m!
test.rs:6:1: 6:35 note: expansion site
test.rs:3:21: 3:22 help: run `rustc --explain E0308` to see a detailed explanation
error: aborting due to previous error
```
After this patch:
```
test.rs:3:21: 3:22 error: mismatched types:
expected `()`,
found `_`
(expected (),
found integral variable) [E0308]
test.rs:3 () => ( fn main() {0} );
^
test.rs:2:23: 2:34 note: in this expansion of m!
test.rs:6:1: 6:35 note: in this expansion of m!
test.rs:3:21: 3:22 help: run `rustc --explain E0308` to see a detailed explanation
error: aborting due to previous error
```
This patch transforms functions of the form
```
fn f<Generic: AsRef<Concrete>>(arg: Generic) {
let arg: &Concrete = arg.as_ref();
// Code using arg
}
```
to the following form:
```
#[inline]
fn f<Generic: AsRef<Concrete>>(arg: Generic) {
fn f_inner(arg: &Concrete) {
// Code using arg
}
f_inner(arg.as_ref());
}
```
Therefore, most of the code is concrete and not duplicated during monomorphisation (unless inlined)
and only the tiny bit of conversion code is duplicated. This method was mentioned by @aturon in the
Conversion Traits RFC (https://github.com/rust-lang/rfcs/blame/master/text/0529-conversion-traits.md#L249) and similar techniques are not uncommon in C++ template libraries.
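For a concrete picture, here is a small self-contained instance of the pattern using `AsRef<Path>`. The function `file_name_len` is invented for this illustration and is not one of the functions touched by the patch.

```rust
use std::path::Path;

/// Generic shim: only the `as_ref()` conversion is monomorphised
/// for each caller type.
#[inline]
fn file_name_len<P: AsRef<Path>>(path: P) -> usize {
    // Concrete inner function: compiled once, regardless of how many
    // different `P` types the outer function is instantiated with.
    fn inner(path: &Path) -> usize {
        // Length in bytes of the final path component, or 0 if there is none.
        path.file_name().map_or(0, |name| name.len())
    }
    inner(path.as_ref())
}
```

Calling it with a `&str`, a `String`, and a `PathBuf` instantiates the outer shim three times, while `inner` exists only once.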
This patch goes to the extreme and applies the transformation even to smaller functions<sup>1</sup>
for the purity of the experiment. *Some of them can be rolled back* if considered too ridiculous.
<sup>1</sup> However, who knows how small these functions are after inlining and everything.
The functions in question are mostly `fs`/`os` functions and are not often called with a variety
of argument types, so the code size reduction is rather small (but consistent). Here are the sizes
of stage2 artifacts before and after the patch:
https://gist.github.com/petrochenkov/e76a6b280f382da13c5d
https://gist.github.com/petrochenkov/6cc28727d5256dbdfed0
Note:
All the `inner` functions are concrete and unavailable for cross-crate inlining; some of them may
need `#[inline]` annotations in the future.
r? @aturon
Overflows in integer pow() computations would be missed if they
preceded a 0 bit of the exponent being processed. This made
calls such as 2i32.pow(1024) not trigger an overflow.
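To illustrate where the overflow has to be caught, here is a sketch of exponentiation by squaring in which both the accumulator multiplication and the squaring step are checked. It is not the patched library code; `checked_pow_i32` is a name made up for this example.

```rust
/// Exponentiation by squaring that reports overflow as `None`.
/// Hypothetical helper for illustration only.
fn checked_pow_i32(mut base: i32, mut exp: u32) -> Option<i32> {
    let mut acc: i32 = 1;

    while exp > 1 {
        if exp & 1 == 1 {
            acc = acc.checked_mul(base)?;
        }
        exp /= 2;
        // The squaring must be checked even when the current exponent bit
        // is 0; an overflow here would otherwise go unnoticed.
        base = base.checked_mul(base)?;
    }

    // Handle the final bit separately so that `base` is not squared
    // one more time than needed (which could overflow spuriously).
    if exp == 1 {
        acc = acc.checked_mul(base)?;
    }
    Some(acc)
}
```

For `checked_pow_i32(2, 1024)` every low bit of the exponent is 0, so the overflow is detected only in the squaring step, which is the case described above.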
Fixes #28012