specialize some collection and iterator operations to run in-place
This is a rebase and update of #66383 which was closed due inactivity.
Recent rustc changes made the compile time regressions disappear, at least for webrender-wrench. Running a stage2 compile and the rustc-perf suite takes hours on the hardware I have at the moment, so I can't do much more than that.
![Screenshot_2020-04-05 rustc performance data](https://user-images.githubusercontent.com/1065730/78462657-5d60f100-76d4-11ea-8a0b-4f3962707c38.png)
In the best case of the `vec::bench_in_place_recycle` synthetic microbenchmark these optimizations can provide a 15x speedup over the regular implementation which allocates a new vec for every benchmark iteration. [Benchmark results](https://gist.github.com/the8472/6d999b2d08a2bedf3b93f12112f96e2f). In real code the speedups are tiny, but it also depends on the allocator used, a system allocator that uses a process-wide mutex will benefit more than one with thread-local pools.
## What was changed
* `SpecExtend` which covered `from_iter` and `extend` specializations was split into separate traits
* `extend` and `from_iter` now reuse the `append_elements` if passed iterators are from slices.
* A preexisting `vec.into_iter().collect::<Vec<_>>()` optimization that passed through the original vec has been generalized further to also cover cases where the original has been partially drained.
* A chain of *Vec<T> / BinaryHeap<T> / Box<[T]>* `IntoIter`s through various iterator adapters collected into *Vec<U>* and *BinaryHeap<U>* will be performed in place as long as `T` and `U` have the same alignment and size and aren't ZSTs.
* To enable above specialization the unsafe, unstable `SourceIter` and `InPlaceIterable` traits have been added. The first allows reaching through the iterator pipeline to grab a pointer to the source memory. The latter is a marker that promises that the read pointer will advance as fast or faster than the write pointer and thus in-place operation is possible in the first place.
* `vec::IntoIter` implements `TrustedRandomAccess` for `T: Copy` to allow in-place collection when there is a `Zip` adapter in the iterator. TRA had to be made an unstable public trait to support this.
## In-place collectible adapters
* `Map`
* `MapWhile`
* `Filter`
* `FilterMap`
* `Fuse`
* `Skip`
* `SkipWhile`
* `Take`
* `TakeWhile`
* `Enumerate`
* `Zip` (left hand side only, `Copy` types only)
* `Peek`
* `Scan`
* `Inspect`
## Concerns
`vec.into_iter().filter(|_| false).collect()` will no longer return a vec with 0 capacity, instead it will return its original allocation. This avoids the cost of doing any allocation or deallocation but could lead to large allocations living longer than expected.
If that's not acceptable some resizing policy at the end of the attempted in-place collect would be necessary, which in the worst case could result in one more memcopy than the non-specialized case.
## Possible followup work
* split liballoc/vec.rs to remove `ignore-tidy-filelength`
* try to get trivial chains such as `vec.into_iter().skip(1).collect::<Vec<)>>()` to compile to a `memmove` (currently compiles to a pile of SIMD, see #69187 )
* improve up the traits so they can be reused by other crates, e.g. itertools. I think currently they're only good enough for internal use
* allow iterators sourced from a `HashSet` to be in-place collected into a `Vec`
rustdoc: do not use plain summary for trait impls
Fixes#38386.
Fixes#48332.
Fixes#49430.
Fixes#62741.
Fixes#73474.
Unfortunately this is not quite ready to go because the newly-working links trigger a bunch of linkcheck failures. The failures are tough to fix because the links are resolved relative to the implementor, which could be anywhere in the module hierarchy.
(In the current docs, these links end up rendering as uninterpreted markdown syntax, so I don't think these failures are any worse than the status quo. It might be acceptable to just add them to the linkchecker whitelist.)
Ideally this could be fixed with intra-doc links ~~but it isn't working for me: I am currently investigating if it's possible to solve it this way.~~ Opened #73829.
EDIT: This is now ready!
Convert many files to intra-doc links
Helps with https://github.com/rust-lang/rust/issues/75080
r? @poliorcetics
I recommend reviewing one commit at a time, but the diff is small enough you can do it all at once if you like :)
Move to intra-doc links for library/core/src/iter/traits/iterator.rs
Helps with #75080.
@jyn514 We're almost finished with this issue. Thanks for mentoring. If you have other topics to work on just let me know, I will be around in Discord.
@rustbot modify labels: T-doc, A-intra-doc-links
Known issues:
* Link from `core` to `std` (#74481):
[`OsStr`]
[`String`]
[`VecDeque<T>`]
Rename and expose LoopState as ControlFlow
Basic PR for #75744. Addresses everything there except for documentation; lots of examples are probably a good idea.
Add `[T; N]::as_[mut_]slice`
Part of me trying to populate arrays with a couple of basic useful methods, like slices already have. The ability to add methods to arrays were added in #75212. Tracking issue: #76118
This adds:
```rust
impl<T, const N: usize> [T; N] {
pub fn as_slice(&self) -> &[T];
pub fn as_mut_slice(&mut self) -> &mut [T];
}
```
These methods are like the ones on `std::array::FixedSizeArray` and in the crate `arraytools`.
- Use intra-doc links for `std::io` in `std::fs`
- Use intra-doc links for File::read in unix/ext/fs.rs
- Remove explicit intra-doc links for `true` in `net/addr.rs`
- Use intra-doc links in alloc/src/sync.rs
- Use intra-doc links in src/ascii.rs
- Switch to intra-doc links in alloc/rc.rs
- Use intra-doc links in core/pin.rs
- Use intra-doc links in std/prelude
- Use shorter links in `std/fs.rs`
`io` is already in scope.
flt2dec: properly handle uninitialized memory
The float-to-str code currently uses uninitialized memory incorrectly (see https://github.com/rust-lang/rust/issues/76092). This PR fixes that.
Specifically, that code used `&mut [T]` as "out references", but it would be incorrect for the caller to actually pass uninitialized memory. So the PR changes this to `&mut [MaybeUninit<T>]`, and then functions return a `&[T]` to the part of the buffer that they initialized (some functions already did that, indirectly via `&Formatted`, others were adjusted to return that buffer instead of just the initialized length).
What I particularly like about this is that it moves `unsafe` to the right place: previously, the outermost caller had to use `unsafe` to assert that things are initialized; now it is the functions that do the actual initializing which have the corresponding `unsafe` block when they call `MaybeUninit::slice_get_ref` (renamed in https://github.com/rust-lang/rust/pull/76217 to `slice_assume_init_ref`).
Reviewers please be aware that I have no idea how any of this code actually works. My changes were purely mechanical and type-driven. The test suite passes so I guess I didn't screw up badly...
Cc @sfackler this is somewhat related to your RFC, and possibly some of this code could benefit from (a generalized version of) the API you describe there. But for now I think what I did is "good enough".
Fixes https://github.com/rust-lang/rust/issues/76092.
Move to intra-doc links for library/core/src/panic.rs
Helps with #75080.
@rustbot modify labels: T-doc, A-intra-doc-links, T-rustdoc
Known issues:
* Link from `core` to `std` (#74481):
[`set_hook`]
[`String`]
Add more examples to lexicographic cmp on Iterators.
Given two arrays of T1 and T2, the most important rule of lexicographical comparison is that two arrays
of equal length will be compared until the first difference occured.
The examples provided only focuses on the second rule that says that the
shorter array will be filled with some T2 that is less than every T1.
Which is only possible because of the first rule.
Constify the following methods of `core::cmp::Ordering`:
- `reverse`
- `then`
Stabilizes these methods as const under the `const_ordering` feature.
Also adds a test for these methods in a const context.
Possible because of #49146 (Allow `if` and `match` in constants).
rename get_{ref, mut} to assume_init_{ref,mut} in Maybeuninit
References #63568
Rework with comments addressed from #66174
Have replaced most of the occurrences I've found, hopefully didn't miss out anything
r? @RalfJung
(thanks @danielhenrymantilla for the initial work on this)
Get rid of bounds check in slice::chunks_exact() and related function…
…s during construction
LLVM can't figure out in
let rem = self.len() % chunk_size;
let len = self.len() - rem;
let (fst, snd) = self.split_at(len);
and
let rem = self.len() % chunk_size;
let (fst, snd) = self.split_at(rem);
that the index passed to split_at() is smaller than the slice length and
adds a bounds check plus panic for it.
Apart from removing the overhead of the bounds check this also allows
LLVM to optimize code around the ChunksExact iterator better.
Use intra-doc links for `core/src/slice.mod.rs`
partial help in #75080
r? @jyn514
- most are using primitive types links, which cannot be used with intra links at the moment
- also `std` cannot be referenced in any link, `std::ptr::NonNull` and `std::slice` could not be referenced
Make some Ordering methods const
Constify the following methods of `core::cmp::Ordering`:
- `reverse`
- `then`
Possible because of #49146 (Allow `if` and `match` in constants).
Tracking issue: #76113
Stabilize the following methods of `Result` as const:
- `is_ok`
- `is_err`
- `as_ref`
Possible because of stabilization of #49146 (Allow if and match in constants).
LLVM can't figure out in
let rem = self.len() % chunk_size;
let len = self.len() - rem;
let (fst, snd) = self.split_at(len);
and
let rem = self.len() % chunk_size;
let (fst, snd) = self.split_at(rem);
that the index passed to split_at() is smaller than the slice length and
adds a bounds check plus panic for it.
Apart from removing the overhead of the bounds check this also allows
LLVM to optimize code around the ChunksExact iterator better.
These are unsafe variants of the non-unchecked functions and don't do
any bounds checking.
For the time being these are not public and only a preparation for the
following commit. Making it public and stabilization can follow later
and be discussed in https://github.com/rust-lang/rust/issues/76014 .
`alloc::slice` uses `core::slice` functions, documentation are copied
from there and the links as well without resolution. `crate::ptr...`
cannot be resolved in `alloc::slice`, but `ptr` itself is imported in
both `alloc::slice` and `core::slice`, so we used that instead.
Move to intra-doc links for library/core/src/sync/atomic.rs
Helps with #75080.
@rustbot modify labels: T-doc, A-intra-doc-links, T-rustdoc
Known issues:
* Link from core to std:
[`Arc`]
[`std:🧵:yield_now`]
[`std:🧵:sleep`]
[`std::sync::Mutex`]
The most important rule of lexicographical comparison is that two arrays
of equal length will be compared until the first difference occured.
The examples provided only focuses on the second rule that says that the
shorter array will be filled with some T2 that is less than every T.
Which is only possible because of the first rule.
Fix potential UB in align_offset doc examples
Currently it takes a pointer only to the first element in the array, this changes the code to take a pointer to the whole array.
miri can't catch this right now because it later calls `x.len()` which re-tags the pointer for the whole array.
https://github.com/rust-lang/miri/issues/1526#issuecomment-680897144
Unconfuse Unpin docs a bit
* Don't say that Unpin is used to prevent moves, because it is used
to *allow* moves
* Be more precise about kindedness of things, it is
`Pin<Pointer<Data>>`, rather than just `Pin<Pointer>`.
I would like to propose these two simple methods for stabilization:
- Knowing that a range is exhaused isn't otherwise trivial
- Clippy would like to suggest them, but had to do extra work to disable that path <https://github.com/rust-lang/rust-clippy/issues/3807> because they're unstable
- These work on `PartialOrd`, consistently with now-stable `contains`, and are thus more general than iterator-based approaches that need `Step`
- They've been unchanged for some time, and have picked up uses in the compiler
- Stabilizing them doesn't block any future iterator-based is_empty plans, as the inherent ones are preferred in name resolution
* Don't say that Unpin is used to prevent moves, because it is used
to *allow* moves
* Be more precise about kindedness of things, it is
`Pin<Pointer<Data>>`, rather than just `Pin<Pointer>`.
Report an ambiguity if both modules and primitives are in scope for intra-doc links
Closes https://github.com/rust-lang/rust/issues/75381
- Add a new `prim@` disambiguator, since both modules and primitives are in the same namespace
- Refactor `report_ambiguity` into a closure
Additionally, I noticed that rustdoc would previously allow `[struct@char]` if `char` resolved to a primitive (not if it had a DefId). I fixed that and added a test case.
I also need to update libstd to use `prim@char` instead of `type@char`. If possible I would also like to refactor `ambiguity_error` to use `Disambiguator` instead of its own hand-rolled match - that ran into issues with `prim@` (I updated one and not the other) and it would be better for them to be in sync.
Use allow(unused_imports) instead of cfg(doc) for imports used only for intra-doc links
This prevents links from breaking when items are re-exported in a
different crate and the original isn't being documented.
Spotted in https://github.com/rust-lang/rust/pull/75832#discussion_r475275837 (thanks ollie!)
r? @ollie27
Fix typo in documentation of i32 wrapping_abs()
Hi!
I was reading through the std library docs and noticed that this section flowed a bit oddly; comparing it against https://doc.rust-lang.org/std/primitive.i32.html#method.wrapping_div and https://doc.rust-lang.org/std/primitive.i32.html#method.wrapping_neg , I noticed that those two pieces of documentation used a semicolon here.
This is my first time submitting a PR to this repo. Am I doing this right? Are tiny typo-fix PRs like this worth submitting, or are they not a good use of time?
Thank you!
stabilize ptr_offset_from
This stabilizes ptr::offset_from, and closes https://github.com/rust-lang/rust/issues/41079. It also removes the deprecated `wrapping_offset_from`. This function was deprecated 19 days ago and was never stable; given an FCP of 10 days and some waiting time until FCP starts, that leaves at least a month between deprecation and removal which I think is fine for a nightly-only API.
Regarding the open questions in https://github.com/rust-lang/rust/issues/41079:
* Should offset_from abort instead of panic on ZSTs? -- As far as I know, there is no precedent for such aborts. We could, however, declare this UB. Given that the size is always known statically and the check thus rather cheap, UB seems excessive.
* Should there be more methods like this with different restrictions (to allow nuw/nsw, perhaps) or that return usize (like how isize-taking offset is more conveniently done with usize-taking add these days)? -- No reason to block stabilization on that, we can always add such methods later.
Also nominating the lang team because this exposes an intrinsic.
The stabilized method is best described [by its doc-comment](56d4b2d69a/src/libcore/ptr/const_ptr.rs (L227)). The documentation forgot to mention the requirement that both pointers must "have the same provenance", aka "be derived from pointers to the same allocation", which I am adding in this PR. This is a precondition that [Miri already implements](https://play.rust-lang.org/?version=nightly&mode=debug&edition=2018&gist=a3b9d0a07a01321f5202cd99e9613480) and that, should LLVM ever obtain a `psub` operation to subtract pointers, will likely be required for that operation (following the semantics in [this paper](https://people.mpi-sws.org/~jung/twinsem/twinsem.pdf)).
Move to intra-doc links for library/core/src/alloc/{layout, global, mod}.rs
Helps with #75080. The files already contained intra-doc links, so there are only minor changes.
@rustbot modify labels: T-doc, A-intra-doc-links, T-rustdoc
Known issues:
* [`handle_alloc_error`]: Link from `core` to `alloc` could not be resolved.
* [`slice`]: slice is a primitive type, but could not be resolved; had to use [`crate::slice`] instead.
Move to intra-doc links for /library/core/src/intrinsics.rs
Helps with #75080.
@rustbot modify labels: T-doc, A-intra-doc-links, T-rustdoc
Known issues:
* The following f32 and f64 primitive methods cannot be resolved:
f32/f64::powi
f32/f64::sqrt
f32/f64::sin
f32/f64::cos
f32/f64::powf
f32/f64::exp
f32/f64::exp2
f32/f64::ln
f32/f64::log2
f32/f64::log10
f32/f64::mul_add
f32/f64::abs
f32/f64::copysign
f32/f64::floor
f32/f64::ceil
f32/f64::trunc
f32/f64::round
* Links from core to std:
[`std::pointer::*`]
[`std::process::abort`]
[`from_raw_parts`]
[`Vec::append`]
* Links with anchors?
I provided a separate commit that replaced links with anchors by intra-doc links.
Here the anchor location information gets lost, so its questionable whether to
actually replace those links.
enable align_to tests in Miri
With https://github.com/rust-lang/miri/issues/1074 resolved, we can enable these tests in Miri.
I also tweaked the test sized to get reasonable execution times with decent test coverage.
Use min_specialization in libcore
Getting `TrustedRandomAccess` to work is the main interesting thing here.
- `get_unchecked` is now an unstable, hidden method on `Iterator`
- The contract for `TrustedRandomAccess` is made clearer in documentation
- Fixed a bug where `Debug` would create aliasing references when using the specialized zip impl
- Added tests for the side effects of `next_back` and `nth`.
closes#68536
Improve codegen for `align_offset`
In this PR the `align_offset` implementation is changed/improved to produce better code in certain scenarios such as when pointer type is has a stride of 1 or when building for low optimisation levels.
While these changes do not achieve the "ideal" codegen referenced in #75579, it gets significantly closer to it. I’m not actually sure if the codegen can actually be much better with this function returning the offset, rather than the aligned pointer.
See the descriptions for separate commits for further information.
docs(marker/copy): provide example for `&T` being `Copy`
### Edited 2020-08-16 (most recent)
In the current documentation about the `Copy` marker trait, there is a section
with examples of structs that can implement `Copy`. Currently there is no example for
showing that shared references (`&T`) are also `Copy`.
It is worth to have a dedicated example for `&T` being `Copy`, because shared
references are an integral part of the language and it being `Copy` is not as
intuitive as other types that share this behaviour like `i32` or `bool`.
The example picks up on the previous non-`Copy` struct and shows that
structs can be `Copy`, even when they hold a shared reference to a non-`Copy` type.
-----------------------------------------
### Edited 2020-08-02, 3:28 p.m.
I've just realized that it says "in addition to the **implementors listed below**", which makes this PR kind of "wrong", because `&T` is indeed in the "implementors listed below".
Maybe we can instead show an example with `&T` in the [When can my type be Copy](https://doc.rust-lang.org/std/marker/trait.Copy.html#when-can-my-type-be-copy) section.
What I really want to achieve is that it becomes more obvious that `&T` is also `Copy`, because, I think, it is very valuable to know and it wasn't obvious for me, until I read something about it in a forum post.
What do you think? I would create another PR for that.
**Please feel free to close this PR.**
-----------------------------------
### Original post
In the current documentation about the `Copy` marker trait, there is a section
about "additional implementors", which list additional implementors of the `Copy` trait.
The fact that shared references are also `Copy` is mixed with another point,
which makes it hard to recognize and make it seem not as important.
This clarifies the fact that shared references are also `Copy`, by mentioning it as a
separate item in the list of "additional implementors".
See also X-Link mem::{swap, take, replace}
Since it's easy to end up at one of these functions when you really wanted the other one, cross link them with descriptions of why you'd want to use them.
Add `as_uninit`-like methods to pointer types and unify documentation of `as_ref` methods
This adds a convenient method to retrieve a `&(mut) [MaybeUninit<T>]` from slice pointers (`*const [T]`, `*mut [T]`, `NonNull<[T]>`). See also https://github.com/rust-lang/wg-allocators/issues/66#issuecomment-671789105.
~I'll add a tracking issue as soon as it's reviewed and CI passed.~
Tracking Issue: #75402
r? @RalfJung
Reference lang items during AST lowering
Fixes#60607 and fixes#61019.
This PR introduces `QPath::LangItem` to the HIR and uses it in AST lowering instead of constructing a `hir::Path` from a slice of symbols:
- Credit for much of this work goes to @matthewjasper, I basically just [rebased their earlier work](a227c706b7 (diff-c0f791ead38d2d02916faaad0f56f41d)).
- ~~Changes to Clippy might not be correct, they compile but attempting to run tests through `./x.py` produced failures which appeared spurious, so I didn't run any clippy tests.~~
- Changes to save analysis might not be correct - tests pass but I don't have a lot of confidence in those changes being correct.
- I've used `GenericBounds::LangItemTrait` rather than changing `PolyTraitRef`, as suggested by @matthewjasper [in this comment](a227c706b7 (r40107992)) but I'd prefer that be left for a follow-up.
- I've split things into smaller commits fairly arbitrarily to make the diff easier to review, each commit should compile but might not pass tests until the final commit.
r? @oli-obk
cc @matthewjasper
Fix example in `NonNull::as_uninit_slice`
Rename feature gate to "ptr_as_uninit"
Make methods more consistent with already stable methods
Make `pointer::as_uninit_slice` return an `Option`
Fix placement for `// SAFETY` section
Add `as_uninit_ref` and `as_uninit_mut` to pointers
Fix doctest
Update tracking issue
Fix doc links
Apply suggestions from review
Make wording about counterparts consistent
Fix doc links
Improve documentation
Fix doc-tests
Fix doc links ... again
Apply suggestions from review
Apply suggestions from Review
Apply suggestion from review to all affected files
Add missing words in safety sections in `as_uninit_slice_mut`
Fix safety-comment in `NonNull::as_uninit_slice_mut`
Move to intra-doc links for /library/core/src/any.rs
Helps with #75080.
@rustbot modify labels: T-doc, A-intra-doc-links, T-rustdoc
Known issues:
* Links from `core` to `std` (#74481):
* `[Box]: ../../std/boxed/struct.Box.html`
pin docs: add some forward references
@nagisa had some questions about pinning that were answered in the docs, which they did not realize because that discussion is below the examples. I still think it makes sense to introduce the examples before that discussion, since it give the discussion something concrete to refer to, but this PR adds some forward references so people don't think the examples conclude the docs.
@nagisa do you think this would have helped?
Code blocks in doc comments are compiled and run, so we show `Copy` works in this example.
Co-authored-by: Poliorcetics <poliorcetics@users.noreply.github.com>
Previously checking for `pmoda == 0` would get LLVM to generate branchy
code, when, for `stride = 1` the offset can be computed without such a
branch by doing effectively a `-p % a`.
For well-known (constant) alignments, with the new ordering of these
conditionals, we end up generating 2 to 3 cheap instructions on x86_64:
movq %rdi, %rax
negl %eax
andl $7, %eax
instead of 5+ as previously.
For unknown alignments the new code also generates just 3 instructions:
negq %rdi
leaq -1(%rsi), %rax
andq %rdi, %rax
At opt-level <= 1, the methods such as `wrapping_mul` are not being
inlined, causing significant bloating and slowdowns of the
implementation at these optimisation levels.
With use of these intrinsics, the codegen of this function at
-Copt_level=1 is the same as it is at -Copt_level=3.