Speed up `SipHasher128`.
The current code in `SipHasher128::short_write` is inefficient. It uses
`u8to64_le` (which is complex and slow) to extract just the right number of
bytes of the input into a u64 and pad the result with zeroes. It then
left-shifts that value in order to bitwise-OR it with `self.tail`.
For example, imagine we have a u32 input `0xIIHH_GGFF` and only need three bytes
to fill up `self.tail`. The current code uses `u8to64_le` to construct
`0x0000_0000_00HH_GGFF`, which is just `0xIIHH_GGFF` with the `0xII` removed and
zero-extended to a u64. The code then left-shifts that value by five bytes --
discarding the `0x00` byte that replaced the `0xII` byte! -- to give
`0xHHGG_FF00_0000_0000`. It then then ORs that value with `self.tail`.
There's a much simpler way to do it: zero-extend to u64 first, then left shift.
E.g. `0xIIHH_GGFF` is zero-extended to `0x0000_0000_IIHH_GGFF`, and then
left-shifted to `0xHHGG_FF00_0000_0000`. We don't have to take time to exclude
the unneeded `0xII` byte, because it just gets shifted out anyway! It also avoids
multiple occurrences of `unsafe`.
There's a similar story with the setting of `self.tail` at the method's end.
The current code uses `u8to64_le` to extract the remaining part of the input,
but the same effect can be achieved more quickly with a right shift on the
zero-extended input.
This commit changes `SipHasher128` to use the simpler shift-based approach. The
code is also smaller, which means that `short_write` is now inlined where
previously it wasn't, which makes things faster again. This gives big
speed-ups for all incremental builds, especially "baseline" incremental
builds.
r? @michaelwoerister
Improve `char::is_ascii_*` codegen
This PR is an attempt to fix https://github.com/rust-lang/rust/issues/65127
A couple of warnings:
1. the generated code might be further improved (in LLVM and/or MIR) by emitting better comparison sequences; in particular, this would improve the performance of "complex" checks such as those in `is_ascii_punctuation`
2. the second commit is currently marked "DO NOT MERGE", because it regresses SIMD on `u8` slices; this could likely be fixed by improving the computation/usage of demanded bits in LLVM
An alternative approach to remove the code duplication might be the use of macros, but currently most of the duplication is actually in the doc comments, so maybe just keeping the redundancy could be ok
Preparation for allocator aware `Box`
This cleans up the `Box` code a bit, and uses `Box::from_raw(ptr)` instead of `Box(ptr)`.
Additionally, `box_free` and `exchange_malloc` now uses the `AllocRef` trait and a comment was added on how `box_free` is tied to `Box`.
This a preparation for an upcoming PR, which makes `Box` aware of an allocator.
r? @Amanieu
Add missing `_zeroed` varants to `AllocRef`
The majority of the allocator wg has decided to add the missing `_zeroed` variants to `AllocRef`:
> these should be added since they can be efficiently implemented with the `mremap` system call on Linux. `mremap` allows you to move/grow/shrink a memory mapping, and any new pages added for growth are guaranteed to be zeroed.
>
> If `AllocRef` does not have these methods then the user will have to manually write zeroes to the added memory since the API makes no guarantees on their contents.
For the full discussion please see https://github.com/rust-lang/wg-allocators/issues/14.
This PR provides default implementations for `realloc_zeroed`, `alloc_excess_zeroed`, `realloc_excess_zeroed`, and `grow_in_place_zeroed`.
r? @Amanieu
Remove common usage pattern from `AllocRef`
This removes the common usage patterns from `AllocRef`:
- `alloc_one`
- `dealloc_one`
- `alloc_array`
- `realloc_array`
- `dealloc_array`
Actually, they add nothing to `AllocRef` except a [convenience wrapper around `Layout` and other methods in this trait](https://doc.rust-lang.org/1.41.0/src/core/alloc.rs.html#1076-1240) but have a major flaw: The documentation of `AllocRefs` notes, that
> some higher-level allocation methods (`alloc_one`, `alloc_array`) are well-defined on zero-sized types and can optionally support them: it is left up to the implementor whether to return `Err`, or to return `Ok` with some pointer.
With the current API, `GlobalAlloc` does not have those methods, so they cannot be overridden for `liballoc::Global`, which means that even if the global allocator would support zero-sized allocations, `alloc_one`, `alloc_array`, and `realloc_array` for `liballoc::Global` will error, while calling `alloc` with a zeroed-size `Layout` could succeed. Even worse: allocating with `alloc` and deallocating with `dealloc_{one,array}` could end up with not calling `dealloc` at all!
For the full discussion please see https://github.com/rust-lang/wg-allocators/issues/18
r? @Amanieu
Python script PEP8 style guide space formatting and minor Python source cleanup
This PR includes the following changes in the Python sources based on a flake8 3.7.9 (mccabe: 0.6.1, pycodestyle: 2.5.0, pyflakes: 2.1.1) CPython 3.7.6 on Darwin lint:
- PEP8 style guide spacing updates *without* line length changes
- removal of unused local variable assignments in context managers and exception handling
- removal of unused Python import statements
- removal of unnecessary semicolons
Test failure of unchecked arithmetic intrinsics in const eval
Test that the unchecked arithmetic intrinsics that were made unstably const in #68809 emit an error during const-eval if given invalid input.
Addresses [this comment](https://github.com/rust-lang/rust/pull/68809#discussion_r375753066).
r? @RalfJung
[experiment] Support linking from a .rlink file
Flag `-Z no-link` was previously introduced, which allows creating an `.rlink` file to perform compilation without linking. This change enables linking from an `.rlink` file.
Part of Issue #64191
This makes it faster and also changes it to a safe function. (Thanks to
Michael Woerister for the suggestion.) `load_int_le!` is also no longer
necessary.
Don't rustfmt check the vendor directory.
I need to be able to run `x.py tidy` to do license checks (which requires vendored dependencies). However, when vendoring is enabled, it wants to rustfmt check the entire vendor directory, which doesn't work.
Don't run coherence twice for future-compat lints
This fixes the regression introduced by https://github.com/rust-lang/rust/pull/65232 (which I mentioned in https://github.com/rust-lang/rust/pull/65232#issuecomment-583739037).
Old algorithm:
* Run coherence with all future-incompatible checks off, reporting errors on any overlap.
* If there's no overlap (common case), run it *again*, with the future-incompatible checks on. Report warnings for any overlap found.
New algorithm:
* Run coherence with all additional future-incompatible checks *on*, which means that we'll find *all* potentially overlapping impls immediately.
* If this found overlap, run coherence again, with the future-incompatible checks off. If that *still* gives an error, we report it. If not, it ought to be a warning.
This reduces time spent in coherence checking for the nrf52810-pac by roughly 50% relative to current master.
Use `dyn Trait` more in tests
Here are some tests using the old trait object type syntax which are not testing the syntax itself.
This has been extracted from https://github.com/rust-lang/rust/pull/66364.
traits: preallocate 2 Vecs of known initial size
The 2 preallocations are pretty obvious; both vectors will be as big as or larger than the collections they are created from.
In `WfPredicates::normalize` the change from a functional style improves readability and should be perf-friendly, too.
Enable Control Flow Guard in rustbuild
Now that Rust supports Control Flow Guard (#68180), add a config.toml option to build the standard library with CFG enabled.
r? @nagisa
Remove unused feature gates
I think many of the remaining unstable things can be easily be replaced with stable things. I have kept the `#![feature(nll)]` even though it is only necessary in `libstd`, to make regressions of it harder.
Invert control in struct_lint_level.
Closes#67927
Changes the `struct_lint*` methods to take a `decorate` function instead of a message string. This decorate function is also responsible for eventually stashing, emitting or cancelling the diagnostic. If the lint was allowed after all, the decorate function is not run at all, saving us from spending time formatting messages (and potentially other expensive work) for lints that don't end up being emitted.
r? @Centril