Adaptive hashmap implementation
All credits to @pczarn who wrote https://github.com/rust-lang/rfcs/pull/1796 and https://github.com/contain-rs/hashmap2/pull/5
**Background**
The Rust std lib hashmap puts a strong emphasis on security. We made some improvements in https://github.com/rust-lang/rust/pull/37470, but in some very specific cases and for non-default hashers it's still vulnerable (see #36481).
This is a simplified version of the https://github.com/rust-lang/rfcs/pull/1796 proposal, without switching hashers on the fly and other things that require an RFC process and further decisions. I think this part has great potential by itself.
**Proposal**
This PR adds code checking for extra long probe and shift lengths (see code comments and https://github.com/rust-lang/rfcs/pull/1796 for details). When those are encountered, the hashmap will grow (even if the capacity limit is not reached yet), _greatly_ attenuating the degenerate performance case.
We need a lower bound on the minimum occupancy that may trigger the early resize; otherwise, in extreme cases, it's possible to turn the CPU attack into a memory attack. The PR code puts that lower bound at half of the max occupancy (defined by ResizePolicy). This reduces the protection (the slowdown could potentially still be exploited between 0-50% occupancy) but makes it completely safe against the memory attack.
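As a rough illustration of the rule described above (not the code in this PR), the early-resize decision can be sketched as a pure function. The names `PROBE_LIMIT` and `should_grow_early` and the threshold value are assumptions made up for this sketch; only the "at least half of the max occupancy" lower bound comes from the description.

```rust
/// Hypothetical threshold: a probe sequence longer than this counts as "extra long".
const PROBE_LIMIT: usize = 128;

/// Decide whether to grow before the normal capacity limit is reached.
/// `len` is the number of stored entries, `max_occupancy` is the number of
/// entries allowed before a regular resize, and `probe_len` is the probe
/// length observed during the current insert.
fn should_grow_early(len: usize, max_occupancy: usize, probe_len: usize) -> bool {
    let long_probe = probe_len > PROBE_LIMIT;
    // Lower bound: never grow early while less than half full, so an attacker
    // cannot trade the CPU attack for unbounded memory growth.
    let at_least_half_full = len >= max_occupancy / 2;
    long_probe && at_least_half_full
}

fn main() {
    // Long probe, but the map is too sparse: no early resize.
    assert!(!should_grow_early(10, 100, 500));
    // Long probe and at least half full: grow early.
    assert!(should_grow_early(60, 100, 500));
    // Short probe: the normal resize policy applies.
    assert!(!should_grow_early(60, 100, 3));
    println!("ok");
}
```

The occupancy check is what leaves the 0-50% window unprotected, as noted above.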
**Drawbacks**
* May interact badly with poor hashers. Maps using those may not use the desired capacity.
* It adds 2-3 branches to the common insert path; luckily those are highly predictable and there's room to shave some in future patches.
* May complicate exposure of ResizePolicy in the future as the constants are a function of the fill factor.
**Example**
Example code that exploits the exposure of iteration order and a weak hasher.
```
const MERGE: usize = 10_000usize;

#[bench]
fn merge_dos(b: &mut Bencher) {
    let first_map: $hashmap<usize, usize, FnvBuilder> = (0..MERGE).map(|i| (i, i)).collect();
    let second_map: $hashmap<usize, usize, FnvBuilder> = (MERGE..MERGE * 2).map(|i| (i, i)).collect();
    b.iter(|| {
        let mut merged = first_map.clone();
        for (&k, &v) in &second_map {
            merged.insert(k, v);
        }
        ::test::black_box(merged);
    });
}
```
`_91` is stdlib and `_ad` is patched (the end capacity in both cases is the same)
```
running 2 tests
test _91::merge_dos ... bench: 47,311,843 ns/iter (+/- 2,040,302)
test _ad::merge_dos ... bench: 599,099 ns/iter (+/- 83,270)
```
travis: Disable source tarballs on most builders
Currently we create a source tarball on almost all of the `DEPLOY=1` builders
but this has the adverse side effect of all source tarballs overwriting
each other in the S3 bucket. Normally this is ok but unfortunately a source
tarball created on Windows is not buildable on Unix.
On Windows the vendored sources contain paths with `\` characters in them which
when interpreted on Unix end up in "file not found" errors.
Instead of this overwriting behavior, whitelist just one linux builder for
producing tarballs and avoid producing tarballs on all other hosts.
use bash when invoking dist shell scripts on solaris
Partially fixes #25845
A separate, trivial fix is needed to the rust-installer scripts to completely resolve this issue.
sys/mod doc update and mod import order adjust
* Some doc updates.
* Racer currently uses the first mod it finds regardless of cfg attrs. Moving #[cfg(unix)] up should be a temporary tweak that works as expected for more people.
Vec, LinkedList, VecDeque, String, and Option NatVis visualizations
I've added some basic [NatVis](https://msdn.microsoft.com/en-us/library/jj620914.aspx) visualizations for core Rust collections and types. This helps address a need filed in issue #36503. NatVis visualizations are similar to gdb/lldb pretty printers, but for windbg and the Visual Studio debugger on Windows.
For example, Vec without the supplied NatVis looks like this in windbg using the "dx" command:
```
0:000> dx some_64_bit_vec
some_64_bit_vec [Type: collections::vec::Vec<u64>]
[+0x000] buf [Type: alloc::raw_vec::RawVec<u64>]
[+0x010] len : 0x4 [Type: unsigned __int64]
```
With the NatVis, the elements of the Vec are displayed:
```
0:000> dx some_64_bit_vec
some_64_bit_vec : { size=0x4 } [Type: collections::vec::Vec<u64>]
[<Raw View>] [Type: collections::vec::Vec<u64>]
[size] : 0x4 [Type: unsigned __int64]
[capacity] : 0x4 [Type: unsigned __int64]
[0] : 0x4 [Type: unsigned __int64]
[1] : 0x4f [Type: unsigned __int64]
[2] : 0x1a [Type: unsigned __int64]
[3] : 0x184 [Type: unsigned __int64]
```
In fact, the vector can be treated as an array by the NatVis expression evaluator:
```
0:000> dx some_64_bit_vec[2]
some_64_bit_vec[2] : 0x1a [Type: unsigned __int64]
```
In general, it works with any NatVis command that understands collections, such as NatVis LINQ expressions:
```
0:000> dx some_64_bit_vec.Select(x => x * 2)
some_64_bit_vec.Select(x => x * 2)
[0] : 0x8
[1] : 0x9e
[2] : 0x34
[3] : 0x308
```
std::string::String is implemented, as well:
```
0:000> dv
hello_world = "Hello, world!"
empty = ""
new = ""
0:000> dx hello_world
hello_world : "Hello, world!" [Type: collections::string::String]
[<Raw View>] [Type: collections::string::String]
[size] : 0xd [Type: unsigned __int64]
[capacity] : 0xd [Type: unsigned __int64]
[0] : 72 'H' [Type: char]
[1] : 101 'e' [Type: char]
...
[12] : 33 '!' [Type: char]
0:000> dx empty
empty : "" [Type: collections::string::String]
[<Raw View>] [Type: collections::string::String]
[size] : 0x0 [Type: unsigned __int64]
[capacity] : 0x0 [Type: unsigned __int64]
```
VecDeque and LinkedList are also implemented.
My biggest concern is the implementation for Option due to the different layouts it can receive based on whether the sentinel value can be embedded within the Some value or must be stored separately.
It seems to work, but my testing isn't exhaustive:
```
0:000> dv
three = { Some 3 }
none = { None }
no_str = { None }
some_str = { Some "Hello!" }
0:000> dx three
three : { Some 3 } [Type: core::option::Option<i32>]
[<Raw View>] [Type: core::option::Option<i32>]
[size] : 0x1 [Type: ULONG]
[value] : 3 [Type: int]
[0] : 3 [Type: int]
0:000> dx none
none : { None } [Type: core::option::Option<i32>]
[<Raw View>] [Type: core::option::Option<i32>]
[size] : 0x0 [Type: ULONG]
[value] : 4 [Type: int]
0:000> dx no_str
no_str : { None } [Type: core::option::Option<collections::string::String>]
[<Raw View>] [Type: core::option::Option<collections::string::String>]
[size] : 0x0 [Type: ULONG]
0:000> dx some_str
some_str : { Some "Hello!" } [Type: core::option::Option<collections::string::String>]
[<Raw View>] [Type: core::option::Option<collections::string::String>]
[size] : 0x1 [Type: ULONG]
[value] : 0x4673df710 : "Hello!" [Type: collections::string::String *]
[0] : "Hello!" [Type: collections::string::String]
```
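The two Option layouts mentioned above can be observed from Rust itself. This is only an illustrative sketch, unrelated to the NatVis files; the helper name `option_overhead` is made up for the example.

```rust
use std::mem::size_of;

/// Extra bytes that wrapping `T` in `Option` costs (hypothetical helper).
fn option_overhead<T>() -> usize {
    size_of::<Option<T>>() - size_of::<T>()
}

fn main() {
    // Every bit pattern of i32 is a valid value, so the discriminant
    // has to be stored separately from the payload.
    assert!(option_overhead::<i32>() > 0);

    // Box<T> and &T are never null, so None can be encoded as the null
    // pointer inside the payload itself and no extra space is needed.
    assert_eq!(option_overhead::<Box<i32>>(), 0);
    assert_eq!(option_overhead::<&i32>(), 0);

    println!("Option<i32>: {} bytes", size_of::<Option<i32>>());
    println!("Option<Box<i32>>: {} bytes", size_of::<Option<Box<i32>>>());
}
```

A visualizer has to handle both cases: a separate discriminant field for types like `i32`, and a null-pointer check for pointer-like payloads.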
For now all of these visualizations work in windbg, but I've only gotten the visualizations in libcore.natvis working in the VS debugger. My priority is windbg, but somebody else may be interested in investigating the issues related to VS.
You can load these visualizations into a windbg session using the .nvload command:
```
0:000> .nvload ..\rust\src\etc\natvis\libcollections.natvis; .nvload ..\rust\src\etc\natvis\libcore.natvis
Successfully loaded visualizers in "..\rust\src\etc\natvis\libcollections.natvis"
Successfully loaded visualizers in "..\rust\src\etc\natvis\libcore.natvis"
```
There are some issues with the symbols that Rust and LLVM conspire to emit into the PDB that inhibit debugging in windbg generally, and by extension make writing visualizations more difficult. Additionally, there are some bugs in windbg itself that complicate or disable some use of the NatVis visualizations for Rust. Significantly, due to NatVis limitations in windbg around allowable type names, you cannot write a visualization for [T] or str. I'll report separate issues as I isolate them.
In the near term, I hope to fill out these NatVis files with more of Rust's core collections and types. In the long run, I hope that we can ship NatVis files with crates and streamline their deployment when debugging Rust programs on Windows.
Allow more Cell methods for non-Copy types
Clearly, `get_mut` is safe for any `T`. The other two only provide unsafe pointers anyway.
The only remaining inherent method with `Copy` bound is `get`, which sounds about right to me.
I found the order of `impl` blocks in the file a little weird (first inherent impl, then some trait impls, then another inherent impl), but didn't change it to keep the diff small.
Contributes to #39264
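A small sketch of what this enables, using `String` as an arbitrary non-`Copy` payload. The function name `cell_demo` is made up for the example, and it is written against the `Cell` API as it exists in current std rather than against this PR's diff.

```rust
use std::cell::Cell;

/// Exercise `get_mut`, `as_ptr` and `into_inner` on a `Cell<String>`.
/// Returns the length read through the raw pointer and the final string.
fn cell_demo() -> (usize, String) {
    let mut cell = Cell::new(String::from("hello"));

    // `get_mut` takes `&mut self`, so the borrow checker already guarantees
    // exclusive access; no `Copy` bound is needed for that to be sound.
    cell.get_mut().push_str(", world");

    // `as_ptr` only hands out a raw pointer; reading through it is the
    // caller's responsibility and requires `unsafe`.
    let ptr: *mut String = cell.as_ptr();
    let len = unsafe { (*ptr).len() };

    (len, cell.into_inner())
}

fn main() {
    let (len, s) = cell_demo();
    assert_eq!(len, 12);
    assert_eq!(s, "hello, world");
}
```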
book: don’t use GNU extensions in the example unnecessarily
The use of a GNU C extension for block expressions is immaterial to the
actual problem with C macros that the section tries to show, so don’t
use it and instead use a plain C way of writing the macro, which has the
added benefit of being better C code (since the macro now behaves like
a function, syntax-wise).
Stabilize field init shorthand
Closes #37340.
~Still blocked by the documentation issue #38830.~ EDIT: seems that all parts required for stabilisation are fixed, so it's not blocked.
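For reference, a minimal example of the shorthand being stabilized; the `Point` type and `new_point` function are made up for illustration.

```rust
#[derive(Debug, PartialEq)]
struct Point {
    x: i32,
    y: i32,
}

fn new_point(x: i32, y: i32) -> Point {
    // Shorthand for `Point { x: x, y: y }`: a field can be initialized
    // from a local variable of the same name.
    Point { x, y }
}

fn main() {
    assert_eq!(new_point(1, 2), Point { x: 1, y: 2 });
}
```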
Don't segfault if btree range is not in order
This is a first attempt to fix issue #33197. The issue is that the BTree iterator uses next_unchecked for fast iteration, but it can be tricked into running off the end of the tree and segfaulting if range is called with a maximum that is less than the minimum.
Since a user defined Ord should not determine the safety of BTreeMap, and we still want fast iteration, I've implemented the idea of @gereeter and walk the tree simultaneously searching for both keys to make sure that if our keys diverge, the min key is to the left of our max key. I currently panic if that is not the case.
Open questions:
1. Do we want to panic in this error case or do we want to return an empty iterator? The drain API panics if the range is bad, but drain is given a range of index values, while this is a generic key type. Panicking is brittle and returning an empty iterator is probably the most flexible and matches what people would want it to do... but artificially returning a BTreeMap::Range with start==end seems like a pretty weird and unnatural thing to do, although it's doable since those fields are not accessible.
The same question for other weird cases:
2. (Included(101), Excluded(100)) on a map that contains [1,2,3]. Both BTree edges end up on the same part of the map, but comparing the keys shows the range is backwards.
3. (Excluded(5), Excluded(5)). The keys are equal but BTree edges end up backwards if the map contains 5.
4. (Included(5), Excluded(5)). Should naturally produce an empty iterator, right?
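For comparison, the cases above can be probed with a small harness. This sketch reflects how `BTreeMap::range` is documented to behave in current std (panic when start > end, or when start == end with both bounds excluded), which is not necessarily what this PR settles on. The helper names `sample_map` and `count_in_range` are made up for the example.

```rust
use std::collections::BTreeMap;
use std::ops::Bound::{self, Excluded, Included};
use std::panic::{self, AssertUnwindSafe};

/// Map with keys 1, 2, 3 and 5, each mapped to itself.
fn sample_map() -> BTreeMap<i32, i32> {
    [1, 2, 3, 5].iter().map(|&k| (k, k)).collect()
}

/// Number of entries in the range, or `None` if `range` panicked.
fn count_in_range(map: &BTreeMap<i32, i32>, lo: Bound<i32>, hi: Bound<i32>) -> Option<usize> {
    panic::catch_unwind(AssertUnwindSafe(|| map.range::<i32, _>((lo, hi)).count())).ok()
}

fn main() {
    let map = sample_map();
    // Case 4: equal keys, included start and excluded end: an empty iterator.
    assert_eq!(count_in_range(&map, Included(5), Excluded(5)), Some(0));
    // Case 2: the range is backwards even though both keys are past the data.
    assert_eq!(count_in_range(&map, Included(101), Excluded(100)), None);
    // Case 3: equal keys, both excluded.
    assert_eq!(count_in_range(&map, Excluded(5), Excluded(5)), None);
    // A well-formed range behaves as usual.
    assert_eq!(count_in_range(&map, Included(1), Included(3)), Some(3));
}
```

The panicking cases also print the default panic message to stderr, which is expected here.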
Conversions between CStr, OsStr, Path and boxes
This closes some of the inconsistencies between `CStr`, `OsStr`, `Path`, and `str`, allowing people to create boxed versions of DSTs other than `str` and `[T]`.
Full list of additions:
* `Default` for `Box<str>`, `Box<CStr>`, `Box<OsStr>`, and `Box<Path>` (note: `Default` for `PathBuf` is already implemented)
* `CString::into_boxed_c_str` (feature gated)
* `OsString::into_boxed_os_str` (feature gated)
* `PathBuf::into_boxed_path` (feature gated)
* `From<&CStr> for Box<CStr>`
* `From<&OsStr> for Box<OsStr>`
* `From<&Path> for Box<Path>`
This also includes adding the internal methods:
* `sys::*::os_str::Buf::into_box`
* `sys::*::os_str::Slice::{into_box, empty_box}`
* `sys_common::wtf8::Wtf8Buf::into_box`
* `sys_common::wtf8::Wtf8::{into_box, empty_box}`
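A short usage sketch of the public additions, written against the API as it exists in current std (where these methods are no longer feature gated). The helper name `boxed_path` is made up for the example.

```rust
use std::ffi::{CStr, CString, OsStr, OsString};
use std::path::{Path, PathBuf};

/// Convert an owned path buffer into a boxed `Path` (hypothetical helper).
fn boxed_path(s: &str) -> Box<Path> {
    PathBuf::from(s).into_boxed_path()
}

fn main() {
    // `Default` yields an empty boxed value.
    let empty_str: Box<str> = Default::default();
    let empty_c: Box<CStr> = Default::default();
    let empty_os: Box<OsStr> = Default::default();
    assert!(empty_str.is_empty());
    assert!(empty_c.to_bytes().is_empty());
    assert!(empty_os.is_empty());

    // Owned buffer -> boxed DST.
    let boxed_c: Box<CStr> = CString::new("hi").unwrap().into_boxed_c_str();
    let boxed_os: Box<OsStr> = OsString::from("hi").into_boxed_os_str();
    let boxed_p: Box<Path> = boxed_path("a/b");

    // Borrowed DST -> boxed DST via `From`.
    let from_c: Box<CStr> = Box::from(&*boxed_c);
    let from_os: Box<OsStr> = Box::from(OsStr::new("hi"));
    let from_p: Box<Path> = Box::from(Path::new("a/b"));

    assert_eq!(from_c, boxed_c);
    assert_eq!(from_os, boxed_os);
    assert_eq!(from_p, boxed_p);
}
```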
Port books to mdbook
Part of https://github.com/rust-lang/rust/issues/39588
blocked on https://github.com/rust-lang/rust/pull/39431
As a first step towards the bookshelf, we ~vendor mdbook in-tree and~ port our books to it. Eventually, both of these books will be moved out-of-tree, but the nightly book will rely on doing the same thing. As such, this intermediate step is useful.
r? @alexcrichton @brson
/cc @azerupi
tidy: exempt URLs from the line length restriction
The length of a URL is usually not under our control, and Markdown
provides no way to split a URL in the middle. Therefore, comment
lines consisting _solely_ of a URL (possibly with a Markdown link
label in front) should be exempt from the line-length restriction.
Inline hyperlink destinations ( `[foo](http://...)` notation ) are
_not_ exempt, because it is my arrogant opinion that long lines of
that type make the source text illegible.
The patch adds dependencies on the `regex` and `lazy_static` crates
to the tidy utility. This _appears_ to Just Work, but if you would
rather not have that dependency I am willing to provide a hand-written
parser instead.
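To make the rule concrete, here is a rough sketch of the exemption written with plain string methods instead of the `regex` crate the patch uses. It is not the tidy code: the function names, the `MAX_COLS` value, and the exact set of comment leaders and URL schemes are assumptions for illustration.

```rust
/// Hypothetical column limit for this sketch.
const MAX_COLS: usize = 100;

/// True if `line` is a comment consisting solely of a URL, optionally
/// preceded by a Markdown link label such as `[foo]:`.
fn is_url_only_comment(line: &str) -> bool {
    let rest = line.trim_start();
    // Strip the comment leader; longer leaders are tried first.
    let rest = match ["///", "//!", "//"].iter().find_map(|p| rest.strip_prefix(*p)) {
        Some(r) => r.trim(),
        None => return false,
    };
    // Strip an optional link label of the form `[label]:`.
    let rest = match rest.strip_prefix('[') {
        Some(after) => match after.split_once("]:") {
            Some((_, url)) => url.trim(),
            None => return false, // e.g. inline `[foo](http://...)` is not exempt
        },
        None => rest,
    };
    let is_url = rest.starts_with("http://") || rest.starts_with("https://");
    // "Solely" a URL: nothing may follow it on the line.
    is_url && !rest.contains(char::is_whitespace)
}

/// True if the line should be reported as too long.
fn line_too_long(line: &str) -> bool {
    line.chars().count() > MAX_COLS && !is_url_only_comment(line)
}

fn main() {
    let tail = "a".repeat(120);
    assert!(!line_too_long(&format!("// https://example.com/{}", tail)));
    assert!(!line_too_long(&format!("/// [foo]: https://example.com/{}", tail)));
    assert!(line_too_long(&format!("// see [foo](https://example.com/{})", tail)));
    assert!(line_too_long(&format!("let x = \"{}\";", tail)));
}
```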
Adding compile fail test for staged_api feature
Issue #39059
r? @est31
@est31 running the tests for this feature fails. Is that expected since this is the `compile-fail` suite?
I copied this test from the run-pass suite: `rust/src/test/run-pass/reachable-unnameable-type-alias.rs`. What are the differences between these suites in operation and why are they used?
travis: Add builders without assertions
This commit adds three new builders, one OSX, one Linux, and one MSVC, which
will produce "nightlies" with LLVM assertions disabled. Currently all nightly
releases have LLVM assertions enabled to catch bugs before they reach the
beta/stable channels. The beta/stable channels, however, do not have LLVM
assertions enabled.
Unfortunately, though, projects like Servo are stuck on nightlies for the near
future at least and are also suffering very long compile times. The purpose of
this commit is to provide artifacts to these projects which are not distributed
through normal channels (e.g. rustup) but are provided for developers to use
locally if need be.
Logistically these builds will all be uploaded to `rustc-builds-alt` instead of
the `rustc-builds` folder of the `rust-lang-ci` bucket. These builds will stay
there forever (until cleaned out if necessary) and there are no plans to
integrate this with rustup and/or the official release process.
Add equivalents of C's <ctype.h> functions to AsciiExt.
* `is_ascii_alphabetic`
* `is_ascii_uppercase`
* `is_ascii_lowercase`
* `is_ascii_alphanumeric`
* `is_ascii_digit`
* `is_ascii_hexdigit`
* `is_ascii_punctuation`
* `is_ascii_graphic`
* `is_ascii_whitespace`
* `is_ascii_control`
This addresses issue #39658.
Lightly tested on x86-64-linux. tidy complains about the URLs in the documentation making lines too long; I don't know what to do about that.
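A quick usage sketch of a few of these predicates. In current std they are inherent methods on `u8` and `char`, which is what this example relies on; at the time of this PR they are reached through the `AsciiExt` trait. The `classify` helper is made up for illustration.

```rust
/// Name the first matching ASCII class of a byte (hypothetical helper).
fn classify(b: u8) -> &'static str {
    if b.is_ascii_digit() {
        "digit"
    } else if b.is_ascii_alphabetic() {
        "alphabetic"
    } else if b.is_ascii_whitespace() {
        "whitespace"
    } else if b.is_ascii_punctuation() {
        "punctuation"
    } else if b.is_ascii_control() {
        "control"
    } else {
        "other"
    }
}

fn main() {
    assert_eq!(classify(b'7'), "digit");
    assert_eq!(classify(b'x'), "alphabetic");
    assert_eq!(classify(b' '), "whitespace");
    assert_eq!(classify(b'!'), "punctuation");
    assert_eq!(classify(0x07), "control");
    // Bytes outside the ASCII range match none of the predicates.
    assert_eq!(classify(0x80), "other");

    // The same predicates exist on `char`.
    assert!('F'.is_ascii_hexdigit());
    assert!(!'g'.is_ascii_hexdigit());
}
```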
Automate vendoring by invoking cargo-vendor when building src dist tarballs.
This keeps #39633 from pushing `src/vendor`, which was checked into git by #37524, past 200,000 lines of code.
I believe the strategy of having rustbuild run `cargo vendor` during the `dist src` step is sound.
However, the only way to be sure `cargo-vendor` exists is to run `cargo install --force cargo-vendor`, which will recompile it every time (not passing `--force` means you can't tell between "already exists" and "build error"). ~~This is quite suboptimal and I'd like to somehow do it in each `Dockerfile` that would need it.~~
* [ ] Cache `CARGO_HOME` (i.e. `~/.cargo`) between CI runs
* `bin/cargo-vendor` and the actual caches are the relevant bits
* [x] Do not build `cargo-vendor` all the time
* ~~Maybe detect `~/.cargo/bin/cargo-vendor` already exists?~~
* ~~Could also try to build it in a `Dockerfile` but do we have `cargo`/`rustc` there?~~
* Final solution: check `cargo install --list` for a line starting with `cargo-vendor `
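The final check can be sketched as a pure function over the text printed by `cargo install --list`. This is an illustration rather than the rustbuild code: the function name `lists_cargo_vendor` and the sample listing (including its version number) are assumptions.

```rust
/// True if the output of `cargo install --list` contains a package line
/// for `cargo-vendor`. The trailing space in the prefix keeps packages
/// whose names merely begin with "cargo-vendor" from matching.
fn lists_cargo_vendor(install_list: &str) -> bool {
    install_list
        .lines()
        .any(|line| line.starts_with("cargo-vendor "))
}

fn main() {
    // Illustrative listing: a package header line followed by its indented binaries.
    let installed = "cargo-vendor v0.1.0:\n    cargo-vendor\n";
    let not_installed = "cargo-vendored v0.1.0:\n    cargo-vendored\n";

    assert!(lists_cargo_vendor(installed));
    assert!(!lists_cargo_vendor(not_installed));
    assert!(!lists_cargo_vendor(""));
}
```

Because the check reads the list instead of attempting an install, a missing tool can be told apart from a build error, which was the problem with dropping `--force`.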
cc @rust-lang/tools