40cd1fdf0a
Cache conscious hashmap table Right now the internal HashMap representation is 3 unziped arrays hhhkkkvvv, I propose to change it to hhhkvkvkv (in further iterations kvkvkvhhh may allow inplace grow). A previous attempt is at #21973. This layout is generally more cache conscious as it makes the value immediately accessible after a key matches. The separated hash arrays is a _no-brainer_ because of how the RH algorithm works and that's unchanged. **Lookups**: Upon a successful match in the hash array the code can check the key and immediately have access to the value in the same or next cache line (effectively saving a L[1,2,3] miss compared to the current layout). **Inserts/Deletes/Resize**: Moving values in the table (robin hooding it) is faster because it touches consecutive cache lines and uses less instructions. Some backing benchmarks (besides the ones bellow) for the benefits of this layout can be seen here as well http://www.reedbeta.com/blog/2015/01/12/data-oriented-hash-table/ The obvious drawbacks is: padding can be wasted between the key and value. Because of that keys(), values() and contains() can consume more cache and be slower. Total wasted padding between items (C being the capacity of the table). * Old layout: C * (K-K padding) + C * (V-V padding) * Proposed: C * (K-V padding) + C * (V-K padding) In practice padding between K-K and V-V *can* be smaller than K-V and V-K. The overhead is capped(ish) at sizeof u64 - 1 so we can actually measure the worst case (u8 at the end of key type and value with aliment of 1, _hardly the average case in practice_). Starting from the worst case the memory overhead is: * `HashMap<u64, u8>` 46% memory overhead. (aka *worst case*) * `HashMap<u64, u16>` 33% memory overhead. * `HashMap<u64, u32>` 20% memory overhead. * `HashMap<T, T>` 0% memory overhead * Worst case based on sizeof K + sizeof V: | x | 16 | 24 | 32 | 64 | 128 | |----------------|--------|--------|--------|-------|-------| | (8+x+7)/(8+x) | 1.29 | 1.22 | 1.18 | 1.1 | 1.05 | I've a test repo here to run benchmarks https://github.com/arthurprs/hashmap2/tree/layout ``` ➜ hashmap2 git:(layout) ✗ cargo benchcmp hhkkvv:: hhkvkv:: bench.txt name hhkkvv:: ns/iter hhkvkv:: ns/iter diff ns/iter diff % grow_10_000 922,064 783,933 -138,131 -14.98% grow_big_value_10_000 1,901,909 1,171,862 -730,047 -38.38% grow_fnv_10_000 443,544 418,674 -24,870 -5.61% insert_100 2,469 2,342 -127 -5.14% insert_1000 23,331 21,536 -1,795 -7.69% insert_100_000 4,748,048 3,764,305 -983,743 -20.72% insert_10_000 321,744 290,126 -31,618 -9.83% insert_int_bigvalue_10_000 749,764 407,547 -342,217 -45.64% insert_str_10_000 337,425 334,009 -3,416 -1.01% insert_string_10_000 788,667 788,262 -405 -0.05% iter_keys_100_000 394,484 374,161 -20,323 -5.15% iter_keys_big_value_100_000 402,071 620,810 218,739 54.40% iter_values_100_000 424,794 373,004 -51,790 -12.19% iterate_100_000 424,297 389,950 -34,347 -8.10% lookup_100_000 189,997 186,554 -3,443 -1.81% lookup_100_000_bigvalue 192,509 189,695 -2,814 -1.46% lookup_10_000 154,251 145,731 -8,520 -5.52% lookup_10_000_bigvalue 162,315 146,527 -15,788 -9.73% lookup_10_000_exist 132,769 128,922 -3,847 -2.90% lookup_10_000_noexist 146,880 144,504 -2,376 -1.62% lookup_1_000_000 137,167 132,260 -4,907 -3.58% lookup_1_000_000_bigvalue 141,130 134,371 -6,759 -4.79% lookup_1_000_000_bigvalue_unif 567,235 481,272 -85,963 -15.15% lookup_1_000_000_unif 589,391 453,576 -135,815 -23.04% merge_shuffle 1,253,357 1,207,387 -45,970 -3.67% merge_simple 40,264,690 37,996,903 -2,267,787 -5.63% new 6 5 -1 -16.67% with_capacity_10e5 3,214 3,256 42 1.31% ``` ``` ➜ hashmap2 git:(layout) ✗ cargo benchcmp hhkkvv:: hhkvkv:: bench.txt name hhkkvv:: ns/iter hhkvkv:: ns/iter diff ns/iter diff % iter_keys_100_000 391,677 382,839 -8,838 -2.26% iter_keys_1_000_000 10,797,360 10,209,898 -587,462 -5.44% iter_keys_big_value_100_000 414,736 662,255 247,519 59.68% iter_keys_big_value_1_000_000 10,147,837 12,067,938 1,920,101 18.92% iter_values_100_000 440,445 377,080 -63,365 -14.39% iter_values_1_000_000 10,931,844 9,979,173 -952,671 -8.71% iterate_100_000 428,644 388,509 -40,135 -9.36% iterate_1_000_000 11,065,419 10,042,427 -1,022,992 -9.24% ``` |
||
---|---|---|
man | ||
mk | ||
src | ||
.gitattributes | ||
.gitignore | ||
.gitmodules | ||
.mailmap | ||
.travis.yml | ||
COMPILER_TESTS.md | ||
configure | ||
CONTRIBUTING.md | ||
COPYRIGHT | ||
LICENSE-APACHE | ||
LICENSE-MIT | ||
Makefile.in | ||
README.md | ||
RELEASES.md |
The Rust Programming Language
This is the main source code repository for Rust. It contains the compiler, standard library, and documentation.
Quick Start
Read "Installing Rust" from The Book.
Building from Source
-
Make sure you have installed the dependencies:
g++
4.7 or later orclang++
3.xpython
2.7 (but not 3.x)- GNU
make
3.81 or later cmake
3.4.3 or latercurl
git
-
Clone the source with
git
:$ git clone https://github.com/rust-lang/rust.git $ cd rust
-
Build and install:
$ ./configure $ make && make install
Note: You may need to use
sudo make install
if you do not normally have permission to modify the destination directory. The install locations can be adjusted by passing a--prefix
argument toconfigure
. Various other options are also supported – pass--help
for more information on them.When complete,
make install
will place several programs into/usr/local/bin
:rustc
, the Rust compiler, andrustdoc
, the API-documentation tool. This install does not include Cargo, Rust's package manager, which you may also want to build.
Building on Windows
There are two prominent ABIs in use on Windows: the native (MSVC) ABI used by Visual Studio, and the GNU ABI used by the GCC toolchain. Which version of Rust you need depends largely on what C/C++ libraries you want to interoperate with: for interop with software produced by Visual Studio use the MSVC build of Rust; for interop with GNU software built using the MinGW/MSYS2 toolchain use the GNU build.
MinGW
MSYS2 can be used to easily build Rust on Windows:
-
Grab the latest MSYS2 installer and go through the installer.
-
Run
mingw32_shell.bat
ormingw64_shell.bat
from wherever you installed MSYS2 (i.e.C:\msys64
), depending on whether you want 32-bit or 64-bit Rust. (As of the latest version of MSYS2 you have to runmsys2_shell.cmd -mingw32
ormsys2_shell.cmd -mingw64
from the command line instead) -
From this terminal, install the required tools:
# Update package mirrors (may be needed if you have a fresh install of MSYS2) $ pacman -Sy pacman-mirrors # Install build tools needed for Rust. If you're building a 32-bit compiler, # then replace "x86_64" below with "i686". If you've already got git, python, # or CMake installed and in PATH you can remove them from this list. Note # that it is important that the `python2` and `cmake` packages **not** used. # The build has historically been known to fail with these packages. $ pacman -S git \ make \ diffutils \ mingw-w64-x86_64-python2 \ mingw-w64-x86_64-cmake \ mingw-w64-x86_64-gcc
-
Navigate to Rust's source code (or clone it), then configure and build it:
$ ./configure $ make && make install
MSVC
MSVC builds of Rust additionally require an installation of Visual Studio 2013
(or later) so rustc
can use its linker. Make sure to check the “C++ tools”
option.
With these dependencies installed, the build takes two steps:
$ ./configure
$ make && make install
MSVC with rustbuild
The old build system, based on makefiles, is currently being rewritten into a Rust-based build system called rustbuild. This can be used to bootstrap the compiler on MSVC without needing to install MSYS or MinGW. All you need are Python 2, CMake, and Git in your PATH (make sure you do not use the ones from MSYS if you have it installed). You'll also need Visual Studio 2013 or newer with the C++ tools. Then all you need to do is to kick off rustbuild.
python .\src\bootstrap\bootstrap.py
Currently rustbuild only works with some known versions of Visual Studio. If you have a more recent version installed that a part of rustbuild doesn't understand then you may need to force rustbuild to use an older version. This can be done by manually calling the appropriate vcvars file before running the bootstrap.
CALL "C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\bin\amd64\vcvars64.bat"
python .\src\bootstrap\bootstrap.py
Building Documentation
If you’d like to build the documentation, it’s almost the same:
$ ./configure
$ make docs
Building the documentation requires building the compiler, so the above details will apply. Once you have the compiler built, you can
$ make docs NO_REBUILD=1
To make sure you don’t re-build the compiler because you made a change to some documentation.
The generated documentation will appear in a top-level doc
directory,
created by the make
rule.
Notes
Since the Rust compiler is written in Rust, it must be built by a precompiled "snapshot" version of itself (made in an earlier state of development). As such, source builds require a connection to the Internet, to fetch snapshots, and an OS that can execute the available snapshot binaries.
Snapshot binaries are currently built and tested on several platforms:
Platform / Architecture | x86 | x86_64 |
---|---|---|
Windows (7, 8, Server 2008 R2) | ✓ | ✓ |
Linux (2.6.18 or later) | ✓ | ✓ |
OSX (10.7 Lion or later) | ✓ | ✓ |
You may find that other platforms work, but these are our officially supported build environments that are most likely to work.
Rust currently needs between 600MiB and 1.5GiB to build, depending on platform. If it hits swap, it will take a very long time to build.
There is more advice about hacking on Rust in CONTRIBUTING.md.
Getting Help
The Rust community congregates in a few places:
- Stack Overflow - Direct questions about using the language.
- users.rust-lang.org - General discussion and broader questions.
- /r/rust - News and general discussion.
Contributing
To contribute to Rust, please see CONTRIBUTING.
Rust has an IRC culture and most real-time collaboration happens in a variety of channels on Mozilla's IRC network, irc.mozilla.org. The most popular channel is #rust, a venue for general discussion about Rust. And a good place to ask for help would be #rust-beginners.
License
Rust is primarily distributed under the terms of both the MIT license and the Apache License (Version 2.0), with portions covered by various BSD-like licenses.
See LICENSE-APACHE, LICENSE-MIT, and COPYRIGHT for details.