bors 40cd1fdf0a Auto merge of #36692 - arthurprs:hashmap-layout, r=alexcrichton
Cache conscious hashmap table

Right now the internal HashMap representation is three unzipped arrays, hhhkkkvvv. I propose to change it to hhhkvkvkv (in further iterations, kvkvkvhhh may allow in-place growth). A previous attempt is at #21973.
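A minimal sketch of the two layouts in simplified safe Rust. The types `SplitTable` and `PairTable` and the `get_at` helper are hypothetical illustrations, not the std implementation, which manages one raw allocation rather than separate `Vec`s. Both probe a single slot in the same order: hash first, then key, then value.

```rust
// Hypothetical, simplified models of the two layouts (not the std implementation).

/// Current layout, "hhhkkkvvv": hashes, keys and values in three parallel arrays.
struct SplitTable<K, V> {
    hashes: Vec<u64>,
    keys: Vec<K>,
    values: Vec<V>,
}

/// Proposed layout, "hhhkvkvkv": hashes stay separate, keys and values are interleaved.
struct PairTable<K, V> {
    hashes: Vec<u64>,
    pairs: Vec<(K, V)>,
}

impl<K: PartialEq, V> SplitTable<K, V> {
    /// Probe slot `idx`: compare the hash, then the key, then return the value.
    fn get_at(&self, idx: usize, hash: u64, key: &K) -> Option<&V> {
        if self.hashes[idx] != hash || self.keys[idx] != *key {
            return None;
        }
        // The value lives in a different array, usually on a different cache line.
        Some(&self.values[idx])
    }
}

impl<K: PartialEq, V> PairTable<K, V> {
    /// Same probe, but the value is stored right after the key.
    fn get_at(&self, idx: usize, hash: u64, key: &K) -> Option<&V> {
        if self.hashes[idx] != hash {
            return None;
        }
        let (k, v) = &self.pairs[idx];
        // Key and value share a cache line (or spill into the next one).
        if k == key { Some(v) } else { None }
    }
}
```

Only the memory touched after the hash match changes. The probe logic is identical, which is why the Robin Hood algorithm itself does not need to change.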

This layout is generally more cache conscious because it makes the value immediately accessible after a key matches. Keeping the hashes in a separate array is a _no-brainer_ because of how the Robin Hood (RH) algorithm works, and that part is unchanged.

**Lookups**: Upon a successful match in the hash array, the code can check the key and immediately access the value in the same or the next cache line (effectively saving an L1/L2/L3 miss compared to the current layout).
**Inserts/Deletes/Resize**: Moving values in the table (Robin Hooding them) is faster because it touches consecutive cache lines and uses fewer instructions.
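The insert and lookup paths can be sketched as a minimal Robin Hood table over the hhhkvkvkv layout. This sketch is hypothetical and deliberately simplified. It has a fixed power-of-two capacity and no resize or removal. It uses `Option<(K, V)>` slots instead of uninitialized memory. A stored hash of 0 means "empty", and real hashes get their top bit set so they are never 0. It shows that a displaced entry moves as one `(K, V)` pair and that a successful lookup finds the value next to the key.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical Robin Hood table with the "hhhkvkvkv" layout: a hash array plus
/// an array of (K, V) pairs. Fixed power-of-two capacity; no resizing or removal.
struct RhTable<K, V> {
    hashes: Vec<u64>,           // 0 marks an empty slot
    pairs: Vec<Option<(K, V)>>, // key and value stored together
    mask: usize,
    len: usize,
}

impl<K: Hash + Eq, V> RhTable<K, V> {
    fn with_capacity_pow2(cap: usize) -> Self {
        assert!(cap.is_power_of_two());
        RhTable {
            hashes: vec![0; cap],
            pairs: (0..cap).map(|_| None).collect(),
            mask: cap - 1,
            len: 0,
        }
    }

    fn hash_of(key: &K) -> u64 {
        let mut h = DefaultHasher::new();
        key.hash(&mut h);
        h.finish() | (1u64 << 63) // top bit set: a real hash is never 0
    }

    /// How far the entry with `hash`, stored at `idx`, is from its ideal slot.
    fn displacement(&self, hash: u64, idx: usize) -> usize {
        idx.wrapping_sub(hash as usize & self.mask) & self.mask
    }

    fn insert(&mut self, key: K, value: V) {
        // Keep at least one empty slot so every probe sequence terminates.
        assert!(self.len + 1 < self.hashes.len(), "sketch does not resize");
        let mut hash = Self::hash_of(&key);
        let mut pair = (key, value);
        let mut idx = hash as usize & self.mask;
        let mut dist = 0;
        loop {
            let h = self.hashes[idx];
            if h == 0 {
                self.hashes[idx] = hash;
                self.pairs[idx] = Some(pair);
                self.len += 1;
                return;
            }
            if h == hash && self.pairs[idx].as_ref().unwrap().0 == pair.0 {
                self.pairs[idx].as_mut().unwrap().1 = pair.1; // overwrite existing value
                return;
            }
            let their_dist = self.displacement(h, idx);
            if their_dist < dist {
                // Robin Hood: take the slot from the "richer" entry and carry that one on.
                // Key and value move together as one contiguous pair.
                std::mem::swap(&mut hash, &mut self.hashes[idx]);
                std::mem::swap(&mut pair, self.pairs[idx].as_mut().unwrap());
                dist = their_dist;
            }
            idx = (idx + 1) & self.mask;
            dist += 1;
        }
    }

    fn get(&self, key: &K) -> Option<&V> {
        let hash = Self::hash_of(key);
        let mut idx = hash as usize & self.mask;
        let mut dist = 0;
        loop {
            let h = self.hashes[idx];
            // Empty slot, or an entry closer to home than we are: the key is absent.
            if h == 0 || self.displacement(h, idx) < dist {
                return None;
            }
            if h == hash {
                let (k, v) = self.pairs[idx].as_ref().unwrap();
                if k == key {
                    return Some(v); // the value sits right next to the key
                }
            }
            idx = (idx + 1) & self.mask;
            dist += 1;
        }
    }
}
```

The separate `hashes` array is what the probe loop scans. `pairs` is only touched on a hash match or when an entry is moved.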

Some additional benchmarks (besides the ones below) showing the benefits of this layout can be found here as well: http://www.reedbeta.com/blog/2015/01/12/data-oriented-hash-table/

The obvious drawback is that padding can be wasted between the key and the value. Because of that, keys(), values() and contains() can consume more cache and be slower.

Total wasted padding between items (C being the capacity of the table).
* Old layout: C * (K-K padding) + C * (V-V padding)
* Proposed: C * (K-V padding) + C * (V-K padding)
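In Rust terms, `size_of` already includes a type's trailing padding, so arrays of `K` or `V` need nothing extra between items. The additional cost of the interleaved layout is therefore whatever padding the pair `(K, V)` needs. A small sketch to measure it: the helper names are hypothetical, and because Rust's tuple layout is unspecified, this measures what the compiler actually chooses.

```rust
use std::mem::size_of;

/// Padding bytes added per slot by storing `(K, V)` pairs instead of separate
/// `[K]` and `[V]` arrays. Array elements are already padded by `size_of`,
/// so any difference comes from the pair's own layout.
fn pair_padding_per_slot<K, V>() -> usize {
    size_of::<(K, V)>() - (size_of::<K>() + size_of::<V>())
}

/// Total extra bytes for a table with `capacity` slots (the "C" above).
fn extra_bytes<K, V>(capacity: usize) -> usize {
    capacity * pair_padding_per_slot::<K, V>()
}
```

For example, on a 64-bit target `(u64, u8)` occupies 16 bytes, so `pair_padding_per_slot::<u64, u8>()` is 7 bytes per slot, while `HashMap<T, T>`-style pairs such as `(u64, u64)` need none.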

In practice, the padding between K-K and V-V *can* be smaller than between K-V and V-K. The overhead is capped (roughly) at sizeof u64 - 1 bytes per item, so we can actually measure the worst case (a u8 at the end of the key type and a value with an alignment of 1, _hardly the average case in practice_).

Starting from the worst case, the memory overhead is:
* `HashMap<u64, u8>` 41% memory overhead. (aka *worst case*)
* `HashMap<u64, u16>` 33% memory overhead.
* `HashMap<u64, u32>` 20% memory overhead.
* `HashMap<T, T>` 0% memory overhead
* Worst case as a function of x = sizeof K + sizeof V (8 being the per-item hash and 7 the maximum padding):

| x (bytes)      |  16    |  24    |  32    |  64   |  128  |
|----------------|--------|--------|--------|-------|-------|
| (8+x+7)/(8+x)  |  1.29  |  1.22  |  1.18  |  1.1  |  1.05 |
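A couple of helpers can recompute the worst-case ratios in the table and the per-type overheads listed above. This sketch assumes an 8-byte stored hash per item (the `8` in the formula). It also assumes the separate-array layout adds no padding beyond `size_of`. The function names are hypothetical.

```rust
use std::mem::size_of;

/// Per-item hash size, assumed to be a u64 (the "8" in the table's formula).
const HASH_BYTES: usize = 8;

/// Worst-case ratio from the table: x = sizeof K + sizeof V, plus at most
/// 7 bytes of padding when K and V are stored together.
fn worst_case_ratio(x: usize) -> f64 {
    (HASH_BYTES + x + 7) as f64 / (HASH_BYTES + x) as f64
}

/// Ratio for concrete types: bytes per item with (K, V) pairs versus separate arrays.
fn layout_ratio<K, V>() -> f64 {
    let split = HASH_BYTES + size_of::<K>() + size_of::<V>();
    let paired = HASH_BYTES + size_of::<(K, V)>();
    paired as f64 / split as f64
}
```

Evaluating `worst_case_ratio` at x = 16, 24, 32, 64 and 128 gives the table's values (rounded to two decimals). `layout_ratio::<u64, u64>()` is exactly 1.0, matching the `HashMap<T, T>` case.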

I have a test repo for running the benchmarks: https://github.com/arthurprs/hashmap2/tree/layout

```
 ➜  hashmap2 git:(layout) ✗ cargo benchcmp hhkkvv:: hhkvkv:: bench.txt
 name                            hhkkvv:: ns/iter  hhkvkv:: ns/iter  diff ns/iter   diff %
 grow_10_000                     922,064           783,933               -138,131  -14.98%
 grow_big_value_10_000           1,901,909         1,171,862             -730,047  -38.38%
 grow_fnv_10_000                 443,544           418,674                -24,870   -5.61%
 insert_100                      2,469             2,342                     -127   -5.14%
 insert_1000                     23,331            21,536                  -1,795   -7.69%
 insert_100_000                  4,748,048         3,764,305             -983,743  -20.72%
 insert_10_000                   321,744           290,126                -31,618   -9.83%
 insert_int_bigvalue_10_000      749,764           407,547               -342,217  -45.64%
 insert_str_10_000               337,425           334,009                 -3,416   -1.01%
 insert_string_10_000            788,667           788,262                   -405   -0.05%
 iter_keys_100_000               394,484           374,161                -20,323   -5.15%
 iter_keys_big_value_100_000     402,071           620,810                218,739   54.40%
 iter_values_100_000             424,794           373,004                -51,790  -12.19%
 iterate_100_000                 424,297           389,950                -34,347   -8.10%
 lookup_100_000                  189,997           186,554                 -3,443   -1.81%
 lookup_100_000_bigvalue         192,509           189,695                 -2,814   -1.46%
 lookup_10_000                   154,251           145,731                 -8,520   -5.52%
 lookup_10_000_bigvalue          162,315           146,527                -15,788   -9.73%
 lookup_10_000_exist             132,769           128,922                 -3,847   -2.90%
 lookup_10_000_noexist           146,880           144,504                 -2,376   -1.62%
 lookup_1_000_000                137,167           132,260                 -4,907   -3.58%
 lookup_1_000_000_bigvalue       141,130           134,371                 -6,759   -4.79%
 lookup_1_000_000_bigvalue_unif  567,235           481,272                -85,963  -15.15%
 lookup_1_000_000_unif           589,391           453,576               -135,815  -23.04%
 merge_shuffle                   1,253,357         1,207,387              -45,970   -3.67%
 merge_simple                    40,264,690        37,996,903          -2,267,787   -5.63%
 new                             6                 5                           -1  -16.67%
 with_capacity_10e5              3,214             3,256                       42    1.31%
```

```
➜  hashmap2 git:(layout) ✗ cargo benchcmp hhkkvv:: hhkvkv:: bench.txt
 name                           hhkkvv:: ns/iter  hhkvkv:: ns/iter  diff ns/iter   diff %
 iter_keys_100_000              391,677           382,839                 -8,838   -2.26%
 iter_keys_1_000_000            10,797,360        10,209,898            -587,462   -5.44%
 iter_keys_big_value_100_000    414,736           662,255                247,519   59.68%
 iter_keys_big_value_1_000_000  10,147,837        12,067,938           1,920,101   18.92%
 iter_values_100_000            440,445           377,080                -63,365  -14.39%
 iter_values_1_000_000          10,931,844        9,979,173             -952,671   -8.71%
 iterate_100_000                428,644           388,509                -40,135   -9.36%
 iterate_1_000_000              11,065,419        10,042,427          -1,022,992   -9.24%
```
2016-10-14 02:23:19 -07:00

The Rust Programming Language

This is the main source code repository for Rust. It contains the compiler, standard library, and documentation.

Quick Start

Read "Installing Rust" from The Book.

Building from Source

  1. Make sure you have installed the dependencies:

    • g++ 4.7 or later or clang++ 3.x
    • python 2.7 (but not 3.x)
    • GNU make 3.81 or later
    • cmake 3.4.3 or later
    • curl
    • git
  2. Clone the source with git:

    $ git clone https://github.com/rust-lang/rust.git
    $ cd rust
    
  3. Build and install:

    $ ./configure
    $ make && make install
    

    Note: You may need to use sudo make install if you do not normally have permission to modify the destination directory. The install locations can be adjusted by passing a --prefix argument to configure. Various other options are also supported; pass --help for more information on them.

    When complete, make install will place several programs into /usr/local/bin: rustc, the Rust compiler, and rustdoc, the API-documentation tool. This install does not include Cargo, Rust's package manager, which you may also want to build.

Building on Windows

There are two prominent ABIs in use on Windows: the native (MSVC) ABI used by Visual Studio, and the GNU ABI used by the GCC toolchain. Which version of Rust you need depends largely on what C/C++ libraries you want to interoperate with: for interop with software produced by Visual Studio use the MSVC build of Rust; for interop with GNU software built using the MinGW/MSYS2 toolchain use the GNU build.

MinGW

MSYS2 can be used to easily build Rust on Windows:

  1. Grab the latest MSYS2 installer and go through the installer.

  2. Run mingw32_shell.bat or mingw64_shell.bat from wherever you installed MSYS2 (e.g. C:\msys64), depending on whether you want 32-bit or 64-bit Rust. (As of the latest version of MSYS2, you have to run msys2_shell.cmd -mingw32 or msys2_shell.cmd -mingw64 from the command line instead.)

  3. From this terminal, install the required tools:

    # Update package mirrors (may be needed if you have a fresh install of MSYS2)
    $ pacman -Sy pacman-mirrors
    
    # Install build tools needed for Rust. If you're building a 32-bit compiler,
    # then replace "x86_64" below with "i686". If you've already got git, python,
    # or CMake installed and in PATH you can remove them from this list. Note
    # that it is important that the plain `python2` and `cmake` packages are **not** used.
    # The build has historically been known to fail with these packages.
    $ pacman -S git \
                make \
                diffutils \
                mingw-w64-x86_64-python2 \
                mingw-w64-x86_64-cmake \
                mingw-w64-x86_64-gcc
    
  4. Navigate to Rust's source code (or clone it), then configure and build it:

    $ ./configure
    $ make && make install
    

MSVC

MSVC builds of Rust additionally require an installation of Visual Studio 2013 (or later) so rustc can use its linker. Make sure to check the “C++ tools” option.

With these dependencies installed, the build takes two steps:

$ ./configure
$ make && make install

MSVC with rustbuild

The old build system, based on makefiles, is currently being rewritten into a Rust-based build system called rustbuild. This can be used to bootstrap the compiler on MSVC without needing to install MSYS or MinGW. All you need are Python 2, CMake, and Git in your PATH (make sure you do not use the ones from MSYS if you have it installed). You'll also need Visual Studio 2013 or newer with the C++ tools. Then just kick off rustbuild:

python .\src\bootstrap\bootstrap.py

Currently rustbuild only works with some known versions of Visual Studio. If you have a more recent version installed that a part of rustbuild doesn't understand, then you may need to force rustbuild to use an older version. This can be done by manually calling the appropriate vcvars file before running the bootstrap.

CALL "C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\bin\amd64\vcvars64.bat"
python .\src\bootstrap\bootstrap.py

Building Documentation

If you'd like to build the documentation, it's almost the same:

$ ./configure
$ make docs

Building the documentation requires building the compiler, so the above details will apply. Once you have the compiler built, you can run

$ make docs NO_REBUILD=1

to make sure you don't rebuild the compiler just because you made a change to some documentation.

The generated documentation will appear in a top-level doc directory, created by the make rule.

Notes

Since the Rust compiler is written in Rust, it must be built by a precompiled "snapshot" version of itself (made in an earlier state of development). As such, source builds require a connection to the Internet, to fetch snapshots, and an OS that can execute the available snapshot binaries.

Snapshot binaries are currently built and tested on several platforms:

| Platform / Architecture        | x86 | x86_64 |
|--------------------------------|-----|--------|
| Windows (7, 8, Server 2008 R2) | ✓   | ✓      |
| Linux (2.6.18 or later)        | ✓   | ✓      |
| OSX (10.7 Lion or later)       | ✓   | ✓      |

You may find that other platforms work, but these are our officially supported build environments that are most likely to work.

Rust currently needs between 600MiB and 1.5GiB to build, depending on platform. If it hits swap, it will take a very long time to build.

There is more advice about hacking on Rust in CONTRIBUTING.md.

Getting Help

The Rust community congregates in a few places:

Contributing

To contribute to Rust, please see CONTRIBUTING.

Rust has an IRC culture and most real-time collaboration happens in a variety of channels on Mozilla's IRC network, irc.mozilla.org. The most popular channel is #rust, a venue for general discussion about Rust. And a good place to ask for help would be #rust-beginners.

License

Rust is primarily distributed under the terms of both the MIT license and the Apache License (Version 2.0), with portions covered by various BSD-like licenses.

See LICENSE-APACHE, LICENSE-MIT, and COPYRIGHT for details.