83dfe7b27c
The essence of lexer cc @eddyb I would love to make a reusable library to lex rust code, which could be used by rustc, rust-analyzer, proc-macros, etc. This **draft** PR is my attempt at the API. Currently, the PR uses new lexer to lex comments and shebang, while using the old lexer for everything else. This should be enough to agree on the API though! ### High-level picture An `rust_lexer` crate is introduced, with zero or minimal (for XID_Start and other unicode) dependencies. This crate basically exposes a single function: `next_token(&str) -> (TokenKind, usize)` which returns the first token of a non-empty string (`usize` is the length of the token). The main goal of the API is to be minimal. Non-strictly essential concerns, like string interning, are left to the clients. ### Finer Points #### Iterator API We probably should expose a convenience function `fn tokenize(&str) -> impl Iterator<Item = Token>` EDIT: I've added `tokenize` #### Error handling The lexer itself provides only minimal amount of error detection and reporting. Additionally, it never fatal-errors and always produces some non-empty token. Examples of errors detected by the lexer: * unterminated block comment * unterminated string literals Example of errors **not** detected by the lexer: * invalid escape sequence in a string literal * out of range integer literal * bare `\r` in the doc comment. The idea is that the clients are responsible for additional validation of tokens. This is the mode IDE operates in: you want to skip validation for library files, because you are not showing errors there anyway, and for user-code, you want to do a deep validation with quick fixes and suggestions, which is not really fit for the lexer itself. In particular, in this PR unclosed `/*` comment is handled by the new lexer, bare `\r` and distinction between doc and non-doc comments is handled by the old lexer. #### Performance No attempt at performance measurement is made so far :) I think it is acceptable to regress perf here a bit in exchange for cleaner code, and I hope that regression wouldn't be too costly. In particular, because we validate tokens separately, we'll have to do one more pass for some of the tokens. I hope this is not a prohibitive cost. For example, for doc comments we already do two passes (lexing + interning), so adding a third one shouldn't be that much slower (and we also do an additional pass for utf-8 validation). And lexing is hopefully not a bottleneck. Note that for IDEs separate validation might actually improve performance, because we will be able to skip validation when, for example, computing completions. Long term, I hope that this approach will allow for *better* performance. If we separate pure lexing, in the future we can code-gen super-optimizes state machine that walks utf-8 directly, instead of current manual char-by-char toil. #### Cursor API For implementation, I am going slightly unconventionally. Instead of defining a `Lexer` struct with a bunch of helper methods (`current`, `bump`) and a bunch of lexing methods (`lex_comment`, `lex_whitespace`), I define a `Cursor` struct which has only helpers, and define a top-level function with a `&mut Cursor` argument for each grammar production. I find this C-style more readable for parsers and lexers. EDIT: swithced to a more conventional setup with lexing methods So, what do folks think about this? |
||
---|---|---|
.azure-pipelines | ||
src | ||
.gitattributes | ||
.gitignore | ||
.gitmodules | ||
.mailmap | ||
Cargo.lock | ||
Cargo.toml | ||
CODE_OF_CONDUCT.md | ||
config.toml.example | ||
configure | ||
CONTRIBUTING.md | ||
COPYRIGHT | ||
LICENSE-APACHE | ||
LICENSE-MIT | ||
README.md | ||
RELEASES.md | ||
rustfmt.toml | ||
triagebot.toml | ||
x.py |
The Rust Programming Language
This is the main source code repository for Rust. It contains the compiler, standard library, and documentation.
Quick Start
Read "Installation" from The Book.
Installing from Source
Note: If you wish to contribute to the compiler, you should read this chapter of the rustc-guide instead of this section.
The Rust build system has a Python script called x.py
to bootstrap building
the compiler. More information about it may be found by running ./x.py --help
or reading the rustc guide.
Building on *nix
-
Make sure you have installed the dependencies:
g++
4.7 or later orclang++
3.x or laterpython
2.7 (but not 3.x)- GNU
make
3.81 or later cmake
3.4.3 or latercurl
git
-
Clone the source with
git
:$ git clone https://github.com/rust-lang/rust.git $ cd rust
-
Configure the build settings:
The Rust build system uses a file named
config.toml
in the root of the source tree to determine various configuration settings for the build. Copy the defaultconfig.toml.example
toconfig.toml
to get started.$ cp config.toml.example config.toml
It is recommended that if you plan to use the Rust build system to create an installation (using
./x.py install
) that you set theprefix
value in the[install]
section to a directory that you have write permissions. -
Build and install:
$ ./x.py build && ./x.py install
When complete,
./x.py install
will place several programs into$PREFIX/bin
:rustc
, the Rust compiler, andrustdoc
, the API-documentation tool. This install does not include Cargo, Rust's package manager. To build and install Cargo, you may run./x.py install cargo
or set thebuild.extended
key inconfig.toml
totrue
to build and install all tools.
Building on Windows
There are two prominent ABIs in use on Windows: the native (MSVC) ABI used by Visual Studio, and the GNU ABI used by the GCC toolchain. Which version of Rust you need depends largely on what C/C++ libraries you want to interoperate with: for interop with software produced by Visual Studio use the MSVC build of Rust; for interop with GNU software built using the MinGW/MSYS2 toolchain use the GNU build.
MinGW
MSYS2 can be used to easily build Rust on Windows:
-
Grab the latest MSYS2 installer and go through the installer.
-
Run
mingw32_shell.bat
ormingw64_shell.bat
from wherever you installed MSYS2 (i.e.C:\msys64
), depending on whether you want 32-bit or 64-bit Rust. (As of the latest version of MSYS2 you have to runmsys2_shell.cmd -mingw32
ormsys2_shell.cmd -mingw64
from the command line instead) -
From this terminal, install the required tools:
# Update package mirrors (may be needed if you have a fresh install of MSYS2) $ pacman -Sy pacman-mirrors # Install build tools needed for Rust. If you're building a 32-bit compiler, # then replace "x86_64" below with "i686". If you've already got git, python, # or CMake installed and in PATH you can remove them from this list. Note # that it is important that you do **not** use the 'python2' and 'cmake' # packages from the 'msys2' subsystem. The build has historically been known # to fail with these packages. $ pacman -S git \ make \ diffutils \ tar \ mingw-w64-x86_64-python2 \ mingw-w64-x86_64-cmake \ mingw-w64-x86_64-gcc
-
Navigate to Rust's source code (or clone it), then build it:
$ ./x.py build && ./x.py install
MSVC
MSVC builds of Rust additionally require an installation of Visual Studio 2017
(or later) so rustc
can use its linker. The simplest way is to get the
Visual Studio, check the “C++ build tools” and “Windows 10 SDK” workload.
(If you're installing cmake yourself, be careful that “C++ CMake tools for Windows” doesn't get included under “Individual components”.)
With these dependencies installed, you can build the compiler in a cmd.exe
shell with:
> python x.py build
Currently, building Rust only works with some known versions of Visual Studio. If you have a more recent version installed the build system doesn't understand then you may need to force rustbuild to use an older version. This can be done by manually calling the appropriate vcvars file before running the bootstrap.
> CALL "C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\VC\Auxiliary\Build\vcvars64.bat"
> python x.py build
Specifying an ABI
Each specific ABI can also be used from either environment (for example, using the GNU ABI in PowerShell) by using an explicit build triple. The available Windows build triples are:
- GNU ABI (using GCC)
i686-pc-windows-gnu
x86_64-pc-windows-gnu
- The MSVC ABI
i686-pc-windows-msvc
x86_64-pc-windows-msvc
The build triple can be specified by either specifying --build=<triple>
when
invoking x.py
commands, or by copying the config.toml
file (as described
in Installing From Source), and modifying the
build
option under the [build]
section.
Configure and Make
While it's not the recommended build system, this project also provides a
configure script and makefile (the latter of which just invokes x.py
).
$ ./configure
$ make && sudo make install
When using the configure script, the generated config.mk
file may override the
config.toml
file. To go back to the config.toml
file, delete the generated
config.mk
file.
Building Documentation
If you’d like to build the documentation, it’s almost the same:
$ ./x.py doc
The generated documentation will appear under doc
in the build
directory for
the ABI used. I.e., if the ABI was x86_64-pc-windows-msvc
, the directory will be
build\x86_64-pc-windows-msvc\doc
.
Notes
Since the Rust compiler is written in Rust, it must be built by a precompiled "snapshot" version of itself (made in an earlier stage of development). As such, source builds require a connection to the Internet, to fetch snapshots, and an OS that can execute the available snapshot binaries.
Snapshot binaries are currently built and tested on several platforms:
Platform / Architecture | x86 | x86_64 |
---|---|---|
Windows (7, 8, 10, ...) | ✓ | ✓ |
Linux (2.6.18 or later) | ✓ | ✓ |
OSX (10.7 Lion or later) | ✓ | ✓ |
You may find that other platforms work, but these are our officially supported build environments that are most likely to work.
There is more advice about hacking on Rust in CONTRIBUTING.md.
Getting Help
The Rust community congregates in a few places:
- Stack Overflow - Direct questions about using the language.
- users.rust-lang.org - General discussion and broader questions.
- /r/rust - News and general discussion.
Contributing
To contribute to Rust, please see CONTRIBUTING.
Rust has an IRC culture and most real-time collaboration happens in a variety of channels on Mozilla's IRC network, irc.mozilla.org. The most popular channel is #rust, a venue for general discussion about Rust. And a good place to ask for help would be #rust-beginners.
The rustc guide might be a good place to start if you want to find out how various parts of the compiler work.
Also, you may find the rustdocs for the compiler itself useful.
License
Rust is primarily distributed under the terms of both the MIT license and the Apache License (Version 2.0), with portions covered by various BSD-like licenses.
See LICENSE-APACHE, LICENSE-MIT, and COPYRIGHT for details.
Trademark
The Rust programming language is an open source, community project governed by a core team. It is also sponsored by the Mozilla Foundation (“Mozilla”), which owns and protects the Rust and Cargo trademarks and logos (the “Rust Trademarks”).
If you want to use these names or brands, please read the media guide.
Third-party logos may be subject to third-party copyrights and trademarks. See Licenses for details.