parent
a81ce5f991
commit
290da6f016
585
src/doc/intro.md
@ -1,586 +1,5 @@

% A 30-minute Introduction to Rust

Rust is a modern systems programming language focusing on safety and speed. It
accomplishes these goals by being memory safe without using garbage collection.

This introduction is now deprecated. Please see [the introduction to the book][intro].

This introduction will give you a rough idea of what Rust is like, eliding many
details. It does not require prior experience with systems programming, but you
may find the syntax easier if you've used a "curly brace" programming language
before, like C or JavaScript. The concepts are more important than the syntax,
so don't worry if you don't get every last detail: you can read [The
Rust Programming Language](book/index.html) to get a more complete explanation.

Because this is about high-level concepts, you don't need to actually install
Rust to follow along. If you'd like to anyway, check out [the
homepage](http://rust-lang.org) for installation instructions.

To show off Rust, let's talk about how easy it is to get started with Rust.
Then, we'll talk about Rust's most interesting feature, *ownership*, and
then discuss how it makes concurrency easier to reason about. Finally,
we'll talk about how Rust breaks down the perceived dichotomy between speed
and safety.

# Tools

Getting started on a new Rust project is incredibly easy, thanks to Rust's
package manager, [Cargo](https://crates.io/).

To start a new project with Cargo, use `cargo new`:

```{bash}
$ cargo new hello_world --bin
```

We're passing `--bin` because we're making a binary program: if we
were making a library, we'd leave it off.

Let's check out what Cargo has generated for us:

```{bash}
$ cd hello_world
$ tree .
.
├── Cargo.toml
└── src
    └── main.rs

1 directory, 2 files
```

This is all we need to get started. First, let's check out `Cargo.toml`:

```{toml}
[package]

name = "hello_world"
version = "0.0.1"
authors = ["Your Name <you@example.com>"]
```

This is called a *manifest*, and it contains all of the metadata that Cargo
needs to compile your project.

Here's what's in `src/main.rs`:

```{rust}
fn main() {
    println!("Hello, world!");
}
```

Cargo generated a "Hello World" for us. We'll talk more about the syntax here
later, but that's what Rust code looks like! Let's compile and run it:

```{bash}
$ cargo run
   Compiling hello_world v0.0.1 (file:///Users/you/src/hello_world)
     Running `target/hello_world`
Hello, world!
```

Using an external dependency in Rust is incredibly easy. You add a line to
your `Cargo.toml`:

```{toml}
[package]

name = "hello_world"
version = "0.0.1"
authors = ["Your Name <you@example.com>"]

[dependencies.semver]

git = "https://github.com/rust-lang/semver.git"
```

You added the `semver` library, which parses version numbers and compares them
according to the [SemVer specification](http://semver.org/).

Now, you can pull in that library using `extern crate` in
`main.rs`.

```{rust,ignore}
extern crate semver;

use semver::Version;

fn main() {
    assert!(Version::parse("1.2.3") == Ok(Version {
        major: 1u64,
        minor: 2u64,
        patch: 3u64,
        pre: vec!(),
        build: vec!(),
    }));

    println!("Versions compared successfully!");
}
```

Again, we'll discuss the exact details of all of this syntax soon. For now,
let's compile and run it:

```{bash}
$ cargo run
    Updating git repository `https://github.com/rust-lang/semver.git`
   Compiling semver v0.0.1 (https://github.com/rust-lang/semver.git#bf739419)
   Compiling hello_world v0.0.1 (file:///home/you/projects/hello_world)
     Running `target/hello_world`
Versions compared successfully!
```

Because we only specified a repository without a version, if someone else were
to try out our project at a later date, when `semver` was updated, they would
get a different, possibly incompatible version. To solve this problem, Cargo
produces a file, `Cargo.lock`, which records the versions of any dependencies.
This gives us repeatable builds.

There is a lot more here, and this is a whirlwind tour, but you should feel
right at home if you've used tools like [Bundler](http://bundler.io/),
[npm](https://www.npmjs.org/), or [pip](https://pip.pypa.io/en/latest/).
There are no `Makefile`s or endless `autotools` output here. (Rust's tooling does
[play nice with external libraries written in those
tools](http://doc.crates.io/build-script.html), if you need to.)

Enough about tools; let's talk code!

# Ownership

Rust's defining feature is "memory safety without garbage collection". Let's
take a moment to talk about what that means. *Memory safety* means that the
programming language eliminates certain kinds of bugs, such as [buffer
overflows](https://en.wikipedia.org/wiki/Buffer_overflow) and [dangling
pointers](https://en.wikipedia.org/wiki/Dangling_pointer). These problems occur
when you have unrestricted access to memory. As an example, here's some Ruby
code:

```{ruby}
v = []

v.push("Hello")

x = v[0]

v.push("world")

puts x
```

We make an array, `v`, and then call `push` on it. `push` is a method which
adds an element to the end of an array.

Next, we make a new variable, `x`, that's equal to the first element of
the array. Simple, but this is where the "bug" will appear.

Let's keep going. We then call `push` again, pushing "world" onto the
end of the array. `v` now is `["Hello", "world"]`.

Finally, we print `x` with the `puts` method. This prints "Hello."

All good? Let's go over a similar, but subtly different example, in C++:

```{cpp}
#include <iostream>
#include <vector>
#include <string>

int main() {
    std::vector<std::string> v;

    v.push_back("Hello");

    std::string& x = v[0];

    v.push_back("world");

    std::cout << x;
}
```

It's a little more verbose due to the static typing, but it's almost the same
thing. We make a `std::vector` of `std::string`s, we call `push_back` (same as
`push`) on it, take a reference to the first element of the vector, call
`push_back` again, and then print out the reference.

There are two big differences here: one, they're not _exactly_ the same thing,
and two...

```{bash}
$ g++ hello.cpp -Wall -Werror
$ ./a.out
Segmentation fault (core dumped)
```

A crash! (Note that this is actually system-dependent. Because using an
invalidated reference is undefined behavior, the compiler can do anything,
including the right thing!) Even though we compiled with flags to give us as
many warnings as possible, and to treat those warnings as errors, we got no
errors. When we ran the program, it crashed.

Why does this happen? When we append to an array, its length changes. Since
its length changes, we may need to allocate more memory. In Ruby, this happens
as well; we just don't think about it very often. So why does the C++ version
segfault when we allocate more memory?

The answer is that in the C++ version, `x` is a *reference* to the memory
location where the first element of the array is stored. But in Ruby, `x` is a
standalone value, not connected to the underlying array at all. Let's dig into
the details for a moment. Your program has access to memory, provided to it by
the operating system. Each location in memory has an address. So when we make
our vector, `v`, it's stored in a memory location somewhere:

| location | name | value |
|----------|------|-------|
| 0x30     | v    |       |

(Address numbers made up, and in hexadecimal. Those of you with deep C++
knowledge, there are some simplifications going on here, like the lack of an
allocated length for the vector. This is an introduction.)

When we push our first string onto the array, we allocate some memory,
and `v` refers to it:

| location | name | value    |
|----------|------|----------|
| 0x30     | v    | 0x18     |
| 0x18     |      | "Hello"  |

We then make a reference to that first element. A reference is a variable
that points to a memory location, so its value is the memory location of
the `"Hello"` string:

| location | name | value    |
|----------|------|----------|
| 0x30     | v    | 0x18     |
| 0x18     |      | "Hello"  |
| 0x14     | x    | 0x18     |

When we push `"world"` onto the vector with `push_back`, there's no room:
we only allocated one element. So, we need to allocate room for two elements,
copy the `"Hello"` string over, and update `v` to point at the new memory.
Like this:

| location | name | value    |
|----------|------|----------|
| 0x30     | v    | 0x08     |
| 0x18     |      | GARBAGE  |
| 0x14     | x    | 0x18     |
| 0x08     |      | "Hello"  |
| 0x04     |      | "world"  |

Note that `v` now refers to the new list, which has two elements. It's all
good. But our `x` didn't get updated! It still points at the old location,
which isn't valid anymore. In fact, [the documentation for `push_back` mentions
this](http://en.cppreference.com/w/cpp/container/vector/push_back):

> If the new `size()` is greater than `capacity()` then all iterators and
> references (including the past-the-end iterator) are invalidated.

Finding where these iterators and references are is a difficult problem, and
even in this simple case, `g++` can't help us here. While the bug is obvious in
this case, in real code, it can be difficult to track down the source of the
error.
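
The same growth step exists in Rust's `Vec`, which is why the compiler cares about references into it. The sketch below is only an illustration: the helper name `grow_past_capacity` is invented for this example, and whether the buffer actually moves to a new address depends on the allocator, so the code reports capacities rather than addresses.

```rust
/// Pushes two strings into a vector created with room for one and returns
/// the capacity before and after the second push.
/// `grow_past_capacity` is a hypothetical helper used only for illustration.
fn grow_past_capacity() -> (usize, usize) {
    // Ask for room for (at least) one element.
    let mut v: Vec<String> = Vec::with_capacity(1);
    v.push(String::from("Hello"));
    let before = v.capacity();

    // If the vector is full, this push has to obtain a larger buffer and
    // move "Hello" into it. A reference into the old buffer would dangle,
    // which is exactly the situation in the tables above.
    v.push(String::from("world"));
    let after = v.capacity();

    (before, after)
}

fn main() {
    let (before, after) = grow_past_capacity();
    println!("capacity before: {}, after: {}", before, after);
}
```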

Before we talk about the solution, why didn't our Ruby code have this problem?
The semantics are a little more complicated, and explaining Ruby's internals is
out of the scope of a guide to Rust. But in a nutshell, Ruby's garbage
collector keeps track of references, and makes sure that everything works as
you might expect. This comes at an efficiency cost, and the internals are more
complex. If you'd really like to dig into the details, [this
article](http://patshaughnessy.net/2012/1/18/seeing-double-how-ruby-shares-string-values)
can give you more information.

Garbage collection is a valid approach to memory safety, but Rust chooses a
different path. Let's examine what the Rust version of this looks like:

```{rust,ignore}
fn main() {
    let mut v = vec![];

    v.push("Hello");

    let x = &v[0];

    v.push("world");

    println!("{}", x);
}
```

This looks like a bit of both: fewer type annotations, but we do create new
variables with `let`. The method name is `push`, some other stuff is different,
but it's pretty close. So what happens when we compile this code? Does Rust
print `"Hello"`, or does Rust crash?

Neither. It refuses to compile:

```bash
$ cargo run
   Compiling hello_world v0.0.1 (file:///Users/you/src/hello_world)
main.rs:8:5: 8:6 error: cannot borrow `v` as mutable because it is also borrowed as immutable
main.rs:8     v.push("world");
              ^
main.rs:6:14: 6:15 note: previous borrow of `v` occurs here; the immutable borrow prevents subsequent moves or mutable borrows of `v` until the borrow ends
main.rs:6     let x = &v[0];
                       ^
main.rs:11:2: 11:2 note: previous borrow ends here
main.rs:1 fn main() {
...
main.rs:11 }
           ^
error: aborting due to previous error
```

When we try to mutate the array by `push`ing to it the second time, Rust reports
an error. It says that we "cannot borrow v as mutable because it is also
borrowed as immutable." What does it mean by "borrowed"?

In Rust, the type system encodes the notion of *ownership*. The variable `v`
is an *owner* of the vector. When we make a reference to `v`, we let that
variable (in this case, `x`) *borrow* it for a while. Just like if you own a
book, and you lend it to me, I'm borrowing the book.

So, when I try to modify the vector with the second call to `push`, I need
to own it. But `x` is borrowing it. You can't modify something that
you've lent to someone. And so Rust reports an error.

So how do we fix this problem? Well, we can make a copy of the element:

```{rust}
fn main() {
    let mut v = vec![];

    v.push("Hello");

    let x = v[0].clone();

    v.push("world");

    println!("{}", x);
}
```

Note the addition of `clone()`. This creates a copy of the element, leaving
the original untouched. Now, we no longer have two references to the same
memory, and so the compiler is happy. Let's give that a try:

```{bash}
$ cargo run
   Compiling hello_world v0.0.1 (file:///Users/you/src/hello_world)
     Running `target/hello_world`
Hello
```

Same result. Now, making a copy can be inefficient, so this solution may not be
acceptable. There are other ways to get around this problem, but this is a toy
example, and because we're in an introduction, we'll leave that for later.
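
For the curious, here is a rough sketch of one such alternative, assuming the reference is only needed briefly: finish with the borrow before mutating the vector, so that nothing is copied at all. The function name `read_then_push` is invented for this illustration.

```rust
/// Looks at the first element through a short-lived borrow, then mutates
/// the vector. `read_then_push` is a hypothetical name for this example.
fn read_then_push() -> (usize, usize) {
    let mut v: Vec<String> = Vec::new();
    v.push(String::from("Hello"));

    let first_len = {
        let x = &v[0]; // immutable borrow of `v` starts here
        x.len()        // plain number out; the borrow ends with this block
    };

    // Allowed: nothing is borrowing `v` any more.
    v.push(String::from("world"));

    (first_len, v.len())
}

fn main() {
    let (first_len, len) = read_then_push();
    println!("first element has {} bytes; vector has {} elements", first_len, len);
}
```

The trade-off is that `x` cannot be used after the second `push`; if it has to outlive the mutation, a copy like the `clone()` above is still needed.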

The point is, the Rust compiler and its notion of ownership have saved us from a
bug that would crash the program. We've achieved safety, at compile time,
without needing to rely on a garbage collector to handle our memory.

# Concurrency

Rust's ownership model can help in other ways, as well. For example, take
concurrency. Concurrency is a big topic, and an important one for any modern
programming language. Let's take a look at how ownership can help you write
safe concurrent programs.

Here's an example of a concurrent Rust program:

```{rust}
# #![feature(scoped)]
use std::thread;

fn main() {
    let guards: Vec<_> = (0..10).map(|_| {
        thread::scoped(|| {
            println!("Hello, world!");
        })
    }).collect();
}
```

This program creates ten threads, which all print `Hello, world!`. The `scoped`
function takes one argument, a closure, indicated by the double bars `||`. This
closure is executed in a new thread created by `scoped`. The method is called
`scoped` because it returns a 'join guard', which will automatically join the
child thread when it goes out of scope. Because we `collect` these guards into
a `Vec<T>`, and that vector goes out of scope at the end of our program, our
program will wait for every thread to finish before finishing.
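
The join-guard idea can also be spelled out by hand. As a hedged sketch, assuming the stable `std::thread::spawn` and `JoinHandle::join` functions rather than the feature-gated `thread::scoped`, the following collects the handles and waits on each one explicitly. The function name `spawn_and_join` is invented for this example.

```rust
use std::thread;

/// Starts `n` threads, each computing `i * 2`, and waits for all of them.
/// `spawn_and_join` is a hypothetical helper used only for illustration.
fn spawn_and_join(n: usize) -> Vec<usize> {
    // Each closure takes ownership of its own copy of `i` via `move`.
    let handles: Vec<thread::JoinHandle<usize>> = (0..n)
        .map(|i| thread::spawn(move || i * 2))
        .collect();

    // Joining plays the role of the guard going out of scope: we block
    // until each child thread has finished and hand back its result.
    handles
        .into_iter()
        .map(|handle| handle.join().expect("child thread panicked"))
        .collect()
}

fn main() {
    println!("{:?}", spawn_and_join(3));
}
```

The threads may run in any order, but the results come back in the order the handles are joined.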

One common form of problem in concurrent programs is a *data race*.
This occurs when two different threads attempt to access the same
location in memory in a non-synchronized way, where at least one of
them is a write. If one thread is attempting to read, and one thread
is attempting to write, you cannot be sure that your data will not be
corrupted. Note the first half of that requirement: two threads that
attempt to access the same location in memory. Rust's ownership model
can track which pointers own which memory locations, which solves this
problem.

Let's see an example. This Rust code will not compile:

```{rust,ignore}
# #![feature(scoped)]
use std::thread;

fn main() {
    let mut numbers = vec![1, 2, 3];

    let guards: Vec<_> = (0..3).map(|i| {
        thread::scoped(move || {
            numbers[i] += 1;
            println!("numbers[{}] is {}", i, numbers[i]);
        })
    }).collect();
}
```

It gives us this error:

```text
7:25: 10:6 error: cannot move out of captured outer variable in an `FnMut` closure
7         thread::scoped(move || {
8             numbers[i] += 1;
9             println!("numbers[{}] is {}", i, numbers[i]);
10         })
error: aborting due to previous error
```

This is a little confusing because there are two closures here: the one passed
to `map`, and the one passed to `thread::scoped`. In this case, the closure for
`thread::scoped` is attempting to reference `numbers`, a `Vec<i32>`. This
closure is a `FnOnce` closure, as that’s what `thread::scoped` takes as an
argument. `FnOnce` closures take ownership of their environment. That’s fine,
but there’s one detail: because of `map`, we’re going to make three of these
closures. And since all three try to take ownership of `numbers`, that would be
a problem. That’s what it means by ‘cannot move out of captured outer
variable’: our `thread::scoped` closure wants to take ownership, and it can’t,
because the closure for `map` won’t let it.

What to do here? Rust has a type that helps us: `Mutex<T>`. Because the threads
are scoped, it is possible to use an _immutable_ reference to `numbers` inside
of the closure. However, Rust prevents us from having multiple _mutable_
references to the same object, so we need a `Mutex` to be able to modify what
we're sharing. A Mutex will synchronize our accesses, so that we can ensure
that our mutation doesn't cause a data race.

Here's what using a Mutex looks like:

```{rust}
# #![feature(scoped)]
use std::thread;
use std::sync::Mutex;

fn main() {
    let numbers = &Mutex::new(vec![1, 2, 3]);

    let guards: Vec<_> = (0..3).map(|i| {
        thread::scoped(move || {
            let mut array = numbers.lock().unwrap();
            array[i] += 1;
            println!("numbers[{}] is {}", i, array[i]);
        })
    }).collect();
}
```

We first have to `use` the appropriate library, and then we wrap our vector in
a `Mutex` with the call to `Mutex::new()`. Inside each thread's closure, the
`lock()` call returns a reference to the value inside the Mutex, and blocks any
other calls to `lock()` until that reference goes out of scope.

We can compile and run this program without error, and in fact, see the
non-deterministic aspect:

```{shell}
$ cargo run
   Compiling hello_world v0.0.1 (file:///Users/you/src/hello_world)
     Running `target/hello_world`
numbers[1] is 3
numbers[0] is 2
numbers[2] is 4
$ cargo run
     Running `target/hello_world`
numbers[2] is 4
numbers[1] is 3
numbers[0] is 2
```

Each time, we can get a slightly different output because the threads are not
guaranteed to run in any set order. If you get the same order every time, it is
because each of these threads is very small and completes too fast for its
indeterminate behavior to surface.

The important part here is that the Rust compiler was able to use ownership to
give us assurance _at compile time_ that we weren't doing something incorrect
with regard to concurrency. In order to share ownership, we were forced to be
explicit and use a mechanism to ensure that it would be properly handled.
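
The examples in this section rely on the feature-gated `thread::scoped`. As a hedged sketch of the same idea, assuming a newer toolchain where `std::thread::scope` is available, the Mutex version might look like the following. The function name `increment_all` is invented for this illustration, and the values are returned instead of printed so that the result does not depend on thread ordering.

```rust
use std::sync::Mutex;
use std::thread;

/// Adds one to every element, using one scoped thread per element.
/// `increment_all` is a hypothetical helper used only for illustration.
fn increment_all(input: Vec<i32>) -> Vec<i32> {
    let len = input.len();
    let numbers = Mutex::new(input);

    // Every thread spawned on `s` is joined before `scope` returns, so the
    // threads may safely borrow `numbers` from this stack frame.
    thread::scope(|s| {
        for i in 0..len {
            let numbers = &numbers; // shared, immutable reference to the Mutex
            s.spawn(move || {
                // The lock is held until `array` goes out of scope.
                let mut array = numbers.lock().unwrap();
                array[i] += 1;
            });
        }
    });

    // All threads are done; take the vector back out of the Mutex.
    numbers.into_inner().unwrap()
}

fn main() {
    println!("{:?}", increment_all(vec![1, 2, 3]));
}
```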

# Safety _and_ Speed

Safety and speed are often presented as a continuum. At one end of the spectrum,
you have maximum speed, but no safety. On the other end, you have absolute safety
with no speed. Rust seeks to break out of this paradigm by introducing safety at
compile time, ensuring that you haven't done anything wrong, while compiling to
the same low-level code you'd expect without the safety.

As an example, Rust's ownership system is _entirely_ at compile time. The
safety check that makes this an error about moved values:

```{rust,ignore}
# #![feature(scoped)]
use std::thread;

fn main() {
    let numbers = vec![1, 2, 3];

    let guards: Vec<_> = (0..3).map(|i| {
        thread::scoped(move || {
            println!("{}", numbers[i]);
        })
    }).collect();
}
```

carries no runtime penalty. And while some of Rust's safety features do have
a runtime cost, there's often a way to write your code in such a way that
you can remove it. As an example, this is a poor way to iterate through
a vector:

```{rust}
let vec = vec![1, 2, 3];

for i in 0..vec.len() {
    println!("{}", vec[i]);
}
```

The reason is that the access of `vec[i]` does bounds checking, to ensure
that we don't try to access an invalid index. However, we can remove this
while retaining safety. The answer is iterators:

```{rust}
let vec = vec![1, 2, 3];

for x in &vec {
    println!("{}", x);
}
```

This version uses an iterator that yields each element of the vector in turn.
Because we have a reference to the element, rather than the whole vector itself,
there are no array access bounds to check.
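
To make the comparison concrete, here is a small sketch that computes the same sum both ways. The function names `sum_indexed` and `sum_iterated` are invented for this example, and an optimizing compiler may well remove the checks from the indexed version too, so treat this as an illustration of the two styles rather than a benchmark.

```rust
/// Sums a slice by index; each `values[i]` is a bounds-checked access.
fn sum_indexed(values: &[i32]) -> i32 {
    let mut total = 0;
    for i in 0..values.len() {
        total += values[i];
    }
    total
}

/// Sums a slice with an iterator; no index is ever computed, so there is
/// nothing to check against the length.
fn sum_iterated(values: &[i32]) -> i32 {
    values.iter().sum()
}

fn main() {
    let vec = vec![1, 2, 3];
    println!("{} {}", sum_indexed(&vec), sum_iterated(&vec));
}
```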

# Learning More

I hope that this taste of Rust has given you an idea of whether Rust is the right
language for you. We talked about Rust's tooling, how encoding ownership into
the type system helps you find bugs, how Rust can help you write correct
concurrent code, and how you don't have to pay a speed cost for much of this
safety.

To continue your Rustic education, read [The Rust Programming
Language](book/index.html) for a more in-depth exploration of Rust's syntax and
concepts.

[intro]: book/README.html