Go over the tutorial again

Edit some things, make sure all code runs.
Marijn Haverbeke 2012-01-12 12:56:23 +01:00
parent 44352df57c
commit 0f72c53fdf
14 changed files with 342 additions and 329 deletions


@ -3,36 +3,39 @@
Rust datatypes are not trivial to copy (the way, for example,
JavaScript values can be copied by simply taking one or two machine
words and plunking them somewhere else). Shared boxes require
reference count updates, big records or tags require an arbitrary
amount of data to be copied (plus updating the reference counts of
shared boxes hanging off them), unique pointers require their origin
to be de-initialized.
reference count updates; big records, tags, or unique pointers require
an arbitrary amount of data to be copied (plus updating the reference
counts of shared boxes hanging off them).
For this reason, the way Rust passes arguments to functions is a bit
more involved than it is in most languages. It performs some
compile-time cleverness to get rid of most of the cost of copying
arguments, and forces you to put in explicit copy operators in the
places where it can not.
For this reason, the default calling convention for Rust functions
leaves ownership of the arguments with the caller. The caller
guarantees that the arguments will outlive the call; the callee merely
gets access to them.
## Safe references
The foundation of Rust's argument-passing optimization is the fact
that Rust tasks for single-threaded worlds, which share no data with
other tasks, and that most data is immutable.
There is one catch with this approach: sometimes the compiler can
*not* statically guarantee that the argument value at the caller side
will survive to the end of the call. Another argument might indirectly
refer to it and be used to overwrite it, or a closure might assign a
new value to it.
Fortunately, Rust tasks are single-threaded worlds, which share no
data with other tasks, and most data is immutable. This allows
most argument-passing situations to be proved safe without further
difficulty.
Take the following program:
# fn get_really_big_record() -> int { 1 }
# fn myfunc(a: int) {}
let x = get_really_big_record();
myfunc(x);
fn main() {
let x = get_really_big_record();
myfunc(x);
}
We want to pass `x` to `myfunc` by pointer (which is easy), *and* we
want to ensure that `x` stays intact for the duration of the call
(which, in this example, is also easy). So we can just use the
existing value as the argument, without copying.
There are more involved cases. The call could look like this:
Here we know for sure that no one else has access to the `x` variable
in `main`, so we're good. But the call could also look like this:
# fn myfunc(a: int, b: block()) {}
# fn get_another_record() -> int { 1 }
@ -43,14 +46,11 @@ Now, if `myfunc` first calls its second argument and then accesses its
first argument, it will see a different value from the one that was
passed to it.
The compiler will insert an implicit copy of `x` in such a case,
In such a case, the compiler will insert an implicit copy of `x`,
*except* if `x` contains something mutable, in which case a copy would
result in code that behaves differently (if you mutate the copy, `x`
stays unchanged). That would be bad, so the compiler will disallow
such code.
When inserting an implicit copy for something big, the compiler will
warn, so that you know that the code is not as efficient as it looks.
result in code that behaves differently. If copying `x` might be
expensive (for example, if it holds a vector), the compiler will emit
a warning.
There are even more tricky cases, in which the Rust compiler is forced
to pessimistically assume a value will get mutated, even though it is
@ -81,51 +81,59 @@ with the `copy` operator:
for elt in v { iter(copy elt); }
}
## Argument passing styles
The fact that arguments are conceptually passed by safe reference does
not mean all arguments are passed by pointer. Composite types like
records and tags *are* passed by pointer, but others, like integers
and pointers, are simply passed by value.
It is possible, when defining a function, to specify a passing style
for a parameter by prefixing the parameter name with a symbol. The
most common special style is by-mutable-reference, written `&`:
fn vec_push(&v: [int], elt: int) {
v += [elt];
}
This will make it possible for the function to mutate the parameter.
Clearly, you are only allowed to pass things that can actually be
mutated to such a function.
Another style is by-move, which will cause the argument to become
de-initialized on the caller side, and give ownership of it to the
called function. This is written `-`.
Finally, the default passing styles (by-value for non-structural
types, by-reference for structural ones) are written `++` for by-value
and `&&` for by(-immutable)-reference. It is sometimes necessary to
override the defaults. We'll talk more about this when discussing
[generics][gens].
[gens]: generic.html
Adding a `copy` operator is also the way to muffle warnings about
implicit copies.
## Other uses of safe references
Safe references are not only used for argument passing. When you
destructure on a value in an `alt` expression, or loop over a vector
with `for`, variables bound to the inside of the given data structure
will use safe references, not copies. This means such references have
little overhead, but you'll occasionally have to copy them to ensure
will use safe references, not copies. This means such references are
very cheap, but you'll occasionally have to copy them to ensure
safety.
let my_rec = {a: 4, b: [1, 2, 3]};
alt my_rec {
{a, b} {
log b; // This is okay
log(info, b); // This is okay
my_rec = {a: a + 1, b: b + [a]};
log b; // Here reference b has become invalid
log(info, b); // Here reference b has become invalid
}
}
## Argument passing styles
The fact that arguments are conceptually passed by safe reference does
not mean all arguments are passed by pointer. Composite types like
records and tags *are* passed by pointer, but single-word values, like
integers and pointers, are simply passed by value. Most of the time,
the programmer does not have to worry about this, as the compiler will
simply pick the most efficient passing style. There is one exception,
which will be described in the section on [generics](generic.html).
To explicitly set the passing-style for a parameter, you prefix the
argument name with a sigil. There are two special passing styles that
are often useful. The first is by-mutable-pointer, written with a
single `&`:
fn vec_push(&v: [int], elt: int) {
v += [elt];
}
This allows the function to mutate the value of the argument, *in the
caller's context*. Clearly, you are only allowed to pass things that
can actually be mutated to such a function.
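
For readers comparing against present-day Rust, the by-mutable-pointer style corresponds roughly to a `&mut` parameter. This is only a sketch in modern syntax, not the dialect this commit targets; `Vec<i32>` and `push` stand in for the tutorial's `[int]` and `+=`.

```rust
// Present-day Rust sketch, not the 2012 dialect shown in this diff.
// `&mut Vec<i32>` plays the role of the tutorial's `&v: [int]` parameter.
fn vec_push(v: &mut Vec<i32>, elt: i32) {
    // Mutates the caller's vector in place; no copy is made.
    v.push(elt);
}
```

The caller has to pass something it is allowed to mutate, for example `vec_push(&mut my_vec, 3)` where `my_vec` was declared with `let mut`.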
Then there is the by-copy style, written `+`. This indicates that the
function wants to take ownership of the argument value. If the caller
does not use the argument after the call, it will be 'given' to the
callee. Otherwise a copy will be made. This mode is mostly used for
functions that construct data structures. The argument will end up
being owned by the data structure, so if that can be done without a
copy, that's a win.
type person = {name: str, address: str};
fn make_person(+name: str, +address: str) -> person {
ret {name: name, address: address};
}
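
A loose modern analogue of the by-copy style, offered as a sketch under assumptions: in present-day Rust a by-value `String` parameter always takes ownership, and a caller that still needs the value must call `.clone()` explicitly rather than have the compiler copy it. The `Person` struct below is a stand-in for the tutorial's `person` record.

```rust
// Present-day Rust sketch, not the 2012 dialect shown in this diff.
#[derive(Debug, PartialEq)]
struct Person {
    name: String,
    address: String,
}

// Both arguments are taken by value, so the new `Person` ends up
// owning the strings without copying their contents.
fn make_person(name: String, address: String) -> Person {
    Person { name, address }
}
```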


@ -67,9 +67,9 @@ that `(float, float)` is a tuple of two floats:
fn angle(vec: (float, float)) -> float {
alt vec {
(0f, y) if y < 0f { 1.5 * std::math::pi }
(0f, y) { 0.5 * std::math::pi }
(x, y) { std::math::atan(y / x) }
(0f, y) if y < 0f { 1.5 * float::consts::pi }
(0f, y) { 0.5 * float::consts::pi }
(x, y) { float::atan(y / x) }
}
}
@ -79,7 +79,7 @@ y)` matches any tuple whose first element is zero, and binds `y` to
the second element. `(x, y)` matches any tuple, and binds both
elements to a variable.
Any `alt` arm can have a guard clause (written `when EXPR`), which is
Any `alt` arm can have a guard clause (written `if EXPR`), which is
an expression of type `bool` that determines, after the pattern is
found to match, whether the arm is taken or not. The variables bound
by the pattern are available in this guard expression.
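
The guard form survives in present-day Rust, where `alt` became `match` and arms use `=>`. The following is a sketch in modern syntax rather than the dialect of this commit; the zero tests are moved into the guards, and `f64` stands in for `float`.

```rust
use std::f64::consts::PI;

// Present-day Rust sketch of the `angle` example. Each guard is a `bool`
// expression that can use the variables bound by its pattern.
fn angle(vec: (f64, f64)) -> f64 {
    match vec {
        (x, y) if x == 0.0 && y < 0.0 => 1.5 * PI,
        (x, _) if x == 0.0 => 0.5 * PI,
        (x, y) => (y / x).atan(),
    }
}
```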
@ -111,7 +111,7 @@ to abort the current iteration and continue with the next.
while true {
x += x - 3;
if x % 5 == 0 { break; }
std::io::println(std::int::str(x));
std::io::println(int::str(x));
}
This code prints out a weird sequence of numbers and stops as soon as
@ -161,24 +161,32 @@ Logging is polymorphic—any type of value can be logged, and the
runtime will do its best to output a textual representation of the
value.
log "hi";
log (1, [2.5, -1.8]);
log(warn, "hi");
log(error, (1, [2.5, -1.8]));
By default, you *will not* see the output of your log statements. The
environment variable `RUST_LOG` controls which log statements actually
get output. It can contain a comma-separated list of paths for modules
that should be logged. For example, running `rustc` with
`RUST_LOG=rustc::front::attr` will turn on logging in its attribute
parser. If you compile a program `foo.rs`, you can set `RUST_LOG` to
`foo` to enable its logging.
The first argument is the log level (levels `info`, `warn`, and
`error` are predefined), and the second is the value to log. By
default, you *will not* see the output of that first log statement,
which has `warn` level. The environment variable `RUST_LOG` controls
which log level is used. It can contain a comma-separated list of
paths for modules that should be logged. For example, running `rustc`
with `RUST_LOG=rustc::front::attr` will turn on logging in its
attribute parser. If you compile a program named `foo.rs`, its
top-level module will be called `foo`, and you can set `RUST_LOG` to
`foo` to enable `warn` and `info` logging for the module.
Turned-off `log` statements impose minimal overhead on the code that
contains them, so except in code that needs to be really, really fast,
you should feel free to scatter around debug logging statements, and
leave them in.
For interactive debugging, you often want unconditional logging. For
this, use `log_err` instead of `log` [FIXME better name].
Three macros that combine text-formatting (as with `#fmt`) and logging
are available. These take a string and any number of format arguments,
and will log the formatted string:
# fn get_error_string() -> str { "boo" }
#warn("only %d seconds remaining", 10);
#error("fatal: %s", get_error_string());
## Assertions


@ -1,11 +1,11 @@
# Datatypes
Rust datatypes are, by default, immutable. The core datatypes of Rust
are structural records and 'tags' (tagged unions, algebraic data
are structural records and 'enums' (tagged unions, algebraic data
types).
type point = {x: float, y: float};
tag shape {
enum shape {
circle(point, float);
rectangle(point, point);
}
@ -26,8 +26,8 @@ example...
type stack = {content: [int], mutable head: uint};
With such a type, you can do `mystack.head += 1u`. When the `mutable`
is omitted from the type, such an assignment would result in a type
With such a type, you can do `mystack.head += 1u`. If `mutable` were
omitted from the type, such an assignment would result in a type
error.
To 'update' an immutable record, you use functional record update
@ -67,13 +67,13 @@ same order they appear in the type. When you are not interested in all
the fields of a record, a record pattern may end with `, _` (as in
`{field1, _}`) to indicate that you're ignoring all other fields.
## Tags
## Enums
Tags [FIXME terminology] are datatypes that have several different
representations. For example, the type shown earlier:
Enums are datatypes that have several different representations. For
example, the type shown earlier:
# type point = {x: float, y: float};
tag shape {
enum shape {
circle(point, float);
rectangle(point, point);
}
@ -90,10 +90,10 @@ which can be used to construct values of the type (taking arguments of
the specified types). So `circle({x: 0f, y: 0f}, 10f)` is the way to
create a new circle.
Tag variants do not have to have parameters. This, for example, is
equivalent to an `enum` in C:
Enum variants do not have to have parameters. This, for example, is
equivalent to a C enum:
tag direction {
enum direction {
north;
east;
south;
@ -103,36 +103,36 @@ equivalent to an `enum` in C:
This will define `north`, `east`, `south`, and `west` as constants,
all of which have type `direction`.
<a name="single_variant_tag"></a>
<a name="single_variant_enum"></a>
There is a special case for tags with a single variant. These are used
to define new types in such a way that the new name is not just a
There is a special case for enums with a single variant. These are
used to define new types in such a way that the new name is not just a
synonym for an existing type, but its own distinct type. If you say:
tag gizmo_id = int;
enum gizmo_id = int;
That is a shorthand for this:
tag gizmo_id { gizmo_id(int); }
enum gizmo_id { gizmo_id(int); }
Tag types like this can have their content extracted with the
Enum types like this can have their content extracted with the
dereference (`*`) unary operator:
# tag gizmo_id = int;
# enum gizmo_id = int;
let my_gizmo_id = gizmo_id(10);
let id_int: int = *my_gizmo_id;
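
In present-day Rust the same idea is usually written as a tuple struct (a "newtype"), with the content read through the `.0` field rather than `*`. A minimal sketch, with `GizmoId` and `gizmo_to_int` as illustrative names:

```rust
// Present-day Rust sketch, not the 2012 dialect shown in this diff.
// `GizmoId` is a distinct type, not a synonym for `i32`.
struct GizmoId(i32);

// Extracts the wrapped integer via the positional field `.0`.
fn gizmo_to_int(id: GizmoId) -> i32 {
    id.0
}
```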
## Tag patterns
## Enum patterns
For tag types with multiple variants, destructuring is the only way to
For enum types with multiple variants, destructuring is the only way to
get at their contents. All variant constructors can be used as
patterns, as in this definition of `area`:
# type point = {x: float, y: float};
# tag shape { circle(point, float); rectangle(point, point); }
# enum shape { circle(point, float); rectangle(point, point); }
fn area(sh: shape) -> float {
alt sh {
circle(_, size) { std::math::pi * size * size }
circle(_, size) { float::consts::pi * size * size }
rectangle({x, y}, {x: x2, y: y2}) { (x2 - x) * (y2 - y) }
}
}
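
For comparison, a sketch of the same `area` function in present-day Rust. The capitalized type and variant names follow modern convention and are not part of this commit; the function takes the shape by reference so the caller keeps ownership.

```rust
use std::f64::consts::PI;

struct Point {
    x: f64,
    y: f64,
}

enum Shape {
    Circle(Point, f64),
    Rectangle(Point, Point),
}

// Variant constructors double as patterns, as in the tutorial text.
fn area(sh: &Shape) -> f64 {
    match sh {
        Shape::Circle(_, size) => PI * size * size,
        Shape::Rectangle(Point { x, y }, Point { x: x2, y: y2 }) => (x2 - x) * (y2 - y),
    }
}
```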
@ -142,7 +142,7 @@ a dot at the end) to match them in a pattern. This to prevent
ambiguity between matching a variant name and binding a new variable.
# type point = {x: float, y: float};
# tag direction { north; east; south; west; }
# enum direction { north; east; south; west; }
fn point_from_direction(dir: direction) -> point {
alt dir {
north. { {x: 0f, y: 1f} }
@ -161,22 +161,22 @@ Tuples can have any arity except for 0 or 1 (though you may see nil,
let mytup: (int, int, float) = (10, 20, 30.0);
alt mytup {
(a, b, c) { log a + b + (c as int); }
(a, b, c) { log(info, a + b + (c as int)); }
}
## Pointers
In contrast to a lot of modern languages, record and tag types in Rust
are not represented as pointers to allocated memory. They are, like in
C and C++, represented directly. This means that if you `let x = {x:
1f, y: 1f};`, you are creating a record on the stack. If you then copy
it into a data structure, the whole record is copied, not just a
pointer.
In contrast to a lot of modern languages, record and enum types in
Rust are not represented as pointers to allocated memory. They are,
like in C and C++, represented directly. This means that if you `let x
= {x: 1f, y: 1f};`, you are creating a record on the stack. If you
then copy it into a data structure, the whole record is copied, not
just a pointer.
For small records like `point`, this is usually still more efficient
than allocating memory and going through a pointer. But for big
records, or records with mutable fields, it can be useful to have a
single copy on the heap, and refer to that through a pointer.
For small records like `point`, this is usually more efficient than
allocating memory and going through a pointer. But for big records, or
records with mutable fields, it can be useful to have a single copy on
the heap, and refer to that through a pointer.
Rust supports several types of pointers. The simplest is the unsafe
pointer, written `*TYPE`, which is a completely unchecked pointer
@ -194,7 +194,7 @@ Shared boxes are pointers to heap-allocated, reference counted memory.
A cycle collector ensures that circular references do not result in
memory leaks.
Creating a shared box is done by simply applying the binary `@`
Creating a shared box is done by simply applying the unary `@`
operator to an expression. The result of the expression will be boxed,
resulting in a box of the right type. For example:
@ -221,11 +221,8 @@ box exists at any time.
This is where the 'move' (`<-`) operator comes in. It is similar to
`=`, but it de-initializes its source. Thus, the unique box can move
from `x` to `y`, without violating the constraint that it only has a
single owner.
NOTE: If you do `y = x` instead, the box will be copied. We should
emit warning for this, or disallow it entirely, but do not currently
do so.
single owner (if you used assignment instead of the move operator, the
box would, in principle, be copied).
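
Present-day Rust no longer has a `<-` operator: plain assignment of a `Box` moves it, and a copy has to be requested with `.clone()`. The sketch below illustrates that later behavior, not the semantics of this commit; the function names are illustrative.

```rust
// Present-day Rust sketch, not the 2012 dialect shown in this diff.
fn move_box() -> i32 {
    let x: Box<i32> = Box::new(10);
    // Ownership moves from `x` to `y`; using `x` afterwards would not compile.
    let y = x;
    *y
}

fn copy_box() -> (i32, i32) {
    let x = Box::new(10);
    // An explicit clone allocates a second, independent box.
    let mut y = x.clone();
    *y += 1;
    (*x, *y)
}
```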
Unique boxes, when they do not contain any shared boxes, can be sent
to other tasks. The sending task will give up ownership of the box,
@ -249,10 +246,10 @@ Rust vectors are always heap-allocated and unique. A value of type
containing any number of `TYPE` values.
NOTE: This uniqueness is turning out to be quite awkward in practice,
and might change.
and might change in the future.
Vector literals are enclosed in square brackets. Dereferencing is done
with square brackets (and zero-based):
with square brackets (zero-based):
let myvec = [true, false, true, false];
if myvec[1] { std::io::println("boom"); }
@ -262,8 +259,8 @@ The type written as `[mutable TYPE]` is a vector with mutable
elements. Mutable vector literals are written `[mutable]` (empty) or
`[mutable 1, 2, 3]` (with elements).
Growing a vector in Rust is not as inefficient as it looks (the `+`
operator means concatenation when applied to vector types):
The `+` operator means concatenation when applied to vector types.
Growing a vector in Rust is not as inefficient as it looks:
let myvec = [], i = 0;
while i < 100 {
@ -286,17 +283,17 @@ null byte (for interoperability with C APIs).
This sequence of bytes is interpreted as a UTF-8 encoded sequence of
characters. This has the advantage that UTF-8 encoded I/O (which
should really be the goal for modern systems) is very fast, and that
strings have, for most intents and purposes, a nicely compact
should really be the default for modern systems) is very fast, and
that strings have, for most intents and purposes, a nicely compact
representation. It has the disadvantage that you only get
constant-time access by byte, not by character.
A lot of algorithms don't need constant-time indexed access (they
iterate over all characters, which `std::str::chars` helps with), and
iterate over all characters, which `str::chars` helps with), and
for those that do, many don't need actual characters, and can operate
on bytes. For algorithms that do really need to index by character,
there's the option to convert your string to a character vector (using
`std::str::to_chars`).
`str::to_chars`).
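
The byte-versus-character distinction still holds in present-day Rust. The sketch below uses modern `str` methods (`len`, `chars`) in place of the `str::chars` and `str::to_chars` functions named above; the helper names are illustrative.

```rust
// Present-day Rust sketch, not the 2012 dialect shown in this diff.

// `len` counts bytes in constant time; counting characters
// requires walking the whole string.
fn byte_and_char_counts(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

// Indexing by character is linear: it iterates from the start.
fn nth_char(s: &str, n: usize) -> Option<char> {
    s.chars().nth(n)
}

// Converting to a character vector buys constant-time indexing
// at the cost of a less compact representation.
fn to_chars(s: &str) -> Vec<char> {
    s.chars().collect()
}
```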
Like vectors, strings are always unique. You can wrap them in a shared
box to share them. Unlike vectors, there is no mutable variant of


@ -13,7 +13,6 @@ hexadecimal string and prints to standard output. If you have the
OpenSSL libraries installed, it should 'just work'.
use std;
import std::{vec, str};
native mod crypto {
fn SHA1(src: *u8, sz: uint, out: *u8) -> *u8;
@ -28,7 +27,7 @@ OpenSSL libraries installed, it should 'just work'.
fn sha1(data: str) -> str unsafe {
let bytes = str::bytes(data);
let hash = crypto::SHA1(vec::unsafe::to_ptr(bytes),
vec::len(bytes), std::ptr::null());
vec::len(bytes), ptr::null());
ret as_hex(vec::unsafe::from_buf(hash, 20u));
}
@ -109,13 +108,12 @@ null pointers.
The `sha1` function is the most obscure part of the program.
# import std::{str, vec};
# mod crypto { fn SHA1(src: *u8, sz: uint, out: *u8) -> *u8 { out } }
# fn as_hex(data: [u8]) -> str { "hi" }
fn sha1(data: str) -> str unsafe {
let bytes = str::bytes(data);
let hash = crypto::SHA1(vec::unsafe::to_ptr(bytes),
vec::len(bytes), std::ptr::null());
vec::len(bytes), ptr::null());
ret as_hex(vec::unsafe::from_buf(hash, 20u));
}
@ -134,7 +132,7 @@ caused by some unsafe code.
Unsafe blocks isolate unsafety. Unsafe functions, on the other hand,
advertise it to the world. An unsafe function is written like this:
unsafe fn kaboom() { log "I'm harmless!"; }
unsafe fn kaboom() { "I'm harmless!"; }
This function can only be called from an unsafe block or another
unsafe function.
@ -147,13 +145,12 @@ Rust's safety mechanisms.
Let's look at our `sha1` function again.
# import std::{str, vec};
# mod crypto { fn SHA1(src: *u8, sz: uint, out: *u8) -> *u8 { out } }
# fn as_hex(data: [u8]) -> str { "hi" }
# fn x(data: str) -> str unsafe {
let bytes = str::bytes(data);
let hash = crypto::SHA1(vec::unsafe::to_ptr(bytes),
vec::len(bytes), std::ptr::null());
vec::len(bytes), ptr::null());
ret as_hex(vec::unsafe::from_buf(hash, 20u));
# }
@ -195,7 +192,7 @@ microsecond-resolution timer.
}
fn unix_time_in_microseconds() -> u64 unsafe {
let x = {mutable tv_sec: 0u32, mutable tv_usec: 0u32};
libc::gettimeofday(std::ptr::addr_of(x), std::ptr::null());
libc::gettimeofday(ptr::addr_of(x), ptr::null());
ret (x.tv_sec as u64) * 1000_000_u64 + (x.tv_usec as u64);
}


@ -11,7 +11,7 @@ also return a value by having its top level block produce an
expression (by omitting the final semicolon).
Some functions (such as the C function `exit`) never return normally.
In Rust, these are annotated with return type `!`:
In Rust, these are annotated with the pseudo-return type '`!`':
fn dead_end() -> ! { fail; }
@ -21,7 +21,7 @@ expected to return.
# fn can_go_left() -> bool { true }
# fn can_go_right() -> bool { true }
# tag dir { left; right; }
# enum dir { left; right; }
# fn dead_end() -> ! { fail; }
let dir = if can_go_left() { left }
else if can_go_right() { right }
@ -29,93 +29,113 @@ expected to return.
## Closures
Named rust functions, like those in the previous section, do not close
over their environment. Rust also includes support for closures, which
are anonymous functions that can access the variables that were in
scope at the time the closure was created. Closures are represented
as the pair of a function pointer (as in C) and the environment, which
is where the values of the closed over variables are stored. Rust
includes support for three varieties of closure, each with different
costs and capabilities:
Named functions, like those in the previous section, do not close over
their environment. Rust also includes support for closures, which are
functions that can access variables in the scope in which they are
created.
- Stack closures (written `block`) store their environment in the
stack frame of their creator; they are very lightweight but cannot
be stored in a data structure.
- Boxed closures (written `fn@`) store their environment in a
[shared box](data#shared-box). These are good for storing within
data structures but cannot be sent to another task.
- Unique closures (written `fn~`) store their environment in a
[unique box](data#unique-box). These are limited in the kinds of
data that they can close over so that they can be safely sent
between tasks. As with any unique pointer, copying a unique closure
results in a deep clone of the environment.
Both boxed closures and unique closures are subtypes of stack
closures, meaning that wherever a stack closure type appears, a boxed
or unique closure value can be used. This is due to the restrictions
placed on the use of stack closures, which ensure that all operations
on a stack closure are also safe on any kind of closure.
There are several forms of closures, each with its own role. The most
common type is called a 'block'; this is a closure which has full
access to its environment.
### Working with closures
Closures are specified by writing an inline, anonymous function
declaration. For example, the following code creates a boxed closure:
let plus_two = fn@(x: int) -> int {
ret x + 2;
};
fn call_block_with_ten(b: block(int)) { b(10); }
Creating a unique closure is very similar:
let plus_two_uniq = fn~(x: int) -> int {
ret x + 2;
};
Stack closures can be created in a similar way; however, because stack
closures literally point into their creator's stack frame, they can
only be used in a very specific way. Stack closures may be passed as
parameters and they may be called, but they may not be stored into
local variables or fields. Creating a stack closure can therefore be
done using a syntax like the following:
let doubled = vec::map([1, 2, 3], block(x: int) -> int {
x * 2
let x = 20;
call_block_with_ten({|arg|
#info("x=%d, arg=%d", x, arg);
});
This defines a function that accepts a block, and then calls it with a
simple block that executes a log statement, accessing both its
argument and the variable `x` from its environment.
Blocks can only be used in a restricted way, because they are not
allowed to outlive the scope in which they were created. They are allowed to
appear in function argument position and in call position, but nowhere
else.
### Boxed closures
When you need to store a closure in a data structure, a block will not
do, since the compiler will refuse to let you store it. For this
purpose, Rust provides a type of closure that has an arbitrary
lifetime, written `fn@` (boxed closure, analogous to the `@` pointer
type described in the next section).
A boxed closure does not directly access its environment, but merely
copies out the values that it closes over into a private data
structure. This means that it can not assign to these variables, and
will not 'see' updates to them.
This code creates a closure that adds a given string to its argument,
returns it from a function, and then calls it:
use std;
Here the `vec::map()` is the standard higher-order map function, which
applies the closure to each item in the vector and returns a new
vector containing the results.
fn mk_appender(suffix: str) -> fn@(str) -> str {
let f = fn@(s: str) -> str { s + suffix };
ret f;
}
fn main() {
let shout = mk_appender("!");
std::io::println(shout("hey ho, let's go"));
}
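
A rough present-day counterpart of `mk_appender`, as a sketch only: modern Rust has no `fn@` type, so the closure is returned as `impl Fn` and the `move` keyword makes it take ownership of `suffix`.

```rust
// Present-day Rust sketch, not the 2012 dialect shown in this diff.
// The returned closure owns `suffix`, so it can outlive this call.
fn mk_appender(suffix: String) -> impl Fn(&str) -> String {
    move |s: &str| format!("{}{}", s, suffix)
}
```

Calling `mk_appender(String::from("!"))` yields a closure that can be stored and invoked later, much like `shout` in the example above.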
### Closure compatibility
A nice property of Rust closures is that you can pass any kind of
closure (as long as the arguments and return types match) to functions
that expect a `block`. Thus, when writing a higher-order function that
wants to do nothing with its function argument beyond calling it, you
should almost always specify the type of that argument as `block`, so
that callers have the flexibility to pass whatever they want.
fn call_twice(f: block()) { f(); f(); }
call_twice({|| "I am a block"; });
call_twice(fn@() { "I am a boxed closure"; });
fn bare_function() { "I am a plain function"; }
call_twice(bare_function);
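
Present-day Rust gets similar flexibility through the closure traits instead of a `block` type. In the sketch below, which is an assumption about how one might port the example and not part of this commit, `call_twice` accepts anything callable with no arguments, including a plain function.

```rust
// Present-day Rust sketch, not the 2012 dialect shown in this diff.
// `FnMut()` is the bound that lets the closure mutate captured state.
fn call_twice<F: FnMut()>(mut f: F) {
    f();
    f();
}

// A plain function also satisfies the `FnMut()` bound.
fn bare_function() {}
```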
### Unique closures
<a name="unique"></a>
Unique closures, written `fn~` in analogy to the `~` pointer type (see
next section), hold on to things that can safely be sent between
processes. They copy the values they close over, much like boxed
closures, but they also 'own' them—meaning no other code can access
them. Unique closures mostly exist for spawning new
[tasks](task.html).
### Shorthand syntax
The syntax in the previous section was very explicit; it fully
specifies the kind of closure as well as the type of every parameter
and the return type. In practice, however, closures are often used as
parameters to functions, and all of these details can be inferred.
Therefore, we support a shorthand syntax similar to Ruby or Smalltalk
blocks, which looks as follows:
let doubled = vec::map([1, 2, 3], {|x| x*2});
Here the vertical bars after the open brace `{` indicate that this is
a closure. A list of parameters appears between the bars. The bars
must always be present: if there are no arguments, then simply write
`{||...}`.
The compact syntax used for blocks (`{|arg1, arg2| body}`) can also
be used to express boxed and unique closures in situations where the
closure style can be unambiguously derived from the context. Most
notably, when calling a higher-order function you do not have to use
the long-hand syntax for the function you're passing, since the
compiler can look at the argument type to find out what the parameter
types are.
As a further simplification, if the final parameter to a function is a
closure, the closure need not be placed within parenthesis.
Therefore, one could write
closure, the closure need not be placed within parentheses. You could,
for example, write...
let doubled = vec::map([1, 2, 3]) {|x| x*2};
This form is often easier to parse as it involves less nesting.
`vec::map` is a function in the core library that applies its last
argument to every element of a vector, producing a new vector.
Even when a closure takes no parameters, you must still write the bars
for the parameter list, as in `{|| ...}`.
## Binding
Partial application is done using the `bind` keyword in Rust.
let daynum = bind std::vec::position(_, ["mo", "tu", "we", "do",
"fr", "sa", "su"]);
let daynum = bind vec::position(_, ["mo", "tu", "we", "do",
"fr", "sa", "su"]);
Binding a function produces a boxed closure (`fn@` type) in which some
of the arguments to the bound function have already been provided.
@ -129,7 +149,7 @@ iteration constructs. For example, this one iterates over a vector
of integers backwards:
fn for_rev(v: [int], act: block(int)) {
let i = std::vec::len(v);
let i = vec::len(v);
while (i > 0u) {
i -= 1u;
act(v[i]);
@ -139,7 +159,7 @@ of integers backwards:
To run such an iteration, you could do this:
# fn for_rev(v: [int], act: block(int)) {}
for_rev([1, 2, 3], {|n| log n; });
for_rev([1, 2, 3], {|n| log(error, n); });
Making use of the shorthand where a final closure argument can be
moved outside of the parentheses permits the following, which
@ -147,41 +167,8 @@ looks quite like a normal loop:
# fn for_rev(v: [int], act: block(int)) {}
for_rev([1, 2, 3]) {|n|
log n;
log(error, n);
}
Note that, because `for_rev()` returns unit type, no semicolon is
needed when the final closure is pulled outside of the parentheses.
## Capture clauses
When creating a boxed or unique closure, the default is to copy in the
values of any closed over variables. But sometimes, particularly if a
value is large or expensive to copy, you would like to *move* the
value into the closure instead. Rust supports this via the use of a
capture clause, which lets you specify precisely whether each variable
used in the closure is copied or moved.
As an example, let's assume we had some type of unique tree type:
tag tree<T> = tree_rec<T>;
type tree_rec<T> = ~{left: option<tree>, right: option<tree>, val: T};
Now if we have a function like the following:
let some_tree: tree<T> = ...;
let some_closure = fn~() {
... use some_tree in some way ...
};
Here the variable `some_tree` is used within the closure body, so a
deep copy will be performed. This can become quite expensive if the
tree is large. If we know that `some_tree` will not be used again,
we could avoid this expense by making use of a capture clause like so:
let some_tree: tree<T> = ...;
let some_closure = fn~[move some_tree]() {
... use some_tree in some way ...
};
This is particularly useful when moving data into [child tasks](task).


@ -2,21 +2,20 @@
## Generic functions
Throughout this tutorial, I've been defining functions like `map` and
`for_rev` to take vectors of integers. It is 2011, and we no longer
expect to be defining such functions again and again for every type
they apply to. Thus, Rust allows functions and datatypes to have type
parameters.
Throughout this tutorial, I've been defining functions like `for_rev`
that act only on integers. It is 2012, and we no longer expect to be
defining such functions again and again for every type they apply to.
Thus, Rust allows functions and datatypes to have type parameters.
fn for_rev<T>(v: [T], act: block(T)) {
let i = std::vec::len(v);
let i = vec::len(v);
while i > 0u {
i -= 1u;
act(v[i]);
}
}
fn map<T, U>(f: block(T) -> U, v: [T]) -> [U] {
fn map<T, U>(v: [T], f: block(T) -> U) -> [U] {
let acc = [];
for elt in v { acc += [f(elt)]; }
ret acc;
@ -32,21 +31,21 @@ can't look inside them, but you can pass them around.
## Generic datatypes
Generic `type` and `tag` declarations follow the same pattern:
Generic `type` and `enum` declarations follow the same pattern:
type circular_buf<T> = {start: uint,
end: uint,
buf: [mutable T]};
tag option<T> { some(T); none; }
enum option<T> { some(T); none; }
You can then declare a function to take a `circular_buf<u8>` or return
an `option<str>`, or even an `option<T>` if the function itself is
generic.
The `option` type given above exists in the standard library as
`std::option::t`, and is the way Rust programs express the thing that
in C would be a nullable pointer. The nice part is that you have to
The `option` type given above exists in the core library as
`option::t`, and is the way Rust programs express the thing that in C
would be a nullable pointer. The nice part is that you have to
explicitly unpack an `option` type, so accidental null pointer
dereferences become impossible.
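
To illustrate the explicit unpacking, here is a small sketch in present-day Rust syntax (not the dialect used above). The functions `first_even` and `describe` are hypothetical helpers made up for this example. The point is only that the result cannot be used as a number until the `None` case has been handled.

```rust
// Modern-Rust sketch: a lookup that may fail returns an Option
// rather than a nullable pointer. `first_even` and `describe`
// are hypothetical helpers.
fn first_even(v: &[i32]) -> Option<i32> {
    for &x in v {
        if x % 2 == 0 {
            return Some(x);
        }
    }
    None
}

fn describe(v: &[i32]) -> String {
    // Both cases must be covered before the value can be used.
    match first_even(v) {
        Some(n) => format!("found {}", n),
        None => String::from("nothing"),
    }
}

fn main() {
    println!("{}", describe(&[1, 3, 4]));
    println!("{}", describe(&[1, 3]));
}
```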
@ -55,17 +54,17 @@ dereferences become impossible.
Rust's type inferrer works very well with generics, but there are
programs that just can't be typed.
let n = std::option::none;
# n = std::option::some(1);
let n = option::none;
# n = option::some(1);
If you never do anything else with `n`, the compiler will not be able
to assign a type to it. (The same goes for `[]`, in fact.) If you
really want to have such a statement, you'll have to write it like
to assign a type to it. (The same goes for `[]`, the empty vector.) If
you really want to have such a statement, you'll have to write it like
this:
let n2: std::option::t<int> = std::option::none;
let n2: option::t<int> = option::none;
// or
let n = std::option::none::<int>;
let n = option::none::<int>;
Note that, in a value expression, `<` already has a meaning as a
comparison operator, so you'll have to write `::<T>` to explicitly
@ -76,7 +75,7 @@ is rarely necessary.
There are two built-in operations that, perhaps surprisingly, act on
values of any type. It was already mentioned earlier that `log` can
take any type of value and output it as a string.
take any type of value and output it.
More interesting is that Rust also defines an ordering for values of
all datatypes, and allows you to meaningfully apply comparison
@ -99,10 +98,11 @@ parameter `T`, can you copy values of that type? In Rust, you can't,
unless you explicitly declare that type parameter to have copyable
'kind'. A kind is a type of type.
## ignore
// This does not compile
fn head_bad<T>(v: [T]) -> T { v[0] }
// This does
fn head<T:copy>(v: [T]) -> T { v[0] }
fn head<T: copy>(v: [T]) -> T { v[0] }
When instantiating a generic function, you can only instantiate it
with types that fit its kinds. So you could not apply `head` to a
@ -116,12 +116,14 @@ with the `send` keyword to make them sendable.
Sendable types are a subset of copyable types. They are types that do
not contain shared (reference counted) types, which are thus uniquely
owned by the function that owns them, and can be sent over channels to
other tasks. Most of the generic functions in the `std::comm` module
other tasks. Most of the generic functions in the core `comm` module
take sendable types.
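
As a rough analogue in present-day Rust, kinds correspond to trait bounds such as `Copy` and `Send`. The sketch below uses modern syntax and modern semantics, which are an assumption relative to this tutorial: in particular, `Send` is not a subset of `Copy` in today's language. The helpers `compute_elsewhere` and `double` are hypothetical.

```rust
use std::thread;

// Only types that are `Copy` may be duplicated out of the slice.
fn head<T: Copy>(v: &[T]) -> T {
    v[0]
}

// Only types that are `Send` may be handed to another thread.
// `compute_elsewhere` is a hypothetical helper for this sketch.
fn compute_elsewhere<T: Send + 'static>(value: T, f: fn(T) -> T) -> T {
    thread::spawn(move || f(value)).join().unwrap()
}

fn double(x: i32) -> i32 {
    x * 2
}

fn main() {
    println!("{}", head(&[7, 8, 9]));
    println!("{}", compute_elsewhere(21, double));
}
```

Instantiating either function with a type that lacks the required bound is rejected at compile time, which mirrors the rule about kinds described above.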
## Generic functions and argument-passing
If you try this program:
The previous section mentioned that arguments are passed by pointer or
by value based on their type. There is one situation in which this is
difficult. If you try this program:
# fn map(f: block(int) -> int, v: [int]) {}
fn plus1(x: int) -> int { x + 1 }
@ -133,7 +135,8 @@ pointer, so `map` expects a function that takes its argument by
pointer. The `plus1` you defined, however, uses the default, efficient
way to pass integers, which is by value. To get around this issue, you
have to explicitly mark the arguments to a function that you want to
pass to a generic higher-order function as being passed by pointer:
pass to a generic higher-order function as being passed by pointer,
using the `&&` sigil:
# fn map<T, U>(f: block(T) -> U, v: [T]) {}
fn plus1(&&x: int) -> int { x + 1 }

@ -8,8 +8,6 @@ programmed in one or more other languages before. The tutorial covers
the whole language, though not with the depth and precision of the
[language reference][1].
FIXME: maybe also the stdlib?
[1]: http://www.rust-lang.org/doc/rust.html
## Disclaimer
@ -55,5 +53,5 @@ identifiers defined in the example code are displayed in `code font`.
Code snippets are indented, and also shown in a monospace font. Not
all snippets constitute whole programs. For brevity, we'll often show
fragments of programs that don't compile on their own. To try them
out, you'll have to wrap them in `fn main() { ... }`, and make sure
out, you might have to wrap them in `fn main() { ... }`, and make sure
they don't contain references to things that aren't actually defined.

@ -1,7 +1,7 @@
# Modules and crates
The Rust namespace is divided into modules. Each source file starts
with its own, empty module.
with its own module.
## Local modules
@ -15,7 +15,7 @@ explicitly import it, you must refer to it by its long name,
fn cow() -> str { "mooo" }
}
fn main() {
log_err farm::chicken();
std::io::println(farm::chicken());
}
Modules can be nested to arbitrary depth.
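
As a sketch of nesting, here is the `farm` example in present-day Rust syntax, which differs from the dialect used above. The nested `barn` module and the string returned by `chicken` are assumptions made for illustration.

```rust
// Modern-Rust sketch of nested modules. The `barn` module and the
// return value of `chicken` are made up for this example.
mod farm {
    pub fn chicken() -> &'static str {
        "cluck cluck"
    }

    pub mod barn {
        pub fn cow() -> &'static str {
            "mooo"
        }
    }
}

fn main() {
    // Items are reached through their full path unless imported.
    println!("{}", farm::chicken());
    println!("{}", farm::barn::cow());
}
```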
@ -48,7 +48,7 @@ file, compile them all together, and, depending on the presence of the
The `#[link(...)]` part provides meta information about the module,
which other crates can use to load the right module. More about that
in a moment.
later.
To have a nested directory structure for your source files, you can
nest mods in your `.rc` file:
@ -68,15 +68,14 @@ content to the `poultry` module itself.
Having compiled a crate with `--lib`, you can use it in another crate
with a `use` directive. We've already seen `use std` in several of the
examples, which loads in the standard library.
examples, which loads in the [standard library][std].
[std]: http://doc.rust-lang.org/doc/std/index/General.html
`use` directives can appear in a crate file, or at the top level of a
single-file `.rs` crate. They will cause the compiler to search its
library search path (which you can extend with the `-L` switch) for a Rust
crate library with the right name. This name is deduced from the crate
name in a platform-dependent way. The `farm` library will be called
`farm.dll` on Windows, `libfarm.so` on Linux, and `libfarm.dylib` on
OS X.
crate library with the right name.
It is possible to provide more specific information when using an
external crate.
@ -100,6 +99,16 @@ The version does not match the one provided in the `use` directive, so
unless the compiler can find another crate with the right version
somewhere, it will complain that no matching crate was found.
## The core library
A set of basic library routines, mostly related to built-in datatypes
and the task system, are always implicitly linked and included in any
Rust program, unless the `--no-core` compiler switch is given.
This library is documented [here][core].
[core]: http://doc.rust-lang.org/doc/core/index/General.html
## A minimal example
Now for something that you can actually compile yourself. We have
@ -112,7 +121,7 @@ these two files:
## ignore
// main.rs
use mylib;
fn main() { log_err "hello " + mylib::world(); }
fn main() { std::io::println("hello " + mylib::world()); }
Now compile and run like this (adjust to your platform if necessary):
@ -127,7 +136,7 @@ Now compile and run like this (adjust to your platform if necessary):
When using identifiers from other modules, it can get tiresome to
qualify them with the full module path every time (especially when
that path is several modules deep). Rust allows you to import
identifiers at the top of a file or module.
identifiers at the top of a file, module, or block.
use std;
import std::io::println;
@ -136,12 +145,11 @@ identifiers at the top of a file or module.
}
It is also possible to import just the name of a module (`import
std::io;`, then use `io::println`), import all identifiers exported by
a given module (`import std::io::*`), or to import a specific set of
identifiers (`import std::math::{min, max, pi}`).
std::io;`, then use `io::println`), to import all identifiers exported
by a given module (`import std::io::*`), or to import a specific set
of identifiers (`import math::{min, max, pi}`).
It is also possible to rename an identifier when importing, using the
`=` operator:
You can rename an identifier when importing using the `=` operator:
import prnt = std::io::println;
@ -177,7 +185,7 @@ and one for values. This means that this code is valid:
You don't want to write things like that, but it *is* very practical
to not have to worry about name clashes between types, values, and
modules. This allows us to have a module `std::str`, for example, even
modules. This allows us to have a module `core::str`, for example, even
though `str` is a built-in type name.
## Resolution

@ -11,7 +11,7 @@ we have a file `hello.rs` containing this program:
use std;
fn main(args: [str]) {
std::io::println("hello world from " + args[0] + "!");
std::io::println("hello world from '" + args[0] + "'!");
}
If the Rust compiler was installed successfully, running `rustc
@ -39,9 +39,11 @@ live inside a function.
Rust programs can also be compiled as libraries, and included in other
programs. The `use std` directive that appears at the top of a lot of
examples imports the standard library. This is described in more
examples imports the [standard library][std]. This is described in more
detail [later on](mod.html).
[std]: http://doc.rust-lang.org/doc/std/index/General.html
## Editing Rust code
There are Vim highlighting and indentation scripts in the Rust source

@ -139,7 +139,7 @@ The basic types are written like this:
: Nil, the type that has only a single value.
`bool`
: Boolean type..
: Boolean type, with values `true` and `false`.
`int`
: A machine-pointer-sized integer.
@ -177,7 +177,7 @@ more detail later on (the `T`s here stand for any other type):
`(T1, T2)`
: Tuple type. Any arity above 1 is supported.
`{fname1: T1, fname2: T2}`
`{field1: T1, field2: T2}`
: Record type.
`fn(arg1: T1, arg2: T2) -> T3`, `lambda()`, `block()`
@ -186,9 +186,6 @@ more detail later on (the `T`s here stand for any other type):
`@T`, `~T`, `*T`
: Pointer types.
`obj { fn method1() }`
: Object type.
Types can be given names with `type` declarations:
type monster_size = uint;
@ -196,10 +193,10 @@ Types can be given names with `type` declarations:
This will provide a synonym, `monster_size`, for unsigned integers. It
will not actually create a new type—`monster_size` and `uint` can be
used interchangeably, and using one where the other is expected is not
a type error. Read about [single-variant tags][svt] further on if you
a type error. Read about [single-variant enums][sve] further on if you
need to create a type name that's not just a synonym.
[svt]: data.html#single_variant_tag
[sve]: data.html#single_variant_enum
## Literals
@ -223,8 +220,8 @@ The nil literal is written just like the type: `()`. The keywords
`true` and `false` produce the boolean literals.
Character literals are written between single quotes, as in `'x'`. You
may put non-ascii characters between single quotes (your source file
should be encoded as utf-8 in that case). Rust understands a number of
may put non-ascii characters between single quotes (your source files
should be encoded as utf-8). Rust understands a number of
character escapes, using the backslash character:
`\n`
@ -308,14 +305,16 @@ simply checks whether the configuration flag is defined at all). Flags
for `target_os` and `target_arch` are set by the compiler. It is
possible to set additional flags with the `--cfg` command-line option.
Attributes always look like `#[attr]`, where `attr` can be simply a
name (as in `#[test]`, which is used by the [built-in test
framework](test.html)), a name followed by `=` and then a literal (as
in `#[license = "BSD"]`, which is a valid way to annotate a Rust
program as being released under a BSD-style license), or a name
followed by a comma-separated list of nested attributes, as in the
`cfg` example above, or in this [crate](mod.html) metadata
declaration:
Attributes are always wrapped in hash-braces (`#[attr]`). Inside the
braces, a small minilanguage is supported, whose interpretation
depends on the attribute that's being used. The simplest form is a
plain name (as in `#[test]`, which is used by the [built-in test
framework](test.html '')). A name-value pair can be provided using an `=`
character followed by a literal (as in `#[license = "BSD"]`, which is
a valid way to annotate a Rust program as being released under a
BSD-style license). Finally, you can have a name followed by a
comma-separated list of nested attributes, as in the `cfg` example
above, or in this [crate](mod.html) metadata declaration:
## ignore
#[link(name = "std",
@ -324,7 +323,7 @@ declaration:
An attribute without a semicolon following it applies to the
definition that follows it. When terminated with a semicolon, it
applies to the module or crate.
applies to the module or crate in which it appears.
## Syntax extensions

@ -1,10 +1,11 @@
# Tasks
Rust supports a system of lightweight tasks, similar to what is found
in Erlang or other actor systems. Rust tasks communicate via messages
and do not share data. However, it is possible to send data without
copying it by making use of [unique boxes][uniques] (still, the data
is owned by only one task at a time).
in Erlang or other actor systems. Rust tasks communicate via messages
and do not share data. However, it is possible to send data without
copying it by making use of [unique boxes][uniques], which allow the
sending task to release ownership of a value, so that the receiving
task can keep on using it.
[uniques]: data.html#unique-box
@ -14,8 +15,7 @@ somewhat. The tutorial documents the API as it exists today.
## Spawning a task
Spawning a task is done using the various spawn functions in the
module `task`. We will Let's begin with the simplest one, `task::spawn()`, and
later move on to the others:
module `task`. Let's begin with the simplest one, `task::spawn()`:
let some_value = 22;
let child_task = task::spawn {||
@ -23,11 +23,12 @@ later move on to the others:
std::io::println(#fmt("%d", some_value));
};
The argument to `task::spawn()` is a [unique closure](func) of type
`fn~()`, meaning that it takes no arguments and generates no return
value. The effect of `task::spawn()` is to fire up a child task that
will execute the closure in parallel with the creator. The result is
a task id, here stored into the variable `child_task`.
The argument to `task::spawn()` is a [unique
closure](func.html#unique) of type `fn~()`, meaning that it takes no
arguments and generates no return value. The effect of `task::spawn()`
is to fire up a child task that will execute the closure in parallel
with the creator. The result is a task id, here stored into the
variable `child_task`.
## Ports and channels
@ -38,6 +39,8 @@ of a particular type. A channel is used to send messages to a port.
For example, imagine we wish to perform two expensive computations
in parallel. We might write something like:
# fn some_expensive_computation() -> int { 42 }
# fn some_other_expensive_computation() {}
let port = comm::port::<int>();
let chan = comm::chan::<int>(port);
let child_task = task::spawn {||
@ -56,11 +59,15 @@ This port is where we will receive the message from the child task
once it is complete. The second line creates a channel for sending
integers to the port `port`:
# let port = comm::port::<int>();
let chan = comm::chan::<int>(port);
The channel will be used by the child to send a message to the port.
The next statement actually spawns the child:
# fn some_expensive_computation() -> int { 42 }
# let port = comm::port::<int>();
# let chan = comm::chan::<int>(port);
let child_task = task::spawn {||
let result = some_expensive_computation();
comm::send(chan, result);
@ -71,15 +78,17 @@ over the channel. Finally, the parent continues by performing
some other expensive computation and then waiting for the child's result
to arrive on the port:
# fn some_other_expensive_computation() {}
# let port = comm::port::<int>();
some_other_expensive_computation();
let result = comm::recv(port);
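
For readers who know present-day Rust, the whole exchange can be sketched with the standard library's `std::sync::mpsc` channels and `std::thread`. This is a modern analogue and not the `comm` API documented here. The function `run` is a hypothetical wrapper, and the two computation functions are stubs like the ones above.

```rust
use std::sync::mpsc;
use std::thread;

// Stubs standing in for real work.
fn some_expensive_computation() -> i32 {
    42
}
fn some_other_expensive_computation() {}

// `run` is a hypothetical wrapper so the result can be inspected.
fn run() -> i32 {
    // The sending half plays the role of the channel,
    // the receiving half the role of the port.
    let (chan, port) = mpsc::channel::<i32>();

    let child = thread::spawn(move || {
        let result = some_expensive_computation();
        chan.send(result).unwrap();
    });

    // The parent keeps working, then blocks until the message arrives.
    some_other_expensive_computation();
    let result = port.recv().unwrap();
    child.join().unwrap();
    result
}

fn main() {
    println!("{}", run());
}
```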
## Creating a task with a bi-directional communication path
A very common thing to do is to spawn a child task where the parent
and child both need to exchange messages with each other. The
function `task::spawn_connected()` supports this pattern. We'll look
briefly at how it is used.
and child both need to exchange messages with each other. The function
`task::spawn_connected()` supports this pattern. We'll look briefly at
how it is used.
To see how `spawn_connected()` works, we will create a child task
which receives `uint` messages, converts them to a string, and sends
@ -104,6 +113,8 @@ strified version of the received value, `uint::to_str(value)`.
Here is the code for the parent task:
# fn stringifier(from_par: comm::port<uint>,
# to_par: comm::chan<str>) {}
fn main() {
let t = task::spawn_connected(stringifier);
comm::send(t.to_child, 22u);
@ -125,11 +136,11 @@ here to send and receive three messages from the child task.
## Joining a task
The function `spawn_joinable()` is used to spawn a task that can later
be joined. This is implemented by having the child task send a
message when it has completed (either successfully or by failing).
Therefore, `spawn_joinable()` returns a structure containing both the
task ID and the port where this message will be sent---this structure
type is called `task::joinable_task`. The structure can be passed to
be joined. This is implemented by having the child task send a message
when it has completed (either successfully or by failing). Therefore,
`spawn_joinable()` returns a structure containing both the task ID and
the port where this message will be sent---this structure type is
called `task::joinable_task`. The structure can be passed to
`task::join()`, which simply blocks on the port, waiting to receive
the message from the child task.
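
A comparable pattern in present-day Rust, shown only as an analogue of `spawn_joinable()` and `task::join()`, uses the handle returned by `std::thread::spawn`. The function `join_child` is a hypothetical helper. Joining reports whether the child finished normally or failed.

```rust
use std::thread;

// `join_child` is a hypothetical helper: it spawns a child,
// waits for it, and reports success or failure.
fn join_child(should_fail: bool) -> Result<u32, String> {
    let handle = thread::spawn(move || {
        if should_fail {
            panic!("child failed");
        }
        22u32
    });
    // `join` blocks until the child is done. It yields an error
    // if the child panicked instead of returning a value.
    handle.join().map_err(|_| String::from("child task failed"))
}

fn main() {
    println!("{:?}", join_child(false));
    println!("{:?}", join_child(true));
}
```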
@ -141,4 +152,3 @@ task fails, that failure is propagated to the parent task, which will
fail sometime later. This propagation can be disabled by using the
function `task::unsupervise()`, which disables error propagation from
the current task to its parent.

@ -18,8 +18,8 @@ Tests can be interspersed with other code, and annotated with the
}
When you compile the program normally, the `test_twice` function will
not be used. To actually run the tests, compile with the `--test`
flag:
not be included. To compile and run such tests, compile with the
`--test` flag, and then run the result:
## notrust
> rustc --test twice.rs

doc/tutorial/test.sh Normal file → Executable file

@ -13,10 +13,6 @@ h1 { font-size: 22pt; }
h2 { font-size: 17pt; }
h3 { font-size: 14pt; }
code {
color: #033;
}
pre {
margin: 1.1em 0;
padding: .4em .4em .4em 1em;