# Datatypes

Rust datatypes are, by default, immutable. The core datatypes of Rust
are structural records and 'tags' (tagged unions, algebraic data
types).

```
type point = {x: float, y: float};
tag shape {
    circle(point, float);
    rectangle(point, point);
}
let my_shape = circle({x: 0.0, y: 0.0}, 10.0);
```

## Records

Rust record types are written `{field1: TYPE, field2: TYPE [, ...]}`,
and record literals are written in the same way, but with expressions
instead of types. They are quite similar to C structs, and are even laid
out the same way in memory (so you can read from a Rust record in C,
and vice versa).

The dot operator is used to access record fields (`mypoint.x`).

Fields that you want to mutate must be explicitly marked as such. For
example:

```
type stack = {content: [int], mutable head: uint};
```

With such a type, you can do `mystack.head += 1u`. If `mutable` is
omitted from the type, such an assignment results in a type error.

To 'update' an immutable record, use functional record update syntax:
end a record literal with the keyword `with`, followed by the record
that supplies the remaining fields:

```
let oldpoint = {x: 10f, y: 20f};
let newpoint = {x: 0f with oldpoint};
assert newpoint == {x: 0f, y: 20f};
```

This creates a new record, copying all the fields from `oldpoint`
into it, except for the ones that are explicitly set in the literal.
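
For comparison, here is a rough sketch of the same ideas in present-day Rust, whose syntax differs from the dialect used in this document. Records became nominal `struct`s, there is no per-field `mutable` (mutability comes from the binding or a `&mut` borrow), and `..old` plays the role of `with old`. `Stack`, `bump_head`, and `with_x` are hypothetical names chosen for illustration.

```rust
// Present-day Rust sketch (not the 2011 syntax used above).
#[derive(Debug, Clone, Copy, PartialEq)]
struct Point {
    x: f64,
    y: f64,
}

// Hypothetical stand-in for `stack`: no field is marked mutable; the
// caller decides mutability through a `&mut` borrow.
struct Stack {
    content: Vec<i64>,
    head: usize,
}

// Increments `head` through a mutable borrow.
fn bump_head(s: &mut Stack) {
    s.head += 1;
}

// Functional update: `..old` copies every field not listed explicitly,
// much like `with oldpoint` in the older syntax.
fn with_x(old: Point, x: f64) -> Point {
    Point { x, ..old }
}
```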

Rust record types are *structural*. This means that `{x: float, y: float}`
is not just a way to define a new type, but is the actual name
of the type. Record types can be used without first defining them. If
module A defines `type point = {x: float, y: float}`, and module B,
without knowing anything about A, defines a function that returns a
value of type `{x: float, y: float}`, you can use that return value as
a `point` in module A. (Remember that `type` defines an additional name
for a type, not an actual new type.)

## Record patterns

Records can be destructured in `alt` patterns. The basic syntax is
`{fieldname: pattern, ...}`, but the pattern for a field can be
omitted as a shorthand for simply binding a variable with the same
name as the field.

```
alt mypoint {
    {x: 0f, y: y_name} { /* Provide sub-patterns for fields */ }
    {x, y} { /* Simply bind the fields */ }
}
```

The field names of a record do not have to appear in a pattern in the
same order they appear in the type. When you are not interested in all
the fields of a record, a record pattern may end with `, _` (as in
`{field1, _}`) to indicate that you're ignoring all other fields.
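
As a comparison under present-day Rust syntax: `match` replaced `alt`, and `..` replaced the trailing `, _`. This sketch uses integer fields so that the literal sub-pattern is a plain integer. The function `describe` is hypothetical.

```rust
// Present-day Rust sketch of record (struct) patterns.
#[derive(Clone, Copy)]
struct Point {
    x: i32,
    y: i32,
}

fn describe(p: Point) -> String {
    match p {
        // Sub-pattern for `x`; `y` is bound under a different name.
        Point { x: 0, y: y_name } => format!("on the y axis at {}", y_name),
        // Sub-pattern for `y`; `..` ignores the fields not mentioned.
        Point { y: 0, .. } => String::from("on the x axis"),
        // Shorthand binding of both fields, listed out of declaration order.
        Point { y, x } => format!("at ({}, {})", x, y),
    }
}
```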

## Tags

Tags [FIXME terminology] are datatypes that have several different
representations. For example, the type shown earlier:

```
tag shape {
    circle(point, float);
    rectangle(point, point);
}
```

A value of this type is either a circle, in which case it contains a
point record and a float, or a rectangle, in which case it contains
two point records. The run-time representation of such a value
includes an identifier of the actual form that it holds, much like the
'tagged union' pattern in C, but with better ergonomics.

The above declaration will define a type `shape` that can be used to
refer to such shapes, and two functions, `circle` and `rectangle`,
which can be used to construct values of the type (taking arguments of
the specified types). So `circle({x: 0f, y: 0f}, 10f)` is the way to
create a new circle.
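
In present-day Rust, `tag` is spelled `enum`, and variant constructors are namespaced under the type (`Shape::Circle`). The sketch below assumes that modern syntax. It shows that a tuple-variant constructor is still an ordinary function that can be stored in a function pointer. `circle_constructor` is a hypothetical helper.

```rust
// Present-day Rust sketch: `enum` is the current spelling of `tag`.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Point {
    x: f64,
    y: f64,
}

// Each variant is a constructor, namespaced under the type.
#[derive(Debug, PartialEq)]
enum Shape {
    Circle(Point, f64),
    Rectangle(Point, Point),
}

// Tuple-variant constructors are ordinary functions, so `Shape::Circle`
// coerces to a function pointer of type fn(Point, f64) -> Shape.
fn circle_constructor() -> fn(Point, f64) -> Shape {
    Shape::Circle
}
```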

Tag variants do not have to have parameters. This, for example, is
equivalent to an `enum` in C:

```
tag direction {
    north;
    east;
    south;
    west;
}
```

This will define `north`, `east`, `south`, and `west` as constants,
all of which have type `direction`.

<a name="single_variant_tag"></a>

There is a special case for tags with a single variant. These are used
to define new types in such a way that the new name is not just a
synonym for an existing type, but its own distinct type. If you say:

```
tag gizmo_id = int;
```

That is a shorthand for this:

```
tag gizmo_id { gizmo_id(int); }
```

Tag types like this can have their content extracted with the
dereference (`*`) unary operator:

```
let my_gizmo_id = gizmo_id(10);
let id_int: int = *my_gizmo_id;
```
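
Present-day Rust usually expresses this "distinct type wrapping another type" idea with a one-field tuple struct (often called a newtype), not a single-variant enum. The contents are read with `.0` rather than `*`. This is a comparison sketch in modern syntax, and `GizmoId` and `raw_id` are hypothetical names.

```rust
// Present-day Rust newtype: a distinct type, so a plain i64 cannot be
// passed where a GizmoId is expected (that would be a compile error).
#[derive(Debug, Clone, Copy, PartialEq)]
struct GizmoId(i64);

// Extracts the wrapped integer; `.0` takes the place of the old `*`.
fn raw_id(id: GizmoId) -> i64 {
    id.0
}
```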

## Tag patterns

For tag types with multiple variants, destructuring is the only way to
get at their contents. All variant constructors can be used as
patterns, as in this definition of `area`:

```
fn area(sh: shape) -> float {
    alt sh {
        circle(_, size) { std::math::pi * size * size }
        rectangle({x, y}, {x: x2, y: y2}) { (x2 - x) * (y2 - y) }
    }
}
```
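
The same function in present-day Rust syntax, offered as a comparison sketch: `match` with `pattern => expression` arms replaces `alt`. The constant `std::f64::consts::PI` stands in for the older `std::math::pi`.

```rust
use std::f64::consts::PI;

#[derive(Clone, Copy)]
struct Point {
    x: f64,
    y: f64,
}

enum Shape {
    Circle(Point, f64),
    Rectangle(Point, Point),
}

fn area(sh: &Shape) -> f64 {
    match *sh {
        // Ignore the center with `_`, bind the radius as `size`.
        Shape::Circle(_, size) => PI * size * size,
        // Shorthand binds the first corner's `x`/`y`; the second
        // corner's fields are renamed to `x2`/`y2`.
        Shape::Rectangle(Point { x, y }, Point { x: x2, y: y2 }) => (x2 - x) * (y2 - y),
    }
}
```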

For variants without arguments, you have to write `variantname.` (with
a dot at the end) to match them in a pattern. This is to prevent
ambiguity between matching a variant name and binding a new variable.

```
fn point_from_direction(dir: direction) -> point {
    alt dir {
        north. { {x: 0f, y: 1f} }
        east. { {x: 1f, y: 0f} }
        south. { {x: 0f, y: -1f} }
        west. { {x: -1f, y: 0f} }
    }
}
```
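
Present-day Rust has no trailing-dot syntax. Writing each variant with its full path (`Direction::North`) also keeps the pattern from being read as a fresh variable binding. The sketch below assumes modern syntax and is only a comparison.

```rust
#[derive(Debug, PartialEq)]
struct Point {
    x: f64,
    y: f64,
}

#[derive(Clone, Copy)]
enum Direction {
    North,
    East,
    South,
    West,
}

// Path patterns name the variant unambiguously; no trailing dot needed.
fn point_from_direction(dir: Direction) -> Point {
    match dir {
        Direction::North => Point { x: 0.0, y: 1.0 },
        Direction::East => Point { x: 1.0, y: 0.0 },
        Direction::South => Point { x: 0.0, y: -1.0 },
        Direction::West => Point { x: -1.0, y: 0.0 },
    }
}
```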

## Tuples

Tuples in Rust behave exactly like records, except that their fields
do not have names (and thus cannot be accessed with dot notation).
Tuples can have any arity except for 0 or 1 (though you may think of
nil, `()`, as the empty tuple if you like).

```
let mytup: (int, int, float) = (10, 20, 30.0);
alt mytup {
    (a, b, c) { log a + b + (c as int); }
}
```
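
A present-day Rust version of the tuple example, for comparison. It returns the sum instead of logging it so the result can be checked. Modern Rust also allows positional access such as `mytup.1`, which the text above predates. `sum_tuple` is a hypothetical name.

```rust
// Destructures a three-element tuple positionally, as in the `alt` above.
fn sum_tuple(mytup: (i32, i32, f64)) -> i32 {
    match mytup {
        (a, b, c) => a + b + (c as i32),
    }
}
```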

## Pointers

In contrast to many modern languages, record and tag types in Rust
are not represented as pointers to allocated memory. They are, like in
C and C++, represented directly. This means that if you write
`let x = {x: 1f, y: 1f};`, you are creating a record on the stack. If
you then copy it into a data structure, the whole record is copied, not
just a pointer.

For small records like `point`, this is usually still more efficient
than allocating memory and going through a pointer. But for big
records, or records with mutable fields, it can be useful to have a
single copy on the heap, and refer to that through a pointer.
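
The sketch below illustrates inline (by-value) representation using present-day Rust, as an assumption-labeled analogue. A two-`f64` struct occupies 16 bytes inline, and assigning it copies the whole value. A heap pointer to it (`Box<Point>`) is a single machine word. `moved_copy` and `sizes` are hypothetical helpers.

```rust
use std::mem::size_of;

// Stored inline like the `{x: float, y: float}` record; `Copy` means
// assignment duplicates all of its bytes.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Point {
    x: f64,
    y: f64,
}

// Copies the struct by value, then modifies only the copy.
fn moved_copy(p: Point) -> (Point, Point) {
    let mut q = p; // whole-struct copy, not a pointer copy
    q.x = 99.0;
    (p, q)
}

// Size of the inline struct versus a heap pointer to it.
fn sizes() -> (usize, usize) {
    (size_of::<Point>(), size_of::<Box<Point>>())
}
```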

Rust supports several types of pointers. The simplest is the unsafe
pointer, written `*TYPE`, which is a completely unchecked pointer type
used only in unsafe code (and thus very rarely in typical Rust code).
The safe pointer types are `@TYPE`, for shared, reference-counted
boxes, and `~TYPE`, for uniquely-owned pointers.

All pointer types can be dereferenced with the `*` unary operator.

### Shared boxes

Shared boxes are pointers to heap-allocated, reference-counted memory.
A cycle collector ensures that circular references do not result in
memory leaks.

Creating a shared box is done by simply applying the unary `@`
operator to an expression. The result of the expression will be boxed,
resulting in a box of the right type. For example:

```
let x = @10; // New box, refcount of 1
let y = x;   // Copy the pointer, increase refcount
// When x and y go out of scope, refcount goes to 0, box is freed
```

NOTE: We may in the future switch to garbage collection, rather than
reference counting, for shared boxes.

Shared boxes never cross task boundaries.
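
In present-day Rust, the closest analogue of `@` is `std::rc::Rc`; treat this mapping as an approximation. `Rc` is likewise confined to a single thread (it is not `Send`). Unlike the shared boxes described above, it has no cycle collector, so reference cycles leak unless they are broken with `Weak`. `refcounts` is a hypothetical helper that reports the count before and after copying the pointer.

```rust
use std::rc::Rc;

// Mirrors the `@10` example: one allocation, two handles to it.
fn refcounts() -> (usize, usize) {
    let x = Rc::new(10); // new box, refcount 1
    let before = Rc::strong_count(&x);
    let y = Rc::clone(&x); // copy the pointer, refcount 2
    let after = Rc::strong_count(&y);
    (before, after)
    // x and y are dropped here; the count reaches 0 and the value is freed
}
```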

### Unique boxes

In contrast to shared boxes, unique boxes are not reference counted.
Instead, it is statically guaranteed that only a single owner of the
box exists at any time.

```
let x = ~10;
let y <- x;
```

This is where the 'move' (`<-`) operator comes in. It is similar to
`=`, but it de-initializes its source. Thus, the unique box can move
from `x` to `y`, without violating the constraint that it only has a
single owner.

NOTE: If you do `y = x` instead, the box will be copied. We should
emit a warning for this, or disallow it entirely, but do not currently
do so.

Unique boxes, when they do not contain any shared boxes, can be sent
to other tasks. The sending task will give up ownership of the box,
and won't be able to access it afterwards. The receiving task will
become the sole owner of the box.
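
In present-day Rust, `~T` corresponds roughly to `Box<T>`, and plain `=` already moves a box instead of copying it. Sending to a task is approximated here with an OS thread from `std::thread`, which is an assumption about the nearest modern equivalent. `move_box` and `send_box` are hypothetical helpers.

```rust
use std::thread;

// Assignment moves ownership; using `x` after the move is a compile error.
fn move_box() -> i32 {
    let x = Box::new(10);
    let y = x; // ownership moves from x to y
    *y
}

// Ownership of the box moves into the spawned thread's closure; the
// sending side can no longer access it.
fn send_box() -> i32 {
    let b = Box::new(41);
    let handle = thread::spawn(move || *b + 1);
    handle.join().unwrap()
}
```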

### Mutability

All pointer types have a mutable variant, written `@mutable TYPE` or
`~mutable TYPE`. Given such a pointer, you can write to its contents
by applying an assignment operator to the dereferenced pointer:

```
fn increase_contents(pt: @mutable int) {
    *pt += 1;
}
```
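
A present-day Rust approximation, offered only as a comparison. For a shared, mutable box, one common pattern is `Rc<RefCell<T>>`, which checks borrows at run time. For a uniquely owned box, a `&mut Box<T>` is enough. The mapping to `@mutable` and `~mutable` is an assumption, and `increase_unique` is a hypothetical helper.

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Shared and mutable: any handle can change the contents.
fn increase_contents(pt: &Rc<RefCell<i32>>) {
    *pt.borrow_mut() += 1;
}

// Uniquely owned: a mutable borrow of the box suffices.
fn increase_unique(pt: &mut Box<i32>) {
    **pt += 1;
}
```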

## Vectors

Rust vectors are always heap-allocated and unique. A value of type
`[TYPE]` is represented by a pointer to a section of heap memory
containing any number of `TYPE` values.

NOTE: This uniqueness is turning out to be quite awkward in practice,
and might change.

Vector literals are enclosed in square brackets. Indexing is also done
with square brackets, and is zero-based:

```
let myvec = [true, false, true, false];
if myvec[1] { std::io::println("boom"); }
```

By default, vectors are immutable: you cannot replace their elements.
The type written as `[mutable TYPE]` is a vector with mutable
elements. Mutable vector literals are written `[mutable]` (empty) or
`[mutable 1, 2, 3]` (with elements).

Growing a vector in Rust is not as inefficient as it looks (the `+`
operator means concatenation when applied to vector types):

```
let myvec = [], i = 0;
while i < 100 {
    myvec += [i];
    i += 1;
}
```

Because a vector is unique, replacing it with a longer one (which is
what `+= [i]` does) is indistinguishable from appending to it
in-place. Vector storage grows geometrically, so the number of
reallocations is only logarithmic in the final length. As a result,
the above code generates about the same amount of copying and
reallocation as `push` implementations in most other languages.
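
Present-day Rust drops `+=` on vectors in favor of `Vec::push`. The sketch below counts how often the capacity changes while pushing `n` elements. The standard library promises amortized constant-time `push`, but the exact growth factor is an implementation detail, so the count is illustrative only. `build` is a hypothetical helper.

```rust
// Pushes 0..n and counts how many times the capacity changed, which is
// how many times the storage was (re)allocated.
fn build(n: usize) -> (Vec<usize>, usize) {
    let mut myvec = Vec::new();
    let mut reallocations = 0;
    let mut cap = myvec.capacity();
    for i in 0..n {
        myvec.push(i);
        if myvec.capacity() != cap {
            reallocations += 1;
            cap = myvec.capacity();
        }
    }
    (myvec, reallocations)
}
```

With geometric growth, `build(100)` reallocates only a handful of times rather than 100 times.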

## Strings

The `str` type in Rust is represented exactly the same way as a vector
of bytes (`[u8]`), except that it is guaranteed to have a trailing
null byte (for interoperability with C APIs).

This sequence of bytes is interpreted as a UTF-8 encoded sequence of
characters. This has the advantage that UTF-8 encoded I/O (which
should really be the goal for modern systems) is very fast, and that
strings have, for most intents and purposes, a nicely compact
representation. It has the disadvantage that you only get
constant-time access by byte, not by character.

A lot of algorithms don't need constant-time indexed access (they
iterate over all characters, which `std::str::chars` helps with), and
for those that do, many don't need actual characters, and can operate
on bytes. For algorithms that really do need to index by character,
there's the option to convert your string to a character vector (using
`std::str::to_chars`).
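
For comparison, present-day Rust strings are also UTF-8, but they are not null-terminated; C interop goes through `std::ffi::CString` instead. The sketch below contrasts byte length with character count and indexes by character after collecting into a `Vec<char>`, which is the modern counterpart of the conversion described above. `byte_and_char_counts` and `nth_char` are hypothetical helpers.

```rust
// Byte length is O(1); counting characters walks the UTF-8 data.
fn byte_and_char_counts(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

// Character indexing: convert to a Vec<char> first, then index.
fn nth_char(s: &str, n: usize) -> Option<char> {
    let chars: Vec<char> = s.chars().collect();
    chars.get(n).copied()
}
```

For example, `"h\u{e9}llo"` is 6 bytes long but contains 5 characters, because `\u{e9}` takes two bytes in UTF-8.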

Like vectors, strings are always unique. You can wrap them in a shared
box to share them. Unlike vectors, there is no mutable variant of
strings. They are always immutable.

## Resources

Resources are data types that have a destructor associated with them.

```
resource file_desc(fd: int) {
    close_file_desc(fd);
}
```

This defines a type `file_desc` and a constructor of the same name,
which takes an integer. Values of such a type cannot be copied, and
when they are destroyed (by going out of scope, or, when boxed, when
their box is cleaned up), their body runs. In the example above, this
would cause the given file descriptor to be closed.
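
Present-day Rust attaches destructors through the `Drop` trait rather than a `resource` item. A type that implements `Drop` also cannot be `Copy`. In the sketch below, `FileDesc` and `open_and_drop` are hypothetical. Instead of closing a real file descriptor, `FileDesc` records the closed `fd` in a shared cell so the effect can be observed.

```rust
use std::cell::Cell;
use std::rc::Rc;

// Hypothetical resource: remembers which fd it "closed" when dropped.
struct FileDesc {
    fd: i32,
    closed: Rc<Cell<Option<i32>>>,
}

impl Drop for FileDesc {
    // Runs automatically when the value goes out of scope.
    fn drop(&mut self) {
        self.closed.set(Some(self.fd));
    }
}

fn open_and_drop(fd: i32) -> Option<i32> {
    let closed = Rc::new(Cell::new(None));
    {
        let _desc = FileDesc { fd, closed: Rc::clone(&closed) };
        // `_desc` is still alive here, so nothing has been closed yet.
        assert_eq!(closed.get(), None);
    } // `_desc` goes out of scope here and `drop` runs
    closed.get()
}
```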

NOTE: We're considering alternative approaches for data types with
destructors. Resources might go away in the future.