rust/doc/tutorial/data.md

294 lines
10 KiB
Markdown
Raw Normal View History

# Datatypes
Rust datatypes are, by default, immutable. The core datatypes of Rust
are structural records and 'tags' (tagged unions, algebraic data
types).
type point = {x: float, y: float};
tag shape {
circle(point, float);
rectangle(point, point);
}
let my_shape = circle({x: 0.0, y: 0.0}, 10.0);
## Records
Rust record types are written `{field1: TYPE, field2: TYPE [,
...]}`, and record literals are written in the same way, but with
expressions instead of types. They are quite similar to C structs, and
even laid out the same way in memory (so you can read from a Rust
struct in C, and vice-versa).
The dot operator is used to access record fields (`mypoint.x`).
Fields that you want to mutate must be explicitly marked as such. For
example...
type stack = {content: [int], mutable head: uint};
With such a type, you can do `mystack.head += 1u`. When the `mutable`
is omitted from the type, such an assignment would result in a type
error.
To 'update' an immutable record, you use functional record update
syntax, by ending a record literal with the keyword `with`:
let oldpoint = {x: 10f, y: 20f};
let newpoint = {x: 0f with oldpoint};
assert newpoint == {x: 0f, y: 20f};
This will create a new struct, copying all the fields from `oldpoint`
into it, except for the ones that are explicitly set in the literal.
Rust record types are *structural*. This means that `{x: float, y:
float}` is not just a way to define a new type, but is the actual name
of the type. Record types can be used without first defining them. If
module A defines `type point = {x: float, y: float}`, and module B,
without knowing anything about A, defines a function that returns an
`{x: float, y: float}`, you can use that return value as a `point` in
module A. (Remember that `type` defines an additional name for a type,
not an actual new type.)
## Record patterns
Records can be destructured on in `alt` patterns. The basic syntax is
`{fieldname: pattern, ...}`, but the pattern for a field can be
omitted as a shorthand for simply binding the variable with the same
name as the field.
alt mypoint {
{x: 0f, y: y_name} { /* Provide sub-patterns for fields */ }
{x, y} { /* Simply bind the fields */ }
}
When you are not interested in all the fields of a record, a record
pattern may end with `, _` (as in `{field1, _}`) to indicate that
you're ignoring all other fields.
## Tags
Tags [FIXME terminology] are datatypes that have several different
representations. For example, the type shown earlier:
tag shape {
circle(point, float);
rectangle(point, point);
}
A value of this type is either a circle¸ in which case it contains a
point record and a float, or a rectangle, in which case it contains
two point records. The run-time representation of such a value
includes an identifier of the actual form that it holds, much like the
'tagged union' pattern in C, but with better ergonomics.
The above declaration will define a type `shape` that can be used to
refer to such shapes, and two functions, `circle` and `rectangle`,
which can be used to construct values of the type (taking arguments of
the specified types). So `circle({x: 0f, y: 0f}, 10f)` is the way to
create a new circle.
Tag variants do not have to have parameters. This, for example, is
equivalent to an `enum` in C:
tag direction {
north;
east;
south;
west;
};
This will define `north`, `east`, `south`, and `west` as constants,
all of which have type `direction`.
There is a special case for tags with a single variant. These are used
to define new types in such a way that the new name is not just a
synonym for an existing type, but its own distinct type. If you say:
tag gizmo_id = int;
That is a shorthand for this:
tag gizmo_id { gizmo_id(int); }
Tag types like this can have their content extracted with the
dereference (`*`) unary operator:
let my_gizmo_id = gizmo_id(10);
let id_int: int = *my_gizmo_id;
## Tag patterns
For tag types with multiple variants, destructuring is the only way to
get at their contents. All variant constructors can be used as
patterns, as in this definition of `area`:
fn area(sh: shape) -> float {
alt sh {
circle(_, size) { std::math::pi * size * size }
rectangle({x, y}, {x: x2, y: y2}) { (x2 - x) * (y2 - y) }
}
}
For variants without arguments, you have to write `variantname.` (with
a dot at the end) to match them in a pattern. This to prevent
ambiguity between matching a variant name and binding a new variable.
fn point_from_direction(dir: direction) -> point {
alt dir {
north. { {x: 0f, y: 1f} }
east. { {x: 1f, y: 0f} }
south. { {x: 0f, y: -1f} }
west. { {x: -1f, y: 0f} }
}
}
## Tuples
Tuples in Rust behave exactly like records, except that their fields
do not have names (and can thus not be accessed with dot notation).
Tuples can have any arity except for 0 or 1 (though you may see nil,
`()`, as the empty tuple if you like).
let mytup: (int, int, float) = (10, 20, 30.0);
alt mytup {
(a, b, c) { log a + b + (c as int); }
}
## Pointers
In contrast to a lot of modern languages, record and tag types in Rust
are not represented as pointers to allocated memory. They are, like in
C and C++, represented directly. This means that if you `let x = {x:
1f, y: 1f};`, you are creating a record on the stack. If you then copy
it into a data structure, the whole record is copied, not just a
pointer.
For small records like `point`, this is usually still more efficient
than allocating memory and going through a pointer. But for big
records, or records with mutable fields, it can be useful to have a
single copy on the heap, and refer to that through a pointer.
Rust supports several types of pointers. The simplest is the unsafe
pointer, written `*TYPE`, which is a completely unchecked pointer
type only used in unsafe code (and thus, in typical Rust code, very
rarely). The safe pointer types are `@TYPE` for shared,
reference-counted boxes, and `~TYPE`, for uniquely-owned pointers.
All pointer types can be dereferenced with the `*` unary operator.
### Shared boxes
Shared boxes are pointers to heap-allocated, reference counted memory.
A cycle collector ensures that circular references do not result in
memory leaks.
Creating a shared box is done by simply applying the binary `@`
operator to an expression. The result of the expression will be boxed,
resulting in a box of the right type. For example:
let x = @10; // New box, refcount of 1
let y = x; // Copy the pointer, increase refcount
// When x and y go out of scope, refcount goes to 0, box is freed
NOTE: We may in the future switch to garbage collection, rather than
reference counting, for shared boxes.
Shared boxes never cross task boundaries.
### Unique boxes
In contrast to shared boxes, unique boxes are not reference counted.
Instead, it is statically guaranteed that only a single owner of the
box exists at any time.
let x = ~10;
let y <- x;
This is where the 'move' (`<-`) operator comes in. It is similar to
`=`, but it de-initializes its source. Thus, the unique box can move
from `x` to `y`, without violating the constraint that it only has a
single owner.
NOTE: If you do `y = x` instead, the box will be copied. We should
emit warning for this, or disallow it entirely, but do not currently
do so.
Unique boxes, when they do not contain any shared boxes, can be sent
to other tasks. The sending task will give up ownership of the box,
and won't be able to access it afterwards. The receiving task will
become the sole owner of the box.
### Mutability
All pointer types have a mutable variant, written `@mutable TYPE` or
`~mutable TYPE`. Given such a pointer, you can write to its contents
by combining the dereference operator with a mutating action.
fn increase_contents(pt: @mutable int) {
*pt += 1;
}
## Vectors
Rust vectors are always heap-allocated and unique. A value of type
`[TYPE]` is represented by a pointer to a section of heap memory
containing any number of `TYPE` values.
NOTE: This uniqueness is turning out to be quite awkward in practice,
and might change.
Vector literals are enclosed in square brackets. Dereferencing is done
with square brackets (and zero-based):
let myvec = [true, false, true, false];
if myvec[1] { std::io::println("boom"); }
By default, vectors are immutable—you can not replace their elements.
The type written as `[mutable TYPE]` is a vector with mutable
elements. Mutable vector literals are written `[mutable]` (empty) or
`[mutable 1, 2, 3]` (with elements).
Growing a vector in Rust is not as inefficient as it looks (the `+`
operator means concatenation when applied to vector types):
let myvec = [], i = 0;
while i < 100 {
myvec += [i];
i += 1;
}
Because a vector is unique, replacing it with a longer one (which is
what `+= [i]` does) is indistinguishable from appending to it
in-place. Vector representations are optimized to grow
logarithmically, so the above code generates about the same amount of
copying and reallocation as `push` implementations in most other
languages.
## Strings
The `str` type in Rust is represented exactly the same way as a vector
of bytes (`[u8]`), except that it is guaranteed to have a trailing
null byte (for interoperability with C APIs).
This sequence of bytes is interpreted as an UTF-8 encoded sequence of
characters. This has the advantage that UTF-8 encoded I/O (which
should really be the goal for modern systems) is very fast, and that
strings have, for most intents and purposes, a nicely compact
representation. It has the disadvantage that you only get
constant-time access by byte, not by character.
A lot of algorithms don't need constant-time indexed access (they
iterate over all characters, which `std::str::chars` helps with), and
for those that do, many don't need actual characters, and can operate
on bytes. For algorithms that do really need to index by character,
there's the option to convert your string to a character vector (using
`std::str::to_chars`).
Like vectors, strings are always unique. You can wrap them in a shared
box to share them. Unlike vectors, there is no mutable variant of
strings. They are always immutable.
## Resources
FIXME fill this in