\input texinfo @c -*-texinfo-*- @c %**start of header @setfilename rust.info @settitle Rust Documentation @setchapternewpage odd @c %**end of header @include version.texi @ifinfo This manual is for the ``Rust'' programming language. @uref{http://www.rust-lang.org} Version: @gitversion Copyright 2006-2010 Graydon Hoare Copyright 2009-2011 Mozilla Foundation See accompanying LICENSE.txt for terms. @end ifinfo @dircategory Programming @direntry * rust: (rust). Rust programming language @end direntry @titlepage @title Rust @subtitle A safe, concurrent, practical language. @author Graydon Hoare @author Mozilla Foundation @page @vskip 0pt plus 1filll @uref{http://rust-lang.org} Version: @gitversion @sp 2 Copyright @copyright{} 2006-2010 Graydon Hoare Copyright @copyright{} 2009-2011 Mozilla Foundation See accompanying LICENSE.txt for terms. @end titlepage @everyfooting @| @emph{-- Draft @today --} @| @ifnottex @node Top @top Top Rust Documentation @end ifnottex @menu * Disclaimer:: Notes on a work in progress. * Introduction:: Background, intentions, lineage. * Tutorial:: Gentle introduction to reading Rust code. * Reference:: Systematic reference of language elements. * Index:: Index @end menu @ifnottex Complete table of contents @end ifnottex @contents @c ############################################################ @c Disclaimer @c ############################################################ @node Disclaimer @chapter Disclaimer To the reader, Rust is a work in progress. The language continues to evolve as the design shifts and is fleshed out in working code. Certain parts work, certain parts do not, certain parts will be removed or changed. This manual is a snapshot written in the present tense. Some features described do not yet exist in working code. Some may be temporary. It is a @emph{draft}, and we ask that you not take anything you read here as either definitive or final. 
The manual is to help you get a sense of the language and its organization, not to serve as a complete specification. At least not yet. If you have suggestions to make, please try to focus them on @emph{reductions} to the language: possible features that can be combined or omitted. At this point, every ``additive'' feature we're likely to support is already on the table. The task ahead involves combining, trimming, and implementing. @c ############################################################ @c Introduction @c ############################################################ @node Introduction @chapter Introduction @quotation We have to fight chaos, and the most effective way of doing that is to prevent its emergence. @flushright - Edsger Dijkstra @end flushright @end quotation @sp 2 Rust is a curly-brace, block-structured expression language. It visually resembles the C language family, but differs significantly in syntactic and semantic details. Its design is oriented toward concerns of ``programming in the large'', that is, of creating and maintaining @emph{boundaries} -- both abstract and operational -- that preserve large-system @emph{integrity}, @emph{availability} and @emph{concurrency}. It supports a mixture of imperative procedural, concurrent actor, object-oriented and pure functional styles. Rust also supports generic programming and metaprogramming, in both static and dynamic styles. @menu * Goals:: Intentions, motivations. * Sales Pitch:: A summary for the impatient. * Influences:: Relationship to past languages. @end menu @node Goals @section Goals The language design pursues the following goals: @sp 1 @itemize @item Compile-time error detection and prevention. @item Run-time fault tolerance and containment. @item System building, analysis and maintenance affordances. @item Clarity and precision of expression. @item Implementation simplicity. @item Run-time efficiency. @item High concurrency. 
@end itemize @sp 1 Note that most of these goals are @emph{engineering} goals, not showcases for sophisticated language technology. Most of the technology in Rust is @emph{old} and has been seen decades earlier in other languages. All new languages are developed in a technological context. Rust's goals arise from the context of writing large programs that interact with the internet -- both servers and clients -- and are thus much more concerned with @emph{safety} and @emph{concurrency} than older generations of program. Our experience is that these two forces do not conflict; rather they drive system design decisions toward extensive use of @emph{partitioning} and @emph{statelessness}. Rust aims to make these a more natural part of writing programs, within the niche of lower-level, practical, resource-conscious languages. @page @node Sales Pitch @section Sales Pitch The following comprises a brief ``sales pitch'' overview of the salient features of Rust, relative to other languages. @itemize @sp 1 @item No @code{null} pointers The initialization state of every slot is statically computed as part of the typestate system (see below), and requires that all slots are initialized before use. There is no @code{null} value; uninitialized slots are uninitialized and can only be written to, not read. The common use for @code{null} in other languages -- as a sentinel value -- is subsumed into the more general facility of disjoint union types. A program must explicitly model its use of such types. @sp 1 @item Lightweight tasks with no shared values Like many @emph{actor} languages, Rust provides an isolation (and concurrency) model based on lightweight tasks scheduled by the language runtime. These tasks are very inexpensive and statically unable to manipulate one another's local memory. Breaking the rule of task isolation is possible only by calling external (C/C++) code. 
Inter-task communication is typed, asynchronous, and simplex, based on passing messages over channels to ports. @sp 1 @item Predictable native code, simple runtime The meaning and cost of every operation within a Rust program is intended to be easy to model for the reader. The code should not ``surprise'' the programmer once it has been compiled. Rust compiles to native code. Rust compilation units are large and the compilation model is designed around multi-file, whole-library or whole-program optimization. The compiled units are standard loadable objects (ELF, PE, Mach-O) containing standard debug information (DWARF) and are compatible with existing, standard low-level tools (disassemblers, debuggers, profilers, dynamic loaders). The compiled units include custom metadata that carries full type and version information. The Rust runtime library is a small collection of support code for scheduling, memory management, inter-task communication, reflection and runtime linkage. This library is written in standard C++ and is quite straightforward. It presents a simple interface to embeddings. No research-level virtual machine, JIT or garbage collection technology is required. It should be relatively easy to adapt a Rust front-end on to many existing native toolchains. @sp 1 @item Integrated system-construction facility The units of compilation of Rust are multi-file amalgamations called @emph{crates}. A crate is described by a separate, declarative type of source file that guides the compilation of the crate, its packaging, its versioning, and its external dependencies. Crates are also the units of distribution and loading. Significantly: the dependency graph of crates is @emph{acyclic} and @emph{anonymous}: there is no global namespace for crates, and module-level recursion cannot cross crate barriers. Unlike many languages, individual modules do @emph{not} carry all the mechanisms or restrictions of crates. Modules and crates serve different roles. 
@sp 1 @item Static control over memory allocation, packing and aliasing. Many values in Rust are allocated @emph{within} their containing stack-frame or parent structure. Numbers, records, tuples and tags are all allocated this way. To allocate such values in the heap, they must be explicitly @emph{boxed}. A @dfn{box} is a pointer to a heap allocation that holds another value, its @emph{content}. Boxes may be either shared or unique, depending on which sort of storage management is desired. Boxing and unboxing in Rust is explicit, though in some cases (such as name-component dereferencing) Rust will automatically dereference a box to access its content. Box values can be passed and assigned independently, like pointers in C; the difference is that in Rust they always point to live contents, and are not subject to pointer arithmetic. In addition to boxes, Rust supports a kind of pass-by-pointer slot called a reference. Forming or releasing a reference does not perform reference-count operations; references can only be formed on values that will provably outlive the reference. References are not ``general values'', in the sense that they cannot be independently manipulated. They are a lot like C++'s references, except that they are safe: the compiler ensures that they always point to live values. In addition, every slot (stack-local allocation or reference) has a static initialization state that is calculated by the typestate system. This permits late initialization of slots in functions with complex control-flow, while still guaranteeing that every use of a slot occurs after it has been initialized. @sp 1 @item Immutable data by default All types in Rust are immutable by default. A field within a type must be declared as @code{mutable} in order to be modified. @sp 1 @item Move semantics and unique pointers Rust differentiates copying values from moving them, and permits moving and swapping values explicitly rather than copying. 
Moving can be more efficient and, crucially, represents an indivisible transfer of ownership of a value from its source to its destination. In addition, pointer types in Rust come in several varieties. One important type of pointer related to move semantics is the @emph{unique} pointer, denoted @code{~}, which is statically guaranteed to be the only pointer pointing to its referent at any given time. Combining move-semantics and unique pointers, Rust permits a very lightweight form of inter-task communication: values are sent between tasks by moving, and only types composed of unique pointers can be sent. This statically ensures there can never be sharing of data between tasks, while keeping the costs of transferring data between tasks as cheap as moving a pointer. @sp 1 @item Stack-based iterators Rust provides a type of function-like multiple-invocation iterator that is very efficient: the iterator state lives only on the stack and is tightly coupled to the loop that invoked it. @sp 1 @item Direct interface to C code Rust can load and call many C library functions simply by declaring them. Calling a C function is an ``unsafe'' action, and can only be taken within a block marked with the @code{unsafe} keyword. Every unsafe block in a Rust compilation unit must be explicitly authorized in the crate file. @sp 1 @item Structural algebraic data types The Rust type system is primarily structural, and contains the standard assortment of useful ``algebraic'' type constructors from functional languages, such as function types, tuples, record types, vectors, and nominally-tagged disjoint unions. Such values may be @emph{pattern-matched} in an @code{alt} expression. @sp 1 @item Generic code Rust supports a simple form of parametric polymorphism: functions, iterators, types and objects can be parametrized by other types. 
@sp 1 @item Argument binding Rust provides a mechanism of partially binding arguments to functions, producing new functions that accept the remaining un-bound arguments. This mechanism combines some of the features of lexical closures with some of the features of currying, in a smaller and simpler package. @sp 1 @item Local type inference To save some quantity of programmer key-pressing, Rust supports local type inference: signatures of functions, objects and iterators always require type annotation, but within the body of a function or iterator many slots can be declared without a type, and Rust will infer the slot's type from its uses. @sp 1 @item Structural object system Rust has a lightweight object system based on structural object types: there is no ``class hierarchy'' nor any concept of inheritance. Method overriding and object restriction are performed explicitly on object values, which are little more than order-insensitive records of methods sharing a common private value. @sp 1 @item Static metaprogramming (syntactic extension) Rust supports a system for syntactic extensions that can be loaded into the compiler, to implement user-defined notations, macros, program-generators and the like. These notations are @emph{marked} using a special form of bracketing, such that a reader unfamiliar with the extension can still parse the surrounding text by skipping over the bracketed ``extension text''. @sp 1 @item Idempotent failure If a task fails due to a signal, or if it evaluates the special @code{fail} expression, it enters the @emph{failing} state. A failing task unwinds its control stack, frees all of its owned resources (executing destructors) and enters the @emph{dead} state. Failure is idempotent and non-recoverable. @sp 1 @item Supervision hierarchy Rust has a system for propagating task-failures, either directly to a supervisor task, or indirectly by sending a message into a channel. 
@sp 1
@item Resource types with deterministic destruction

Rust includes a type constructor for @emph{resource} types, which have an
associated destructor and cannot be moved in memory. Resource types belong to
the kind of @emph{pinned} types, and any value that directly contains a
resource is implicitly pinned as well.

Resources can only contain types from the pinned or unique kinds of type,
which means that unlike finalizers, there is always a deterministic, top-down
order to run the destructors of a resource and its sub-resources.

@sp 1
@item Typestate system

Every storage slot in a Rust frame participates in not only a conventional
structural static type system, describing the interpretation of memory in the
slot, but also a @emph{typestate} system. The static typestates of a program
describe the set of @emph{pure, dynamic predicates} that provably hold over
some set of slots, at each point in the program's control-flow graph within
each frame. The static calculation of the typestates of a program is a
function-local dataflow problem, and handles user-defined predicates in a
similar fashion to the way the type system permits user-defined types.

A short way of thinking of this is: types statically model values,
typestates statically model @emph{assertions that hold} before and after
statements and expressions.

@end itemize

@page
@node Influences
@section Influences

@sp 2

@quotation
The essential problem that must be solved in making a fault-tolerant
software system is therefore that of fault-isolation. Different programmers
will write different modules, some modules will be correct, others will have
errors. We do not want the errors in one module to adversely affect the
behaviour of a module which does not have any errors.
@flushright
- Joe Armstrong
@end flushright
@end quotation

@sp 2

@quotation
In our approach, all data is private to some process, and processes can
only communicate through communications channels.
@emph{Security}, as used in this paper, is the property which guarantees that
processes in a system cannot affect each other except by explicit
communication. When security is absent, nothing which can be proven about a
single module in isolation can be guaranteed to hold when that module is
embedded in a system [...]
@flushright
- Robert Strom and Shaula Yemini
@end flushright
@end quotation

@sp 2

@quotation
Concurrent and applicative programming complement each other. The ability to
send messages on channels provides I/O without side effects, while the
avoidance of shared data helps keep concurrent processes from colliding.
@flushright
- Rob Pike
@end flushright
@end quotation

@sp 2
@page

Rust is not a particularly original language. It may, however, appear unusual
by contemporary standards, as its design elements are drawn from a number of
``historical'' languages that have, with a few exceptions, fallen out of
favour. Five prominent lineages contribute the most:

@itemize
@sp 1
@item
The NIL (1981) and Hermes (1990) family. These languages were developed by
Robert Strom, Shaula Yemini, David Bacon and others in their group at IBM
Watson Research Center (Yorktown Heights, NY, USA).

@sp 1
@item
The Erlang (1987) language, developed by Joe Armstrong, Robert Virding, Claes
Wikstr@"om, Mike Williams and others in their group at the Ericsson Computer
Science Laboratory (@"Alvsj@"o, Stockholm, Sweden).

@sp 1
@item
The Sather (1990) language, developed by Stephen Omohundro, Chu-Cheow Lim,
Heinz Schmidt and others in their group at The International Computer Science
Institute of the University of California, Berkeley (Berkeley, CA, USA).

@sp 1
@item
The Newsqueak (1988), Alef (1995), and Limbo (1996) family. These languages
were developed by Rob Pike, Phil Winterbottom, Sean Dorward and others in
their group at Bell Labs Computing Sciences Research Center (Murray Hill, NJ,
USA).

@sp 1
@item
The Napier (1985) and Napier88 (1988) family.
These languages were developed by Malcolm Atkinson, Ron Morrison and others in their group at the University of St. Andrews (St. Andrews, Fife, UK). @end itemize @sp 1 Additional specific influences can be seen from the following languages: @itemize @item The structural algebraic types and compilation manager of SML. @item The deterministic destructor system of C++. @end itemize @c ############################################################ @c Tutorial @c ############################################################ @node Tutorial @chapter Tutorial @emph{TODO}. @c ############################################################ @c Reference @c ############################################################ @node Reference @chapter Reference @menu * Ref.Lex:: Lexical structure. * Ref.Path:: References to items. * Ref.Gram:: Grammar. * Ref.Comp:: Compilation and component model. * Ref.Mem:: Semantic model of memory. * Ref.Task:: Semantic model of tasks. * Ref.Item:: The components of a module. * Ref.Type:: The types of values held in memory. * Ref.Typestate:: Predicates that hold at points in time. * Ref.Stmt:: Components of an executable block. * Ref.Expr:: Units of execution and evaluation. * Ref.Run:: Organization of runtime services. @end menu @node Ref.Lex @section Ref.Lex @c * Ref.Lex:: Lexical structure. @cindex Lexical structure @cindex Token The lexical structure of a Rust source file or crate file is defined in terms of Unicode character codes and character properties. Groups of Unicode character codes and characters are organized into @emph{tokens}. Tokens are defined as the longest contiguous sequence of characters within the same token type (identifier, keyword, literal, symbol), or interrupted by ignored characters. Most tokens in Rust follow rules similar to the C family. Most tokens (including whitespace, keywords, operators and structural symbols) are drawn from the ASCII-compatible range of Unicode. 
Identifiers are drawn from Unicode characters specified by the
@code{XID_Start} and @code{XID_Continue} rules given by UAX #31@footnote{Unicode
Standard Annex #31: Unicode Identifier and Pattern Syntax}. String and
character literals may include the full range of Unicode characters.

@emph{TODO: formalize this section much more}.

@menu
* Ref.Lex.Ignore::        Ignored characters.
* Ref.Lex.Ident::         Identifier tokens.
* Ref.Lex.Key::           Keyword tokens.
* Ref.Lex.Res::           Reserved tokens.
* Ref.Lex.Num::           Numeric tokens.
* Ref.Lex.Text::          String and character tokens.
* Ref.Lex.Syntax::        Syntactic extension tokens.
* Ref.Lex.Sym::           Special symbol tokens.
@end menu

@node Ref.Lex.Ignore
@subsection Ref.Lex.Ignore
@c * Ref.Lex.Ignore::        Ignored characters.

Characters considered to be @emph{whitespace} or @emph{comment} are ignored,
and are not considered as tokens. They serve only to delimit tokens. Rust is
otherwise a free-form language.

@dfn{Whitespace} is any of the following Unicode characters: U+0020 (space),
U+0009 (tab, @code{'\t'}), U+000A (LF, @code{'\n'}), U+000D (CR,
@code{'\r'}).

@dfn{Comments} are @emph{single-line comments} or @emph{multi-line comments}.

A @dfn{single-line comment} is any sequence of Unicode characters beginning
with U+002F U+002F (@code{"//"}) and extending to the next U+000A character,
@emph{excluding} cases in which such a sequence occurs within a string literal
token.

A @dfn{multi-line comment} is any sequence of Unicode characters beginning
with U+002F U+002A (@code{"/*"}) and ending with U+002A U+002F (@code{"*/"}),
@emph{excluding} cases in which such a sequence occurs within a string literal
token. Multi-line comments may be nested.

@node Ref.Lex.Ident
@subsection Ref.Lex.Ident
@c * Ref.Lex.Ident::        Identifier tokens.

@cindex Identifier token

Identifiers follow the rules given by Unicode Standard Annex #31, in the form
closed under NFKC normalization, @emph{excluding} those tokens that are
otherwise defined as keywords or reserved tokens.
@xref{Ref.Lex.Key}. @xref{Ref.Lex.Res}. That is: an identifier starts with any character having derived property @code{XID_Start} and continues with zero or more characters having derived property @code{XID_Continue}; and such an identifier is NFKC-normalized during lexing, such that all subsequent comparison of identifiers is performed on the NFKC-normalized forms. @emph{TODO: define relationship between Unicode and Rust versions}. @footnote{This identifier syntax is a superset of the identifier syntaxes of C and Java, and is modeled on Python PEP #3131, which formed the definition of identifiers in Python 3.0 and later.} @node Ref.Lex.Key @subsection Ref.Lex.Key @c * Ref.Lex.Key:: Keyword tokens. The keywords are: @cindex Keywords @sp 2 @include keywords.texi @node Ref.Lex.Res @subsection Ref.Lex.Res @c * Ref.Lex.Res:: Reserved tokens. The reserved tokens are: @cindex Reserved @sp 2 @multitable @columnfractions .15 .15 .15 .15 .15 @item @code{f16} @tab @code{f80} @tab @code{f128} @item @code{m32} @tab @code{m64} @tab @code{m128} @tab @code{dec} @end multitable @sp 2 At present these tokens have no defined meaning in the Rust language. These tokens may correspond, in some current or future implementation, to additional built-in types for decimal floating-point, extended binary and interchange floating-point formats, as defined in the IEEE 754-1985 and IEEE 754-2008 specifications. @node Ref.Lex.Num @subsection Ref.Lex.Num @c * Ref.Lex.Num:: Numeric tokens. @cindex Number token @cindex Hex token @cindex Decimal token @cindex Binary token @cindex Floating-point token @c FIXME: This discussion isn't quite right since 'f' and 'i' can be used as @c suffixes A @dfn{number literal} is either an @emph{integer literal} or a @emph{floating-point literal}. @sp 1 An @dfn{integer literal} has one of three forms: @enumerate @item A @dfn{decimal literal} starts with a @emph{decimal digit} and continues with any mixture of @emph{decimal digits} and @emph{underscores}. 
@item
A @dfn{hex literal} starts with the character sequence U+0030 U+0078
(@code{"0x"}) and continues with any mixture of @emph{hex digits} and
@emph{underscores}.
@item
A @dfn{binary literal} starts with the character sequence U+0030 U+0062
(@code{"0b"}) and continues with any mixture of @emph{binary digits} and
@emph{underscores}.
@end enumerate

By default, an integer literal is of type @code{int}. An integer literal may
be followed (immediately, without any spaces) by an @dfn{integer suffix},
which changes the type of the literal. There are three kinds of integer
literal suffix:

@enumerate
@item
The @code{u} suffix gives the literal type @code{uint}.
@item
The @code{g} suffix gives the literal type @code{big}.
@item
Each of the signed and unsigned machine types @code{u8}, @code{i8},
@code{u16}, @code{i16}, @code{u32}, @code{i32}, @code{u64} and @code{i64}
gives the literal the corresponding machine type.
@end enumerate

@sp 1
A @dfn{floating-point literal} has one of two forms:

@enumerate
@item
Two @emph{decimal literals} separated by a period character U+002E ('.'),
with an optional @emph{exponent} trailing after the second @emph{decimal
literal}.
@item
A single @emph{decimal literal} followed by an @emph{exponent}.
@end enumerate

By default, a floating-point literal is of type @code{float}. A
floating-point literal may be followed (immediately, without any spaces) by a
@dfn{floating-point suffix}, which changes the type of the literal. There are
only two floating-point suffixes: @code{f32} and @code{f64}. Each of these
gives the floating-point literal the associated type, rather than
@code{float}.

A set of suffixes is also reserved to accommodate literal support for types
corresponding to reserved tokens. The reserved suffixes are @code{f16},
@code{f80}, @code{f128}, @code{m}, @code{m32}, @code{m64} and @code{m128}.
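
The integer-literal forms and suffixes described above can be checked with a
short script. The following is a minimal sketch in Python rather than Rust,
and is not part of any Rust implementation: the names @code{INTEGER_LITERAL}
and @code{integer_literal_type} are hypothetical, and the pattern assumes that
at least one digit or underscore follows a @code{0x} or @code{0b} prefix,
which the text does not state outright.

@example
import re

# Assumed pattern: one alternative per integer-literal form (hex, binary,
# decimal), followed by an optional integer suffix.
INTEGER_LITERAL = re.compile(
    r"(?P<body>0x[0-9a-fA-F_]+|0b[01_]+|[0-9][0-9_]*)"
    r"(?P<suffix>u8|i8|u16|i16|u32|i32|u64|i64|u|g)?\Z"
)


def integer_literal_type(token):
    """Return the type name of an integer literal, or None if it is not one."""
    match = INTEGER_LITERAL.match(token)
    if match is None:
        return None
    suffix = match.group("suffix")
    if suffix is None:
        return "int"    # no suffix: the default integer type
    if suffix == "u":
        return "uint"
    if suffix == "g":
        return "big"
    return suffix       # a machine-type suffix names its own type
@end example

Under these assumptions, @code{integer_literal_type("0xffu8")} returns
@code{"u8"}, and a token that matches none of the three forms, such as
@code{"12.5"}, returns @code{None}.
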
@sp 1
A @dfn{hex digit} is either a @emph{decimal digit} or else a character in the
ranges U+0061-U+0066 and U+0041-U+0046 (@code{'a'}-@code{'f'},
@code{'A'}-@code{'F'}).

A @dfn{binary digit} is either the character U+0030 or U+0031 (@code{'0'} or
@code{'1'}).

An @dfn{exponent} begins with either of the characters U+0065 or U+0045
(@code{'e'} or @code{'E'}), followed by an optional @emph{sign character},
followed by a trailing @emph{decimal literal}.

A @dfn{sign character} is either U+002B or U+002D (@code{'+'} or @code{'-'}).

Examples of integer literals of various forms:

@example
123;                               // type int
123u;                              // type uint
123_u;                             // type uint
0xff00;                            // type int
0xffu8;                            // type u8
0b1111_1111_1001_0000_i32;         // type i32
0xffff_ffff_ffff_ffff_ffff_ffffg;  // type big
@end example

Examples of floating-point literals of various forms:

@example
123.0;                             // type float
0.1;                               // type float
0.1f32;                            // type f32
12E+99_f64;                        // type f64
@end example

@node Ref.Lex.Text
@subsection Ref.Lex.Text
@c * Ref.Lex.Text::         String and character tokens.

@cindex String token
@cindex Character token
@cindex Escape sequence
@cindex Unicode

A @dfn{character literal} is a single Unicode character enclosed within two
U+0027 (single-quote) characters, with the exception of U+0027 itself, which
must be @emph{escaped} by a preceding U+005C character ('\').

A @dfn{string literal} is a sequence of any Unicode characters enclosed
within two U+0022 (double-quote) characters, with the exception of U+0022
itself, which must be @emph{escaped} by a preceding U+005C character ('\').

Some additional @emph{escapes} are available in either character or string
literals. An escape starts with a U+005C ('\') and continues with one of the
following forms:

@itemize
@item
An @dfn{8-bit codepoint escape} starts with U+0078 ('x') and is followed by
exactly two @dfn{hex digits}. It denotes the Unicode codepoint equal to the
provided hex value.
@item A @dfn{16-bit codepoint escape} starts with U+0075 ('u') and is followed by exactly four @dfn{hex digits}. It denotes the Unicode codepoint equal to the provided hex value. @item A @dfn{32-bit codepoint escape} starts with U+0055 ('U') and is followed by exactly eight @dfn{hex digits}. It denotes the Unicode codepoint equal to the provided hex value. @item A @dfn{whitespace escape} is one of the characters U+006E, U+0072, or U+0074, denoting the unicode values U+000A (LF), U+000D (CR) or U+0009 (HT) respectively. @item The @dfn{backslash escape} is the character U+005C ('\') which must be escaped in order to denote @emph{itself}. @end itemize @node Ref.Lex.Syntax @subsection Ref.Lex.Syntax @c * Ref.Lex.Syntax:: Syntactic extension tokens. Syntactic extensions are marked with the @emph{pound} sigil U+0023 (@code{#}), followed by an identifier, one of @code{fmt}, @code{env}, @code{concat_idents}, @code{ident_to_str}, @code{log_syntax}, @code{macro}, or the name of a user-defined macro. This is followed by a vector literal. (Its value will be interpreted syntactically; in particular, it need not be well-typed.) @emph{TODO: formalize those terms more}. @node Ref.Lex.Sym @subsection Ref.Lex.Sym @c * Ref.Lex.Sym:: Special symbol tokens. 
@cindex Symbol
@cindex Operator

The special symbols are:

@sp 2

@multitable @columnfractions .1 .1 .1 .1 .1 .1
@item @code{@@} @tab @code{_}
@item @code{#} @tab @code{:} @tab @code{.} @tab @code{;} @tab @code{,}
@item @code{[} @tab @code{]} @tab @code{@{} @tab @code{@}} @tab @code{(} @tab @code{)}
@item @code{=} @tab @code{<-} @tab @code{<->} @tab @code{->}
@item @code{+} @tab @code{++} @tab @code{+=} @tab @code{-} @tab @code{--} @tab @code{-=}
@item @code{*} @tab @code{/} @tab @code{%} @tab @code{*=} @tab @code{/=} @tab @code{%=}
@item @code{&} @tab @code{|} @tab @code{!} @tab @code{~} @tab @code{^}
@item @code{&=} @tab @code{|=} @tab @code{^=} @tab @code{!=}
@item @code{>>} @tab @code{>>>} @tab @code{<<} @tab @code{<<=} @tab @code{>>=} @tab @code{>>>=}
@item @code{<} @tab @code{<=} @tab @code{==} @tab @code{>=} @tab @code{>}
@item @code{&&} @tab @code{||}
@end multitable

@page
@node Ref.Path
@section Ref.Path
@c * Ref.Path::               References to items.

@cindex Names of items or slots
@cindex Path name
@cindex Type parameters

A @dfn{path} is a sequence of one or more path components separated by a
namespace qualifier (@code{::}). If a path consists of only one component, it
may refer to either an item or a slot in a local control scope.
@xref{Ref.Mem.Slot}. @xref{Ref.Item}. If a path has multiple components, it
refers to an item.

Every item has a @emph{canonical path} within its crate, but the path naming
an item is only meaningful within a given crate. There is no global namespace
across crates; an item's canonical path merely identifies it within the
crate. @xref{Ref.Comp.Crate}.

Path components are usually identifiers. @xref{Ref.Lex.Ident}. The last
component of a path may also have trailing explicit type arguments.
Two examples of simple paths consisting of only identifier components:

@example
x;
x::y::z;
@end example

In most contexts, the Rust grammar accepts a general @emph{path}, but
subsequent passes may restrict paths occurring in various contexts to refer
to slots or items, depending on the semantics of the occurrence. In other
words: in some contexts a slot is required (for example, on the left-hand
side of the copy operator, @pxref{Ref.Expr.Copy}) and in other contexts an
item is required (for example, as a type parameter, @pxref{Ref.Item}). In no
case is the grammar made ambiguous by accepting a general path and
interpreting the reference in later passes. @xref{Ref.Gram}.

An example of a path with type parameters:

@example
m::map<int,str>;
@end example

@page
@node Ref.Gram
@section Ref.Gram
@c * Ref.Gram::                Grammar.

@emph{TODO: mostly LL(1), it reads like C++, Alef and bits of Napier;
formalize here}.

@page
@node Ref.Comp
@section Ref.Comp
@c * Ref.Comp::               Compilation and component model.

@cindex Compilation model

Rust is a @emph{compiled} language. Its semantics are divided along a
@emph{phase distinction} between compile-time and run-time. Those semantic
rules that have a @emph{static interpretation} govern the success or failure
of compilation. A program that fails to compile due to violation of a
compile-time rule has no defined semantics at run-time; the compiler should
halt with an error report, and produce no executable artifact.

The compilation model centres on artifacts called @emph{crates}. Each
compilation is directed towards a single crate in source form, and if
successful produces a single crate in executable form.

@menu
* Ref.Comp.Crate::              Units of compilation and linking.
* Ref.Comp.Attr::               Attributes of crates, modules and items.
* Ref.Comp.Syntax::             Syntax extensions.
@end menu

@node Ref.Comp.Crate
@subsection Ref.Comp.Crate
@c * Ref.Comp.Crate::               Units of compilation and linking.
@cindex Crate A @dfn{crate} is a unit of compilation and linking, as well as versioning, distribution and runtime loading. Crates are defined by @emph{crate source files}, which are a type of source file written in a special declarative language: @emph{crate language}.@footnote{A crate is somewhat analogous to an @emph{assembly} in the ECMA-335 CLI model, a @emph{library} in the SML/NJ Compilation Manager, a @emph{unit} in the Owens and Flatt module system, or a @emph{configuration} in Mesa.} A crate source file describes: @itemize @item Metadata about the crate, such as author, name, version, and copyright. @item The source-file and directory modules that make up the crate. @item Any external crates or native modules that the crate imports to its top level. @item The organization of the crate's internal namespace. @item The set of names exported from the crate. @end itemize A single crate source file may describe the compilation of a large number of Rust source files; it is compiled in its entirety, as a single indivisible unit. The compilation phase attempts to transform a single crate source file, and its referenced contents, into a single compiled crate. Crate source files and compiled crates have a 1:1 relationship. The syntactic form of a crate is a sequence of @emph{directives}, some of which have nested sub-directives. A crate defines an implicit top-level module: within this module, all members of the crate have canonical path names. @xref{Ref.Path}. The @code{mod} directives within a crate file specify sub-modules to include in the crate: these are either directory modules, corresponding to directories in the filesystem of the compilation environment, or file modules, corresponding to Rust source files. The names given to such modules in @code{mod} directives become prefixes of the paths of items defined within any included Rust source files. 
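
As a rough illustration of how module names become path prefixes, the sketch
below (in Python, not Rust) walks a hypothetical in-memory model of nested
@code{mod} directives. The tuple layout, the names @code{CRATE_MODS} and
@code{module_prefixes}, and the module names @code{foo}, @code{bar} and
@code{quux} are assumptions made for illustration; they do not describe how
the compiler represents a crate.

@example
# Hypothetical model of a crate file's mod directives:
# each entry is (module name, list of nested entries).
CRATE_MODS = [
    ("foo", []),
    ("bar", [("quux", [])]),
]


def module_prefixes(mods, parent=()):
    """Yield the path prefix contributed by each mod directive, outermost first."""
    for name, children in mods:
        path = parent + (name,)
        # Path components are joined with the namespace qualifier.
        yield "::".join(path)
        yield from module_prefixes(children, path)
@end example

Here @code{list(module_prefixes(CRATE_MODS))} evaluates to the prefixes
@code{foo}, @code{bar} and @code{bar::quux}. On this reading, an item named
@code{f} in the source file for @code{quux} would have the canonical path
@code{bar::quux::f}.
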
If a .rs file exists in the filesystem alongside the .rc crate file, then it will be used to provide the top-level module of the crate. Similarly, directory modules may be paired with .rs files of the same name as the directory to provide the code for those modules. These source files are never mentioned explicitly in the crate file; they are simply used if they are present.

The @code{use} directives within the crate specify @emph{other crates} to scan for, locate, import into the crate's module namespace during compilation, and link against at runtime. Use directives may also occur independently in Rust source files. These directives may specify loose or tight ``matching criteria'' for imported crates, depending on the preferences of the crate developer. In the simplest case, a @code{use} directive may specify only a symbolic name and leave the task of locating and binding an appropriate crate to a compile-time heuristic. In a more controlled case, a @code{use} directive may specify any metadata as matching criteria, such as a URI, an author name or version number, a checksum or even a cryptographic signature, in order to select an appropriate imported crate. @xref{Ref.Comp.Attr}.

The compiled form of a crate is a loadable and executable object file full of machine code, in a standard loadable operating-system format such as ELF, PE or Mach-O. The loadable object contains metadata, describing:
@itemize
@item Metadata required for type reflection.
@item The publicly exported module structure of the crate.
@item Any metadata about the crate, defined by attributes.
@item The crates to dynamically link with at run-time, with matching criteria derived from the same @code{use} directives that guided compile-time imports.
@end itemize

@c This might come along sometime in the future.
@c The @code{syntax} directives of a crate are similar to the @code{use}
@c directives, except they govern the syntax extension namespace (accessed
@c through the syntax-extension sigil @code{#}, @pxref{Ref.Comp.Syntax})
@c available only at compile time. A @code{syntax} directive also makes its
@c extension available to all subsequent directives in the crate file.

An example of a crate:

@example
// Linkage attributes
#[ link(name = "projx",
        vers = "2.5",
        uuid = "9cccc5d5-aceb-4af5-8285-811211826b82") ];

// Additional metadata attributes
#[ desc = "Project X",
   license = "BSD",
   author = "Jane Doe" ];

// Import a module.
use std (ver = "1.0");

// Define some modules.
#[path = "foo.rs"]
mod foo;
mod bar @{
    #[path = "quux.rs"]
    mod quux;
@}
@end example

@node Ref.Comp.Attr
@subsection Ref.Comp.Attr
@cindex Attributes

Static entities in Rust -- crates, modules and items -- may have attributes applied to them.@footnote{Attributes in Rust are modeled on Attributes in ECMA-335, C#} An attribute is a general, free-form piece of metadata that is interpreted according to name, convention, and language and compiler version. Attributes may appear as any of:

@itemize
@item A single identifier, the attribute name
@item An identifier followed by the equals sign '=' and a literal, providing a key/value pair
@item An identifier followed by a parenthesized list of sub-attribute arguments
@end itemize

Attributes are applied to an entity by placing them within a hash-list (@code{#[...]}) as either a prefix to the entity or as a semicolon-delimited declaration within the entity body.

An example of attributes:

@example
// A function marked as a unit test
#[test]
fn test_foo() @{
  ...
@}

// General metadata applied to the enclosing module or crate.
#[license = "BSD"];

// A conditionally-compiled module
#[cfg(target_os="linux")]
mod bar @{
  ...
@}
@end example

In future versions of Rust, user-provided extensions to the compiler will be able to interpret attributes.
When this facility is provided, a distinction will be made between language-reserved and user-available attributes.

At present, only the Rust compiler interprets attributes, so all attribute names are effectively reserved. Some significant attributes include:

@itemize
@item The @code{cfg} attribute, for conditional-compilation by build-configuration.
@item The @code{link} attribute, describing linkage metadata for a crate.
@item The @code{test} attribute, for marking functions as unit tests.
@end itemize

Other attributes may be added or removed during development of the language.

@node Ref.Comp.Syntax
@subsection Ref.Comp.Syntax
@c * Ref.Comp.Syntax:: Syntax extension.
@cindex Syntax extension

Rust provides a notation for @dfn{syntax extension}. The notation for invoking a syntax extension is a marked syntactic form that can appear as an expression in the body of a Rust program. @xref{Ref.Lex.Syntax}.

After parsing, a syntax-extension invocation is expanded into a Rust expression. The name of the extension determines the translation performed. In future versions of Rust, user-provided syntax extensions aside from macros will be provided via external crates.

At present, only a set of built-in syntax extensions, as well as macros introduced inline in source code using the @code{macro} extension, may be used. The current built-in syntax extensions are:

@itemize
@item @code{fmt} expands into code to produce a formatted string, similar to @code{printf} from C.
@item @code{env} expands into a string literal containing the value of the environment variable named by its argument at compile-time.
@item @code{concat_idents} expands into an identifier which is the concatenation of its arguments.
@item @code{ident_to_str} expands into a string literal containing the name of its argument (which must be a literal).
@item @code{log_syntax} causes the compiler to pretty-print its arguments.
@end itemize

Finally, @code{macro} is used to define a new macro.
A macro can abstract over second-class Rust concepts that are present in syntax. The arguments to @code{macro} are a bracketed list of pairs (two-element lists). The pairs consist of an invocation and the syntax to expand into. An example:

@example
#macro[[#apply[fn, [args, ...]], fn(args, ...)]];
@end example

In this case, the invocation @code{#apply[sum, 5, 8, 6]} expands to @code{sum(5,8,6)}. If @code{...} follows an expression (which need not be as simple as a single identifier) in the input syntax, the matcher will expect an arbitrary number of occurrences of the expression preceding it, and bind syntax to the identifiers it contains. If it follows an expression in the output syntax, it will transcribe that expression repeatedly, according to the identifiers (bound to syntax) that it contains.

The behavior of @code{...} is known as Macro By Example. It allows you to write a macro with arbitrary repetition by specifying only one case of that repetition, and following it by @code{...}, both where the repeated input is matched, and where the repeated output must be transcribed. A more sophisticated example:

@example
#macro[#zip_literals[[x, ...], [y, ...]], [[x, y], ...]];
#macro[#unzip_literals[[x, y], ...], [[x, ...], [y, ...]]];
@end example

In this case, @code{#zip_literals[[1,2,3], [1,2,3]]} expands to @code{[[1,1],[2,2],[3,3]]}, and @code{#unzip_literals[[1,1], [2,2], [3,3]]} expands to @code{[[1,2,3],[1,2,3]]}.

Macro expansion takes place outside-in: that is, @code{#unzip_literals[#zip_literals[[1,2,3],[1,2,3]]]} will fail because @code{unzip_literals} expects a list, not a macro invocation, as an argument.

@c
The macro system currently has some limitations. It's not possible to destructure anything other than vector literals (therefore, the arguments to complicated macros will tend to be an ocean of square brackets). Macro invocations and @code{...} can only appear in expression positions. Finally, macro expansion is currently unhygienic.
That is, name collisions between macro-generated and user-written code can cause unintentional capture.

@page
@node Ref.Mem
@section Ref.Mem
@c * Ref.Mem:: Semantic model of memory.
@cindex Memory model
@cindex Box
@cindex Slot

A Rust program's memory consists of a static set of @emph{items}, a set of tasks each with its own @emph{stack}, and a @emph{heap}. Immutable portions of the heap may be shared between tasks; mutable portions may not.

Allocations in the stack consist of @emph{slots}, and allocations in the heap consist of @emph{boxes}.

@menu
* Ref.Mem.Alloc:: Memory allocation model.
* Ref.Mem.Own:: Memory ownership model.
* Ref.Mem.Slot:: Stack memory model.
* Ref.Mem.Box:: Heap memory model.
@end menu

@node Ref.Mem.Alloc
@subsection Ref.Mem.Alloc
@c * Ref.Mem.Alloc:: Memory allocation model.
@cindex Item
@cindex Stack
@cindex Heap
@cindex Shared box
@cindex Task-local box

The @dfn{items} of a program are those functions, iterators, objects, modules and types that have their value calculated at compile-time and stored uniquely in the memory image of the Rust process. Items are neither dynamically allocated nor freed.

A task's @dfn{stack} consists of activation frames automatically allocated on entry to each function as the task executes. A stack allocation is reclaimed when control leaves the frame containing it.

The @dfn{heap} is a general term that describes two separate sets of boxes: shared boxes -- which may be subject to garbage collection -- and unique boxes. The lifetime of an allocation in the heap depends on the lifetime of the box values pointing to it. Since box values may themselves be passed in and out of frames, or stored in the heap, heap allocations may outlive the frame they are allocated within.

@node Ref.Mem.Own
@subsection Ref.Mem.Own
@c * Ref.Mem.Own:: Memory ownership model.
@cindex Ownership

A task owns all memory it can @emph{safely} reach through local variables, shared or unique boxes, and/or references.
Sharing memory between tasks can only be accomplished using @emph{unsafe} constructs, such as raw pointer operations or calling C code.

When a task sends a value of @emph{unique} kind over a channel, it loses ownership of the value sent and can no longer refer to it. This is statically guaranteed by the combined use of ``move semantics'' and unique kinds, within the communication system.

When a stack frame is exited, its local allocations are all released, and its references to boxes (both shared and unique) are dropped.

A shared box may (in the case of a recursive, mutable shared type) be cyclic; in this case the release of memory inside the shared structure may be deferred until task-local garbage collection can reclaim it. Code can ensure no such delayed deallocation occurs by restricting itself to unique boxes and similar unshared kinds of data.

When a task finishes, its stack is necessarily empty and it therefore has no references to any boxes; the remainder of its heap is immediately freed.

@node Ref.Mem.Slot
@subsection Ref.Mem.Slot
@c * Ref.Mem.Slot:: Stack memory model.
@cindex Stack
@cindex Slot
@cindex Local slot
@cindex Reference slot

A task's stack contains slots.

A @dfn{slot} is a component of a stack frame. A slot is either a @emph{local} or a @emph{reference}.

A @dfn{local} slot (or @emph{stack-local} allocation) holds a value directly, allocated within the stack's memory. The value is a part of the stack frame.

A @dfn{reference} references a value outside the frame. It may refer to a value allocated in another frame @emph{or} a boxed value in the heap. The reference-formation rules ensure that the referent will outlive the reference.

Local slots are always implicitly mutable.

Local slots are not initialized when allocated; an entire frame's worth of local slots is allocated at once, on frame-entry, in an uninitialized state. Subsequent statements within a function may or may not initialize the local slots.
Local slots can be used only after they have been initialized; this condition is guaranteed by the typestate system.

References are created for function arguments. If the compiler cannot prove that the referred-to value will outlive the reference, it will try to set aside a copy of that value to refer to. If this is not semantically safe (for example, if the referred-to value contains mutable fields), it will reject the program. If the compiler deems copying the value expensive, it will warn.

A function can be declared to take an argument by mutable reference. This allows the function to write to the slot that the reference refers to.

An example function that accepts a value by mutable reference:

@example
fn incr(&i: int) @{
    i = i + 1;
@}
@end example

@node Ref.Mem.Box
@subsection Ref.Mem.Box
@c * Ref.Mem.Box:: Heap memory model.
@cindex Box
@cindex Dereference operator

A @dfn{box} is a reference to a heap allocation holding another value. There are two kinds of boxes: @emph{shared boxes} and @emph{unique boxes}.

A @dfn{shared box} type or value is constructed by the prefix @emph{at} sigil @code{@@}.

A @dfn{unique box} type or value is constructed by the prefix @emph{tilde} sigil @code{~}.

Multiple shared box values can point to the same heap allocation; copying a shared box value makes a shallow copy of the pointer (optionally incrementing a reference count, if the shared box is implemented through reference-counting).

Unique box values exist in 1:1 correspondence with their heap allocation; copying a unique box value makes a deep copy of the heap allocation and produces a pointer to the new allocation.

An example of constructing one shared box type and value, and one unique box type and value:

@example
let x: @@int = @@10;
let x: ~int = ~10;
@end example

Some operations implicitly dereference boxes.
Examples of such @dfn{implicit dereference} operations are: @itemize @item arithmetic operators (@code{x + y - z}) @item field selection (@code{x.y.z}) @end itemize An example of an implicit-dereference operation performed on box values: @example let x: @@int = @@10; let y: @@int = @@12; assert (x + y == 22); @end example Other operations act on box values as single-word-sized address values. For these operations, to access the value held in the box requires an explicit dereference of the box value. Explicitly dereferencing a box is indicated with the unary @emph{star} operator @code{*}. Examples of such @dfn{explicit dereference} operations are: @itemize @item copying box values (@code{x = y}) @item passing box values to functions (@code{f(x,y)}) @end itemize An example of an explicit-dereference operation performed on box values: @example fn takes_boxed(b: @@int) @{ @} fn takes_unboxed(b: int) @{ @} fn main() @{ let x: @@int = @@10; takes_boxed(x); takes_unboxed(*x); @} @end example @page @node Ref.Task @section Ref.Task @c * Ref.Task:: Semantic model of tasks. @cindex Task @cindex Process An executing Rust program consists of a tree of tasks. A Rust @dfn{task} consists of an entry function, a stack, a set of outgoing communication channels and incoming communication ports, and ownership of some portion of the heap of a single operating-system process. Multiple Rust tasks may coexist in a single operating-system process. Execution of multiple Rust tasks in a single operating-system process may be either truly concurrent or interleaved by the runtime scheduler. Rust tasks are lightweight: each consumes less memory than an operating-system process, and switching between Rust tasks is faster than switching between operating-system processes. @menu * Ref.Task.Comm:: Inter-task communication. * Ref.Task.Life:: Task lifecycle and state transitions. * Ref.Task.Sched:: Task scheduling model. * Ref.Task.Spawn:: Library interface for making new tasks. 
* Ref.Task.Send:: Library interface for sending messages. * Ref.Task.Recv:: Library interface for receiving messages. @end menu @node Ref.Task.Comm @subsection Ref.Task.Comm @c * Ref.Task.Comm:: Inter-task communication. @cindex Communication @cindex Port @cindex Channel @cindex Message passing @cindex Send expression @cindex Receive expression With the exception of @emph{unsafe} blocks, Rust tasks are isolated from interfering with one another's memory directly. Instead of manipulating shared storage, Rust tasks communicate with one another using a typed, asynchronous, simplex message-passing system. A @dfn{port} is a communication endpoint that can @emph{receive} messages. Ports receive messages from channels. A @dfn{channel} is a communication endpoint that can @emph{send} messages. Channels send messages to ports. Each port is implicitly boxed and mutable; as such a port has a unique per-task identity and cannot be replicated or transmitted. If a port value is copied, both copies refer to the @emph{same} port. New ports can be constructed dynamically and stored in data structures. Each channel is bound to a port when the channel is constructed, so the destination port for a channel must exist before the channel itself. A channel cannot be rebound to a different port from the one it was constructed with. Channels are weak: a channel does not keep the port it is bound to alive. Ports are owned by their allocating task and cannot be sent over channels; if a task dies its ports die with it, and all channels bound to those ports no longer function. Messages sent to a channel connected to a dead port will be dropped. Channels are immutable types with meaning known to the runtime; channels can be sent over channels. Many channels can be bound to the same port, but each channel is bound to a single port. In other words, channels and ports exist in an N:1 relationship, N channels to 1 port. 
@footnote{It may help to remember nautical terminology when differentiating channels from ports. Many different waterways -- channels -- may lead to the same port.} Each port and channel can carry only one type of message. The message type is encoded as a parameter of the channel or port type. The message type of a channel is equal to the message type of the port it is bound to. The types of messages must be of @emph{unique} kind. Messages are generally sent asynchronously, with optional rate-limiting on the transmit side. A channel contains a message queue and asynchronously sending a message merely inserts it into the sending channel's queue; message receipt is the responsibility of the receiving task. Messages are sent on channels and received on ports using standard library functions. @node Ref.Task.Life @subsection Ref.Task.Life @c * Ref.Task.Life:: Task lifecycle and state transitions. @cindex Lifecycle of task @cindex Scheduling @cindex Running, task state @cindex Blocked, task state @cindex Failing, task state @cindex Dead, task state @cindex Soft failure @cindex Hard failure The @dfn{lifecycle} of a task consists of a finite set of states and events that cause transitions between the states. The lifecycle states of a task are: @itemize @item running @item blocked @item failing @item dead @end itemize A task begins its lifecycle -- once it has been spawned -- in the @emph{running} state. In this state it executes the statements of its entry function, and any functions called by the entry function. A task may transition from the @emph{running} state to the @emph{blocked} state any time it evaluates a communication expression on a port or channel that cannot be immediately completed. When the communication expression can be completed -- when a message arrives at a sender, or a queue drains sufficiently to complete a semi-synchronous send -- then the blocked task will unblock and transition back to @emph{running}. 
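
As an informal illustration, the lifecycle states and the transitions this section describes can be pictured as a small state machine. The sketch below is written in present-day Rust syntax, which differs from the dialect this manual describes. The names @code{TaskState}, @code{Event} and @code{step} are hypothetical and are not part of the runtime. Treating failure as possible from either the running or the blocked state is an assumption, and the temporary return to @emph{running} while a destructor executes during unwinding is left out.

@verbatim
```rust
// Hypothetical model of the task lifecycle; none of these names exist in
// the Rust runtime.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TaskState {
    Running,
    Blocked,
    Failing,
    Dead,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Event {
    /// A communication expression could not complete immediately.
    CommStalled,
    /// The stalled communication can now complete.
    CommReady,
    /// An un-trapped signal or a `fail` expression.
    Fail,
    /// Stack unwinding has finished.
    Unwound,
}

/// Returns the next state, or `None` when `event` is not a legal
/// transition out of `state`.
fn step(state: TaskState, event: Event) -> Option<TaskState> {
    use Event::*;
    use TaskState::*;
    match (state, event) {
        (Running, CommStalled) => Some(Blocked),
        (Blocked, CommReady) => Some(Running),
        // Assumption: failure may be entered from running or blocked.
        (Running, Fail) | (Blocked, Fail) => Some(Failing),
        (Failing, Unwound) => Some(Dead),
        // A dead task never changes state again.
        _ => None,
    }
}

fn main() {
    let mut state = TaskState::Running;
    for event in [Event::CommStalled, Event::CommReady, Event::Fail, Event::Unwound] {
        state = step(state, event).expect("illegal transition");
        println!("{:?} -> {:?}", event, state);
    }
}
```
@end verbatim

Returning @code{None} for every unlisted pair keeps the table of legal transitions in one place, which makes it easy to compare against the prose.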
A task may transition to the @emph{failing} state at any time, due to an un-trapped signal or the evaluation of a @code{fail} expression. Once @emph{failing}, a task unwinds its stack and transitions to the @emph{dead} state. Unwinding the stack of a task is done by the task itself, on its own control stack. If a value with a destructor is freed during unwinding, the code for the destructor is run, also on the task's control stack. Running the destructor code causes a temporary transition to a @emph{running} state, and allows the destructor code to cause any subsequent state transitions. The original task of unwinding and failing thereby may suspend temporarily, and may involve (recursive) unwinding of the stack of a failed destructor. Nonetheless, the outermost unwinding activity will continue until the stack is unwound and the task transitions to the @emph{dead} state. There is no way to ``recover'' from task failure. Once a task has temporarily suspended its unwinding in the @emph{failing} state, failure occurring from within this destructor results in @emph{hard} failure. The unwinding procedure of hard failure frees resources but does not execute destructors. The original (soft) failure is still resumed at the point where it was temporarily suspended. A task in the @emph{dead} state cannot transition to other states; it exists only to have its termination status inspected by other tasks, and/or to await reclamation when the last reference to it drops. @node Ref.Task.Sched @subsection Ref.Task.Sched @c * Ref.Task.Sched:: Task scheduling model. @cindex Scheduling @cindex Preemption @cindex Yielding control The currently scheduled task is given a finite @emph{time slice} in which to execute, after which it is @emph{descheduled} at a loop-edge or similar preemption point, and another task within is scheduled, pseudo-randomly. An executing task can @code{yield} control at any time, which deschedules it immediately. 
Entering any other non-executing state (blocked, dead) similarly deschedules the task.

@node Ref.Task.Spawn
@subsection Ref.Task.Spawn
@c * Ref.Task.Spawn:: Calls for creating new tasks.
@cindex Spawn expression

A call to @code{std::task::spawn}, passing a 0-argument function as its single argument, causes the runtime to construct a new task executing the passed function. The passed function is referred to as the @dfn{entry function} for the spawned task, and any captured environment it carries is moved from the spawning task to the spawned task before the spawned task begins execution.

The result of a @code{spawn} call is a @code{std::task::task} value.

An example of a @code{spawn} call:

@example
import std::task::*;
import std::comm::*;

fn helper(c: chan) @{
    // do some work.
    let result = ...;
    send(c, result);
@}

let p: port;

spawn(bind helper(chan(p)));
// let task run, do other things.
// ...
let result = recv(p);
@end example

@node Ref.Task.Send
@subsection Ref.Task.Send
@c * Ref.Task.Send:: Calls for sending a value into a channel.
@cindex Send call
@cindex Messages
@cindex Communication

Sending a value into a channel is done by a library call to @code{std::comm::send}, which takes a channel and a value to send, and moves the value into the channel's outgoing buffer.

An example of a send:

@example
import std::comm::*;
let c: chan<str> = @dots{};
send(c, "hello, world");
@end example

@node Ref.Task.Recv
@subsection Ref.Task.Recv
@c * Ref.Task.Recv:: Calls for receiving a value from a channel.
@cindex Receive call
@cindex Messages
@cindex Communication

Receiving a value is done by a library call to @code{std::comm::recv}, which takes a port of type @code{std::comm::port}. This call causes the receiving task to enter the @emph{blocked reading} state until a task is sending a value to the port, at which point the runtime pseudo-randomly selects a sending task and moves a value from the head of one of the task queues to the call's return value, and un-blocks the receiving task.
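
For comparison only, a rough analogue of the N:1 channel-to-port arrangement and of the blocking receive can be sketched with the @code{std::sync::mpsc} module of present-day Rust. This is not the @code{std::comm} interface described in this manual. In the sketch the @code{Receiver} stands in for a port, each cloned @code{Sender} stands in for a channel bound to that port, and operating-system threads stand in for tasks. The helper name @code{collect_messages} is hypothetical. One difference worth noting: here a send to a dropped receiver returns an error rather than silently dropping the message.

@verbatim
```rust
use std::sync::mpsc;
use std::thread;

// Hypothetical helper: spawns `n` workers that each send one message to a
// single receiver, then returns the messages in sorted order.
fn collect_messages(n: usize) -> Vec<String> {
    // `rx` stands in for the port; every clone of `tx` stands in for a
    // channel bound to that port.
    let (tx, rx) = mpsc::channel::<String>();

    let workers: Vec<_> = (0..n)
        .map(|id| {
            let tx = tx.clone();
            thread::spawn(move || {
                let msg = format!("hello from worker {}", id);
                // `send` takes `msg` by value, so the worker gives up
                // ownership of it here.
                tx.send(msg).expect("receiver was dropped");
            })
        })
        .collect();

    // Drop the original sender so the iteration below ends once every
    // worker's clone has been dropped as well.
    drop(tx);

    // Each step of the iterator blocks until a message is available and
    // stops when all senders are gone.
    let mut received: Vec<String> = rx.iter().collect();
    for worker in workers {
        worker.join().expect("worker panicked");
    }
    // Arrival order is not deterministic, so sort before returning.
    received.sort();
    received
}

fn main() {
    for msg in collect_messages(3) {
        println!("{}", msg);
    }
}
```
@end verbatim

Because the value is moved into @code{send}, any later use of @code{msg} inside the worker is rejected at compile time, which mirrors the loss of ownership on send described in the memory ownership model.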
@xref{Ref.Run.Comm}.

An example of a @emph{receive}:

@example
import std::comm::*;
let p: port<str> = @dots{};
let s: str = recv(p);
@end example

@page
@node Ref.Item
@section Ref.Item
@c * Ref.Item:: The components of a module.
@cindex Item
@cindex Type parameters
@cindex Module item

An @dfn{item} is a component of a module. Items are entirely determined at compile-time, remain constant during execution, and may reside in read-only memory.

There are five primary kinds of item: modules, functions, iterators, objects and type definitions.

All items form an implicit scope for the declaration of sub-items. In other words, within a function, object or iterator, declarations of items can (in many cases) be mixed with the statements, control blocks, and similar artifacts that otherwise compose the item body. The meaning of these scoped items is the same as if the item was declared outside the scope, except that the item's @emph{path name} within the module namespace is qualified by the name of the enclosing item. The exact locations in which sub-items may be declared are given by the grammar. @xref{Ref.Gram}.

Functions, iterators, objects and type definitions may be @emph{parametrized} by type. Type parameters are given as a comma-separated list of identifiers enclosed in angle brackets (@code{<>}), after the name of the item and before its definition. The type parameters of an item are part of the name, not the type of the item; in order to refer to the type-parametrized item, a referencing name must in general provide type arguments as a list of comma-separated types enclosed within angle brackets. In practice, the type-inference system can usually infer such argument types from context. There are no general parametric types.

@menu
* Ref.Item.Mod:: Items defining modules.
* Ref.Item.Fn:: Items defining functions.
* Ref.Item.Pred:: Items defining predicates for typestates.
* Ref.Item.Obj:: Items defining objects.
* Ref.Item.Type:: Items defining the types of values and slots.
* Ref.Item.Tag:: Items defining the constructors of a tag type. @end menu @node Ref.Item.Mod @subsection Ref.Item.Mod @c * Ref.Item.Mod:: Items defining sub-modules. @cindex Module item @cindex Importing names @cindex Exporting names @cindex Visibility control A @dfn{module item} contains declarations of other @emph{items}. The items within a module may be functions, modules, objects or types. These declarations have both static and dynamic interpretation. The purpose of a module is to organize @emph{names} and control @emph{visibility}. Modules are declared with the keyword @code{mod}. An example of a module: @example mod math @{ type complex = (f64,f64); fn sin(f64) -> f64 @{ @dots{} @} fn cos(f64) -> f64 @{ @dots{} @} fn tan(f64) -> f64 @{ @dots{} @} @dots{} @} @end example Modules may also include any number of @dfn{import and export declarations}. These declarations must precede any module item declarations within the module, and control the visibility of names both within the module and outside of it. @menu * Ref.Item.Mod.Import:: Declarations for module-local synonyms. * Ref.Item.Mod.Export:: Declarations for restricting visibility. @end menu @node Ref.Item.Mod.Import @subsubsection Ref.Item.Mod.Import @c * Ref.Item.Mod.Import:: Declarations for module-local synonyms. @cindex Importing names @cindex Visibility control An @dfn{import declaration} creates one or more local name bindings synonymous with some other name. Usually an import declaration is used to shorten the path required to refer to a module item. @emph{Note}: unlike many languages, Rust's @code{import} declarations do @emph{not} declare linkage-dependency with external crates. Linkage dependencies are independently declared with @code{use} declarations. @xref{Ref.Comp.Crate}. 
An example of imports: @example import std::math::sin; import std::option::*; import std::str::@{char_at, hash@}; fn main() @{ // Equivalent to 'log std::math::sin(1.0);' log sin(1.0); // Equivalent to 'log std::option::some(1.0);' log some(1.0); // Equivalent to 'log std::str::hash(std::str::char_at("foo"));' log hash(char_at("foo")); @} @end example @node Ref.Item.Mod.Export @subsubsection Ref.Item.Mod.Export @c * Ref.Item.Mod.Import:: Declarations for restricting visibility. @cindex Exporting names @cindex Visibility control An @dfn{export declaration} restricts the set of local declarations within a module that can be accessed from code outside the module. By default, all local declarations in a module are exported. If a module contains an export declaration, this declaration replaces the default export with the export specified. An example of an export: @example mod foo @{ export primary; fn primary() @{ helper(1, 2); helper(3, 4); @} fn helper(x: int, y: int) @{ @dots{} @} @} fn main() @{ foo::primary(); // Will compile. foo::helper(2,3) // ERROR: will not compile. @} @end example Multiple items may be exported from a single export declaration: @example mod foo @{ export primary, secondary; fn primary() @{ helper(1, 2); helper(3, 4); @} fn secondary() @{ @dots{} @} fn helper(x: int, y: int) @{ @dots{} @} @} @end example @node Ref.Item.Fn @subsection Ref.Item.Fn @c * Ref.Item.Fn:: Items defining functions. @cindex Functions @cindex Slots, function input and output A @dfn{function item} defines a sequence of statements associated with a name and a set of parameters. Functions are declared with the keyword @code{fn}. Functions declare a set of @emph{input slots} as parameters, through which the caller passes arguments into the function, and an @emph{output slot} through which the function passes results back to the caller. 
A function may also be copied into a first-class @emph{value}, in which case the value has the corresponding @emph{function type}, and can be used otherwise exactly as a function item (with a minor additional cost of calling the function, as such a call is indirect). @xref{Ref.Type.Fn}.

Every control path in a function ends with a @code{ret} or @code{be} expression or with a diverging expression (described later in this section). If a control path lacks a @code{ret} expression in source code, an implicit @code{ret} expression is appended to the end of the control path during compilation, returning the implicit @code{()} value.

An example of a function:

@example
fn add(x: int, y: int) -> int @{
    ret x + y;
@}
@end example

A special kind of function can be declared with a @code{!} character where the output slot type would normally be. For example:

@example
fn my_err(s: str) -> ! @{
    log s;
    fail;
@}
@end example

We call such functions ``diverging'' because they never return a value to the caller. Every control path in a diverging function must end with a @code{fail} or a call to another diverging function. The @code{!} annotation does @emph{not} denote a type. Rather, the result type of a diverging function is a special type called @math{\bot} (``bottom'') that unifies with any type. Rust has no syntax for @math{\bot}.

It might be necessary to declare a diverging function because, as mentioned previously, the typechecker checks that every control path in a function ends with a @code{ret}, @code{be}, or diverging expression. So, if @code{my_err} were declared without the @code{!} annotation, the following code would not typecheck:

@example
fn f(i: int) -> int @{
    if i == 42 @{
        ret 42;
    @}
    else @{
        my_err("Bad number!");
    @}
@}
@end example

The typechecker would complain that @code{f} doesn't return a value in the @code{else} branch.
Adding the @code{!} annotation to @code{my_err} tells the typechecker that the @code{else} branch of @code{f} never returns control to the caller, so @code{f} needs no explicit @code{ret} there: the requirement that @code{f} return a value whenever it returns control holds vacuously on that branch.

@node Ref.Item.Pred
@subsection Ref.Item.Pred
@c * Ref.Item.Pred:: Items defining predicates.
@cindex Predicate

Any pure boolean function is called a @emph{predicate}, and may be used as part of the static typestate system. @xref{Ref.Typestate.Constr}. A predicate declaration is identical to a function declaration, except that it is declared with the additional keyword @code{pure}. In addition, the typechecker checks the body of a predicate with a restricted set of typechecking rules. A predicate
@itemize
@item may not contain a @code{put}, @code{send}, @code{recv}, assignment, or self-call expression; and
@item may only call other pure functions, not general functions.
@end itemize

An example of a predicate:

@example
pure fn lt_42(x: int) -> bool @{
    ret (x < 42);
@}
@end example

A non-boolean function may also be declared with @code{pure fn}. This allows predicates to call non-boolean functions as long as they are pure. For example:

@example
pure fn pure_length<@@T>(ls: list<T>) -> uint @{ /* ... */ @}

pure fn nonempty_list<@@T>(ls: list<T>) -> bool @{ pure_length(ls) > 0u @}
@end example

In this example, @code{nonempty_list} is a predicate---it can be used in a typestate constraint---but the auxiliary function @code{pure_length} is not.

@emph{ToDo:} should actually define referential transparency.

The effect checking rules previously enumerated are a restricted set of typechecking rules meant to approximate the universe of observably referentially transparent Rust procedures conservatively. Sometimes, these rules are @emph{too} restrictive. Rust allows programmers to violate these rules by writing predicates that the compiler cannot prove to be referentially transparent, using an escape-hatch feature called ``unchecked blocks''.
When writing code that uses unchecked blocks, programmers should always be
aware that they have an obligation to show that the code @emph{behaves}
referentially transparently at all times, even if the compiler cannot
@emph{prove} automatically that the code is referentially transparent. In the
presence of unchecked blocks, the compiler provides no static guarantee that
the code will behave as expected at runtime. Rather, the programmer has an
independent obligation to verify the semantics of the predicates they write.

@emph{ToDo:} last two sentences are vague.

An example of a predicate that uses an unchecked block:

@example
fn pure_foldl<@@T, @@U>(ls: list<T>, u: U, f: block(&T, &U) -> U) -> U @{
    alt ls @{
      nil. @{ u @}
      cons(hd, tl) @{ f(hd, pure_foldl(*tl, f(hd, u), f)) @}
    @}
@}

pure fn pure_length<@@T>(ls: list<T>) -> uint @{
    fn count(_t: T, u: uint) -> uint @{ u + 1u @}
    unchecked @{
        pure_foldl(ls, 0u, count)
    @}
@}
@end example

Despite its name, @code{pure_foldl} is a @code{fn}, not a @code{pure fn},
because there is no way in Rust to specify that the higher-order function
argument @code{f} is a pure function. So, to use @code{pure_foldl} in a pure
list length function that a predicate could then use, we must use an
@code{unchecked} block wrapped around the call to @code{pure_foldl} in the
definition of @code{pure_length}.

@node Ref.Item.Obj
@subsection Ref.Item.Obj
@c * Ref.Item.Obj::          Items defining objects.
@cindex Objects
@cindex Object constructors

An @dfn{object item} defines the @emph{state} and @emph{methods} of a set of
@emph{object values}. Object values have object types. @xref{Ref.Type.Obj}.

An @emph{object item} declaration -- in addition to providing a scope for
state and method declarations -- implicitly declares a static function called
the @emph{object constructor}, as well as a named @emph{object type}. The name
given to the object item is resolved to a type when used in type context, or a
constructor function when used in value context (such as a call).
Example of an object item:

@example
obj counter(state: @@mutable int) @{
    fn incr() @{ *state += 1; @}
    fn get() -> int @{ ret *state; @}
@}

let c: counter = counter(@@mutable 1);

c.incr();
c.incr();
assert c.get() == 3;
@end example

Inside an object's methods, you can make @emph{self-calls} using the
@code{self} keyword.

@example
obj my_obj() @{
    fn get() -> int @{ ret 3; @}
    fn foo() -> int @{
        let c = self.get();
        ret c + 2;
    @}
@}

let o = my_obj();
assert o.foo() == 5;
@end example

Rust objects are extendable with additional methods and fields using
@emph{anonymous object} expressions. @xref{Ref.Expr.AnonObj}.

@node Ref.Item.Type
@subsection Ref.Item.Type
@c * Ref.Item.Type::          Items defining the types of values and slots.
@cindex Type definitions

A @dfn{type definition} defines a set of possible values in
memory. @xref{Ref.Type}. Type definitions are declared with the keyword
@code{type}. Every value has a single, specific type; the type-specified
aspects of a value include:

@itemize
@item Whether the value is composed of sub-values or is indivisible.
@item Whether the value represents textual or numerical information.
@item Whether the value represents integral or floating-point information.
@item The sequence of memory operations required to access the value.
@item The @emph{kind} of the type (pinned, unique or shared).
@end itemize

For example, the type @code{@{x: u8, y: u8@}} defines the set of immutable
values that are composite records, each containing two unsigned 8-bit integers
accessed through the components @code{x} and @code{y}, and laid out in memory
with the @code{x} component preceding the @code{y} component. This type is of
@emph{unique} kind, meaning that there is no shared substructure with other
types, but it can be copied and moved freely.

@node Ref.Item.Tag
@subsection Ref.Item.Tag
@c * Ref.Item.Tag::          Items defining the constructors of a tag type.
@cindex Tag types

A tag item simultaneously declares a new nominal tag type
(@pxref{Ref.Type.Tag}) as well as a set of @emph{constructors} that can be
used to create or pattern-match values of the corresponding tag type.

The constructors of a @code{tag} type may be recursive: that is, each
constructor may take an argument that refers, directly or indirectly, to the
tag type the constructor is a member of. Such recursion has restrictions:

@itemize
@item Recursive types can be introduced only through @code{tag} constructors.
@item A recursive @code{tag} item must have at least one non-recursive
constructor (in order to give the recursion a basis case).
@item The recursive argument of recursive tag constructors must be @emph{box}
values (in order to bound the in-memory size of the constructor).
@item Recursive type definitions can cross module boundaries, but not module
@emph{visibility} boundaries, nor crate boundaries (in order to simplify the
module system).
@end itemize

An example of a @code{tag} item and its use:

@example
tag animal @{
    dog;
    cat;
@}

let a: animal = dog;
a = cat;
@end example

An example of a @emph{recursive} @code{tag} item and its use:

@example
tag list<T> @{
    nil;
    cons(T, @@list<T>);
@}

let a: list<int> = cons(7, @@cons(13, @@nil));
@end example

@page
@node Ref.Type
@section Ref.Type
@cindex Types

Every slot and value in a Rust program has a type. The @dfn{type} of a
@emph{value} defines the interpretation of the memory holding it. The type of
a @emph{slot} may also include constraints. @xref{Ref.Type.Constr}.

Built-in types and type-constructors are tightly integrated into the language,
in nontrivial ways that are not possible to emulate in user-defined
types. User-defined types have limited capabilities. In addition, every
built-in type or type-constructor name is reserved as a @emph{keyword} in
Rust; they cannot be used as user-defined identifiers in any context.

@menu
* Ref.Type.Any::      An open union of every possible type.
* Ref.Type.Mach::     Machine-level types.
* Ref.Type.Int::      The machine-dependent integer types.
* Ref.Type.Float::    The machine-dependent floating-point types.
* Ref.Type.Prim::     Primitive types.
* Ref.Type.Big::      The arbitrary-precision integer type.
* Ref.Type.Text::     Strings and characters.
* Ref.Type.Rec::      Labeled products of heterogeneous types.
* Ref.Type.Tup::      Unlabeled products of heterogeneous types.
* Ref.Type.Vec::      Open products of homogeneous types.
* Ref.Type.Tag::      Disjoint unions of heterogeneous types.
* Ref.Type.Fn::       Subroutine types.
* Ref.Type.Obj::      Abstract types.
* Ref.Type.Constr::   Constrained types.
* Ref.Type.Type::     Types describing types.
@end menu

@node Ref.Type.Any
@subsection Ref.Type.Any
@cindex Any type
@cindex Dynamic type, see @i{Any type}
@cindex Alt type expression

The type @code{any} is the union of all possible Rust types. A value of type
@code{any} is represented in memory as a pair consisting of a boxed value of
some non-@code{any} type @var{T} and a reflection of the type @var{T}.

Values of type @code{any} can be used in an @code{alt type} expression, in
which the reflection is used to select a block corresponding to a particular
type extraction. @xref{Ref.Expr.Alt}.

@node Ref.Type.Mach
@subsection Ref.Type.Mach
@cindex Machine types
@cindex Floating-point types
@cindex Integer types
@cindex Word types

The machine types are the following:

@itemize
@item
The unsigned word types @code{u8}, @code{u16}, @code{u32} and @code{u64},
with values drawn from the integer intervals
@iftex
@math{[0, 2^8 - 1]},
@math{[0, 2^{16} - 1]},
@math{[0, 2^{32} - 1]} and
@math{[0, 2^{64} - 1]}
@end iftex
@ifhtml
@html
[0, 2<sup>8</sup>-1],
[0, 2<sup>16</sup>-1],
[0, 2<sup>32</sup>-1] and
[0, 2<sup>64</sup>-1]
@end html
@end ifhtml
respectively.
@item
The signed two's complement word types @code{i8}, @code{i16}, @code{i32} and
@code{i64}, with values drawn from the integer intervals
@iftex
@math{[-(2^7), 2^7 - 1]},
@math{[-(2^{15}), 2^{15} - 1]},
@math{[-(2^{31}), 2^{31} - 1]} and
@math{[-(2^{63}), 2^{63} - 1]}
@end iftex
@ifhtml
@html
[-(2<sup>7</sup>), 2<sup>7</sup>-1],
[-(2<sup>15</sup>), 2<sup>15</sup>-1],
[-(2<sup>31</sup>), 2<sup>31</sup>-1] and
[-(2<sup>63</sup>), 2<sup>63</sup>-1]
@end html
@end ifhtml
respectively.
@item
The IEEE 754-2008 @code{binary32} and @code{binary64} floating-point types:
@code{f32} and @code{f64}, respectively.
@end itemize

@node Ref.Type.Int
@subsection Ref.Type.Int
@cindex Machine-dependent types
@cindex Integer types
@cindex Word types

The Rust type @code{uint}@footnote{A Rust @code{uint} is analogous to a C99
@code{uintptr_t}.} is an unsigned integer type with target-machine-dependent
size. Its size, in bits, is equal to the number of bits required to hold any
memory address on the target machine.

The Rust type @code{int}@footnote{A Rust @code{int} is analogous to a C99
@code{intptr_t}.} is a two's complement signed integer type with
target-machine-dependent size. Its size, in bits, is equal to the size of the
Rust type @code{uint} on the same target machine.

@node Ref.Type.Float
@subsection Ref.Type.Float
@cindex Machine-dependent types
@cindex Floating-point types

The Rust type @code{float} is a machine-specific type equal to one of the
supported Rust floating-point machine types (@code{f32} or @code{f64}). It is
the largest floating-point type that is directly supported by hardware on the
target machine, or if the target machine has no floating-point hardware
support, the largest floating-point type supported by the software
floating-point library used to support the other floating-point machine types.

Note that due to the preference for hardware-supported floating-point, the
type @code{float} may not be equal to the largest @emph{supported}
floating-point type.
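
The intervals and sizes given above for the machine types and the
machine-dependent integer types can be checked mechanically. The following is
a minimal sketch in present-day Rust syntax, which differs from the dialect
described in this manual. It assumes that the modern types @code{usize} and
@code{isize} play the roles of @code{uint} and @code{int}; the function names
are hypothetical and exist only for illustration.

```rust
use std::mem::size_of;

/// Inclusive interval [0, 2^bits - 1] of an unsigned word type.
/// Computed in u128, so `bits` must be smaller than 128.
pub fn unsigned_interval(bits: u32) -> (u128, u128) {
    (0, (1u128 << bits) - 1)
}

/// Inclusive interval [-(2^(bits-1)), 2^(bits-1) - 1] of a signed
/// two's complement word type. `bits` must be between 1 and 127.
pub fn signed_interval(bits: u32) -> (i128, i128) {
    let half = 1i128 << (bits - 1);
    (-half, half - 1)
}

/// Assumption: `usize` and `isize` stand in for this manual's `uint`
/// and `int`. Both should be exactly as wide as a memory address.
pub fn address_sized_integers_agree() -> bool {
    size_of::<usize>() == size_of::<*const u8>()
        && size_of::<isize>() == size_of::<usize>()
}
```

For instance, @code{signed_interval(8)} yields the pair @code{(-128, 127)},
matching the interval listed for @code{i8}.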
@node Ref.Type.Prim
@subsection Ref.Type.Prim
@cindex Primitive types
@cindex Integer types
@cindex Floating-point types
@cindex Character type
@cindex Boolean type

The primitive types are the following:

@itemize
@item
The ``nil'' type @code{()}, having the single ``nil'' value
@code{()}.@footnote{The ``nil'' value @code{()} is @emph{not} a sentinel
``null pointer'' value for reference slots; the ``nil'' type is the implicit
return type from functions otherwise lacking a return type, and can be used in
other contexts (such as message-sending or type-parametric code) as a
zero-size type.}
@item
The boolean type @code{bool} with values @code{true} and @code{false}.
@item
The machine types.
@item
The machine-dependent integer and floating-point types.
@end itemize

@node Ref.Type.Big
@subsection Ref.Type.Big
@cindex Integer types
@cindex Big integer type

The Rust type @code{big}@footnote{A Rust @code{big} is analogous to a Lisp
bignum or a Python long integer.} is an arbitrary precision integer type that
fits in a machine word @emph{when possible} and transparently expands to a
boxed ``big integer'' allocated in the run-time heap when it overflows or
underflows outside of the range of a machine word.

A Rust @code{big} grows to accommodate extra binary digits as they are needed,
by taking extra memory from the memory budget available to each Rust task, and
should only exhaust its range due to memory exhaustion.

@node Ref.Type.Text
@subsection Ref.Type.Text
@cindex Text types
@cindex String type
@cindex Character type
@cindex Unicode
@cindex UCS-4
@cindex UTF-8

The types @code{char} and @code{str} hold textual data.

A value of type @code{char} is a Unicode character, represented as a 32-bit
unsigned word holding a UCS-4 codepoint.

A value of type @code{str} is a Unicode string, represented as a vector of
8-bit unsigned bytes holding a sequence of UTF-8 encoded codepoints.
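
The difference between the two text representations shows up when a string
contains non-ASCII characters: the number of characters and the number of
bytes then disagree. The following is a minimal sketch in present-day Rust
syntax, not the dialect described in this manual. It assumes that the modern
@code{char} and @code{str} types keep the representations described above;
the function names are hypothetical.

```rust
use std::mem::size_of;

/// Size of one `char` in bytes: a 32-bit word holding a codepoint.
pub fn char_size_in_bytes() -> usize {
    size_of::<char>()
}

/// Number of Unicode codepoints held in the string.
pub fn codepoint_count(s: &str) -> usize {
    s.chars().count()
}

/// Number of 8-bit bytes in the string's UTF-8 encoding.
pub fn utf8_byte_len(s: &str) -> usize {
    s.len()
}
```

For a string of five characters in which one is U+00E9, @code{codepoint_count}
gives 5 while @code{utf8_byte_len} gives 6, because that codepoint takes two
bytes in UTF-8.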
@node Ref.Type.Rec
@subsection Ref.Type.Rec
@cindex Record types
@cindex Structure types, see @i{Record types}

The record type-constructor forms a new heterogeneous product of
values.@footnote{The record type-constructor is analogous to the @code{struct}
type-constructor in the Algol/C family, the @emph{record} types of the ML
family, or the @emph{structure} types of the Lisp family.} Fields of a record
type are accessed by name and are arranged in memory in the order specified by
the record type.

An example of a record type and its use:

@example
type point = @{x: int, y: int@};
let p: point = @{x: 10, y: 11@};
let px: int = p.x;
@end example

@node Ref.Type.Tup
@subsection Ref.Type.Tup
@cindex Tuple types

The tuple type-constructor forms a new heterogeneous product of values similar
to the record type-constructor. The differences are as follows:

@itemize
@item tuple elements cannot be mutable, unlike record fields
@item tuple elements are not named and can be accessed only by
pattern-matching
@end itemize

Tuple types and values are denoted by listing the types or values of their
elements, respectively, in a parenthesized, comma-separated list.
Single-element tuples are not legal; all tuples have two or more values.

The members of a tuple are laid out in memory contiguously, like a record, in
the order specified by the tuple type.

An example of a tuple type and its use:

@example
type pair = (int,str);
let p: pair = (10,"hello");
let (a, b) = p;
assert (b == "hello");
@end example

@node Ref.Type.Vec
@subsection Ref.Type.Vec
@cindex Vector types
@cindex Array types, see @i{Vector types}

The vector type-constructor represents a homogeneous array of values of a
given type. A vector has a fixed size. The kind of a vector type depends on
the kind of its member type, as with other simple structural types.
An example of a vector type and its use:

@example
let v: [int] = [7, 5, 3];
let i: int = v[2];
assert (i == 3);
@end example

Vectors always @emph{allocate} a storage region sufficient to store the first
power of two worth of elements greater than or equal to the size of the
vector. This behaviour supports idiomatic in-place ``growth'' of a mutable
slot holding a vector:

@example
let v: mutable [int] = [1, 2, 3];
v += [4, 5, 6];
@end example

Normal vector concatenation causes the allocation of a fresh vector to hold
the result; in this case, however, the slot holding the vector recycles the
underlying storage in-place (since the reference-count of the underlying
storage is equal to 1).

All accessible elements of a vector are always initialized, and access to a
vector is always bounds-checked.

@node Ref.Type.Tag
@subsection Ref.Type.Tag
@cindex Tag types
@cindex Union types, see @i{Tag types}

A @emph{tag type} is a nominal, heterogeneous disjoint union
type.@footnote{The @code{tag} type is analogous to a @code{data} constructor
declaration in ML or a @emph{pick ADT} in Limbo.} A @code{tag} @emph{item}
consists of a number of @emph{constructors}, each of which is independently
named and takes an optional tuple of arguments.

Tag types cannot be denoted @emph{structurally} as types, but must be denoted
by named reference to a @emph{tag item} declaration. @xref{Ref.Item.Tag}.

@node Ref.Type.Fn
@subsection Ref.Type.Fn
@cindex Function types

The function type-constructor @code{fn} forms new function types. A function
type consists of a sequence of input slots, an optional set of input
constraints (@pxref{Ref.Typestate.Constr}) and an output
slot. @xref{Ref.Item.Fn}.

An example of a @code{fn} type:

@example
fn add(x: int, y: int) -> int @{
    ret x + y;
@}

let x: int = add(5,7);

type binop = fn(int,int) -> int;
let bo: binop = add;
x = bo(5,7);
@end example

@node Ref.Type.Obj
@subsection Ref.Type.Obj
@c * Ref.Type.Obj::                Object types.
@cindex Object types

An @dfn{object type} describes values of abstract type that carry some hidden
@emph{fields} and are accessed through a set of unordered @emph{methods}.
Every object item (@pxref{Ref.Item.Obj}) implicitly declares an object type
carrying methods with types derived from all the methods of the object item.

Object types can also be declared in isolation, independent of any object item
declaration. Such a ``plain'' object type can be used to describe an interface
that a variety of particular objects may conform to, by supporting a superset
of the methods.

The kind of an object type serves as a restriction to the kinds of fields that
may be stored in it. Unique objects, for example, can only carry unique values
in their fields.

An example of an object type with two separate object items supporting it, and
a client function using both items via the object type:

@example
type taker =
    obj @{
        fn take(int);
    @};

obj adder(x: @@mutable int) @{
    fn take(y: int) @{
        *x += y;
    @}
@}

obj sender(c: chan<int>) @{
    fn take(z: int) @{
        std::comm::send(c, z);
    @}
@}

fn give_ints(t: taker) @{
    t.take(1);
    t.take(2);
    t.take(3);
@}

let p: port<int> = std::comm::mk_port();

let t1: taker = adder(@@mutable 0);
let t2: taker = sender(p.mk_chan());

give_ints(t1);
give_ints(t2);
@end example

@node Ref.Type.Constr
@subsection Ref.Type.Constr
@c * Ref.Type.Constr::           Constrained types.
@cindex Constrained types

A @dfn{constrained type} is a type that carries a @emph{formal constraint}
(@pxref{Ref.Typestate.Constr}), which is similar to a normal constraint except
that the @emph{base name} of any slots mentioned in the constraint must be the
special @emph{formal symbol} @emph{*}.

When a constrained type is instantiated in a particular slot declaration, the
formal symbol in the constraint is replaced with the name of the declared slot
and the resulting constraint is checked immediately after the slot is
declared. @xref{Ref.Expr.Check}.
An example of a constrained type with two separate instantiations:

@example
type ordered_range = @{low: int, high: int@} : less_than(*.low, *.high);

let rng1: ordered_range = @{low: 5, high: 7@};
// implicit: 'check less_than(rng1.low, rng1.high);'

let rng2: ordered_range = @{low: 15, high: 17@};
// implicit: 'check less_than(rng2.low, rng2.high);'
@end example

@node Ref.Type.Type
@subsection Ref.Type.Type
@c * Ref.Type.Type::      Types describing types.
@cindex Type type

@emph{TODO}.

@node Ref.Typestate
@section Ref.Typestate
@c * Ref.Typestate::      The static system of predicate analysis.
@cindex Typestate system

Rust programs have a static semantics that determines the types of values
produced by each expression, as well as the @emph{predicates} that hold over
slots in the environment at each point in time during execution.

The latter semantics -- the dataflow analysis of predicates holding over slots
-- is called the @emph{typestate} system.

@menu
* Ref.Typestate.Point::    Discrete positions in execution.
* Ref.Typestate.CFG::      The control-flow graph formed by points.
* Ref.Typestate.Constr::   Predicates applied to slots.
* Ref.Typestate.Cond::     Constraints required and implied by a point.
* Ref.Typestate.State::    Constraints that hold at points.
* Ref.Typestate.Check::    Relating dynamic state to static typestate.
@end menu

@node Ref.Typestate.Point
@subsection Ref.Typestate.Point
@c * Ref.Typestate.Point::          Discrete positions in execution.
@cindex Points

Control flows from statement to statement in a block, and through the
evaluation of each expression, from one sub-expression to another. This
sequential control flow is specified as a set of @dfn{points}, each of which
has a set of points before and after it in the implied control flow.
For example, this code:

@example
s = "hello, world";
print(s);
@end example

consists of 2 statements, 3 expressions and 12 points:

@itemize
@item the point before the first statement
@item the point before evaluating the static initializer @code{"hello, world"}
@item the point after evaluating the static initializer @code{"hello, world"}
@item the point after the first statement
@item the point before the second statement
@item the point before evaluating the function value @code{print}
@item the point after evaluating the function value @code{print}
@item the point before evaluating the arguments to @code{print}
@item the point before evaluating the symbol @code{s}
@item the point after evaluating the symbol @code{s}
@item the point after evaluating the arguments to @code{print}
@item the point after the second statement
@end itemize

Whereas this code:

@example
print(x() + y());
@end example

consists of 1 statement, 7 expressions and 16 points:

@itemize
@item the point before the statement
@item the point before evaluating the function value @code{print}
@item the point after evaluating the function value @code{print}
@item the point before evaluating the arguments to @code{print}
@item the point before evaluating the arguments to @code{+}
@item the point before evaluating the function value @code{x}
@item the point after evaluating the function value @code{x}
@item the point before evaluating the arguments to @code{x}
@item the point after evaluating the arguments to @code{x}
@item the point before evaluating the function value @code{y}
@item the point after evaluating the function value @code{y}
@item the point before evaluating the arguments to @code{y}
@item the point after evaluating the arguments to @code{y}
@item the point after evaluating the arguments to @code{+}
@item the point after evaluating the arguments to @code{print}
@item the point after the statement
@end itemize

The typestate system reasons over points, rather than statements or
expressions.
This may seem counter-intuitive, but points are the more primitive concept.
Another way of thinking about a point is as a set of @emph{instants in time}
at which the state of a task is fixed. By contrast, a statement or expression
represents a @emph{duration in time}, during which the state of the task
changes. The typestate system is concerned with constraining the possible
states of a task's memory at @emph{instants}; it is meaningless to speak of
the state of a task's memory ``at'' a statement or expression, as each
statement or expression is likely to change the contents of memory.

@node Ref.Typestate.CFG
@subsection Ref.Typestate.CFG
@c * Ref.Typestate.CFG::           The control-flow graph formed by points.
@cindex Control-flow graph

Each @emph{point} can be considered a vertex in a directed @emph{graph}. Each
kind of expression or statement implies a number of points @emph{and edges} in
this graph. The edges connect the points within each statement or expression,
as well as between those points and those of nearby statements and expressions
in the program. The edges between points represent @emph{possible} indivisible
control transfers that might occur during execution.

This implicit graph is called the @dfn{control-flow graph}, or @dfn{CFG}.

@node Ref.Typestate.Constr
@subsection Ref.Typestate.Constr
@c * Ref.Typestate.Constr::          Predicates applied to slots.
@cindex Predicate
@cindex Constraint

A @dfn{predicate} is a pure boolean function declared with the keywords
@code{pure fn}. @xref{Ref.Item.Pred}.

A @dfn{constraint} is a predicate applied to specific slots.

For example, consider the following code:

@example
pure fn is_less_than(a: int, b: int) -> bool @{
    ret a < b;
@}

fn test() @{
    let x: int = 10;
    let y: int = 20;
    check is_less_than(x,y);
@}
@end example

This example defines the predicate @code{is_less_than}, and applies it to the
slots @code{x} and @code{y}. The constraint being checked on the third line of
the function is @code{is_less_than(x,y)}.
Predicates can only apply to slots holding immutable values. The slots a
predicate applies to can themselves be mutable, but the types of values held
in those slots must be immutable.

@node Ref.Typestate.Cond
@subsection Ref.Typestate.Cond
@c * Ref.Typestate.Cond::      Constraints required and implied by a point.
@cindex Condition
@cindex Precondition
@cindex Postcondition

A @dfn{condition} is a set of zero or more constraints.

Each @emph{point} has an associated @emph{condition}:

@itemize
@item The @dfn{precondition} of a statement or expression is the condition
required at the point before it.
@item The @dfn{postcondition} of a statement or expression is the condition
enforced at the point after it.
@end itemize

Any constraint present in the precondition and @emph{absent} in the
postcondition is considered to be @emph{dropped} by the statement or
expression.

@node Ref.Typestate.State
@subsection Ref.Typestate.State
@c * Ref.Typestate.State::      Constraints that hold at points.
@cindex Typestate
@cindex Prestate
@cindex Poststate

The typestate checking system @emph{calculates} an additional condition for
each point called its typestate. For a given statement or expression, we call
the two typestates associated with its two points the prestate and the
poststate.

@itemize
@item The @dfn{prestate} of a statement or expression is the typestate of the
point before it.
@item The @dfn{poststate} of a statement or expression is the typestate of the
point after it.
@end itemize

A @dfn{typestate} is a condition that has @emph{been determined by the
typestate algorithm} to hold at a point. This is a subtle but important point
to understand: preconditions and postconditions are @emph{inputs} to the
typestate algorithm; prestates and poststates are @emph{outputs} from the
typestate algorithm.

The typestate algorithm analyses the preconditions and postconditions of every
statement and expression in a block, and computes a condition for each
typestate.
Specifically:

@itemize
@item Initially, every typestate is empty.
@item Each statement or expression's poststate is given the union of its
prestate, precondition, and postcondition.
@item Each statement or expression's poststate has the difference between its
precondition and postcondition removed.
@item Each statement or expression's prestate is given the intersection of the
poststates of every predecessor point in the CFG.
@item The previous three steps are repeated until no typestates in the
block change.
@end itemize

The typestate algorithm is a very conventional dataflow calculation, and can
be performed using bit-set operations, with one bit per predicate and one
bit-set per condition.

After the typestates of a block are computed, the typestate algorithm checks
that every constraint in the precondition of a statement is satisfied by its
prestate. If any preconditions are not satisfied, the mismatch is considered a
static (compile-time) error.

@node Ref.Typestate.Check
@subsection Ref.Typestate.Check
@c * Ref.Typestate.Check::      Relating dynamic state to static typestate.
@cindex Check statement
@cindex Assertions, see @i{Check statement}

The key mechanism that connects run-time semantics and compile-time analysis
of typestates is the use of @code{check} expressions. @xref{Ref.Expr.Check}. A
@code{check} expression guarantees that @emph{if} control were to proceed past
it, the predicate associated with the @code{check} would have succeeded, so
the constraint being checked @emph{statically} holds in subsequent
points.@footnote{A @code{check} expression is similar to an @code{assert}
call in a C program, with the significant difference that the Rust compiler
@emph{tracks} the constraint that each @code{check} expression enforces.
Naturally, @code{check} expressions cannot be omitted from a ``production
build'' of a Rust program the same way @code{asserts} are frequently disabled
in deployed C programs.}

It is important to understand that the typestate system has @emph{no insight}
into the meaning of a particular predicate. Predicates and constraints are not
evaluated in any way at compile time. Predicates are treated as specific (but
unknown) functions applied to specific (also unknown) slots. All the typestate
system does is track which of those predicates -- whatever they calculate --
@emph{must have been checked already} in order for program control to reach a
particular point in the CFG. The fundamental building block, therefore, is the
@code{check} statement, which tells the typestate system ``if control passes
this point, the checked predicate holds''.

From this building block, constraints can be propagated to function signatures
and constrained types, and the responsibility to @code{check} a constraint
pushed further and further away from the site at which the program requires it
to hold in order to execute properly.

@page
@node Ref.Stmt
@section Ref.Stmt
@c * Ref.Stmt::               Components of an executable block.
@cindex Statements

A @dfn{statement} is a component of a block, which is in turn a component of
an outer block-expression, a function or an iterator. When a function is
spawned into a task, the task @emph{executes} statements in an order
determined by the body of the enclosing structure. Each statement causes the
task to perform certain actions.

Rust has two kinds of statement: declarations and expressions.

A declaration serves to introduce a @emph{name} that can be used in the block
@emph{scope} enclosing the statement: all statements before and after the
name, from the previous opening curly-brace (@code{@{}) up to the next closing
curly-brace (@code{@}}).

An expression serves the dual roles of causing side effects and producing a
@emph{value}.
Expressions are said to @emph{evaluate to} a value, and the side effects are
caused during @emph{evaluation}. Many expressions contain sub-expressions as
operands; the definition of each kind of expression dictates whether or not,
and in which order, it will evaluate its sub-expressions, and how the
expression's value derives from the value of its sub-expressions.

In this way, the structure of execution -- both the overall sequence of
observable side effects and the final produced value -- is dictated by the
structure of expressions. Blocks themselves are expressions, so the nesting
sequence of block, statement, expression, and block can repeatedly nest to an
arbitrary depth.

@menu
* Ref.Stmt.Decl::      Statement declaring an item or slot.
* Ref.Stmt.Expr::      Statement evaluating an expression.
@end menu

@node Ref.Stmt.Decl
@subsection Ref.Stmt.Decl
@c * Ref.Stmt.Decl::                Statement declaring an item or slot.
@cindex Declaration statement

A @dfn{declaration statement} is one that introduces a @emph{name} into the
enclosing statement block. The declared name may denote a new slot or a new
item. The scope of the name extends to the entire containing block, both
before and after the declaration.

@menu
* Ref.Stmt.Decl.Item::      Statement declaring an item.
* Ref.Stmt.Decl.Slot::      Statement declaring a slot.
@end menu

@node Ref.Stmt.Decl.Item
@subsubsection Ref.Stmt.Decl.Item
@c * Ref.Stmt.Decl.Item::                Statement declaring an item.

An @dfn{item declaration statement} has a syntactic form identical to an item
declaration within a module. Declaring an item -- a function, iterator,
object, type or module -- locally within a statement block is simply a way of
restricting its scope to a narrow region containing all of its uses; it is
otherwise identical in meaning to declaring the item outside the statement
block.

Note: there is no implicit capture of the function's dynamic environment when
declaring a function-local item.
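
The absence of implicit capture means that a function-local item must receive
everything it needs through its arguments. The following is a minimal sketch
in present-day Rust syntax, not the dialect described in this manual, on the
assumption that nested function items behave the same way there. The names
@code{scale} and @code{times} are hypothetical.

```rust
/// Multiplies `x` by `factor` using a function item declared in the body.
pub fn scale(x: i32, factor: i32) -> i32 {
    // `times` is a function-local item: its name is visible only inside
    // the body of `scale`. It cannot refer to the slots `x` or `factor`
    // of the enclosing frame, so both values are passed explicitly.
    fn times(a: i32, b: i32) -> i32 {
        a * b
    }
    times(x, factor)
}
```

Moving @code{times} out of @code{scale} to the enclosing module would not
change its meaning; only the region in which its name is visible would grow.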
@node Ref.Stmt.Decl.Slot
@subsubsection Ref.Stmt.Decl.Slot
@c * Ref.Stmt.Decl.Slot::                Statement declaring a slot.
@cindex Local slot
@cindex Variable, see @i{Local slot}
@cindex Type inference

A @dfn{slot declaration statement} has one of two forms:

@itemize
@item @code{let} @var{pattern} @var{optional-init};
@item @code{let} @var{pattern} : @var{type} @var{optional-init};
@end itemize

Where @var{type} is a type expression, @var{pattern} is an irrefutable pattern
(often just the name of a single slot), and @var{optional-init} is an optional
initializer. If present, the initializer consists of either an equals sign
(@code{=}) or move operator (@code{<-}), followed by an expression.

Both forms introduce a new slot into the containing block scope. The new slot
is visible across the entire scope, but is initialized only at the point
following the declaration statement.

The former form, with no type annotation, causes the compiler to infer the
static type of the slot through unification with the types of values assigned
to the slot in the remaining code in the block scope. Inference only occurs on
frame-local slots, not argument slots. Function, iterator and object
signatures must always declare types for all argument slots.
@xref{Ref.Mem.Slot}.

@node Ref.Stmt.Expr
@subsection Ref.Stmt.Expr
@c * Ref.Stmt.Expr::                Statement evaluating an expression
@cindex Expression statement

An @dfn{expression statement} is one that evaluates an expression and drops
its result. The purpose of an expression statement is often to cause the side
effects of the expression's evaluation.

@page
@node Ref.Expr
@section Ref.Expr
@c * Ref.Expr::               Parsed and primitive expressions.
@cindex Expressions

@menu
* Ref.Expr.Copy::      Expression for copying a value.
* Ref.Expr.Call::      Expression for calling a function.
* Ref.Expr.Bind::      Expression for binding arguments to functions.
* Ref.Expr.Ret::       Expression for stopping and producing a value.
@c * Ref.Expr.Be:: Expression for stopping and executing a tail call. * Ref.Expr.Put:: Expression for pausing and producing a value. * Ref.Expr.As:: Expression for casting a value to a different type. * Ref.Expr.Fail:: Expression for causing task failure. * Ref.Expr.Log:: Expression for logging values to diagnostic buffers. * Ref.Expr.Note:: Expression for logging values during failure. * Ref.Expr.While:: Expression for simple conditional looping. * Ref.Expr.Break:: Expression for terminating a loop. * Ref.Expr.Cont:: Expression for terminating a single loop iteration. * Ref.Expr.For:: Expression for looping over strings and vectors. * Ref.Expr.If:: Expression for simple conditional branching. * Ref.Expr.Alt:: Expression for complex conditional branching. * Ref.Expr.Prove:: Expression for static assertion of typestate. * Ref.Expr.Check:: Expression for dynamic assertion of typestate. * Ref.Expr.Claim:: Expression for static (unsafe) or dynamic assertion of typestate. * Ref.Expr.Assert:: Expression for halting the program if a boolean condition fails to hold. * Ref.Expr.IfCheck:: Expression for dynamic testing of typestate. * Ref.Expr.AnonObj:: Expression for extending objects with additional methods. @end menu @node Ref.Expr.Copy @subsection Ref.Expr.Copy @c * Ref.Expr.Copy:: Expression for copying a value. @cindex Copy expression @cindex Assignment operator, see @i{Copy expression} A @dfn{copy expression} consists of an @emph{lval} followed by an equals-sign (@code{=}) and a primitive expression. @xref{Ref.Expr}. Executing a copy expression causes the value denoted by the expression -- either a value or a primitive combination of values -- to be copied into the memory location denoted by the @emph{lval}. A copy may entail the adjustment of reference counts, execution of destructors, or similar adjustments in order to respect the path through the memory graph implied by the @code{lval}, as well as any existing value held in the memory being written-to. 
All such adjustment is automatic and implied by the @code{=} operator. An example of three different copy expressions: @example x = y; x.y = z; x.y = z + 2; @end example @node Ref.Expr.Call @subsection Ref.Expr.Call @c * Ref.Expr.Call:: Expression for calling a function. @cindex Call expression @cindex Function calls A @dfn{call expression} invokes a function, providing a tuple of input slots and a reference slot to serve as the function's output, bound to the @var{lval} on the left-hand side of the call. If the function eventually returns, then the expression completes. A call expression statically requires that the precondition declared in the callee's signature is satisfied by the expression prestate. In this way, typestates propagate through function boundaries. @xref{Ref.Typestate}. An example of a call expression: @example let x: int = add(1, 2); @end example @node Ref.Expr.Bind @subsection Ref.Expr.Bind @c * Ref.Expr.Bind:: Expression for binding arguments to functions. @cindex Bind expression @cindex Closures @cindex Currying A @dfn{bind expression} constructs a new function from an existing function.@footnote{The @code{bind} expression is analogous to the @code{bind} expression in the Sather language.} The new function has zero or more of its arguments @emph{bound} into a new, hidden boxed tuple that holds the bindings. For each concrete argument passed in the @code{bind} expression, the corresponding parameter in the existing function is @emph{omitted} as a parameter of the new function. For each argument passed the placeholder symbol @code{_} in the @code{bind} expression, the corresponding parameter of the existing function is @emph{retained} as a parameter of the new function. Any subsequent invocation of the new function with residual arguments causes invocation of the existing function with the combination of bound arguments and residual arguments that was specified during the binding.
An example of a @code{bind} expression: @example fn add(x: int, y: int) -> int @{ ret x + y; @} type single_param_fn = fn(int) -> int; let add4: single_param_fn = bind add(4, _); let add5: single_param_fn = bind add(_, 5); assert (add(4,5) == add4(5)); assert (add(4,5) == add5(4)); @end example A @code{bind} expression generally stores a copy of the bound arguments in the hidden, boxed tuple, owned by the resulting first-class function. For each bound slot in the bound function's signature, space is allocated in the hidden tuple and populated with a copy of the bound value. The @code{bind} expression is a lightweight mechanism for simulating the more elaborate construct of @emph{lexical closures} that exist in other languages. Rust has no support for lexical closures, but many realistic uses of them can be achieved with @code{bind} expressions. @node Ref.Expr.Ret @subsection Ref.Expr.Ret @c * Ref.Expr.Ret:: Expression for stopping and producing a value. @cindex Return expression Executing a @code{ret} expression@footnote{A @code{ret} expression is analogous to a @code{return} expression in the C family.} copies a value into the output slot of the current function, destroys the current function activation frame, and transfers control to the caller frame. An example of a @code{ret} expression: @example fn max(a: int, b: int) -> int @{ if a > b @{ ret a; @} ret b; @} @end example @ignore @node Ref.Expr.Be @subsection Ref.Expr.Be @c * Ref.Expr.Be:: Expression for stopping and executing a tail call. @cindex Be expression @cindex Tail calls Executing a @code{be} expression@footnote{A @code{be} expression is analogous to a @code{become} expression in Newsqueak or Alef.} destroys the current function activation frame and replaces it with an activation frame for the called function. In other words, @code{be} executes a tail-call.
The syntactic form of a @code{be} expression is therefore limited to @emph{tail position}: its argument must be a @emph{call expression}, and it must be the last expression in a block. An example of a @code{be} expression: @example fn print_loop(n: int) @{ if n <= 0 @{ ret; @} else @{ print_int(n); be print_loop(n-1); @} @} @end example The above example executes in constant space, replacing each frame with a new copy of itself. @end ignore @node Ref.Expr.Put @subsection Ref.Expr.Put @c * Ref.Expr.Put:: Expression for pausing and producing a value. @cindex Put expression @cindex Iterators Executing a @code{put} expression copies a value into the output slot of the current iterator, suspends execution of the current iterator, and transfers control to the current put-recipient frame. A @code{put} expression is only valid within an iterator. @footnote{A @code{put} expression is analogous to a @code{yield} expression in the CLU and Sather languages, or in more recent languages providing a ``generator'' facility, such as Python, JavaScript or C#. Like the generators of CLU and Sather but @emph{unlike} these later languages, Rust's iterators reside on the stack and obey a strict stack discipline.} The current put-recipient will eventually resume the suspended iterator containing the @code{put} expression, either continuing execution after the @code{put} expression, or terminating its execution and destroying the iterator frame. @node Ref.Expr.As @subsection Ref.Expr.As @c * Ref.Expr.As:: Expression for casting a value to a different type. @cindex As expression @cindex Cast @cindex Typecast Executing an @code{as} expression casts the value on the left-hand side to the type on the right-hand side. A numeric value can be cast to any numeric type. A native pointer value can be cast to or from any integral type or native pointer type. Any other cast is unsupported and will fail to compile.
An example of an @code{as} expression: @example fn avg(v: [float]) -> float @{ let sum: float = sum(v); let sz: float = std::vec::len(v) as float; ret sum / sz; @} @end example @node Ref.Expr.Fail @subsection Ref.Expr.Fail @c * Ref.Expr.Fail:: Expression for causing task failure. @cindex Fail expression @cindex Failure @cindex Unwinding Executing a @code{fail} expression causes a task to enter the @emph{failing} state. In the @emph{failing} state, a task unwinds its stack, destroying all frames and freeing all resources until it reaches its entry frame, at which point it halts execution in the @emph{dead} state. @node Ref.Expr.Log @subsection Ref.Expr.Log @c * Ref.Expr.Log:: Expression for logging values to diagnostic buffers. @cindex Log expression @cindex Logging Executing a @code{log} expression may, depending on runtime configuration, cause a value to be appended to an internal diagnostic logging buffer provided by the runtime or emitted to a system console. Log expressions are enabled or disabled dynamically at run-time on a per-task and per-item basis. @xref{Ref.Run.Log}. @example @end example @node Ref.Expr.Note @subsection Ref.Expr.Note @c * Ref.Expr.Note:: Expression for logging values during failure. @cindex Note expression @cindex Logging @cindex Unwinding @cindex Failure A @code{note} expression has no effect during normal execution. The purpose of a @code{note} expression is to provide additional diagnostic information to the logging subsystem during task failure. @xref{Ref.Expr.Log}. Using @code{note} expressions, normal diagnostic logging can be kept relatively sparse, while still providing verbose diagnostic ``back-traces'' when a task fails. When a task is failing, control frames @emph{unwind} from the innermost frame to the outermost, and from the innermost lexical block within an unwinding frame to the outermost. 
When unwinding a lexical block, the runtime processes all the @code{note} expressions in the block sequentially, from the first expression of the block to the last. During processing, a @code{note} expression has equivalent meaning to a @code{log} expression: it causes the runtime to append the argument of the @code{note} to the internal logging diagnostic buffer. An example of a @code{note} expression: @example fn read_file_lines(path: str) -> [str] @{ note path; let r: [str]; let f: file = open_read(path); lines(f) @{|s| r += [s]; @} ret r; @} @end example In this example, if the task fails while attempting to open or read a file, the runtime will log the path name that was being read. If the function completes normally, the runtime will not log the path. A value that is marked by a @code{note} expression is @emph{not} copied aside when control passes through the @code{note}. In other words, if a @code{note} expression notes a particular @var{lval}, and code after the @code{note} mutates that slot, and then a subsequent failure occurs, the @emph{mutated} value will be logged during unwinding, @emph{not} the original value that was denoted by the @var{lval} at the moment control passed through the @code{note} expression. @node Ref.Expr.While @subsection Ref.Expr.While @c * Ref.Expr.While:: Expression for simple conditional looping. @cindex While expression @cindex Loops @cindex Control-flow A @code{while} expression is a loop construct. A @code{while} loop may be either a simple @code{while} or a @code{do}-@code{while} loop. In the case of a simple @code{while}, the loop begins by evaluating the boolean loop conditional expression. If the loop conditional expression evaluates to @code{true}, the loop body block executes and control returns to the loop conditional expression. If the loop conditional expression evaluates to @code{false}, the @code{while} expression completes. 
In the case of a @code{do}-@code{while}, the loop begins with an execution of the loop body. After the loop body executes, it evaluates the loop conditional expression. If it evaluates to @code{true}, control returns to the beginning of the loop body. If it evaluates to @code{false}, control exits the loop. An example of a simple @code{while} expression: @example while (i < 10) @{ print("hello\n"); i = i + 1; @} @end example An example of a @code{do}-@code{while} expression: @example do @{ print("hello\n"); i = i + 1; @} while (i < 10); @end example @node Ref.Expr.Break @subsection Ref.Expr.Break @c * Ref.Expr.Break:: Expression for terminating a loop. @cindex Break expression @cindex Loops @cindex Control-flow Executing a @code{break} expression immediately terminates the innermost loop enclosing it. It is only permitted in the body of a loop. @node Ref.Expr.Cont @subsection Ref.Expr.Cont @c * Ref.Expr.Cont:: Expression for terminating a single loop iteration. @cindex Continue expression @cindex Loops @cindex Control-flow Executing a @code{cont} expression immediately terminates the current iteration of the innermost loop enclosing it, returning control to the loop @emph{head}. In the case of a @code{while} loop, the head is the conditional expression controlling the loop. In the case of a @code{for} or @code{for each} loop, the head is the iterator or vector-element increment controlling the loop. A @code{cont} expression is only permitted in the body of a loop. @node Ref.Expr.For @subsection Ref.Expr.For @c * Ref.Expr.For:: Expression for looping over strings and vectors. @cindex For expression @cindex Loops @cindex Control-flow A @dfn{for loop} is controlled by a vector or string. The for loop bounds-checks the underlying sequence @emph{once} when initiating the loop, then repeatedly copies each value of the underlying sequence into the element variable, executing the loop body once per copy. 
An example of a for loop: @example let v: [foo] = [a, b, c]; for e: foo in v @{ bar(e); @} @end example @node Ref.Expr.If @subsection Ref.Expr.If @c * Ref.Expr.If:: Expression for simple conditional branching. @cindex If expression @cindex Control-flow An @code{if} expression is a conditional branch in program control. The form of an @code{if} expression is a condition expression, followed by a consequent block, any number of @code{else if} conditions and blocks, and an optional trailing @code{else} block. The condition expressions must have type @code{bool}. If a condition expression evaluates to @code{true}, the consequent block is executed and any subsequent @code{else if} or @code{else} block is skipped. If a condition expression evaluates to @code{false}, the consequent block is skipped and any subsequent @code{else if} condition is evaluated. If all @code{if} and @code{else if} conditions evaluate to @code{false} then any @code{else} block is executed. @node Ref.Expr.Alt @subsection Ref.Expr.Alt @c * Ref.Expr.Alt:: Expression for complex conditional branching. @cindex Alt expression @cindex Control-flow @cindex Switch expression, see @i{Alt expression} An @code{alt} expression is a multi-directional branch in program control. There are two kinds of @code{alt} expression: pattern @code{alt} expressions and @code{alt type} expressions. The form of each kind of @code{alt} is similar: an initial @emph{head} that describes the criteria for branching, followed by a sequence of zero or more @emph{arms}, each of which describes a @emph{case} and provides a @emph{block} of expressions associated with the case. When an @code{alt} is executed, control enters the head, determines which of the cases to branch to, branches to the block associated with the chosen case, and then proceeds to the expression following the @code{alt} when the case block completes. @menu * Ref.Expr.Alt.Pat:: Expression for branching on pattern matches.
* Ref.Expr.Alt.Type:: Expression for branching on types. @end menu @node Ref.Expr.Alt.Pat @subsubsection Ref.Expr.Alt.Pat @c * Ref.Expr.Alt.Pat:: Expression for branching on pattern matches. @cindex Pattern alt expression @cindex Control-flow A pattern @code{alt} expression branches on a @emph{pattern}. The exact form of matching that occurs depends on the pattern. Patterns consist of some combination of literals, destructured tag constructors, records and tuples, variable binding specifications and placeholders (@code{_}). A pattern @code{alt} has a @emph{head expression}, which is the value to compare to the patterns. The type of the patterns must equal the type of the head expression. To execute a pattern @code{alt} expression, first the head expression is evaluated, then its value is sequentially compared to the patterns in the arms until a match is found. The first arm with a matching pattern is chosen as the branch target of the @code{alt}, any variables bound by the pattern are assigned to local slots in the arm's block, and control enters the block. An example of a pattern @code{alt} expression: @example tag list @{ nil; cons(X, @@list); @} let x: list = cons(10, @@cons(11, @@nil)); alt x @{ cons(a, @@cons(b, _)) @{ process_pair(a,b); @} cons(10, _) @{ process_ten(); @} nil. @{ ret; @} _ @{ fail; @} @} @end example Note in the above example that @code{nil} is followed by a period. This is required syntax for pattern matching a nullary tag variant, to distinguish the variant @code{nil} from a binding to variable @code{nil}. Without the period the value of @code{x} would be bound to variable @code{nil} and the compiler would issue an error about the final wildcard case being unreachable. Records can also be pattern-matched and their fields bound to variables. When matching fields of a record, the fields being matched are specified first, then a placeholder (@code{_}) represents the remaining fields.
@example fn main() @{ let r = @{ player: "ralph", stats: load_stats(), options: @{ choose: true, size: "small" @} @}; alt r @{ @{options: @{choose: true, _@}, _@} @{ choose_player(r) @} @{player: p, options: @{size: "small", _@}, _@} @{ log p + " is small"; @} _ @{ next_player(); @} @} @} @end example Multiple alternative patterns may be joined with the @code{|} operator. A range of values may be specified with @code{to}. For example: @example let message = alt x @{ 0 | 1 @{ "not many" @} 2 to 9 @{ "a few" @} _ @{ "lots" @} @} @end example Finally, alt patterns can accept @emph{pattern guards} to further refine the criteria for matching a case. Pattern guards appear after the pattern and consist of a bool-typed expression following the @code{when} keyword. A pattern guard may refer to the variables bound within the pattern it follows. @example let message = alt maybe_digit @{ some(x) when x < 10 @{ process_digit(x) @} some(x) @{ process_other(x) @} @} @end example @node Ref.Expr.Alt.Type @subsubsection Ref.Expr.Alt.Type @c * Ref.Expr.Alt.Type:: Expression for branching on type. @cindex Type alt expression @cindex Control-flow An @code{alt type} expression is similar to a pattern @code{alt}, but branches on the @emph{type} of its head expression, rather than the value. The head expression of an @code{alt type} expression must be of type @code{any}, and the arms of the expression are slot patterns rather than value patterns. Control branches to the arm with a @code{case} that matches the @emph{actual type} of the value in the @code{any}. An example of an @code{alt type} expression: @example let x: any = foo(); alt type (x) @{ case (int i) @{ ret i; @} case (list li) @{ ret int_list_sum(li); @} case (list lx) @{ ret list_len(lx); @} case (_) @{ ret 0; @} @} @end example @node Ref.Expr.Prove @subsection Ref.Expr.Prove @c * Ref.Expr.Prove:: Expression for static assertion of typestate.
@cindex Prove expression @cindex Typestate system A @code{prove} expression has no run-time effect. Its purpose is to statically check (and document) that its argument constraint holds at its expression entry point. If its argument typestate does not hold, under the typestate algorithm, the program containing it will fail to compile. @node Ref.Expr.Check @subsection Ref.Expr.Check @c * Ref.Expr.Check:: Expression for dynamic assertion of typestate. @cindex Check expression @cindex Typestate system A @code{check} expression connects dynamic assertions made at run-time to the static typestate system. A @code{check} expression takes a constraint to check at run-time. If the constraint holds at run-time, control passes through the @code{check} and on to the next expression in the enclosing block. If the condition fails to hold at run-time, the @code{check} expression behaves as a @code{fail} expression. The typestate algorithm is built around @code{check} expressions, and in particular the fact that control @emph{will not pass} a check expression with a condition that fails to hold. The typestate algorithm can therefore assume that the (static) postcondition of a @code{check} expression includes the checked constraint itself. From there, the typestate algorithm can perform dataflow calculations on subsequent expressions, propagating conditions forward and statically comparing implied states and their specifications. @xref{Ref.Typestate}. @example pure fn even(x: int) -> bool @{ ret x & 1 == 0; @} fn print_even(x: int) : even(x) @{ print(x); @} fn test() @{ let y: int = 8; // Cannot call print_even(y) here. check even(y); // Can call print_even(y) here, since even(y) now holds. print_even(y); @} @end example @node Ref.Expr.Claim @subsection Ref.Expr.Claim @c * Ref.Expr.Claim:: Expression for static (unsafe) or dynamic assertion of typestate. 
@cindex Claim expression @cindex Typestate system A @code{claim} expression is an unsafe variant on a @code{check} expression that is not actually checked at runtime. Thus, using a @code{claim} implies a proof obligation to ensure---without compiler assistance---that an assertion always holds. Setting a runtime flag can turn all @code{claim} expressions into @code{check} expressions in a compiled Rust program, but the default is to not check the assertion contained in a @code{claim}. The idea behind @code{claim} is that performance profiling might identify a few bottlenecks in the code where actually checking a given callee's predicate is too expensive; @code{claim} allows the code to typecheck without removing the predicate check at every other call site. @node Ref.Expr.IfCheck @subsection Ref.Expr.IfCheck @c * Ref.Expr.IfCheck:: Expression for dynamic testing of typestate. @cindex If check expression @cindex Typestate system @cindex Control-flow An @code{if check} expression combines an @code{if} expression and a @code{check} expression in an indivisible unit that can be used to build more complex conditional control-flow than the @code{check} expression affords. In fact, @code{if check} is a ``more primitive'' expression than @code{check}; instances of the latter can be rewritten as instances of the former. The following two examples are equivalent: @sp 1 Example using @code{check}: @example check even(x); print_even(x); @end example @sp 1 Equivalent example using @code{if check}: @example if check even(x) @{ print_even(x); @} else @{ fail; @} @end example @node Ref.Expr.Assert @subsection Ref.Expr.Assert @c * Ref.Expr.Assert:: Expression that halts the program if a boolean condition fails to hold.
@cindex Assertions An @code{assert} expression is similar to a @code{check} expression, except the condition may be any boolean-typed expression, and the compiler makes no use of the knowledge that the condition holds if the program continues to execute after the @code{assert}. @node Ref.Expr.AnonObj @subsection Ref.Expr.AnonObj @c * Ref.Expr.AnonObj:: Expression that extends an object with additional methods. @cindex Anonymous objects An @emph{anonymous object} expression extends an existing object with methods. @page @node Ref.Run @section Ref.Run @c * Ref.Run:: Organization of runtime services. @cindex Runtime library The Rust @dfn{runtime} is a relatively compact collection of C and Rust code that provides fundamental services and datatypes to all Rust tasks at run-time. It is smaller and simpler than many modern language runtimes. It is tightly integrated into the language's execution model of memory, tasks, communication, reflection, logging and signal handling. @menu * Ref.Run.Mem:: Runtime memory management service. * Ref.Run.Type:: Runtime built-in type services. * Ref.Run.Comm:: Runtime communication service. * Ref.Run.Log:: Runtime logging system. * Ref.Run.Sig:: Runtime signal handler. @end menu @node Ref.Run.Mem @subsection Ref.Run.Mem @c * Ref.Run.Mem:: Runtime memory management service. @cindex Memory allocation The runtime memory-management system is based on a @emph{service-provider interface}, through which the runtime requests blocks of memory from its environment and releases them back to its environment when they are no longer in use. The default implementation of the service-provider interface consists of the C runtime functions @code{malloc} and @code{free}. The runtime memory-management system in turn supplies Rust tasks with facilities for allocating, extending and releasing stacks, as well as allocating and freeing boxed values. @node Ref.Run.Type @subsection Ref.Run.Type @c * Ref.Run.Mem:: Runtime built-in type services. 
@cindex Built-in types The runtime provides C and Rust code to assist with various built-in types, such as vectors, strings, bignums, and the low level communication system (ports, channels, tasks). Support for other built-in types such as simple types, tuples, records, and tags is open-coded by the Rust compiler. @node Ref.Run.Comm @subsection Ref.Run.Comm @c * Ref.Run.Comm:: Runtime communication service. @cindex Communication @cindex Process @cindex Thread The runtime provides code to manage inter-task communication. This includes the system of task-lifecycle state transitions depending on the contents of queues, as well as code to copy values between queues and their recipients and to serialize values for transmission over operating-system inter-process communication facilities. @node Ref.Run.Log @subsection Ref.Run.Log @c * Ref.Run.Log:: Runtime logging system. @cindex Logging The runtime contains a system for directing logging expressions to a logging console and/or internal logging buffers. @xref{Ref.Expr.Log}. Logging expressions can be enabled per module. Logging output is enabled by setting the @code{RUST_LOG} environment variable. @code{RUST_LOG} accepts a logging specification that is a comma-separated list of paths. For each module containing log statements, if @code{RUST_LOG} contains the path to that module or a parent of that module, then its logs will be output to the console. The path to a module consists of the crate name, any parent modules, then the module itself, all separated by double colons (@code{::}). As an example, to see all the logs generated by the compiler, you would set @code{RUST_LOG} to @code{rustc}, which is the crate name (as specified in its @code{link} attribute). @xref{Ref.Comp.Crate}. To narrow down the logs to just crate resolution, you would set it to @code{rustc::metadata::creader}.
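
As an illustration, assuming a Bourne-style shell (the shell syntax is an assumption about the environment, not part of Rust), the two specifications described above could be set as follows before running the compiler:

@example
# Enable all logs generated by the compiler crate.
export RUST_LOG=rustc

# Enable only the logs for crate resolution.
export RUST_LOG=rustc::metadata::creader
@end example

Each assignment replaces the previous value of @code{RUST_LOG}, so only the most recent specification is in effect when the program starts.
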
Note that when compiling either .rs or .rc files that don't specify a crate name, the crate is given a default name that matches the source file, sans extension. In that case, to turn on logging for a program compiled from, e.g. helloworld.rs, @code{RUST_LOG} should be set to @code{helloworld}. As a convenience, the logging spec can also be set to a special pseudo-crate, @code{::help}. In this case, when the application starts, the runtime will simply output a list of loaded modules containing log statements, then exit. The Rust runtime itself generates logging information. The runtime's logs are generated for a number of artificial modules in the @code{::rt} pseudo-crate, and can be enabled just like the logs for any standard module. The full list of runtime logging modules follows. @itemize @item @code{::rt::mem} Memory management @item @code{::rt::comm} Messaging and task communication @item @code{::rt::task} Task management @item @code{::rt::dom} Task scheduling @item @code{::rt::trace} Unused @item @code{::rt::cache} Type descriptor cache @item @code{::rt::upcall} Compiler-generated runtime calls @item @code{::rt::timer} The scheduler timer @item @code{::rt::gc} Garbage collection @item @code{::rt::stdlib} Functions used directly by the standard library @item @code{::rt::kern} The runtime kernel @item @code{::rt::backtrace} Unused @item @code{::rt::callback} Unused @end itemize @node Ref.Run.Sig @subsection Ref.Run.Sig @c * Ref.Run.Sig:: Runtime signal handler. @cindex Signals The runtime signal-handling system is driven by a signal-dispatch table and a signal queue associated with each task. Sending a signal to a task inserts the signal into the task's signal queue and marks the task as having a pending signal. At the next scheduling opportunity, the runtime processes signals in the task's queue using its dispatch table. The signal queue memory is charged to the task; if the queue grows too big, the task will fail.
@c ############################################################ @c end main body of nodes @c ############################################################ @page @node Index @chapter Index @printindex cp @bye @c Local Variables: @c mode: texinfo @c fill-column: 78; @c indent-tabs-mode: nil @c buffer-file-coding-system: utf-8-unix @c compile-command: "make -C $RBUILD -k 2>&1 | sed -e 's/\\/x\\//x:\\//g'"; @c End: