From 0aa42bc91e08faac450d1850f3b549b78382ccf4 Mon Sep 17 00:00:00 2001 From: Tim Chevalier Date: Tue, 9 Oct 2012 14:40:23 -0700 Subject: [PATCH] Copyedit the macro tutorial I hope I haven't introduced any grievous errors :-) --- doc/tutorial-macros.md | 117 +++++++++++++++++++++++++---------------- 1 file changed, 73 insertions(+), 44 deletions(-) diff --git a/doc/tutorial-macros.md b/doc/tutorial-macros.md index 8202fbe8dc5..31f260076f0 100644 --- a/doc/tutorial-macros.md +++ b/doc/tutorial-macros.md @@ -2,10 +2,11 @@ # Introduction -Functions are the programmer's primary tool of abstraction, but there are -cases in which they are insufficient, because the programmer wants to -abstract over concepts not represented as values. Consider the following -example: +Functions are the primary tool that programmers can use to build +abstractions. Sometimes, though, programmers want to abstract over +compile-time, syntactic structures rather than runtime values. For example, +the following two code fragments both pattern-match on their input and return +early in one case, doing nothing otherwise: ~~~~ # enum t { special_a(uint), special_b(uint) }; @@ -24,11 +25,12 @@ match input_2 { # } ~~~~ -This code could become tiresome if repeated many times. However, there is -no reasonable function that could be written to solve this problem. In such a -case, it's possible to define a macro to solve the problem. Macros are +This code could become tiresome if repeated many times. However, there is no +straightforward way to rewrite it without the repeated code, using functions +alone. There is a solution, though: defining a macro to solve the problem. Macros are lightweight custom syntax extensions, themselves defined using the -`macro_rules!` syntax extension: +`macro_rules!` syntax extension. The following `early_return` macro captures +the pattern in the above code: ~~~~ # enum t { special_a(uint), special_b(uint) }; @@ -42,7 +44,12 @@ macro_rules! early_return( } ); ); -// ... +~~~~ + +Now, we can replace each `match` with an invocation of the `early_return` +macro: + +~~~~ early_return!(input_1 special_a); // ... early_return!(input_2 special_b); @@ -50,48 +57,72 @@ early_return!(input_2 special_b); # } ~~~~ -Macros are defined in pattern-matching style: +Macros are defined in pattern-matching style: in the above example, the text +`($inp:expr $sp:ident)` that appears on the left-hand side of the `=>` is the +*macro invocation syntax*, a pattern denoting how to write a call to the +macro. The text on the right-hand side of the `=>`, beginning with `match +$inp`, is the *macro transcription syntax*: what the macro expands to. # Invocation syntax -On the left-hand-side of the `=>` is the macro invocation syntax. It is -free-form, excepting the following rules: +The macro invocation syntax specifies the syntax for the arguments to the +macro. It appears on the left-hand side of the `=>` in a macro definition. It +conforms to the following rules: -1. It must be surrounded in parentheses. +1. It must be surrounded by parentheses. 2. `$` has special meaning. 3. The `()`s, `[]`s, and `{}`s it contains must balance. For example, `([)` is forbidden. +Otherwise, the invocation syntax is free-form. + To take as an argument a fragment of Rust code, write `$` followed by a name - (for use on the right-hand side), followed by a `:`, followed by the sort of -fragment to match (the most common ones are `ident`, `expr`, `ty`, `pat`, and -`block`). Anything not preceded by a `$` is taken literally. The standard + (for use on the right-hand side), followed by a `:`, followed by a *fragment + specifier*. The fragment specifier denotes the sort of fragment to match. The + most common fragment specifiers are: + +* `ident` (an identifier, referring to a variable or item. Examples: `f`, `x`, + `foo`.) +* `expr` (an expression. Examples: `2 + 2`; `if true then { 1 } else { 2 }`; + `f(42)`.) +* `ty` (a type. Examples: `int`, `~[(char, ~str)]`, `&T`.) +* `pat` (a pattern, usually appearing in a `match` or on the left-hand side of + a declaration. Examples: `Some(t)`; `(17, 'a')`; `_`.) +* `block` (a sequence of actions. Example: `{ log(error, "hi"); return 12; }`) + +The parser interprets any token that's not preceded by a `$` literally. Rust's usual rules of tokenization apply, -So `($x:ident => (($e:expr)))`, though excessively fancy, would create a macro -that could be invoked like `my_macro!(i=>(( 2+2 )))`. +So `($x:ident -> (($e:expr)))`, though excessively fancy, would designate a macro +that could be invoked like: `my_macro!(i->(( 2+2 )))`. # Transcription syntax The right-hand side of the `=>` follows the same rules as the left-hand side, -except that `$` need only be followed by the name of the syntactic fragment -to transcribe. +except that a `$` need only be followed by the name of the syntactic fragment +to transcribe into the macro expansion; its type need not be repeated. -The right-hand side must be surrounded by delimiters of some kind, and must be -an expression; currently, user-defined macros can only be invoked in -expression position (even though `macro_rules!` itself can be in item -position). +The right-hand side must be enclosed by delimiters, and must be +an expression. Currently, invocations of user-defined macros can only appear in a context +where the Rust grammar requires an expression, even though `macro_rules!` itself can appear +in a context where the grammar requires an item. # Multiplicity ## Invocation -Going back to the motivating example, suppose that we wanted each invocation -of `early_return` to potentially accept multiple "special" identifiers. The -syntax `$(...)*` accepts zero or more occurrences of its contents, much like -the Kleene star operator in regular expressions. It also supports a separator -token (a comma-separated list could be written `$(...),*`), and `+` instead of -`*` to mean "at least one". +Going back to the motivating example, recall that `early_return` expanded into +a `match` that would `return` if the `match`'s scrutinee matched the +"special case" identifier provided as the second argument to `early_return`, +and do nothing otherwise. Now suppose that we wanted to write a +version of `early_return` that could handle a variable number of "special" +cases. + +The syntax `$(...)*` on the left-hand side of the `=>` in a macro definition +accepts zero or more occurrences of its contents. It works much +like the `*` operator in regular expressions. It also supports a +separator token (a comma-separated list could be written `$(...),*`), and `+` +instead of `*` to mean "at least one". ~~~~ # enum t { special_a(uint),special_b(uint),special_c(uint),special_d(uint)}; @@ -118,37 +149,35 @@ early_return!(input_2, [special_b]); ### Transcription As the above example demonstrates, `$(...)*` is also valid on the right-hand -side of a macro definition. The behavior of Kleene star in transcription, -especially in cases where multiple stars are nested, and multiple different +side of a macro definition. The behavior of `*` in transcription, +especially in cases where multiple `*`s are nested, and multiple different names are involved, can seem somewhat magical and intuitive at first. The system that interprets them is called "Macro By Example". The two rules to keep in mind are (1) the behavior of `$(...)*` is to walk through one "layer" of repetitions for all of the `$name`s it contains in lockstep, and (2) each `$name` must be under at least as many `$(...)*`s as it was matched against. -If it is under more, it'll will be repeated, as appropriate. +If it is under more, it'll be repeated, as appropriate. ## Parsing limitations -The parser used by the macro system is reasonably powerful, but the parsing of -Rust syntax is restricted in two ways: +The macro parser will parse Rust syntax with two limitations: 1. The parser will always parse as much as possible. For example, if the comma were omitted from the syntax of `early_return!` above, `input_1 [` would've been interpreted as the beginning of an array index. In fact, invoking the macro would have been impossible. 2. The parser must have eliminated all ambiguity by the time it reaches a -`$name:fragment_specifier`. This most often affects them when they occur in -the beginning of, or immediately after, a `$(...)*`; requiring a distinctive +`$name:fragment_specifier` declaration. This limitation can result in parse +errors when declarations occur at the beginning of, or immediately after, +a `$(...)*`. Changing the invocation syntax to require a distinctive token in front can solve the problem. ## A final note Macros, as currently implemented, are not for the faint of heart. Even -ordinary syntax errors can be more difficult to debug when they occur inside -a macro, and errors caused by parse problems in generated code can be very +ordinary syntax errors can be more difficult to debug when they occur inside a +macro, and errors caused by parse problems in generated code can be very tricky. Invoking the `log_syntax!` macro can help elucidate intermediate -states, using `trace_macros!(true)` will automatically print those -intermediate states out, and using `--pretty expanded` as an argument to the -compiler will show the result of expansion. - - +states, invoking `trace_macros!(true)` will automatically print those +intermediate states out, and passing the flag `--pretty expanded` as a +command-line argument to the compiler will show the result of expansion.