Remove legacy grammar

2019-09-29 16:04:14 +02:00
parent d046ffddc4
commit 96c8049b20
8 changed files with 4 additions and 3566 deletions
--- a/src/doc/grammar.md
+++ b/src/doc/grammar.md
@@ -1,812 +1,7 @@
 % Grammar

-# Introduction
+The Rust grammar may now be found in the [reference]. Additionally, the [grammar
+working group] is working on producing a testable grammar.

-This document is the primary reference for the Rust programming language grammar. It
-provides only one kind of material:
-
-  - Chapters that formally define the language grammar.
-
-This document does not serve as an introduction to the language. Background
-familiarity with the language is assumed. A separate [guide] is available to
-help acquire such background.
-
-This document also does not serve as a reference to the [standard] library
-included in the language distribution. Those libraries are documented
-separately by extracting documentation attributes from their source code. Many
-of the features that one might expect to be language features are library
-features in Rust, so what you're looking for may be there, not here.
-
-[guide]: guide.html
-[standard]: std/index.html
-
-# Notation
-
-Rust's grammar is defined over Unicode codepoints, each conventionally denoted
-`U+XXXX`, for 4 or more hexadecimal digits `X`. _Most_ of Rust's grammar is
-confined to the ASCII range of Unicode, and is described in this document by a
-dialect of Extended Backus-Naur Form (EBNF), specifically a dialect of EBNF
-supported by common automated LL(k) parsing tools such as `llgen`, rather than
-the dialect given in ISO 14977. The dialect can be defined self-referentially
-as follows:
-
-```antlr
-grammar : rule + ;
-rule    : nonterminal ':' productionrule ';' ;
-productionrule : production [ '|' production ] * ;
-production : term * ;
-term : element repeats ;
-element : LITERAL | IDENTIFIER | '[' productionrule ']' ;
-repeats : [ '*' | '+' ] NUMBER ? | NUMBER ? | '?' ;
-```
-
-Where:
-
-- Whitespace in the grammar is ignored.
-- Square brackets are used to group rules.
-- `LITERAL` is a single printable ASCII character, or an escaped hexadecimal
-  ASCII code of the form `\xQQ`, in single quotes, denoting the corresponding
-  Unicode codepoint `U+00QQ`.
-- `IDENTIFIER` is a nonempty string of ASCII letters and underscores.
-- The `repeat` forms apply to the adjacent `element`, and are as follows:
-  - `?` means zero or one repetition
-  - `*` means zero or more repetitions
-  - `+` means one or more repetitions
-  - NUMBER trailing a repeat symbol gives a maximum repetition count
-  - NUMBER on its own gives an exact repetition count
-
-This EBNF dialect should hopefully be familiar to many readers.
-
-## Unicode productions
-
-A few productions in Rust's grammar permit Unicode codepoints outside the ASCII
-range. We define these productions in terms of character properties specified
-in the Unicode standard, rather than in terms of ASCII-range codepoints. The
-section [Special Unicode Productions](#special-unicode-productions) lists these
-productions.
-
-## String table productions
-
-Some rules in the grammar &mdash; notably [unary
-operators](#unary-operator-expressions), [binary
-operators](#binary-operator-expressions), and [keywords](#keywords) &mdash; are
-given in a simplified form: as a listing of a table of unquoted, printable
-whitespace-separated strings. These cases form a subset of the rules regarding
-the [token](#tokens) rule, and are assumed to be the result of a
-lexical-analysis phase feeding the parser, driven by a DFA, operating over the
-disjunction of all such string table entries.
-
-When such a string enclosed in double-quotes (`"`) occurs inside the grammar,
-it is an implicit reference to a single member of such a string table
-production. See [tokens](#tokens) for more information.
-
-# Lexical structure
-
-## Input format
-
-Rust input is interpreted as a sequence of Unicode codepoints encoded in UTF-8.
-Most Rust grammar rules are defined in terms of printable ASCII-range
-codepoints, but a small number are defined in terms of Unicode properties or
-explicit codepoint lists. [^inputformat]
-
-[^inputformat]: Substitute definitions for the special Unicode productions are
-  provided to the grammar verifier, restricted to ASCII range, when verifying the
-  grammar in this document.
-
-## Special Unicode Productions
-
-The following productions in the Rust grammar are defined in terms of Unicode
-properties: `ident`, `non_null`, `non_eol`, `non_single_quote` and
-`non_double_quote`.
-
-### Identifiers
-
-The `ident` production is any nonempty Unicode string of
-the following form:
-
-- The first character is in one of the following ranges `U+0041` to `U+005A`
-("A" to "Z"), `U+0061` to `U+007A` ("a" to "z"), or `U+005F` ("\_").
-- The remaining characters are in the range `U+0030` to `U+0039` ("0" to "9"),
-or any of the prior valid initial characters.
-
-as long as the identifier does _not_ occur in the set of [keywords](#keywords).
-
-### Delimiter-restricted productions
-
-Some productions are defined by exclusion of particular Unicode characters:
-
-- `non_null` is any single Unicode character aside from `U+0000` (null)
-- `non_eol` is any single Unicode character aside from `U+000A` (`'\n'`)
-- `non_single_quote` is any single Unicode character aside from `U+0027`  (`'`)
-- `non_double_quote` is any single Unicode character aside from `U+0022` (`"`)
-
-## Comments
-
-```antlr
-comment : block_comment | line_comment ;
-block_comment : "/*" block_comment_body * "*/" ;
-block_comment_body : [block_comment | character] * ;
-line_comment : "//" non_eol * ;
-```
-
-**FIXME:** add doc grammar?
-
-## Whitespace
-
-```antlr
-whitespace_char : '\x20' | '\x09' | '\x0a' | '\x0d' ;
-whitespace : [ whitespace_char | comment ] + ;
-```
-
-## Tokens
-
-```antlr
-simple_token : keyword | unop | binop ;
-token : simple_token | ident | literal | symbol | whitespace token ;
-```
-
-### Keywords
-
-<p id="keyword-table-marker"></p>
-
-|          |          |          |          |          |
-|----------|----------|----------|----------|----------|
-| _        | abstract | alignof  | as       | become   |
-| box      | break    | const    | continue | crate    |
-| do       | else     | enum     | extern   | false    |
-| final    | fn       | for      | if       | impl     |
-| in       | let      | loop     | macro    | match    |
-| mod      | move     | mut      | offsetof | override |
-| priv     | proc     | pub      | pure     | ref      |
-| return   | Self     | self     | sizeof   | static   |
-| struct   | super    | trait    | true     | type     |
-| typeof   | unsafe   | unsized  | use      | virtual  |
-| where    | while    | yield    |          |          |
-
-
-Each of these keywords has special meaning in its grammar, and all of them are
-excluded from the `ident` rule.
-
-Not all of these keywords are used by the language. Some of them were used
-before Rust 1.0, and were left reserved once their implementations were
-removed. Some of them were reserved before 1.0 to make space for possible
-future features.
-
-### Literals
-
-```antlr
-lit_suffix : ident;
-literal : [ string_lit | char_lit | byte_string_lit | byte_lit | num_lit | bool_lit ] lit_suffix ?;
-```
-
-The optional `lit_suffix` production is only used for certain numeric literals,
-but is reserved for future extension. That is, the above gives the lexical
-grammar, but a Rust parser will reject everything but the 12 special cases
-mentioned in [Number literals](reference/tokens.html#number-literals) in the
-reference.
-
-#### Character and string literals
-
-```antlr
-char_lit : '\x27' char_body '\x27' ;
-string_lit : '"' string_body * '"' | 'r' raw_string ;
-
-char_body : non_single_quote
-          | '\x5c' [ '\x27' | common_escape | unicode_escape ] ;
-
-string_body : non_double_quote
-            | '\x5c' [ '\x22' | common_escape | unicode_escape ] ;
-raw_string : '"' raw_string_body '"' | '#' raw_string '#' ;
-
-common_escape : '\x5c'
-              | 'n' | 'r' | 't' | '0'
-              | 'x' hex_digit 2
-unicode_escape : 'u' '{' hex_digit+ 6 '}';
-
-hex_digit : 'a' | 'b' | 'c' | 'd' | 'e' | 'f'
-          | 'A' | 'B' | 'C' | 'D' | 'E' | 'F'
-          | dec_digit ;
-oct_digit : '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' ;
-dec_digit : '0' | nonzero_dec ;
-nonzero_dec: '1' | '2' | '3' | '4'
-           | '5' | '6' | '7' | '8' | '9' ;
-```
-
-#### Byte and byte string literals
-
-```antlr
-byte_lit : "b\x27" byte_body '\x27' ;
-byte_string_lit : "b\x22" string_body * '\x22' | "br" raw_byte_string ;
-
-byte_body : ascii_non_single_quote
-          | '\x5c' [ '\x27' | common_escape ] ;
-
-byte_string_body : ascii_non_double_quote
-            | '\x5c' [ '\x22' | common_escape ] ;
-raw_byte_string : '"' raw_byte_string_body '"' | '#' raw_byte_string '#' ;
-
-```
-
-#### Number literals
-
-```antlr
-num_lit : nonzero_dec [ dec_digit | '_' ] * float_suffix ?
-        | '0' [       [ dec_digit | '_' ] * float_suffix ?
-              | 'b'   [ '1' | '0' | '_' ] +
-              | 'o'   [ oct_digit | '_' ] +
-              | 'x'   [ hex_digit | '_' ] +  ] ;
-
-float_suffix : [ exponent | '.' dec_lit exponent ? ] ? ;
-
-exponent : ['E' | 'e'] ['-' | '+' ] ? dec_lit ;
-dec_lit : [ dec_digit | '_' ] + ;
-```
-
-#### Boolean literals
-
-```antlr
-bool_lit : [ "true" | "false" ] ;
-```
-
-The two values of the boolean type are written `true` and `false`.
-
-### Symbols
-
-```antlr
-symbol : "::" | "->"
-       | '#' | '[' | ']' | '(' | ')' | '{' | '}'
-       | ',' | ';' ;
-```
-
-Symbols are a general class of printable [tokens](#tokens) that play structural
-roles in a variety of grammar productions. They are cataloged here for
-completeness as the set of remaining miscellaneous printable tokens that do not
-otherwise appear as [unary operators](#unary-operator-expressions), [binary
-operators](#binary-operator-expressions), or [keywords](#keywords).
-
-## Paths
-
-```antlr
-expr_path : [ "::" ] ident [ "::" expr_path_tail ] + ;
-expr_path_tail : '<' type_expr [ ',' type_expr ] + '>'
-               | expr_path ;
-
-type_path : ident [ type_path_tail ] + ;
-type_path_tail : '<' type_expr [ ',' type_expr ] + '>'
-               | "::" type_path ;
-```
-
-# Syntax extensions
-
-## Macros
-
-```antlr
-expr_macro_rules : "macro_rules" '!' ident '(' macro_rule * ')' ';'
-                 | "macro_rules" '!' ident '{' macro_rule * '}' ;
-macro_rule : '(' matcher * ')' "=>" '(' transcriber * ')' ';' ;
-matcher : '(' matcher * ')' | '[' matcher * ']'
-        | '{' matcher * '}' | '$' ident ':' ident
-        | '$' '(' matcher * ')' sep_token? [ '*' | '+' ]
-        | non_special_token ;
-transcriber : '(' transcriber * ')' | '[' transcriber * ']'
-            | '{' transcriber * '}' | '$' ident
-            | '$' '(' transcriber * ')' sep_token? [ '*' | '+' ]
-            | non_special_token ;
-```
-
-# Crates and source files
-
-**FIXME:** grammar? What production covers #![crate_id = "foo"] ?
-
-# Items and attributes
-
-**FIXME:** grammar?
-
-## Items
-
-```antlr
-item : vis ? mod_item | fn_item | type_item | struct_item | enum_item
-     | const_item | static_item | trait_item | impl_item | extern_block_item ;
-```
-
-### Type Parameters
-
-**FIXME:** grammar?
-
-### Modules
-
-```antlr
-mod_item : "mod" ident ( ';' | '{' mod '}' );
-mod : [ view_item | item ] * ;
-```
-
-#### View items
-
-```antlr
-view_item : extern_crate_decl | use_decl ';' ;
-```
-
-##### Extern crate declarations
-
-```antlr
-extern_crate_decl : "extern" "crate" crate_name
-crate_name: ident | ( ident "as" ident )
-```
-
-##### Use declarations
-
-```antlr
-use_decl : vis ? "use" [ path "as" ident
-                        | path_glob ] ;
-
-path_glob : ident [ "::" [ path_glob
-                          | '*' ] ] ?
-          | '{' path_item [ ',' path_item ] * '}' ;
-
-path_item : ident | "self" ;
-```
-
-### Functions
-
-**FIXME:** grammar?
-
-#### Generic functions
-
-**FIXME:** grammar?
-
-#### Unsafety
-
-**FIXME:** grammar?
-
-##### Unsafe functions
-
-**FIXME:** grammar?
-
-##### Unsafe blocks
-
-**FIXME:** grammar?
-
-#### Diverging functions
-
-**FIXME:** grammar?
-
-### Type definitions
-
-**FIXME:** grammar?
-
-### Structures
-
-**FIXME:** grammar?
-
-### Enumerations
-
-**FIXME:** grammar?
-
-### Constant items
-
-```antlr
-const_item : "const" ident ':' type '=' expr ';' ;
-```
-
-### Static items
-
-```antlr
-static_item : "static" ident ':' type '=' expr ';' ;
-```
-
-#### Mutable statics
-
-**FIXME:** grammar?
-
-### Traits
-
-**FIXME:** grammar?
-
-### Implementations
-
-**FIXME:** grammar?
-
-### External blocks
-
-```antlr
-extern_block_item : "extern" '{' extern_block '}' ;
-extern_block : [ foreign_fn ] * ;
-```
-
-## Visibility and Privacy
-
-```antlr
-vis : "pub" ;
-```
-### Re-exporting and Visibility
-
-See [Use declarations](#use-declarations).
-
-## Attributes
-
-```antlr
-attribute : '#' '!' ? '[' meta_item ']' ;
-meta_item : ident [ '=' literal
-                  | '(' meta_seq ')' ] ? ;
-meta_seq : meta_item [ ',' meta_seq ] ? ;
-```
-
-# Statements and expressions
-
-## Statements
-
-```antlr
-stmt : decl_stmt | expr_stmt | ';' ;
-```
-
-### Declaration statements
-
-```antlr
-decl_stmt : item | let_decl ;
-```
-
-#### Item declarations
-
-See [Items](#items).
-
-#### Variable declarations
-
-```antlr
-let_decl : "let" pat [':' type ] ? [ init ] ? ';' ;
-init : [ '=' ] expr ;
-```
-
-### Expression statements
-
-```antlr
-expr_stmt : expr ';' ;
-```
-
-## Expressions
-
-```antlr
-expr : literal | path | tuple_expr | unit_expr | struct_expr
-     | block_expr | method_call_expr | field_expr | array_expr
-     | idx_expr | range_expr | unop_expr | binop_expr
-     | paren_expr | call_expr | lambda_expr | while_expr
-     | loop_expr | break_expr | continue_expr | for_expr
-     | if_expr | match_expr | if_let_expr | while_let_expr
-     | return_expr ;
-```
-
-#### Lvalues, rvalues and temporaries
-
-**FIXME:** grammar?
-
-#### Moved and copied types
-
-**FIXME:** Do we want to capture this in the grammar as different productions?
-
-### Literal expressions
-
-See [Literals](#literals).
-
-### Path expressions
-
-See [Paths](#paths).
-
-### Tuple expressions
-
-```antlr
-tuple_expr : '(' [ expr [ ',' expr ] * | expr ',' ] ? ')' ;
-```
-
-### Unit expressions
-
-```antlr
-unit_expr : "()" ;
-```
-
-### Structure expressions
-
-```antlr
-struct_expr_field_init : ident | ident ':' expr ;
-struct_expr : expr_path '{' struct_expr_field_init
-                      [ ',' struct_expr_field_init ] *
-                      [ ".." expr ] '}' |
-              expr_path '(' expr
-                      [ ',' expr ] * ')' |
-              expr_path ;
-```
-
-### Block expressions
-
-```antlr
-block_expr : '{' [ stmt | item ] *
-                 [ expr ] '}' ;
-```
-
-### Method-call expressions
-
-```antlr
-method_call_expr : expr '.' ident paren_expr_list ;
-```
-
-### Field expressions
-
-```antlr
-field_expr : expr '.' ident ;
-```
-
-### Array expressions
-
-```antlr
-array_expr : '[' "mut" ? array_elems? ']' ;
-
-array_elems : [expr [',' expr]*] | [expr ';' expr] ;
-```
-
-### Index expressions
-
-```antlr
-idx_expr : expr '[' expr ']' ;
-```
-
-### Range expressions
-
-```antlr
-range_expr : expr ".." expr |
-             expr ".." |
-             ".." expr |
-             ".." ;
-```
-
-### Unary operator expressions
-
-```antlr
-unop_expr : unop expr ;
-unop : '-' | '*' | '!' ;
-```
-
-### Binary operator expressions
-
-```antlr
-binop_expr : expr binop expr | type_cast_expr
-           | assignment_expr | compound_assignment_expr ;
-binop : arith_op | bitwise_op | lazy_bool_op | comp_op
-```
-
-#### Arithmetic operators
-
-```antlr
-arith_op : '+' | '-' | '*' | '/' | '%' ;
-```
-
-#### Bitwise operators
-
-```antlr
-bitwise_op : '&' | '|' | '^' | "<<" | ">>" ;
-```
-
-#### Lazy boolean operators
-
-```antlr
-lazy_bool_op : "&&" | "||" ;
-```
-
-#### Comparison operators
-
-```antlr
-comp_op : "==" | "!=" | '<' | '>' | "<=" | ">=" ;
-```
-
-#### Type cast expressions
-
-```antlr
-type_cast_expr : value "as" type ;
-```
-
-#### Assignment expressions
-
-```antlr
-assignment_expr : expr '=' expr ;
-```
-
-#### Compound assignment expressions
-
-```antlr
-compound_assignment_expr : expr [ arith_op | bitwise_op ] '=' expr ;
-```
-
-### Grouped expressions
-
-```antlr
-paren_expr : '(' expr ')' ;
-```
-
-### Call expressions
-
-```antlr
-expr_list : [ expr [ ',' expr ]* ] ? ;
-paren_expr_list : '(' expr_list ')' ;
-call_expr : expr paren_expr_list ;
-```
-
-### Lambda expressions
-
-```antlr
-ident_list : [ ident [ ',' ident ]* ] ? ;
-lambda_expr : '|' ident_list '|' expr ;
-```
-
-### While loops
-
-```antlr
-while_expr : [ lifetime ':' ] ? "while" no_struct_literal_expr '{' block '}' ;
-```
-
-### Infinite loops
-
-```antlr
-loop_expr : [ lifetime ':' ] ? "loop" '{' block '}';
-```
-
-### Break expressions
-
-```antlr
-break_expr : "break" [ lifetime ] ?;
-```
-
-### Continue expressions
-
-```antlr
-continue_expr : "continue" [ lifetime ] ?;
-```
-
-### For expressions
-
-```antlr
-for_expr : [ lifetime ':' ] ? "for" pat "in" no_struct_literal_expr '{' block '}' ;
-```
-
-### If expressions
-
-```antlr
-if_expr : "if" no_struct_literal_expr '{' block '}'
-          else_tail ? ;
-
-else_tail : "else" [ if_expr | if_let_expr
-                   | '{' block '}' ] ;
-```
-
-### Match expressions
-
-```antlr
-match_expr : "match" no_struct_literal_expr '{' match_arm * '}' ;
-
-match_arm : attribute * match_pat "=>" [ expr "," | '{' block '}' ] ;
-
-match_pat : pat [ '|' pat ] * [ "if" expr ] ? ;
-```
-
-### If let expressions
-
-```antlr
-if_let_expr : "if" "let" pat '=' expr '{' block '}'
-               else_tail ? ;
-```
-
-### While let loops
-
-```antlr
-while_let_expr : [ lifetime ':' ] ? "while" "let" pat '=' expr '{' block '}' ;
-```
-
-### Return expressions
-
-```antlr
-return_expr : "return" expr ? ;
-```
-
-# Type system
-
-**FIXME:** is this entire chapter relevant here? Or should it all have been covered by some production already?
-
-## Types
-
-### Primitive types
-
-**FIXME:** grammar?
-
-#### Machine types
-
-**FIXME:** grammar?
-
-#### Machine-dependent integer types
-
-**FIXME:** grammar?
-
-### Textual types
-
-**FIXME:** grammar?
-
-### Tuple types
-
-**FIXME:** grammar?
-
-### Array, and Slice types
-
-**FIXME:** grammar?
-
-### Structure types
-
-**FIXME:** grammar?
-
-### Enumerated types
-
-**FIXME:** grammar?
-
-### Pointer types
-
-**FIXME:** grammar?
-
-### Function types
-
-**FIXME:** grammar?
-
-### Closure types
-
-```antlr
-closure_type := [ 'unsafe' ] [ '<' lifetime-list '>' ] '|' arg-list '|'
-                [ ':' bound-list ] [ '->' type ]
-lifetime-list := lifetime | lifetime ',' lifetime-list
-arg-list := ident ':' type | ident ':' type ',' arg-list
-```
-
-### Never type
-An empty type
-
-```antlr
-never_type : "!" ;
-```
-
-### Object types
-
-**FIXME:** grammar?
-
-### Type parameters
-
-**FIXME:** grammar?
-
-### Type parameter bounds
-
-```antlr
-bound-list := bound | bound '+' bound-list '+' ?
-bound := ty_bound | lt_bound
-lt_bound := lifetime
-ty_bound := ty_bound_noparen | (ty_bound_noparen)
-ty_bound_noparen := [?] [ for<lt_param_defs> ] simple_path
-```
-
-### Self types
-
-**FIXME:** grammar?
-
-## Type kinds
-
-**FIXME:** this is probably not relevant to the grammar...
-
-# Memory and concurrency models
-
-**FIXME:** is this entire chapter relevant here? Or should it all have been covered by some production already?
-
-## Memory model
-
-### Memory allocation and lifetime
-
-### Memory ownership
-
-### Variables
-
-### Boxes
-
-## Threads
-
-### Communication between threads
-
-### Thread lifecycle
+[reference]: https://doc.rust-lang.org/reference/
+[grammar working group]: https://github.com/rust-lang/wg-grammar
--- a/src/grammar/.gitignore
+++ b/src/grammar/.gitignore
@@ -1,3 +0,0 @@
-*.class
-*.java
-*.tokens
--- a/src/grammar/lexer.l
+++ b/src/grammar/lexer.l
@@ -1,350 +0,0 @@
-%{
-#include <stdio.h>
-#include <ctype.h>
-
-static int num_hashes;
-static int end_hashes;
-static int saw_non_hash;
-
-%}
-
-%option stack
-%option yylineno
-
-%x str
-%x rawstr
-%x rawstr_esc_begin
-%x rawstr_esc_body
-%x rawstr_esc_end
-%x byte
-%x bytestr
-%x rawbytestr
-%x rawbytestr_nohash
-%x pound
-%x shebang_or_attr
-%x ltorchar
-%x linecomment
-%x doc_line
-%x blockcomment
-%x doc_block
-%x suffix
-
-ident [a-zA-Z\x80-\xff_][a-zA-Z0-9\x80-\xff_]*
-
-%%
-
-<suffix>{ident}            { BEGIN(INITIAL); }
-<suffix>(.|\n)  { yyless(0); BEGIN(INITIAL); }
-
-[ \n\t\r]             { }
-
-\xef\xbb\xbf {
-  // UTF-8 byte order mark (BOM), ignore if in line 1, error otherwise
-  if (yyget_lineno() != 1) {
-    return -1;
-  }
-}
-
-\/\/(\/|\!)           { BEGIN(doc_line); yymore(); }
-<doc_line>\n          { BEGIN(INITIAL);
-                        yyleng--;
-                        yytext[yyleng] = 0;
-                        return ((yytext[2] == '!') ? INNER_DOC_COMMENT : OUTER_DOC_COMMENT);
-                      }
-<doc_line>[^\n]*      { yymore(); }
-
-\/\/|\/\/\/\/         { BEGIN(linecomment); }
-<linecomment>\n       { BEGIN(INITIAL); }
-<linecomment>[^\n]*   { }
-
-\/\*(\*|\!)[^*]       { yy_push_state(INITIAL); yy_push_state(doc_block); yymore(); }
-<doc_block>\/\*       { yy_push_state(doc_block); yymore(); }
-<doc_block>\*\/       {
-    yy_pop_state();
-    if (yy_top_state() == doc_block) {
-        yymore();
-    } else {
-        return ((yytext[2] == '!') ? INNER_DOC_COMMENT : OUTER_DOC_COMMENT);
-    }
-}
-<doc_block>(.|\n)     { yymore(); }
-
-\/\*                  { yy_push_state(blockcomment); }
-<blockcomment>\/\*    { yy_push_state(blockcomment); }
-<blockcomment>\*\/    { yy_pop_state(); }
-<blockcomment>(.|\n)   { }
-
-_        { return UNDERSCORE; }
-abstract { return ABSTRACT; }
-alignof  { return ALIGNOF; }
-as       { return AS; }
-become   { return BECOME; }
-box      { return BOX; }
-break    { return BREAK; }
-catch    { return CATCH; }
-const    { return CONST; }
-continue { return CONTINUE; }
-crate    { return CRATE; }
-default  { return DEFAULT; }
-do       { return DO; }
-else     { return ELSE; }
-enum     { return ENUM; }
-extern   { return EXTERN; }
-false    { return FALSE; }
-final    { return FINAL; }
-fn       { return FN; }
-for      { return FOR; }
-if       { return IF; }
-impl     { return IMPL; }
-in       { return IN; }
-let      { return LET; }
-loop     { return LOOP; }
-macro    { return MACRO; }
-match    { return MATCH; }
-mod      { return MOD; }
-move     { return MOVE; }
-mut      { return MUT; }
-offsetof { return OFFSETOF; }
-override { return OVERRIDE; }
-priv     { return PRIV; }
-proc     { return PROC; }
-pure     { return PURE; }
-pub      { return PUB; }
-ref      { return REF; }
-return   { return RETURN; }
-self     { return SELF; }
-sizeof   { return SIZEOF; }
-static   { return STATIC; }
-struct   { return STRUCT; }
-super    { return SUPER; }
-trait    { return TRAIT; }
-true     { return TRUE; }
-type     { return TYPE; }
-typeof   { return TYPEOF; }
-union    { return UNION; }
-unsafe   { return UNSAFE; }
-unsized  { return UNSIZED; }
-use      { return USE; }
-virtual  { return VIRTUAL; }
-where    { return WHERE; }
-while    { return WHILE; }
-yield    { return YIELD; }
-
-{ident}  { return IDENT; }
-
-0x[0-9a-fA-F_]+                                    { BEGIN(suffix); return LIT_INTEGER; }
-0o[0-7_]+                                          { BEGIN(suffix); return LIT_INTEGER; }
-0b[01_]+                                           { BEGIN(suffix); return LIT_INTEGER; }
-[0-9][0-9_]*                                       { BEGIN(suffix); return LIT_INTEGER; }
-[0-9][0-9_]*\.(\.|[a-zA-Z])    { yyless(yyleng - 2); BEGIN(suffix); return LIT_INTEGER; }
-
-[0-9][0-9_]*\.[0-9_]*([eE][-\+]?[0-9_]+)?          { BEGIN(suffix); return LIT_FLOAT; }
-[0-9][0-9_]*(\.[0-9_]*)?[eE][-\+]?[0-9_]+          { BEGIN(suffix); return LIT_FLOAT; }
-
-;      { return ';'; }
-,      { return ','; }
-\.\.\. { return DOTDOTDOT; }
-\.\.   { return DOTDOT; }
-\.     { return '.'; }
-\(     { return '('; }
-\)     { return ')'; }
-\{     { return '{'; }
-\}     { return '}'; }
-\[     { return '['; }
-\]     { return ']'; }
-@      { return '@'; }
-#      { BEGIN(pound); yymore(); }
-<pound>\! { BEGIN(shebang_or_attr); yymore(); }
-<shebang_or_attr>\[ {
-  BEGIN(INITIAL);
-  yyless(2);
-  return SHEBANG;
-}
-<shebang_or_attr>[^\[\n]*\n {
-  // Since the \n was eaten as part of the token, yylineno will have
-  // been incremented to the value 2 if the shebang was on the first
-  // line. This yyless undoes that, setting yylineno back to 1.
-  yyless(yyleng - 1);
-  if (yyget_lineno() == 1) {
-    BEGIN(INITIAL);
-    return SHEBANG_LINE;
-  } else {
-    BEGIN(INITIAL);
-    yyless(2);
-    return SHEBANG;
-  }
-}
-<pound>. { BEGIN(INITIAL); yyless(1); return '#'; }
-
-\~     { return '~'; }
-::     { return MOD_SEP; }
-:      { return ':'; }
-\$     { return '$'; }
-\?     { return '?'; }
-
-==    { return EQEQ; }
-=>    { return FAT_ARROW; }
-=     { return '='; }
-\!=   { return NE; }
-\!    { return '!'; }
-\<=   { return LE; }
-\<\<  { return SHL; }
-\<\<= { return SHLEQ; }
-\<    { return '<'; }
-\>=   { return GE; }
-\>\>  { return SHR; }
-\>\>= { return SHREQ; }
-\>    { return '>'; }
-
-\x27                                      { BEGIN(ltorchar); yymore(); }
-<ltorchar>static                          { BEGIN(INITIAL); return STATIC_LIFETIME; }
-<ltorchar>{ident}                         { BEGIN(INITIAL); return LIFETIME; }
-<ltorchar>\\[nrt\\\x27\x220]\x27          { BEGIN(suffix); return LIT_CHAR; }
-<ltorchar>\\x[0-9a-fA-F]{2}\x27           { BEGIN(suffix); return LIT_CHAR; }
-<ltorchar>\\u\{([0-9a-fA-F]_*){1,6}\}\x27 { BEGIN(suffix); return LIT_CHAR; }
-<ltorchar>.\x27                           { BEGIN(suffix); return LIT_CHAR; }
-<ltorchar>[\x80-\xff]{2,4}\x27            { BEGIN(suffix); return LIT_CHAR; }
-<ltorchar><<EOF>>                         { BEGIN(INITIAL); return -1; }
-
-b\x22              { BEGIN(bytestr); yymore(); }
-<bytestr>\x22      { BEGIN(suffix); return LIT_BYTE_STR; }
-
-<bytestr><<EOF>>                     { return -1; }
-<bytestr>\\[n\nrt\\\x27\x220]        { yymore(); }
-<bytestr>\\x[0-9a-fA-F]{2}           { yymore(); }
-<bytestr>\\u\{([0-9a-fA-F]_*){1,6}\} { yymore(); }
-<bytestr>\\[^n\nrt\\\x27\x220]       { return -1; }
-<bytestr>(.|\n)                      { yymore(); }
-
-br\x22                      { BEGIN(rawbytestr_nohash); yymore(); }
-<rawbytestr_nohash>\x22     { BEGIN(suffix); return LIT_BYTE_STR_RAW; }
-<rawbytestr_nohash>(.|\n)   { yymore(); }
-<rawbytestr_nohash><<EOF>>  { return -1; }
-
-br/# {
-    BEGIN(rawbytestr);
-    yymore();
-    num_hashes = 0;
-    saw_non_hash = 0;
-    end_hashes = 0;
-}
-<rawbytestr># {
-    if (!saw_non_hash) {
-        num_hashes++;
-    } else if (end_hashes != 0) {
-        end_hashes++;
-        if (end_hashes == num_hashes) {
-            BEGIN(INITIAL);
-            return LIT_BYTE_STR_RAW;
-        }
-    }
-    yymore();
-}
-<rawbytestr>\x22# {
-    end_hashes = 1;
-    if (end_hashes == num_hashes) {
-        BEGIN(INITIAL);
-        return LIT_BYTE_STR_RAW;
-    }
-    yymore();
-}
-<rawbytestr>(.|\n) {
-    if (!saw_non_hash) {
-        saw_non_hash = 1;
-    }
-    if (end_hashes != 0) {
-        end_hashes = 0;
-    }
-    yymore();
-}
-<rawbytestr><<EOF>> { return -1; }
-
-b\x27                           { BEGIN(byte); yymore(); }
-<byte>\\[nrt\\\x27\x220]\x27    { BEGIN(INITIAL); return LIT_BYTE; }
-<byte>\\x[0-9a-fA-F]{2}\x27     { BEGIN(INITIAL); return LIT_BYTE; }
-<byte>\\u([0-9a-fA-F]_*){4}\x27 { BEGIN(INITIAL); return LIT_BYTE; }
-<byte>\\U([0-9a-fA-F]_*){8}\x27 { BEGIN(INITIAL); return LIT_BYTE; }
-<byte>.\x27                     { BEGIN(INITIAL); return LIT_BYTE; }
-<byte><<EOF>>                   { BEGIN(INITIAL); return -1; }
-
-r\x22           { BEGIN(rawstr); yymore(); }
-<rawstr>\x22    { BEGIN(suffix); return LIT_STR_RAW; }
-<rawstr>(.|\n)  { yymore(); }
-<rawstr><<EOF>> { return -1; }
-
-r/#             {
-    BEGIN(rawstr_esc_begin);
-    yymore();
-    num_hashes = 0;
-    saw_non_hash = 0;
-    end_hashes = 0;
-}
-
-<rawstr_esc_begin># {
-    num_hashes++;
-    yymore();
-}
-<rawstr_esc_begin>\x22 {
-    BEGIN(rawstr_esc_body);
-    yymore();
-}
-<rawstr_esc_begin>(.|\n) { return -1; }
-
-<rawstr_esc_body>\x22/# {
-  BEGIN(rawstr_esc_end);
-  yymore();
- }
-<rawstr_esc_body>(.|\n) {
-  yymore();
- }
-
-<rawstr_esc_end># {
-  end_hashes++;
-  if (end_hashes == num_hashes) {
-    BEGIN(INITIAL);
-    return LIT_STR_RAW;
-  }
-  yymore();
- }
-<rawstr_esc_end>[^#] {
-  end_hashes = 0;
-  BEGIN(rawstr_esc_body);
-  yymore();
- }
-
-<rawstr_esc_begin,rawstr_esc_body,rawstr_esc_end><<EOF>> { return -1; }
-
-\x22                     { BEGIN(str); yymore(); }
-<str>\x22                { BEGIN(suffix); return LIT_STR; }
-
-<str><<EOF>>                     { return -1; }
-<str>\\[n\nr\rt\\\x27\x220]      { yymore(); }
-<str>\\x[0-9a-fA-F]{2}           { yymore(); }
-<str>\\u\{([0-9a-fA-F]_*){1,6}\} { yymore(); }
-<str>\\[^n\nrt\\\x27\x220]       { return -1; }
-<str>(.|\n)                      { yymore(); }
-
-\<-  { return LARROW; }
--\>  { return RARROW; }
--    { return '-'; }
--=   { return MINUSEQ; }
-&&   { return ANDAND; }
-&    { return '&'; }
-&=   { return ANDEQ; }
-\|\| { return OROR; }
-\|   { return '|'; }
-\|=  { return OREQ; }
-\+   { return '+'; }
-\+=  { return PLUSEQ; }
-\*   { return '*'; }
-\*=  { return STAREQ; }
-\/   { return '/'; }
-\/=  { return SLASHEQ; }
-\^   { return '^'; }
-\^=  { return CARETEQ; }
-%    { return '%'; }
-%=   { return PERCENTEQ; }
-
-<<EOF>> { return 0; }
-
-%%
--- a/src/grammar/parser-lalr-main.c
+++ b/src/grammar/parser-lalr-main.c
@@ -1,193 +0,0 @@
-#include <stdio.h>
-#include <stdarg.h>
-#include <stdlib.h>
-#include <string.h>
-
-extern int yylex();
-extern int rsparse();
-
-#define PUSHBACK_LEN 4
-
-static char pushback[PUSHBACK_LEN];
-static int verbose;
-
-void print(const char* format, ...) {
-  va_list args;
-  va_start(args, format);
-  if (verbose) {
-    vprintf(format, args);
-  }
-  va_end(args);
-}
-
-// If there is a non-null char at the head of the pushback queue,
-// dequeue it and shift the rest of the queue forwards. Otherwise,
-// return the token from calling yylex.
-int rslex() {
-  if (pushback[0] == '\0') {
-    return yylex();
-  } else {
-    char c = pushback[0];
-    memmove(pushback, pushback + 1, PUSHBACK_LEN - 1);
-    pushback[PUSHBACK_LEN - 1] = '\0';
-    return c;
-  }
-}
-
-// Note: this does nothing if the pushback queue is full. As long as
-// there aren't more than PUSHBACK_LEN consecutive calls to push_back
-// in an action, this shouldn't be a problem.
-void push_back(char c) {
-  for (int i = 0; i < PUSHBACK_LEN; ++i) {
-    if (pushback[i] == '\0') {
-      pushback[i] = c;
-      break;
-    }
-  }
-}
-
-extern int rsdebug;
-
-struct node {
-  struct node *next;
-  struct node *prev;
-  int own_string;
-  char const *name;
-  int n_elems;
-  struct node *elems[];
-};
-
-struct node *nodes = NULL;
-int n_nodes;
-
-struct node *mk_node(char const *name, int n, ...) {
-  va_list ap;
-  int i = 0;
-  unsigned sz = sizeof(struct node) + (n * sizeof(struct node *));
-  struct node *nn, *nd = (struct node *)malloc(sz);
-
-  print("# New %d-ary node: %s = %p\n", n, name, nd);
-
-  nd->own_string = 0;
-  nd->prev = NULL;
-  nd->next = nodes;
-  if (nodes) {
-    nodes->prev = nd;
-  }
-  nodes = nd;
-
-  nd->name = name;
-  nd->n_elems = n;
-
-  va_start(ap, n);
-  while (i < n) {
-    nn = va_arg(ap, struct node *);
-    print("#   arg[%d]: %p\n", i, nn);
-    print("#            (%s ...)\n", nn->name);
-    nd->elems[i++] = nn;
-  }
-  va_end(ap);
-  n_nodes++;
-  return nd;
-}
-
-struct node *mk_atom(char *name) {
-  struct node *nd = mk_node((char const *)strdup(name), 0);
-  nd->own_string = 1;
-  return nd;
-}
-
-struct node *mk_none() {
-  return mk_atom("<none>");
-}
-
-struct node *ext_node(struct node *nd, int n, ...) {
-  va_list ap;
-  int i = 0, c = nd->n_elems + n;
-  unsigned sz = sizeof(struct node) + (c * sizeof(struct node *));
-  struct node *nn;
-
-  print("# Extending %d-ary node by %d nodes: %s = %p",
-        nd->n_elems, n, nd->name, nd);
-
-  if (nd->next) {
-    nd->next->prev = nd->prev;
-  }
-  if (nd->prev) {
-    nd->prev->next = nd->next;
-  }
-  nd = realloc(nd, sz);
-  nd->prev = NULL;
-  nd->next = nodes;
-  nodes->prev = nd;
-  nodes = nd;
-
-  print(" ==> %p\n", nd);
-
-  va_start(ap, n);
-  while (i < n) {
-    nn = va_arg(ap, struct node *);
-    print("#   arg[%d]: %p\n", i, nn);
-    print("#            (%s ...)\n", nn->name);
-    nd->elems[nd->n_elems++] = nn;
-    ++i;
-  }
-  va_end(ap);
-  return nd;
-}
-
-int const indent_step = 4;
-
-void print_indent(int depth) {
-  while (depth) {
-    if (depth-- % indent_step == 0) {
-      print("|");
-    } else {
-      print(" ");
-    }
-  }
-}
-
-void print_node(struct node *n, int depth) {
-  int i = 0;
-  print_indent(depth);
-  if (n->n_elems == 0) {
-    print("%s\n", n->name);
-  } else {
-    print("(%s\n", n->name);
-    for (i = 0; i < n->n_elems; ++i) {
-      print_node(n->elems[i], depth + indent_step);
-    }
-    print_indent(depth);
-    print(")\n");
-  }
-}
-
-int main(int argc, char **argv) {
-  if (argc == 2 && strcmp(argv[1], "-v") == 0) {
-    verbose = 1;
-  } else {
-    verbose = 0;
-  }
-  int ret = 0;
-  struct node *tmp;
-  memset(pushback, '\0', PUSHBACK_LEN);
-  ret = rsparse();
-  print("--- PARSE COMPLETE: ret:%d, n_nodes:%d ---\n", ret, n_nodes);
-  if (nodes) {
-    print_node(nodes, 0);
-  }
-  while (nodes) {
-    tmp = nodes;
-    nodes = tmp->next;
-    if (tmp->own_string) {
-      free((void*)tmp->name);
-    }
-    free(tmp);
-  }
-  return ret;
-}
-
-void rserror(char const *s) {
-  fprintf(stderr, "%s\n", s);
-}
--- a/src/grammar/parser-lalr.y
+++ b/src/grammar/parser-lalr.y
--- a/src/grammar/raw-string-literal-ambiguity.md
+++ b/src/grammar/raw-string-literal-ambiguity.md
@ -1,64 +0,0 @@
-Rust's lexical grammar is not context-free. Raw string literals are the source
-of the problem. Informally, a raw string literal is an `r`, followed by `N`
-hashes (where N can be zero), a quote, any characters, then a quote followed
-by `N` hashes. Critically, once inside the first pair of quotes,
-another quote cannot be followed by `N` consecutive hashes. For example,
-`r###""###"###` is invalid.
-
-The following grammar describes raw string literals as well as a context-free grammar can:
-
-    R -> 'r' S
-    S -> '"' B '"'
-    S -> '#' S '#'
-    B -> . B
-    B -> ε
-
-Where `.` represents any character, and `ε` the empty string. Consider the
-string `r#""#"#`. This string is not a valid raw string literal, but can be
-accepted as one by the above grammar, using the derivation:
-
-    R : #""#"#
-    S : ""#"
-    S : "#
-    B : #
-    B : ε
-
-(Where `T : U` means the rule `T` is applied, and `U` is the remainder of the
-string.) The difficulty arises from the fact that the language is fundamentally
-context-sensitive. In particular, the context needed is the number of hashes.
-
-To prove that Rust's raw string literals are not context-free, we will use
-the fact that context-free languages are closed under intersection with
-regular languages, and the
-[pumping lemma for context-free languages](https://en.wikipedia.org/wiki/Pumping_lemma_for_context-free_languages).
-
-Consider the regular language `R = r#+""#*"#+`. If Rust's raw string literals are
-context-free, then their intersection with `R`, `R'`, should also be context-free.
-Therefore, to prove that raw string literals are not context-free,
-it is sufficient to prove that `R'` is not context-free.
-
-The language `R'` is `{r#^n""#^m"#^n | m < n}`.
-
-Assume `R'` *is* context-free. Then `R'` has some pumping length `p > 0` for which
-the pumping lemma applies. Consider the following string `s` in `R'`:
-
-`r#^p""#^{p-1}"#^p`
-
-e.g. for `p = 2`: `s = r##""#"##`
-
-Then `s = uvwxy` for some choice of `uvwxy` such that `vx` is non-empty,
-`|vwx| < p+1`, and `uv^iwx^iy` is in `R'` for all `i >= 0`.
-
-Neither `v` nor `x` can contain a `"` or `r`, as the number of these characters
-in any string in `R'` is fixed. So `v` and `x` contain only hashes.
-Consequently, of the three sequences of hashes, `v` and `x` combined
-can only pump two of them.
-If we ever choose the central sequence of hashes, then one of the outer sequences
-will not grow when we pump, leading to an imbalance between the outer sequences.
-Therefore, we must pump both outer sequences of hashes. However,
-there are `p+2` characters between these two sequences of hashes, and `|vwx|` must
-be less than `p+1`. Therefore we have a contradiction, and `R'` must not be
-context-free.
-
-Since `R'` is not context-free, it follows that Rust's raw string literals
-must not be context-free.
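As an illustration of the removed note above (not part of this commit), here is a minimal Python sketch that contrasts the two views of raw string literals. The function names `is_raw_string_literal` and `grammar_accepts` are hypothetical. The first follows the informal rule by counting the opening hashes and stopping at the first quote followed by that many hashes; the second mimics the `R`/`S`/`B` grammar, which only checks that the hashes balance around a quoted middle. It is a sketch under those assumptions, not the lexer rustc uses.

```python
def is_raw_string_literal(s: str) -> bool:
    """Informal rule: 'r', N hashes, a quote, a body, then the FIRST
    quote followed by N hashes must end the string."""
    if not s.startswith("r"):
        return False
    i = 1
    while i < len(s) and s[i] == "#":
        i += 1
    n = i - 1  # number of opening hashes: the "context" the note refers to
    if i >= len(s) or s[i] != '"':
        return False
    closer = '"' + "#" * n
    end = s.find(closer, i + 1)  # first possible terminator after the opening quote
    return end != -1 and end + len(closer) == len(s)


def grammar_accepts(s: str) -> bool:
    """Approximation of the grammar R -> 'r' S, S -> '#' S '#' | '"' B '"',
    B -> . B | empty. It balances hashes but cannot inspect the body."""
    if not s.startswith("r"):
        return False
    rest = s[1:]
    k = len(rest) - len(rest.lstrip("#"))  # leading hashes
    if k and not rest.endswith("#" * k):
        return False
    middle = rest[k:len(rest) - k]
    return len(middle) >= 2 and middle[0] == '"' and middle[-1] == '"'


if __name__ == "__main__":
    for candidate in ['r#"a"b"#', 'r#""#"#', 'r###""###"###', 'r##""#"##']:
        print(candidate, is_raw_string_literal(candidate), grammar_accepts(candidate))
```

The string `r#""#"#` from the note is the interesting case: the grammar-style check accepts it, while the hash-counting check rejects it because the literal already ends at the first `"#`.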
--- a/src/grammar/testparser.py
+++ b/src/grammar/testparser.py
@ -1,66 +0,0 @@
-#!/usr/bin/env python
-
-# ignore-tidy-linelength
-
-import sys
-
-import os
-import subprocess
-import argparse
-
-# usage: testparser.py [-h] [-p PARSER [PARSER ...]] -s SOURCE_DIR
-
-# Parsers should read from stdin and return exit status 0 for a
-# successful parse, and nonzero for an unsuccessful parse
-
-parser = argparse.ArgumentParser()
-parser.add_argument('-p', '--parser', nargs='+')
-parser.add_argument('-s', '--source-dir', nargs=1, required=True)
-args = parser.parse_args(sys.argv[1:])
-
-total = 0
-ok = {}
-bad = {}
-for parser in args.parser:
-    ok[parser] = 0
-    bad[parser] = []
-devnull = open(os.devnull, 'w')
-print("\n")
-
-for base, dirs, files in os.walk(args.source_dir[0]):
-    for f in filter(lambda p: p.endswith('.rs'), files):
-        p = os.path.join(base, f)
-        parse_fail = 'parse-fail' in p
-        if sys.version_info.major == 3:
-            lines = open(p, encoding='utf-8').readlines()
-        else:
-            lines = open(p).readlines()
-        if any('ignore-test' in line or 'ignore-lexer-test' in line for line in lines):
-            continue
-        total += 1
-        for parser in args.parser:
-            if subprocess.call(parser, stdin=open(p), stderr=subprocess.STDOUT, stdout=devnull) == 0:
-                if parse_fail:
-                    bad[parser].append(p)
-                else:
-                    ok[parser] += 1
-            else:
-                if parse_fail:
-                    ok[parser] += 1
-                else:
-                    bad[parser].append(p)
-        parser_stats = ', '.join(['{}: {}'.format(parser, ok[parser]) for parser in args.parser])
-        sys.stdout.write("\033[K\r total: {}, {}, scanned {}"
-                         .format(total, os.path.relpath(parser_stats), os.path.relpath(p)))
-
-devnull.close()
-
-print("\n")
-
-for parser in args.parser:
-    filename = os.path.basename(parser) + '.bad'
-    print("writing {} files that did not yield the correct result with {} to {}".format(len(bad[parser]), parser, filename))
-    with open(filename, "w") as f:
-        for p in bad[parser]:
-            f.write(p)
-            f.write("\n")
--- a/src/grammar/tokens.h
+++ b/src/grammar/tokens.h
@ -1,99 +0,0 @@
-enum Token {
-  SHL = 257, // Parser generators reserve 0-256 for char literals
-  SHR,
-  LE,
-  EQEQ,
-  NE,
-  GE,
-  ANDAND,
-  OROR,
-  SHLEQ,
-  SHREQ,
-  MINUSEQ,
-  ANDEQ,
-  OREQ,
-  PLUSEQ,
-  STAREQ,
-  SLASHEQ,
-  CARETEQ,
-  PERCENTEQ,
-  DOTDOT,
-  DOTDOTDOT,
-  MOD_SEP,
-  LARROW,
-  RARROW,
-  FAT_ARROW,
-  LIT_BYTE,
-  LIT_CHAR,
-  LIT_INTEGER,
-  LIT_FLOAT,
-  LIT_STR,
-  LIT_STR_RAW,
-  LIT_BYTE_STR,
-  LIT_BYTE_STR_RAW,
-  IDENT,
-  UNDERSCORE,
-  LIFETIME,
-
-  // keywords
-  SELF,
-  STATIC,
-  ABSTRACT,
-  ALIGNOF,
-  AS,
-  BECOME,
-  BREAK,
-  CATCH,
-  CRATE,
-  DEFAULT,
-  DO,
-  ELSE,
-  ENUM,
-  EXTERN,
-  FALSE,
-  FINAL,
-  FN,
-  FOR,
-  IF,
-  IMPL,
-  IN,
-  LET,
-  LOOP,
-  MACRO,
-  MATCH,
-  MOD,
-  MOVE,
-  MUT,
-  OFFSETOF,
-  OVERRIDE,
-  PRIV,
-  PUB,
-  PURE,
-  REF,
-  RETURN,
-  SIZEOF,
-  STRUCT,
-  SUPER,
-  UNION,
-  TRUE,
-  TRAIT,
-  TYPE,
-  UNSAFE,
-  UNSIZED,
-  USE,
-  VIRTUAL,
-  WHILE,
-  YIELD,
-  CONTINUE,
-  PROC,
-  BOX,
-  CONST,
-  WHERE,
-  TYPEOF,
-  INNER_DOC_COMMENT,
-  OUTER_DOC_COMMENT,
-
-  SHEBANG,
-  SHEBANG_LINE,
-  STATIC_LIFETIME
-};