Rollup merge of #64896 - XAMPPRocky:remove-grammar, r=Centril
Remove legacy grammar Revival of #50835 & #55545 On the #wg-grammar discord there was agreement that enough progress has been made to be able to remove the legacy grammar. r? @Centril @qmx cc @rust-lang/wg-grammar
This commit is contained in:
commit
b8c5d3a427
@ -1,812 +1,7 @@
|
||||
% Grammar
|
||||
|
||||
# Introduction
|
||||
The Rust grammar may now be found in the [reference]. Additionally, the [grammar
|
||||
working group] is working on producing a testable grammar.
|
||||
|
||||
This document is the primary reference for the Rust programming language grammar. It
|
||||
provides only one kind of material:
|
||||
|
||||
- Chapters that formally define the language grammar.
|
||||
|
||||
This document does not serve as an introduction to the language. Background
|
||||
familiarity with the language is assumed. A separate [guide] is available to
|
||||
help acquire such background.
|
||||
|
||||
This document also does not serve as a reference to the [standard] library
|
||||
included in the language distribution. Those libraries are documented
|
||||
separately by extracting documentation attributes from their source code. Many
|
||||
of the features that one might expect to be language features are library
|
||||
features in Rust, so what you're looking for may be there, not here.
|
||||
|
||||
[guide]: guide.html
|
||||
[standard]: std/index.html
|
||||
|
||||
# Notation
|
||||
|
||||
Rust's grammar is defined over Unicode codepoints, each conventionally denoted
|
||||
`U+XXXX`, for 4 or more hexadecimal digits `X`. _Most_ of Rust's grammar is
|
||||
confined to the ASCII range of Unicode, and is described in this document by a
|
||||
dialect of Extended Backus-Naur Form (EBNF), specifically a dialect of EBNF
|
||||
supported by common automated LL(k) parsing tools such as `llgen`, rather than
|
||||
the dialect given in ISO 14977. The dialect can be defined self-referentially
|
||||
as follows:
|
||||
|
||||
```antlr
|
||||
grammar : rule + ;
|
||||
rule : nonterminal ':' productionrule ';' ;
|
||||
productionrule : production [ '|' production ] * ;
|
||||
production : term * ;
|
||||
term : element repeats ;
|
||||
element : LITERAL | IDENTIFIER | '[' productionrule ']' ;
|
||||
repeats : [ '*' | '+' ] NUMBER ? | NUMBER ? | '?' ;
|
||||
```
|
||||
|
||||
Where:
|
||||
|
||||
- Whitespace in the grammar is ignored.
|
||||
- Square brackets are used to group rules.
|
||||
- `LITERAL` is a single printable ASCII character, or an escaped hexadecimal
|
||||
ASCII code of the form `\xQQ`, in single quotes, denoting the corresponding
|
||||
Unicode codepoint `U+00QQ`.
|
||||
- `IDENTIFIER` is a nonempty string of ASCII letters and underscores.
|
||||
- The `repeat` forms apply to the adjacent `element`, and are as follows:
|
||||
- `?` means zero or one repetition
|
||||
- `*` means zero or more repetitions
|
||||
- `+` means one or more repetitions
|
||||
- NUMBER trailing a repeat symbol gives a maximum repetition count
|
||||
- NUMBER on its own gives an exact repetition count
|
||||
|
||||
This EBNF dialect should hopefully be familiar to many readers.
|
||||
|
||||
## Unicode productions
|
||||
|
||||
A few productions in Rust's grammar permit Unicode codepoints outside the ASCII
|
||||
range. We define these productions in terms of character properties specified
|
||||
in the Unicode standard, rather than in terms of ASCII-range codepoints. The
|
||||
section [Special Unicode Productions](#special-unicode-productions) lists these
|
||||
productions.
|
||||
|
||||
## String table productions
|
||||
|
||||
Some rules in the grammar — notably [unary
|
||||
operators](#unary-operator-expressions), [binary
|
||||
operators](#binary-operator-expressions), and [keywords](#keywords) — are
|
||||
given in a simplified form: as a listing of a table of unquoted, printable
|
||||
whitespace-separated strings. These cases form a subset of the rules regarding
|
||||
the [token](#tokens) rule, and are assumed to be the result of a
|
||||
lexical-analysis phase feeding the parser, driven by a DFA, operating over the
|
||||
disjunction of all such string table entries.
|
||||
|
||||
When such a string enclosed in double-quotes (`"`) occurs inside the grammar,
|
||||
it is an implicit reference to a single member of such a string table
|
||||
production. See [tokens](#tokens) for more information.
|
||||
|
||||
# Lexical structure
|
||||
|
||||
## Input format
|
||||
|
||||
Rust input is interpreted as a sequence of Unicode codepoints encoded in UTF-8.
|
||||
Most Rust grammar rules are defined in terms of printable ASCII-range
|
||||
codepoints, but a small number are defined in terms of Unicode properties or
|
||||
explicit codepoint lists. [^inputformat]
|
||||
|
||||
[^inputformat]: Substitute definitions for the special Unicode productions are
|
||||
provided to the grammar verifier, restricted to ASCII range, when verifying the
|
||||
grammar in this document.
|
||||
|
||||
## Special Unicode Productions
|
||||
|
||||
The following productions in the Rust grammar are defined in terms of Unicode
|
||||
properties: `ident`, `non_null`, `non_eol`, `non_single_quote` and
|
||||
`non_double_quote`.
|
||||
|
||||
### Identifiers
|
||||
|
||||
The `ident` production is any nonempty Unicode string of
|
||||
the following form:
|
||||
|
||||
- The first character is in one of the following ranges `U+0041` to `U+005A`
|
||||
("A" to "Z"), `U+0061` to `U+007A` ("a" to "z"), or `U+005F` ("\_").
|
||||
- The remaining characters are in the range `U+0030` to `U+0039` ("0" to "9"),
|
||||
or any of the prior valid initial characters.
|
||||
|
||||
as long as the identifier does _not_ occur in the set of [keywords](#keywords).
|
||||
|
||||
### Delimiter-restricted productions
|
||||
|
||||
Some productions are defined by exclusion of particular Unicode characters:
|
||||
|
||||
- `non_null` is any single Unicode character aside from `U+0000` (null)
|
||||
- `non_eol` is any single Unicode character aside from `U+000A` (`'\n'`)
|
||||
- `non_single_quote` is any single Unicode character aside from `U+0027` (`'`)
|
||||
- `non_double_quote` is any single Unicode character aside from `U+0022` (`"`)
|
||||
|
||||
## Comments
|
||||
|
||||
```antlr
|
||||
comment : block_comment | line_comment ;
|
||||
block_comment : "/*" block_comment_body * "*/" ;
|
||||
block_comment_body : [block_comment | character] * ;
|
||||
line_comment : "//" non_eol * ;
|
||||
```
|
||||
|
||||
**FIXME:** add doc grammar?
|
||||
|
||||
## Whitespace
|
||||
|
||||
```antlr
|
||||
whitespace_char : '\x20' | '\x09' | '\x0a' | '\x0d' ;
|
||||
whitespace : [ whitespace_char | comment ] + ;
|
||||
```
|
||||
|
||||
## Tokens
|
||||
|
||||
```antlr
|
||||
simple_token : keyword | unop | binop ;
|
||||
token : simple_token | ident | literal | symbol | whitespace token ;
|
||||
```
|
||||
|
||||
### Keywords
|
||||
|
||||
<p id="keyword-table-marker"></p>
|
||||
|
||||
| | | | | |
|
||||
|----------|----------|----------|----------|----------|
|
||||
| _ | abstract | alignof | as | become |
|
||||
| box | break | const | continue | crate |
|
||||
| do | else | enum | extern | false |
|
||||
| final | fn | for | if | impl |
|
||||
| in | let | loop | macro | match |
|
||||
| mod | move | mut | offsetof | override |
|
||||
| priv | proc | pub | pure | ref |
|
||||
| return | Self | self | sizeof | static |
|
||||
| struct | super | trait | true | type |
|
||||
| typeof | unsafe | unsized | use | virtual |
|
||||
| where | while | yield | | |
|
||||
|
||||
|
||||
Each of these keywords has special meaning in its grammar, and all of them are
|
||||
excluded from the `ident` rule.
|
||||
|
||||
Not all of these keywords are used by the language. Some of them were used
|
||||
before Rust 1.0, and were left reserved once their implementations were
|
||||
removed. Some of them were reserved before 1.0 to make space for possible
|
||||
future features.
|
||||
|
||||
### Literals
|
||||
|
||||
```antlr
|
||||
lit_suffix : ident;
|
||||
literal : [ string_lit | char_lit | byte_string_lit | byte_lit | num_lit | bool_lit ] lit_suffix ?;
|
||||
```
|
||||
|
||||
The optional `lit_suffix` production is only used for certain numeric literals,
|
||||
but is reserved for future extension. That is, the above gives the lexical
|
||||
grammar, but a Rust parser will reject everything but the 12 special cases
|
||||
mentioned in [Number literals](reference/tokens.html#number-literals) in the
|
||||
reference.
|
||||
|
||||
#### Character and string literals
|
||||
|
||||
```antlr
|
||||
char_lit : '\x27' char_body '\x27' ;
|
||||
string_lit : '"' string_body * '"' | 'r' raw_string ;
|
||||
|
||||
char_body : non_single_quote
|
||||
| '\x5c' [ '\x27' | common_escape | unicode_escape ] ;
|
||||
|
||||
string_body : non_double_quote
|
||||
| '\x5c' [ '\x22' | common_escape | unicode_escape ] ;
|
||||
raw_string : '"' raw_string_body '"' | '#' raw_string '#' ;
|
||||
|
||||
common_escape : '\x5c'
|
||||
| 'n' | 'r' | 't' | '0'
|
||||
| 'x' hex_digit 2
|
||||
unicode_escape : 'u' '{' hex_digit+ 6 '}';
|
||||
|
||||
hex_digit : 'a' | 'b' | 'c' | 'd' | 'e' | 'f'
|
||||
| 'A' | 'B' | 'C' | 'D' | 'E' | 'F'
|
||||
| dec_digit ;
|
||||
oct_digit : '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' ;
|
||||
dec_digit : '0' | nonzero_dec ;
|
||||
nonzero_dec: '1' | '2' | '3' | '4'
|
||||
| '5' | '6' | '7' | '8' | '9' ;
|
||||
```
|
||||
|
||||
#### Byte and byte string literals
|
||||
|
||||
```antlr
|
||||
byte_lit : "b\x27" byte_body '\x27' ;
|
||||
byte_string_lit : "b\x22" string_body * '\x22' | "br" raw_byte_string ;
|
||||
|
||||
byte_body : ascii_non_single_quote
|
||||
| '\x5c' [ '\x27' | common_escape ] ;
|
||||
|
||||
byte_string_body : ascii_non_double_quote
|
||||
| '\x5c' [ '\x22' | common_escape ] ;
|
||||
raw_byte_string : '"' raw_byte_string_body '"' | '#' raw_byte_string '#' ;
|
||||
|
||||
```
|
||||
|
||||
#### Number literals
|
||||
|
||||
```antlr
|
||||
num_lit : nonzero_dec [ dec_digit | '_' ] * float_suffix ?
|
||||
| '0' [ [ dec_digit | '_' ] * float_suffix ?
|
||||
| 'b' [ '1' | '0' | '_' ] +
|
||||
| 'o' [ oct_digit | '_' ] +
|
||||
| 'x' [ hex_digit | '_' ] + ] ;
|
||||
|
||||
float_suffix : [ exponent | '.' dec_lit exponent ? ] ? ;
|
||||
|
||||
exponent : ['E' | 'e'] ['-' | '+' ] ? dec_lit ;
|
||||
dec_lit : [ dec_digit | '_' ] + ;
|
||||
```
|
||||
|
||||
#### Boolean literals
|
||||
|
||||
```antlr
|
||||
bool_lit : [ "true" | "false" ] ;
|
||||
```
|
||||
|
||||
The two values of the boolean type are written `true` and `false`.
|
||||
|
||||
### Symbols
|
||||
|
||||
```antlr
|
||||
symbol : "::" | "->"
|
||||
| '#' | '[' | ']' | '(' | ')' | '{' | '}'
|
||||
| ',' | ';' ;
|
||||
```
|
||||
|
||||
Symbols are a general class of printable [tokens](#tokens) that play structural
|
||||
roles in a variety of grammar productions. They are cataloged here for
|
||||
completeness as the set of remaining miscellaneous printable tokens that do not
|
||||
otherwise appear as [unary operators](#unary-operator-expressions), [binary
|
||||
operators](#binary-operator-expressions), or [keywords](#keywords).
|
||||
|
||||
## Paths
|
||||
|
||||
```antlr
|
||||
expr_path : [ "::" ] ident [ "::" expr_path_tail ] + ;
|
||||
expr_path_tail : '<' type_expr [ ',' type_expr ] + '>'
|
||||
| expr_path ;
|
||||
|
||||
type_path : ident [ type_path_tail ] + ;
|
||||
type_path_tail : '<' type_expr [ ',' type_expr ] + '>'
|
||||
| "::" type_path ;
|
||||
```
|
||||
|
||||
# Syntax extensions
|
||||
|
||||
## Macros
|
||||
|
||||
```antlr
|
||||
expr_macro_rules : "macro_rules" '!' ident '(' macro_rule * ')' ';'
|
||||
| "macro_rules" '!' ident '{' macro_rule * '}' ;
|
||||
macro_rule : '(' matcher * ')' "=>" '(' transcriber * ')' ';' ;
|
||||
matcher : '(' matcher * ')' | '[' matcher * ']'
|
||||
| '{' matcher * '}' | '$' ident ':' ident
|
||||
| '$' '(' matcher * ')' sep_token? [ '*' | '+' ]
|
||||
| non_special_token ;
|
||||
transcriber : '(' transcriber * ')' | '[' transcriber * ']'
|
||||
| '{' transcriber * '}' | '$' ident
|
||||
| '$' '(' transcriber * ')' sep_token? [ '*' | '+' ]
|
||||
| non_special_token ;
|
||||
```
|
||||
|
||||
# Crates and source files
|
||||
|
||||
**FIXME:** grammar? What production covers #![crate_id = "foo"] ?
|
||||
|
||||
# Items and attributes
|
||||
|
||||
**FIXME:** grammar?
|
||||
|
||||
## Items
|
||||
|
||||
```antlr
|
||||
item : vis ? mod_item | fn_item | type_item | struct_item | enum_item
|
||||
| const_item | static_item | trait_item | impl_item | extern_block_item ;
|
||||
```
|
||||
|
||||
### Type Parameters
|
||||
|
||||
**FIXME:** grammar?
|
||||
|
||||
### Modules
|
||||
|
||||
```antlr
|
||||
mod_item : "mod" ident ( ';' | '{' mod '}' );
|
||||
mod : [ view_item | item ] * ;
|
||||
```
|
||||
|
||||
#### View items
|
||||
|
||||
```antlr
|
||||
view_item : extern_crate_decl | use_decl ';' ;
|
||||
```
|
||||
|
||||
##### Extern crate declarations
|
||||
|
||||
```antlr
|
||||
extern_crate_decl : "extern" "crate" crate_name
|
||||
crate_name: ident | ( ident "as" ident )
|
||||
```
|
||||
|
||||
##### Use declarations
|
||||
|
||||
```antlr
|
||||
use_decl : vis ? "use" [ path "as" ident
|
||||
| path_glob ] ;
|
||||
|
||||
path_glob : ident [ "::" [ path_glob
|
||||
| '*' ] ] ?
|
||||
| '{' path_item [ ',' path_item ] * '}' ;
|
||||
|
||||
path_item : ident | "self" ;
|
||||
```
|
||||
|
||||
### Functions
|
||||
|
||||
**FIXME:** grammar?
|
||||
|
||||
#### Generic functions
|
||||
|
||||
**FIXME:** grammar?
|
||||
|
||||
#### Unsafety
|
||||
|
||||
**FIXME:** grammar?
|
||||
|
||||
##### Unsafe functions
|
||||
|
||||
**FIXME:** grammar?
|
||||
|
||||
##### Unsafe blocks
|
||||
|
||||
**FIXME:** grammar?
|
||||
|
||||
#### Diverging functions
|
||||
|
||||
**FIXME:** grammar?
|
||||
|
||||
### Type definitions
|
||||
|
||||
**FIXME:** grammar?
|
||||
|
||||
### Structures
|
||||
|
||||
**FIXME:** grammar?
|
||||
|
||||
### Enumerations
|
||||
|
||||
**FIXME:** grammar?
|
||||
|
||||
### Constant items
|
||||
|
||||
```antlr
|
||||
const_item : "const" ident ':' type '=' expr ';' ;
|
||||
```
|
||||
|
||||
### Static items
|
||||
|
||||
```antlr
|
||||
static_item : "static" ident ':' type '=' expr ';' ;
|
||||
```
|
||||
|
||||
#### Mutable statics
|
||||
|
||||
**FIXME:** grammar?
|
||||
|
||||
### Traits
|
||||
|
||||
**FIXME:** grammar?
|
||||
|
||||
### Implementations
|
||||
|
||||
**FIXME:** grammar?
|
||||
|
||||
### External blocks
|
||||
|
||||
```antlr
|
||||
extern_block_item : "extern" '{' extern_block '}' ;
|
||||
extern_block : [ foreign_fn ] * ;
|
||||
```
|
||||
|
||||
## Visibility and Privacy
|
||||
|
||||
```antlr
|
||||
vis : "pub" ;
|
||||
```
|
||||
### Re-exporting and Visibility
|
||||
|
||||
See [Use declarations](#use-declarations).
|
||||
|
||||
## Attributes
|
||||
|
||||
```antlr
|
||||
attribute : '#' '!' ? '[' meta_item ']' ;
|
||||
meta_item : ident [ '=' literal
|
||||
| '(' meta_seq ')' ] ? ;
|
||||
meta_seq : meta_item [ ',' meta_seq ] ? ;
|
||||
```
|
||||
|
||||
# Statements and expressions
|
||||
|
||||
## Statements
|
||||
|
||||
```antlr
|
||||
stmt : decl_stmt | expr_stmt | ';' ;
|
||||
```
|
||||
|
||||
### Declaration statements
|
||||
|
||||
```antlr
|
||||
decl_stmt : item | let_decl ;
|
||||
```
|
||||
|
||||
#### Item declarations
|
||||
|
||||
See [Items](#items).
|
||||
|
||||
#### Variable declarations
|
||||
|
||||
```antlr
|
||||
let_decl : "let" pat [':' type ] ? [ init ] ? ';' ;
|
||||
init : [ '=' ] expr ;
|
||||
```
|
||||
|
||||
### Expression statements
|
||||
|
||||
```antlr
|
||||
expr_stmt : expr ';' ;
|
||||
```
|
||||
|
||||
## Expressions
|
||||
|
||||
```antlr
|
||||
expr : literal | path | tuple_expr | unit_expr | struct_expr
|
||||
| block_expr | method_call_expr | field_expr | array_expr
|
||||
| idx_expr | range_expr | unop_expr | binop_expr
|
||||
| paren_expr | call_expr | lambda_expr | while_expr
|
||||
| loop_expr | break_expr | continue_expr | for_expr
|
||||
| if_expr | match_expr | if_let_expr | while_let_expr
|
||||
| return_expr ;
|
||||
```
|
||||
|
||||
#### Lvalues, rvalues and temporaries
|
||||
|
||||
**FIXME:** grammar?
|
||||
|
||||
#### Moved and copied types
|
||||
|
||||
**FIXME:** Do we want to capture this in the grammar as different productions?
|
||||
|
||||
### Literal expressions
|
||||
|
||||
See [Literals](#literals).
|
||||
|
||||
### Path expressions
|
||||
|
||||
See [Paths](#paths).
|
||||
|
||||
### Tuple expressions
|
||||
|
||||
```antlr
|
||||
tuple_expr : '(' [ expr [ ',' expr ] * | expr ',' ] ? ')' ;
|
||||
```
|
||||
|
||||
### Unit expressions
|
||||
|
||||
```antlr
|
||||
unit_expr : "()" ;
|
||||
```
|
||||
|
||||
### Structure expressions
|
||||
|
||||
```antlr
|
||||
struct_expr_field_init : ident | ident ':' expr ;
|
||||
struct_expr : expr_path '{' struct_expr_field_init
|
||||
[ ',' struct_expr_field_init ] *
|
||||
[ ".." expr ] '}' |
|
||||
expr_path '(' expr
|
||||
[ ',' expr ] * ')' |
|
||||
expr_path ;
|
||||
```
|
||||
|
||||
### Block expressions
|
||||
|
||||
```antlr
|
||||
block_expr : '{' [ stmt | item ] *
|
||||
[ expr ] '}' ;
|
||||
```
|
||||
|
||||
### Method-call expressions
|
||||
|
||||
```antlr
|
||||
method_call_expr : expr '.' ident paren_expr_list ;
|
||||
```
|
||||
|
||||
### Field expressions
|
||||
|
||||
```antlr
|
||||
field_expr : expr '.' ident ;
|
||||
```
|
||||
|
||||
### Array expressions
|
||||
|
||||
```antlr
|
||||
array_expr : '[' "mut" ? array_elems? ']' ;
|
||||
|
||||
array_elems : [expr [',' expr]*] | [expr ';' expr] ;
|
||||
```
|
||||
|
||||
### Index expressions
|
||||
|
||||
```antlr
|
||||
idx_expr : expr '[' expr ']' ;
|
||||
```
|
||||
|
||||
### Range expressions
|
||||
|
||||
```antlr
|
||||
range_expr : expr ".." expr |
|
||||
expr ".." |
|
||||
".." expr |
|
||||
".." ;
|
||||
```
|
||||
|
||||
### Unary operator expressions
|
||||
|
||||
```antlr
|
||||
unop_expr : unop expr ;
|
||||
unop : '-' | '*' | '!' ;
|
||||
```
|
||||
|
||||
### Binary operator expressions
|
||||
|
||||
```antlr
|
||||
binop_expr : expr binop expr | type_cast_expr
|
||||
| assignment_expr | compound_assignment_expr ;
|
||||
binop : arith_op | bitwise_op | lazy_bool_op | comp_op
|
||||
```
|
||||
|
||||
#### Arithmetic operators
|
||||
|
||||
```antlr
|
||||
arith_op : '+' | '-' | '*' | '/' | '%' ;
|
||||
```
|
||||
|
||||
#### Bitwise operators
|
||||
|
||||
```antlr
|
||||
bitwise_op : '&' | '|' | '^' | "<<" | ">>" ;
|
||||
```
|
||||
|
||||
#### Lazy boolean operators
|
||||
|
||||
```antlr
|
||||
lazy_bool_op : "&&" | "||" ;
|
||||
```
|
||||
|
||||
#### Comparison operators
|
||||
|
||||
```antlr
|
||||
comp_op : "==" | "!=" | '<' | '>' | "<=" | ">=" ;
|
||||
```
|
||||
|
||||
#### Type cast expressions
|
||||
|
||||
```antlr
|
||||
type_cast_expr : value "as" type ;
|
||||
```
|
||||
|
||||
#### Assignment expressions
|
||||
|
||||
```antlr
|
||||
assignment_expr : expr '=' expr ;
|
||||
```
|
||||
|
||||
#### Compound assignment expressions
|
||||
|
||||
```antlr
|
||||
compound_assignment_expr : expr [ arith_op | bitwise_op ] '=' expr ;
|
||||
```
|
||||
|
||||
### Grouped expressions
|
||||
|
||||
```antlr
|
||||
paren_expr : '(' expr ')' ;
|
||||
```
|
||||
|
||||
### Call expressions
|
||||
|
||||
```antlr
|
||||
expr_list : [ expr [ ',' expr ]* ] ? ;
|
||||
paren_expr_list : '(' expr_list ')' ;
|
||||
call_expr : expr paren_expr_list ;
|
||||
```
|
||||
|
||||
### Lambda expressions
|
||||
|
||||
```antlr
|
||||
ident_list : [ ident [ ',' ident ]* ] ? ;
|
||||
lambda_expr : '|' ident_list '|' expr ;
|
||||
```
|
||||
|
||||
### While loops
|
||||
|
||||
```antlr
|
||||
while_expr : [ lifetime ':' ] ? "while" no_struct_literal_expr '{' block '}' ;
|
||||
```
|
||||
|
||||
### Infinite loops
|
||||
|
||||
```antlr
|
||||
loop_expr : [ lifetime ':' ] ? "loop" '{' block '}';
|
||||
```
|
||||
|
||||
### Break expressions
|
||||
|
||||
```antlr
|
||||
break_expr : "break" [ lifetime ] ?;
|
||||
```
|
||||
|
||||
### Continue expressions
|
||||
|
||||
```antlr
|
||||
continue_expr : "continue" [ lifetime ] ?;
|
||||
```
|
||||
|
||||
### For expressions
|
||||
|
||||
```antlr
|
||||
for_expr : [ lifetime ':' ] ? "for" pat "in" no_struct_literal_expr '{' block '}' ;
|
||||
```
|
||||
|
||||
### If expressions
|
||||
|
||||
```antlr
|
||||
if_expr : "if" no_struct_literal_expr '{' block '}'
|
||||
else_tail ? ;
|
||||
|
||||
else_tail : "else" [ if_expr | if_let_expr
|
||||
| '{' block '}' ] ;
|
||||
```
|
||||
|
||||
### Match expressions
|
||||
|
||||
```antlr
|
||||
match_expr : "match" no_struct_literal_expr '{' match_arm * '}' ;
|
||||
|
||||
match_arm : attribute * match_pat "=>" [ expr "," | '{' block '}' ] ;
|
||||
|
||||
match_pat : pat [ '|' pat ] * [ "if" expr ] ? ;
|
||||
```
|
||||
|
||||
### If let expressions
|
||||
|
||||
```antlr
|
||||
if_let_expr : "if" "let" pat '=' expr '{' block '}'
|
||||
else_tail ? ;
|
||||
```
|
||||
|
||||
### While let loops
|
||||
|
||||
```antlr
|
||||
while_let_expr : [ lifetime ':' ] ? "while" "let" pat '=' expr '{' block '}' ;
|
||||
```
|
||||
|
||||
### Return expressions
|
||||
|
||||
```antlr
|
||||
return_expr : "return" expr ? ;
|
||||
```
|
||||
|
||||
# Type system
|
||||
|
||||
**FIXME:** is this entire chapter relevant here? Or should it all have been covered by some production already?
|
||||
|
||||
## Types
|
||||
|
||||
### Primitive types
|
||||
|
||||
**FIXME:** grammar?
|
||||
|
||||
#### Machine types
|
||||
|
||||
**FIXME:** grammar?
|
||||
|
||||
#### Machine-dependent integer types
|
||||
|
||||
**FIXME:** grammar?
|
||||
|
||||
### Textual types
|
||||
|
||||
**FIXME:** grammar?
|
||||
|
||||
### Tuple types
|
||||
|
||||
**FIXME:** grammar?
|
||||
|
||||
### Array, and Slice types
|
||||
|
||||
**FIXME:** grammar?
|
||||
|
||||
### Structure types
|
||||
|
||||
**FIXME:** grammar?
|
||||
|
||||
### Enumerated types
|
||||
|
||||
**FIXME:** grammar?
|
||||
|
||||
### Pointer types
|
||||
|
||||
**FIXME:** grammar?
|
||||
|
||||
### Function types
|
||||
|
||||
**FIXME:** grammar?
|
||||
|
||||
### Closure types
|
||||
|
||||
```antlr
|
||||
closure_type := [ 'unsafe' ] [ '<' lifetime-list '>' ] '|' arg-list '|'
|
||||
[ ':' bound-list ] [ '->' type ]
|
||||
lifetime-list := lifetime | lifetime ',' lifetime-list
|
||||
arg-list := ident ':' type | ident ':' type ',' arg-list
|
||||
```
|
||||
|
||||
### Never type
|
||||
An empty type
|
||||
|
||||
```antlr
|
||||
never_type : "!" ;
|
||||
```
|
||||
|
||||
### Object types
|
||||
|
||||
**FIXME:** grammar?
|
||||
|
||||
### Type parameters
|
||||
|
||||
**FIXME:** grammar?
|
||||
|
||||
### Type parameter bounds
|
||||
|
||||
```antlr
|
||||
bound-list := bound | bound '+' bound-list '+' ?
|
||||
bound := ty_bound | lt_bound
|
||||
lt_bound := lifetime
|
||||
ty_bound := ty_bound_noparen | (ty_bound_noparen)
|
||||
ty_bound_noparen := [?] [ for<lt_param_defs> ] simple_path
|
||||
```
|
||||
|
||||
### Self types
|
||||
|
||||
**FIXME:** grammar?
|
||||
|
||||
## Type kinds
|
||||
|
||||
**FIXME:** this is probably not relevant to the grammar...
|
||||
|
||||
# Memory and concurrency models
|
||||
|
||||
**FIXME:** is this entire chapter relevant here? Or should it all have been covered by some production already?
|
||||
|
||||
## Memory model
|
||||
|
||||
### Memory allocation and lifetime
|
||||
|
||||
### Memory ownership
|
||||
|
||||
### Variables
|
||||
|
||||
### Boxes
|
||||
|
||||
## Threads
|
||||
|
||||
### Communication between threads
|
||||
|
||||
### Thread lifecycle
|
||||
[reference]: https://doc.rust-lang.org/reference/
|
||||
[grammar working group]: https://github.com/rust-lang/wg-grammar
|
||||
|
3
src/grammar/.gitignore
vendored
3
src/grammar/.gitignore
vendored
@ -1,3 +0,0 @@
|
||||
*.class
|
||||
*.java
|
||||
*.tokens
|
@ -1,350 +0,0 @@
|
||||
%{
|
||||
#include <stdio.h>
|
||||
#include <ctype.h>
|
||||
|
||||
static int num_hashes;
|
||||
static int end_hashes;
|
||||
static int saw_non_hash;
|
||||
|
||||
%}
|
||||
|
||||
%option stack
|
||||
%option yylineno
|
||||
|
||||
%x str
|
||||
%x rawstr
|
||||
%x rawstr_esc_begin
|
||||
%x rawstr_esc_body
|
||||
%x rawstr_esc_end
|
||||
%x byte
|
||||
%x bytestr
|
||||
%x rawbytestr
|
||||
%x rawbytestr_nohash
|
||||
%x pound
|
||||
%x shebang_or_attr
|
||||
%x ltorchar
|
||||
%x linecomment
|
||||
%x doc_line
|
||||
%x blockcomment
|
||||
%x doc_block
|
||||
%x suffix
|
||||
|
||||
ident [a-zA-Z\x80-\xff_][a-zA-Z0-9\x80-\xff_]*
|
||||
|
||||
%%
|
||||
|
||||
<suffix>{ident} { BEGIN(INITIAL); }
|
||||
<suffix>(.|\n) { yyless(0); BEGIN(INITIAL); }
|
||||
|
||||
[ \n\t\r] { }
|
||||
|
||||
\xef\xbb\xbf {
|
||||
// UTF-8 byte order mark (BOM), ignore if in line 1, error otherwise
|
||||
if (yyget_lineno() != 1) {
|
||||
return -1;
|
||||
}
|
||||
}
|
||||
|
||||
\/\/(\/|\!) { BEGIN(doc_line); yymore(); }
|
||||
<doc_line>\n { BEGIN(INITIAL);
|
||||
yyleng--;
|
||||
yytext[yyleng] = 0;
|
||||
return ((yytext[2] == '!') ? INNER_DOC_COMMENT : OUTER_DOC_COMMENT);
|
||||
}
|
||||
<doc_line>[^\n]* { yymore(); }
|
||||
|
||||
\/\/|\/\/\/\/ { BEGIN(linecomment); }
|
||||
<linecomment>\n { BEGIN(INITIAL); }
|
||||
<linecomment>[^\n]* { }
|
||||
|
||||
\/\*(\*|\!)[^*] { yy_push_state(INITIAL); yy_push_state(doc_block); yymore(); }
|
||||
<doc_block>\/\* { yy_push_state(doc_block); yymore(); }
|
||||
<doc_block>\*\/ {
|
||||
yy_pop_state();
|
||||
if (yy_top_state() == doc_block) {
|
||||
yymore();
|
||||
} else {
|
||||
return ((yytext[2] == '!') ? INNER_DOC_COMMENT : OUTER_DOC_COMMENT);
|
||||
}
|
||||
}
|
||||
<doc_block>(.|\n) { yymore(); }
|
||||
|
||||
\/\* { yy_push_state(blockcomment); }
|
||||
<blockcomment>\/\* { yy_push_state(blockcomment); }
|
||||
<blockcomment>\*\/ { yy_pop_state(); }
|
||||
<blockcomment>(.|\n) { }
|
||||
|
||||
_ { return UNDERSCORE; }
|
||||
abstract { return ABSTRACT; }
|
||||
alignof { return ALIGNOF; }
|
||||
as { return AS; }
|
||||
become { return BECOME; }
|
||||
box { return BOX; }
|
||||
break { return BREAK; }
|
||||
catch { return CATCH; }
|
||||
const { return CONST; }
|
||||
continue { return CONTINUE; }
|
||||
crate { return CRATE; }
|
||||
default { return DEFAULT; }
|
||||
do { return DO; }
|
||||
else { return ELSE; }
|
||||
enum { return ENUM; }
|
||||
extern { return EXTERN; }
|
||||
false { return FALSE; }
|
||||
final { return FINAL; }
|
||||
fn { return FN; }
|
||||
for { return FOR; }
|
||||
if { return IF; }
|
||||
impl { return IMPL; }
|
||||
in { return IN; }
|
||||
let { return LET; }
|
||||
loop { return LOOP; }
|
||||
macro { return MACRO; }
|
||||
match { return MATCH; }
|
||||
mod { return MOD; }
|
||||
move { return MOVE; }
|
||||
mut { return MUT; }
|
||||
offsetof { return OFFSETOF; }
|
||||
override { return OVERRIDE; }
|
||||
priv { return PRIV; }
|
||||
proc { return PROC; }
|
||||
pure { return PURE; }
|
||||
pub { return PUB; }
|
||||
ref { return REF; }
|
||||
return { return RETURN; }
|
||||
self { return SELF; }
|
||||
sizeof { return SIZEOF; }
|
||||
static { return STATIC; }
|
||||
struct { return STRUCT; }
|
||||
super { return SUPER; }
|
||||
trait { return TRAIT; }
|
||||
true { return TRUE; }
|
||||
type { return TYPE; }
|
||||
typeof { return TYPEOF; }
|
||||
union { return UNION; }
|
||||
unsafe { return UNSAFE; }
|
||||
unsized { return UNSIZED; }
|
||||
use { return USE; }
|
||||
virtual { return VIRTUAL; }
|
||||
where { return WHERE; }
|
||||
while { return WHILE; }
|
||||
yield { return YIELD; }
|
||||
|
||||
{ident} { return IDENT; }
|
||||
|
||||
0x[0-9a-fA-F_]+ { BEGIN(suffix); return LIT_INTEGER; }
|
||||
0o[0-7_]+ { BEGIN(suffix); return LIT_INTEGER; }
|
||||
0b[01_]+ { BEGIN(suffix); return LIT_INTEGER; }
|
||||
[0-9][0-9_]* { BEGIN(suffix); return LIT_INTEGER; }
|
||||
[0-9][0-9_]*\.(\.|[a-zA-Z]) { yyless(yyleng - 2); BEGIN(suffix); return LIT_INTEGER; }
|
||||
|
||||
[0-9][0-9_]*\.[0-9_]*([eE][-\+]?[0-9_]+)? { BEGIN(suffix); return LIT_FLOAT; }
|
||||
[0-9][0-9_]*(\.[0-9_]*)?[eE][-\+]?[0-9_]+ { BEGIN(suffix); return LIT_FLOAT; }
|
||||
|
||||
; { return ';'; }
|
||||
, { return ','; }
|
||||
\.\.\. { return DOTDOTDOT; }
|
||||
\.\. { return DOTDOT; }
|
||||
\. { return '.'; }
|
||||
\( { return '('; }
|
||||
\) { return ')'; }
|
||||
\{ { return '{'; }
|
||||
\} { return '}'; }
|
||||
\[ { return '['; }
|
||||
\] { return ']'; }
|
||||
@ { return '@'; }
|
||||
# { BEGIN(pound); yymore(); }
|
||||
<pound>\! { BEGIN(shebang_or_attr); yymore(); }
|
||||
<shebang_or_attr>\[ {
|
||||
BEGIN(INITIAL);
|
||||
yyless(2);
|
||||
return SHEBANG;
|
||||
}
|
||||
<shebang_or_attr>[^\[\n]*\n {
|
||||
// Since the \n was eaten as part of the token, yylineno will have
|
||||
// been incremented to the value 2 if the shebang was on the first
|
||||
// line. This yyless undoes that, setting yylineno back to 1.
|
||||
yyless(yyleng - 1);
|
||||
if (yyget_lineno() == 1) {
|
||||
BEGIN(INITIAL);
|
||||
return SHEBANG_LINE;
|
||||
} else {
|
||||
BEGIN(INITIAL);
|
||||
yyless(2);
|
||||
return SHEBANG;
|
||||
}
|
||||
}
|
||||
<pound>. { BEGIN(INITIAL); yyless(1); return '#'; }
|
||||
|
||||
\~ { return '~'; }
|
||||
:: { return MOD_SEP; }
|
||||
: { return ':'; }
|
||||
\$ { return '$'; }
|
||||
\? { return '?'; }
|
||||
|
||||
== { return EQEQ; }
|
||||
=> { return FAT_ARROW; }
|
||||
= { return '='; }
|
||||
\!= { return NE; }
|
||||
\! { return '!'; }
|
||||
\<= { return LE; }
|
||||
\<\< { return SHL; }
|
||||
\<\<= { return SHLEQ; }
|
||||
\< { return '<'; }
|
||||
\>= { return GE; }
|
||||
\>\> { return SHR; }
|
||||
\>\>= { return SHREQ; }
|
||||
\> { return '>'; }
|
||||
|
||||
\x27 { BEGIN(ltorchar); yymore(); }
|
||||
<ltorchar>static { BEGIN(INITIAL); return STATIC_LIFETIME; }
|
||||
<ltorchar>{ident} { BEGIN(INITIAL); return LIFETIME; }
|
||||
<ltorchar>\\[nrt\\\x27\x220]\x27 { BEGIN(suffix); return LIT_CHAR; }
|
||||
<ltorchar>\\x[0-9a-fA-F]{2}\x27 { BEGIN(suffix); return LIT_CHAR; }
|
||||
<ltorchar>\\u\{([0-9a-fA-F]_*){1,6}\}\x27 { BEGIN(suffix); return LIT_CHAR; }
|
||||
<ltorchar>.\x27 { BEGIN(suffix); return LIT_CHAR; }
|
||||
<ltorchar>[\x80-\xff]{2,4}\x27 { BEGIN(suffix); return LIT_CHAR; }
|
||||
<ltorchar><<EOF>> { BEGIN(INITIAL); return -1; }
|
||||
|
||||
b\x22 { BEGIN(bytestr); yymore(); }
|
||||
<bytestr>\x22 { BEGIN(suffix); return LIT_BYTE_STR; }
|
||||
|
||||
<bytestr><<EOF>> { return -1; }
|
||||
<bytestr>\\[n\nrt\\\x27\x220] { yymore(); }
|
||||
<bytestr>\\x[0-9a-fA-F]{2} { yymore(); }
|
||||
<bytestr>\\u\{([0-9a-fA-F]_*){1,6}\} { yymore(); }
|
||||
<bytestr>\\[^n\nrt\\\x27\x220] { return -1; }
|
||||
<bytestr>(.|\n) { yymore(); }
|
||||
|
||||
br\x22 { BEGIN(rawbytestr_nohash); yymore(); }
|
||||
<rawbytestr_nohash>\x22 { BEGIN(suffix); return LIT_BYTE_STR_RAW; }
|
||||
<rawbytestr_nohash>(.|\n) { yymore(); }
|
||||
<rawbytestr_nohash><<EOF>> { return -1; }
|
||||
|
||||
br/# {
|
||||
BEGIN(rawbytestr);
|
||||
yymore();
|
||||
num_hashes = 0;
|
||||
saw_non_hash = 0;
|
||||
end_hashes = 0;
|
||||
}
|
||||
<rawbytestr># {
|
||||
if (!saw_non_hash) {
|
||||
num_hashes++;
|
||||
} else if (end_hashes != 0) {
|
||||
end_hashes++;
|
||||
if (end_hashes == num_hashes) {
|
||||
BEGIN(INITIAL);
|
||||
return LIT_BYTE_STR_RAW;
|
||||
}
|
||||
}
|
||||
yymore();
|
||||
}
|
||||
<rawbytestr>\x22# {
|
||||
end_hashes = 1;
|
||||
if (end_hashes == num_hashes) {
|
||||
BEGIN(INITIAL);
|
||||
return LIT_BYTE_STR_RAW;
|
||||
}
|
||||
yymore();
|
||||
}
|
||||
<rawbytestr>(.|\n) {
|
||||
if (!saw_non_hash) {
|
||||
saw_non_hash = 1;
|
||||
}
|
||||
if (end_hashes != 0) {
|
||||
end_hashes = 0;
|
||||
}
|
||||
yymore();
|
||||
}
|
||||
<rawbytestr><<EOF>> { return -1; }
|
||||
|
||||
b\x27 { BEGIN(byte); yymore(); }
|
||||
<byte>\\[nrt\\\x27\x220]\x27 { BEGIN(INITIAL); return LIT_BYTE; }
|
||||
<byte>\\x[0-9a-fA-F]{2}\x27 { BEGIN(INITIAL); return LIT_BYTE; }
|
||||
<byte>\\u([0-9a-fA-F]_*){4}\x27 { BEGIN(INITIAL); return LIT_BYTE; }
|
||||
<byte>\\U([0-9a-fA-F]_*){8}\x27 { BEGIN(INITIAL); return LIT_BYTE; }
|
||||
<byte>.\x27 { BEGIN(INITIAL); return LIT_BYTE; }
|
||||
<byte><<EOF>> { BEGIN(INITIAL); return -1; }
|
||||
|
||||
r\x22 { BEGIN(rawstr); yymore(); }
|
||||
<rawstr>\x22 { BEGIN(suffix); return LIT_STR_RAW; }
|
||||
<rawstr>(.|\n) { yymore(); }
|
||||
<rawstr><<EOF>> { return -1; }
|
||||
|
||||
r/# {
|
||||
BEGIN(rawstr_esc_begin);
|
||||
yymore();
|
||||
num_hashes = 0;
|
||||
saw_non_hash = 0;
|
||||
end_hashes = 0;
|
||||
}
|
||||
|
||||
<rawstr_esc_begin># {
|
||||
num_hashes++;
|
||||
yymore();
|
||||
}
|
||||
<rawstr_esc_begin>\x22 {
|
||||
BEGIN(rawstr_esc_body);
|
||||
yymore();
|
||||
}
|
||||
<rawstr_esc_begin>(.|\n) { return -1; }
|
||||
|
||||
<rawstr_esc_body>\x22/# {
|
||||
BEGIN(rawstr_esc_end);
|
||||
yymore();
|
||||
}
|
||||
<rawstr_esc_body>(.|\n) {
|
||||
yymore();
|
||||
}
|
||||
|
||||
<rawstr_esc_end># {
|
||||
end_hashes++;
|
||||
if (end_hashes == num_hashes) {
|
||||
BEGIN(INITIAL);
|
||||
return LIT_STR_RAW;
|
||||
}
|
||||
yymore();
|
||||
}
|
||||
<rawstr_esc_end>[^#] {
|
||||
end_hashes = 0;
|
||||
BEGIN(rawstr_esc_body);
|
||||
yymore();
|
||||
}
|
||||
|
||||
<rawstr_esc_begin,rawstr_esc_body,rawstr_esc_end><<EOF>> { return -1; }
|
||||
|
||||
\x22 { BEGIN(str); yymore(); }
|
||||
<str>\x22 { BEGIN(suffix); return LIT_STR; }
|
||||
|
||||
<str><<EOF>> { return -1; }
|
||||
<str>\\[n\nr\rt\\\x27\x220] { yymore(); }
|
||||
<str>\\x[0-9a-fA-F]{2} { yymore(); }
|
||||
<str>\\u\{([0-9a-fA-F]_*){1,6}\} { yymore(); }
|
||||
<str>\\[^n\nrt\\\x27\x220] { return -1; }
|
||||
<str>(.|\n) { yymore(); }
|
||||
|
||||
\<- { return LARROW; }
|
||||
-\> { return RARROW; }
|
||||
- { return '-'; }
|
||||
-= { return MINUSEQ; }
|
||||
&& { return ANDAND; }
|
||||
& { return '&'; }
|
||||
&= { return ANDEQ; }
|
||||
\|\| { return OROR; }
|
||||
\| { return '|'; }
|
||||
\|= { return OREQ; }
|
||||
\+ { return '+'; }
|
||||
\+= { return PLUSEQ; }
|
||||
\* { return '*'; }
|
||||
\*= { return STAREQ; }
|
||||
\/ { return '/'; }
|
||||
\/= { return SLASHEQ; }
|
||||
\^ { return '^'; }
|
||||
\^= { return CARETEQ; }
|
||||
% { return '%'; }
|
||||
%= { return PERCENTEQ; }
|
||||
|
||||
<<EOF>> { return 0; }
|
||||
|
||||
%%
|
@ -1,193 +0,0 @@
|
||||
#include <stdio.h>
|
||||
#include <stdarg.h>
|
||||
#include <stdlib.h>
|
||||
#include <string.h>
|
||||
|
||||
extern int yylex();
|
||||
extern int rsparse();
|
||||
|
||||
#define PUSHBACK_LEN 4
|
||||
|
||||
static char pushback[PUSHBACK_LEN];
|
||||
static int verbose;
|
||||
|
||||
void print(const char* format, ...) {
|
||||
va_list args;
|
||||
va_start(args, format);
|
||||
if (verbose) {
|
||||
vprintf(format, args);
|
||||
}
|
||||
va_end(args);
|
||||
}
|
||||
|
||||
// If there is a non-null char at the head of the pushback queue,
|
||||
// dequeue it and shift the rest of the queue forwards. Otherwise,
|
||||
// return the token from calling yylex.
|
||||
int rslex() {
|
||||
if (pushback[0] == '\0') {
|
||||
return yylex();
|
||||
} else {
|
||||
char c = pushback[0];
|
||||
memmove(pushback, pushback + 1, PUSHBACK_LEN - 1);
|
||||
pushback[PUSHBACK_LEN - 1] = '\0';
|
||||
return c;
|
||||
}
|
||||
}
|
||||
|
||||
// Note: this does nothing if the pushback queue is full. As long as
|
||||
// there aren't more than PUSHBACK_LEN consecutive calls to push_back
|
||||
// in an action, this shouldn't be a problem.
|
||||
void push_back(char c) {
|
||||
for (int i = 0; i < PUSHBACK_LEN; ++i) {
|
||||
if (pushback[i] == '\0') {
|
||||
pushback[i] = c;
|
||||
break;
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
extern int rsdebug;
|
||||
|
||||
struct node {
|
||||
struct node *next;
|
||||
struct node *prev;
|
||||
int own_string;
|
||||
char const *name;
|
||||
int n_elems;
|
||||
struct node *elems[];
|
||||
};
|
||||
|
||||
struct node *nodes = NULL;
|
||||
int n_nodes;
|
||||
|
||||
struct node *mk_node(char const *name, int n, ...) {
|
||||
va_list ap;
|
||||
int i = 0;
|
||||
unsigned sz = sizeof(struct node) + (n * sizeof(struct node *));
|
||||
struct node *nn, *nd = (struct node *)malloc(sz);
|
||||
|
||||
print("# New %d-ary node: %s = %p\n", n, name, nd);
|
||||
|
||||
nd->own_string = 0;
|
||||
nd->prev = NULL;
|
||||
nd->next = nodes;
|
||||
if (nodes) {
|
||||
nodes->prev = nd;
|
||||
}
|
||||
nodes = nd;
|
||||
|
||||
nd->name = name;
|
||||
nd->n_elems = n;
|
||||
|
||||
va_start(ap, n);
|
||||
while (i < n) {
|
||||
nn = va_arg(ap, struct node *);
|
||||
print("# arg[%d]: %p\n", i, nn);
|
||||
print("# (%s ...)\n", nn->name);
|
||||
nd->elems[i++] = nn;
|
||||
}
|
||||
va_end(ap);
|
||||
n_nodes++;
|
||||
return nd;
|
||||
}
|
||||
|
||||
struct node *mk_atom(char *name) {
|
||||
struct node *nd = mk_node((char const *)strdup(name), 0);
|
||||
nd->own_string = 1;
|
||||
return nd;
|
||||
}
|
||||
|
||||
struct node *mk_none() {
|
||||
return mk_atom("<none>");
|
||||
}
|
||||
|
||||
struct node *ext_node(struct node *nd, int n, ...) {
|
||||
va_list ap;
|
||||
int i = 0, c = nd->n_elems + n;
|
||||
unsigned sz = sizeof(struct node) + (c * sizeof(struct node *));
|
||||
struct node *nn;
|
||||
|
||||
print("# Extending %d-ary node by %d nodes: %s = %p",
|
||||
nd->n_elems, c, nd->name, nd);
|
||||
|
||||
if (nd->next) {
|
||||
nd->next->prev = nd->prev;
|
||||
}
|
||||
if (nd->prev) {
|
||||
nd->prev->next = nd->next;
|
||||
}
|
||||
nd = realloc(nd, sz);
|
||||
nd->prev = NULL;
|
||||
nd->next = nodes;
|
||||
nodes->prev = nd;
|
||||
nodes = nd;
|
||||
|
||||
print(" ==> %p\n", nd);
|
||||
|
||||
va_start(ap, n);
|
||||
while (i < n) {
|
||||
nn = va_arg(ap, struct node *);
|
||||
print("# arg[%d]: %p\n", i, nn);
|
||||
print("# (%s ...)\n", nn->name);
|
||||
nd->elems[nd->n_elems++] = nn;
|
||||
++i;
|
||||
}
|
||||
va_end(ap);
|
||||
return nd;
|
||||
}
|
||||
|
||||
int const indent_step = 4;
|
||||
|
||||
void print_indent(int depth) {
|
||||
while (depth) {
|
||||
if (depth-- % indent_step == 0) {
|
||||
print("|");
|
||||
} else {
|
||||
print(" ");
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
void print_node(struct node *n, int depth) {
|
||||
int i = 0;
|
||||
print_indent(depth);
|
||||
if (n->n_elems == 0) {
|
||||
print("%s\n", n->name);
|
||||
} else {
|
||||
print("(%s\n", n->name);
|
||||
for (i = 0; i < n->n_elems; ++i) {
|
||||
print_node(n->elems[i], depth + indent_step);
|
||||
}
|
||||
print_indent(depth);
|
||||
print(")\n");
|
||||
}
|
||||
}
|
||||
|
||||
int main(int argc, char **argv) {
|
||||
if (argc == 2 && strcmp(argv[1], "-v") == 0) {
|
||||
verbose = 1;
|
||||
} else {
|
||||
verbose = 0;
|
||||
}
|
||||
int ret = 0;
|
||||
struct node *tmp;
|
||||
memset(pushback, '\0', PUSHBACK_LEN);
|
||||
ret = rsparse();
|
||||
print("--- PARSE COMPLETE: ret:%d, n_nodes:%d ---\n", ret, n_nodes);
|
||||
if (nodes) {
|
||||
print_node(nodes, 0);
|
||||
}
|
||||
while (nodes) {
|
||||
tmp = nodes;
|
||||
nodes = tmp->next;
|
||||
if (tmp->own_string) {
|
||||
free((void*)tmp->name);
|
||||
}
|
||||
free(tmp);
|
||||
}
|
||||
return ret;
|
||||
}
|
||||
|
||||
void rserror(char const *s) {
|
||||
fprintf(stderr, "%s\n", s);
|
||||
}
|
File diff suppressed because it is too large
Load Diff
@ -1,64 +0,0 @@
|
||||
Rust's lexical grammar is not context-free. Raw string literals are the source
|
||||
of the problem. Informally, a raw string literal is an `r`, followed by `N`
|
||||
hashes (where N can be zero), a quote, any characters, then a quote followed
|
||||
by `N` hashes. Critically, once inside the first pair of quotes,
|
||||
another quote cannot be followed by `N` consecutive hashes. e.g.
|
||||
`r###""###"###` is invalid.
|
||||
|
||||
This grammar describes this as best possible:
|
||||
|
||||
R -> 'r' S
|
||||
S -> '"' B '"'
|
||||
S -> '#' S '#'
|
||||
B -> . B
|
||||
B -> ε
|
||||
|
||||
Where `.` represents any character, and `ε` the empty string. Consider the
|
||||
string `r#""#"#`. This string is not a valid raw string literal, but can be
|
||||
accepted as one by the above grammar, using the derivation:
|
||||
|
||||
R : #""#"#
|
||||
S : ""#"
|
||||
S : "#
|
||||
B : #
|
||||
B : ε
|
||||
|
||||
(Where `T : U` means the rule `T` is applied, and `U` is the remainder of the
|
||||
string.) The difficulty arises from the fact that it is fundamentally
|
||||
context-sensitive. In particular, the context needed is the number of hashes.
|
||||
|
||||
To prove that Rust's string literals are not context-free, we will use
|
||||
the fact that context-free languages are closed under intersection with
|
||||
regular languages, and the
|
||||
[pumping lemma for context-free languages](https://en.wikipedia.org/wiki/Pumping_lemma_for_context-free_languages).
|
||||
|
||||
Consider the regular language `R = r#+""#*"#+`. If Rust's raw string literals are
|
||||
context-free, then their intersection with `R`, `R'`, should also be context-free.
|
||||
Therefore, to prove that raw string literals are not context-free,
|
||||
it is sufficient to prove that `R'` is not context-free.
|
||||
|
||||
The language `R'` is `{r#^n""#^m"#^n | m < n}`.
|
||||
|
||||
Assume `R'` *is* context-free. Then `R'` has some pumping length `p > 0` for which
|
||||
the pumping lemma applies. Consider the following string `s` in `R'`:
|
||||
|
||||
`r#^p""#^{p-1}"#^p`
|
||||
|
||||
e.g. for `p = 2`: `s = r##""#"##`
|
||||
|
||||
Then `s = uvwxy` for some choice of `uvwxy` such that `vx` is non-empty,
|
||||
`|vwx| < p+1`, and `uv^iwx^iy` is in `R'` for all `i >= 0`.
|
||||
|
||||
Neither `v` nor `x` can contain a `"` or `r`, as the number of these characters
|
||||
in any string in `R'` is fixed. So `v` and `x` contain only hashes.
|
||||
Consequently, of the three sequences of hashes, `v` and `x` combined
|
||||
can only pump two of them.
|
||||
If we ever choose the central sequence of hashes, then one of the outer sequences
|
||||
will not grow when we pump, leading to an imbalance between the outer sequences.
|
||||
Therefore, we must pump both outer sequences of hashes. However,
|
||||
there are `p+2` characters between these two sequences of hashes, and `|vwx|` must
|
||||
be less than `p+1`. Therefore we have a contradiction, and `R'` must not be
|
||||
context-free.
|
||||
|
||||
Since `R'` is not context-free, it follows that the Rust's raw string literals
|
||||
must not be context-free.
|
@ -1,66 +0,0 @@
|
||||
#!/usr/bin/env python
|
||||
|
||||
# ignore-tidy-linelength
|
||||
|
||||
import sys
|
||||
|
||||
import os
|
||||
import subprocess
|
||||
import argparse
|
||||
|
||||
# usage: testparser.py [-h] [-p PARSER [PARSER ...]] -s SOURCE_DIR
|
||||
|
||||
# Parsers should read from stdin and return exit status 0 for a
|
||||
# successful parse, and nonzero for an unsuccessful parse
|
||||
|
||||
parser = argparse.ArgumentParser()
|
||||
parser.add_argument('-p', '--parser', nargs='+')
|
||||
parser.add_argument('-s', '--source-dir', nargs=1, required=True)
|
||||
args = parser.parse_args(sys.argv[1:])
|
||||
|
||||
total = 0
|
||||
ok = {}
|
||||
bad = {}
|
||||
for parser in args.parser:
|
||||
ok[parser] = 0
|
||||
bad[parser] = []
|
||||
devnull = open(os.devnull, 'w')
|
||||
print("\n")
|
||||
|
||||
for base, dirs, files in os.walk(args.source_dir[0]):
|
||||
for f in filter(lambda p: p.endswith('.rs'), files):
|
||||
p = os.path.join(base, f)
|
||||
parse_fail = 'parse-fail' in p
|
||||
if sys.version_info.major == 3:
|
||||
lines = open(p, encoding='utf-8').readlines()
|
||||
else:
|
||||
lines = open(p).readlines()
|
||||
if any('ignore-test' in line or 'ignore-lexer-test' in line for line in lines):
|
||||
continue
|
||||
total += 1
|
||||
for parser in args.parser:
|
||||
if subprocess.call(parser, stdin=open(p), stderr=subprocess.STDOUT, stdout=devnull) == 0:
|
||||
if parse_fail:
|
||||
bad[parser].append(p)
|
||||
else:
|
||||
ok[parser] += 1
|
||||
else:
|
||||
if parse_fail:
|
||||
ok[parser] += 1
|
||||
else:
|
||||
bad[parser].append(p)
|
||||
parser_stats = ', '.join(['{}: {}'.format(parser, ok[parser]) for parser in args.parser])
|
||||
sys.stdout.write("\033[K\r total: {}, {}, scanned {}"
|
||||
.format(total, os.path.relpath(parser_stats), os.path.relpath(p)))
|
||||
|
||||
devnull.close()
|
||||
|
||||
print("\n")
|
||||
|
||||
for parser in args.parser:
|
||||
filename = os.path.basename(parser) + '.bad'
|
||||
print("writing {} files that did not yield the correct result with {} to {}".format(len(bad[parser]), parser, filename))
|
||||
with open(filename, "w") as f:
|
||||
for p in bad[parser]:
|
||||
f.write(p)
|
||||
f.write("\n")
|
@ -1,99 +0,0 @@
|
||||
enum Token {
|
||||
SHL = 257, // Parser generators reserve 0-256 for char literals
|
||||
SHR,
|
||||
LE,
|
||||
EQEQ,
|
||||
NE,
|
||||
GE,
|
||||
ANDAND,
|
||||
OROR,
|
||||
SHLEQ,
|
||||
SHREQ,
|
||||
MINUSEQ,
|
||||
ANDEQ,
|
||||
OREQ,
|
||||
PLUSEQ,
|
||||
STAREQ,
|
||||
SLASHEQ,
|
||||
CARETEQ,
|
||||
PERCENTEQ,
|
||||
DOTDOT,
|
||||
DOTDOTDOT,
|
||||
MOD_SEP,
|
||||
LARROW,
|
||||
RARROW,
|
||||
FAT_ARROW,
|
||||
LIT_BYTE,
|
||||
LIT_CHAR,
|
||||
LIT_INTEGER,
|
||||
LIT_FLOAT,
|
||||
LIT_STR,
|
||||
LIT_STR_RAW,
|
||||
LIT_BYTE_STR,
|
||||
LIT_BYTE_STR_RAW,
|
||||
IDENT,
|
||||
UNDERSCORE,
|
||||
LIFETIME,
|
||||
|
||||
// keywords
|
||||
SELF,
|
||||
STATIC,
|
||||
ABSTRACT,
|
||||
ALIGNOF,
|
||||
AS,
|
||||
BECOME,
|
||||
BREAK,
|
||||
CATCH,
|
||||
CRATE,
|
||||
DEFAULT,
|
||||
DO,
|
||||
ELSE,
|
||||
ENUM,
|
||||
EXTERN,
|
||||
FALSE,
|
||||
FINAL,
|
||||
FN,
|
||||
FOR,
|
||||
IF,
|
||||
IMPL,
|
||||
IN,
|
||||
LET,
|
||||
LOOP,
|
||||
MACRO,
|
||||
MATCH,
|
||||
MOD,
|
||||
MOVE,
|
||||
MUT,
|
||||
OFFSETOF,
|
||||
OVERRIDE,
|
||||
PRIV,
|
||||
PUB,
|
||||
PURE,
|
||||
REF,
|
||||
RETURN,
|
||||
SIZEOF,
|
||||
STRUCT,
|
||||
SUPER,
|
||||
UNION,
|
||||
TRUE,
|
||||
TRAIT,
|
||||
TYPE,
|
||||
UNSAFE,
|
||||
UNSIZED,
|
||||
USE,
|
||||
VIRTUAL,
|
||||
WHILE,
|
||||
YIELD,
|
||||
CONTINUE,
|
||||
PROC,
|
||||
BOX,
|
||||
CONST,
|
||||
WHERE,
|
||||
TYPEOF,
|
||||
INNER_DOC_COMMENT,
|
||||
OUTER_DOC_COMMENT,
|
||||
|
||||
SHEBANG,
|
||||
SHEBANG_LINE,
|
||||
STATIC_LIFETIME
|
||||
};
|
Loading…
Reference in New Issue
Block a user