Commit Graph

2 Commits

Author SHA1 Message Date
Michael Roth b011f61931 json-lexer: make lexer error-recovery more deterministic
Currently when we reach an error state we effectively flush everything
fed to the lexer, which can put us in a state where we keep feeding
tokens into the parser at arbitrary offsets in the stream. This makes it
difficult for the lexer/tokenizer/parser to get back in sync when the
client sends bad input.

With these changes we emit an error state/token up to the tokenizer as
soon as we reach an error state, and continue processing any data passed
in rather than bailing out. This error token will then be used to reset
the tokenizer and parser, so that they recover their state as soon as
the lexer begins generating valid token sequences again.

We also map chr(192,193,245-255) to an error state here, since these
bytes are invalid in UTF-8. The QMP guest proxy/agent will use chr(255)
to force a flush/reset of previous input for reliable delivery of
certain events, so we also document that thoroughly here.
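For illustration only, here is a minimal C sketch of the recovery idea
described above. It is not the actual QEMU json-lexer code and all names
in it are invented for the example: on an invalid byte the lexer emits an
error token up to the next layer and keeps consuming input, so the upper
layers can reset themselves and resynchronize once valid data resumes.

    /* Hedged sketch of error-token-based recovery; not the QEMU API. */
    #include <stdio.h>

    enum token_type { TOK_CHAR, TOK_ERROR };

    /* Hypothetical callback: the tokenizer above us receives each token. */
    typedef void (*emit_fn)(enum token_type type, unsigned char ch);

    /* Bytes 192, 193 and 245-255 can never appear in valid UTF-8. */
    static int invalid_utf8_byte(unsigned char ch)
    {
        return ch == 0xC0 || ch == 0xC1 || ch >= 0xF5;
    }

    /* Feed one byte: emit an error token on bad input and keep going,
     * so the stream stays in sync once valid data resumes. */
    static void lexer_feed_byte(unsigned char ch, emit_fn emit)
    {
        if (invalid_utf8_byte(ch)) {
            emit(TOK_ERROR, ch);   /* upper layers reset their state here */
            return;                /* continue with the next byte, don't bail */
        }
        emit(TOK_CHAR, ch);
    }

    static void print_token(enum token_type type, unsigned char ch)
    {
        if (type == TOK_ERROR) {
            printf("error token (byte 0x%02x) -> reset tokenizer/parser\n", ch);
        } else {
            printf("char token '%c'\n", ch);
        }
    }

    int main(void)
    {
        /* 0xFF is the byte a QMP proxy/agent could send to force a reset. */
        const unsigned char input[] = { '{', 0xFF, '{', '}', 0 };

        for (const unsigned char *p = input; *p; p++) {
            lexer_feed_byte(*p, print_token);
        }
        return 0;
    }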

Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2011-06-07 13:52:11 -05:00
Anthony Liguori 5ab8558d9b Add a lexer for JSON
Our JSON parser is a three-stage parser.  The first stage tokenizes the stream
into a set of lexical tokens.  Since the lexical grammar is regular, we can
use a finite state machine to model it.  The state machine will emit tokens
as they are identified.
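As a rough illustration of such a token-emitting state machine, the C
sketch below lexes only a tiny invented subset of the grammar (structural
characters and unsigned integers); the names are made up for this example
and are not the QEMU lexer's API.

    /* Hedged sketch of an FSM lexer for a tiny JSON-like subset. */
    #include <ctype.h>
    #include <stdio.h>

    enum state { IN_START, IN_INTEGER };
    enum token { TOK_LCURLY, TOK_RCURLY, TOK_COLON, TOK_COMMA, TOK_INTEGER };

    /* Hand each recognized token to the next stage (here: just print it). */
    static void emit(enum token tok, const char *buf, int len)
    {
        printf("token %d: %.*s\n", tok, len, buf);
    }

    /* Run the state machine over the input, emitting tokens as they are
     * identified. */
    static void lex(const char *input)
    {
        enum state state = IN_START;
        const char *start = input;
        const char *p;

        for (p = input; *p; p++) {
            if (state == IN_INTEGER && !isdigit((unsigned char)*p)) {
                emit(TOK_INTEGER, start, (int)(p - start));
                state = IN_START;
            }
            if (state == IN_START) {
                switch (*p) {
                case '{': emit(TOK_LCURLY, p, 1); break;
                case '}': emit(TOK_RCURLY, p, 1); break;
                case ':': emit(TOK_COLON, p, 1); break;
                case ',': emit(TOK_COMMA, p, 1); break;
                default:
                    if (isdigit((unsigned char)*p)) {
                        state = IN_INTEGER;  /* start accumulating digits */
                        start = p;
                    }
                    break;
                }
            }
        }
        if (state == IN_INTEGER) {
            emit(TOK_INTEGER, start, (int)(p - start));
        }
    }

    int main(void)
    {
        lex("{1:23,456}");
        return 0;
    }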

Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-11-17 08:49:39 -06:00