qemu-e2k

Author	SHA1	Message	Date
Liam Merwick	19e8ff485a	json: Fix off-by-one assert check in next_state() The assert checking if the value of lexer->state in next_state(), which is used as an index to the 'json_lexer' array, incorrectly checks for an index value less than or equal to ARRAY_SIZE(json_lexer). Fix assert so that it just checks for an index less than the array size. Signed-off-by: Liam Merwick <liam.merwick@oracle.com> Message-Id: <1553169472-25325-1-git-send-email-liam.merwick@oracle.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Li Qiang <liq3ea@gmail.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com>	2019-03-26 08:10:11 +01:00
Christophe Fergeau	bbc0586ced	json: Fix % handling when not interpolating Commit `8bca4613` added support for %% in json strings when interpolating, but in doing so broke handling of % when not interpolating. When parse_string() is fed a string token containing '%', it skips the '%' regardless of ctxt->ap, i.e. even it's not interpolating. If the '%' is the string's last character, it fails an assertion. Else, it "merely" swallows the '%'. Fix parse_string() to handle '%' specially only when interpolating. To gauge the bug's impact, let's review non-interpolating users of this parser, i.e. code passing NULL context to json_message_parser_init(): * tests/check-qjson.c, tests/test-qobject-input-visitor.c, tests/test-visitor-serialization.c Plenty of tests, but we still failed to cover the buggy case. * monitor.c: QMP input * qga/main.c: QGA input * qobject_from_json(): - qobject-input-visitor.c: JSON command line option arguments of -display and -blockdev Reproducer: -blockdev '{"%"}' - block.c: JSON pseudo-filenames starting with "json:" Reproducer: https://bugzilla.redhat.com/show_bug.cgi?id=1668244#c3 - block/rbd.c: JSON key pairs Pseudo-filenames starting with "rbd:". Command line, QMP and QGA input are trusted. Filenames are trusted when they come from command line, QMP or HMP. They are untrusted when they come from from image file headers. Example: QCOW2 backing file name. Note that this is not the security boundary between host and guest. It's the boundary between host and an image file from an untrusted source. Neither failing an assertion nor skipping a character in a filename of your choice looks exploitable. Note that we don't support compiling with NDEBUG. Fixes: `8bca4613e6` Cc: qemu-stable@nongnu.org Signed-off-by: Christophe Fergeau <cfergeau@redhat.com> Message-Id: <20190102140535.11512-1-cfergeau@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Tested-by: Richard W.M. Jones <rjones@redhat.com> [Commit message extended to discuss impact] Signed-off-by: Markus Armbruster <armbru@redhat.com>	2019-01-24 15:20:59 +01:00
Markus Armbruster	00382fa851	json: Fix to reject duplicate object member names The JSON parser happily accepts duplicate object member names. The last value wins. Reproducer #1: $ qemu-system-x86_64 -qmp stdio {"QMP": {"version": {"qemu": {"micro": 93, "minor": 0, "major": 3}, "package": "v3.1.0-rc3-7-g87a45d86ed"}, "capabilities": []}} {'execute':'qmp_capabilities'} {"return": {}} {'execute':'blockdev-add','arguments':{'driver':'null-co', 'node-name':'foo','node-name':'bar'}} {"return": {}} {'execute':'query-named-block-nodes'} {"return": [{ [...] "node-name": "bar" [...] }]} Reproducer #2 is iotest 229. Fix the parser to reject duplicates, and fix iotest 229 not to use them. Reported-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20181206121743.20762-1-armbru@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> [Trailing whitespace tidied up] Signed-off-by: Markus Armbruster <armbru@redhat.com>	2018-12-13 19:20:11 +01:00
Philippe Mathieu-Daudé	7396972021	qobject: Catch another straggler for use of qdict_put_str() Patch created mechanically by rerunning: $ spatch --sp-file scripts/coccinelle/qobject.cocci \ --macro-file scripts/cocci-macro-file.h \ --dir . --in-place Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Markus Armbruster <armbru@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Message-Id: <20180705155811.20366-2-f4bug@amsat.org> Signed-off-by: Laurent Vivier <laurent@vivier.eu>	2018-10-26 17:17:32 +02:00
Markus Armbruster	1e960b4602	json: Eliminate lexer state IN_WHITESPACE, pseudo-token JSON_SKIP The lexer ignores whitespace like this: on whitespace on non-ws spontaneously IN_START --> IN_WHITESPACE --> JSON_SKIP --> IN_START ^ \| \__/ on whitespace This accumulates a whitespace token in state IN_WHITESPACE, only to throw it away on the transition via JSON_SKIP to the start state. Wasteful. Go from IN_START to IN_START on whitespace directly, dropping the whitespace character. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20180831075841.13363-7-armbru@redhat.com>	2018-09-24 18:08:07 +02:00
Markus Armbruster	2ce4ee64c4	json: Eliminate lexer state IN_ERROR Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20180831075841.13363-6-armbru@redhat.com>	2018-09-24 18:08:07 +02:00
Markus Armbruster	0f07a5d5f1	json: Nicer recovery from lexical errors When the lexer chokes on an input character, it consumes the character, emits a JSON error token, and enters its start state. This can lead to suboptimal error recovery. For instance, input 0123 , produces the tokens JSON_ERROR 01 JSON_INTEGER 23 JSON_COMMA , Make the lexer skip characters after a lexical error until a structural character ('[', ']', '{', '}', ':', ','), an ASCII control character, or '\xFE', or '\xFF'. Note that we must not skip ASCII control characters, '\xFE', '\xFF', because those are documented to force the JSON parser into known-good state, by docs/interop/qmp-spec.txt. The lexer now produces JSON_ERROR 01 JSON_COMMA , Update qmp-test for the nicer error recovery: QMP now reports just one error for input %p instead of two. Also drop the newline after %p; it was needed to tease out the second error. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20180831075841.13363-5-armbru@redhat.com> [Conflict with commit `ebb4d82d88` resolved]	2018-09-24 18:08:07 +02:00
Markus Armbruster	c0ee3afa7f	json: Make lexer's "character consumed" logic less confusing The lexer uses macro TERMINAL_NEEDED_LOOKAHEAD() to decide whether a state transition consumes the input character. It returns true when the state transition is defined with the TERMINAL() macro. To detect that, it checks whether input '\0' would have resulted in the same state transition, and the new state is not IN_ERROR. Why does that even work? For all states, the new state on input '\0' is either IN_ERROR or defined with TERMINAL(). If the state transition equals the one we'd get for input '\0', it goes to IN_ERROR or to the argument of TERMINAL(). We never use TERMINAL(IN_ERROR), because it makes no sense. Thus, if it doesn't go to IN_ERROR, it must be defined with TERMINAL(). Since this isn't quite confusing enough, we negate the result to get @char_consumed, and ignore it when @flush is true. Instead of deriving the lookahead bit from the state transition, make it explicit. This is easier to understand, and a bit more flexible, too. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20180831075841.13363-4-armbru@redhat.com>	2018-09-24 18:06:09 +02:00
Markus Armbruster	852dfa76b8	json: Clean up how lexer consumes "end of input" When the lexer isn't in its start state at the end of input, it's working on a token. To flush it out, it needs to transit to its start state on "end of input" lookahead. There are two ways to the start state, depending on the current state: * If the lexer is in a TERMINAL(JSON_FOO) state, it can emit a JSON_FOO token. * Else, it can go to IN_ERROR state, and emit a JSON_ERROR token. There are complications, however: * The transition to IN_ERROR state consumes the input character and adds it to the JSON_ERROR token. The latter is inappropriate for the "end of input" character, so we suppress that. See also recent commit `a2ec6be72b` "json: Fix lexer to include the bad character in JSON_ERROR token". * The transition to a TERMINAL(JSON_FOO) state doesn't consume the input character. In that case, the lexer normally loops until it is consumed. We have to suppress that for the "end of input" input character. If we didn't, the lexer would consume it by entering IN_ERROR state, emitting a bogus JSON_ERROR token. We fixed that in commit `bd3924a33a`. However, simply breaking the loop this way assumes that the lexer needs exactly one state transition to reach its start state. That assumption is correct now, but it's unclean, and I'll soon break it. Clean up: instead of breaking the loop after one iteration, break it after it reached the start state. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20180831075841.13363-3-armbru@redhat.com>	2018-09-24 18:06:09 +02:00
Markus Armbruster	2a96042a8d	json: Fix lexer for lookahead character beyond '\x7F' The lexer fails to end a valid token when the lookahead character is beyond '\x7F'. For instance, input true\xC2\xA2 produces the tokens JSON_ERROR true\xC2 JSON_ERROR \xA2 This should be JSON_KEYWORD true JSON_ERROR \xC2 JSON_ERROR \xA2 instead. The culprit is #define TERMINAL(state) [0 ... 0x7F] = (state) It leaves [0x80..0xFF] zero, i.e. IN_ERROR. Has always been broken. Fix it to initialize the complete array. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20180831075841.13363-2-armbru@redhat.com>	2018-09-24 18:06:09 +02:00
Markus Armbruster	37aded92c2	json: Update references to RFC 7159 to RFC 8259 RFC 8259 (December 2017) obsoletes RFC 7159 (March 2014). Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20180823164025.12553-59-armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2018-08-24 20:27:14 +02:00
Markus Armbruster	8bca4613e6	json: Support %% in JSON strings when interpolating The previous commit makes JSON strings containing '%' awkward to express in templates: you'd have to mask the '%' with an Unicode escape \u0025. No template currently contains such JSON strings. Support the printf conversion specification %% in JSON strings as a convenience anyway, because it's trivially easy to do. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20180823164025.12553-58-armbru@redhat.com>	2018-08-24 20:26:37 +02:00
Markus Armbruster	16a4859921	json: Improve safety of qobject_from_jsonf_nofail() & friends The JSON parser optionally supports interpolation. This is used to build QObjects by parsing string templates. The templates are C literals, so parse errors (such as invalid interpolation specifications) are actually programming errors. Consequently, the functions providing parsing with interpolation (qobject_from_jsonf_nofail(), qobject_from_vjsonf_nofail(), qdict_from_jsonf_nofail(), qdict_from_vjsonf_nofail()) pass &error_abort to the parser. However, there's another, more dangerous kind of programming error: since we use va_arg() to get the value to interpolate, behavior is undefined when the variable argument isn't consistent with the interpolation specification. The same problem exists with printf()-like functions, and the solution is to have the compiler check consistency. This is what GCC_FMT_ATTR() is about. To enable this type checking for interpolation as well, we carefully chose our interpolation specifications to match printf conversion specifications, and decorate functions parsing templates with GCC_FMT_ATTR(). Note that this only protects against undefined behavior due to type errors. It can't protect against use of invalid interpolation specifications that happen to be valid printf conversion specifications. However, there's still a gaping hole in the type checking: GCC recognizes '%' as start of printf conversion specification anywhere in the template, but the parser recognizes it only outside JSON strings. For instance, if someone were to pass a "{ '%s': %d }" template, GCC would require a char * and an int argument, but the parser would va_arg() only an int argument, resulting in undefined behavior. Avoid undefined behavior by catching the programming error at run time: have the parser recognize and reject '%' in JSON strings. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20180823164025.12553-57-armbru@redhat.com>	2018-08-24 20:26:37 +02:00
Markus Armbruster	ada74c3ba1	json: Keep interpolation state in JSONParserContext The recursive descent parser passes along a pointer to JSONParserContext. It additionally passes a pointer to interpolation state (a va_alist *) as needed to reach its consumer parse_interpolation(). Stuffing the latter pointer into JSONParserContext saves us the trouble of passing it along, so do that. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20180823164025.12553-56-armbru@redhat.com>	2018-08-24 20:26:37 +02:00
Markus Armbruster	86cdf9ec8d	json: Clean up headers The JSON parser has three public headers, json-lexer.h, json-parser.h, json-streamer.h. They all contain stuff that is of no interest outside qobject/json-*.c. Collect the public interface in include/qapi/qmp/json-parser.h, and everything else in qobject/json-parser-int.h. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20180823164025.12553-54-armbru@redhat.com>	2018-08-24 20:26:37 +02:00
Markus Armbruster	812ce33ead	qobject: Drop superfluous includes of qemu-common.h Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20180823164025.12553-53-armbru@redhat.com>	2018-08-24 20:26:37 +02:00
Markus Armbruster	abe7c2067c	json: Make JSONToken opaque outside json-parser.c Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20180823164025.12553-52-armbru@redhat.com>	2018-08-24 20:26:37 +02:00
Markus Armbruster	a2731e08ee	json: Unbox tokens queue in JSONMessageParser Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20180823164025.12553-51-armbru@redhat.com>	2018-08-24 20:26:37 +02:00
Markus Armbruster	8d3265b3d0	json: Streamline json_message_process_token() Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20180823164025.12553-50-armbru@redhat.com>	2018-08-24 20:26:37 +02:00
Markus Armbruster	da09cfbf9d	json: Enforce token count and size limits more tightly Token count and size limits exist to guard against excessive heap usage. We check them only after we created the token on the heap. That's assigning a cowboy to the barn to lasso the horse after it has bolted. Close the barn door instead: check before we create the token. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20180823164025.12553-49-armbru@redhat.com>	2018-08-24 20:26:37 +02:00
Markus Armbruster	dd98e84819	qjson: Have qobject_from_json() & friends reject empty and blank The last case where qobject_from_json() & friends return null without setting an error is empty or blank input. Callers: * block.c's parse_json_protocol() reports "Could not parse the JSON options". It's marked as a work-around, because it also covered actual bugs, but they got fixed in the previous few commits. * qobject_input_visitor_new_str() reports "JSON parse error". Also marked as work-around. The recent fixes have made this unreachable, because it currently gets called only for input starting with '{'. * check-qjson.c's empty_input() and blank_input() demonstrate the behavior. * The other callers are not affected since they only pass input with exactly one JSON value or, in the case of negative tests, one error. Fail with "Expecting a JSON value" instead of returning null, and simplify callers. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20180823164025.12553-48-armbru@redhat.com>	2018-08-24 20:26:37 +02:00
Markus Armbruster	5d50113cf6	json: Assert json_parser_parse() consumes all tokens on success Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20180823164025.12553-47-armbru@redhat.com>	2018-08-24 20:26:37 +02:00
Markus Armbruster	f9277915ee	json: Fix streamer not to ignore trailing unterminated structures json_message_process_token() accumulates tokens until it got the sequence of tokens that comprise a single JSON value (it counts curly braces and square brackets to decide). It feeds those token sequences to json_parser_parse(). If a non-empty sequence of tokens remains at the end of the parse, it's silently ignored. check-qjson.c cases unterminated_array(), unterminated_array_comma(), unterminated_dict(), unterminated_dict_comma() demonstrate this bug. Fix as follows. Introduce a JSON_END_OF_INPUT token. When the streamer receives it, it feeds the accumulated tokens to json_parser_parse(). Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20180823164025.12553-46-armbru@redhat.com>	2018-08-24 20:26:37 +02:00
Markus Armbruster	e06d008ac8	json: Fix latent parser aborts at end of input json-parser.c carefully reports end of input like this: token = parser_context_pop_token(ctxt); if (token == NULL) { parse_error(ctxt, NULL, "premature EOI"); goto out; } Except parser_context_pop_token() can't return null, it fails its assertion instead. Same for parser_context_peek_token(). Broken in commit `65c0f1e955`, and faithfully preserved in commit `95385fe9ac`. Only a latent bug, because the streamer throws away any input that could trigger it. Drop the assertions, so we can fix the streamer in the next commit. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20180823164025.12553-45-armbru@redhat.com>	2018-08-24 20:26:37 +02:00
Markus Armbruster	2a4794ba14	qjson: Fix qobject_from_json() & friends for multiple values qobject_from_json() & friends use the consume_json() callback to receive either a value or an error from the parser. When they are fed a string that contains more than either one JSON value or one JSON syntax error, consume_json() gets called multiple times. When the last call receives a value, qobject_from_json() returns that value. Any other values are leaked. When any call receives an error, qobject_from_json() sets the first error received. Any other errors are thrown away. When values follow errors, qobject_from_json() returns both a value and sets an error. That's bad. Impact: * block.c's parse_json_protocol() ignores and leaks the value. It's used to to parse pseudo-filenames starting with "json:". The pseudo-filenames can come from the user or from image meta-data such as a QCOW2 image's backing file name. * vl.c's parse_display_qapi() ignores and leaks the error. It's used to parse the argument of command line option -display. * vl.c's main() case QEMU_OPTION_blockdev ignores the error and leaves it in @err. main() will then pass a pointer to a non-null Error * to net_init_clients(), which is forbidden. It can lead to assertion failure or other misbehavior. * check-qjson.c's multiple_values() demonstrates the badness. * The other callers are not affected since they only pass strings with exactly one JSON value or, in the case of negative tests, one error. The impact on the _nofail() functions is relatively harmless. They abort when any call receives an error. Else they return the last value, and leak the others, if any. Fix consume_json() as follows. On the first call, save value and error as before. On subsequent calls, if any, don't save them. If the first call saved a value, the next call, if any, replaces the value by an "Expecting at most one JSON value" error. Take care not to leak values or errors that aren't saved. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20180823164025.12553-44-armbru@redhat.com>	2018-08-24 20:26:37 +02:00
Markus Armbruster	4d40066142	json: Improve names of lexer states related to numbers Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20180823164025.12553-43-armbru@redhat.com>	2018-08-24 20:26:37 +02:00
Markus Armbruster	53a0d616fe	json: Replace %I64d, %I64u by %PRId64, %PRIu64 Support for %I64d got added in commit `2c0d4b36e7` "json: fix PRId64 on Win32". We had to hard-code I64d because we used the lexer's finite state machine to check interpolations. No more, so clean this up. Additional conversion specifications would be easy enough to implement when needed. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20180823164025.12553-42-armbru@redhat.com>	2018-08-24 20:26:37 +02:00
Markus Armbruster	f7617d45d4	json: Leave rejecting invalid interpolation to parser Both lexer and parser reject invalid interpolation specifications. The parser's check is useless. The lexer ends the token right after the first bad character. This tends to lead to suboptimal error reporting. For instance, input [ %04d ] produces the tokens JSON_LSQUARE [ JSON_ERROR %0 JSON_INTEGER 4 JSON_KEYWORD d JSON_RSQUARE ] The parser then yields an error, an object and two more errors: error: Invalid JSON syntax object: 4 error: JSON parse error, invalid keyword error: JSON parse error, expecting value Dumb down the lexer to accept [A-Za-z0-9]*. The parser's check is now used. Emit a proper error there. The lexer now produces JSON_LSQUARE [ JSON_INTERP %04d JSON_RSQUARE ] and the parser reports just JSON parse error, invalid interpolation '%04d' Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20180823164025.12553-41-armbru@redhat.com>	2018-08-24 20:26:37 +02:00
Markus Armbruster	84a56f38b2	json: Pass lexical errors and limit violations to callback The callback to consume JSON values takes QObject json, Error err. If both are null, the callback is supposed to make up an error by itself. This sucks. qjson.c's consume_json() neglects to do so, which makes qobject_from_json() null instead of failing. I consider that a bug. The culprit is json_message_process_token(): it passes two null pointers when it runs into a lexical error or a limit violation. Fix it to pass a proper Error object then. Update the callbacks: * monitor.c's handle_qmp_command(): the code to make up an error is now dead, drop it. * qga/main.c's process_event(): lumps the "both null" case together with the "not a JSON object" case. The former is now gone. The error message "Invalid JSON syntax" is misleading for the latter. Improve it to "Input must be a JSON object". * qobject/qjson.c's consume_json(): no update; check-qjson demonstrates qobject_from_json() now sets an error on lexical errors, but still doesn't on some other errors. * tests/libqtest.c's qmp_response(): the Error object is now reliable, so use it to improve the error message. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20180823164025.12553-40-armbru@redhat.com>	2018-08-24 20:26:37 +02:00
Markus Armbruster	2cbd15aa6f	json: Treat unwanted interpolation as lexical error The JSON parser optionally supports interpolation. The lexer recognizes interpolation tokens unconditionally. The parser rejects them when interpolation is disabled, in parse_interpolation(). However, it neglects to set an error then, which can make json_parser_parse() fail without setting an error. Move the check for unwanted interpolation from the parser's parse_interpolation() into the lexer's finite state machine. When interpolation is disabled, '%' is now handled like any other unexpected character. The next commit will improve how such lexical errors are handled. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20180823164025.12553-39-armbru@redhat.com>	2018-08-24 20:26:37 +02:00
Markus Armbruster	61030280ca	json: Rename token JSON_ESCAPE & friends to JSON_INTERP The JSON parser optionally supports interpolation. The code calls it "escape". Awkward, because it uses the same term for escape sequences within strings. The latter usage is consistent with RFC 8259 "The JavaScript Object Notation (JSON) Data Interchange Format" and ISO C. Call the former "interpolation" instead. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20180823164025.12553-38-armbru@redhat.com>	2018-08-24 20:26:37 +02:00
Markus Armbruster	269e57ae28	json: Don't create JSON_ERROR tokens that won't be used Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20180823164025.12553-37-armbru@redhat.com>	2018-08-24 20:26:37 +02:00
Markus Armbruster	ff281a272f	json: Don't pass null @tokens to json_parser_parse() json_parser_parse() normally returns the QObject on success. Except it returns null when its @tokens argument is null. Its only caller json_message_process_token() passes null @tokens when emitting a lexical error. The call is a rather opaque way to say json = NULL then. Simplify matters by lifting the assignment to json out of the emit path: initialize json to null, set it to the value of json_parser_parse() when there's no lexical error. Drop the special case from json_parser_parse(). Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20180823164025.12553-36-armbru@redhat.com>	2018-08-24 20:26:37 +02:00
Markus Armbruster	62815d85ae	json: Redesign the callback to consume JSON values The classical way to structure parser and lexer is to have the client call the parser to get an abstract syntax tree, the parser call the lexer to get the next token, and the lexer call some function to get input characters. Another way to structure them would be to have the client feed characters to the lexer, the lexer feed tokens to the parser, and the parser feed abstract syntax trees to some callback provided by the client. This way is more easily integrated into an event loop that dispatches input characters as they arrive. Our JSON parser is kind of between the two. The lexer feeds tokens to a "streamer" instead of a real parser. The streamer accumulates tokens until it got the sequence of tokens that comprise a single JSON value (it counts curly braces and square brackets to decide). It feeds those token sequences to a callback provided by the client. The callback passes each token sequence to the parser, and gets back an abstract syntax tree. I figure it was done that way to make a straightforward recursive descent parser possible. "Get next token" becomes "pop the first token off the token sequence". Drawback: we need to store a complete token sequence. Each token eats 13 + input characters + malloc overhead bytes. Observations: 1. This is not the only way to use recursive descent. If we replaced "get next token" by a coroutine yield, we could do without a streamer. 2. The lexer reports errors by passing a JSON_ERROR token to the streamer. This communicates the offending input characters and their location, but no more. 3. The streamer reports errors by passing a null token sequence to the callback. The (already poor) lexical error information is thrown away. 4. Having the callback receive a token sequence duplicates the code to convert token sequence to abstract syntax tree in every callback. 5. Known bug: the streamer silently drops incomplete token sequences. This commit rectifies 4. by lifting the call of the parser from the callbacks into the streamer. Later commits will address 3. and 5. The lifting removes a bug from qjson.c's parse_json(): it passed a pointer to a non-null Error * in certain cases, as demonstrated by check-qjson.c. json_parser_parse() is now unused. It's a stupid wrapper around json_parser_parse_err(). Drop it, and rename json_parser_parse_err() to json_parser_parse(). Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20180823164025.12553-35-armbru@redhat.com>	2018-08-24 20:26:37 +02:00
Markus Armbruster	037f244088	json: Have lexer call streamer directly json_lexer_init() takes the function to process a token as an argument. It's always json_message_process_token(). Makes the code harder to understand for no actual gain. Drop the indirection. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20180823164025.12553-34-armbru@redhat.com>	2018-08-24 20:26:37 +02:00
Marc-André Lureau	e8b19d7d73	json-parser: simplify and avoid JSONParserContext allocation parser_context_new/free() are only used from json_parser_parse(). We can fold the code there and avoid an allocation altogether. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Message-Id: <20180719184111.5129-9-marcandre.lureau@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20180823164025.12553-33-armbru@redhat.com>	2018-08-24 20:26:37 +02:00
Marc-André Lureau	7c1e1d5481	json: remove useless return value from lexer/parser The lexer always returns 0 when char feeding. Furthermore, none of the caller care about the return value. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Message-Id: <20180326150916.9602-10-marcandre.lureau@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20180823164025.12553-32-armbru@redhat.com>	2018-08-24 20:26:37 +02:00
Markus Armbruster	dc45a07c36	json: Fix \uXXXX for surrogate pairs The JSON parser treats each half of a surrogate pair as unpaired surrogate. Fix it to recognize surrogate pairs. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20180823164025.12553-30-armbru@redhat.com>	2018-08-24 20:26:37 +02:00
Markus Armbruster	46a628b139	json: Reject invalid \uXXXX, fix \u0000 The JSON parser translates invalid \uXXXX to garbage instead of rejecting it, and swallows \u0000. Fix by using mod_utf8_encode() instead of flawed wchar_to_utf8(). Valid surrogate pairs are now differently broken: they're rejected instead of translated to garbage. The next commit will fix them. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20180823164025.12553-29-armbru@redhat.com>	2018-08-24 20:26:37 +02:00
Markus Armbruster	de6decfe8e	json: Simplify parse_string() Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20180823164025.12553-28-armbru@redhat.com>	2018-08-24 20:26:37 +02:00
Markus Armbruster	b2da4a4d75	json: Leave rejecting invalid escape sequences to parser Both lexer and parser reject invalid escape sequences in strings. The parser's check is useless. The lexer ends the token right after the first non-well-formed byte. This tends to lead to suboptimal error reporting. For instance, input {"abc\@ijk": 1} produces the tokens JSON_LCURLY { JSON_ERROR "abc\@ JSON_KEYWORD ijk JSON_ERROR ": 1}\n The parser then reports three errors Invalid JSON syntax JSON parse error, invalid keyword 'ijk' Invalid JSON syntax before it recovers at the newline. Drop the lexer's escape sequence checking, and make it accept the same characters after backslash it accepts elsewhere in strings. It now produces JSON_LCURLY { JSON_STRING "abc\@ijk" JSON_COLON : JSON_INTEGER 1 JSON_RCURLY and the parser reports just JSON parse error, invalid escape sequence in string While there, fix parse_string()'s inaccurate function comment. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20180823164025.12553-27-armbru@redhat.com>	2018-08-24 20:26:37 +02:00
Markus Armbruster	4b1c0cd7c7	json: Accept overlong \xC0\x80 as U+0000 ("modified UTF-8") Since the JSON grammer doesn't accept U+0000 anywhere, this merely exchanges one kind of parse error for another. It's purely for consistency with qobject_to_json(), which accepts \xC0\x80 (see commit `e2ec3f9768`). Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20180823164025.12553-26-armbru@redhat.com>	2018-08-24 20:26:37 +02:00
Markus Armbruster	de930f45cb	json: Leave rejecting invalid UTF-8 to parser Both the lexer and the parser (attempt to) validate UTF-8 in JSON strings. The lexer rejects bytes that can't occur in valid UTF-8: \xC0..\xC1, \xF5..\xFF. This rejects some, but not all invalid UTF-8. It also rejects ASCII control characters \x00..\x1F, in accordance with RFC 8259 (see recent commit "json: Reject unescaped control characters"). When the lexer rejects, it ends the token right after the first bad byte. Good when the bad byte is a newline. Not so good when it's something like an overlong sequence in the middle of a string. For instance, input {"abc\xC0\xAFijk": 1}\n produces the tokens JSON_LCURLY { JSON_ERROR "abc\xC0 JSON_ERROR \xAF JSON_KEYWORD ijk JSON_ERROR ": 1}\n The parser then reports four errors Invalid JSON syntax Invalid JSON syntax JSON parse error, invalid keyword 'ijk' Invalid JSON syntax before it recovers at the newline. The commit before previous made the parser reject invalid UTF-8 sequences. Since then, anything the lexer rejects, the parser would reject as well. Thus, the lexer's rejecting is unnecessary for correctness, and harmful for error reporting. However, we want to keep rejecting ASCII control characters in the lexer, because that produces the behavior we want for unclosed strings. We also need to keep rejecting \xFF in the lexer, because we documented that as a way to reset the JSON parser (docs/interop/qmp-spec.txt section 2.6 QGA Synchronization), which means we can't change how we recover from this error now. I wish we hadn't done that. I think we should treat \xFE the same as \xFF. Change the lexer to accept \xC0..\xC1 and \xF5..\xFD. It now rejects only \x00..\x1F and \xFE..\xFF. Error reporting for invalid UTF-8 in strings is much improved, except for \xFE and \xFF. For the example above, the lexer now produces JSON_LCURLY { JSON_STRING "abc\xC0\xAFijk" JSON_COLON : JSON_INTEGER 1 JSON_RCURLY and the parser reports just JSON parse error, invalid UTF-8 sequence in string Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20180823164025.12553-25-armbru@redhat.com>	2018-08-24 20:26:37 +02:00
Markus Armbruster	574bf16ff1	json: Report first rather than last parse error Quiz time! When a parser reports multiple errors, but the user gets to see just one, which one is (on average) the least useful one? Yes, you're right, it's the last one! You're clearly familiar with compilers. Which one does QEMU report? Right again, the last one! You're clearly familiar with QEMU. Reproducer: feeding {"abc\xC2ijk": 1}\n to QMP produces {"error": {"class": "GenericError", "desc": "JSON parse error, key is not a string in object"}} Report the first error instead. The reproducer now produces {"error": {"class": "GenericError", "desc": "JSON parse error, invalid UTF-8 sequence in string"}} Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20180823164025.12553-24-armbru@redhat.com>	2018-08-24 20:26:37 +02:00
Markus Armbruster	e59f39d403	json: Reject invalid UTF-8 sequences We reject bytes that can't occur in valid UTF-8 (\xC0..\xC1, \xF5..\xFF in the lexer. That's insufficient; there's plenty of invalid UTF-8 not containing these bytes, as demonstrated by check-qjson: * Malformed sequences - Unexpected continuation bytes - Missing continuation bytes after start bytes other than \xC0..\xC1, \xF5..\xFD. * Overlong sequences with start bytes other than \xC0..\xC1, \xF5..\xFD. * Invalid code points Fixing this in the lexer would be bothersome. Fixing it in the parser is straightforward, so do that. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20180823164025.12553-23-armbru@redhat.com>	2018-08-24 20:26:37 +02:00
Markus Armbruster	00ea57fadc	json: Tighten and simplify qstring_from_escaped_str()'s loop Simplify loop control, and assert that the string ends with the appropriate quote (the lexer ensures it does). Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20180823164025.12553-21-armbru@redhat.com>	2018-08-24 20:26:37 +02:00
Markus Armbruster	eddc0a7f0a	json: Revamp lexer documentation Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20180823164025.12553-20-armbru@redhat.com>	2018-08-24 20:26:37 +02:00
Markus Armbruster	340db1ed82	json: Reject unescaped control characters Fix the lexer to reject unescaped control characters in JSON strings, in accordance with RFC 8259 "The JavaScript Object Notation (JSON) Data Interchange Format". Bonus: we now recover more nicely from unclosed strings. E.g. {"one: 1}\n{"two": 2} now recovers cleanly after the newline, where before the lexer remained confused until the next unpaired double quote or lexical error. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20180823164025.12553-19-armbru@redhat.com>	2018-08-24 20:26:37 +02:00
Markus Armbruster	a2ec6be72b	json: Fix lexer to include the bad character in JSON_ERROR token json_lexer[] maps (lexer state, input character) to the new lexer state. The input character is consumed unless the new state is terminal and the input character doesn't belong to this token, i.e. the state transition uses look-ahead. When this is the case, input character '\0' would result in the same state transition. TERMINAL_NEEDED_LOOKAHEAD() exploits this. Except this is wrong for transitions to IN_ERROR. There, the offending input character is in fact consumed: case IN_ERROR returns. It isn't added to the JSON_ERROR token, though. Fix that by making TERMINAL_NEEDED_LOOKAHEAD() return false for transitions to IN_ERROR. There's a slight complication. json_lexer_flush() passes input character '\0' to flush an incomplete token. If this results in JSON_ERROR, we'd now add the '\0' to the token. Suppress that. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20180823164025.12553-18-armbru@redhat.com>	2018-08-24 20:26:37 +02:00
Peter Maydell	c542a9f979	Testing patches for 2018-08-16 -----BEGIN PGP SIGNATURE----- iQIcBAABAgAGBQJbdTcjAAoJEDhwtADrkYZTZEYP/ivp0ozEfMeGgc6PFItv3zmx QVD+NYJ8bnv/iEoWl/pnQ0/HY3YLHz4G1DTK0dSlJAvAiChpPiR7YCeJRXeTyLHL 9KCFQV5SV9llstVi0f4ebEK21mUkYWoqtlzxxyqXh0q2N/QLtaVQ85ysE6ufwhNH jlunmJLGRRwPR95F4a05uVHNOym1ig9eo5CtQ1Fa8viV9BgWTbpSp1t4feB1OLnt Ml9cbFubb1cA7CuhdNHazNOnRZtEW5A9eOo6rX4d5JcH/zgFWdPpKCRn/X/NdvSE aRKqk7ll0gxYlacqVpkea23pVKVl7e1oUqkziaL8rq/BYE0SePkRv+SnmsifD8uT kWl+eHLyaW1g43omc0uttyAuTkFnvAa+l9TqIrdEYcPJJNaCsZVgJpDvj9+Oxril fk3OIHAnzSWwp/AmFLCSOYdaoVuZhppp6rqnu26B0w9Rxkbqe1790LbjDJrLUB+2 vN7+JmDhUfJk7/2pi+MGZrBtj3zcgbb3Qc5+NG8H1401bA/n8FNnPKgWdmAlmO7i pTafa1FXArJGWiBhzg2PUqmZq45MQiheQ1+SWgviIodQX5oHB3kEimcRPg4Wk18c fTKJDe7w8NFFNjuH6ou2LI4KzgQeewW+oCjxh2A7kwCqDmq5Eq8nBw/bYO1DgcDr bfCnicNJinjCHcgvvCVM =DuZ8 -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/armbru/tags/pull-tests-2018-08-16' into staging Testing patches for 2018-08-16 # gpg: Signature made Thu 16 Aug 2018 09:34:43 BST # gpg: using RSA key 3870B400EB918653 # gpg: Good signature from "Markus Armbruster <armbru@redhat.com>" # gpg: aka "Markus Armbruster <armbru@pond.sub.org>" # Primary key fingerprint: 354B C8B3 D7EB 2A6B 6867 4E5F 3870 B400 EB91 8653 * remotes/armbru/tags/pull-tests-2018-08-16: (25 commits) libqtest: Improve error reporting for bad read from QEMU tests/libqtest: Improve kill_qemu() libqtest: Rename qtest_FOOv() to qtest_vFOO() for consistency libqtest: Replace qtest_startf() by qtest_initf() libqtest: Enable compile-time format string checking migration-test: Clean up string interpolation into QMP, part 3 migration-test: Clean up string interpolation into QMP, part 2 migration-test: Clean up string interpolation into QMP, part 1 migration-test: Make wait_command() cope with '%' tests: New helper qtest_qmp_receive_success() migration-test: Make wait_command() return the "return" member tests: Clean up string interpolation around qtest_qmp_device_add() cpu-plug-test: Don't pass integers as strings to device_add tests: Clean up string interpolation into QMP input (simple cases) tests: Pass literal format strings directly to qmp_FOO() qobject: qobject_from_jsonv() is dangerous, hide it away test-qobject-input-visitor: Avoid format string ambiguity libqtest: Simplify qmp_fd_vsend() a bit qobject: New qobject_from_vjsonf_nofail(), qdict_from_vjsonf_nofail() qobject: Replace qobject_from_jsonf() by qobject_from_jsonf_nofail() ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2018-08-16 09:50:54 +01:00

1 2 3 4

170 Commits