e59f39d403
We reject bytes that can't occur in valid UTF-8 (\xC0..\xC1,
\xF5..\xFF) in the lexer.  That's insufficient; there's plenty of
invalid UTF-8 not containing these bytes, as demonstrated by
check-qjson:

* Malformed sequences

  - Unexpected continuation bytes

  - Missing continuation bytes after start bytes other than
    \xC0..\xC1, \xF5..\xFD.

* Overlong sequences with start bytes other than \xC0..\xC1,
  \xF5..\xFD.

* Invalid code points

Fixing this in the lexer would be bothersome.  Fixing it in the parser
is straightforward, so do that.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20180823164025.12553-23-armbru@redhat.com>
8 lines
182 B
C
#ifndef QEMU_UNICODE_H
#define QEMU_UNICODE_H

int mod_utf8_codepoint(const char *s, size_t n, char **end);
ssize_t mod_utf8_encode(char buf[], size_t bufsz, int codepoint);

#endif
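
The commit message names three classes of invalid UTF-8 that a byte-level filter cannot catch: malformed sequences, overlong sequences, and invalid code points. The following is a minimal sketch of a decoder that rejects all three. It is an illustration only, not the code behind mod_utf8_codepoint(): the name sketch_utf8_decode, its signature, and its return convention are hypothetical, and it implements plain strict UTF-8 without any special cases the real helper may have.

```c
#include <stddef.h>

/*
 * Hypothetical helper, not QEMU's implementation: decode one strict
 * UTF-8 sequence from the first n bytes of s.
 * On success, return the code point and store the sequence length in *len.
 * On failure, return -1 and store the number of bytes examined in *len.
 */
static int sketch_utf8_decode(const unsigned char *s, size_t n, size_t *len)
{
    /* Smallest code point that legitimately needs 2, 3, or 4 bytes */
    static const int min_cp[5] = { 0, 0, 0x80, 0x800, 0x10000 };
    int cp;
    size_t need, i;

    *len = 0;
    if (n == 0) {
        return -1;
    }
    *len = 1;
    if (s[0] < 0x80) {
        return s[0];                /* ASCII */
    } else if (s[0] < 0xC0) {
        return -1;                  /* unexpected continuation byte */
    } else if (s[0] < 0xE0) {
        need = 2;
        cp = s[0] & 0x1F;
    } else if (s[0] < 0xF0) {
        need = 3;
        cp = s[0] & 0x0F;
    } else if (s[0] < 0xF8) {
        need = 4;
        cp = s[0] & 0x07;
    } else {
        return -1;                  /* \xF8..\xFF never start a sequence */
    }

    for (i = 1; i < need; i++) {
        if (i >= n || (s[i] & 0xC0) != 0x80) {
            *len = i;
            return -1;              /* missing continuation byte */
        }
        cp = (cp << 6) | (s[i] & 0x3F);
    }
    *len = need;

    if (cp < min_cp[need]) {
        return -1;                  /* overlong sequence */
    }
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
        return -1;                  /* invalid code point */
    }
    return cp;
}
```

Note that the start bytes \xC0..\xC1 and \xF5..\xF7 need no dedicated check in this sketch: the former can only produce code points below 0x80 and fail the overlong test, and the latter can only produce values above 0x10FFFF and fail the code point test.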