diff --git a/src/doc/rust.md b/src/doc/rust.md
index 7233288a813..39b62615536 100644
--- a/src/doc/rust.md
+++ b/src/doc/rust.md
@@ -3136,8 +3136,12 @@ machine.
 
 The types `char` and `str` hold textual data.
 
-A value of type `char` is a Unicode character,
-represented as a 32-bit unsigned word holding a UCS-4 codepoint.
+A value of type `char` is a [Unicode scalar value](
+http://www.unicode.org/glossary/#unicode_scalar_value)
+(ie. a code point that is not a surrogate),
+represented as a 32-bit unsigned word in the 0x0000 to 0xD7FF
+or 0xE000 to 0x10FFFF range.
+A `[char]` vector is effectively an UCS-4 / UTF-32 string.
 
 A value of type `str` is a Unicode string,
 represented as a vector of 8-bit unsigned bytes holding a sequence of UTF-8 codepoints.
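
The two ranges in the new wording can be checked mechanically. The following is a minimal sketch, written against current stable Rust rather than the compiler this patch targeted; the helper names `is_scalar_value` and `to_utf32` are hypothetical and exist only for illustration. It compares the ranges from the patch with the standard library's `char::from_u32`, which returns `None` for surrogates and for values above 0x10FFFF, and then shows a slice of `char` read as UTF-32 code units.

```rust
/// Hypothetical helper: true if `cp` falls in one of the two ranges
/// the patch gives for `char` (0x0000..=0xD7FF or 0xE000..=0x10FFFF).
fn is_scalar_value(cp: u32) -> bool {
    matches!(cp, 0x0000..=0xD7FF | 0xE000..=0x10FFFF)
}

/// Hypothetical helper: view a slice of `char` as UTF-32 code units.
/// Each `char` converts losslessly to one `u32`.
fn to_utf32(chars: &[char]) -> Vec<u32> {
    chars.iter().map(|&c| c as u32).collect()
}

fn main() {
    // Boundary values around the surrogate gap and the top of the code space.
    let samples: [u32; 7] = [0x0041, 0xD7FF, 0xD800, 0xDFFF, 0xE000, 0x10FFFF, 0x110000];
    for &cp in samples.iter() {
        // The standard library accepts exactly the values in the two ranges.
        assert_eq!(char::from_u32(cp).is_some(), is_scalar_value(cp));
        println!("U+{:04X}: valid char = {}", cp, is_scalar_value(cp));
    }

    // A `char` occupies four bytes, so a `[char]` has the layout of UTF-32.
    assert_eq!(std::mem::size_of::<char>(), 4);
    let utf32 = to_utf32(&['a', '\u{E9}', '\u{1F600}']);
    assert_eq!(utf32, vec![0x61, 0xE9, 0x1F600]);
}
```

The surrogate values 0xD800 to 0xDFFF are rejected because they are reserved for UTF-16 and never stand for a character on their own, which is the distinction the patch draws between a code point and a scalar value.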