TOML parser: surrogate-half unicode escapes accepted in strings
Summary
TOML 1.0.0 restricts Unicode escapes (\uXXXX and \UXXXXXXXX) to valid Unicode scalar values, which excludes the surrogate code points U+D800..U+DFFF. The current parser accepts escapes in that range instead of rejecting the document.
Reproduction
toml-test fixtures that should fail but currently succeed:
tests/invalid/string/bad-uni-esc-06.toml
tests/invalid/string/bad-uni-esc-ml-06.toml
Spec citation
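TOML 1.0.0 § "Basic strings": "Any Unicode character may be used except those that must be escaped: quotation mark, backslash, and the control characters other than tab (U+0000 to U+0008, U+000A to U+001F, U+007F)." The same section says of the \uXXXX and \UXXXXXXXX forms: "The escape codes must be valid Unicode scalar values." Surrogate code points (U+D800..U+DFFF) are not scalar values, so an escape that names one must be rejected.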
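
The check the spec asks for can be sketched as follows. This is an illustration in Python, not the parser's actual code: the names is_unicode_scalar and decode_unicode_escape are hypothetical, and the sketch assumes the escape handler has already stripped the leading \u or \U and passes only the hex digits.

```python
HEX_DIGITS = set("0123456789abcdefABCDEF")


def is_unicode_scalar(cp: int) -> bool:
    """Return True for any code point except the surrogates U+D800..U+DFFF."""
    return 0 <= cp <= 0xD7FF or 0xE000 <= cp <= 0x10FFFF


def decode_unicode_escape(hex_digits: str) -> str:
    """Decode the digits of a \\uXXXX or \\UXXXXXXXX escape (hypothetical helper).

    Raises ValueError for a wrong digit count, a non-hex digit, or a value
    that is not a Unicode scalar value.
    """
    if len(hex_digits) not in (4, 8):
        raise ValueError("unicode escape needs exactly 4 or 8 hex digits")
    if not all(c in HEX_DIGITS for c in hex_digits):
        raise ValueError("unicode escape contains a non-hex digit")
    cp = int(hex_digits, 16)
    if not is_unicode_scalar(cp):
        raise ValueError(f"U+{cp:04X} is not a Unicode scalar value")
    return chr(cp)
```

With a check of this shape in the escape handler, both \uD800 and \U0000D800 would be reported as errors, while escapes such as \u00E9 still decode normally.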