GitHub - tc39/proposal-json-superset: Proposal to make all JSON text valid ECMA-262
Subsume JSON (a.k.a. JSON ⊂ ECMAScript)
A proposal to extend ECMA-262 syntax into a superset of JSON.
Status
This proposal is at stage 4 of the TC39 Process and is scheduled to be included in ES2019.
Champions
- Mark Miller
- Mathias Bynens
Motivation
ECMAScript claims JSON as a subset in JSON.parse, but (as has been well-documented) that is not true because JSON strings can contain unescaped U+2028 LINE SEPARATOR and U+2029 PARAGRAPH SEPARATOR characters while ECMAScript strings cannot.
These exceptions add unnecessary complexity to the specification and increase the cognitive burden on both implementers and users, allowing for the introduction of subtle bugs. Also, as a lesser but concrete corollary problem, certain source concatenation and construction tasks currently require additional steps to process valid JSON into valid ECMAScript before embedding it.
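The "additional steps" mentioned above can be sketched as follows. This is a minimal illustration rather than anything defined by the proposal: `jsonForScriptEmbedding` is a hypothetical helper name, and the sketch assumes the generated source must run on an engine that still rejects unescaped U+2028 and U+2029 in string literals.

```javascript
// Hypothetical helper (not part of the proposal): serialize a value to JSON
// and escape the two characters that pre-proposal ECMAScript string literals
// cannot contain unescaped.
function jsonForScriptEmbedding(value) {
  return JSON.stringify(value)
    .replace(/\u2028/g, "\\u2028")
    .replace(/\u2029/g, "\\u2029");
}

const payload = { text: "a\u2028b\u2029c" };

// JSON.stringify leaves U+2028 and U+2029 unescaped, which is valid JSON.
const plainJson = JSON.stringify(payload);

// The escaped form is still valid JSON and is also safe to paste into source.
const embeddable = jsonForScriptEmbedding(payload);
const scriptSource = "const data = " + embeddable + ";";
```

With the proposal in place, `plainJson` could be concatenated into source text directly and the escaping pass becomes unnecessary.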
Proposed Solution
JSON syntax is defined by ECMA-404 and permanently fixed by RFC 7159, but the DoubleStringCharacter and SingleStringCharacter productions of ECMA-262 can be extended to allow unescaped U+2028 LINE SEPARATOR and U+2029 PARAGRAPH SEPARATOR characters.
Examples
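const LS = " "; // string literal containing an unescaped U+2028 LINE SEPARATOR
const PS = eval("'\u2029'"); // the evaluated source contains an unescaped U+2029 PARAGRAPH SEPARATOR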
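The difference between the two grammars can be observed at run time. The sketch below is an assumption-laden illustration, not part of the proposal: it builds the same text once for `JSON.parse` and once for an indirect `eval`, and the variable names (`supportsJsonSuperset` and the rest) are made up for this example.

```javascript
// Build the characters from code points so this snippet itself contains no
// unescaped separators and parses on any engine.
const LS = String.fromCharCode(0x2028);

// The text: a double-quoted string containing one unescaped U+2028.
const text = '"' + LS + '"';

// JSON has always accepted this text.
const fromJson = JSON.parse(text);

// As ECMAScript source, the same text is accepted only by engines that
// implement this proposal; older engines throw a SyntaxError.
let fromEval;
try {
  fromEval = (0, eval)(text); // indirect eval, evaluated in the global scope
} catch (error) {
  fromEval = error;
}

const supportsJsonSuperset = fromEval === LS;
```

On an engine without the change, `fromEval` holds the caught `SyntaxError` and `supportsJsonSuperset` is `false`.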
Discussion
Backwards Compatibility
This change is backwards-compatible. User-visible effects will be limited to the elimination of SyntaxError completions when parsing strings that include unescaped LINE SEPARATOR or PARAGRAPH SEPARATOR characters, which in practice are extremely uncommon (we also hope to collect data for the related question of how often those characters are used as line terminators outside of strings).
Regular Expression Literals
Unescaped LINE SEPARATOR and PARAGRAPH SEPARATOR characters are not currently allowed in regular expression literals either, but that restriction has been left in place because regular expression literals are not part of JSON.
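A small check of this restriction, offered as a sketch: `parsesAsFunctionBody` is a hypothetical helper that uses the `Function` constructor only to test whether source text parses.

```javascript
// Hypothetical helper: true if `source` parses as a function body,
// false if parsing fails with a SyntaxError.
function parsesAsFunctionBody(source) {
  try {
    new Function(source); // parses the body without running it
    return true;
  } catch (error) {
    if (error instanceof SyntaxError) return false;
    throw error;
  }
}

const LS = String.fromCharCode(0x2028);

// An unescaped U+2028 inside a regular expression literal remains an error.
const rawInRegExpLiteral = parsesAsFunctionBody("/" + LS + "/;");

// The escaped form is valid and still matches the character.
const escapedInRegExpLiteral = parsesAsFunctionBody("/\\u2028/;");
const matchesViaEscape = /\u2028/.test(LS);
```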
Template Literals
Unescaped LINE SEPARATOR and PARAGRAPH SEPARATOR characters are already allowed in template literals.
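For comparison, a sketch showing that a template literal containing both characters evaluates to a two-character string. The source text is assembled from code points and passed to an indirect `eval` so that the snippet itself contains no unescaped separators; the variable names are illustrative only.

```javascript
const LS = String.fromCharCode(0x2028);
const PS = String.fromCharCode(0x2029);

// Source text of a template literal whose body is the two raw characters.
const templateSource = "`" + LS + PS + "`";

// Template literals accept both characters unescaped and preserve them.
const templateValue = (0, eval)(templateSource);
```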
Validity
Encompassing JSON syntax does not imply the semantic validity of all JSON text. For example, ({ "__proto__": 1, "__proto__": 2 }) triggers an early SyntaxError under Annex B, and will continue to do so. However, it will become possible to generate a parse tree from ({ "LineTerminators": "\n\r  " }), where the string ends with unescaped U+2028 and U+2029 characters.
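The `__proto__` case can be reproduced with the sketch below. It assumes an engine that implements the Annex B restriction on duplicate `__proto__` entries (web browsers and Node.js do); the variable names are illustrative.

```javascript
const jsonText = '{ "__proto__": 1, "__proto__": 2 }';

// As JSON, the text is accepted: "__proto__" is an ordinary key and the
// last duplicate wins.
const parsed = JSON.parse(jsonText);
const hasOwnProto = Object.prototype.hasOwnProperty.call(parsed, "__proto__");

// As an ECMAScript object literal, the duplicate __proto__ entries are an
// early SyntaxError under Annex B, with or without this proposal.
let evalOutcome;
try {
  (0, eval)("(" + jsonText + ")");
  evalOutcome = "parsed";
} catch (error) {
  evalOutcome = error instanceof SyntaxError ? "SyntaxError" : "other";
}
```

An engine that does not implement Annex B would report `"parsed"` here, which is why the outcome is recorded instead of assumed.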
Objections
Allen Wirfs-Brock argues that ECMAScript and JSON are distinct and don't need an easily-described relationship, and is concerned that acceptance of this proposal would be used as leverage by others attempting to "fix JSON".
The latter is addressed by this proposal explicitly acknowledging JSON syntax as a fixed point. As for the former, it is clear from the definition of JSON.parse that ECMAScript benefits from the similarity (e.g., step 4 includes "parsing and evaluating scriptText as if it was the source text of an ECMAScript Script"). This proposal argues that eliminating the need for an alternate DoubleStringCharacter production and the associated cognitive burden in reasoning about the two languages is sufficiently beneficial to justify such a change.
Conformance tests
Test262 tests are here: tc39/test262#1544
TC39 meeting notes
Implementations
- V8, shipping in Chrome 66
- JavaScriptCore, shipping in Safari Technology Preview 49+
- Babel
Specification
The specification is available in ecmarkup or rendered HTML.