GitHub - tc39/proposal-json-superset: Proposal to make all JSON text valid ECMA-262

Subsume JSON (a.k.a. JSON ⊂ ECMAScript)

A proposal to extend ECMA-262 syntax into a superset of JSON.

Status

This proposal is at stage 4 of the TC39 Process and is scheduled to be included in ES2019.

Champions

Motivation

ECMAScript claims JSON as a subset in the definition of JSON.parse, but (as has been well documented) that is not true: JSON strings can contain unescaped U+2028 LINE SEPARATOR and U+2029 PARAGRAPH SEPARATOR characters, while ECMAScript string literals cannot, because ECMAScript treats those two characters as line terminators.
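The mismatch is easy to observe. In this minimal sketch (variable names are illustrative), the \u2028 escape is resolved by the JavaScript string literal that builds the JSON text, so the JSON text itself holds the raw character, which JSON permits:

```javascript
// Build a JSON text containing a raw (unescaped) U+2028 LINE SEPARATOR.
// The escape is processed by this JavaScript literal, so jsonText holds
// the literal character between the two quotes.
const jsonText = '"\u2028"';

// JSON allows the raw character inside strings, so JSON.parse accepts it.
const parsed = JSON.parse(jsonText);
console.log(parsed === "\u2028"); // true
console.log(parsed.length);       // 1

// JSON.stringify does not escape U+2028 either, so its output can carry
// the raw character into any ECMAScript source it is spliced into.
const serialized = JSON.stringify({ sep: "\u2028" });
console.log(serialized.includes("\u2028")); // true
```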

These exceptions add unnecessary complexity to the specification and increase the cognitive burden on both implementers and users, making subtle bugs easier to introduce. A lesser but concrete corollary problem is that certain source concatenation and construction tasks currently require an extra step to convert valid JSON into valid ECMAScript before embedding it.
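On engines that predate this proposal, that extra step typically means escaping the two characters after serializing. A minimal sketch of such a step; toEmbeddableSource is a hypothetical helper name, not part of any library:

```javascript
// Hypothetical helper: serialize a value as JSON text that is also a valid
// ECMAScript literal on engines without JSON-superset support (pre-ES2019).
function toEmbeddableSource(value) {
  return JSON.stringify(value)
    // Swap raw LINE SEPARATOR / PARAGRAPH SEPARATOR for escape sequences.
    // In this compact output they can only occur inside strings, where
    // "\u2028" and "\u2029" are equally valid spellings in JSON.
    .replace(/\u2028/g, "\\u2028")
    .replace(/\u2029/g, "\\u2029");
}

const data = { note: "line\u2028break", para: "end\u2029" };
const script = "const config = " + toEmbeddableSource(data) + ";";

console.log(script.includes("\u2028"));  // false: only the escaped form remains
console.log(script.includes("\\u2028")); // true
```

With this proposal implemented, the replace calls become unnecessary for current engines; they remain useful only when targeting older ones.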

Proposed Solution

JSON syntax is defined by ECMA-404 and permanently fixed by RFC 7159, so JSON itself cannot change. ECMA-262 can, however: its DoubleStringCharacter and SingleStringCharacter productions can be extended to allow unescaped U+2028 LINE SEPARATOR and U+2029 PARAGRAPH SEPARATOR characters.
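The effect of the grammar change can be modeled as a predicate over single code points. This is a simplified sketch, not spec text: mayAppearUnescaped and its jsonSuperset flag are hypothetical names, and the model ignores escape sequences and line continuations, covering only which raw code points a double-quoted string literal may contain.

```javascript
// Code points that ECMA-262 classifies as LineTerminator.
const LF = 0x000a, CR = 0x000d, LS = 0x2028, PS = 0x2029;

// Hypothetical model of DoubleStringCharacter for one raw code point.
// jsonSuperset = false: the pre-ES2019 rule; true: the extended rule.
function mayAppearUnescaped(codePoint, jsonSuperset) {
  if (codePoint === 0x22 /* " */ || codePoint === 0x5c /* \ */) return false;
  // LS and PS remain LineTerminators, but the extended production
  // lets them appear inside string literals anyway.
  if (codePoint === LS || codePoint === PS) return jsonSuperset;
  // LF and CR stay forbidden in both versions (JSON forbids them too).
  if (codePoint === LF || codePoint === CR) return false;
  return true;
}

console.log(mayAppearUnescaped(LS, false)); // false
console.log(mayAppearUnescaped(LS, true));  // true
console.log(mayAppearUnescaped(LF, true));  // false
```

SingleStringCharacter follows the same pattern with ' in place of ".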

Examples

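// The first string literal contains a raw, unescaped U+2028 LINE SEPARATOR (invisible here).
const LS = " ";
// The escape is resolved before eval runs, so the evaluated source contains a raw U+2029.
const PS = eval("'\u2029'");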

Discussion

Backwards Compatibility

This change is backwards-compatible. User-visible effects will be limited to the elimination of SyntaxError completions when parsing strings that include unescaped LINE SEPARATOR or PARAGRAPH SEPARATOR characters, which in practice are extremely uncommon (we also hope to collect data for the related question of how often those characters are used as line terminators outside of strings).
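Because the only observable change is that a former SyntaxError goes away, code that must also run on older engines can detect support by compiling a tiny snippet. A minimal sketch; supportsJsonSuperset is a hypothetical name, and new Function is used so the snippet is compiled but never executed.

```javascript
// Hypothetical feature test: does this engine accept a raw U+2028 inside a
// string literal (ES2019 JSON superset) or reject it with a SyntaxError?
function supportsJsonSuperset() {
  // The escape is resolved here, so the compiled source holds the raw character.
  const source = 'return "\u2028";';
  try {
    new Function(source); // compile only; the body is never called
    return true;
  } catch (error) {
    if (error instanceof SyntaxError) return false;
    throw error; // e.g. a CSP EvalError in browsers: not a grammar answer
  }
}

console.log(supportsJsonSuperset()); // true on any ES2019+ engine
```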

Regular Expression Literals

Unescaped LINE SEPARATOR and PARAGRAPH SEPARATOR characters are not currently allowed in regular expression literals either, but that restriction has been left in place because regular expression literals are not part of JSON.
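The retained restriction can be observed by evaluating a regular expression literal whose body holds a raw U+2028, next to the RegExp constructor, which receives the pattern as a string rather than as literal syntax. A minimal sketch (variable names are illustrative):

```javascript
// A raw U+2028 inside a regular expression *literal* is still a SyntaxError:
// the literal's body may not contain a LineTerminator.
let literalError = null;
try {
  eval("/\u2028/"); // the evaluated source contains the raw character
} catch (error) {
  literalError = error;
}
console.log(literalError instanceof SyntaxError); // true

// The RegExp constructor parses only pattern syntax, not literal syntax,
// so a pattern string containing the raw character is accepted.
const re = new RegExp("\u2028");
console.log(re.test("a\u2028b")); // true
```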

Template Literals

Unescaped LINE SEPARATOR and PARAGRAPH SEPARATOR characters are already allowed in template literals.
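For comparison, a minimal sketch showing that a template literal containing both raw characters evaluates without error and keeps them in its value (unlike CR and CRLF, which templates normalize to LF):

```javascript
// Source text for a template literal holding a raw U+2028 and a raw U+2029;
// the escapes are resolved before eval sees the text.
const source = "`a\u2028b\u2029c`";
const value = eval(source);

console.log(value === "a\u2028b\u2029c"); // true: no SyntaxError, characters kept
console.log(value.length);                // 5
```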

Validity

Encompassing JSON syntax does not imply that all JSON text is semantically valid ECMAScript. For example, ({ "__proto__": 1, "__proto__": 2 }) triggers an early SyntaxError under Annex B, and will continue to do so. However, it will become possible to generate a parse tree from ({ "LineTerminators": "\n\r " }), whose string value also contains unescaped LINE SEPARATOR and PARAGRAPH SEPARATOR characters after the \n and \r escapes.
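Both halves of this distinction can be checked directly. A minimal sketch, assuming an ES2019+ engine: the duplicate __proto__ literal is still rejected when evaluated as ECMAScript, JSON.parse accepts the same JSON text (creating an own data property named "__proto__"), and the LineTerminators literal now parses.

```javascript
// Duplicate __proto__ in an object *literal* is still an early SyntaxError.
const duplicateProto = '({ "__proto__": 1, "__proto__": 2 })';
let protoError = null;
try {
  eval(duplicateProto);
} catch (error) {
  protoError = error;
}
console.log(protoError instanceof SyntaxError); // true

// JSON.parse accepts the same JSON text: it defines an own data property
// named "__proto__" instead of setting the prototype; the last value wins.
const fromJson = JSON.parse('{ "__proto__": 1, "__proto__": 2 }');
console.log(Object.getOwnPropertyDescriptor(fromJson, "__proto__").value); // 2

// A literal whose string holds raw LS and PS now parses. "\\n\\r" stays an
// escape sequence in the evaluated source; the other two characters are raw.
const lineTerminators = eval('({ "LineTerminators": "\\n\\r\u2028\u2029" })');
console.log(lineTerminators.LineTerminators.length); // 4
```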

Objections

Allen Wirfs-Brock argues that ECMAScript and JSON are distinct and don't need an easily-described relationship, and is concerned that acceptance of this proposal would be used as leverage by others attempting to "fix JSON".

The latter is addressed by this proposal explicitly acknowledging JSON syntax as a fixed point. As for the former, it is clear from the definition of JSON.parse that ECMAScript benefits from the similarity (e.g., step 4 includes "parsing and evaluating scriptText as if it was the source text of an ECMAScript Script"). This proposal argues that eliminating the need for an alternate DoubleStringCharacter production and the associated cognitive burden in reasoning about the two languages is sufficiently beneficial to justify such a change.
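The similarity the proposal relies on can be checked for a concrete input: for JSON text that avoids the __proto__ corner case, evaluating the text as an ECMAScript expression should yield the same structure as JSON.parse. A minimal sketch; parseAsScript is a hypothetical helper, and eval must never be used this way on untrusted input.

```javascript
// Hypothetical helper: evaluate JSON text as an ECMAScript expression.
// The parentheses stop a leading "{" from being parsed as a block statement.
function parseAsScript(jsonText) {
  return (0, eval)("(" + jsonText + ")"); // indirect eval: global scope only
}

// Contains raw LS/PS inside a string; before ES2019 the eval path threw here.
const sample = '{"items":[1,2.5,-3e2,true,null],"text":"a\u2028b\u2029c"}';

const viaJson = JSON.parse(sample);
const viaScript = parseAsScript(sample);
console.log(JSON.stringify(viaJson) === JSON.stringify(viaScript)); // true
```

Comparing through JSON.stringify keeps the check short; a structural deep-equality check would serve equally well.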

Conformance tests

Test262 tests are here: tc39/test262#1544

TC39 meeting notes

Implementations

Specification

The specification is available in ecmarkup or rendered HTML.