GitHub - tc39/proposal-regex-escaping: Proposal for investigating RegExp escaping for the ECMAScript standard

RegExp Escaping Proposal

This ECMAScript proposal seeks to investigate the problem area of escaping a string for use inside a Regular Expression.

Formal specification

Champions:

Status

This proposal is a stage 4 proposal.

Motivation

We often want to build a regular expression out of a string without treating special characters from the string as special regular expression tokens. For example, if we want to replace all occurrences of a string we got from the user, say let text = "Hello.", we might be tempted to do ourLongText.replace(new RegExp(text, "g")). However, this would match . against any character rather than matching it against a dot.
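The pitfall can be reproduced in a few lines. This is only an illustrative sketch: the contents of `ourLongText` and the replacement string `"Hi!"` are made-up values, not taken from the proposal.

```javascript
// Illustrative values; not part of the proposal text.
const text = "Hello.";
const ourLongText = "Hello. Hello! Hellos";

// Unescaped: "." acts as a wildcard, so "Hello!" and "Hellos" match as well.
const unescaped = new RegExp(text, "g");
const naiveResult = ourLongText.replace(unescaped, "Hi!");
console.log(naiveResult); // Hi! Hi! Hi!

// Escaping the dot by hand restricts the match to a literal "Hello.".
const manual = new RegExp("Hello\\.", "g");
const manualResult = ourLongText.replace(manual, "Hi!");
console.log(manualResult); // Hi! Hello! Hellos
```

Escaping by hand only works when the string is known in advance, which is not the case for user input.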

This is commonly-desired functionality, as can be seen from this years-old es-discuss thread. Standardizing it would be very useful to developers, and avoid subpar implementations they might create that could miss edge cases.
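As one example of such an edge case, the sketch below uses a hand-written escaper of the kind often seen in codebases. The name `naiveEscape` is hypothetical and the function is not part of this proposal; it backslash-escapes the regular expression syntax characters and nothing else, which is not enough once the result is placed inside a character class.

```javascript
// Hypothetical hand-written escaper: handles syntax characters only.
function naiveEscape(s) {
  return s.replace(/[\\^$.*+?()[\]{}|]/g, "\\$&");
}

// At the top level of a pattern the result behaves as expected.
const topLevel = new RegExp("^" + naiveEscape("a-z") + "$");
console.log(topLevel.test("a-z")); // true

// Inside a character class, the unescaped "-" forms the range a through z.
const inClass = new RegExp("^[" + naiveEscape("a-z") + "]$");
console.log(inClass.test("m")); // true, although "m" is not in the input
console.log(inClass.test("-")); // false, although "-" is in the input
```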

Chosen solutions:

RegExp.escape function

This would be a RegExp.escape static function, such that strings can be escaped in order to be used inside regular expressions:

    const str = prompt("Please enter a string");
    const escaped = RegExp.escape(str);
    const re = new RegExp(escaped, 'g'); // handles reg exp special tokens with the replacement.
    console.log(ourLongText.replace(re));

Note the double backslashes in the example string contents, which render as a single backslash.

    RegExp.escape("The Quick Brown Fox"); // "\\x54he\\x20Quick\\x20Brown\\x20Fox"
    RegExp.escape("Buy it. use it. break it. fix it.") // "\\x42uy\\x20it\\.\\x20use\\x20it\\.\\x20break\\x20it\\.\\x20fix\\x20it\\."
    RegExp.escape("(*.*)"); // "\\(\\*\\.\\*\\)"
    RegExp.escape("。^・ェ・^。") // "。\\^・ェ・\\^。"
    RegExp.escape("😊 *_* +_+ ... 👍"); // "😊\\x20\\*_\\*\\x20\\+_\\+\\x20\\.\\.\\.\\x20👍"
    RegExp.escape("\\d \\D (?:)"); // "\\\\d\\x20\\\\D\\x20\\(\\?\\x3a\\)"

Cross-cutting concerns

Per https://gist.github.com/bakkot/5a22c8c13ce269f6da46c7f7e56d3c3f, we now escape anything that could possibly cause a “context escape”.

This would be a commitment to only entering/exiting new contexts using whitespace or ASCII punctuators. That seems like it will not be a significant impediment to language evolution.
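To make the escaping rules more concrete, here is a simplified approximation written as a plain function. It is a sketch, not the normative algorithm: the name `escapeSketch` and its helpers are hypothetical, and the character sets are assumptions based on the description above (syntax characters, ASCII punctuators, whitespace, and a leading ASCII letter or digit).

```javascript
// Simplified approximation of the escaping rules; NOT the normative algorithm.
const SYNTAX_CHARACTERS = "^$\\.*+?()[]{}|/";
const OTHER_PUNCTUATORS = ",-=<>#&!%:;@~'`\"";
const CONTROL_ESCAPES = new Map([
  ["\t", "\\t"], ["\n", "\\n"], ["\v", "\\v"], ["\f", "\\f"], ["\r", "\\r"],
]);

// Emit \xHH for code points up to 0xFF, otherwise \uHHHH per UTF-16 code unit.
function hexEscape(ch) {
  const cp = ch.codePointAt(0);
  if (cp <= 0xff) return "\\x" + cp.toString(16).padStart(2, "0");
  let out = "";
  for (let i = 0; i < ch.length; i++) {
    out += "\\u" + ch.charCodeAt(i).toString(16).padStart(4, "0");
  }
  return out;
}

function escapeSketch(input) {
  if (typeof input !== "string") throw new TypeError("expected a string");
  let out = "";
  for (const ch of input) { // for...of iterates by code point
    if (out === "" && /^[0-9A-Za-z]$/.test(ch)) {
      out += hexEscape(ch); // a leading letter or digit could extend a preceding escape
    } else if (SYNTAX_CHARACTERS.includes(ch)) {
      out += "\\" + ch;
    } else if (CONTROL_ESCAPES.has(ch)) {
      out += CONTROL_ESCAPES.get(ch);
    } else if (OTHER_PUNCTUATORS.includes(ch) || /^\s$/.test(ch) || /^[\uD800-\uDFFF]$/.test(ch)) {
      out += hexEscape(ch); // punctuators, whitespace, lone surrogates
    } else {
      out += ch;
    }
  }
  return out;
}

console.log(escapeSketch("The Quick Brown Fox")); // \x54he\x20Quick\x20Brown\x20Fox
console.log(escapeSketch("(*.*)")); // \(\*\.\*\)
```

Because "-" is emitted as \x2d, the result can be placed inside a character class without forming a range, for example `new RegExp("^[" + escapeSketch("a-z") + "]$")` matches "-" but not "m".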

Other solutions considered:

Template tag function

This would be, for example, a template tag function RegExp.tag, used to produce a complete regular expression instead of potentially a piece of one:

    const str = prompt("Please enter a string");
    const re = RegExp.tag`/${str}/g`;
    console.log(ourLongText.replace(re));
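For illustration, a tag of this shape could be approximated in user code roughly as follows. This is a minimal sketch under stated assumptions: `regexTag` and `escapeForSketch` are hypothetical names, the escaper is a simplified stand-in that is not equivalent to the proposed RegExp.escape, and the template is assumed to have the form /pattern/flags.

```javascript
// Stand-in escaper for illustration only: covers syntax characters and "/".
function escapeForSketch(value) {
  return String(value).replace(/[\\^$.*+?()[\]{}|\/]/g, "\\$&");
}

// Hypothetical tag function: escapes interpolated values, keeps literal
// template text as written, then splits the result into pattern and flags.
function regexTag(strings, ...values) {
  // strings.raw preserves backslashes exactly as typed in the template.
  let source = strings.raw[0];
  values.forEach((value, i) => {
    source += escapeForSketch(value) + strings.raw[i + 1];
  });
  const match = /^\/(.*)\/([a-z]*)$/s.exec(source);
  if (match === null) {
    throw new SyntaxError("expected a template of the form /pattern/flags");
  }
  return new RegExp(match[1], match[2]);
}

const userInput = "Hello."; // stands in for the prompt() result
const tagged = regexTag`/${userInput}/g`;
console.log(tagged.source); // Hello\.
console.log("Hello. Hello!".replace(tagged, "Hi!")); // Hi! Hello!
```

Unlike an escape function, a tag like this always produces a complete regular expression, so the escaped value cannot be combined incorrectly with surrounding pattern text by the caller.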

In other languages

Note that the languages differ in what they do (e.g. Perl does something different from C#), but they all have the same goal.

We've had a meeting about this subject, whose notes include a more detailed writeup of what other languages do, and the pros and cons thereof.

FAQ

Polyfills