Address feedback from plenary by bakkot · Pull Request #77 · tc39/proposal-regex-escaping

Conversation

@bakkot

Commits should be reviewed individually. Summary:

@bakkot

ljharb

1. Let _otherPunctuators_ be the string-concatenation of *",-=<>#&!%:;@~'`"* and the code unit 0x0022 (QUOTATION MARK).
1. Let _toEscape_ be StringToCodePoints(_otherPunctuators_).
1. If _toEscape_ contains _c_ or _c_ is matched by |WhiteSpace or
1. If _toEscape_ contains _c_, _c_ is matched by |WhiteSpace or

          1. If _toEscape_ contains _c_, _c_ is matched by |WhiteSpace| or |LineTerminator|, or _c_ is a leading surrogate or a trailing surrogate, then

Strictly speaking, leading and trailing surrogates are defined to be code units, not code points. I figure it's close enough for the parenthetical, but not the actual algorithm step. I could change the parenthetical to be "c is the code point corresponding to a leading surrogate or trailing surrogate" but that's getting pretty wordy.

other than the oxford comma, this suggestion is a nit, so i'm ambivalent
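
For readers following the thread, the condition under discussion can be sketched in JavaScript roughly as below. This is only an illustration, not the proposal's text: `needsHexEscape` and `OTHER_PUNCTUATORS` are hypothetical names, and the check relies on the fact that `\s` in a JavaScript regular expression matches exactly the |WhiteSpace| and |LineTerminator| code points. The surrogate test is written against code point values, which sidesteps the code unit versus code point wording question raised above.

```javascript
// Hypothetical constant: the punctuators listed in the quoted step, plus QUOTATION MARK.
const OTHER_PUNCTUATORS = ",-=<>#&!%:;@~'`\"";

// Hypothetical helper: true when the code point falls under the quoted condition.
function needsHexEscape(codePoint) {
  const ch = String.fromCodePoint(codePoint);
  // "_toEscape_ contains _c_"
  if (OTHER_PUNCTUATORS.includes(ch)) return true;
  // "_c_ is matched by |WhiteSpace| or |LineTerminator|" (this is what \s matches)
  if (/^\s$/u.test(ch)) return true;
  // "_c_ is a leading surrogate or a trailing surrogate", compared by numeric value
  return codePoint >= 0xd800 && codePoint <= 0xdfff;
}
```

Under these assumptions, `needsHexEscape(0x2c)` (a comma) and `needsHexEscape(0x2028)` (LINE SEPARATOR) both report that an escape is needed, while `needsHexEscape(0x61)` (the letter `a`) does not.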

@erights

How do I see a rendered form of this?

@bakkot

I've put up a rendering here.

@erights

Thanks. FWIW LGTM, but I delegate an approval decision to @gibson042 who understands this better than I do.

@erights

Thanks for the rendering. Cut down the review effort for me by a huge factor.

gibson042

Nice catch on LineTerminator! I still prefer the universally applicable \x…/\u… approach, but this does work (even if it could get stale).

Comment on lines +61 to +62

1. If _c_ is matched by |SyntaxCharacter| or _c_ is U+002F (SOLIDUS), then
  1. Return the string-concatenation of 0x005C (REVERSE SOLIDUS) and UTF16EncodeCodePoint(_c_).

This strikes me as a point-in-time snapshot, and I'm not a fan because it will be weird if e.g. \@ becomes a valid escape in the future. My preference remains the universally applicable \x…/\u… approach. But that said, there is no technical issue here; merely an aesthetic one.
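
To make the two options concrete, here is a minimal sketch for a single character. Both helper names are hypothetical, and the exact shape of the `\x…`/`\u…` output (two hex digits up to 0xFF, otherwise four) is an assumption about what the alternative would look like rather than something stated in this thread.

```javascript
// SyntaxCharacter in the ECMAScript pattern grammar: ^ $ \ . * + ? ( ) [ ] { } |
const SYNTAX_CHARACTERS = "^$\\.*+?()[]{}|";

// Hypothetical helper mirroring the quoted step: backslash-escape
// SyntaxCharacter and SOLIDUS, leave everything else untouched.
function backslashEscape(ch) {
  return SYNTAX_CHARACTERS.includes(ch) || ch === "/" ? "\\" + ch : ch;
}

// Hypothetical helper for the alternative: always emit a hex escape.
// Assumes ch is a single UTF-16 code unit.
function hexEscape(ch) {
  const unit = ch.charCodeAt(0);
  return unit <= 0xff
    ? "\\x" + unit.toString(16).padStart(2, "0")
    : "\\u" + unit.toString(16).padStart(4, "0");
}
```

Both forms produce pattern text that matches the original character; the difference is only in how the output reads and in how it ages if the set of valid identity escapes changes.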

jridgewell

1. NOTE: Escaping a leading digit ensures that output corresponds with pattern text which may be used after a `\0` character escape or a |DecimalEscape| such as `\1` and still match _S_ rather than be interpreted as an extension of the preceding escape sequence.
1. Set _escaped_ to the string-concatenation of _escaped_, the code unit 0x005C (REVERSE SOLIDUS), *"x3"*, and the code unit whose numeric value is the numeric value of _c_.
1. If _escaped_ is the empty String, and _c_ is matched by |DecimalDigit| or |AsciiLetter|, then
1. NOTE: Escaping a leading digit ensures that output corresponds with pattern text which may be used after a `\0` character escape or a |DecimalEscape| such as `\1` and still match _S_ rather than be interpreted as an extension of the preceding escape sequence. Escaping a leading ASCII letter does the same for the context after `\c`.

Nit: this sentence is too long to not use a parenthetical.

1. NOTE: Escaping a leading digit ensures that output corresponds with pattern text which may be used after a `\0` character escape or a |DecimalEscape| such as `\1` and still match _S_ rather than be interpreted as an extension of the preceding escape sequence. Escaping a leading ASCII letter does the same for the context after `\c`.

1. NOTE: Escaping a leading digit ensures that output corresponds with pattern text, which may be used after a `\0` character escape or a |DecimalEscape| such as `\1`, and still match _S_ rather than be interpreted as an extension of the preceding escape sequence. Escaping a leading ASCII letter does the same for the context after `\c`.

Those commas read as ungrammatical to me. In particular the second comma cuts a clause in the middle - the pattern text "may be used after \0 [...] and still match S". Open to other rephrasing here but I don't like this particular suggestion.

Also, TIL \c0 is a valid escape. This also prevents new RegExp("\\" + escape('n')) from combining.
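
A small sketch of why the leading character matters, under stated assumptions: `escapeLeading` is a hypothetical stand-in that only performs the leading digit or ASCII letter step from the NOTE, and the `\xHH` output format is assumed from the older *"x3"* step quoted above. It is not the proposal's full algorithm.

```javascript
// Hypothetical helper: hex-escape only a leading ASCII digit or letter.
function escapeLeading(s) {
  if (s.length === 0) return s;
  const first = s[0];
  if (/^[0-9A-Za-z]$/.test(first)) {
    const hex = first.charCodeAt(0).toString(16).padStart(2, "0");
    return "\\x" + hex + s.slice(1);
  }
  return s;
}

// Unescaped, a leading "1" extends the preceding \0 into the legacy octal escape \01.
const naiveDigit = new RegExp("^\\0" + "1" + "$");
// Escaped, the output stays separate: \0 followed by \x31.
const safeDigit = new RegExp("^\\0" + escapeLeading("1") + "$");

// Unescaped, a trailing backslash plus "n" combines into \n (line feed).
const naiveLetter = new RegExp("^\\" + "n" + "$");
// Escaped, the pattern is an escaped backslash followed by the literal text "x6e".
const safeLetter = new RegExp("^\\" + escapeLeading("n") + "$");
```

In this sketch `safeDigit` matches U+0000 followed by `1`, whereas `naiveDigit` matches U+0001 instead. Likewise `naiveLetter` matches a line feed, while `safeLetter` no longer does.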

ljharb

@bakkot bakkot deleted the address-feedback branch

May 13, 2024 05:35

@cb-sl cb-sl mentioned this pull request

May 20, 2024

This was referenced

Oct 6, 2024

This was referenced

Nov 5, 2024

This was referenced

Nov 26, 2024
