PROPOSAL: Multiline strings

Reinier Zwitserloot reinier at zwitserloot.com
Sun Mar 1 10:41:54 PST 2009


Embedding regexps has two significant advantages:

  1. compile-time checking of your regexps. Sure, most random
    gibberish just so happens to be a valid regexp, but there are rules -
    you can have mismatched parentheses, for example.

  2. compile-time compilation of regexps. If compiling the regexp is
    allowed to take rather long, then you can effectively create O(n)
    matching algorithms, where n is the size of the input string. What
    better time is there to do the compilation of the regexp than when
    you're compiling the code? Javac would essentially include the
    serialized form of a compiled regexp into the class file, instead of
    the string.
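To illustrate the first point: with today's java.util.regex API, a malformed pattern is only rejected when Pattern.compile runs, which is the check a regexp literal could move to compile time. The following is a minimal sketch of that runtime behaviour; the class and method names (RegexCheckDemo, describeError) are made up for illustration and are not part of any proposal.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical helper: shows that regexp syntax errors surface at run time today.
final class RegexCheckDemo {

    // Returns null when the regexp compiles, otherwise a description of the error.
    static String describeError(String regex) {
        try {
            Pattern.compile(regex);
            return null;
        } catch (PatternSyntaxException e) {
            return e.getDescription();
        }
    }

    public static void main(String[] args) {
        // Balanced parentheses: compiles fine, so this prints null.
        System.out.println(describeError("(a|b)*c"));
        // Mismatched parenthesis: javac accepts the string, the failure shows up only here.
        System.out.println(describeError("(a|b*c"));
    }
}
```

With a regexp literal, the second pattern could be rejected by javac instead of by a PatternSyntaxException at run time.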

As for the multi-line string proposal: it is very incomplete. I
suggest resubmitting it with documentation on handling raw strings
(let's leave regexp literals for another proposal; as has been said,
even if the language has regexp literals, raw strings are still a
useful construct), and on handling white space. It should also cover
handling of newlines: if the file contains \r\n because it was written
on Windows, should those be kept as is, or should they be replaced with
\n line endings? The latter seems like the right answer to me.
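The newline option preferred above could be sketched as a single normalisation step applied to the body of the literal. This is only an illustration under the assumption that the compiler would rewrite \r\n pairs and nothing else; NewlineNormalizer and normalize are hypothetical names.

```java
// Hypothetical sketch: normalise Windows line endings inside a multi-line literal.
final class NewlineNormalizer {

    // Replaces every \r\n pair with a single \n; other characters are left untouched.
    static String normalize(String literalBody) {
        return literalBody.replace("\r\n", "\n");
    }

    public static void main(String[] args) {
        String fromWindowsFile = "first line\r\nsecond line\r\n";
        // After normalisation the value no longer depends on where the file was saved.
        System.out.println(normalize(fromWindowsFile).equals("first line\nsecond line\n"));
    }
}
```

The point of normalising is that the same source text yields the same string value regardless of the platform the file was written on.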

My personal favourite way to do whitespace:

After the first newline, eliminate all leading whitespace. Then
consider that amount of whitespace (no translating of tabs to spaces)
to be the indent. Thus, the following:

String foo = """
    bar
     baz
     bla
    qux""";

is equal to: String foo = "bar\n baz\n bla\nqux";

and the following:

String foo = """
    foo
  bar""";

is a compile-time error.
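The whitespace rule described above can be sketched roughly as follows. This is one reading of the rule, not a specification: it assumes the method receives the raw text between the opening and closing delimiters, that text on the opening line is kept as is (and dropped together with its newline when empty), and that every later line must start with exactly the same whitespace characters as the first line after the newline. IndentStripper and strip are hypothetical names, and an exception stands in for the compile-time error.

```java
// Hypothetical sketch of the proposed indent rule for multi-line string literals.
final class IndentStripper {

    static String strip(String rawBody) {
        int firstNewline = rawBody.indexOf('\n');
        if (firstNewline < 0) {
            return rawBody; // single-line body: nothing to strip
        }
        // Text on the same line as the opening delimiter (assumed to be kept verbatim).
        String head = rawBody.substring(0, firstNewline);
        String[] lines = rawBody.substring(firstNewline + 1).split("\n", -1);

        // The leading whitespace of the first line after the newline is the indent.
        // Tabs and spaces are compared literally; no tab-to-space translation.
        String first = lines[0];
        int end = 0;
        while (end < first.length()
                && (first.charAt(end) == ' ' || first.charAt(end) == '\t')) {
            end++;
        }
        String indent = first.substring(0, end);

        StringBuilder out = new StringBuilder();
        if (!head.isEmpty()) {
            out.append(head).append('\n');
        }
        for (int i = 0; i < lines.length; i++) {
            if (!lines[i].startsWith(indent)) {
                // Stands in for the compile-time error in the proposal.
                throw new IllegalArgumentException(
                        "line " + (i + 2) + " does not start with the indent of the first line");
            }
            if (i > 0) {
                out.append('\n');
            }
            out.append(lines[i].substring(indent.length()));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Four-space indent, two lines indented one extra space.
        String body = "\n    bar\n     baz\n     bla\n    qux";
        System.out.println(strip(body).equals("bar\n baz\n bla\nqux"));
    }
}
```

Under these assumptions a body whose second line is indented less than its first (the error case above) makes strip throw, and extra indentation beyond the common indent is preserved.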

If you need leading whitespace, you'll need to prefix this in a
separate string and concatenate them, or add them on the same line.
So, if you need "\t\nfoo\n", and you don't want to use \t, you could
write it as:

String foo = """	<- You don't see it, but there's a tab here.
    foo
    """;

or as:

String foo = "\t" + """
    foo
    """;

--Reinier Zwitserloot
Like it? Tip it!
http://tipit.to

On Mar 1, 2009, at 18:53, Jeremy Manson wrote:

The plus side of the escaped String approach is that you can then use any language, not just regexps. Also, escaped Strings might be a plus for security purposes.

Also, I'm not a big fan of the idea of embedding the domain-specific-language-du-jour into my programming language. I see it as a slippery slope. It's regexps today, but it's XML tomorrow (I'm looking at you, Scala).

Jeremy

On Sun, Mar 1, 2009 at 1:16 AM, Adrian Kuhn <akuhn at gmx.ch> wrote:

On 01.03.2009, at 09:58, Jeremy Manson wrote:

Frankly, to me, the big win would actually not be multiline literals, but would be escaped String literals. I'm sick of writing all of my regexps with twice as many \ characters as they need.

In this case, why not allow regexps to be written literally in source code, as is done in many other languages? This change would couple the regexp API with the language, but here the benefit might be worth the cost.

--AA


