On 28 October 2017 at 16:48, Steven D'Aprano <steve@pearwood.info> wrote:
On Sun, Oct 29, 2017 at 12:31:01AM +0100, MRAB wrote:
> Not that I'm planning on making any further additions, just bug fixes
> and updates to follow the Unicode updates. I think I've crammed enough
> into it already. There's only so much you can do with the regex syntax
> with its handful of metacharacters and possible escape sequences...
What do you think of the Perl 6 regex syntax?
https://en.wikipedia.org/wiki/Perl_6_rules#Changes_from_Perl_5
If you're going to change the notation, why not use a notation similar to what linguists use for finite-state transducers (FSTs)? These notations allow building FSTs with millions of states, using operations such as adding, subtracting, composing, and projecting FSTs. There are also some impressive optimisers for them, so that a dictionary encoded with all its inflections is both more compact and faster than a hash of just the words without inflections. Some of this work is open source, but I haven't kept up with it.
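To make the compactness claim concrete, here is a minimal sketch, assuming Python and a deliberately simplified setting: it builds a minimal acyclic finite-state *acceptor* (no output labels, so not a full transducer, and none of the FST toolkit operations or optimisers mentioned above) from a sorted word list, in the style of the incremental algorithm of Daciuk et al. Shared inflectional endings such as "", "s", "ed", "ing" collapse into shared states. The names `State`, `build_minimal_dfa`, `accepts`, and `count_states` are hypothetical, chosen for illustration only.

```python
from itertools import count


class State:
    """A state in an acyclic deterministic finite automaton (DFA)."""

    _ids = count()

    def __init__(self) -> None:
        self.id = next(State._ids)
        self.final = False
        self.edges: dict[str, "State"] = {}

    def signature(self) -> tuple:
        # Equivalent states agree on finality and on every (label, target) edge.
        # Targets are compared by id because they are already minimized.
        return (self.final, tuple(sorted((c, s.id) for c, s in self.edges.items())))


def build_minimal_dfa(words: list[str]) -> State:
    """Build a minimal acyclic DFA from unique, sorted, non-empty words."""
    register: dict[tuple, State] = {}  # signature -> canonical state
    root = State()
    unchecked: list[tuple[State, str, State]] = []  # (parent, label, child) on newest path
    previous = ""

    def minimize(keep: int) -> None:
        # Merge path states deeper than `keep` with registered equivalents, deepest first.
        while len(unchecked) > keep:
            parent, label, child = unchecked.pop()
            sig = child.signature()
            if sig in register:
                parent.edges[label] = register[sig]
            else:
                register[sig] = child

    for word in words:
        if not word or word <= previous:
            raise ValueError("words must be non-empty, unique and sorted")
        prefix = 0  # length of the prefix shared with the previous word
        while prefix < min(len(word), len(previous)) and word[prefix] == previous[prefix]:
            prefix += 1
        minimize(prefix)
        node = unchecked[-1][2] if unchecked else root
        for label in word[prefix:]:  # add the new suffix as a fresh path
            child = State()
            node.edges[label] = child
            unchecked.append((node, label, child))
            node = child
        node.final = True
        previous = word
    minimize(0)
    return root


def accepts(root: State, word: str) -> bool:
    """Return True if the automaton accepts `word`."""
    node = root
    for ch in word:
        nxt = node.edges.get(ch)
        if nxt is None:
            return False
        node = nxt
    return node.final


def count_states(root: State) -> int:
    """Count the distinct states reachable from `root`."""
    seen: set[int] = set()
    stack = [root]
    while stack:
        state = stack.pop()
        if state.id not in seen:
            seen.add(state.id)
            stack.extend(state.edges.values())
    return len(seen)


# Toy "dictionary with inflections": three stems times four endings.
stems = ("jump", "talk", "walk")
words = sorted(stem + suffix for stem in stems for suffix in ("", "s", "ed", "ing"))
root = build_minimal_dfa(words)
# A plain trie needs one state per distinct prefix (including the empty one).
trie_states = len({w[:i] for w in words for i in range(len(w) + 1)})
print(count_states(root), trie_states)  # 12 31
```

On this toy list the minimal automaton needs 12 states against 31 for a trie, because "talk" and "walk" share their whole tails and every stem shares the ending states. Real morphological FST tools go much further (output labels, composition, weighted optimisation), which this sketch does not attempt.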
If you're interested, you can start here:
etc.
;)
--
Steve
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: https://mail.python.org/mailman/options/python-dev/pludemann%40google.com