[Python-Dev] New string method - splitquoted

Giovanni Bajo rasky at develer.com
Thu May 18 12:26:05 CEST 2006


Heiko Wundram <me+python-dev at modelnine.org> wrote:

Don't get me wrong, I personally find this functionality very, very interesting (I'm +0.5 on adding it in some way or another), especially as a part of the standard library (not necessarily as an extension to .split()).

It's already there. It's called shlex.split(), and follows the semantics of a standard UNIX shell, including escaping and other things.

I knew about *nix shell escaping, but that isn't necessarily what I find in input I have to process (although generally it's what you see, yeah). That's why I said that it would be interesting to have a generalized method, sort of like the csv module but only for string "interpretation", which takes a dialect and parses a string according to that dialect. Remember, there is also escaping by doubling the end-of-string marker (for example, '""this is not a single argument""'.split() should be parsed as ['"this','is','not','a',....]), and I know programs that use exactly this format for file storage.

I never met this one. Anyway, I don't think it's harder than:

>>> import shlex
>>> def mysplit(s):
...     """Allow doubled quotes to escape a quote"""
...     return shlex.split(s.replace(r'""', r'\"'))
...
>>> mysplit('""This is not a single argument""')
['"This', 'is', 'not', 'a', 'single', 'argument"']
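For comparison, here is a small sketch of what plain shlex.split() does on its own, with no substitution step. The sample strings are made up for illustration. It shows the shell-style quoting and backslash escaping mentioned earlier, and why a doubled quote needs some preprocessing if it is meant to survive as a literal quote.

```python
import shlex

# Quoted runs stay together and a backslash escapes the next character.
plain = shlex.split('copy "my file.txt" backup\\ dir')

# Without any substitution, a doubled quote is read as an empty quoted
# string glued to the neighbouring word, so the quote characters vanish.
doubled = shlex.split('""This is not a single argument""')

print(plain)
print(doubled)
```

Under shell semantics the first call should give ['copy', 'my file.txt', 'backup dir'], and the second should give the six bare words with no quote characters left.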

Maybe, one could simply export the function the csv module uses to parse the actual data fields as a more prominent method, which accepts keyword arguments, instead of a Dialect-derived class.
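As a side note on the suggestion above, csv.reader() already accepts the dialect's formatting parameters as keyword arguments, so no Dialect subclass is needed. The following is only a sketch: split_doubled is a hypothetical helper name and the sample line is invented. It parses a single space-separated line where a doubled quote inside a quoted field means one literal quote.

```python
import csv

def split_doubled(s):
    """Split one line on spaces, csv-style (hypothetical helper)."""
    # csv.reader takes any iterable of lines, so a one-element list works.
    # doublequote=True makes "" inside a quoted field mean a literal ".
    reader = csv.reader([s], delimiter=' ', quotechar='"', doublequote=True)
    return next(reader)

print(split_doubled('say "he said ""hi"" to me" twice'))
```

Note that csv only treats the doubling as an escape inside a field that is itself quoted, so this is not the same dialect as the '""this is not a single argument""' example above.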

I think you're over-generalizing a very simple problem. I believe that str.split, shlex.split, and some simple variation like the one above (maybe using regular expressions to do the substitution if you have slightly more complex cases) can handle 99.99% of the splitting cases. They surely handle 100% of those I myself had to parse.
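A rough sketch of the regular-expression variation mentioned above, assuming the "slightly more complex case" is that both kinds of quote can be doubled. The name mysplit_re and the sample input are hypothetical. The pattern rewrites a doubled quote as a backslash-escaped one and then hands the result to shlex.split().

```python
import re
import shlex

# Matches a doubled quote character, either "" or ''.
_DOUBLED = re.compile(r'''(["'])\1''')

def mysplit_re(s):
    """Treat a doubled quote of either kind as one literal quote."""
    # Rewrite "" as \" and '' as \' so that shlex sees an escaped quote.
    return shlex.split(_DOUBLED.sub(r'\\\1', s))

sample = '""hi"" and ' + "''bye''"
print(mysplit_re(sample))
```

The obvious limitation is that an intentionally empty argument written as "" would also be rewritten, so this only suits input where empty quoted strings do not occur.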

I believe the standard library already covers common usage. There will surely be cases where a custom lexer/splitter will have to be written, but that's life :)

Giovanni Bajo


