[Python-Dev] New string method - splitquoted

Giovanni Bajo rasky at develer.com
Thu May 18 12:26:05 CEST 2006


Heiko Wundram <me+python-dev at modelnine.org> wrote:

>>> Don't get me wrong, I personally find this functionality very, very
>>> interesting (I'm +0.5 on adding it in some way or another),
>>> especially as a
>>> part of the standard library (not necessarily as an extension to
>>> .split()).
>>
>> It's already there. It's called shlex.split(), and follows the
>> semantics of a standard UNIX shell, including escaping and other
>> things.
>
> I knew about *nix shell escaping, but that isn't necessarily what I
> find in input I have to process (although generally it's what you
> see, yeah). That's why I said that it would be interesting to have a
> generalized method, sort of like the csv module but only for string
> "interpretation", which takes a dialect, and parses a string for the
> specified dialect.
>
> Remember, there's also escaping by doubling the end of string marker
> (for example, '""this is not a single argument""'.split() should be
> parsed as ['"this','is','not','a',....]), and I know programs that
> use exactly this format for file storage.

I've never come across this one. Anyway, I don't think it's any harder than:

>>> import shlex
>>> def mysplit(s):
...     """Allow a doubled double quote to escape a quote."""
...     return shlex.split(s.replace(r'""', r'\"'))
...
>>> mysplit('""This is not a single argument""')
['"This', 'is', 'not', 'a', 'single', 'argument"']


> Maybe, one could simply export the function the csv module uses to
> parse the actual data fields as a more prominent method, which
> accepts keyword arguments, instead of a Dialect-derived class.
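
For what it's worth, csv.reader already accepts the format parameters as plain keyword arguments, so no Dialect subclass is needed for this. A minimal sketch, assuming space-separated fields and a doubled quote as the escape inside a quoted field (the name csv_split is made up here for illustration):

```python
import csv

def csv_split(s):
    """Split one line on single spaces, csv-style.

    Inside a quoted field, a doubled quote stands for a literal quote.
    """
    # csv.reader accepts any iterable of lines; format parameters are keywords.
    reader = csv.reader([s], delimiter=' ', quotechar='"', doublequote=True)
    return next(reader)

print(csv_split('one "two ""quoted"" words" three'))
# ['one', 'two "quoted" words', 'three']
```

Note that this is csv behaviour, not shell behaviour: every single space is a delimiter, so a run of spaces yields empty fields, unlike str.split().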


I think you're over-generalizing a very simple problem. I believe that
str.split, shlex.split, and some simple variation like the one above (maybe
using regular expressions to do the substitution if you have slightly more
complex cases) can handle 99.99% of the splitting cases. They surely handle
100% of those I myself had to parse.
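
As an illustration of the regular-expression variation: a plain replace() also rewrites a free-standing "" (an empty argument) into a literal quote. The sketch below assumes that a doubled quote only counts as an escape when it touches a non-space character on at least one side; that rule, and the name mysplit_re, are assumptions made for this example rather than a description of any real file format.

```python
import re
import shlex

# A doubled quote glued to a non-space character is treated as an escaped
# quote; a free-standing "" is left alone so it still means an empty argument.
_DOUBLED = re.compile(r'""(?=\S)|(?<=\S)""')

def mysplit_re(s):
    """Like shlex.split(), but a doubled quote escapes a quote."""
    # A function is used as the replacement to avoid template escaping rules.
    escaped = _DOUBLED.sub(lambda m: '\\"', s)
    return shlex.split(escaped)

print(mysplit_re('""This is not a single argument""'))
# ['"This', 'is', 'not', 'a', 'single', 'argument"']
print(mysplit_re('a "" b'))
# ['a', '', 'b']
```

Inputs with three or more consecutive quotes are ambiguous under this rule and would need a proper specification of the format.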

I believe the standard library already covers the common cases. There will surely
be cases where a custom lexer/splitter has to be written, but that's life
:)

Giovanni Bajo


