Parsing parameters with quotes

Bengt Richter bokr at oz.net
Sat Mar 15 17:44:37 EST 2003


On 14 Mar 2003 15:02:32 -0800, sjmachin at lexicon.net (John Machin) wrote:
>"Giovanni Bajo" <noway at sorry.com> wrote in message news:<Ywaca.11032$Lr4.323544 at twister2.libero.it>...
>> My input is:
>> 'foo "this is one" and this not':
>> and I want to output:
>> ["foo", "this is one", "and", "this", "not"]
>> 
>> Basically, a string.split() but must take into account quotes used to group
>> as a single word (no escaping is supported within quotes). Now, is there
>> already something in the python library to do this? My code is a bit longer
>> than I would have expected:
>> Is there any faster way? getopt() does not seem to do this (it's done
>> beforehand by whoever fills sys.argv[])
>
>See the following code. Version 2 reproduces the results of (your)
>Version 1. Should be faster, but I haven't tested this as I was more
>concerned with correctness. I would expect that any argument
>surrounded by "" should survive with its contents unmangled. In the
>worst case an empty arg "" doesn't even survive. Version 3 below gives
>the serendipitous outcome of simpler code plus less disturbing (to me
>anyway) results.
>
>Hope this helps,
>John
>
I rearranged the test loops so that the loop over functions is outermost,
reformatted the output a little, and added SplitParms4. The result follows,
replacing what was between your snip lines:

>8<---
def SplitParms1(s):
    s = s.split('"')
    L = []
    for i,t in zip(range(0,len(s)), s):
        if t:
            if i%2 == 1:
                L.append(t.strip())
            else:
                L.extend(t.split())
    return L

def SplitParms2(s):
    L = []
    Lappend = L.append
    Lextend = L.extend
    odd = 0
    for t in s.split('"'):
        if t:
            if odd:
                Lappend(t.strip())
            else:
                Lextend(t.split())
        odd = not odd
    return L

def SplitParms3(s):
    L = []
    Lappend = L.append
    Lextend = L.extend
    odd = 0
    for t in s.split('"'):
        if odd:
            Lappend(t)
        else:
            Lextend(t.split())
        odd = not odd
    return L

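import re
# Group 1 captures the text between a pair of double quotes (possibly empty);
# group 2 captures a run of non-whitespace. At most one group is non-empty
# per match, so "g2 or g1" picks whichever matched ('' for an empty "").
def SplitParms4(s, p=re.compile(r'"(.*?)"|(\S+)')):
    return [g2 or g1 for g1,g2 in p.findall(s)]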

tests = [
   ['foo "bar zot" ugh', ['foo', 'bar zot', 'ugh']],
   ['',                  []                       ],
   ['one',               ['one']                  ],
   ['"qone"',            ['qone']                 ],
   ['""',                ['']                     ],
   ['arg1 "" "3rd arg"', ['arg1', '', '3rd arg']  ],
   ['"   "',             ['   ']                  ],
   [' x ',               ['x']                    ],
   ['  x  y  ',          ['x', 'y']               ],
   ['" x "',             [' x ']                  ],
   ['x y z "bar zot""   " a', ['x', 'y', 'z', 'bar zot', '   ', 'a']],
]

funcs = [
   SplitParms1,
   SplitParms2,
   SplitParms3,
   SplitParms4,
]
   
if __name__ == "__main__":
   for func in funcs:
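      print("")
      for test_arg, expected in tests:
         actual = func(test_arg)
         # "***" flags a result that differs from the expected list.
         if actual == expected:
            flag = "   "
         else:
            flag = "***"
         # Single-argument print(...) runs unchanged on Python 2 and 3.
         print("%s %s(%s) ->\n       %s (actual)\n       %s (expected)" %
            (flag, func.__name__, repr(test_arg), repr(actual), repr(expected)))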
>8<---
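
On the original question of whether the library already does this: a minimal
sketch using shlex.split, which was added to the standard library in Python
2.3. It is an alternative to the functions above, not a drop-in replacement.
It follows POSIX shell rules, so unlike the stated requirement it honours
backslash escapes inside double quotes, and a quoted run that touches other
text is joined to it as one token instead of being split off. The helper name
split_parms_shlex is made up for this illustration.

```python
import shlex

def split_parms_shlex(s):
    # Hypothetical helper, not part of the code posted in this thread.
    # shlex.split defaults to POSIX mode: double quotes group words into
    # a single token and the quote characters themselves are dropped.
    return shlex.split(s)

if __name__ == "__main__":
    print(split_parms_shlex('foo "this is one" and this not'))
    print(split_parms_shlex('arg1 "" "3rd arg"'))
```

An unbalanced quote makes shlex.split raise ValueError, whereas the
SplitParms functions above accept such input silently, so which behaviour is
preferable depends on how strict the caller wants to be.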

Regards,
Bengt Richter



