Splitting a quoted string.
Paul Melis
paul at science.uva.nl
Wed May 16 07:32:14 EDT 2007
Paul Melis wrote:
> Hi,
>
> mosscliffe wrote:
>
>> I am looking for a simple split function to create a list of entries
>> from a string which contains quoted elements. Like in 'google'
>> search.
>>
>> eg string = 'bob john "johnny cash" 234 june'
>>
>> and I want to have a list of ['bob', 'john', 'johnny cash', '234',
>> 'june']
>>
>> I wondered about using the csv routines, but I thought I would ask the
>> experts first.
>>
>> There may be a simple function, but as yet I have not found it.
>
>
> Here is a not-so-simple function using regular expressions. It repeatedly
> matches two regexps, one that matches any sequence of characters except
> a space and one that matches a double-quoted string. If both regexps
> match, the match occurring first in the string is taken and the matching
> part of the string is cut off. This is repeated until the whole string is
> matched. If both match at the same point in the string, the
> longer of the two matches is taken. (This can't be done with a single
> regexp using the A|B operator, as it tries the alternatives from left to
> right: if A matches then it is returned even if B would match a longer
> string.)
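
To illustrate the A|B point above, here is a minimal sketch. The patterns
and the input string are made up purely for illustration and are not the
ones used in the function:

```python
import re

# Alternation tries the left branch first and stops at the first success,
# so the shorter branch 'a' wins even though 'ab' would match more.
first = re.match(r'a|ab', 'ab').group(0)

# Swapping the branches changes the result.
second = re.match(r'ab|a', 'ab').group(0)

print(first)   # a
print(second)  # ab
```

So the order of the alternatives, not the length of the match, decides
which branch is returned.
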
Here is a slightly improved version, which is a bit more compact and which
removes the quotes from the matched quoted strings in the output.

import re

def split_string(s):
    pat1 = re.compile(r'[^" ]+')      # a run of characters other than space or quote
    pat2 = re.compile(r'"([^"]*)"')   # a double-quoted string; group 1 is the content

    parts = []

    m1 = pat1.search(s)
    m2 = pat2.search(s)
    while m1 or m2:
        # Decide which pattern supplies the next part
        if m1 and m2:
            if m1.start(0) < m2.start(0):
                match = 1
            elif m2.start(0) < m1.start(0):
                match = 2
            else:
                # Same starting point: prefer the longer match
                if len(m1.group(0)) > len(m2.group(0)):
                    match = 1
                else:
                    match = 2
        elif m1:
            match = 1
        else:
            match = 2

        if match == 1:
            part = m1.group(0)
            s = s[m1.end(0):]
        else:
            part = m2.group(1)   # without the surrounding quotes
            s = s[m2.end(0):]

        parts.append(part)

        # Search again in the remainder of the string
        m1 = pat1.search(s)
        m2 = pat2.search(s)

    return parts

print(split_string('bob john "johnny cash" 234 june'))
print(split_string('"abc""abc"'))
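
For comparison, the standard library's shlex module also splits on
whitespace while keeping quoted substrings together. This is a different
technique from the regexp loop above, shown only as a sketch. Its
shell-style rules (escapes, single quotes, adjacent quoted strings) are not
the same as those of split_string, so the two can disagree on other inputs:

```python
import shlex

# shlex.split applies shell-like quoting rules to the string
words = shlex.split('bob john "johnny cash" 234 june')
print(words)
```
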