regular expression for space separated quoted string

Padraig Brady padraig at linux.ie
Sun Sep 15 15:50:04 EDT 2002


Padraig Brady wrote:
> Padraig Brady wrote:
> 
>> Eric Brunel wrote:
>>
>>> Padraig Brady wrote:
>>>
>>>
>>>> Hi, I'm trying to split a string that is separated
>>>> by spaces and also contains double quoted words which
>>>> can contain spaces:
>>>
>>>
>>
>> [snip]
>>
>>> What about:
>>>
>>>>>> p = r'[^ \t\n\v\f"]+|"[^"]*"'
>>>>>> re.findall(p, '1 2 3')
>>>>>
>>>>>
>>
>> FYI, I'm using the following:
>>
>> import fileinput, re
>> for line in fileinput.input():
>>     #split fields
>>     fields = re.findall('[^ "]+|"[^"]+"', line[:-1])
>>     #remove quotes
>>     fields = map(lambda field: field.replace('"', ''), listLine)
>>
>> thanks again,
>> Pádraig.
>>
> 
> Just in case it's useful I might as well make it correct and more 
> efficient:
> 
> import fileinput, re
> reFieldSplitter = re.compile('[^ "]+|"[^"]+"')
> for line in fileinput.input():
>     #split fields
>     fields = reFieldSplitter.findall(line[:-1])
>     #remove quotes
>     fields = map(lambda field: field.replace('"', ''), fields)

Just to be more complete: compiling the regular expression as
above gave a 1.3% speedup. However, changing the map(lambda...) to
a list comprehension gives another 4% speedup.

import fileinput, re
reFieldSplitter = re.compile('[^ "]+|"[^"]+"')
for line in fileinput.input():
     #split fields
     fields = reFieldSplitter.findall(line[:-1])
     #remove quotes
     fields = [field.replace('"', '') for field in fields]
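
For anyone who wants to try the pattern without feeding it a file, here is a minimal sketch that wraps the same regular expression in a function. The name split_fields and the sample string are only illustrative, and it uses rstrip('\n') instead of line[:-1] so that a final line without a trailing newline does not lose its last character.

```python
import re

# Same pattern as above: a run of characters that are neither a space
# nor a double quote, or a double-quoted string with at least one
# character between the quotes.
reFieldSplitter = re.compile('[^ "]+|"[^"]+"')

def split_fields(line):
    """Split one line into fields and drop the double quotes.

    split_fields is a hypothetical helper name, not part of any library.
    """
    # Strip only the trailing newline (an assumption; the loop above
    # uses line[:-1], which always removes the last character).
    fields = reFieldSplitter.findall(line.rstrip('\n'))
    return [field.replace('"', '') for field in fields]

sample = 'one "two three" four\n'
print(split_fields(sample))
```

Note that only the space character acts as a separator here, and an empty quoted field ("") is skipped because the pattern requires at least one character between the quotes.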

Pádraig.



