Splitting a quoted string.

Paul Melis paul at science.uva.nl
Wed May 16 07:32:14 EDT 2007


Paul Melis wrote:
> Hi,
> 
> mosscliffe wrote:
> 
>> I am looking for a simple split function to create a list of entries
>> from a string which contains quoted elements.  Like in 'google'
>> search.
>>
>> eg  string = 'bob john "johnny cash" 234 june'
>>
>> and I want to have a list of ['bob', 'john', 'johnny cash', '234',
>> 'june']
>>
>> I wondered about using the csv routines, but I thought I would ask the
>> experts first.
>>
>> There may be a simple function, but as yet I have not found it.
> 
> 
> Here is a not-so-simple function using regular expressions. It repeatedly
> matches two regexps, one that matches any sequence of characters except
> a space and one that matches a double-quoted string. If both match, the
> one occurring first in the string is taken and the matching part of the
> string is cut off. This is repeated until the whole string is
> matched. If both match at the same point in the string, the
> longer of the two matches is taken. (This can't be done with a single
> regexp using the A|B operator, as alternatives are tried left to right:
> if A matches, it is returned even if B would match a longer string.)
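
A quick way to see the A|B behaviour described above. This is only a minimal sketch; the patterns 'a|ab' and 'ab|a' are made up for illustration and are not part of the function in this thread.

```python
import re

# The re module tries alternatives left to right and takes the first one
# that matches at the current position, even if a later alternative
# would match a longer string.
first = re.match(r'a|ab', 'ab').group(0)
second = re.match(r'ab|a', 'ab').group(0)

print(first)   # a
print(second)  # ab
```

So the order of the alternatives, not the length of the match, decides which one wins.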

Here is a slightly improved version, which is a bit more compact and
removes the quotes from the matched quoted strings in the output.

import re

def split_string(s):
	
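	# pat1: a run of characters containing no double quote or space
	# pat2: a double-quoted string; group 1 is the text between the quotes
	pat1 = re.compile(r'[^" ]+')
	pat2 = re.compile(r'"([^"]*)"')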

	parts = []

	m1 = pat1.search(s)
	m2 = pat2.search(s)
	while m1 or m2:
		
		if m1 and m2:
			if m1.start(0) < m2.start(0):
				match = 1
			elif m2.start(0) < m1.start(0):
				match = 2
			else:
				if len(m1.group(0)) > len(m2.group(0)):
					match = 1
				else:
					match = 2
		elif m1:
			match = 1
		else:
			match = 2
				
		if match == 1:
			part = m1.group(0)
			s = s[m1.end(0):]
		else:
			part = m2.group(1)
			s = s[m2.end(0):]
					
		parts.append(part)
			
		m1 = pat1.search(s)
		m2 = pat2.search(s)
		
	return parts

print(split_string('bob john "johnny cash" 234 june'))
print(split_string('"abc""abc"'))
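
For comparison, the standard library's shlex module does shell-style splitting that also handles double-quoted parts. This is a different technique from the regexp loop above, and its rules differ in corner cases (single quotes and backslashes are special too), so take it as a sketch rather than a drop-in replacement.

```python
import shlex

# shlex.split applies POSIX shell quoting rules by default.
parts = shlex.split('bob john "johnny cash" 234 june')
print(parts)   # ['bob', 'john', 'johnny cash', '234', 'june']

# Unlike split_string, adjacent quoted strings are joined into one token.
joined = shlex.split('"abc""abc"')
print(joined)  # ['abcabc']
```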
