Splitting a quoted string.

Paul Melis paul at science.uva.nl
Wed May 16 07:17:23 EDT 2007


Hi,

mosscliffe wrote:
> I am looking for a simple split function to create a list of entries
> from a string which contains quoted elements.  Like in 'google'
> search.
> 
> eg  string = 'bob john "johnny cash" 234 june'
> 
> and I want to have a list of ['bob', 'john', 'johnny cash', '234',
> 'june']
> 
> I wondered about using the csv routines, but I thought I would ask the
> experts first.
> 
> There may be a simple function, but as yet I have not found it.

Here is a not-so-simple function using regular expressions. It repeatedly 
matches two regexps, one that matches any sequence of characters except 
a space and one that matches a double-quoted string. If both match, the 
one occurring first in the string is taken and the string is cut off at 
the end of that match. This is repeated until neither regexp matches 
any more. If both match at the same position in the string, the 
longer of the two matches is taken. (A single regexp using the A|B 
operator does not give this longest-match behaviour, as alternation is 
ordered: if A matches, then it is returned even if B would match a 
longer string.)
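
The remark about A|B can be checked with a small sketch. The pattern
names and the sample text below are only for illustration and are not
used by the function in this post. With the non-space pattern written
first, the alternation stops at the first space even though the quoted
pattern would match more:

```python
import re

# Alternation is ordered: the first alternative that matches is used,
# even when a later alternative would match a longer string.
word_first = re.compile(r'[^ ]+|"[^"]*"')
quoted_first = re.compile(r'"[^"]*"|[^ ]+')

text = '"johnny cash" 234'

word_result = word_first.match(text).group(0)
quoted_result = quoted_first.match(text).group(0)

print(word_result)    # "johnny
print(quoted_result)  # "johnny cash"
```

Putting the quoted pattern first gives the longer match here, but that
is a fixed preference for B over A rather than a comparison of the two
match lengths, which is what the function does explicitly.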

import re

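def split_string(s):
	"""Split s on spaces, keeping double-quoted substrings together.

	Quoted parts are returned with their surrounding quotes.
	"""
	pat1 = re.compile(r'[^ ]+')      # any run of non-space characters
	pat2 = re.compile(r'"[^"]*"')    # a double-quoted string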

	parts = []

	m1 = pat1.search(s)
	m2 = pat2.search(s)
	while m1 or m2:
		
		if m1 and m2:
			# Both match, take match occurring earliest in the string
			p1 = m1.group(0)
			p2 = m2.group(0)
			if m1.start(0) < m2.start(0):
				part = p1
				s = s[m1.end(0):]
			elif m2.start(0) < m1.start(0):
				part = p2
				s = s[m2.end(0):]		
			else:
				# Both match at the same string position, take longest match
				if len(p1) > len(p2):
					part = p1
					s = s[m1.end(0):]
				else:
					part = p2
					s = s[m2.end(0):]
		elif m1:
			part = m1.group(0)
			s = s[m1.end(0):]
		else:
			part = m2.group(0)
			s = s[m2.end(0):]
					
		parts.append(part)
			
		m1 = pat1.search(s)
		m2 = pat2.search(s)
		
	return parts

 >>> s = 'bob john "johnny cash" 234 june'
 >>> split_string(s)
['bob', 'john', '"johnny cash"', '234', 'june']
 >>>
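
Note that the quoted part keeps its double quotes in the output shown
above, while the list asked for has 'johnny cash' without them. A
minimal sketch of one way to drop them afterwards follows; the helper
name strip_quotes is hypothetical and not part of the function above.
The last lines use shlex.split from the standard library as a
cross-check. It is a different technique (shell-style splitting), so it
may treat other input, such as apostrophes or backslashes, differently.

```python
import shlex

def strip_quotes(parts):
    """Remove one pair of surrounding double quotes from each part."""
    cleaned = []
    for part in parts:
        if len(part) >= 2 and part.startswith('"') and part.endswith('"'):
            part = part[1:-1]
        cleaned.append(part)
    return cleaned

# The list printed by split_string in the session above.
parts = ['bob', 'john', '"johnny cash"', '234', 'june']
cleaned = strip_quotes(parts)
print(cleaned)
# ['bob', 'john', 'johnny cash', '234', 'june']

# Cross-check with shell-style splitting from the standard library.
shell_style = shlex.split('bob john "johnny cash" 234 june')
print(shell_style == cleaned)
# True
```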


Paul


