[Tutor] str.split and quotes

Tony Meyer tameyer at ihug.co.nz
Wed Apr 6 07:59:23 CEST 2005


> >>> s = 'Hi "Python Tutors" please help'
> >>> s.split()
> ['Hi', '"Python', 'Tutors"', 'please', 'help']
> >>> 
> 
> I wish it would leave the stuff in quotes intact:
> 
> ['Hi', '"Python Tutors"', 'please', 'help']

You can do this with a regular expression:

>>> import re
>>> re.findall(r'\".*\"|[^ ]+', s)
['Hi', '"Python Tutors"', 'please', 'help']

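The regular expression says to find patterns that are either a quote (\"),
then any number of any characters (.*), then a quote (\"), or (|) one or more
of any character except a space ([^ ]+). Note that .* is greedy, so this only
behaves as shown when the string contains a single quoted phrase.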
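
If a string can hold more than one quoted phrase, the greedy .* will run from
the first quote to the last one. Here is a minimal sketch of one way around
that, matching "anything except a quote" between the quotes instead. The
test string s2 and the variable names are made up for illustration and are not
from the original question.

```python
import re

# Hypothetical input with two quoted phrases (an assumption for this sketch).
s2 = 'Hi "Python Tutors" please "help me" now'

# Greedy version: .* spans from the first quote to the last quote.
greedy = re.findall(r'".*"|[^ ]+', s2)

# Alternative: a quote, any run of non-quote characters, then a quote.
per_phrase = re.findall(r'"[^"]*"|[^ ]+', s2)

print(greedy)      # ['Hi', '"Python Tutors" please "help me"', 'now']
print(per_phrase)  # ['Hi', '"Python Tutors"', 'please', '"help me"', 'now']
```

The backslashes before the quotes are dropped here because a double quote
needs no escaping inside a single-quoted raw string; the pattern matches the
same text either way.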

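Or you can just join them back up again:

>>> combined = []
>>> b = []
>>> for a in s.split():
...     # An odd number of quotes in a word opens or closes a quoted run.
...     toggles = a.count('"') % 2 == 1
...     if combined:
...         # Inside a quoted run: collect every word, quoted or not.
...         combined.append(a)
...         if toggles:
...             b.append(" ".join(combined))
...             combined = []
...     elif toggles:
...         combined.append(a)
...     else:
...         b.append(a)
...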
>>> b
['Hi', '"Python Tutors"', 'please', 'help']

(There are probably tidier ways of doing that.)

Or you can do the split yourself:

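def split_no_quotes(s):
    index_start = 0
    index_end = 0
    in_quotes = False
    result = []
    while index_end < len(s):
        if s[index_end] == '"':
            # Every quote character toggles the in-quotes state.
            in_quotes = not in_quotes
        if s[index_end] == ' ' and not in_quotes:
            # Skip empty pieces so runs of spaces behave like s.split().
            if index_start < index_end:
                result.append(s[index_start:index_end])
            index_start = index_end + 1
        index_end += 1
    # Keep whatever follows the last unquoted space (also safe for '').
    if index_start < index_end:
        result.append(s[index_start:index_end])
    return result

>>> print(split_no_quotes(s))
['Hi', '"Python Tutors"', 'please', 'help']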
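
For comparison, the standard library's shlex module can do this kind of
quote-aware splitting too. This is a different technique from the three
above, shown only as a sketch. Bear in mind that shlex follows shell-like
rules, so it also treats single quotes (and, in POSIX mode, backslashes)
specially, which may or may not be what you want for ordinary text.

```python
import shlex

s = 'Hi "Python Tutors" please help'

# posix=False keeps the quote characters attached to the token.
with_quotes = shlex.split(s, posix=False)

# The default POSIX mode groups the words the same way but strips the quotes.
without_quotes = shlex.split(s)

print(with_quotes)     # ['Hi', '"Python Tutors"', 'please', 'help']
print(without_quotes)  # ['Hi', 'Python Tutors', 'please', 'help']
```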

=Tony.Meyer


