Help with splitting

George Sakkis gsakkis at rutgers.edu
Sat Apr 2 20:56:22 EST 2005


Jeremy Bowers wrote:
> On Fri, 01 Apr 2005 14:20:51 -0800, RickMuller wrote:
>
> > I'm trying to split a string into pieces on whitespace, but I want to
> > save the whitespace characters rather than discarding them.
> >
> > For example, I want to split the string '1    2' into ['1','    ','2'].
> > I was certain that there was a way to do this using the standard string
> > functions, but I just spent some time poring over the documentation
> > without finding anything.
>
> Python 2.3.5 (#1, Mar  3 2005, 17:32:12)
> [GCC 3.4.3  (Gentoo Linux 3.4.3, ssp-3.4.3-0, pie-8.7.6.6)] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import re
> >>> whitespaceSplitter = re.compile("(\w+)")
> >>> whitespaceSplitter.split("1 2  3   \t\n5")
> ['', '1', ' ', '2', '  ', '3', '   \t\n', '5', '']
> >>> whitespaceSplitter.split(" 1 2  3   \t\n5 ")
> [' ', '1', ' ', '2', '  ', '3', '   \t\n', '5', ' ']
>
> Note the null strings at the beginning and end when the string itself
> begins and ends with a match of the split RE (here, a run of word
> characters). Pondering the second invocation should show why they are
> there, though darned if I can think of a good way to put it into words.
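
One way to put it into words: re.split returns the text between matches,
and a capturing group makes it return the matches as well. When the string
begins or ends with a match, the text "before" or "after" that match is the
empty string. A minimal sketch in current Python, using the same pattern as
above written as a raw string (the names word_splitter and
split_keeping_words are made up for illustration):

```python
import re

# Hypothetical name; same pattern as the quoted session, as a raw string.
word_splitter = re.compile(r"(\w+)")

def split_keeping_words(text):
    """Split on runs of word characters, keeping them via the capturing group."""
    return word_splitter.split(text)

# The string starts and ends with a match, so the pieces before the first
# match and after the last match are empty strings.
print(split_keeping_words("1 2"))    # ['', '1', ' ', '2', '']

# Here whitespace sits outside the first and last match, so no empty strings.
print(split_keeping_words(" 1 2 "))  # [' ', '1', ' ', '2', ' ']
```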

If you don't want any null strings at the beginning or the end, an
equivalent approach is to use findall with an alternation:

>>> whitespaceSplitter_2 = re.compile("\w+|\s+")
>>> whitespaceSplitter_2.findall("1 2  3   \t\n5")
['1', ' ', '2', '  ', '3', '   \t\n', '5']
>>> whitespaceSplitter_2.findall(" 1 2  3   \t\n5 ")
[' ', '1', ' ', '2', '  ', '3', '   \t\n', '5', ' ']
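
One caveat, as far as I can tell: "\w+|\s+" silently drops characters that
are neither word characters nor whitespace, such as punctuation. If the
input may contain those, a variant that pairs \S+ with \s+ keeps every
character. This is only a sketch for current Python; the names
token_or_space and split_keeping_whitespace are made up for illustration:

```python
import re

# Hypothetical name: \S+ matches a run of non-whitespace, \s+ a run of whitespace.
token_or_space = re.compile(r"\S+|\s+")

def split_keeping_whitespace(text):
    """Return alternating non-whitespace and whitespace runs, in order."""
    return token_or_space.findall(text)

print(split_keeping_whitespace("1    2"))  # ['1', '    ', '2']
print(split_keeping_whitespace("a, b"))    # ['a,', ' ', 'b']

# For comparison, the \w+|\s+ pattern loses the comma.
print(re.findall(r"\w+|\s+", "a, b"))      # ['a', ' ', 'b']
```

Since every character is either whitespace or not, joining the pieces
should give back the original string.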


George



