Split text file into words

Duncan Booth duncan.booth at invalid.invalid
Wed Mar 9 10:34:50 EST 2005


qwweeeit wrote:

>    ll=re.split(r"[\s,{}[]()+=-/*]",i)

The stack overflow occurs because the ()+ tries to match an empty string as 
many times as possible.

This regular expression contains a character set '\s,{}[' followed by the 
expression '()+=-/*]'. You can see that the parentheses aren't part of a 
character set if you reverse their order, which gives you an error when the 
expression is compiled instead of a failure when trying to match:

>>> ll=re.split(r"[\s,{}[])(+=-/*]",i)

Traceback (most recent call last):
  File "<pyshell#10>", line 1, in -toplevel-
    ll=re.split(r"[\s,{}[])(+=-/*]",i)
  File "C:\Python24\Lib\sre.py", line 157, in split
    return _compile(pattern, 0).split(string, maxsplit)
  File "C:\Python24\Lib\sre.py", line 227, in _compile
    raise error, v # invalid expression
error: unbalanced parenthesis
>>> 
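
The same reading of the pattern can be checked without a traceback. The following is a minimal sketch, assuming a recent Python 3 (where the compile error is raised as re.error); the test strings are made up for illustration and are not from the original post:

```python
import re

# The pattern from the original question, unchanged.
original = r"[\s,{}[]()+=-/*]"
# The same pattern with the parentheses swapped.
swapped = r"[\s,{}[])(+=-/*]"

compiled = re.compile(original)
# One capturing group: the parentheses sit outside the character set.
group_count = compiled.groups

# One character from the set, then the literal text "=-",
# any number of "/", and finally a literal "]".
matches_literal_tail = compiled.fullmatch(",=-//]") is not None
# A lone delimiter does not match, so this is not a plain character set.
matches_lone_comma = compiled.fullmatch(",") is not None

# With the parentheses reversed the pattern no longer compiles.
try:
    re.compile(swapped)
    swapped_error = None
except re.error as exc:
    swapped_error = str(exc)

print(group_count, matches_literal_tail, matches_lone_comma, swapped_error)
```

The group count of 1 is what shows that the parentheses are treated as a group rather than as two literal characters.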

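I suspect you actually meant the character set to include the other 
punctuation characters, in which case you need to escape the closing square 
bracket or make it the first character. Once the characters are inside the 
set, the hyphen also needs escaping (or moving to the end), because 
otherwise '=-/' is read as a reversed range and the expression fails to 
compile.

Try:

    ll=re.split(r"[\s,{}[\]()+=\-/*]",i)

or:

    ll=re.split(r"[]\s,{}[()+=\-/*]",i)

instead.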
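
As a quick check, here is a small sketch of splitting a line with a character set of this kind, assuming Python 3. The sample line is hypothetical, and the hyphen is written as \- so that it is taken literally rather than as the middle of a range. Because every delimiter character splits on its own, adjacent delimiters produce empty strings, which are filtered out here:

```python
import re

# Closing bracket escaped inside the set; hyphen escaped so it is a literal.
escaped_bracket = r"[\s,{}[\]()+=\-/*]"
# Closing bracket placed first in the set, where it needs no escape.
bracket_first = r"[]\s,{}[()+=\-/*]"

# Hypothetical input line, not taken from the original post.
line = "x = foo(a, b[1]) - c/d"

# Each delimiter splits separately, so drop the empty strings in between.
words = [w for w in re.split(escaped_bracket, line) if w]
words_alt = [w for w in re.split(bracket_first, line) if w]

print(words)
```

Both spellings of the set describe the same characters, so the two lists should come out identical.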




