split on blank lines

Duncan Booth duncan at NOSPAMrcp.co.uk
Mon Dec 1 09:35:50 EST 2003


jburgy at hotmail.com (Jan Burgy) wrote in 
news:807692de.0312010610.4461c0e3 at posting.google.com:

> can somebody tell me why (using Python 2.3.2)
> 
> >>> import re
> >>> re.compile(r"^$", re.MULTILINE).split("foo\n\nbar\n\nbaz")
> ['foo\n\nbar\n\nbaz']
> 
> ? Being used to Perl semantics, I expect
> 
> ['foo\n', 'bar\n', 'baz']
> 
> or something equivalent without the '\n' characters in the result
> strings. I have found that
> 
> >>> re.compile(r"^\n", re.MULTILINE).split("foo\n\nbar\n\nbaz")
> ['foo\n', 'bar\n', 'baz']
> 
> I prefer the first version however because my intent is stated more
> clearly. Could this be a bug in sre.py (I looked at the code for a
> good two minutes but then my head started hurting)
> 

Given that re.compile("^$", re.MULTILINE).findall("foo\n\nbar\n\nbaz") 
returns ['', ''], the pattern clearly does match at both blank lines, so I 
would agree that split() ignoring those matches looks like a bug. You could 
submit a bug report on SourceForge.
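
To see where those empty matches fall, here is a minimal sketch (the names
text, blank and positions are mine, not from the original post). It lists
the offsets at which ^$ matches under re.MULTILINE. The pattern matches at
the start of each empty line, so split() in the quoted session has matches
available and is simply not acting on zero-width ones. As far as I know,
later Python releases (3.7 onwards) changed split() to accept empty
matches, so its result depends on the version; the sketch avoids relying
on it.

```python
import re

text = "foo\n\nbar\n\nbaz"
blank = re.compile(r"^$", re.MULTILINE)

# Offsets of the zero-width matches: one at the start of each empty line.
positions = [m.start() for m in blank.finditer(text)]
print(positions)  # [4, 9]

# findall reports the same two matches as empty strings.
empties = blank.findall(text)
print(empties)  # ['', '']
```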

Of course, if you really want to state your intentions, you could just use:

   >>> "foo\n\nbar\n\nbaz".split('\n\n')
   ['foo', 'bar', 'baz']

as you aren't doing anything here that obviously benefits from regex 
obfuscation.
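
As a rough sketch of both options: the plain string split covers the case
in the question, and a regex only starts to earn its keep when the "blank"
lines may hold stray whitespace or come in runs. The messy input and the
\n\s*\n pattern are my own illustration, not something from the original
post.

```python
import re

text = "foo\n\nbar\n\nbaz"

# Plain string split: enough when paragraphs are separated by exactly "\n\n".
paragraphs = text.split("\n\n")
print(paragraphs)  # ['foo', 'bar', 'baz']

# Hypothetical messier input: one "blank" line holds spaces, and there is
# a run of three newlines. A pattern that consumes the newlines handles both.
messy = "foo\n  \nbar\n\n\nbaz"
messy_paragraphs = re.split(r"\n\s*\n", messy)
print(messy_paragraphs)  # ['foo', 'bar', 'baz']
```

Because this pattern consumes at least two characters, it does not depend
on how split() treats zero-width matches.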

-- 
Duncan Booth                                             duncan at rcp.co.uk
int month(char *p){return(124864/((p[0]+p[1]-p[2]&0x1f)+1)%12)["\5\x8\3"
"\6\7\xb\1\x9\xa\2\0\4"];} // Who said my code was obscure?



