split on blank lines
Jan Burgy
jburgy at hotmail.com
Tue Dec 2 05:32:25 EST 2003
Duncan Booth <duncan at NOSPAMrcp.co.uk> wrote in message news:<Xns9444932BB7A17duncanrcpcouk at 127.0.0.1>...
> jburgy at hotmail.com (Jan Burgy) wrote in
> news:807692de.0312010610.4461c0e3 at posting.google.com:
>
> > can somebody tell me why (using Python 2.3.2)
> >
> > >>> import re
> > >>> re.compile(r"^$", re.MULTILINE).split("foo\n\nbar\n\nbaz")
> > ['foo\n\nbar\n\nbaz']
> >
> > ? Being used to Perl semantics, I expect
> >
> > ['foo\n', 'bar\n', 'baz']
> >
> > or something equivalent without the '\n' characters in the result
> > strings. I have found that
> >
> > >>> re.compile(r"^\n", re.MULTILINE).split("foo\n\nbar\n\nbaz")
> > ['foo\n', 'bar\n', 'baz']
> >
> > I prefer the first version however because my intent is stated more
> > clearly. Could this be a bug in sre.py (I looked at the code for a
> > good two minutes but then my head started hurting)
> >
>
> Given that re.compile("^$", re.MULTILINE).findall("foo\n\nbar\n\nbaz")
> returns ['', ''] I would agree this looks like a bug. You could submit a
> bug report on Sourceforge.
>
> Of course, if you really want to state your intentions, you could just use:
>
> >>> "foo\n\nbar\n\nbaz".split('\n\n')
> ['foo', 'bar', 'baz']
>
> as you aren't doing anything here that obviously benefits from regex
> obfuscation.
Thank you, Duncan, for your input. You're right, I will post a bug
report on SourceForge. Why, you ask, do I split on "^$" and not simply
"\n\n"? Because I'm dealing with an idiotic file format (not my own,
mind you) in which I really want to split on "^\t*$" (I agree with
you that it's a rather arbitrary definition of a blank line; once
again, not mine). When the above didn't work, I spent a long time
questioning my understanding of regular expressions until I could
reduce my code to the minimal example that still showed the problem.
Sometimes I wish that Python contained more elements from AWK ("RS",
for instance).
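
In the meantime, one possible workaround is to avoid the zero-width
match altogether and let the pattern consume the newlines around each
blank line. The sketch below is only an illustration: the helper name
split_records is made up, and it assumes a "blank" line contains
nothing but tabs and that the text does not begin with a blank line.
It behaves roughly like AWK's paragraph mode (RS = "").

```python
import re

# Hypothetical helper, not part of the re module.
# A separator is a newline followed by one or more lines that hold
# only tabs (or nothing). Because the match is never empty, the
# result does not depend on how split() treats zero-width matches.
BLANK_LINES = re.compile(r"\n(?:\t*\n)+")

def split_records(text):
    """Split text into records separated by blank (tab-only) lines."""
    return BLANK_LINES.split(text)

records = split_records("foo\n\t\nbar\n\nbaz")
```

With the sample input above, records comes out as the three strings
'foo', 'bar' and 'baz', with no newline characters left in them.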
Cheers,
Jan
--
Being an actuary is a lot harder than being a mathematician: it is
enough for a mathematician to prove that he or she is right.