[ python-Bugs-852532 ] ^$ won't split on empty line

SourceForge.net noreply at sourceforge.net
Tue Dec 2 10:20:27 EST 2003


Bugs item #852532, was opened at 2003-12-02 06:01
Message generated for change (Comment added) made by tim_one
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=852532&group_id=5470

Category: Regular Expressions
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Jan Burgy (jburgy)
Assigned to: Fredrik Lundh (effbot)
Summary: ^$ won't split on empty line

Initial Comment:
Python 2.3.2 (#49, Oct  2 2003, 20:02:00) [MSC v.1200 32 bit (Intel)] on win32

>>> import re
>>> re.compile('^$', re.MULTILINE).split('foo\n\nbar')
['foo\n\nbar']

I expect ['foo\n', '\nbar'], since, according to the 
documentation, $ "in MULTILINE mode also matches 
before a newline", so ^$ should match the empty line.

Thanks, Jan

----------------------------------------------------------------------

>Comment By: Tim Peters (tim_one)
Date: 2003-12-02 10:20

Message:

Confirmed on Pythons 2.1.3, 2.2.3, 2.3.2, and current CVS.

More generally, split() doesn't appear to split on any empty 
(0-length) match.  For example,

>>> pat = re.compile(r'\b')
>>> pat.split('(a b)')
['(a b)']
>>> pat.findall('(a b)')  # but the pattern matches 4 places
['', '', '', '']
>>>

That's probably a design constraint, but isn't documented.  
For example, if you split "abc" by the pattern x*, what do you 
expect?  The pattern matches (with length 0) at 4 places, 
but I bet most people would be surprised to get

['', 'a', 'b', 'c', '']

back instead of (as they do get)

['abc']
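
If splitting at zero-length matches is what you actually want, one
possible workaround is to walk the matches yourself with finditer() and
slice the string between them.  The following is only a minimal sketch:
split_at_matches is a hypothetical helper name, not part of the re
module, and it ignores capturing groups and maxsplit.  (Python 3.7 and
later changed re.split() to split on empty matches, so this matters
mainly for the versions listed above.)

```python
import re

def split_at_matches(pattern, text):
    """Split text at every match of a compiled pattern, empty ones included.

    Hypothetical helper for illustration only; unlike re.split() it does
    not return captured groups and has no maxsplit argument.
    """
    pieces = []
    last = 0  # index where the current piece starts
    for m in pattern.finditer(text):
        pieces.append(text[last:m.start()])  # text before this match
        last = m.end()                       # resume after the match
    pieces.append(text[last:])               # trailing remainder
    return pieces

# The case from the initial comment: ^$ matches once, on the empty line.
print(split_at_matches(re.compile('^$', re.MULTILINE), 'foo\n\nbar'))
# ['foo\n', '\nbar']

# The x* case: four zero-length matches in 'abc'.
print(split_at_matches(re.compile(r'x*'), 'abc'))
# ['', 'a', 'b', 'c', '']
```

Note that the x* result is exactly the "surprising" list above, which
is the trade-off of treating every empty match as a separator.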

----------------------------------------------------------------------
