beginners question about return value of re.split

Diez B. Roggisch deets at nospam.web.de
Fri Mar 21 11:30:55 EDT 2008


klaus wrote:
> Hello,
> 
> I have a question regarding the return value of re.split() since I have 
> been unable to find any answers in the regular sources of documentation.
> 
> Please consider the following:
> 
> #!/usr/bin/env python
> 
> import re
> 
> if __name__ == "__main__":
>     datum = "2008-03-14"
>     the_date = re.split('^([0-9]{4})-([0-9]{2})-([0-9]{2})$', datum, 3)
>     print the_date
> 
> Now the result that is printed is:
> ['', '2008', '03', '14', '']
> 
> My question: what are the empty strings doing there in the beginning and 
> in the end ? Is this due to a faulty regular expression ?

Read the manual:

"""
split(pattern, string[, maxsplit = 0])
     Split string by the occurrences of pattern. If capturing 
parentheses are used in pattern, then the text of all groups in the 
pattern are also returned as part of the resulting list. If maxsplit is 
nonzero, at most maxsplit splits occur, and the remainder of the string 
is returned as the final element of the list. (Incompatibility note: in 
the original Python 1.5 release, maxsplit was ignored. This has been 
fixed in later releases.)

"""

The key issue here is "If capturing parentheses are used in pattern, 
then the text of all groups in the pattern are also returned as part of 
the resulting list."

Consider this:

 >>> re.compile("a").split("bab")
['b', 'b']
 >>> re.compile("(a)").split("bab")
['b', 'a', 'b']
 >>>
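
Applied to the pattern from the question, the same rule seems to explain the
empty strings: the pattern matches the entire string, so that one match is the
"separator", the text on either side of it is empty, and the three groups are
placed in between. A minimal sketch, reusing datum and the pattern from the
original post (the names pattern, parts and plain are only for illustration):

```python
import re

datum = "2008-03-14"
pattern = r'^([0-9]{4})-([0-9]{2})-([0-9]{2})$'

# The whole string is one match, so the pieces before and after the
# "separator" are empty; the captured groups are inserted between them.
parts = re.split(pattern, datum)
print(parts)   # ['', '2008', '03', '14', '']

# Splitting on the hyphen alone, without groups, returns only the fields.
plain = re.split('-', datum)
print(plain)   # ['2008', '03', '14']
```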

Consider using match or search if split isn't what you actually want.
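
As a rough sketch of the match approach, assuming the goal is simply to pull
year, month and day out of the string (the variable names here are only for
illustration):

```python
import re

datum = "2008-03-14"

# re.match returns a match object, or None if the pattern does not match.
m = re.match(r'^([0-9]{4})-([0-9]{2})-([0-9]{2})$', datum)
if m is not None:
    # groups() returns only the captured groups, with no empty strings.
    year, month, day = m.groups()
    print("%s %s %s" % (year, month, day))
```

Checking for None before calling groups() avoids an AttributeError when the
input is not in the expected format.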

Diez


