beginners question about return value of re.split
Diez B. Roggisch
deets at nospam.web.de
Fri Mar 21 11:30:55 EDT 2008
klaus wrote:
> Hello,
>
> I have a question regarding the return value of re.split() since I have
> been unable to find any answers in the regular sources of documentation.
>
> Please consider the following:
>
> #!/usr/bin/env python
>
> import re
>
> if __name__ == "__main__":
>     datum = "2008-03-14"
>     the_date = re.split('^([0-9]{4})-([0-9]{2})-([0-9]{2})$', datum, 3)
>     print the_date
>
> Now the result that is printed is:
> ['', '2008', '03', '14', '']
>
> My question: what are the empty strings doing there at the beginning and
> at the end? Is this due to a faulty regular expression?
Read the manual:
"""
split( pattern, string[, maxsplit = 0])
Split string by the occurrences of pattern. If capturing
parentheses are used in pattern, then the text of all groups in the
pattern are also returned as part of the resulting list. If maxsplit is
nonzero, at most maxsplit splits occur, and the remainder of the string
is returned as the final element of the list. (Incompatibility note: in
the original Python 1.5 release, maxsplit was ignored. This has been
fixed in later releases.)
"""
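The key issue here is "If capturing parentheses are used in pattern,
then the text of all groups in the pattern are also returned as part of
the resulting list." Your pattern matches the entire string, so the text
before the match and the text after it are both empty strings, and the
three groups end up between them.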
Consider this:
>>> re.compile("a").split("bab")
['b', 'b']
>>> re.compile("(a)").split("bab")
['b', 'a', 'b']
>>>
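Applied to the date from the question, the same effect can be sketched
as follows. The variable names are only illustrative. The pattern
matches the whole string once, so split() returns the (empty) text on
either side of that single match, with the groups in between when there
are any.

```python
import re

datum = "2008-03-14"

# With capturing groups: empty text before the match, the three groups,
# then empty text after the match.
with_groups = re.split(r'^([0-9]{4})-([0-9]{2})-([0-9]{2})$', datum)
print(with_groups)

# Same pattern without capturing groups: only the two empty pieces
# surrounding the match are left.
without_groups = re.split(r'^[0-9]{4}-[0-9]{2}-[0-9]{2}$', datum)
print(without_groups)
```

So the regular expression is not faulty as such; it is just being used
as a delimiter that happens to cover the entire string.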
Consider using match or search if split isn't what you actually want.
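A minimal sketch of the match() variant, assuming all you want is the
three fields as a list (the name the_date is taken from the question,
the rest is illustrative):

```python
import re

datum = "2008-03-14"

# match() already anchors at the start of the string, so only the
# trailing $ is needed to require that the whole string fits.
m = re.match(r'([0-9]{4})-([0-9]{2})-([0-9]{2})$', datum)

# groups() returns only the captured text, with no surrounding pieces.
# m is None when the string does not have the expected format.
the_date = list(m.groups()) if m is not None else None
print(the_date)
```

Checking for None first also tells you when the input is not a date in
the expected format, which split() would silently pass through.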
Diez