[Python-ideas] str.split with multiple individual split characters
Bruce Leban
bruce at leapyear.org
Mon Feb 28 07:51:51 CET 2011
On Sun, Feb 27, 2011 at 10:19 PM, Stephen J. Turnbull <stephen at xemacs.org> wrote:
> def multisplit (source, char1, char2):
> ... return re.split("".join(["[",char1,char2,"]"]),source)
>
Actually, you need re.escape there in case one of the characters is \ or ], or any other character with a special meaning inside [...].
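A minimal sketch of the quoted character-class version with the escaping added, assuming Python 3 and its standard re module. The name multisplit_chars is hypothetical, chosen only to keep it apart from the multisplit below.

```python
import re

def multisplit_chars(source, char1, char2):
    # Escape each character so that ']', '\\', '^' or '-' are taken
    # literally inside the [...] character class.
    pattern = "[" + re.escape(char1) + re.escape(char2) + "]"
    return re.split(pattern, source)

print(multisplit_chars("a]b\\c", "]", "\\"))  # ['a', 'b', 'c']
```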
And if remembering [...] is hard, using | makes this a bit more general
(it also accepts multi-character separators):
import re
def multisplit(source, *separators):
    # Join the escaped separators into one alternation pattern.
    return re.split('|'.join([re.escape(t) for t in separators]), source)
multisplit(s, '\r\n', '\r', '\n')
Bonus points if you see the problem with the above. Correct code is below.
spoiler space
.
.
.
.
.
.
.
.
.
.
.
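The problem is that an |-separated regex tries its alternatives in order and
takes the first one that matches, not the longest. So if a longer separator
appears after a shorter one that is its prefix (e.g. '\r' before '\r\n'),
the shorter one will take precedence.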
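To see the ordering problem concretely, here is a small check calling re.split directly, assuming Python 3; the input string is illustrative only.

```python
import re

text = "a\r\nb"

# Shorter separator listed first: '\r' wins at position 1, and the leftover
# '\n' is then split on separately, producing a spurious empty field.
print(re.split("\r|\r\n|\n", text))  # ['a', '', 'b']

# Longer separator listed first: '\r\n' is consumed as one separator.
print(re.split("\r\n|\r|\n", text))  # ['a', 'b']
```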
import re
def multisplit(source, *separators):
    # Try longer separators first so that '\r\n' beats its prefix '\r'.
    return re.split('|'.join([re.escape(t) for t in
                              sorted(separators, key=len, reverse=True)]), source)
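A self-contained copy of the corrected function with a usage check, assuming Python 3. The sample strings are illustrative; the second call shows the same fix applying to ordinary multi-character separators such as ', ' versus ','.

```python
import re

def multisplit(source, *separators):
    # Longest separators first, so a separator never loses to its own prefix.
    ordered = sorted(separators, key=len, reverse=True)
    pattern = "|".join(re.escape(t) for t in ordered)
    return re.split(pattern, source)

print(multisplit("one\r\ntwo\rthree\nfour", "\r", "\r\n", "\n"))
# ['one', 'two', 'three', 'four']
print(multisplit("a, b,c", ",", ", "))
# ['a', 'b', 'c']
```

Sorting by length is enough for literal separators: two of them can both match at the same position only if the shorter is a prefix of the longer, so putting longer ones first always picks the longest match there.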