multi split function taking delimiter list

Frederic Rentsch anthra.norell at vtxmail.ch
Thu Nov 16 04:43:00 EST 2006


Paddy wrote:
> Paddy wrote:
>
>> Paddy wrote:
>>
>>> martinskou at gmail.com wrote:
>>>
>>>> Hi, I'm looking for something like:
>>>>
>>>> multi_split( 'a:=b+c' , [':=','+'] )
>>>>
>>>> returning:
>>>> ['a', ':=', 'b', '+', 'c']
>>>>
>>>> whats the python way to achieve this, preferably without regexp?
>>>>
>>>> Thanks.
>>>>
>>>> Martin
>>> I resisted my urge to use a regexp and came up with this:
>>>
>>>>>> from itertools import groupby
>>>>>> s = 'apple=blue+cart'
>>>>>> [''.join(g) for k,g in groupby(s, lambda x: x in '=+')]
>>> ['apple', '=', 'blue', '+', 'cart']
>>> For me, the regexp solution would have been clearer, but I need to
>>> stretch my itertools skills.
>>>
>>> - Paddy.
>> Arghhh!
>> No colon!
>> Forget the above please.
>>
>> - Pad.
>
> With colon:
>
>>>> from itertools import groupby
>>>> s = 'apple:=blue+cart'
>>>> [''.join(g) for k,g in groupby(s,lambda x: x in ':=+')]
> ['apple', ':=', 'blue', '+', 'cart']
>
> - Pad.
>
Automatic grouping may or may not work as intended. groupby lumps any 
run of delimiter characters into a single field, so if only some 
combinations should be split off, the solution raises a new problem.
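
To make the caveat concrete, here is a small sketch that wraps the 
quoted groupby one-liner in a function (the name groupby_split is 
made up for illustration) and feeds it inputs where delimiter 
characters sit next to each other. It treats every character in the 
set as a delimiter, so adjacent ones merge and unintended 
combinations such as '=:' are accepted as well:

```python
from itertools import groupby

def groupby_split(s, delimiter_chars):
    """Split s into alternating runs of delimiter and non-delimiter characters."""
    return [''.join(g) for _, g in groupby(s, lambda ch: ch in delimiter_chars)]

print(groupby_split('apple:=blue+cart', ':=+'))  # ['apple', ':=', 'blue', '+', 'cart']
print(groupby_split('a:=+b', ':=+'))             # ['a', ':=+', 'b']  (':=' and '+' merged)
print(groupby_split('a=:b', ':=+'))              # ['a', '=:', 'b']   ('=:' was never a delimiter)
```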

I have been demonstrating solutions based on SE with such frequency of 
late that I have begun to irritate some readers, and SE has been 
characterized, in sarcastic exaggeration, as the 'Solution of 
Everything'. With some trepidation I am going to demonstrate another SE 
solution, because the truth behind the exaggeration is that SE is a 
versatile tool for handling a variety of relatively simple problems in 
a simple, straightforward manner.

 >>> test_string = 'a:=b+c: apple:=blue:+cart'
 >>> # For repeated use the SE object would be assigned to a variable
 >>> SE.SE (':\==/:\=/ +=/+/')(test_string).split ('/')
['a', ':=', 'b', '+', 'c: apple', ':=', 'blue:', '+', 'cart']
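
For readers without SE installed, the same mark-then-split idea can be 
sketched with plain string methods. This is not how SE works 
internally; multi_split and its mark parameter are hypothetical names, 
and the sketch assumes that the mark character occurs neither in the 
text nor in the delimiters, and that no delimiter is contained in 
another one (chained replaces would mark such delimiters twice):

```python
def multi_split(text, delimiters, mark='/'):
    """Surround each delimiter with a mark, then split on the mark."""
    for d in delimiters:
        text = text.replace(d, mark + d + mark)
    return text.split(mark)

test_string = 'a:=b+c: apple:=blue:+cart'
print(multi_split(test_string, [':=', '+']))
# ['a', ':=', 'b', '+', 'c: apple', ':=', 'blue:', '+', 'cart']
print(multi_split('a:=b+c', [':=', '+']))
# ['a', ':=', 'b', '+', 'c']
```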

This is a nuts-and-bolts approach. What you do is what you get. What you 
want is what you do. By itself SE doesn't do anything but search and 
replace, a concept without a learning curve. The simplicity doesn't 
suggest versatility. Versatility comes from application techniques.
    SE is a game of challenge. You know the result you want. You know 
the pieces you have. The game is how to get the result with the pieces 
using search and replace, either per se or as an auxiliary, as in this 
case for splitting. That's all. The example above inserts some 
appropriate split mark ('/'). It takes thirty seconds to write it up and 
see the result. No need to ponder formulas and inner workings. If you 
don't like what you see you also see what needs to be changed. Supposing 
we should split single colons too, adding the corresponding substitution 
and verifying the effect is a matter of another ten seconds:

 >>> SE.SE (':\==/:\=/ +=/+/ :=/:/')(test_string).split ('/')
['a', ':=', 'b', '+', 'c', ':', ' apple', ':=', 'blue', ':', '', '+', 'cart']

Now we see an empty field we don't like towards the end. Why?

 >>> SE.SE (':\==/:\=/ +=/+/ :=/:/')(test_string)
'a/:=/b/+/c/:/ apple/:=/blue/://+/cart'

Ah! It's two slashes next to each other. No problem. We de-multiply 
double slashes in a second pass:

 >>> SE.SE (':\==/:\=/ +=/+/ :=/:/ | //=/')(test_string).split ('/')
['a', ':=', 'b', '+', 'c', ':', ' apple', ':=', 'blue', ':', '+', 'cart']
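
The second pass can be reproduced with the standard library alone. 
The sketch below starts from the marked string shown above and 
collapses runs of slashes before splitting. It is only an illustration 
of the idea, not SE's mechanism, and it assumes that empty fields are 
never wanted:

```python
# Marked string as produced by the first pass in the post.
marked = 'a/:=/b/+/c/:/ apple/:=/blue/://+/cart'

# Splitting directly leaves an empty field where two marks meet.
print('' in marked.split('/'))  # True

# Second pass: collapse any run of marks into a single mark.
collapsed = marked
while '//' in collapsed:
    collapsed = collapsed.replace('//', '/')

fields = collapsed.split('/')
print(fields)
# ['a', ':=', 'b', '+', 'c', ':', ' apple', ':=', 'blue', ':', '+', 'cart']
```

Filtering with [f for f in marked.split('/') if f] gives the same 
list here and would also drop empty fields at either end of the string.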

On second thought the colon should not be split if a plus sign follows:

 >>> SE.SE (':\==/:\=/ +=/+/ :=/:/ :+=:/+/ | //=/')(test_string).split ('/')
['a', ':=', 'b', '+', 'c', ':', ' apple', ':=', 'blue:', '+', 'cart']

No, wrong again! 'Colon-plus' should be exempt altogether. And no spaces 
please:

 >>> SE.SE (':\==/:\=/ +=/+/ :=/:/ :+=:+ " =" | //=/')(test_string).split ('/')
['a', ':=', 'b', '+', 'c', ':', 'apple', ':=', 'blue:+cart']
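
The set of substitutions above can be imitated in plain Python with a 
table of targets and replacements applied in a single left-to-right 
pass. The sketch assumes that where targets overlap the longest one 
wins, which is consistent with the results shown in this post but is 
an assumption about SE, not a description of it. The names substitute 
and rules are invented for the example:

```python
def substitute(text, rules):
    """Apply all substitutions in one pass; at each position the longest target wins."""
    targets = sorted((t for t in rules if t), key=len, reverse=True)
    out = []
    i = 0
    while i < len(text):
        for t in targets:
            if text.startswith(t, i):
                out.append(rules[t])
                i += len(t)
                break
        else:
            # No target starts here: copy the character through.
            out.append(text[i])
            i += 1
    return ''.join(out)

rules = {
    ':=': '/:=/',  # split off ':='
    '+': '/+/',    # split off '+'
    ':': '/:/',    # split off a single colon
    ':+': ':+',    # exempt 'colon-plus': copied through unchanged
    ' ': '',       # delete spaces
}

test_string = 'a:=b+c: apple:=blue:+cart'
fields = substitute(test_string, rules).split('/')
print(fields)
# ['a', ':=', 'b', '+', 'c', ':', 'apple', ':=', 'blue:+cart']
```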

etc.

It is easy to get carried away and to forget that SE should not be used 
instead of Python's built-ins, or to get carried away doing contextual 
or grammar processing explicitly, which gets messy very fast. SE fills a 
gap somewhere between built-ins and parsers.
     Stream editing is not a mainstream technique. I believe it has the 
potential to make many simple problems trivial and many harder ones 
simpler. This is why I believe the technique deserves more attention, 
which, again, may explain the focus of my posts.

Frederic



