RE Help

Fri Sep 21 17:53:10 EDT 2007

Thomas Jollans wrote:
> On Friday 21 September 2007, byte8bits at gmail.com wrote:
>   
>> Not specific to Python, but it will be implemented in it... how do I
>> compile a RE to catch everything between two known values? Here's what
>> I've tried (but failed) to accomplish... the knowns here are START and
>> END:
>>
>> data = "asdfasgSTARTpruyerfghdfjENDhfawrgbqfgsfgsdfg"
>> x = re.compile('START.END', re.DOTALL)
>>
>> x.findall(data)
>>     
>
> I'm not sure finding a variable number of occurrences can be done with re. How 
> about
>
> # data = the string
> strings = []
> for s in data.split('START')[1:]:
>     strings.append(s.split('END')[0])
>   
Nice.  I've noticed that since I switched from Perl to Python, I hardly
ever use regular expressions anymore.  In Perl, they're so easy to fire
up that they become the first tool out of the toolbox, but when you make
the barrier to access just a tiny bit higher (import re/re.compile) you
start noticing how easy it is to accomplish most of those feats without
regexes, and much more readably, too.

Of course, it should be noted that the different implementations
suggested behave differently, which could also affect the choice of
method.  If you have "abcSTARTdefSTARTghiEND", your version will spit
out strings = ['def', 'ghi'], but a regex, depending on whether it is
greedy or non-greedy, will either spit out ['STARTdefSTARTghiEND'] or
['STARTghiEND'].

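Correction: the regex will spit out the first one, whether greedy or not,
because the match always begins at the leftmost START.  The difference
only shows up when END occurs more than once after a START: greedy runs
to the last END, non-greedy stops at the first.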
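
For anyone who wants to check the behaviour described above, here is a rough
sketch comparing the two approaches.  The function names between_split and
between_regex and the two_ends test string are my own inventions for
illustration; the patterns 'START.*END' and 'START.*?END' are just the obvious
greedy and non-greedy candidates, not something the original poster supplied.

```python
import re


def between_split(data, start="START", end="END"):
    # Thomas's split-based approach, wrapped in a (hypothetical) helper.
    return [s.split(end)[0] for s in data.split(start)[1:]]


def between_regex(data, start="START", end="END", greedy=False):
    # Build 'START.*END' (greedy) or 'START.*?END' (non-greedy).
    # re.escape guards against markers containing regex metacharacters.
    quantifier = ".*" if greedy else ".*?"
    pattern = re.compile(re.escape(start) + quantifier + re.escape(end),
                         re.DOTALL)
    return pattern.findall(data)


nested = "abcSTARTdefSTARTghiEND"
print(between_split(nested))                # ['def', 'ghi']
print(between_regex(nested, greedy=True))   # ['STARTdefSTARTghiEND']
print(between_regex(nested, greedy=False))  # ['STARTdefSTARTghiEND']

# Illustrative input with END appearing twice after one START.
two_ends = "STARTabcENDdefEND"
print(between_regex(two_ends, greedy=True))   # ['STARTabcENDdefEND']
print(between_regex(two_ends, greedy=False))  # ['STARTabcEND']
```

Note that the regex versions keep the START and END markers in each result;
wrapping the middle of the pattern in a capturing group would make findall
return only the text in between.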

Cheers,
Cliff