Extracting parts of string between anchor points

Denis McMahon denismfmcmahon at gmail.com
Thu Feb 27 19:55:01 EST 2014


On Thu, 27 Feb 2014 20:07:56 +0000, Jignesh Sutar wrote:

> I've kind of got this working but my code is very ugly. I'm sure it's
> regular expression I need to achieve this more but not very familiar
> with use regex, particularly retaining part of the string that is being
> searched/matched for.
> 
> Notes and code below to demonstrate what I am trying to achieve. Any
> help, much appreciated.

It seems you have a string which may be split into between 1 and 3 
substrings by the presence of up to 2 delimiters, and that if both 
delimiters are present, they are in a specified order.

You have several possible cases which, broadly speaking, break down into 
4 groups:

(a) no delimiters present
(b) delimiter 1 present
(c) delimiter 2 present
(d) both delimiters present
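
As a rough sketch of how the four groups above might be told apart, 
something like the following could work. The delimiter strings and the 
function name classify are placeholders of mine, not taken from the 
original question, so substitute your real anchors.

```python
# Hypothetical delimiter strings; replace them with the real anchors.
DELIM1 = "<delim1>"
DELIM2 = "<delim2>"


def classify(s):
    """Return 'a', 'b', 'c' or 'd' for the four groups listed above."""
    has1 = DELIM1 in s
    has2 = DELIM2 in s
    if has1 and has2:
        return "d"  # both delimiters present
    if has1:
        return "b"  # only delimiter 1 present
    if has2:
        return "c"  # only delimiter 2 present
    return "a"      # no delimiters present
```

This only says which delimiters occur; it does not check their order or 
whether there is any data around them.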

It is important when coding for such scenarios to consider the possible 
cases that are not specified, as well as the ones that are.

For example, consider the string:

"<delim1><delim2>"

where you have both delims, in sequence, but no other data elements.
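
One possible way to handle that kind of edge case without regular 
expressions is str.partition, which always returns three parts and so 
makes the "delimiter present but no data" situation explicit. This is 
only a sketch: the delimiter strings and the name split_parts are my own 
placeholders, and it assumes delimiter 1, if present, comes before 
delimiter 2.

```python
# Hypothetical delimiter strings; replace them with the real anchors.
DELIM1 = "<delim1>"
DELIM2 = "<delim2>"


def split_parts(s):
    """Split s into (before, between, after) around the two delimiters.

    A part is None when the delimiter that would introduce it is absent,
    and an empty string when the delimiter is there but no data adjoins it.
    Assumes DELIM1, if present, occurs before DELIM2.
    """
    head, sep1, rest = s.partition(DELIM1)
    if not sep1:
        # No delimiter 1, so there is no "between" part at all.
        head, sep2, tail = s.partition(DELIM2)
        return head, None, (tail if sep2 else None)
    middle, sep2, tail = rest.partition(DELIM2)
    return head, middle, (tail if sep2 else None)


print(split_parts("<delim1><delim2>"))
print(split_parts("a<delim1>b<delim2>c"))
```

With both delims and no data, all three parts come back as empty 
strings rather than None, so the caller can tell "empty" from "absent".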

I believe there are at least 17 possible combinations, and maybe another 
8 if you allow for the delims being out of sequence.

The code in the file at the URL below processes 17 different cases. It 
may help, or it may confuse.

http://www.sined.co.uk/tmp/strparse.py.txt

-- 
Denis McMahon, denismfmcmahon at gmail.com


