[Python-bugs-list] [ python-Bugs-476912 ] regex annoyance

noreply@sourceforge.net noreply@sourceforge.net
Wed, 31 Oct 2001 12:17:40 -0800


Bugs item #476912, was opened at 2001-10-31 12:17
You can respond by visiting: 
http://sourceforge.net/tracker/?func=detail&atid=105470&aid=476912&group_id=5470

Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Bill Bumgarner (bbum)
Assigned to: Nobody/Anonymous (nobody)
Summary: regex annoyance

Initial Comment:
(This may be a feature request, but it is annoying 
enough that I filed it as a bug.)

Python's named subexpressions within regular 
expressions are an incredibly valuable feature. 
Combined with the ability to automatically collapse 
multiline regexes with comments (re.VERBOSE), they 
lead to very readable regexes.
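
For readers who have not used the two features together, here is a
minimal sketch. The pattern, the names "key" and "value", and the
sample input are made up for illustration and are not part of the
original report.

```python
import re

# Illustrative pattern (an assumption, not from the report):
# parses a "key = value" line using named groups in verbose mode.
KEY_VALUE = re.compile(r"""
    (?P<key>[a-z]+)      # the key: one or more lowercase letters
    \s*=\s*              # an equals sign, optional whitespace around it
    (?P<value>[0-9]+)    # the value: one or more digits
""", re.VERBOSE)

m = KEY_VALUE.match("width = 80")
print(m.group("key"), m.group("value"))
```

Each piece of the match is then fetched by name rather than by
position, which is what makes the duplicate-name case below matter.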

However, there is an annoyance in named 
subexpressions that has bitten me several times.

Namely, suppose a particular token can only be parsed 
out of the input by one of two (or more) alternative 
expressions, so that the same named subexpression 
has to appear in more than one alternative. In that 
case the named subexpression will only be non-None 
intermittently (depending on expression order and 
what was matched).

That is, given:

(?:(?P<Tok1>[a-z]+)\s(?P<Tok2>[a-z]+))|(?:(?P<Tok1>[a-z]+)\t(?P<Tok2>[a-z]+))

In this case, Tok1 and Tok2 will be None if the first 
expression matches... 

(Yes, this is a contrived example that could be 
refactored to not use multiple <Tok1>/<Tok2> 
references-- however, more complex expressions 
do not always enable easy refactoring.)
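
One possible workaround, sketched under some assumptions: give each
alternative its own group names and pick whichever one took part in
the match. The names Tok1a/Tok1b/Tok2a/Tok2b and the helpers
first_group() and parse() are hypothetical. The first separator is
narrowed from \s to a literal space so that both alternatives are
reachable (\s would also match the tab). As far as I am aware, the re
module in current CPython rejects a repeated group name at compile
time, so distinct names are needed there in any case.

```python
import re

# Hypothetical workaround: distinct group names per alternative.
# Tok1a/Tok1b and Tok2a/Tok2b are illustrative names only.
PATTERN = re.compile(r"""
    (?: (?P<Tok1a>[a-z]+) [ ] (?P<Tok2a>[a-z]+) )   # space-separated form
  | (?: (?P<Tok1b>[a-z]+) \t  (?P<Tok2b>[a-z]+) )   # tab-separated form
""", re.VERBOSE)


def first_group(match, *names):
    """Return the first named group that is not None, else None."""
    for name in names:
        value = match.group(name)
        if value is not None:
            return value
    return None


def parse(text):
    """Return (Tok1, Tok2) for a matching line, or None if no match."""
    m = PATTERN.fullmatch(text)
    if m is None:
        return None
    return (first_group(m, "Tok1a", "Tok1b"),
            first_group(m, "Tok2a", "Tok2b"))


print(parse("foo bar"))
print(parse("foo\tbar"))
```

This keeps the pattern itself readable, at the cost of a small amount
of bookkeeping after the match. It does not remove the underlying
annoyance described above.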
