regex question: backreferences in brackets

Jeremy Jones cypher_dpg at yahoo.com
Fri Dec 28 10:23:50 EST 2001


From the Python Library Reference on regular expressions, I read (in section 4.2.1 - concerning backreferences):

"""
\number 
<snip> Inside the "[" and "]" of a character class, all numeric escapes are treated as characters. 
"""


I want to be able to match any character other than a character that I already matched and have a named (or numbered) group for.  For example:

"""
import re

match_string = 'XXX|1|22|333|4444:'
# Group 1 captures the delimiter; [^|] hard-codes "anything but the delimiter".
test_compile = re.compile(r'XXX(.)[^|]{1}\1[^|]{2}\1[^|]{3}\1[^|]{4}(.)')
mymatch = test_compile.match(match_string)
if mymatch:
    print("Found a match")
    print(mymatch.group(0))
else:
    print("No match found")
"""

The strings that I am trying to match consist of 3 specific characters followed by some delimiter, followed by N characters other than the delimiter, followed by the delimiter, and so on.  The above code snippet works and matches the string perfectly.  I guess my question is this:  how can I match any character except the delimiter when I won't know beforehand what the delimiter is?  I am thinking of using the backreference to the delimiter (i.e. \1), but you can't put that in the brackets.  I tried putting \1 in the brackets and it sort of worked.  But when I changed one of the numbers to a |, it still matched, which isn't what I want (and actually, it doesn't match if I leave the | hard-coded).  I also tried using a named group like (?P<delimiter>.) rather than (.) and putting (?P=delimiter) in the brackets, with no luck.  Any suggestions?  TIA.
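
One possible approach, sketched here as an assumption and not a confirmed answer: replace each [^|] with a negative lookahead on the backreference, (?:(?!\1).), which consumes one character only if it is not the captured delimiter. The names NOT_DELIM and matches_format are made up for illustration.

```python
import re

# (?:(?!\1).) reads as "one character that is not the delimiter in group 1".
NOT_DELIM = r'(?:(?!\1).)'
pattern = re.compile(
    r'XXX(.)' + NOT_DELIM + r'{1}\1'
    + NOT_DELIM + r'{2}\1'
    + NOT_DELIM + r'{3}\1'
    + NOT_DELIM + r'{4}(.)'
)

def matches_format(text):
    """Hypothetical helper: True if text fits the delimited layout."""
    return pattern.match(text) is not None

print(matches_format('XXX|1|22|333|4444:'))  # delimiter is '|'
print(matches_format('XXX,1,22,333,4444:'))  # delimiter is ','
print(matches_format('XXX|1|2||333|4444:'))  # a digit replaced by the delimiter
```

With this sketch the first two strings should match, while the third should not, because the lookahead rejects the delimiter where a non-delimiter character is required.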


Jeremy Jones



