regexp upward compatibility bug ?

Jeff Epler jepler at unpythonic.net
Thu Jan 29 10:19:47 EST 2004


The problem is the unescaped '-' inside the character sets, as in
    r'[\w-]'

Here's what the library reference manual has to say:
[]
    Used to indicate a set of characters. Characters can be listed
    individually, or a range of characters can be indicated by giving
    two characters and separating them by a "-". Special characters are
    not active inside sets. For example, [akm$] will match any of the
    characters "a", "k", "m", or "$"; [a-z] will match any lowercase
    letter, and [a-zA-Z0-9] matches any letter or digit. Character
    classes such as \w or \S (defined below) are also acceptable inside
    a range. If you want to include a "]" or a "-" inside a set, precede
    it with a backslash, or place it as the first character. The pattern
    []] will match ']', for example.
           http://www.python.org/doc/current/lib/re-syntax.html

So, based on my reading, you may want to write r'[-\w]' or r'[\w\-]'
instead.

The same goes for the later part of the pattern, [\w-\.?=], where '-'
again directly follows \w.
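
For what it's worth, here is a small sketch of the two spellings suggested above, plus one possible rewrite of the later part of the pattern. The names HYPHEN_FIRST, HYPHEN_ESCAPED, LATER_PART and compiles() are mine, made up for illustration; they are not from the original code. Whether the original spelling is accepted seems to depend on the Python version, so the last lines only report what your interpreter does rather than assume an answer.

```python
import re

# '-' placed where it cannot be read as a range operator.
HYPHEN_FIRST = re.compile(r'[-\w]')     # '-' as the first character of the set
HYPHEN_ESCAPED = re.compile(r'[\w\-]')  # '-' preceded by a backslash

# One possible rewrite of the later part, [\w-\.?=]: move '-' to the front.
# The '.' needs no backslash inside a set.
LATER_PART = re.compile(r'[-\w.?=]')


def compiles(pattern):
    """Return True if re accepts the pattern, False if it raises re.error."""
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True


# Both safe spellings accept a literal hyphen and a word character.
for pat in (HYPHEN_FIRST, HYPHEN_ESCAPED):
    assert pat.match('-') and pat.match('a')

# Report how this interpreter treats the original spellings.
for original in (r'[\w-]', r'[\w-\.?=]'):
    print(original, 'compiles' if compiles(original) else 'is rejected')
```

Putting the hyphen first is probably the least surprising choice, since it reads the same way under every version I know of.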

Jeff



