Regular expressions
Tim Chase
python.list at tim.thechases.com
Thu Nov 5 09:00:00 EST 2015
On 2015-11-05 23:05, Steven D'Aprano wrote:
> Oh the shame, I knew that. Somehow I tangled myself in a knot,
> thinking that it had to be 1 *followed by* zero or more characters.
> But of course it's not a glob, it's a regex.
But that's a good reminder of fnmatch/glob modules too. Sometimes
all you need is to express a simple glob, in which case using a
regexp can cloud the clarity.
The overarching principle is to go for clarity & simplicity, rather
than reaching for any one of built-ins/glob/regex/parser modules every
time.
Want to test for presence in a string? Just use the builtin "a in b"
test. At the beginning/end? Use .startswith()/.endswith() for
clarity. Need to check if a string is purely
digits/alpha/alphanumerics/etc? Use the str
.is{alnum,alpha,decimal,digit,identifier,lower,numeric,printable,space,title,upper}()
methods.
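As a rough sketch of those built-in tests (the sample filename and
variable names are made up for illustration, not taken from any real
code):

```python
# Hypothetical sample filename, used only for illustration.
filename = "2015-11-05_ABC123.txt"

has_underscore = "_" in filename                 # substring presence
starts_with_year = filename.startswith("2015")   # test at the beginning
is_text_file = filename.endswith(".txt")         # test at the end
year_is_digits = filename[:4].isdigit()          # purely digits?
whole_is_digits = filename.isdigit()             # False: dashes, letters, dot
```

None of these needs an import, and each one reads as what it tests.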
For simple wild-carding, use the fnmatch module to do simple
globbing.
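A small sketch of the glob side, including the "1*" mix-up from the
quoted message. The file names here are invented for the example; the
calls are the stdlib fnmatch.filter(), fnmatch.fnmatchcase() and
re.fullmatch():

```python
import fnmatch
import re

# Hypothetical directory listing, for illustration only.
names = ["2015-11-05_ABC123.txt", "notes.md", "2015-12-01_XY9.txt"]

# Keep only the names matching a simple shell-style pattern.
txt_files = fnmatch.filter(names, "2015-*.txt")

# As a glob, "1*" means "1 followed by zero or more characters".
glob_hit = fnmatch.fnmatchcase("1abc", "1*")

# As a regex, "1*" means "zero or more 1s", so it cannot match "1abc"
# as a whole string.
regex_hit = re.fullmatch(r"1*", "1abc") is not None
```

fnmatchcase() is used for the single test so the comparison is
case-sensitive on every platform.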
For more complex pattern matching, you've got regexps.
Finally, for occasions when you're searching for repeated/nested
structures, using an add-on module like pyparsing will give you
clearer code.
Oh, and with regexps, people should be less afraid of verbose
multi-line strings with commenting:
  import re

  r = re.compile(r"""
      ^                     # start of the string
      (?P<year>\d{4})       # capture 4 digits
      -                     # a literal dash
      (?P<month>\d{1,2})    # capture 1-2 digits
      -                     # another literal dash
      (?P<day>\d{1,2})      # capture 1-2 digits
      _                     # a literal underscore
      (?P<accountnum>       # capture the account-number
          [A-Z]{1,3}        # 1-3 uppercase letters
          \d+               # followed by 1+ digits
      )
      \.txt                 # the extension of the file (ignored)
      $                     # the end of the string
      """, re.VERBOSE)
They are a LOT easier to come back to if you haven't touched the code
for a year.
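For what it's worth, here is a minimal sketch of putting such a pattern
to use. The name FILENAME_RE and the sample filenames are my own
placeholders; the named groups come back by name via groupdict():

```python
import re

# Same idea as the commented pattern: whitespace and "#" comments in the
# pattern are ignored because of re.VERBOSE.
FILENAME_RE = re.compile(r"""
    ^
    (?P<year>\d{4})        # 4-digit year
    -
    (?P<month>\d{1,2})     # 1-2 digit month
    -
    (?P<day>\d{1,2})       # 1-2 digit day
    _
    (?P<accountnum>        # account number:
        [A-Z]{1,3}         #   1-3 uppercase letters
        \d+                #   followed by 1+ digits
    )
    \.txt                  # extension, not captured
    $
    """, re.VERBOSE)

# Hypothetical filename, for illustration only.
m = FILENAME_RE.match("2015-11-05_ABC123.txt")
fields = m.groupdict() if m else {}

# Lowercase letters fail the [A-Z] class, so this one does not match.
no_match = FILENAME_RE.match("2015-11-05_abc123.txt")
```

Here fields ends up holding the year, month, day and accountnum strings
keyed by group name, and no_match is None.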
-tkc