[Tutor] matching a street address with regular expressions

John Nagle nagle at animats.com
Fri Oct 12 00:50:12 EDT 2007


Shawn Milochik wrote:
> On 10/4/07, Ricardo Aráoz <ricaraoz at gmail.com> wrote:
>> Christopher Spears wrote:
>>> One of the exercises in Core Python Programming is to
>>> create a regular expression that will match a street
>>> address.  Here is one of my attempts.

    This is actually quite difficult to do well.  Worse,
regular expressions are the wrong tool for the job,
because addresses are properly parsed in reverse, from the END of
the address: the ZIP code and state at the end are far more
constrained than the free-form street text at the start.  See the
USPS Postal Address Standards at
"http://pe.usps.gov/cpim/ftp/pubs/Pub28/pub28.pdf".  Also
worth reading is "Frank's Compulsive Guide to Postal Addresses"
at "http://www.columbia.edu/kermit/postal.html".
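
    To make "parse from the end" concrete, here is a minimal sketch
in Python.  It is an illustration only, not a real parser: the function
name parse_us_address and the tiny STATES and SUFFIXES tables are
made up for this example, and it assumes a one-line US address.  A
serious version would take the full suffix and state tables from
Pub 28.

```python
import re

# Hypothetical, deliberately tiny lookup tables (assumptions for this
# sketch); a real parser would use the full USPS Pub 28 tables.
STATES = {"CA", "MO", "NY", "TX", "WA"}
SUFFIXES = {"ST", "AVE", "BLVD", "RD", "DR", "LN"}
ZIP_RE = re.compile(r"^\d{5}(-\d{4})?$")


def parse_us_address(text):
    """Parse a one-line US address by consuming tokens from the END."""
    # Normalize: drop commas and periods, uppercase, split on whitespace.
    tokens = text.replace(",", " ").replace(".", "").upper().split()
    result = {}

    # 1. The ZIP code, if present, is the last token.
    if tokens and ZIP_RE.match(tokens[-1]):
        result["zip"] = tokens.pop()

    # 2. The state abbreviation comes just before the ZIP code.
    if tokens and tokens[-1] in STATES:
        result["state"] = tokens.pop()

    # 3. Scan backwards for a street suffix; whatever follows it is
    #    taken to be the city.
    for i in range(len(tokens) - 1, -1, -1):
        if tokens[i] in SUFFIXES:
            result["city"] = " ".join(tokens[i + 1:])
            result["suffix"] = tokens[i]
            tokens = tokens[:i]
            break

    # 4. What is left is the house number (if numeric) and street name.
    if tokens and tokens[0].isdigit():
        result["number"] = tokens[0]
        tokens = tokens[1:]
    result["street"] = " ".join(tokens)
    return result


if __name__ == "__main__":
    print(parse_us_address("123 Main St, Springfield, CA 90210"))
```

Even this toy shows why the problem is hard: given
"100 Market St, St Louis, MO 63101", the backward scan takes the "St"
in "St Louis" for a street suffix and gets the city wrong.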

    Here's a fun exercise: convert this address parser in Perl
to Python:

http://cpan.uwinnipeg.ca/htdocs/Geo-StreetAddress-US/Geo/StreetAddress/US.pm.html

There are features in those regular expressions that I can't find in three
Perl books or the online documentation.

    If anyone has a first-rate address parser in Python that will cover
most of the developed world, I'd like to talk to them.

				John Nagle
				SiteTruth


