better regular expression?

Robert Brewer fumanchu at amor.org
Mon Dec 6 20:04:02 EST 2004


Vivek wrote:
> I am trying to construct a regular expression using the re module that
> matches for
> 1. my hostname
> 2. absolute from the root URLs including just "/"
> 3. relative URLs.
> 
> Basically I want the pattern to not match for URLs that are not on my
> host.

It would be far easier to grab the URLs and run urlparse.urlparse() on
them. Combine relative paths with the base scheme://location/path first.
To check whether a URL is on your host, just use .startswith() on the
combined result. If you're worried about ../ segments, make the paths
concrete (OS paths) and call os.path.normpath() before comparing them.
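
A rough sketch of that approach follows. It makes several assumptions:
BASE_URL and the helper names are made up for illustration, it uses the
Python 3 spelling urllib.parse (the urlparse module in older releases),
and it compares the parsed host instead of doing a raw .startswith()
prefix test, since a bare prefix can also match a longer hostname that
merely begins with yours. It also uses posixpath.normpath rather than
os.path.normpath so that URL paths keep forward slashes on any platform.

```python
import posixpath
from urllib.parse import urljoin, urlparse

# Hypothetical page the links were scraped from; substitute your own.
BASE_URL = "http://www.example.com/docs/index.html"


def resolve(url, base_url=BASE_URL):
    """Combine a possibly relative URL with the base URL."""
    return urljoin(base_url, url)


def is_on_my_host(url, base_url=BASE_URL):
    """Return True if url, once resolved, has the same host as base_url."""
    target = urlparse(resolve(url, base_url))
    base = urlparse(base_url)
    return target.netloc.lower() == base.netloc.lower()


def normalized_path(url, base_url=BASE_URL):
    """Return the resolved URL's path with ./ and ../ segments collapsed."""
    path = urlparse(resolve(url, base_url)).path
    return posixpath.normpath(path) if path else "/"
```

With the base above, is_on_my_host("/") and
is_on_my_host("images/logo.png") are both true, while
is_on_my_host("http://other.example.org/page") is false.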


Robert Brewer
MIS
Amor Ministries
fumanchu at amor.org


