URL parsing for the hard cases

Miles semanticist at gmail.com
Mon Jul 23 01:51:45 EDT 2007


On 7/23/07, John Nagle wrote:
> Here's another hard case.  This one might be a bug in urlparse:
>
> import urlparse
>
> s = 'ftp://administrator:password@64.105.135.30/originals/6 june 07/ebay/login/ebayisapi.html'
>
> urlparse.urlparse(s)
>
> yields:
>
> (u'ftp', u'administrator:password@64.105.135.30', u'/originals/6 june 07/ebay/login/ebayisapi.html', '', '', '')
>
> That second field is supposed to be the "hostport" (per the RFC usage
> of the term; Python uses the term "netloc"), and the username/password
> should have been parsed and moved to the "username" and "password" fields
> of the object. So it looks like urlparse doesn't really understand FTP URLs.

Those values aren't "moved" into separate fields; they're extracted on
the fly from the netloc each time you ask for them.  Use the .hostname
property of the result tuple to get just the hostname; .username,
.password and .port work the same way.
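
For illustration, here is a minimal sketch of what that looks like with
the URL from the quoted message.  It assumes Python 3, where the urlparse
module lives at urllib.parse (the original post used the Python 2 module),
and it assumes the line wrap in the quoted URL stands for a single space.
The variable names are mine, not from the original post.

```python
# Python 3 spelling; in Python 2 this was "import urlparse".
from urllib.parse import urlparse

# URL from the quoted message, with the wrapped line joined by one space
# (an assumption about how the original string was written).
url = ("ftp://administrator:password@64.105.135.30"
       "/originals/6 june 07/ebay/login/ebayisapi.html")

parts = urlparse(url)

# The tuple slot still holds the whole netloc, userinfo included.
print(parts[1])        # administrator:password@64.105.135.30

# The properties are derived from the netloc on access.
print(parts.username)  # administrator
print(parts.password)  # password
print(parts.hostname)  # 64.105.135.30
print(parts.port)      # None (no port in this URL)
```

So the tuple itself is unchanged, and the credentials and host are only
split out when you read the corresponding property.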

-Miles


