[Patches] [ python-Patches-1462525 ] URI parsing library

Mon Apr 3 19:13:03 CEST 2006

Patches item #1462525, was opened at 2006-04-01 03:30
Message generated for change (Comment added) made by paulj
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1462525&group_id=5470

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Library (Lib)
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Paul Jimenez (paulj)
Assigned to: Nobody/Anonymous (nobody)
Summary: URI parsing library

Initial Comment:
Per the original discussion at
http://mail.python.org/pipermail/python-dev/2005-November/058301.html
I'm submitting a library meant to deprecate the
existing urlparse library.  Questions and comments welcome.

----------------------------------------------------------------------

>Comment By: Paul Jimenez (paulj)
Date: 2006-04-03 17:13

Message:
Logged In: YES 
user_id=25150

Oops. Fixed some editing bugs.

----------------------------------------------------------------------

Comment By: Paul Jimenez (paulj)
Date: 2006-04-02 21:19

Message:
Logged In: YES 
user_id=25150

Naming:
  I also considered urlparse2 (à la urllib2) but liked having
a name without a version number attached.  rfc3986 would
also work I suppose, but seems a bit... clunky.

MailtoURIParser:
  You seem to have missed the point (probably due to my poor
documentation): none of the *URIParser classes are meant to
be directly used; they're just the default population of an
extensible structure that URIParser uses to do the work of
parsing.  
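
To make the design described above concrete, here is a minimal sketch of a front-end parser that delegates to a per-scheme table of helper parsers. All names here (`URIDispatcher`, `GenericParser`, `MailtoParser`, `register`) are hypothetical; the sketch is only loosely modelled on Paul's description, and the actual patch may be organized differently. Note that both helpers expose the same `parse(uri)` interface, which is what makes the table extensible.

```python
from urllib.parse import urlsplit


class GenericParser:
    """Fallback helper (hypothetical): split any URI with the stdlib splitter."""

    def parse(self, uri):
        parts = urlsplit(uri)
        return {"scheme": parts.scheme, "netloc": parts.netloc,
                "path": parts.path, "query": parts.query,
                "fragment": parts.fragment}


class MailtoParser:
    """Scheme-specific helper (hypothetical): mailto has no authority, and
    the part before "?" is a comma-separated list of addresses."""

    def parse(self, uri):
        scheme, _, rest = uri.partition(":")
        to, _, query = rest.partition("?")
        return {"scheme": scheme.lower(),
                "addresses": [a for a in to.split(",") if a],
                "query": query}


class URIDispatcher:
    """Front-end that users call; helpers are looked up by scheme."""

    def __init__(self):
        self._fallback = GenericParser()
        # The "default population" of the extensible table.
        self._helpers = {"mailto": MailtoParser()}

    def register(self, scheme, helper):
        """Add or replace the helper used for a scheme."""
        self._helpers[scheme.lower()] = helper

    def parse(self, uri):
        scheme = urlsplit(uri).scheme  # urlsplit lower-cases the scheme
        return self._helpers.get(scheme, self._fallback).parse(uri)
```

Users would call only `URIDispatcher().parse(...)` and extend it with `register()`; the helper classes are never instantiated by callers directly.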

Let's move the discussion to python-dev.  I'll post
changed/fixed/upgraded versions here as I revise them in
response to feedback.  Here's the first (adjusted based on
your feedback).

----------------------------------------------------------------------

Comment By: John J Lee (jjlee)
Date: 2006-04-02 00:32

Message:
Logged In: YES 
user_id=261020

Just a quick note listing some of the things I intend to
worry about <wink>:

1. IRIs

2. Python unicode strings

3. Percent-encoding.  See 1. and 2.

4. Interaction with other stdlib modules

5. RFC 3986 compliance (duh :-)
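
Points 1 to 3 interact because an IRI is Unicode text, while a URI is ASCII with percent-escaped octets, so some character encoding has to be chosen. The sketch below approximates the IRI-to-URI mapping of RFC 3987 section 3.1 (encode as UTF-8, then percent-encode non-ASCII octets) using the stdlib's `urllib.parse.quote`/`unquote`. The function names are hypothetical, and treating every "%" as an existing escape is a simplifying assumption.

```python
from urllib.parse import quote, unquote

# RFC 3986 section 2.2 reserved characters, plus "%" so that existing
# percent-escapes are left untouched (assumption: every "%" is an escape).
_RESERVED = ":/?#[]@!$&'()*+,;="
_SAFE = _RESERVED + "%"


def iri_to_uri(iri):
    """Approximate RFC 3987 section 3.1: encode the text as UTF-8 and
    percent-encode every octet outside the ASCII unreserved/reserved sets."""
    return quote(iri, safe=_SAFE)  # quote() encodes str as UTF-8 by default


def uri_component_to_text(component, encoding="utf-8"):
    """Decode percent-escapes back to text. The octets carry no charset,
    so the caller has to assume one; a wrong guess silently garbles text."""
    return unquote(component, encoding=encoding)
```

For example, `iri_to_uri("http://example.com/résumé")` yields `"http://example.com/r%C3%A9sum%C3%A9"`, but decoding `"%C3%A9"` as Latin-1 instead of UTF-8 produces `"Ã©"`, which is exactly the kind of ambiguity a new module would have to take a position on.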

From a brief email discussion with Mike Brown a while back
(he knows all this ten times better than I do), it certainly
seemed that 1., 2. and 3. can't be brushed under the carpet
as easily as you hope, but I'll be very glad if you're right!-)

I think these things need to be at least thought through by
a few people before rushing a new module into the stdlib: we
already have two modules containing outdated URL parsing
code, we don't want to end up with a third one.

Don't want to sound negative though, it's great that you
wrote this!

----------------------------------------------------------------------

Comment By: John J Lee (jjlee)
Date: 2006-04-02 00:20

Message:
Logged In: YES 
user_id=261020

Some mostly-stylistic / minor comments on the patch from a
quick skim (I hope to post some comments on the trickier
issues later):

Follow PEP 8.  Some issues I noticed:

- Inconsistent use of case: URI vs. Uri.
- Triple-quoted docstrings should use " not ' for
editor-friendliness.
- Strings should not be abused as comments: If you mean to
use a docstring, use a docstring; otherwise, use a comment
(I'm referring here to your use of strings immediately
*before* def statements).
- import usage like import posixpath as ppath is usually
frowned upon: just import posixpath.
- Use of whitespace in e.g. dict displays and listcomps is
non-standard.  [x for x in y], not [ x for x in y ]
- Indentation in docstrings is non-standard.
- Docstring-writing conventions are non-standard.
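
As a small illustration of the PEP 8 / PEP 257 points above, here they are applied to a hypothetical helper (`join_path`, `DEFAULT_PORTS` and `segments` are illustrative names, not code from the patch): a plain import, a real docstring in triple double quotes inside the function, comments rather than bare strings for non-docstring notes, and no padding inside brackets.

```python
import posixpath  # plain import, not "import posixpath as ppath"


def join_path(base, segment):
    """Join a URI path segment onto a base path.

    The docstring sits inside the function body, uses triple double
    quotes, and starts with a one-line summary followed by a blank line.
    """
    # Notes that are not docstrings are written as comments, not strings.
    return posixpath.join(base, segment)


# Dict displays and list comprehensions without padding inside brackets.
DEFAULT_PORTS = {"http": 80, "https": 443}
segments = [s for s in "/a/b/c".split("/") if s]
```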

Other things:

- Having read your original python-dev post, I still think
UrlParser / URIParser could be simpler.  I'll try and supply
an actual suggested patch later.
- MailToURIParser appears to support a different interface
from all the others.  If this is really necessary for
standards or pragmatic reasons, its parse and unparse
methods should just be separate functions.
- Documentation for the module is missing.  This would
document the API and perhaps briefly explain the background
(what's changed to require this new module) and correct
usage, briefly explaining terms like "URI reference".  Some
well-chosen examples are always good, of course.
- The tests should go in a separate module
test/test_<modulename>.py and follow the conventions there.
- Would be very nice to explicitly reference RFC 3986
section numbers in the code.  I'll try and do this when I
review it properly.
- Use of URI vs. URL distinction is incorrect.
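
On the suggestion to cite RFC 3986 section numbers in the code, here is a sketch of what that might look like for the generic component split. The regular expression is the one given in RFC 3986 Appendix B; the function name `split_reference` is hypothetical. Because each optional group either participates or not, it can distinguish an absent query (`None`) from an empty one (`""`).

```python
import re

# RFC 3986, Appendix B: regular expression for breaking down a
# well-formed URI reference into its components.
_URI_REFERENCE = re.compile(
    r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")


def split_reference(ref):
    """Return (scheme, authority, path, query, fragment) for a URI reference.

    Absent components come back as None, except the path, which per
    RFC 3986 section 3.3 is always defined (possibly empty).
    """
    m = _URI_REFERENCE.match(ref)  # every group is optional: always matches
    return m.group(2), m.group(4), m.group(5), m.group(7), m.group(9)
```

Using the RFC's own example, `split_reference("http://www.ics.uci.edu/pub/ietf/uri/#Related")` gives `("http", "www.ics.uci.edu", "/pub/ietf/uri/", None, "Related")`.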

Finally, just BTW:

http://en.wikipedia.org/wiki/Uniform_Resource_Identifier
"""
The contemporary point of view among the working group that
oversees URIs is that the terms URL and URN are
context-dependent aspects of URI and rarely need to be
distinguished.
"""

Heh, spot on!  Still, like I said, I agree terms like "URI
reference" deserve to be adopted.
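
Adopting the "URI reference" terminology goes hand in hand with RFC 3986's reference-resolution algorithm (section 5.2). As one illustrative piece of it, the sketch below transcribes the `remove_dot_segments` routine from section 5.2.4; the function name mirrors the RFC's pseudocode, and the first two test inputs are examples taken from that section. Each output-buffer entry keeps its leading "/", so removing "the last segment and its preceding /" is a single `pop()`.

```python
def remove_dot_segments(path):
    """RFC 3986 section 5.2.4: remove "." and ".." segments from a path."""
    output = []  # each entry is one segment, including its leading "/" if any
    while path:
        if path.startswith("../"):        # step 2A
            path = path[3:]
        elif path.startswith("./"):       # step 2A
            path = path[2:]
        elif path.startswith("/./"):      # step 2B: "/./x" -> "/x"
            path = path[2:]
        elif path == "/.":                # step 2B
            path = "/"
        elif path.startswith("/../"):     # step 2C: "/../x" -> "/x"
            path = path[3:]
            if output:
                output.pop()
        elif path == "/..":               # step 2C
            path = "/"
            if output:
                output.pop()
        elif path in (".", ".."):         # step 2D
            path = ""
        else:                             # step 2E: move first segment to output
            start = 1 if path.startswith("/") else 0
            end = path.find("/", start)
            if end == -1:
                end = len(path)
            output.append(path[:end])
            path = path[end:]
    return "".join(output)
```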

----------------------------------------------------------------------

Comment By: John J Lee (jjlee)
Date: 2006-04-01 23:07

Message:
Logged In: YES 
user_id=261020

This certainly seems needed (though I still haven't properly
read RFCs 3986 and 3987, and I'm not sure how IRIs fit in
with everything else).  Perhaps a bit late for 2.5.

-1 on the name: makes it seem the difference between
urlparse and uriparse is something to do with the already
murky distinction between URIs and URLs.  How about rfc3986?
Prosaic, but hits the nail on the head.

Must read those RFCs and review this...

----------------------------------------------------------------------
