[Patches] [ python-Patches-1462525 ] URI parsing library

Sun Apr 2 02:20:20 CEST 2006

Patches item #1462525, was opened at 2006-04-01 04:30
Message generated for change (Comment added) made by jjlee
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1462525&group_id=5470

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Library (Lib)
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Paul Jimenez (paulj)
Assigned to: Nobody/Anonymous (nobody)
Summary: URI parsing library

Initial Comment:
Per the original discussion at
http://mail.python.org/pipermail/python-dev/2005-November/058301.html
I'm submitting a library meant to deprecate the
existing urlparse library.  Questions and comments welcome.

----------------------------------------------------------------------

Comment By: John J Lee (jjlee)
Date: 2006-04-02 01:20

Message:

Some mostly-stylistic / minor comments on the patch from a
quick skim (I hope to post some comments on the trickier
issues later):

Follow PEP 8.  Some issues I noticed:

- Inconsistent use of case: URI vs. Uri.
- Triple-quoted docstrings should use """ rather than ''' for
editor-friendliness.
- Strings should not be abused as comments: If you mean to
use a docstring, use a docstring; otherwise, use a comment
(I'm referring here to your use of strings immediately
*before* def statements).
- import usage like import posixpath as ppath is usually
frowned upon: just import posixpath.
- Use of whitespace in e.g. dict displays and listcomps is
non-standard.  [x for x in y], not [ x for x in y ]
- Indentation in docstrings is non-standard.
- Docstring-writing conventions are non-standard.
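
To make the PEP 8 points above concrete, here is a minimal sketch.
The function normalize_path is hypothetical and is not taken from the
patch; it only illustrates the import, comment, docstring and
whitespace conventions listed above.

```python
import posixpath  # plain import, no "as ppath" alias


# A real comment (not a bare string) describes code ahead of a def.
def normalize_path(path):
    """Return path with "." and ".." segments collapsed.

    Hypothetical helper for illustration only; not part of the patch.
    """
    # No padding inside the brackets of a list comprehension.
    segments = [seg for seg in path.split("/") if seg != "."]
    return posixpath.normpath("/".join(segments))
```

For example, normalize_path("/a/./b/../c") gives "/a/c".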

Other things:

- Having read your original python-dev post, I still think
UrlParser / URIParser could be simpler.  I'll try and supply
an actual suggested patch later.
- MailToURIParser appears to support a different interface
to all the others.  If this is really necessary for
standards or pragmatic reasons, those parse and unparse
methods should just be separate functions.
- Documentation for the module is missing.  This would
document the API and perhaps briefly explain the background
(what's changed to require this new module) and correct
usage, briefly explaining terms like "URI reference".  Some
well-chosen examples are always good, of course.
- The tests should go in a separate module
test/test_<modulename>.py and follow the conventions there.
- Would be very nice to explicitly reference RFC 3986
section numbers in the code.  I'll try and do this when I
review it properly.
- Use of URI vs. URL distinction is incorrect.
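
As a rough illustration of what referencing RFC 3986 section numbers
in the code could look like, here is a sketch built on the regular
expression given in Appendix B of that RFC.  The names
split_uri_reference and _URI_REFERENCE are my own assumptions, not
the patch's API.

```python
import re

# RFC 3986, Appendix B: regular expression that splits a URI reference
# into its five components.
_URI_REFERENCE = re.compile(
    r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?"
)


def split_uri_reference(uri_ref):
    """Split a URI reference into its five components.

    Returns (scheme, authority, path, query, fragment).  Absent
    components are None; the path is always a string, possibly empty.
    Hypothetical name, for illustration only.
    """
    # Every group in the pattern is optional, so match() always succeeds.
    match = _URI_REFERENCE.match(uri_ref)
    return (
        match.group(2),  # scheme, RFC 3986 section 3.1
        match.group(4),  # authority, section 3.2
        match.group(5),  # path, section 3.3
        match.group(7),  # query, section 3.4
        match.group(9),  # fragment, section 3.5
    )
```

A relative reference such as "../g?y" comes back with no scheme or
authority, which is the kind of case a separate
test/test_<modulename>.py module would want to cover.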

Finally, just BTW:

http://en.wikipedia.org/wiki/Uniform_Resource_Identifier
"""
The contemporary point of view among the working group that
oversees URIs is that the terms URL and URN are
context-dependent aspects of URI and rarely need to be
distinguished.
"""

Heh, spot on!  Still, like I said, I agree terms like "URI
reference" deserve to be adopted.

----------------------------------------------------------------------

Comment By: John J Lee (jjlee)
Date: 2006-04-02 00:07

Message:

This certainly seems needed (though I still haven't properly
read RFCs 3986 and 3987, and I'm not sure how IRIs fit in with
everything else).  Perhaps a bit late for 2.5.

-1 on the name: makes it seem the difference between
urlparse and uriparse is something to do with the already
murky distinction between URIs and URLs.  How about rfc3986?
Prosaic, but hits the nail on the head.

Must read those RFCs and review this...

----------------------------------------------------------------------
