[XML-SIG] Ideas for web/ package

Fred L. Drake, Jr. fdrake@acm.org
Fri, 15 Feb 2002 13:14:59 -0500


Andrew Kuchling writes:
 > As part of the RELAX NG stuff, I've discovered that urlparse() is
 > really lenient in its parsing.  For example, the fragment value is ''
 > if no fragment is supplied, so you can't distinguish between
 > http://www.amk.ca and http://www.amk.ca# .  Unfortunately this can't

It's not clear that the distinction is meaningful in the RFC, as best
I can recall (it's been a couple of months since I looked at it).

 > really be fixed without changing the API of urlparse() and breaking
 > old code.

That's a big issue.  I added some new functions in Python 2.2
(urlsplit() and urlunsplit()), but they won't address your concern
about fragments.
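A small illustration of the ambiguity, written against today's Python 3
`urllib.parse` (in Python 2.2 the same functions lived in the `urlparse`
module). Both URLs yield an empty fragment, so the result cannot tell
them apart, and round-tripping drops the trailing '#':

```python
from urllib.parse import urlsplit, urlunsplit

with_hash = urlsplit("http://www.amk.ca#")
without_hash = urlsplit("http://www.amk.ca")

# Both report fragment == '', so the two results compare equal.
print(with_hash.fragment == without_hash.fragment)  # True
print(with_hash == without_hash)                    # True

# Reassembling the parts loses the '#' entirely.
print(urlunsplit(with_hash))                        # http://www.amk.ca
```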

 > 1) a stricter URL parser, and

You'll have to be more specific about requirements than this!  You're
asking for lexical information about the URL rather than logical
information; I'm not sure that's even come up before.
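One possible reading of "stricter" is a parser that keeps the lexical
information: None when a delimiter is absent, '' when it is present but
empty. A minimal sketch, assuming the generic-syntax regular expression
from RFC 3986 Appendix B (RFC 2396 Appendix B gives the same pattern) is
an acceptable basis; `split_lexical` and `LexicalURL` are hypothetical
names, and nothing here validates the contents of each component:

```python
import re
from typing import NamedTuple, Optional

# Generic URI regex from RFC 3986, Appendix B. Groups 2, 4, 5, 7 and 9 are
# scheme, authority, path, query and fragment; an unmatched optional group
# is None, which is exactly the "delimiter absent" signal we want to keep.
URI_RE = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")


class LexicalURL(NamedTuple):
    scheme: Optional[str]
    authority: Optional[str]
    path: str
    query: Optional[str]
    fragment: Optional[str]


def split_lexical(url: str) -> LexicalURL:
    """Hypothetical strict splitter: None means the delimiter was missing."""
    m = URI_RE.match(url)
    # Every group is optional, so the pattern matches any string.
    return LexicalURL(m.group(2), m.group(4), m.group(5), m.group(7), m.group(9))


print(split_lexical("http://www.amk.ca#").fragment)  # ''
print(split_lexical("http://www.amk.ca").fragment)   # None
```

Because the result is a new type rather than a change to urlparse(),
existing callers are unaffected.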

 > 2) the skeleton of a Web client that
 > handles cookies and caching sensibly (so you could write
 > screen-scraping applications on top of it).

This would be *really* nice to have!
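A rough skeleton of such a client, sketched with today's Python 3
standard library (`http.cookiejar.CookieJar` and
`urllib.request.HTTPCookieProcessor`). `CachingClient` is a hypothetical
name, and its cache is a naive in-memory dict that only revalidates with
ETag / Last-Modified; it does not honor Cache-Control or expiry the way a
real HTTP cache would:

```python
import urllib.error
import urllib.request
from http.cookiejar import CookieJar


class CachingClient:
    """Hypothetical skeleton: cookies persist across requests, and responses
    with validators are cached in memory and revalidated with conditional GETs."""

    def __init__(self):
        self.cookies = CookieJar()
        # HTTPCookieProcessor stores Set-Cookie headers in the jar and
        # sends matching cookies back on later requests.
        self.opener = urllib.request.build_opener(
            urllib.request.HTTPCookieProcessor(self.cookies))
        self.cache = {}  # url -> (validator headers, body bytes)

    def conditional_headers(self, url):
        """Build If-None-Match / If-Modified-Since from cached validators."""
        validators = self.cache.get(url, ({}, b""))[0]
        headers = {}
        if "ETag" in validators:
            headers["If-None-Match"] = validators["ETag"]
        if "Last-Modified" in validators:
            headers["If-Modified-Since"] = validators["Last-Modified"]
        return headers

    def get(self, url):
        req = urllib.request.Request(url, headers=self.conditional_headers(url))
        try:
            with self.opener.open(req) as resp:
                body = resp.read()
                validators = {}
                for name in ("ETag", "Last-Modified"):
                    value = resp.headers.get(name)
                    if value is not None:
                        validators[name] = value
                # Only responses that can be revalidated are cached.
                if validators:
                    self.cache[url] = (validators, body)
                return body
        except urllib.error.HTTPError as err:
            # urllib reports 304 Not Modified as an HTTPError.
            if err.code == 304 and url in self.cache:
                return self.cache[url][1]
            raise
```

A screen-scraping application would call `client.get(url)` repeatedly;
cookies ride along automatically, and unchanged pages come back from the
cache after a cheap 304 round trip.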


  -Fred

-- 
Fred L. Drake, Jr.  <fdrake at acm.org>
PythonLabs at Zope Corporation