[Web-SIG] [stdlib-sig] Choosing one of two options for url* in the stdlib reorg

Sun Mar 2 14:48:36 CET 2008

On 2008-03-01 21:13, Brett Cannon wrote:
> On Sat, Mar 1, 2008 at 4:34 AM, M.-A. Lemburg <mal at egenix.com> wrote:
>> On 2008-03-01 05:06, Brett Cannon wrote:
>>  > Seriously, I just don't want to support two different approaches to
>>  > the same problem.
>>
>>  Then what makes you believe that the urllib2 approach is the
>>  better one ?
>>
>>  Why not move urllib2 to PyPI and keep urllib ?
>>
> 
> Well, I have personal experience where urllib2 was much easier to use
> for some custom fetching than urllib.
> 
> But I get your point. If it comes down to preference then your
> argument is to choose the one that is used more widely.

Right.

I also believe that having a choice is more useful than trying
to invent the One Right Way. One Right Way may exist for simple
problems, but as soon as things get more complicated, limiting
yourself to just one path on the search tree is bound to cause problems.

>>  >>  It's not really an argument for dropping the more used module in
>>  >>  favor of a different module without any real benefit.
>>  >
>>  > Benefit to old users, no. Benefit to the developers, definitely.
>>  > Benefit to new users, yes as there will be less to deal with.
>>
>>  Same question as above.
>>
>>
>>  >>  You have to ask yourself whether
>>  >>  it's ok to ask the maintainers of those ~1000 code modules
>>  >>  using urllib for subclassing from the two main classes
>>  >>  URLopener and FancyURLopener to download an external dependency
>>  >>  from PyPI or ship the module with their code.
>>  >
>>  > Well, I obviously think it is.
>>
>>  Please explain. I have yet to see a single comment explaining why
>>  urllib2 would be the better choice - if there's really a need to
>>  decide (which I don't think there really is).
>>
>>  If you can put up some sound arguments for why urllib2 is better
>>  than urllib, we could move the discussion forward. If not, then
>>  I don't really see any benefit in having the discussion at all.
> 
> Well, look at the docs for urllib. There is a list of restrictions
> (e.g., does not support the use of proxies which require
> authentication). From what I can tell, those items on the list that
> are an actual restriction do not carry over to urllib2. 

I'm not sure I follow you: urllib *does* support proxies that
require authentication (see the .open_http() method).

> Another thing,
> how do you add a custom line to the header for the request in urllib
> (e.g., Referer)? The docs for URLOpener don't seem to provide a way.
> urllib2, on the other hand, has a very specific way to add headers.

That's easy:

import urllib

class URLReader(urllib.URLopener):

    # Crawler name
    agentname = 'mxHTMLTools-Crawler'

    def __init__(self, *args, **kws):

        """ Add a user-agent header to the HTTP requests.
        """
        urllib.URLopener.__init__(self, *args, **kws)
        # Override the default settings for self.addheaders
        # (the base class installs a single default entry):
        assert len(self.addheaders) == 1
        # HTMLTools.__version__ comes from the surrounding module
        # (import not shown here).
        self.addheaders = [
            ('user-agent', '%s/%s' % (self.agentname, HTMLTools.__version__)),
            ]
    ...
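
For comparison, the urllib2 way of adding a header that the quoted
text refers to could look roughly like the sketch below. It is not
taken from this thread: the URLs and the build_request helper are
made up for illustration, and the import fallback is only there so
the same lines also run on Python 3, where urllib2.Request is
available as urllib.request.Request.

```python
# Sketch only: urllib2 is the Python 2 module name; on Python 3 the
# same Request class lives in urllib.request.
try:
    import urllib2
except ImportError:
    import urllib.request as urllib2

# Hypothetical URLs, used purely for illustration.
PAGE_URL = 'http://www.example.com/page.html'
REFERER_URL = 'http://www.example.com/index.html'


def build_request(url, referer):
    """Return a Request object carrying a custom Referer header."""
    request = urllib2.Request(url)
    request.add_header('Referer', referer)
    return request


request = build_request(PAGE_URL, REFERER_URL)
# Nothing is fetched here; the header is only sent once the object
# is handed to urllib2.urlopen(request).
```

In both cases the header ends up attached before the request is
sent. The difference is where it lives: on the opener (addheaders)
in urllib, on the individual request object in urllib2.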

> But as I said in my last email, I am happy to include URLOpener if
> some other people are willing to back the idea up.

Fair enough.

-- 
Marc-Andre Lemburg
eGenix.com
