data: protocol

Robin Becker robin at reportlab.com
Thu May 8 06:34:01 EDT 2014


On 08/05/2014 04:46, Steven D'Aprano wrote:
> On Wed, 07 May 2014 11:42:24 +0100, Robin Becker wrote:
>
>> I have an outstanding request for ReportLab to allow images to be opened
>> using the data: scheme. That used to be supported in python 2.7 using
>> urllib, but in python 3.3 urllib2 --> urllib and at least the default
>> urlopener doesn't support data:
>
>
> It looks like you intended to show an example, but left it out.
>
>> Is there a way to use the residual legacy of the old urllib code that's
>> now in urllib.URLopener to open unusual schemes? I know it can be used
>> directly eg
>>
>> urllib.request.URLopener().open('data:.........')
>>
>> but that seems to leave the splitting & testing logic up to me when it
>> logically belongs in some central place ie urllib.request.urlopen.
>
> You may need to explain in a little more detail. When you say "splitting
> and testing", what are you splitting and testing? It may also help if you
> show some Python 2.7 code that works, and what happens in 3.3.
>
>
OK. In 3.3, urllib.request.urlopen cannot open a data: URL like this:

C:\code-trunk\hg-repos\reportlab\tests>\python33\python.exe
Python 3.3.3 (v3.3.3:c3896275c0f6, Nov 18 2013, 21:18:40) [MSC v.1600 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
 >>> import urllib.request
 >>> urllib.request.urlopen('data:image/gif;base64,R0lGODdhAQABAIAAAP///////ywAAAAAAQABAAACAkQBADs=').read()
Traceback (most recent call last):
   File "<stdin>", line 1, in <module>
   File "C:\python33\lib\urllib\request.py", line 156, in urlopen
     return opener.open(url, data, timeout)
   File "C:\python33\lib\urllib\request.py", line 469, in open
     response = self._open(req, data)
   File "C:\python33\lib\urllib\request.py", line 492, in _open
     'unknown_open', req)
   File "C:\python33\lib\urllib\request.py", line 447, in _call_chain
     result = func(*args)
   File "C:\python33\lib\urllib\request.py", line 1310, in unknown_open
     raise URLError('unknown url type: %s' % type)
urllib.error.URLError: <urlopen error unknown url type: data>
 >>>

In Python 2.7 one can do:

C:\tmp>python
Python 2.7.6 (default, Nov 10 2013, 19:24:18) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
 >>> import urllib
 >>> data=urllib.urlopen('data:image/gif;base64,R0lGODdhAQABAIAAAP///////ywAAAAAAQABAAACAkQBADs=').read()
 >>> len(data)
35
 >>>

and, as Ian Kelly indicated, this also works in 3.4:

C:\tmp>\python34\python.exe
Python 3.4.0 (v3.4.0:04f714765c13, Mar 16 2014, 19:24:06) [MSC v.1600 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
 >>> import urllib.request
 >>> data=urllib.request.urlopen('data:image/gif;base64,R0lGODdhAQABAIAAAP///////ywAAAAAAQABAAACAkQBADs=').read()
 >>> len(data)
35



In 3.3 we still have the legacy URLopener class. However, when I use that I see this:

C:\code-trunk\hg-repos\reportlab\tests>\python33\python.exe
Python 3.3.3 (v3.3.3:c3896275c0f6, Nov 18 2013, 21:18:40) [MSC v.1600 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
 >>> from urllib.request import URLopener
 >>> data = URLopener().open('data:image/gif;base64,R0lGODdhAQABAIAAAP///////ywAAAAAAQABAAACAkQBADs=').read()
 >>> len(data)
115
 >>> data
'Date: Thu, 08 May 2014 10:21:45 GMT\nContent-type: image/gif\nContent-Length: 35\n\nGIF87a\x01\x00\x01\x00\x80\x00\x00ÿÿÿÿÿÿ,\x00\x00\x00\x00\x01\x00\x01\x00\x00\x02\x02D\x01\x00;'
 >>>

So I now seem to get the real data preceded by some headers. I think this
differs from what is expected, but that code is labelled old/deprecated and
may be going away.
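
For illustration only, this is roughly what splitting the headers off that
result would involve. It is a sketch that assumes the returned str maps one
character to one byte (latin-1), which matches the lengths in the transcript
above (115 characters in total, 35 of them payload); the names raw, head and
payload are mine.

```python
# The str returned by URLopener().open(...).read() in the 3.3 session above,
# written out with explicit \xff escapes for the six non-ASCII characters.
raw = ('Date: Thu, 08 May 2014 10:21:45 GMT\n'
       'Content-type: image/gif\n'
       'Content-Length: 35\n'
       '\n'
       'GIF87a\x01\x00\x01\x00\x80\x00\x00\xff\xff\xff\xff\xff\xff,'
       '\x00\x00\x00\x00\x01\x00\x01\x00\x00\x02\x02D\x01\x00;')

# The headers end at the first blank line; everything after it is the body.
head, _, body = raw.partition('\n\n')

# Assumption: one character per byte, so latin-1 recovers the original bytes.
payload = body.encode('latin-1')
```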

Since urllib.request.urlopen doesn't handle data: in 3.3, I've had to write a
small stub for the special data: case. Splitting the headers off the URLopener
result seems harder than just handling the special case directly.
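
A minimal sketch of what such a stub could look like (not the actual ReportLab
code; read_data_url is a hypothetical name). It assumes the URL has the usual
data:[<mediatype>][;base64],<payload> shape and returns only the payload bytes,
ignoring the media type.

```python
import base64
from urllib.parse import unquote_to_bytes


def read_data_url(url):
    """Return the payload bytes of a data: URL (hypothetical helper)."""
    scheme, sep, rest = url.partition(':')
    if not sep or scheme.lower() != 'data':
        raise ValueError('not a data: URL: %r' % url)

    # Everything before the first comma describes the payload.
    header, sep, payload = rest.partition(',')
    if not sep:
        raise ValueError('data: URL has no comma separator')

    # The payload may be percent-encoded in either form.
    data = unquote_to_bytes(payload)
    if header.lower().endswith(';base64'):
        data = base64.b64decode(data)
    return data
```

With the GIF used above, read_data_url(...) should give the same 35 bytes that
urlopen(...).read() gives in 2.7 and 3.4.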

However, there are a lot of these 'schemes', so should I be doing this sort of
thing at all? Apparently it has taken 4 versions of Python for urllib (in 3.4)
to do this, so it's not clear to me whether all schemes are supposed to hang
off urllib.request.urlopen. Perhaps, instead of special-casing data: for 3.3,
I should have written a handler for it and injected that into my opener (or
possibly the default opener). Writing the handler means I do have to deal with
the headers, whereas my stub just returns the data bytes.
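
For comparison, a sketch of the handler route, assuming the standard
BaseHandler convention that a method named <scheme>_open is called for that
scheme. DataURLHandler is a hypothetical name, and the header handling is
deliberately minimal.

```python
import base64
import email
import io
import urllib.request
from urllib.error import URLError
from urllib.parse import unquote_to_bytes
from urllib.response import addinfourl


class DataURLHandler(urllib.request.BaseHandler):
    """Hypothetical handler that teaches an opener the data: scheme."""

    # Assumption: sort ahead of the default handlers (order 500) so this one
    # is tried first even where a built-in data: handler exists (3.4+).
    handler_order = 400

    def data_open(self, req):
        url = req.full_url
        header, sep, payload = url.partition(':')[2].partition(',')
        if not sep:
            raise URLError('data: URL has no comma separator')

        data = unquote_to_bytes(payload)
        mediatype = header
        if header.lower().endswith(';base64'):
            data = base64.b64decode(data)
            mediatype = header[:-len(';base64')]
        mediatype = mediatype or 'text/plain;charset=US-ASCII'

        # This is the "headers stuff": build a message object for info().
        headers = email.message_from_string(
            'Content-type: %s\nContent-length: %d\n' % (mediatype, len(data)))
        return addinfourl(io.BytesIO(data), headers, url)


# A private opener; urllib.request.install_opener(opener) would instead make
# it the default used by urllib.request.urlopen.
opener = urllib.request.build_opener(DataURLHandler)
```

opener.open('data:...').read() then returns just the payload, and the media
type is available through the response's info().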
-- 
Robin Becker


