urllib.request giving unexpected results

Chris Angelico rosuav at gmail.com
Wed Nov 16 03:24:21 EST 2016


On Wed, Nov 16, 2016 at 7:09 PM, Steven D'Aprano
<steve+comp.lang.python at pearwood.info> wrote:
> I'm trying to download a file using urllib.request and pipe it straight to an
> external process. On Linux systems, the following is a test file that
> demonstrates the problem:
>
>
> --- cut ---
>
> #!/usr/bin/python3.5
>
> import urllib.request
> import subprocess
>
> TEST_URL = 'https://www.irs.gov/pub/irs-prior/f1040--1864.pdf'
>
> with urllib.request.urlopen(TEST_URL) as f:
>     data = subprocess.check_output(['file', '-'], stdin=f)
>     print(data)

Interesting.

rosuav@sikorsky:~$ python3
Python 3.7.0a0 (default:72e64fc8746b+, Oct 28 2016, 12:35:28)
[GCC 6.2.0 20161010] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import urllib.request
>>> import subprocess
>>> TEST_URL = 'https://www.irs.gov/pub/irs-prior/f1040--1864.pdf'
>>> with urllib.request.urlopen(TEST_URL) as f:
...    data = subprocess.check_output(['tee', 'tmp/asdfasdf'], stdin=f)
...

rosuav@sikorsky:~/tmp$ hd asdfasdf | head
00000000  17 03 03 40 18 e9 b0 79  7c 03 c8 5d 21 40 2f 11  |...@...y|..]!@/.|
00000010  4a a3 f1 4d e0 19 04 fc  42 84 d9 cf 59 0b f8 56  |J..M....B...Y..V|
00000020  7d 35 08 88 17 50 24 8c  26 fe d8 13 2b fd 14 55  |}5...P$.&...+..U|
00000030  16 81 c3 1e 13 ae 00 1d  d4 8e 9f 0f a4 19 bb 44  |...............D|
00000040  46 d5 bf 25 28 d0 b0 23  44 6f 1c ef 84 d9 82 9b  |F..%(..#Do......|
00000050  17 15 3a 11 e1 ec de 59  65 d7 ea 41 dc 53 07 70  |..:....Ye..A.S.p|
00000060  99 d5 11 75 b7 90 7e cd  46 b5 67 ee 9a 62 18 63  |...u..~.F.g..b.c|
00000070  36 7f 7b df a1 fb 6d b8  66 8b 2f 82 e6 05 7e aa  |6.{...m.f./...~.|
00000080  d7 9f 9e 05 cf 06 68 6b  c8 4c df 5e 24 9d 92 f6  |......hk.L.^$...|
00000090  3d 53 76 11 c1 70 05 14  94 e5 5b ec b0 cf 64 70  |=Sv..p....[...dp|
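For reference, here is a short Python sketch that decodes the first five bytes of that dump as a TLS record header: one byte of content type, two bytes of protocol version, and a two-byte big-endian length. Reading the bytes this way is an assumption on the editor's part, not something established in the thread. The `parse_tls_record_header` helper and the name table are illustrative only and cover just the four classic content types.

```python
import struct

# Names for the TLS record content types (values from the TLS specification).
CONTENT_TYPES = {20: "change_cipher_spec", 21: "alert",
                 22: "handshake", 23: "application_data"}

def parse_tls_record_header(data: bytes):
    """Split a 5-byte TLS record header into (content type, version, length)."""
    # 1-byte type, 1-byte major version, 1-byte minor version,
    # 2-byte big-endian payload length ("!" = network byte order)
    content_type, major, minor, length = struct.unpack("!BBBH", data[:5])
    return CONTENT_TYPES.get(content_type, "unknown"), (major, minor), length

# The first five bytes of the hd output above
print(parse_tls_record_header(bytes.fromhex("17 03 03 40 18")))
# → ('application_data', (3, 3), 16408)
```

Version (3, 3) is the record-layer value used by TLS 1.2. A length just over 16 KiB is about what a full record of encrypted application data would carry, which suggests the dump is ciphertext.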


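So that's what file(1) is seeing. My guess is that a urlopen object
isn't "file-like" enough for subprocess. subprocess doesn't call its
read() method; it takes the object's file descriptor via fileno() and
hands that to the child. For an HTTPS URL that descriptor is presumably
the socket itself, so file(1) would be reading the raw, still-encrypted
TLS stream rather than the PDF. The leading 17 03 03 bytes do look like
a TLS record header.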
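If that guess is right, one workaround is to do the reading in Python and hand the child plain bytes. This is a minimal sketch that assumes the whole file fits in memory. `pipe_to_command` and the `ECHO` stand-in child are hypothetical names used for illustration, and `io.BytesIO` stands in for the urlopen response so that the sketch runs without network access. It also shows the fileno() behaviour directly: an object that has read() but no file descriptor is rejected outright.

```python
import io
import subprocess
import sys

def pipe_to_command(response, cmd):
    """Feed the body of a file-like object to cmd and return cmd's stdout.

    Reads the whole body into memory first (assumes it fits), so the child
    sees exactly what response.read() returns, not the underlying raw fd.
    """
    return subprocess.check_output(cmd, input=response.read())

# Hypothetical stand-in child process: copies its stdin to its stdout.
ECHO = [sys.executable, "-c",
        "import sys; sys.stdout.buffer.write(sys.stdin.buffer.read())"]

# subprocess never calls .read() on a stdin object; it calls .fileno() and
# passes that descriptor to the child. BytesIO has no descriptor at all:
try:
    subprocess.check_output(ECHO, stdin=io.BytesIO(b"%PDF-1.4"))
except io.UnsupportedOperation as exc:
    print("stdin= needs a real file descriptor:", exc)

# Going through Python's read() gives the child the plain body bytes.
print(pipe_to_command(io.BytesIO(b"%PDF-1.4"), ECHO))  # b'%PDF-1.4'
```

With Steven's test, this would become `data = pipe_to_command(f, ['file', '-'])` inside the same `with` block. For files too large to buffer, a variant could start the child with `stdin=subprocess.PIPE`, copy the response across with `shutil.copyfileobj(f, proc.stdin)`, then close `proc.stdin` and call `wait()`. That approach is safe only if the child's stdout is not also piped; otherwise both pipes can fill up and deadlock.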

ChrisA



More information about the Python-list mailing list