Reading image dimensions with PIL
Will McGugan
news at NOwillmcguganSPAM.com
Wed May 18 05:02:11 EDT 2005
Dave Brueck wrote:
>
>
> If you're tossing images that are too _small_, is there any benefit to
> not downloading the whole image, checking it, and then throwing it away?
It's a 'webscraper' app that downloads images based on search criteria.
The user may want only images above 640x480, although the general case
will be something like 200x200 to avoid downloading thumbnails.
>
> Checking just the first 1K probably won't save you too much time unless
> you're over a modem. Are you using a byte-range HTTP request to pull
> down the images or just a normal GET (via e.g. urllib)? If you're not
> using a byte-range request, then all of the data is already on its way
> so maybe you could go ahead and get it all.
I'm not familiar with byte-range requests. Is this a standard feature of
webservers? I know there will be more than one K in the pipeline if I do
a read, but if I close the file object from urllib it will stop the
download if there is data remaining - won't it?
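
For reference, byte-range requests are part of HTTP/1.1: the client sends a Range header and a server that supports it answers with status 206 and only the requested bytes. Servers are allowed to ignore the header and send the whole body with status 200. The following is a minimal sketch in Python 3 (not the Python 2 used elsewhere in this post); the helper names build_range_request and fetch_prefix are made up for illustration.

```python
import urllib.request


def build_range_request(url, first_byte=0, last_byte=1023):
    """Build a GET request for bytes first_byte..last_byte (inclusive)."""
    request = urllib.request.Request(url)
    request.add_header("Range", "bytes=%d-%d" % (first_byte, last_byte))
    return request


def fetch_prefix(url, size=1024):
    """Return (status, data) holding at most the first `size` bytes of url."""
    request = build_range_request(url, 0, size - 1)
    with urllib.request.urlopen(request) as response:
        # 206: the server honoured the range.
        # 200: the server ignored the header and is sending everything,
        # so read only `size` bytes and let the connection close.
        return response.status, response.read(size)
```

Calling fetch_prefix on an image URL would give the first kilobyte to pass to a header parser. Whether this saves anything depends on the server actually honouring the Range header.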
>
> But hey, if your current approach works... :) It _is_ a bit
> unconventional, so to reduce the risk you could test it on a decent mix
> of image types (normal JPEG, progressive JPEG, normal & progressive GIF,
> png, etc.) - just to make sure PIL is able to handle partial data for
> all different types you might encounter.
>
> Also, if PIL can't handle the partial data, can you reliably detect that
> scenario? If so, you could detect that case and use the
> download-it-all-and-check approach as a failsafe.
The PIL code worked with most of the images I threw at it (just JPEGs).
If there was no 'size' attribute then I just continued to download the
entire image. It may have caused a memory leak though: with this code
in, memory usage increased continuously.
Actually, this may all be moot now. Originally I looked at reading the
image dimensions from the JPEG header, but that turned out to be
non-trivial and I gave up. Fortunately I found some Perl code that does
it, and converted it to Python (and I don't even know Perl!). Here's the
code if anyone is interested:
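
import struct

def GetJpegSize(data):
    idata = iter(data)
    width = None
    height = None
    try:
        # A JPEG must start with the SOI marker, 0xFF 0xD8.
        B1 = ord(idata.next())
        B2 = ord(idata.next())
        if B1 != 0xFF or B2 != 0xD8:
            return -1, -1
        while True:
            # Scan to the next marker, skipping any 0xFF fill bytes.
            byte = ord(idata.next())
            while byte != 0xFF:
                byte = ord(idata.next())
            while byte == 0xFF:
                byte = ord(idata.next())
            if byte >= 0xc0 and byte <= 0xc3:
                # Start-of-frame: skip the 2-byte length and the 1-byte
                # precision, then read height and width.
                idata.next()
                idata.next()
                idata.next()
                height, width = struct.unpack( '>HH',
                    "".join(idata.next() for b in range(4)) )
                break
            else:
                # Any other segment: skip its payload using the length.
                offset = struct.unpack('>H', idata.next() +
                    idata.next())[0] - 2
                for _ in xrange(offset):
                    idata.next()
    except (StopIteration, struct.error):
        # Ran out of data. struct.error covers data that ends in the
        # middle of the four dimension bytes.
        pass
    return width, height

if __name__ == "__main__":
    first_k = file("test.jpg","rb").read(1024)
    print GetJpegSize(first_k)
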
Returns (-1, -1) for a non-jpeg, or (None, None) if the size wasn't
contained in the data supplied (some jpegs have embedded thumbnails), or
(width, height) if the dimensions were found.
And the original source: http://wiki.tcl.tk/757
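
For readers on Python 3, the same marker scan can be sketched over a bytes object using indexing instead of an iterator. This is an illustrative port and not part of the original post. The name get_jpeg_size is made up here, and one behaviour differs: input shorter than two bytes is reported as (-1, -1) rather than (None, None).

```python
import struct


def get_jpeg_size(data):
    """Return (width, height), (-1, -1) if data does not start with the
    JPEG SOI marker, or (None, None) if no frame header is in data."""
    if data[:2] != b"\xff\xd8":
        return -1, -1
    pos = 2
    try:
        while True:
            # Scan to the next marker, skipping any 0xFF fill bytes.
            while data[pos] != 0xFF:
                pos += 1
            while data[pos] == 0xFF:
                pos += 1
            marker = data[pos]
            pos += 1  # pos now points at the segment's 2-byte length
            if 0xC0 <= marker <= 0xC3:
                # Start-of-frame: 2-byte length, 1-byte precision,
                # then height and width as big-endian 16-bit values.
                height, width = struct.unpack(">HH", data[pos + 3:pos + 7])
                return width, height
            # Any other segment: the length counts its own two bytes,
            # so adding it lands on the next marker.
            (length,) = struct.unpack(">H", data[pos:pos + 2])
            pos += length
    except (IndexError, struct.error):
        # The supplied data ended before a frame header was found.
        return None, None
```

As with the version above, only start-of-frame markers 0xC0 to 0xC3 are recognised, so a caller should treat (None, None) as "download more and retry".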
Thanks,
Will
--
http://www.willmcgugan.com
"".join( [ {'*':'@','^':'.'}.get(c,None) or chr(97+(ord(c)-84)%26) for c
in "jvyy*jvyyzpthtna^pbz" ] )