[SciPy-user] The IO library and image file formats -- compare with PIL

Zachary Pincus zachary.pincus at yale.edu
Mon Apr 21 11:31:25 EDT 2008


> On 21/04/2008, Zachary Pincus <zachary.pincus at yale.edu> wrote:
>> To answer Stéfan's earlier question of how I see things fitting
>> together, I *think* that the pure-python file format interpretation
>> code could be used (either by importing from PIL or using patched
>> copies as needed) to figure out what a given image file type is, and
>> where in the file the pixels are stored. Then the relevant region of
>> the file would be passed through python.zipfile/deflate/etc. if needed
>> to decompress the pixels, and sent to numpy for unpacking the bits
>> from the string.
>
> So this is the bit that I don't understand.  Those pixel values are
> encoded, so which component do you use to take the data chunk and
> convert it to actual pixel values?

numpy.fromstring takes a byte sequence and unpacks it into an array of  
a specified shape and data type. Most image file formats are just  
different ways of putting byte sequences on disk and specifying how  
they were compressed, if at all. Most formats have either no  
compression, or LZW/Deflate/zlib-style compression, for which there  
are already python libraries.
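
As a rough sketch of that unpacking step (not code from this thread): the
byte values, image size, and variable names below are made up for
illustration, and numpy.frombuffer is used because current numpy releases
deprecate numpy.fromstring for binary data.

```python
import numpy as np

# Hypothetical raw pixel data: a 2x3 8-bit grayscale image, stored row by row.
raw = bytes([0, 50, 100, 150, 200, 250])
height, width = 2, 3

# frombuffer reads the bytes as a flat array of the given dtype;
# reshape then imposes the image geometry taken from the file header.
pixels = np.frombuffer(raw, dtype=np.uint8).reshape(height, width)

# Multi-byte samples need an explicit byte order. Here, two 16-bit
# big-endian samples ('>u2') packed into four bytes.
raw16 = bytes([0x01, 0x00, 0x00, 0x02])
pixels16 = np.frombuffer(raw16, dtype='>u2').reshape(1, 2)
```

The arrays returned by frombuffer are read-only views of the input bytes;
calling .copy() gives a writable array if one is needed.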

So for example, reading a TIFF file would consist of looking at the  
header to determine the pixel format, image size, and compression,  
then rooting around in the file to assemble the relevant bytes, then  
running that through deflate (most often), and passing the resulting  
string to numpy.fromstring. Same for PNG, or most anything that's not  
JPEG. Writing is similar.
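
The decompress-then-unpack step might look something like the sketch below.
It is only an illustration under simplifying assumptions: decode_pixels is a
hypothetical helper, the "file" is just a zlib-compressed buffer made up on
the spot, and the header parsing (plus details such as PNG's per-scanline
filtering) that a real reader would need is left out entirely.

```python
import zlib

import numpy as np


def decode_pixels(compressed, shape, dtype):
    """Inflate a zlib-compressed pixel block and view it as an array.

    Hypothetical helper: shape and dtype would normally come from
    parsing the image file header, which is not shown here.
    """
    raw = zlib.decompress(compressed)
    return np.frombuffer(raw, dtype=dtype).reshape(shape)


# Stand-in for the compressed pixel region of an image file.
original = np.arange(12, dtype=np.uint8).reshape(3, 4)
compressed = zlib.compress(original.tobytes())

# Reading: inflate the bytes, then let numpy unpack them.
decoded = decode_pixels(compressed, original.shape, original.dtype)
```

Writing would run the same steps in reverse, as the last two lines of the
setup already do: array.tobytes() followed by zlib.compress().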

Again, what I'm imagining wouldn't be a full-featured image IO  
library, but something lightweight with no dependencies outside of
numpy, and potentially (if JPEG decoding isn't desired) no
C-extensions. (One could conceivably use numpy to do JPEG encoding and
decoding, but I've no interest in doing that...)

This is all just an idea, and I'm not convinced it's a great
idea. But I just wanted to put the suggestion out there...

Zach

