[Image-SIG] PIL and 16-bit image types -- an offer of help

Thu Nov 30 03:51:50 CET 2006

Hello folks,

SUMMARY:
I am offering to implement fixes in PIL to make its handling of 16-bit
image types less vexing. Are the PIL maintainers interested in
such help? I have a few questions to ask, and then I could get to
work if a maintainer would give me a preliminary go-ahead.

THE FULL STORY:
I'm trying to use PIL for a scientific imaging library.
Unfortunately, many of the images I need to deal with are 16-bit
grayscale files, an image mode that PIL does not handle well.
Thus I will need either to write work-arounds in my code or to
fix PIL itself. I would prefer to do the latter.

I previously sent a patch to this list to allow for proper reading of
16-bit TIFF files; however, that patch is very simple-minded, and I
think that with a few more changes, PIL could support 16-bit
grayscale files in a much better way. However, there's a major
philosophical issue that needs to be addressed first!

Specifically, does mode 'I;16' mean '16-bit little-endian integers'  
or '16-bit native integers'? In practice the code uses the former  
definition; however, Imaging.h declares the latter to be the case.
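
To make the ambiguity concrete, here is a minimal sketch in plain Python
(standard library only; it is not PIL code). It decodes the same two bytes
under both readings. On a little-endian host the two readings agree, which
is probably why the discrepancy is easy to miss; on a big-endian host they
do not.

```python
import struct
import sys

# Two bytes of a single 16-bit pixel, as they might sit in a file.
raw = bytes([0x34, 0x12])

little = struct.unpack("<H", raw)[0]  # read as little-endian
big = struct.unpack(">H", raw)[0]     # read as big-endian
native = struct.unpack("=H", raw)[0]  # read in the host's own byte order

print(sys.byteorder, hex(little), hex(big), hex(native))
```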

The real issue is that all other multi-byte types like 'I' and 'F'
are stored in native byte order in memory, regardless of how they are
read in. Thus, both users and developers are insulated from the
question of byte ordering. (Except that developers still need to care
about endianness at serialization time.)

However, 16-bit image types are not so insulated, which is what makes
them so vexing in PIL. This doubles the work of adding 16-bit
support to any image function, because you have to write the support
twice, once for each endianness. Also, writing a function to deal
with a particular byte ordering which may or may not be native is
both error-prone and inefficient.
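
The alternative to writing everything twice can be sketched as follows:
swap to native order once when the data is read, and then write each
operation a single time. This is only an illustration in plain Python; the
helper names to_native_u16 and invert_u16 are hypothetical and are not part
of PIL, and I am assuming that array type code "H" is two bytes wide, which
holds on common platforms.

```python
import sys
from array import array


def to_native_u16(data, file_byteorder):
    """Decode raw 16-bit samples into a native-order array (hypothetical helper)."""
    pixels = array("H")
    assert pixels.itemsize == 2  # assumption: unsigned short is 16 bits here
    pixels.frombytes(data)
    # Swap only when the file's byte order differs from the host's.
    if file_byteorder != sys.byteorder:
        pixels.byteswap()
    return pixels


def invert_u16(pixels):
    """A sample operation written once, with no knowledge of byte order."""
    return array("H", (0xFFFF - p for p in pixels))


# The same two pixel values (1 and 65535) stored in each byte order.
from_le = to_native_u16(bytes([0x01, 0x00, 0xFF, 0xFF]), "little")
from_be = to_native_u16(bytes([0x00, 0x01, 0xFF, 0xFF]), "big")
```

Once both inputs are in native order, invert_u16 works on either without
caring where the data came from.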

The real problem is just one of nomenclature. Pack.c and Unpack.c
make a distinction between 'raw modes' like 'I;32' (which implicitly
means 32-bit little-endian) and normal user-level 'modes' like
'I' (which means 32-bit native-endian). However, the use of 'I;16' as
a user-level image mode has clouded the issue, because even at the
user level it means 'little-endian'. These subtle differences in
meaning cause a lot of the 16-bit manipulation bugs that I've seen in
PIL so far.

I think that Imaging.h is correct in that 'I;16' ought to be treated  
as native byte order when it is used as an image mode (just like 'I'  
is). However, as a raw mode, 'I;16' needs to mean 'little-endian'  
just as 'I;32' means 'little endian'. This change wouldn't be too  
hard to make, and it would be (mostly) backwards compatible.

However, having one name for two different entities (a 'mode' with  
native order and a 'raw mode' with little-endian order) is likely to  
be very confusing and the source of future bugs. It seems like the  
better solution would be to add a new '16-bit unsigned native byte  
order' image type to PIL -- maybe 'S' for 'short' -- and reserve
'I;16' and 'I;16B' strictly for raw modes. The only problem would be
that this would break some older code that relied on these  
'experimental' features.
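
As a rough sketch of that separation (plain Python, not PIL code): raw mode
names pin a byte order, while the in-memory mode is just native integers.
The 'S' mode, the RAW_MODE_FORMATS table and both function names are
hypothetical, and I am assuming here that both raw modes hold unsigned
samples.

```python
import struct

# Raw modes describe a serialized layout, so each one fixes a byte order.
RAW_MODE_FORMATS = {
    "I;16": "<H",   # 16-bit, little-endian (assumed unsigned)
    "I;16B": ">H",  # 16-bit, big-endian (assumed unsigned)
}


def unpack_to_mode_s(raw_mode, data):
    """Unpack raw bytes into plain integers for the hypothetical 'S' mode."""
    order, code = RAW_MODE_FORMATS[raw_mode]
    count = len(data) // 2
    return list(struct.unpack(order + str(count) + code, data[: count * 2]))


def pack_from_mode_s(raw_mode, pixels):
    """Serialize 'S'-mode integers back into the requested raw mode."""
    order, code = RAW_MODE_FORMATS[raw_mode]
    return struct.pack(order + str(len(pixels)) + code, *pixels)
```

Byte order then shows up only in these two functions; everything that works
on the unpacked values never sees it.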

Is anyone interested in discussing these options (and several other  
bugs in PIL's image packing and unpacking that I've discovered in  
looking at the code)? I'm happy to take this on, since these are  
changes I need to make anyway for my project and I'd rather see them  
in the PIL trunk than in my own fork.

Thanks,

Zach Pincus

Program in Biomedical Informatics and Department of Biochemistry
Stanford University School of Medicine