[Image-SIG] using PIL to process very large images

Tue May 11 16:19:23 EDT 2004

I am working on a Zope product using PIL that will easily enable Zope to
directly support Zoomify (www.zoomify.com). Basically, you add a high
resolution image to Zope and it automatically creates tiles at various
scales that are streamed to a Flash viewer so that very large image files
can be viewed through the Web--only the necessary image data is streamed
to the client.
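Here is how "only the necessary image data is streamed" can work in practice. A viewer showing a window onto one scale level needs only the tiles that overlap that window. The function below is a minimal pure-Python sketch of that lookup. It assumes 256x256 tiles and pixel coordinates at the chosen level. `visible_tiles` is a hypothetical name, and this does not reproduce the actual Zoomify viewer's logic.

```python
from typing import List, Tuple

def visible_tiles(view_left: int, view_top: int, view_width: int, view_height: int,
                  level_width: int, level_height: int, tile: int = 256) -> List[Tuple[int, int]]:
    """Return (col, row) indices of the tiles that overlap a viewport.

    Hypothetical helper: all coordinates are pixels at a single scale level.
    """
    # Clamp the viewport to the image so panning past an edge requests nothing extra.
    left, top = max(0, view_left), max(0, view_top)
    right = min(level_width, view_left + view_width)
    bottom = min(level_height, view_top + view_height)
    if right <= left or bottom <= top:
        return []  # viewport lies entirely outside the image
    # Map pixel edges to tile indices; right/bottom are exclusive, hence the "- 1".
    cols = range(left // tile, (right - 1) // tile + 1)
    rows = range(top // tile, (bottom - 1) // tile + 1)
    return [(col, row) for row in rows for col in cols]

# A 400x256 window starting at x=300 on a 4000x3000 level touches two tiles.
print(visible_tiles(300, 0, 400, 256, 4000, 3000))  # [(1, 0), (2, 0)]
```

However large the source image is, the number of tiles requested depends only on the size of the viewer window.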

I have been working directly with Zoomify Corp. on this and hope to
release it this summer under the Zope Public License. I have it working
correctly; the problem is scale. The Windows/Mac version of this
software, written in C++, can handle images that are many gigabytes in
size. (Yes, you read that right: there are people who have published
20+ gigabyte images on the Web using Zoomify.) I am currently optimizing
the product so that you can store the image data on the filesystem
instead of the ZODB, adding a command-line interface so you don't have
to use the ZMI, etc. What I can't seem to pin down is the PIL side: how
to make the PIL code I am using cope with images that large.
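Storing tiles on the filesystem raises the question of how to lay them out. A common precaution is to spread the tiles across numbered subdirectories so that no single directory holds thousands of files. The sketch below uses a hypothetical `groupN/level-col-row.jpg` scheme with at most 256 tiles per directory. It is not Zoomify's documented layout, so the real naming rules should be taken from Zoomify's own format description.

```python
from typing import Iterator, List, Tuple

def tile_paths(levels: List[Tuple[int, int]], tile: int = 256,
               group_size: int = 256) -> Iterator[Tuple[int, int, int, str]]:
    """Yield (level, col, row, relative_path) for every tile in a pyramid.

    `levels` holds (width, height) for each scale level. The path scheme is hypothetical.
    """
    index = 0  # running tile count across all levels; decides the subdirectory
    for level, (width, height) in enumerate(levels):
        cols = (width + tile - 1) // tile   # ceiling division
        rows = (height + tile - 1) // tile
        for row in range(rows):
            for col in range(cols):
                yield level, col, row, f"group{index // group_size}/{level}-{col}-{row}.jpg"
                index += 1

paths = list(tile_paths([(1000, 600), (500, 300), (250, 150)]))
print(len(paths))  # 17
```

The relative paths would be joined onto a storage root (for example with `os.path.join`) at write time, which keeps the layout independent of where the product stores its data.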

The basic approach is this: start with the full image, create 256x256
'tiles' using the crop method, and save them using a naming convention;
then use the resize method to scale the image down by half and create
tiles again, repeating until the entire image fits within 256x256.
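The level sizes and crop boxes for that procedure can be worked out before touching any pixels. This minimal pure-Python sketch assumes 256x256 tiles and rounds odd dimensions up when halving. That rounding is a choice made here, not necessarily Zoomify's. The boxes use the (left, upper, right, lower) 4-tuple that PIL's `Image.crop` takes, and each level size is what would be passed to `Image.resize`. The function names are hypothetical.

```python
import math
from typing import Iterator, List, Tuple

TILE = 256

def pyramid_levels(width: int, height: int, tile: int = TILE) -> List[Tuple[int, int]]:
    """Sizes of each scale level, full resolution first, halving until one tile suffices."""
    levels = [(width, height)]
    while width > tile or height > tile:
        # Round up so an odd dimension never loses its last pixel column or row.
        width, height = (width + 1) // 2, (height + 1) // 2
        levels.append((width, height))
    return levels

def tile_boxes(width: int, height: int,
               tile: int = TILE) -> Iterator[Tuple[int, int, Tuple[int, int, int, int]]]:
    """Yield (col, row, box); box is (left, upper, right, lower), clipped at the edges."""
    for row in range(math.ceil(height / tile)):
        for col in range(math.ceil(width / tile)):
            left, upper = col * tile, row * tile
            box = (left, upper, min(left + tile, width), min(upper + tile, height))
            yield col, row, box

print(pyramid_levels(1000, 600))  # [(1000, 600), (500, 300), (250, 150)]
```

With PIL, the loop body for each level would be roughly `im.crop(box).save(...)` for every box, followed by `im = im.resize(next_size)`. That resize of the whole image is the step the next paragraph worries about.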

This approach is technically correct, and for small to moderately large
images it works fine, but for images in the gigabyte range I am worried
about the amount of image data that is loaded into memory. Although it
looks like the crop method probably only loads the 256x256 area it
needs each time, I assume that the resize method is loading a lot (all?)
of the image data into memory and returning an image object that holds all
of its image data in memory. (I found the load method in PIL, but it
doesn't work with JPEGs, and I'm not sure I would know how to use it
properly anyway.)
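A rough calculation shows why the resize step is the worry. Once decoded, an image occupies about width x height x bytes-per-pixel of memory, however well the file compresses. PIL's `resize` operates on decoded pixels and returns a new in-memory image. The helper below assumes 3 bytes per pixel for RGB. PIL may pad RGB pixels to 4 bytes internally, so treat the result as a lower bound. Whether `crop` avoids decoding the whole file depends on the PIL version and the file format, so it is safer not to count on it. For JPEG sources, `Image.draft()` may also be worth looking up in the PIL handbook. It asks the decoder to scale down by 1/2, 1/4 or 1/8 while decoding.

```python
def decoded_bytes(width: int, height: int, bytes_per_pixel: int = 3) -> int:
    """Memory needed to hold one fully decoded image (lower bound, no overhead)."""
    return width * height * bytes_per_pixel

def pyramid_bytes(width: int, height: int, bytes_per_pixel: int = 3, tile: int = 256) -> int:
    """Sum of decoded sizes over every half-scale level, down to a single tile."""
    total = 0
    while True:
        total += decoded_bytes(width, height, bytes_per_pixel)
        if width <= tile and height <= tile:
            return total
        # Each halving cuts the pixel count to roughly a quarter.
        width, height = (width + 1) // 2, (height + 1) // 2

# A 40000 x 40000 RGB image needs about 4.8 GB just for its pixels.
print(decoded_bytes(40000, 40000))  # 4800000000
```

Because each level is a quarter of the one above, the whole pyramid costs only about a third more than the full-resolution level. Holding that first level in memory at all is the real problem.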

I am resorting to some very ugly code that takes progressively bigger
tiles and resizes them down, instead of resizing the whole image and then
tiling. Once I reach the sixth scale level, I do resize the whole image,
betting that the resize method manages the data it reads in aggressively
and that, at the sixth level, the image returned will be small enough to
hold in memory easily. I keep thinking there has to be an easier way to
do this, closer to the original 'stupid but it works' approach I began
with. PIL seems very mature and very widely used, and I assume that
others have worked with very large images using it. Or is there a lot
going on behind the scenes that I don't see, so that I don't need to
worry about the memory issue?
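One standard alternative to both the full-image resize and the growing-tile workaround is to build the pyramid from the bottom up. Each tile at a coarser level is made from the (up to) four tiles directly beneath it at the next finer level, so only four tiles ever need to be in memory at once. This is a swapped-in technique, not the approach described above. The sketch models tiles as hypothetical 2-D lists of grayscale values so that it runs without PIL. With PIL, the equivalent step would be to paste the four tiles into a new image (`Image.new`, then `paste`) and call `resize` on that small image.

```python
from typing import List, Optional

Tile = List[List[int]]  # hypothetical representation: rows of grayscale pixel values

def stitch(tl: Tile, tr: Optional[Tile] = None,
           bl: Optional[Tile] = None, br: Optional[Tile] = None) -> Tile:
    """Arrange up to four child tiles as a 2x2 block.

    Missing right-hand or bottom neighbours (None) occur at the image's edges.
    """
    rows = [a + (tr[i] if tr is not None else []) for i, a in enumerate(tl)]
    if bl is not None:
        rows += [a + (br[i] if br is not None else []) for i, a in enumerate(bl)]
    return rows

def halve(img: Tile) -> Tile:
    """Downsample by 2 with a 2x2 box filter; an odd last row/column pairs with itself."""
    height, width = len(img), len(img[0])
    out = []
    for y in range(0, height, 2):
        y2 = min(y + 1, height - 1)
        row = []
        for x in range(0, width, 2):
            x2 = min(x + 1, width - 1)
            total = img[y][x] + img[y][x2] + img[y2][x] + img[y2][x2]
            row.append((total + 2) // 4)  # rounded mean of the four samples
        out.append(row)
    return out

def parent_tile(tl: Tile, tr: Optional[Tile] = None,
                bl: Optional[Tile] = None, br: Optional[Tile] = None) -> Tile:
    """Build one coarser-level tile from its (up to) four children."""
    return halve(stitch(tl, tr, bl, br))

# Four 2x2 children of constant value give a 2x2 parent, one pixel per child.
print(parent_tile([[0, 0], [0, 0]], [[4, 4], [4, 4]], [[8, 8], [8, 8]], [[12, 12], [12, 12]]))
# [[0, 4], [8, 12]]
```

With 256x256 children the stitched block is 512x512, and the parent comes out 256x256 again. This removes the need to resize the full image. The first, full-resolution pass still has to read the original, though, and that is where the decoding concerns above remain.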

I am still fairly new to Zope and Python and am very new to this kind of
image processing and handling of large binary files. Because of this, I
have been hesitant to dive into the PIL source to see what is going on,
because I am not confident that I will understand it!

Any guidance is most appreciated. If the Zoomify capability sounds
interesting to you and you want to directly join in on the fun, I am very
open to sharing this with other developers. (A little more background:
the 'official' Zoomify software that does the tiling is closed source,
and only available for Mac and Windows. I am deploying on Solaris and
needed a way to integrate it with Zope running there. Luckily, the
Zoomify folks have been very open to my releasing this under the Zope
Public License and have been very helpful in the development of the
Python version of their software. And when you buy a license to Zoomify,
you do get the source for the Flash and Java versions of the viewer, so
this Zope product could be the basis for using Zoomify in an open way.)

Thanks,

____________________________
adam smith
ajs17 at cornell.edu
255-8893
215 ccc