[Image-SIG] [PATCH] PIL produces corrupt PDFs

Nicholas Riley njriley at uiuc.edu
Sat Nov 22 11:48:55 EST 2003


I took a closer look at the PDF specification and the error messages I 
was getting, and it proved to be relatively trivial to fix PIL to 
generate correct PDFs.

The following code will (somewhat inefficiently) repair an incorrectly 
generated PDF.

import re

def fixImage(image):
     """PIL PDF driver 0.2 creates corrupted PDFs which neither pageCatcher nor
     Ghostscript can understand.  Returns a repaired version of such a PDF."""
     prefix = '%PDF-1.2\n% created by PIL PDF driver 0.2\n'
     if not image[:len(prefix)] == prefix:
         # no fixing necessary
         return image
     else:
         if type(image) != type(''):
             # may be a buffer
             image = str(image)
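         # State flags for the two constructs the driver leaves unterminated.
         inUnclosedObj = 0
         inUnclosedStream = 0
         lines = image.split('\n')
         fixed = ''
         # Byte offset of each object in the repaired output; entry 0 is a
         # placeholder for the free-list head, which is written by hand below.
         xref = [0]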
         for line in lines:
             if line == '% created by PIL PDF driver 0.2':
                 line = '% created by PIL PDF driver 0.2, repaired'
             if line == 'xref':
                 break
             if re.match('[0-9]+ 0 obj', line):
                 xref.append(len(fixed))
             elif line == 'endobj' and inUnclosedStream:
                 line = 'endstream\n' + line
                 inUnclosedStream = 0
             fixed = '%s%s\n' % (fixed, line)
             if line == '/Contents 5 0 R':
                 inUnclosedObj = 1
             elif inUnclosedObj and line == '>>':
                 fixed += 'endobj\n'
                 inUnclosedObj = 0
             elif re.match('q [0-9]+ 0 0 [0-9]+ 0 0 cm /image Do Q', line):
                 inUnclosedStream = 1
         # Rebuild the cross-reference table and trailer from the new offsets.
         startxref = len(fixed)
         fixed += ("xref\n0 %d\n0000000000 65535 f \n" % len(xref))
         for x in xref[1:]:
             fixed += "%010d 00000 n \n" % x
         fixed += "trailer\n<<\n/Size %d\n/Root 1 0 R\n>>\n" % len(xref)
         fixed += "startxref\n%d\n%%%%EOF\n" % startxref
         return fixed
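
The last few lines of fixImage rebuild the cross-reference table, and 
that part may be easier to follow in isolation. What follows is only a 
minimal sketch, not part of the patch: build_xref is a hypothetical 
helper name, and it assumes, as the code above does, that objects are 
numbered from 1 with generation 0 and that object 1 is the catalog.

```python
def build_xref(offsets, startxref):
    """Return xref table, trailer and EOF marker as one string.

    offsets[i] is the byte offset of object i+1 in the file, and
    startxref is the byte offset at which this text will be written.
    Assumption: object 1 is the document catalog (/Root 1 0 R).
    """
    # Entry 0 is the head of the free list, so the table has one
    # more entry than there are objects.
    count = len(offsets) + 1
    parts = ["xref\n0 %d\n" % count]
    # Each entry is exactly 20 bytes: 10-digit offset, space,
    # 5-digit generation, space, 'f' or 'n', space, newline.
    parts.append("0000000000 65535 f \n")
    for offset in offsets:
        parts.append("%010d 00000 n \n" % offset)
    parts.append("trailer\n<<\n/Size %d\n/Root 1 0 R\n>>\n" % count)
    # '%%%%' collapses to '%%' after formatting, giving '%%EOF'.
    parts.append("startxref\n%d\n%%%%EOF\n" % startxref)
    return "".join(parts)
```

Collecting the pieces in a list and joining them once avoids the 
repeated string concatenation that makes fixImage slow on large files.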

Since nobody responded to my post, I can only assume nobody else uses 
PIL's PDF generation with anything but Acrobat. In case anyone else 
needs it in the future, a patch against PIL 1.1.4 is attached.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: pil-pdf-fix.patch
Type: application/octet-stream
Size: 1817 bytes
Desc: not available
Url : http://mail.python.org/pipermail/image-sig/attachments/20031122/eb9ddb9e/pil-pdf-fix.obj
-------------- next part --------------

-- 
Nicholas Riley <njriley at uiuc.edu>

