Using PIL to find separator pages

Steve Holden steve at holdenweb.com
Thu May 31 16:39:54 EDT 2007


Larry Bates wrote:
> I have a project that I wanted to solicit some advice
> on from this group.  I have millions of pages of scanned
> documents with each page in an individual .JPG file.
> When the documents were scanned the people that did
> the scanning put a colored (hot pink) separator page
> between the individual documents.  I was wondering if
> there was any way to utilize PIL to scan through the
> individual files, look at some small section on the
> page, and determine if it is a separator page by
> somehow comparing the color to the separator page
> color?  I realize that this would be some sort of
> percentage match where 100% would be a perfect match
> and any number lower would indicate that it was less
> likely that it was a separator page.
> 
> Thanks in advance for any thoughts or advice.
> 
I suspect the easiest way would be to select a few small patches of each 
image, average the color values of the pixels in each patch, and then 
convert each average to hue rather than comparing raw RGB values.

If the hue is close enough to the one you want (and you could include 
saturation and intensity too, if you felt like it) across several areas 
of the page, that would be a hit for a separator.
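
A rough sketch of that idea, untested against real scans. The reference
color (255, 105, 180), the tolerances, and the names color_matches and
separator_score are all assumptions made for illustration; measure an
actual separator page and tune them. The patch boxes are whatever regions
you choose, given as (left, upper, right, lower) tuples. The score is the
fraction of patches that match, which gives the percentage-style result
asked for.

```python
import colorsys

# Assumed "hot pink" reference; replace with a value measured from a real scan.
TARGET_RGB = (255, 105, 180)


def hue_sat(rgb):
    """Return (hue, saturation), each in 0..1, for an RGB triple in 0..255."""
    h, s, _v = colorsys.rgb_to_hsv(*(c / 255.0 for c in rgb))
    return h, s


def hue_distance(h1, h2):
    """Distance between two hues on the color wheel, which wraps at 1.0."""
    d = abs(h1 - h2)
    return min(d, 1.0 - d)


def color_matches(rgb, target_rgb=TARGET_RGB, hue_tol=0.05, min_sat=0.3):
    """True if an averaged patch color is close in hue to the target.

    The saturation floor rejects white or gray paper, whose hue is
    meaningless. Both thresholds are guesses to be tuned.
    """
    h, s = hue_sat(rgb)
    target_h, _ = hue_sat(target_rgb)
    return s >= min_sat and hue_distance(h, target_h) <= hue_tol


def separator_score(path, boxes):
    """Fraction (0.0 to 1.0) of the sampled patches that match the target."""
    # Imported here so the helpers above can be used without PIL installed.
    from PIL import Image, ImageStat

    with Image.open(path) as img:
        rgb_img = img.convert("RGB")
        hits = 0
        for box in boxes:
            # ImageStat.Stat(...).mean is the per-band average of the patch.
            mean_rgb = ImageStat.Stat(rgb_img.crop(box)).mean
            if color_matches(mean_rgb):
                hits += 1
    return hits / len(boxes)
```

Something like separator_score("page0001.jpg", [(50, 50, 100, 100),
(400, 50, 450, 100)]) would then give 1.0 when both patches look pink;
the file name and box coordinates here are made up.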

regards
  Steve
-- 
Steve Holden        +1 571 484 6266   +1 800 494 3119
Holden Web LLC/Ltd           http://www.holdenweb.com
Skype: holdenweb      http://del.icio.us/steve.holden




More information about the Python-list mailing list