Using PIL to find separator pages

Steve Holden steve at holdenweb.com
Fri Jun 1 12:35:03 EDT 2007


Larry Bates wrote:
> Steve Holden wrote:
>> Larry Bates wrote:
>>> I have a project that I wanted to solicit some advice
>>> on from this group.  I have millions of pages of scanned
>>> documents with each page in an individual .JPG file.
>>> When the documents were scanned the people that did
>>> the scanning put a colored (hot pink) separator page
>>> between the individual documents.  I was wondering if
>>> there was any way to utilize PIL to scan through the
>>> individual files, look at some small section on the
>>> page, and determine if it is a separator page by
>>> somehow comparing the color to the separator page
>>> color?  I realize that this would be some sort of
>>> percentage match where 100% would be a perfect match
>>> and any number lower would indicate that it was less
>>> likely that it was a separator page.
>>>
>>> Thanks in advance for any thoughts or advice.
>>>
>> I suspect the easiest way would be to select a few small patches of each
>> image and average the color values of the pixels, then normalize to hue
>> rather than RGB.
>>
>> Close enough to the hue you want (and you could include saturation and
>> intensity too, if you felt like it) across several areas of the page
>> would be a hit for a separator.
>>
>> regards
>>  Steve
> 
> Steve,
> 
> I'm completely lost on how to proceed.  I don't know how to average color
> values, normalize to hue...  Any guidance you could give would be greatly
> appreciated.
> 
> Thanks in advance,
> Larry

I'd like to help but I don't have any sample code to hand. Maybe someone 
who does could give you more of a clue. Let's hope so, anyway ...
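
For anyone following up, here is a minimal sketch of the approach suggested earlier in the thread: average a few small patches, convert the average to hue/saturation/value, and count how many patches land near the separator colour. It assumes Pillow (the maintained PIL fork) and the standard-library colorsys module. TARGET_RGB, PATCHES, the tolerances and all the function names are hypothetical choices, not anything from the thread. TARGET_RGB is simply the CSS "hotpink" value; a real run should measure the colour from an actual scanned separator page, because scanners shift colours.

```python
import colorsys
from PIL import Image, ImageStat

# Assumption: reference colour is CSS "hotpink"; calibrate from a real scan.
TARGET_RGB = (255, 105, 180)

# Hypothetical sample patches as fractions of page size
# (left, top, right, bottom): the four corners plus the centre.
PATCHES = [
    (0.05, 0.05, 0.15, 0.15),
    (0.85, 0.05, 0.95, 0.15),
    (0.45, 0.45, 0.55, 0.55),
    (0.05, 0.85, 0.15, 0.95),
    (0.85, 0.85, 0.95, 0.95),
]

def mean_rgb(img, box):
    """Average R, G, B values over one fractional patch of an RGB image."""
    w, h = img.size
    left, top, right, bottom = box
    region = img.crop((int(left * w), int(top * h),
                       int(right * w), int(bottom * h)))
    return ImageStat.Stat(region).mean  # one mean per band

def to_hsv(rgb):
    """Scale 0-255 RGB to 0-1 and convert; colorsys returns h, s, v in [0, 1]."""
    r, g, b = (c / 255.0 for c in rgb)
    return colorsys.rgb_to_hsv(r, g, b)

def hue_distance(h1, h2):
    """Distance between two hues around the colour wheel, in [0, 0.5]."""
    d = abs(h1 - h2) % 1.0
    return min(d, 1.0 - d)

def separator_score(img, hue_tol=0.05, min_sat=0.25):
    """Fraction of patches (0.0 to 1.0) whose average colour matches the target."""
    img = img.convert("RGB")
    target_h, _, _ = to_hsv(TARGET_RGB)
    hits = 0
    for box in PATCHES:
        h, s, _ = to_hsv(mean_rgb(img, box))
        # Near-white or grey paper has low saturation, so its hue is noise.
        if s >= min_sat and hue_distance(h, target_h) <= hue_tol:
            hits += 1
    return hits / len(PATCHES)
```

The score is the "percentage match" Larry asked about: 1.0 means every patch looks like the separator colour, and 0.0 means none does. A check such as `separator_score(Image.open(path)) >= 0.8` could then flag separator pages, with the threshold tuned against known pages. Comparing hue rather than raw RGB makes the test less sensitive to overall scan brightness, and the saturation floor stops pale or grey patches, whose hue is essentially arbitrary, from matching by accident.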

regards
  Steve
-- 
Steve Holden        +1 571 484 6266   +1 800 494 3119
Holden Web LLC/Ltd           http://www.holdenweb.com
Skype: holdenweb      http://del.icio.us/steve.holden




More information about the Python-list mailing list