pdf library.

Marc 'BlackJack' Rintsch bj_666 at gmx.net
Tue Jan 1 07:38:08 EST 2008


On Tue, 01 Jan 2008 04:21:29 -0800, Shriphani wrote:

> On Jan 1, 4:28 pm, Piet van Oostrum <p... at cs.uu.nl> wrote:
>> >>>>>Shriphani<shripha... at gmail.com> (S) wrote:
>> >S> I tried pyPdf for this and decided to get the pagelinks. The trouble
>> >S> is that I don't know how to determine whether a particular page is the
>> >S> first page of a chapter. Can someone tell me how to do this?
>>
>> AFAIK PDF doesn't have the concept of "Chapter". If the document has an
>> outline, you could try to use the first level of that hierarchy as the
>> chapter starting points. But you don't have a guarantee that they really
>> are chapters.
>
> How would a PDF to HTML conversion work? I've seen Google's search
> engine do it loads of times. It's just that running a 500-odd-page ebook
> through one of those scripts might not be such a good idea.

Heuristics?  Neither PDF nor HTML knows about "chapters", so a converter
either guesses at them or the chapters exist only in your head.
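
As a rough sketch of the outline idea quoted above: take only the first
level of the document outline and treat those entries as chapter
candidates.  This assumes the nested-list outline shape used by the modern
pypdf package (the successor of pyPdf), where the children of an entry
follow it as a sub-list.  The helper names (top_level_entries,
chapter_candidates, candidates_from_pdf) are made up for illustration, and
nothing guarantees that the entries found really are chapters.

```python
def top_level_entries(outline):
    """Return only the first-level items of a nested outline.

    Assumption: children of an item appear as a sub-list directly
    after it, so every nested list is skipped here.
    """
    return [item for item in outline if not isinstance(item, list)]


def chapter_candidates(outline, title_of, page_of):
    """Map first-level outline items to (title, page index) pairs.

    title_of and page_of are callables, so this helper does not depend
    on any particular PDF library.
    """
    return [(title_of(item), page_of(item))
            for item in top_level_entries(outline)]


def candidates_from_pdf(path):
    """Hypothetical wrapper around pypdf; page indices are zero-based."""
    # Imported here so the helpers above work without pypdf installed.
    from pypdf import PdfReader

    reader = PdfReader(path)
    return chapter_candidates(
        reader.outline,
        title_of=lambda dest: dest.title,
        page_of=reader.get_destination_page_number,
    )
```

If the PDF has no outline at all, the result is simply an empty list, and
you are back to guessing from the page text.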

Ciao,
	Marc 'BlackJack' Rintsch


