[BangPypers] PyPDF to read hindi

Amal raj.amal at gmail.com
Wed Jun 2 11:58:19 CEST 2010


Hi Aaditya,
Reading Hindi text is not as simple as reading English text. Most
Hindi PDFs do not use a standard encoding.

An encoding assigns a numeric code to each character, and in these PDFs
each code corresponds to a glyph in a particular font.
The PDF takes a code, maps it to a glyph through a font map, and then
renders that glyph. It does not know which character the glyph represents.
So to read most Hindi PDFs, we have to know the code-to-character
mapping.
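
A minimal Python sketch of such a mapping follows. The table is hypothetical:
LEGACY_TO_UNICODE and decode_legacy are names made up for illustration, and
the four entries are not taken from any real font. It assumes the raw string
has already been extracted from the PDF.

```python
# Hypothetical code-to-character table for ONE legacy Hindi font.
# The entries are illustrative only; a real font needs its own full table.
LEGACY_TO_UNICODE = {
    "d": "\u0915",  # DEVANAGARI LETTER KA
    "e": "\u092e",  # DEVANAGARI LETTER MA
    "y": "\u0932",  # DEVANAGARI LETTER LA
    "k": "\u093e",  # DEVANAGARI VOWEL SIGN AA
}


def decode_legacy(raw, table=LEGACY_TO_UNICODE):
    """Replace each extracted code with its Unicode character.

    Codes missing from the table are passed through unchanged.
    """
    return "".join(table.get(ch, ch) for ch in raw)


# What the extractor returns looks like Latin letters...
extracted = "dey"
# ...but with the right table it becomes Devanagari.
decoded = decode_legacy(extracted)
print(decoded)
```

With a different font the same codes would need a different table, which is
why the mapping has to be known per PDF.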

At my previous company I worked with PDFs from Dainik Bhaskar and other
Hindi newspapers, and faced the same problem.
So a generic Hindi PDF-to-text converter is not possible.

But if you know the specific encoding, then you might be able to write a
Hindi PDF-to-text converter for that encoding.
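
A converter for one known encoding could be structured roughly like this.
It is a sketch under assumptions, not a tested converter: TABLE,
build_pattern and legacy_to_unicode are hypothetical names, the entries are
invented, and it assumes the font stores the short-i vowel sign before its
consonant (Unicode expects it after), which would need checking against the
real font.

```python
import re

# Hypothetical table for one specific font. Some characters are stored
# as a sequence of two codes, so longer keys must be matched first.
TABLE = {
    "[k": "\u0916",  # DEVANAGARI LETTER KHA (two-code sequence)
    "d": "\u0915",   # DEVANAGARI LETTER KA
    "e": "\u092e",   # DEVANAGARI LETTER MA
    "k": "\u093e",   # DEVANAGARI VOWEL SIGN AA
    "f": "\u093f",   # DEVANAGARI VOWEL SIGN I
}


def build_pattern(table):
    """Build a regex that tries the longest codes before shorter ones."""
    keys = sorted(table, key=len, reverse=True)
    return re.compile("|".join(re.escape(k) for k in keys))


PATTERN = build_pattern(TABLE)


def legacy_to_unicode(raw):
    """Convert text in the assumed legacy encoding to Unicode."""
    # Step 1: substitute every known code; unknown codes stay as they are.
    text = PATTERN.sub(lambda m: TABLE[m.group(0)], raw)
    # Step 2: move the I sign from before its consonant to after it.
    return re.sub("\u093f([\u0915-\u0939])", "\\1\u093f", text)


print(legacy_to_unicode("fde"))
```

The two steps are kept separate on purpose: the table changes from font to
font, while the reordering rule depends on how that font stores vowel signs.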

Amal.

On Wed, Jun 2, 2010 at 2:50 AM, Srinivas Reddy Thatiparthy <
srinivas_thatiparthy at akebonosoft.com> wrote:

> Hindi is Unicode text, so your input data should be treated as Unicode
> instead of ASCII. Last but not least, the encoding in your editor should
> be set to Unicode, otherwise you will see garbled text.
>
>
> This is my guess; I have never worked with Unicode in Python. If I am wrong,
> please correct me.
>
> Thanks&Regards,
> Srinivas Reddy Thatiparthy,
> Mobile:9393099772,
>
>
>
> -----Original Message-----
> From: bangpypers-bounces+srinivas_thatiparthy=akebonosoft.com at python.org on behalf of AADITYA SRIRAM
> Sent: Wed 6/2/2010 2:22 PM
> To: bangpypers at python.org
> Subject: [BangPypers] PyPDF to read hindi
>
> Hi guys, I am writing a small program to convert PDFs to text files (I know
> it's easy and lame, but I need to start somewhere!). Anyway, I am not able to
> rip the Hindi text in readable form :( Can anyone please help? It's working
> fine with English text.
> _______________________________________________
> BangPypers mailing list
> BangPypers at python.org
> http://mail.python.org/mailman/listinfo/bangpypers
>
>

