reading text in pdf, some working sample code

Wed Nov 22 02:37:20 EST 2017

Daniel Gross <grossd18 at gmail.com> writes:
> I am new to python and jumped right into trying to read out (english) text
> from PDF files.
>
> I tried various libraries (including slate)

You could give "pdfminer" a try.
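
A minimal sketch of what that could look like, assuming the maintained
"pdfminer.six" fork is installed and provides
pdfminer.high_level.extract_text (older pdfminer releases need a longer
setup). The names extract_pdf_text and printable_ratio are only
illustrative helpers, not part of pdfminer:

```python
def extract_pdf_text(path):
    """Return the text pdfminer can recover from the PDF at `path`."""
    # Assumption: pdfminer.six is installed. Imported lazily so that the
    # helper below can be used even when it is not.
    from pdfminer.high_level import extract_text
    return extract_text(path)


def printable_ratio(text):
    """Fraction of characters that are printable or whitespace.

    A rough sanity check only: a low value suggests the extraction
    returned raw character codes rather than readable text.
    """
    if not text:
        return 0.0
    ok = sum(1 for ch in text if ch.isprintable() or ch.isspace())
    return ok / len(text)
```

Calling extract_pdf_text("sample.pdf") (the file name is a placeholder)
and passing the result to printable_ratio gives a quick hint whether
the output is worth reading.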

Note, however, that it may not be possible to extract the text:
PDF is a generic format which works by mapping character codes to glyphs
(i.e. visual symbols); if your PDF uses a special map for this
(especially with non-standard glyph collections, aka "fonts"),
then the text extraction (which in fact extracts sequences
of character codes) can give unusable results.
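
The effect can be shown with a toy model in plain Python (no PDF
library involved). Everything here is made up for illustration: a
"font" is just a dict from character codes to glyphs, and
make_subset_font builds a hypothetical font that numbers its glyphs in
order of first use instead of following a standard encoding:

```python
# Toy "standard" font: the character codes coincide with ASCII.
STANDARD_FONT = {code: chr(code) for code in range(32, 127)}


def make_subset_font(text):
    """Build a hypothetical font with private codes for `text`.

    Returns (font, codes): `font` maps code -> glyph, and `codes` is the
    sequence of character codes that would be stored in the document.
    """
    font = {}
    glyph_to_code = {}
    codes = []
    for ch in text:
        if ch not in glyph_to_code:
            code = len(glyph_to_code) + 1  # codes assigned in order of first use
            glyph_to_code[ch] = code
            font[code] = ch
        codes.append(glyph_to_code[ch])
    return font, codes


def render(codes, font):
    """What a viewer displays: each code looked up in the document's font."""
    return "".join(font[c] for c in codes)


def naive_extract(codes):
    """What a naive extractor returns: codes read as standard characters."""
    return "".join(chr(c) for c in codes)


# With a standard mapping, display and extraction agree.
std_codes = [ord(ch) for ch in "hello"]
print(render(std_codes, STANDARD_FONT) == naive_extract(std_codes))

# With a private mapping, the page still displays "hello",
# but the extracted code sequence is unreadable.
font, codes = make_subset_font("hello")
print(render(codes, font), repr(naive_extract(codes)))
```

The display side works in both cases because the viewer always uses
the document's own map; only the extraction side depends on the codes
meaning something outside the document.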