[Tutor] PDF Scrapping

shawn wilson ag4ve.us at gmail.com
Wed Nov 25 14:38:57 EST 2015


On Nov 25, 2015 12:44 PM, "Francois Dion" <francois.dion at gmail.com> wrote:
> if you have any choice at all, avoid PDF at all cost to get data.

Agreed, and IIRC all of that data should be in XML somewhere (look for their
RPC pages). I'd probably start by searching for similar table names (and
Google dorking their site for appropriate APIs and/or looking through the
code of whatever tables you find). That's simpler than dealing with PDF. You
might also try emailing them and asking where the data came from (keeping in
mind that Thanksgiving is a federal holiday in the States, so you won't get a
reply until Monday at the earliest). OTOH, they can just tell you to go away
since PDF is "open". YMMV.
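If the data does turn up as XML, getting it into Python is far less work than parsing a PDF. Below is a minimal sketch using the standard library's xml.etree.ElementTree. It assumes a made-up layout in which each record is a <row> element with one child per field; the tag names and SAMPLE_XML are hypothetical, and whatever feed you actually find will look different.

```python
import xml.etree.ElementTree as ET

# Hypothetical payload: the <table>/<row> structure and the field tags
# are invented for illustration only; a real feed will differ.
SAMPLE_XML = """<table name="example">
  <row><state>VA</state><count>12</count></row>
  <row><state>MD</state><count>7</count></row>
</table>"""


def rows_from_xml(xml_text):
    """Turn each <row> element into a dict mapping child tag -> text."""
    root = ET.fromstring(xml_text)
    return [
        # Empty elements have .text == None, so fall back to "".
        {child.tag: (child.text or "").strip() for child in row}
        for row in root.iter("row")
    ]


if __name__ == "__main__":
    for record in rows_from_xml(SAMPLE_XML):
        print(record)
```

For a live endpoint you would fetch the text first (e.g. with urllib.request.urlopen(url).read().decode()) and pass the string to rows_from_xml; from there each record is a plain dict you can hand to csv.DictWriter or similar.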

