[Tutor] Web scraping using selenium and navigating nested dictionaries / lists.

Marco Mistroni mmistroni at gmail.com
Sun Jan 27 05:46:06 EST 2019


Hi, my 2 cents: have a look at Scrapy for scraping. Selenium is a very good
tool to learn, but it is mainly for automating UAT of GUIs.
Scrapy will do the scraping for you, and you can automate it via cron. It's
the same thing I am doing at the moment.
HTH
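
PS: on the tree-walking question in the quoted message below, here is a rough
sketch of the recursive walk, in case it helps. To keep it runnable without
installing anything, it uses the standard library's xml.etree.ElementTree
instead of BeautifulSoup, and the table fragment (SAMPLE) and the function
name walk are made up for illustration; this is not Audible's real markup.
The point is only the shape of the recursion: handle the current node, then
call the same function on each child. Siblings never need separate handling,
because every sibling is a child of the node one level up.

```python
import xml.etree.ElementTree as ET

# Made-up fragment standing in for the library table; not Audible's real markup.
SAMPLE = """
<table id="library">
  <tr class="book">
    <td class="title"><a href="/pd/example">Example Title</a></td>
    <td class="author">Example Author</td>
  </tr>
</table>
"""


def walk(node, depth=0):
    """Yield (depth, tag, attributes) for node and every descendant, depth first."""
    yield depth, node.tag, dict(node.attrib)
    # Iterating an Element yields its direct children in document order.
    # Recursing into each child covers that child's own children too,
    # so the whole subtree is visited without any sibling bookkeeping.
    for child in node:
        yield from walk(child, depth + 1)


root = ET.fromstring(SAMPLE.strip())
visited = list(walk(root))
for depth, tag, attrs in visited:
    # Indent by depth so the nesting is visible in the printout.
    print("  " * depth + tag, attrs)
```

With BeautifulSoup the same pattern should carry over: loop over tag.children,
skip anything that is not a Tag (text nodes show up as children too), and read
tag.name and tag.attrs in place of .tag and .attrib.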

On Sun, Jan 27, 2019, 8:34 AM <mhysnm1964 at gmail.com> wrote:

> All,
>
>
>
> Goal of new project.
>
> I want to scrape all the books I have purchased from Audible.com.
> Eventually I want to export this as a CSV file or maybe JSON. I have not
> got that far yet. The reasoning behind this is to learn Selenium for my
> work and to get the list of books I have purchased, killing two birds with
> one stone. The work focus is to see whether Selenium can automate some of
> the testing I have to do and collect useful information from the web page
> for my reports. This part of the goal is in the future, as I need to build
> my Python skills up first.
>
>
>
> Thus far, I have been successful in logging into Audible and showing the
> library of books. I am able to store the table of books and want to use
> BeautifulSoup to extract the relevant information. Information I will want
> from the table is:
>
> *       Author
> *       Title
> *       Date purchased
> *       Length
> *       Is the book in a series (there is a link for this)
> *       Link to the page storing the publish details.
> *       Download link
>
> Hopefully this has given you enough information on what I am trying to
> achieve at this stage. As I learn more about what I am doing, I am adding
> possible extra tasks, such as verifying whether I have already downloaded
> the book via iTunes.
>
>
>
> Learning goals:
>
> I want to navigate the tree structure of the BeautifulSoup object that I
> have extracted from the page source for the table. BeautifulSoup provides
> children, siblings and parents attributes. This is where I get stuck with
> programming logic. BeautifulSoup does provide the find_all method plus
> selectors, which I do not want to use for this exercise, as I want to learn
> how to walk a tree starting at the root and visiting each node of the tree.
> Then I can look at the attributes of each tag as I go. I believe I have to
> set up a recursive loop or function call, but I am not sure how to do this.
> Pseudo code:
>
>
>
> Build table structure
>
> Start at the root node.
>
> Check to see if there are any children.
>
> Pass first child to function.
>
> Print attributes for tag at this level
>
> In function, check for any sibling nodes.
>
> If any exist, call function again
>
> If no siblings, then start at first sibling and get its child.
>
>
>
> This is where I get stuck. Each sibling can have children, and they can
> have siblings. So how do I ensure I visit each node in the tree?
>
> Any tips or tricks for this would be appreciated, as I could use this in
> other situations.
>
>
>
> Sean
>
