[Tutor] Web scraping using selenium and navigating nested dictionaries / lists.

mhysnm1964 at gmail.com
Sun Jan 27 06:03:43 EST 2019


Marco,

 

Thanks. The reason for learning Selenium is the automation: I want to test web sites for keyboard and mouse interaction and record the results. That, at least, is the long-term goal. In the short term, I will have a look at your suggestion.

 

 

From: Marco Mistroni <mmistroni at gmail.com> 
Sent: Sunday, 27 January 2019 9:46 PM
To: mhysnm1964 at gmail.com
Cc: tutor at python.org
Subject: Re: [Tutor] Web scraping using selenium and navigating nested dictionaries / lists.

 

Hi, my 2 cents: have a look at Scrapy for scraping. Selenium is a very good tool to learn, but it is mainly for automating UAT of GUIs.

Scrapy will scrape for you, and you can automate it via cron. It's the same stuff I am doing at the moment.

Hth

On Sun, Jan 27, 2019, 8:34 AM <mhysnm1964 at gmail.com> wrote:

All,



Goal of new project.

I want to scrape all the books I have purchased from Audible.com.
Eventually I want to export this as a CSV file, or maybe JSON; I have not
got that far yet. The reasoning behind this is to learn Selenium for my
work and to get the list of books I have purchased, killing two birds with
one stone. The work focus is to see if Selenium can automate some of the
testing I have to do and collect useful information from the web page for
my reports. That part of the goal is in the future, as I need to build my
Python skills up first.



Thus far, I have been successful in logging into Audible and showing the
library of books. I am able to store the table of books and want to use
BeautifulSoup to extract the relevant information. The information I want
from the table is:

*       Author 
*       Title
*       Date purchased 
*       Length
*       Is the book in a series (there is a link for this)
*       Link to the page storing the publishing details
*       Download link
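As a sketch of pulling those fields out of one table row: the example below uses the stdlib xml.etree.ElementTree on invented, well-formed markup, since the real Audible markup will differ and the class names are placeholders. With BeautifulSoup, row.find_all('td') would do the same cell lookup on the real page.

```python
import xml.etree.ElementTree as ET

# Invented, well-formed stand-in for one library-table row; the real
# Audible markup will differ, so treat the class names as placeholders.
ROW = """
<tr>
  <td class="title">Ancillary Justice</td>
  <td class="author">Ann Leckie</td>
  <td class="purchased">2018-03-01</td>
  <td class="length">12 hrs</td>
</tr>
"""

def parse_row(xml_text):
    """Map each cell's class attribute to its text content."""
    row = ET.fromstring(xml_text)
    return {td.get("class"): (td.text or "").strip() for td in row.findall("td")}

print(parse_row(ROW))
```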

Hopefully this has given you enough information on what I am trying to
achieve at this stage. As I learn more about what I am doing, I am adding
possible extra tasks, such as verifying whether I have already downloaded
the book via iTunes.



Learning goals:

Using the BeautifulSoup structure that I have extracted from the page
source for the table, I want to navigate the tree. BeautifulSoup provides
children, sibling and parent navigation, and this is where I get stuck
with the programming logic. BeautifulSoup also provides the find_all
method plus selectors, which I do not want to use for this exercise, as I
want to learn how to walk a tree starting at the root and visiting each
node, looking at each tag's attributes as I go. I believe I have to set up
a recursive loop or function call, but I am not sure how to do this.
Pseudo code:



Build table structure

Start at the root node.

Check to see if there are any children.

Pass first child to function.

Print attributes for tag at this level 

In function, check for any sibling nodes.

If exist, call function again 

If no siblings, then start at first sibling and get its child.



This is where I get stuck. Each sibling can have children, and they can
have siblings. So how do I ensure I visit each node in the tree?

I would be grateful for any tips or tricks for this, as I could use them
in other situations.



Sean 

_______________________________________________
Tutor maillist  -  Tutor at python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor


