BeautifulSoup help !!

Pierre-Alain Dorange pdorange at pas-de-pub-merci.mac.com
Fri Oct 7 12:59:06 EDT 2016


Navneet Siddhant <desolate.soul.me at gmail.com> wrote:

> I guess I will have to extract data from multiple divs, because
> extracting data only from the parent div, which contains other divs
> holding the different data, comes out all messed up. I will play around
> and see if I can get through it. Let me clarify once again: I don't need
> complete code; a resource where I could find more info about using
> BeautifulSoup would be appreciated. Also, do I need some kind of plugin
> to extract data to CSV, or is it built into Python so that I can simply
> import csv and write the other commands needed?

BeautifulSoup is a good tool for this task, but it will take a bit more
code to get there.
I don't have time to look at the (coupon) data, but I work on similar
tasks, and the data are often scattered across many divs and other HTML
tags. You need to understand (reverse engineer) the data structure and
extract the pieces of data (BeautifulSoup has lots of tools for that),
then aggregate them in an internal structure (you design this according
to your needs: a class, for example), and then simply export the
aggregated data to CSV using the csv module. The csv module is part of
the Python standard library, so "import csv" is enough; no plugin is
needed.
The main task is extracting the data with BeautifulSoup.
BS provides tools to extract from a div, or any HTML tag; just play with
it a little and read the docs.
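
As a rough sketch of that extract / aggregate / export pipeline: the
markup, the class names ("coupon", "title", "code", "expires") and the
Coupon class below are all made up for illustration, since the real
page structure is not known here. It assumes BeautifulSoup 4 (the bs4
package) is installed.

```python
import csv
import io
from dataclasses import dataclass, astuple, fields

from bs4 import BeautifulSoup

# Hypothetical markup: each coupon is a parent div holding child divs.
HTML = """
<div class="coupon">
  <div class="title">10% off shoes</div>
  <div class="code">SHOES10</div>
  <div class="expires">2016-12-31</div>
</div>
<div class="coupon">
  <div class="title">Free shipping</div>
  <div class="code">SHIPFREE</div>
  <div class="expires">2016-11-15</div>
</div>
"""


@dataclass
class Coupon:
    """Internal structure used to aggregate the extracted pieces."""
    title: str
    code: str
    expires: str


def child_text(parent, css_class):
    """Return the stripped text of the child div with this class, or ''."""
    node = parent.find("div", class_=css_class)
    return node.get_text(strip=True) if node is not None else ""


def parse_coupons(html):
    """Extract one Coupon per parent div, reading each child div separately."""
    soup = BeautifulSoup(html, "html.parser")
    return [
        Coupon(
            child_text(div, "title"),
            child_text(div, "code"),
            child_text(div, "expires"),
        )
        for div in soup.find_all("div", class_="coupon")
    ]


def write_csv(coupons, out):
    """Write a header row and one row per coupon to a file-like object."""
    writer = csv.writer(out)
    writer.writerow([f.name for f in fields(Coupon)])
    writer.writerows(astuple(c) for c in coupons)


coupons = parse_coupons(HTML)
buffer = io.StringIO()
write_csv(coupons, buffer)
print(buffer.getvalue())
```

To write a real file instead of an in-memory buffer, pass
open("coupons.csv", "w", newline="") to write_csv; the newline=""
argument is what the csv module documentation recommends.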

I don't know if it will help, but here is some sample code that I
sometimes use as an example, and also in real life, to extract data
(real-time river levels) from a French web site (vigiecrue): it
retrieves the page, extracts the data, takes the latest river level and
averages the levels over the last 24 hours.

<https://www.dropbox.com/sh/k5974t374zmcoj6/AACes_Xo5DrxCbE1RjSaeKXYa?dl=0>
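
The linked code is not reproduced here. Independently of it, the final
step (latest level plus the mean over the last 24 hours) could look
roughly like the sketch below, once the page has been parsed into
(timestamp, level) pairs. The readings and the function name are
invented for illustration.

```python
from datetime import datetime, timedelta
from statistics import mean

# Hypothetical readings: (timestamp, river level in metres).
readings = [
    (datetime(2016, 10, 6, 6, 0), 1.10),
    (datetime(2016, 10, 6, 18, 0), 1.20),
    (datetime(2016, 10, 7, 6, 0), 1.40),
    (datetime(2016, 10, 7, 12, 0), 1.30),
]


def last_and_mean_24h(readings):
    """Return (latest level, mean of levels in the 24 h before the latest)."""
    ordered = sorted(readings)  # oldest first
    last_time, last_level = ordered[-1]
    cutoff = last_time - timedelta(hours=24)
    # Keep only readings strictly newer than the cutoff.
    recent = [level for when, level in ordered if when > cutoff]
    return last_level, mean(recent)


last_level, mean_level = last_and_mean_24h(readings)
print(f"last: {last_level:.2f} m, 24h mean: {mean_level:.2f} m")
```

The 24-hour window is measured back from the most recent reading rather
than from the current clock time, so the result does not depend on when
the script is run.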
 
Note: it is probably not beautiful Python code, but it works for the
purpose it was written for.

-- 
Pierre-Alain Dorange               Moof <http://clarus.chez-alice.fr/>

Ce message est sous licence Creative Commons "by-nc-sa-2.0"
<http://creativecommons.org/licenses/by-nc-sa/2.0/fr/>


