Does Python allow variables to be passed into a function for dynamic screen scraping?

Laura Creighton lac at openend.se
Sat Nov 28 17:44:21 EST 2015


In a message of Sat, 28 Nov 2015 14:37:26 -0800, ryguy7272 writes:
>On Saturday, November 28, 2015 at 5:28:55 PM UTC-5, Laura Creighton wrote:
>> In a message of Sat, 28 Nov 2015 14:03:10 -0800, ryguy7272 writes:
>> >I'm looking at this URL.
>> >https://en.wikipedia.org/wiki/Wikipedia:Unusual_place_names
>> >
>> >If I hit F12 I can see tags such as these:
>> ><a title=
>> ><a class=
>> >And so on and so forth.  
>> >
>> >I'm wondering if someone can share a script, or a function, that will allow me to pass in variables and download (or simply print) the results.  I saw a sample online that I thought would work, and I made a few modifications, but now I keep getting a message that says: ValueError: All objects passed were None
>> >
>> >Here's the script that I'm playing around with.
>> >
>> >import requests
>> >import pandas as pd
>> >from bs4 import BeautifulSoup
>> >
>> >#Get the relevant webpage and set the data up for parsing
>> >url = "https://en.wikipedia.org/wiki/Wikipedia:Unusual_place_names"
>> >r = requests.get(url)
>> >soup=BeautifulSoup(r.content,"lxml")
>> >
>> >#set up a function to parse the "soup" for each category of information and put it in a DataFrame
>> >def get_match_info(soup,tag,class_name):
>> >    info_array=[]
>> >    for info in soup.find_all('%s'%tag,attrs={'class':'%s'%class_name}):
>> >        return pd.DataFrame(info_array)
>> >
>> >#for each category pass the above function the relevant information i.e. tag names
>> >tag1 = get_match_info(soup,"td","title")
>> >tag2 = get_match_info(soup,"td","class")
>> >
>> >#Concatenate the DataFrames to present a final table of all the above info 
>> >match_info = pd.concat([tag1,tag2],ignore_index=False,axis=1)
>> >
>> >print match_info
>> >
>> >I'd greatly appreciate any help with this.
>> 
>> Post your error traceback.  If you are getting a ValueError about None,
>> then probably something you expect to return a match, isn't.  But without
>> the actual error, we cannot help much.
>> 
>> Laura
>
>
>Ok.  How do I post the error traceback?  I'm using Spyder with Python 2.7.

You copy it out of wherever you are reading it, and paste it into the
email, along with your code, also copied and pasted from somewhere
(like an editor).  That way we get the exact code that caused the exact
traceback you are getting.
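
If the console output is awkward to copy, one option is to have Python
hand you the traceback as text.  This is only a minimal sketch using the
standard library's traceback module; run_and_capture and failing_example
are made-up names for illustration, and failing_example merely stands in
for whatever part of your script raises the error:

```python
import traceback


def run_and_capture(func):
    """Call func(); return the full traceback text if it raises, else None."""
    try:
        func()
    except Exception:
        # format_exc() returns the same text the interpreter would print.
        return traceback.format_exc()
    return None


def failing_example():
    # Hypothetical stand-in for the code that fails.
    raise ValueError("All objects passed were None")


report = run_and_capture(failing_example)
print(report)
```

The printed text starts with "Traceback (most recent call last):" and
ends with the exception line.  Paste all of it, not just the last line.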

Laura
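
P.S.  Only a guess without the traceback: in the posted get_match_info
the return sits inside the for loop, so when find_all matches nothing the
function falls off the end and returns None.  pd.concat would then be
handed nothing but None, which fits the message you quote.  The sketch
below mimics that control flow with plain lists instead of BeautifulSoup
and pandas; return_inside_loop and collect_then_return are hypothetical
names, not anything from your script or from either library:

```python
def return_inside_loop(matches):
    """Mimics the posted control flow: the only return is inside the loop."""
    collected = []
    for match in matches:
        # Reached only if there is at least one match, and it returns
        # before anything has been appended.
        return collected
    # No matches: the loop body never runs, so the function returns None.


def collect_then_return(matches):
    """Append every match, then return once the loop has finished."""
    collected = []
    for match in matches:
        collected.append(match)
    return collected


no_matches = []
print(return_inside_loop(no_matches))    # None
print(collect_then_return(no_matches))   # []
print(collect_then_return(["a", "b"]))   # ['a', 'b']
```

With the second shape an empty search gives back an empty list rather
than None, so it is worth checking whether your tag and class names
actually match anything on the page.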



