[Tutor] Problem using lxml

Joel Goldstick joel.goldstick at gmail.com
Sat Aug 22 23:14:43 CEST 2015


On Sat, Aug 22, 2015 at 5:05 PM, Anthony Papillion <papillion at gmail.com> wrote:
> Hello Everyone,
>
> I'm pretty new to lxml but I pretty much thought I'd understood the basics.
> However, for some reason, my first attempt at using it is failing miserably.
>
> Here's the deal:
>
> I'm parsing a specific page on Craigslist (
> http://joplin.craigslist.org/search/rea) and trying to retrieve the text of
> each link on that page. When I do an "inspect element" in Firefox, a sample
> anchor link looks like this:
>
> <a href="/reb/5185592209.html" data-id="5185592209" class="hdrlnk">FIRST
> OPEN HOUSE TOMORROW 2:00pm-4:00pm!!! (8-23-15)</a>
>
> The code I'm using to try to get the link text is this:
>
> from lxml import html
> import requests
>
> page = requests.get("http://joplin.craigslist.org/search/rea")
> titles = tree.xpath('//a[@title="hdrlnk"]/text()')
> print titles
>
> The last line, which should print the text of each anchor, prints []
> instead.
>
> I can't figure out what I'm doing wrong. lxml seems pretty
> straightforward, but I can't seem to get this working.
>
> Can anyone make any suggestions?
>
> Thanks!
> Anthony

Not an answer, but have you checked out Beautiful Soup? It is a great
HTML parsing tool, with a good tutorial:
http://www.crummy.com/software/BeautifulSoup/bs4/doc/
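
For what it's worth, a minimal sketch of the Beautiful Soup route might look
like the following. It assumes the bs4 package is installed, and it parses the
sample anchor quoted above from a string instead of fetching the live page.
SAMPLE_HTML and link_titles are names made up for illustration. Note that the
sample anchor carries class="hdrlnk" and no title attribute, so the sketch
matches on class.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the anchor quoted in the question. A real run
# would pass the downloaded page text here instead (an assumption).
SAMPLE_HTML = """
<a href="/reb/5185592209.html" data-id="5185592209" class="hdrlnk">FIRST
OPEN HOUSE TOMORROW 2:00pm-4:00pm!!! (8-23-15)</a>
"""


def link_titles(markup):
    """Return the text of every <a> whose class list includes 'hdrlnk'."""
    # "html.parser" is the parser bundled with Python, so lxml is not needed.
    soup = BeautifulSoup(markup, "html.parser")
    # class_ filters on the class attribute, not on title.
    anchors = soup.find_all("a", class_="hdrlnk")
    # Collapse the line break inside the link text into a single space.
    return [" ".join(a.get_text().split()) for a in anchors]


titles = link_titles(SAMPLE_HTML)
print(titles)
```

On the real page the same function could be given the text of the downloaded
response (page.text with requests), assuming the listing still uses that class.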

-- 
Joel Goldstick
http://joelgoldstick.com

