parse html: what is the meaning of "//"?

Stefan Behnel stefan_ml at behnel.de
Fri Sep 16 07:02:06 EDT 2011


alias, 16.09.2011 08:39:
> code1:
> import lxml.html
> import urllib
> down='http://finance.yahoo.com/q/op?s=C+Options'
> content=urllib.urlopen(down).read()
> root=lxml.html.document_fromstring(content)

I see this quite often, but many people don't know that this can be 
simplified to

     import lxml.html
     url = 'http://finance.yahoo.com/q/op?s=C+Options'
     root = lxml.html.parse(url).getroot()

which is less code and also substantially more efficient, since the parser 
reads from the source directly instead of from an intermediate Python string.
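
As a minimal sketch of the same call without network access: parse() also 
accepts a file-like object, so an in-memory page can stand in for the 
download. The page content below is made up for illustration, not taken from 
the Yahoo page.

```python
import io
import lxml.html

# Hypothetical stand-in for a downloaded page, so the sketch needs no network.
page = io.BytesIO(b"<html><body><p>Call Options</p></body></html>")

# parse() reads from the source itself and returns an ElementTree;
# getroot() gives the <html> element, as document_fromstring() would.
root = lxml.html.parse(page).getroot()
first_paragraph = root.xpath("//p")[0].text_content()
print(first_paragraph)
```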


> table = root.xpath("//table[@class='yfnc_mod_table_title1']")[0]
> tds=table.xpath("tr[@valign='top']//td")
> for  td  in tds:
>      print  td.text_content()
>
> what i get is :
> Call Options
> Expire at close Friday, September 16, 2011
> these are what i want.
>
> code2:
> import lxml.html
> import urllib
> down='http://finance.yahoo.com/q/op?s=C+Options'
> content=urllib.urlopen(down).read()
> root=lxml.html.document_fromstring(content)
> table = root.xpath("//table[@class='yfnc_mod_table_title1']")[0]
> tds=table.xpath("//tr[@valign='top']//td")

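Here, the expression starts with "//", which makes it an absolute path: it 
selects all "tr" tags anywhere in the document, no matter which element you 
evaluate it on. In code1, the plain "tr" step took just the ones that are 
direct children of the "table" tag.

That's what "//" is there for, it's a recursive descendant selector. To 
search only below the table, start the expression with ".//" instead. You 
might want to read up on XPath expressions.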
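
To make the difference concrete, here is a small sketch that runs the three 
variants ("tr", ".//tr" and "//tr") against the same "table" element. The 
HTML snippet and the class names "wanted" and "other" are invented for the 
example; they are not the markup of the Yahoo page.

```python
import lxml.html

# Hypothetical miniature page: the first table contains a nested table,
# and a second, unrelated table follows it.
HTML = """
<html><body>
  <table class="wanted">
    <tr valign="top"><td>Call Options</td></tr>
    <tr><td><table><tr valign="top"><td>nested</td></tr></table></td></tr>
  </table>
  <table class="other">
    <tr valign="top"><td>N/A</td></tr>
  </table>
</body></html>
"""

root = lxml.html.document_fromstring(HTML)
table = root.xpath("//table[@class='wanted']")[0]


def texts(expr):
    """Evaluate expr on the 'wanted' table and return the cell texts."""
    return [td.text_content() for td in table.xpath(expr)]


# "tr": only rows that are direct children of the table.
children_only = texts("tr[@valign='top']//td")

# ".//tr": rows anywhere below the table, including the nested table.
subtree = texts(".//tr[@valign='top']//td")

# "//tr": rows anywhere in the document, the table is ignored as context.
whole_doc = texts("//tr[@valign='top']//td")

print(children_only)
print(subtree)
print(whole_doc)
```

The last variant is the one from code2: although it is called on the table, 
it also returns the cell from the unrelated second table.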


> what i get is :
> N/A
> N/A
> 2
> 114
> 48.00
> C110917P00048000
> 16.75
>   0.00
> N/A
> N/A
> 0
> 23
> 50.00
> C110917P00050000
> 23.16
>   0.00
> N/A
> N/A
> 115
> 2,411
>
>
> Highlighted options are in-the-money.

I don't see any highlighting in your text above, and I don't know what you 
mean by "in-the-money".

Stefan



