BeautifulSoup -- getting all the attributes of a tag?

Paul McGuire ptmcg at austin.rr.com
Tue Apr 1 21:01:01 EDT 2008


On Apr 1, 6:28 pm, 7stud <bbxx789_0... at yahoo.com> wrote:
> On Apr 1, 5:25 pm, 7stud <bbxx789_0... at yahoo.com> wrote:
>
> > You can treat a tag like a dictionary to obtain a specific attribute:
>
> > import BeautifulSoup as bs
>
> > html = "<div x='a' y='b' z='c'>hello</div>"
>
> > doc = bs.BeautifulSoup(html)
> > div = doc.find("div")
> > print div
> > print div["x"]
>
> > --output:--
> > a
>
> > But you can't iterate over a tag to get all the attributes:
>
> > import BeautifulSoup as bs
>
> > html = "<div x='a' y='b' z='c'>hello</div>"
>
> > doc = bs.BeautifulSoup(html)
> > div = doc.find("div")
>
> > for key in div:
> >     print key, div[key]
>
> > --output:--
> > hello
> > Traceback (most recent call last):
> >   File "test1.py", line 9, in ?
> >     print key, div[key]
> >   File "/Library/Frameworks/Python.framework/Versions/2.4/lib/
> > python2.4/site-packages/BeautifulSoup.py", line 430, in __getitem__
> >     return self._getAttrMap()[key]
> > KeyError: u'hello'
>
> > How can you get all the attributes when you don't know the attribute
> > names ahead of time?
>
> I figured it out:
>
> import BeautifulSoup as bs
>
> html = "<div x='a' y='b' z='c'>hello</div>"
>
> doc = bs.BeautifulSoup(html)
> div = doc.find("div")
>
> for attr, val in div.attrs:
>     print "%s:%s" % (attr, val)
>
> --output:--
> x:a
> y:b
> z:c
>
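
A side note on the traceback quoted above: as far as I can tell, iterating
over a tag walks its child nodes rather than its attribute names, which is
why the loop produced 'hello' and then a KeyError. Here is a toy sketch of
that behavior. ToyTag is a made-up stand-in, not BeautifulSoup's class, and
the sketch is written for Python 3, unlike the quoted code.

```python
# Toy stand-in for a parsed tag. ToyTag is hypothetical and only mimics
# the behavior seen in the traceback; it is not BeautifulSoup's code.
class ToyTag:
    def __init__(self, name, attrs, contents):
        self.name = name
        self.attrs = attrs          # list of (name, value) pairs
        self.contents = contents    # child nodes, here just strings

    def __iter__(self):
        # Iteration walks the children, not the attribute names.
        return iter(self.contents)

    def __getitem__(self, key):
        # Subscripting looks the key up among the attributes.
        return dict(self.attrs)[key]


div = ToyTag("div", [("x", "a"), ("y", "b"), ("z", "c")], ["hello"])

# What "for key in div" actually sees: the text child.
children = list(div)

# Using a child as an attribute key fails, as in the quoted traceback.
try:
    div[children[0]]
    lookup_failed = False
except KeyError:
    lookup_failed = True

# Walking the attribute pairs directly gives every name and value.
pairs = ["%s:%s" % (attr, val) for attr, val in div.attrs]
for line in pairs:
    print(line)
```

Under that assumption, the fix 7stud found works because it walks the
attribute pairs instead of the tag itself.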

Just for another datapoint, here's how it looks with pyparsing.
-- Paul

from pyparsing import makeHTMLTags,SkipTo

html = """<div x="a" y="b" z="c">hello</div>"""

# HTML tags match case-insensitively
divStart,divEnd = makeHTMLTags("DIV")
divTag = divStart + SkipTo(divEnd)("body") + divEnd

for div in divTag.searchString(html):
    print div.dump()
    print
    # dict-like access to results
    for k in div.keys():
        print k,div[k]
    # object.attribute access to results
    print div.body
    print div.x
    print div.y
    print

Prints:
['DIV', ['x', 'a'], ['y', 'b'], ['z', 'c'], False, 'hello', '</DIV>']
- body: hello
- empty: False
- endDiv: </DIV>
- startDiv: ['DIV', ['x', 'a'], ['y', 'b'], ['z', 'c'], False]
  - empty: False
  - x: a
  - y: b
  - z: c
- x: a
- y: b
- z: c

body hello
endDiv </DIV>
y b
x a
z c
startDiv ['DIV', ['x', 'a'], ['y', 'b'], ['z', 'c'], False]
empty False
hello
a
b
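
For one more data point, the same attribute listing can be sketched with
only the standard library. This assumes Python 3's html.parser module (the
code above is Python 2), and AttrCollector is a name invented for this
example.

```python
from html.parser import HTMLParser


class AttrCollector(HTMLParser):
    """Record (tag, attrs) for every start tag seen while parsing."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs in source order.
        self.found.append((tag, attrs))


html = "<div x='a' y='b' z='c'>hello</div>"

collector = AttrCollector()
collector.feed(html)
collector.close()

# No attribute names need to be known ahead of time.
for tag, attrs in collector.found:
    for attr, val in attrs:
        print("%s:%s" % (attr, val))
```

This handles only start tags and ignores the tag body, so it is a narrower
tool than either BeautifulSoup or pyparsing.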


