Python equivalent of "lynx -dump"?

Fredrik Lundh effbot at telia.com
Wed Mar 29 19:21:49 EST 2000


lewst <lewst at yahoo.com> wrote:
> > An all Python solution is a little bit more complicated:
> >
> > import htmllib, formatter
> >
> > p = htmllib.HTMLParser(formatter.AbstractFormatter(formatter.DumbWriter()))
> > f = open('test.html')
> > p.feed(f.read())
> > p.close()
> > f.close()
>
> Yes, but how can I store the output of "p.feed(f.read())" in a
> variable such as `data', like I'm doing above with lynxcmd?  Your
> code writes everything out to the terminal.

did you read the fine manual?

    http://www.python.org/doc/current/lib/writer-impls.html

    DumbWriter ([file[, maxcol = 72]])

    Simple writer class which writes output on the file object
    passed in as file or, if file is omitted, on standard output.

in your case, using a StringIO file object is probably the best
solution:

    import formatter, htmllib, StringIO

    file = StringIO.StringIO()

    # build formatting pipeline
    w = formatter.DumbWriter(file)
    f = formatter.AbstractFormatter(w)
    p = htmllib.HTMLParser(f)

    ...

    data = file.getvalue()
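
a note for anyone on a newer Python: htmllib and formatter no longer
exist in Python 3 (htmllib went away in 3.0, formatter in 3.10), and
StringIO now lives in the io module.  the snippet below is only a rough
sketch of the same idea (hand the output side a file object, pass a
StringIO, read the text back with getvalue) built on html.parser
instead.  TextDumper and BLOCK_TAGS are made-up names for this
illustration, not standard library APIs, and the sample HTML string
is an assumption:

```python
import io
import sys
from html.parser import HTMLParser

# tags after which a line break is written (an arbitrary, incomplete choice)
BLOCK_TAGS = {"p", "div", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}


class TextDumper(HTMLParser):
    """hypothetical helper, not part of the standard library."""

    def __init__(self, file=None):
        super().__init__()
        # like DumbWriter: write to the given file object, else to stdout
        self.file = file if file is not None else sys.stdout
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in BLOCK_TAGS:
            self.file.write("\n")

    def handle_data(self, data):
        # only text outside <script>/<style> is written out
        if self.skip_depth == 0:
            self.file.write(data)


file = io.StringIO()
p = TextDumper(file)
p.feed("<html><body><h1>Title</h1><p>Hello, <b>world</b>!</p></body></html>")
p.close()

data = file.getvalue()  # the extracted text; nothing goes to the terminal
```

this does far less than "lynx -dump" (no line wrapping, no list of
links), but the StringIO part carries over unchanged: whatever writes
to a file object can write to an in-memory buffer instead.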

</F>

<!-- (the eff-bot guide to) the standard python library:
http://www.pythonware.com/people/fredrik/librarybook.htm
-->




