[Python-Dev] sgmllib Comments
Sam Ruby
rubys at intertwingly.net
Mon Jun 12 06:05:06 CEST 2006
Terry Reedy wrote:
> "Fred L. Drake, Jr." <fdrake at acm.org> wrote in message
> news:200606112039.37834.fdrake at acm.org...
>> On Sunday 11 June 2006 16:26, Sam Ruby wrote:
>>> Planet is a feed aggregator written in Python. It depends heavily on
>>> SGMLLib. A recent bug report turned out to be a deficiency in sgmllib,
>>> and I've submitted a test case and a patch[1] (use or discard the patch,
>>> it is the test that I care about).
> ...
>>> and which are original. (Note: feeds often contain such abominations as
>>> &amp;copy; which the new code will treat indistinguishably from &copy;)
>
>> It really sounds like sgmllib is the wrong foundation for this.
> ...
>> Have you looked at HTMLParser as an alternate to sgmllib?
>> It has better support for XHTML constructs.
>
> Have you (the OP), checked how related Python projects, such as Mark
> Pilgrim's feed parser,
> http://www.feedparser.org/
> handle the same sort of input (I have only looked at docs and tests, not
> code).
Just to be clear: Planet uses Mark's feed parser, which uses sgmllib.
I'm a committer on that project:
http://sourceforge.net/project/memberlist.php?group_id=112328
I was investigating a bug in sgmllib which affected the feed parser (and
therefore Planet), and noticed that there were changes in the SVN head
of Python which broke three feed parser unit tests.
It is my belief that these changes will break other existing users of
sgmllib.
- Sam Ruby
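Illustration (not part of the original message, the patch, or the feed parser test suite): the entity concern quoted above can be sketched in Python 3, assuming it refers to a double-escaped &amp;copy; versus a plain &copy; inside an attribute value. sgmllib no longer exists in Python 3, so this sketch uses html.parser and html.unescape instead. The class AttrCollector and the helper attr_values are hypothetical names made up for this example.

```python
# Sketch only: Python 3's html.parser stands in for sgmllib here.
import html
from html.parser import HTMLParser


class AttrCollector(HTMLParser):
    """Hypothetical helper: records attribute values as the parser reports them."""

    def __init__(self):
        super().__init__()
        self.values = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; html.parser has already
        # decoded entity references in each value exactly once.
        self.values.extend(value for _name, value in attrs)


def attr_values(markup):
    """Return the attribute values found in a fragment of markup."""
    parser = AttrCollector()
    parser.feed(markup)
    parser.close()
    return parser.values


# Decoded once, the two spellings stay distinguishable.
double_escaped = attr_values('<a title="&amp;copy;">x</a>')
single_escaped = attr_values('<a title="&copy;">x</a>')

# Decoding a second time collapses them into the same character,
# which is the kind of ambiguity a caller can no longer undo.
collapsed = [html.unescape(v) for v in double_escaped + single_escaped]
```

A caller that already decodes attribute values itself would see the collapsed result if the parser underneath started decoding as well, which is one way a change like this could break existing users.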