[Python-Dev] HTMLParser and HTML5
Glyph Lefkowitz
glyph at twistedmatrix.com
Fri Jul 29 20:03:02 CEST 2011
On Jul 29, 2011, at 7:46 AM, Stefan Behnel wrote:
> Joao S. O. Bueno, 29.07.2011 13:22:
>> On Fri, Jul 29, 2011 at 1:37 AM, Stefan Behnel wrote:
>>> Brett Cannon, 28.07.2011 23:49:
>>>>
>>>> On Thu, Jul 28, 2011 at 11:25, Matt wrote:
>>>>>
>>>>> - What policies are in place for keeping parity with other HTML
>>>>> parsers (such as those in web browsers)?
>>>>
>>>> There aren't any beyond "it would be nice".
>>>> [...]
>>>> It's more of an issue of someone caring enough to do the coding work to
>>>> bring the parser up to spec for HTML5 (or introduce new code to live
>>>> beside the HTML4 parsing code).
>>>
>>> Which, given that html5lib readily exists, would likely be a lot more work
>>> than anyone who is interested in HTML5 handling would want to invest.
>>>
>>> I don't think we need a new HTML5 parsing implementation only to have it in
>>> the stdlib. That's the old sunny Java way of doing it.
>>
>> I disagree.
>> Having proper html parsing out of the box is part of the "batteries
>> included" thing.
>
> Well, you can easily prove me wrong by implementing this.
>
> Stefan
Please don't implement this just to prove Stefan wrong :).
The thing to do, if you want html parsing in the stdlib, is to _incorporate_ html5lib, which is already a perfectly good, thoroughly tested HTML parser, and simply deprecate HTMLParser and friends. Implementing a new parser would serve no purpose I can see.
-glyph