Unexpected behaviour with HTMLParser...
Andrew Durdin
adurdin at gmail.com
Wed Oct 10 09:19:58 EDT 2007
On 10/9/07, Just Another Victim of the Ambient Morality
<ihatespam at hotmail.com> wrote:
>
> "Diez B. Roggisch" <deets at nospam.web.de> wrote in message
> news:5n2avjFfh6h8U1 at mid.uni-berlin.de...
> >
> > Without code, that's hard to determine. But you are aware of e.g.
> >
> > handle_entityref(name)
> > handle_charref(ref)
> >
> > ?
>
> Actually, I am not aware of these methods but I will certainly look into
> them!
> I was hoping that the issue would be known or simple before I committed
> to posting code, something that is, to my chagrin, not easily done with my
> news client...
For example, here's something simple/simplistic you can do to handle
character and entity references:
from htmlentitydefs import name2codepoint
...
    def handle_charref(self, ref):
        # ref is the text between '&#' and ';', e.g. '169' or 'x41'
        try:
            if ref[:1] in ('x', 'X'):
                char = unichr(int(ref[1:], 16))
            else:
                char = unichr(int(ref))
        except (TypeError, ValueError, OverflowError):
            char = ' '
        # Do something with char

    def handle_entityref(self, ref):
        # ref is the entity name without '&' and ';', e.g. 'amp'
        try:
            char = unichr(name2codepoint[ref])
        except (KeyError, ValueError):
            char = ' '
        # Do something with char
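
To show where those two handlers fit, here is a self-contained sketch of a complete parser. It is written for Python 3, where the modules are named html.parser and html.entities and chr() takes the place of unichr(). The TextCollector class, its chunks list and its text() method are names made up for this illustration, not part of HTMLParser. One assumption worth flagging: in Python 3 the parser decodes references itself unless it is created with convert_charrefs=False, so the sketch passes that flag to keep the handlers in play.

```python
from html.entities import name2codepoint
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Hypothetical example: collect text, decoding references by hand."""

    def __init__(self):
        # With the Python 3 default (convert_charrefs=True) the two
        # handle_*ref methods below would never be called.
        super().__init__(convert_charrefs=False)
        self.chunks = []

    def handle_data(self, data):
        # Ordinary text between tags and references
        self.chunks.append(data)

    def handle_charref(self, ref):
        # ref is e.g. '38' for &#38; or 'x3E' for &#x3E;
        try:
            if ref[:1] in ('x', 'X'):
                char = chr(int(ref[1:], 16))
            else:
                char = chr(int(ref))
        except (ValueError, OverflowError):
            char = ' '
        self.chunks.append(char)

    def handle_entityref(self, ref):
        # ref is e.g. 'lt' for &lt;
        try:
            char = chr(name2codepoint[ref])
        except (KeyError, ValueError):
            char = ' '
        self.chunks.append(char)

    def text(self):
        return ''.join(self.chunks)


parser = TextCollector()
parser.feed('a &lt; b &#38; c &#x3E; d &bogus; e')
parser.close()
print(parser.text())
```

Unknown names such as &bogus; fall back to a single space here, the same as in the snippet above; whether that is the right fallback depends on what you are doing with the text.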
A.