[omaha] Web scraping and funky characters

Eric Edens eric.edens at gmail.com
Wed Nov 5 20:37:08 CET 2014


Hey Wes,

Yeah! Good catch! It looks like HTMLParser.unescape [1] and
namedentities.unescape [2] both handle named entities (&copy;) and
numerical entities (&#169;).
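For anyone on Python 3, the standard library's html.unescape (added in 3.4) covers the same ground. This is a quick check rather than a claim about either library above: it decodes a named reference and both decimal and hex numeric references to the same character.

```python
import html

# html.unescape resolves named references (&copy;) as well as
# decimal (&#169;) and hexadecimal (&#xA9;) numeric references.
samples = ["&copy;", "&#169;", "&#xA9;"]
decoded = [html.unescape(s) for s in samples]
print(decoded)
```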

The only difference I see is that HTMLParser ignores &apos;, and
namedentities ignores &lt;, &gt;, and &amp;.
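To make the difference concrete, here is a minimal hand-rolled unescaper. It is a sketch, not the code of either library: it uses html.entities.name2codepoint for named references and a simplified regex that requires the trailing semicolon. The `keep` parameter is hypothetical and mimics leaving &lt;, &gt;, and &amp; escaped, as namedentities is said to do.

```python
import re
from html.entities import name2codepoint

# Matches &name;, &#NNN; and &#xHH;. A simplified pattern, not the full
# HTML5 parsing rules (it requires the trailing semicolon, for example).
ENTITY_RE = re.compile(r"&(#[xX][0-9a-fA-F]+|#[0-9]+|[A-Za-z][A-Za-z0-9]*);")


def unescape_entities(text, keep=()):
    """Replace named and numeric character references with Unicode.

    Entity names listed in `keep` (hypothetical option, e.g.
    ("lt", "gt", "amp")) are left escaped in the output.
    """
    def fixup(match):
        ref = match.group(1)
        if ref.startswith("#"):
            try:
                # Hexadecimal (&#xA9;) or decimal (&#169;) code point
                if ref[1] in "xX":
                    return chr(int(ref[2:], 16))
                return chr(int(ref[1:]))
            except (ValueError, OverflowError):
                # Out-of-range code point: leave the reference as written
                return match.group(0)
        if ref in keep:
            return match.group(0)
        # name2codepoint holds the HTML 4 names; unknown names stay as-is
        codepoint = name2codepoint.get(ref)
        return chr(codepoint) if codepoint is not None else match.group(0)

    return ENTITY_RE.sub(fixup, text)


print(unescape_entities("&copy; &#169; &#xA9;"))
print(unescape_entities("&lt;b&gt; &amp; &copy;", keep=("lt", "gt", "amp")))
```

With no `keep`, "&lt;b&gt;" decodes to "<b>". Because name2codepoint is the HTML 4 table, which has no apos entry, this sketch also leaves &apos; untouched.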

-- Eric

1. https://github.com/python-git/python/blob/715a6e5035bb21ac49382772076ec4c630d6e960/Lib/HTMLParser.py#L362
2. https://bitbucket.org/jeunice/namedentities/src/da4a9889c8509352aa691b7375a1ca6b716a528f/namedentities/core.py?at=default#cl-18
