codec for html/xml entities!?
Martin Bless
m.bless at gmx.de
Sun Apr 20 09:49:07 EDT 2008
[Stefan Behnel] wrote:
>Martin Bless wrote:
>> What's a good way to encode and decode those entities like &euro; or
>> &#8364; ?
>
>Hmm, since you provide code, I'm not quite sure what your actual question is.
- What's a GOOD way?
- Am I reinventing the wheel?
- Are there well tested, fast, state of the art, builtin ways?
- Is something like line.decode('htmlentities') out there?
- Am I in conformity with relevant RFCs? (I'm hoping so ...)
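For comparison, Python 3's standard library (which postdates this 2008 thread, so it is not what was available at the time) covers both directions. `html.unescape` decodes named, decimal and hexadecimal character references, and the `xmlcharrefreplace` error handler turns non-ASCII characters into decimal references. A minimal sketch, assuming Python 3.4 or later; the two wrapper names are hypothetical:

```python
import html

def decode_entities(text):
    # html.unescape resolves named (&euro;), decimal (&#8364;) and
    # hexadecimal (&#x20ac;) character references
    return html.unescape(text)

def encode_entities(text):
    # Escape &, <, > and quotes first, then replace every non-ASCII
    # character with a decimal reference such as &#8364;
    return html.escape(text).encode("ascii", "xmlcharrefreplace").decode("ascii")

print(decode_entities("&euro; &#8364; &#x20ac;"))   # € € €
print(encode_entities("5 \u20ac < 6 \u20ac"))       # 5 &#8364; &lt; 6 &#8364;
```

Escaping before encoding matters: doing it the other way round would turn the `&` of each generated `&#8364;` into `&amp;`.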
>So I'll just comment on the code here.
>
>
>> def entity2uc(entity):
>> """Convert entity like &#123; to unichr.
>>
>> Return (result,True) on success or (input string, False)
>> otherwise. Example:
>> entity2uc('&#8364;') -> (u'\u20ac',True)
>> entity2uc('&#x20ac;') -> (u'\u20ac',True)
>> entity2uc('&euro;') -> (u'\u20ac',True)
>> entity2uc('&foobar;') -> ('&foobar;',False)
>> """
>
>Is there a reason why you return a tuple instead of just returning the
>converted result and raising an exception if the conversion fails?
Mainly a matter of style. When I use the function in the future, it
will be unambiguously clear that there might have been unconverted
entities, but I won't have to deal with the details of how that was
discovered. And maybe I'd like to change the algorithm in the future?
This way it's nicely encapsulated.
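The quoted excerpt shows only the signature and docstring of `entity2uc`, so the following is a hypothetical sketch of that contract, not Martin's actual code. It is written for Python 3, where the entity table lives in `html.entities.name2codepoint` (Python 2 used `htmlentitydefs` and `unichr` instead of `chr`). It puts both styles side by side: the tuple-returning function described above, and an exception-raising wrapper along the lines of Stefan's suggestion. `EntityError` and `entity2uc_strict` are invented names.

```python
import re
from html.entities import name2codepoint

# One complete entity: decimal &#8364;, hexadecimal &#x20ac;, or named &euro;
_ENTITY_RE = re.compile(r"&(?:#(\d+)|#[xX]([0-9a-fA-F]+)|(\w+));\Z")

def entity2uc(entity):
    """Return (character, True) on success or (entity, False) otherwise."""
    m = _ENTITY_RE.match(entity)
    if m:
        dec, hexa, name = m.groups()
        if dec is not None:
            codepoint = int(dec)
        elif hexa is not None:
            codepoint = int(hexa, 16)
        else:
            # Only the HTML 4 named entities are in this table
            codepoint = name2codepoint.get(name)
        if codepoint is not None and codepoint <= 0x10FFFF:
            return chr(codepoint), True
    # Unknown or malformed: hand back the input unchanged
    return entity, False

class EntityError(ValueError):
    """Hypothetical exception used by the strict variant."""

def entity2uc_strict(entity):
    """Exception-based alternative: return the character or raise EntityError."""
    result, ok = entity2uc(entity)
    if not ok:
        raise EntityError("cannot convert %r" % entity)
    return result

print(entity2uc("&#8364;"))    # ('€', True)
print(entity2uc("&foobar;"))   # ('&foobar;', False)
```

Keeping the tuple version as the core and deriving the strict one from it leaves the parsing logic in one place, so either calling style survives a later change of algorithm.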
Have a nice day
Martin