codec for html/xml entities!?

Martin Bless m.bless at gmx.de
Sun Apr 20 09:49:07 EDT 2008


[Stefan Behnel] wrote:

>Martin Bless wrote:
>> What's a good way to encode and decode those entities like &euro; or
>> &#8364; ?
>
>Hmm, since you provide code, I'm not quite sure what your actual question is.

- What's a GOOD way?
- Am I reinventing the wheel?
- Are there well tested, fast, state of the art, builtin ways?
- Is something like line.decode('htmlentities') out there?
- Am I in conformity with relevant RFCs? (I'm hoping so ...)
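
For readers coming to this thread later, here is a minimal sketch of what the standard library offers, assuming a current Python 3 (this was not available when the thread was written; Python 2 only shipped the name table as htmlentitydefs.name2codepoint). There is no 'htmlentities' codec, but html.unescape() covers decoding, and the 'xmlcharrefreplace' error handler covers encoding to numeric references. The helper name encode_entities is made up for illustration.

```python
import html
from html.entities import name2codepoint

# Decoding: html.unescape() handles named, decimal and hexadecimal references.
decoded = html.unescape("&euro; &#8364; &#x20ac;")

# Unknown named references are left untouched rather than raising.
unknown = html.unescape("&foobar;")

# The name table itself maps entity names to code points.
euro_codepoint = name2codepoint["euro"]


def encode_entities(text):
    """Hypothetical helper: replace non-ASCII characters with decimal
    character references, using the built-in 'xmlcharrefreplace' handler."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


encoded = encode_entities("10 \u20ac")
```

Note that this encodes to numeric references only; it does not produce named entities like &euro;, and it does not escape &, < or > (html.escape() does that part).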

>So I'll just comment on the code here.
>
>
>> def entity2uc(entity):
>>     """Convert entity like &#123; to unichr.
>> 
>>     Return (result,True) on success or (input string, False)
>>     otherwise. Example:
>>         entity2uc('&euro;')   -> (u'\u20ac',True)
>>         entity2uc('&#x20ac;') -> (u'\u20ac',True)
>>         entity2uc('&#8364;')  -> (u'\u20ac',True)
>>         entity2uc('&foobar;') -> ('&foobar;',False)
>>     """
>
>Is there a reason why you return a tuple instead of just returning the
>converted result and raising an exception if the conversion fails?

Mainly a matter of style. When I use the function this way in the
future, it's unambiguously clear that there might have been
unconverted entities, but I don't have to deal with the details of how
that was discovered. And maybe I'd like to change the algorithm
later? This way it's nicely encapsulated.
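
To make the two styles concrete, here is a rough sketch in Python 3. It is not the code originally posted in this thread, only an assumed reimplementation that matches the docstring's contract. entity2uc returns the (result, flag) tuple; entity2uc_strict is a hypothetical wrapper showing the exception-raising alternative Stefan suggests.

```python
import re
from html.entities import name2codepoint

# One complete reference: &name;  &#123;  or  &#x7b;
_ENTITY = re.compile(r"&(#[xX][0-9a-fA-F]+|#[0-9]+|[A-Za-z][A-Za-z0-9]*);")


def entity2uc(entity):
    """Return (character, True) on success, else (input string, False)."""
    match = _ENTITY.fullmatch(entity)
    if match is None:
        return entity, False
    body = match.group(1)
    try:
        if body[:2] in ("#x", "#X"):
            codepoint = int(body[2:], 16)   # hexadecimal reference
        elif body.startswith("#"):
            codepoint = int(body[1:])       # decimal reference
        else:
            codepoint = name2codepoint[body]  # named entity
        return chr(codepoint), True
    except (KeyError, ValueError, OverflowError):
        # Unknown name, or a number outside the Unicode range.
        return entity, False


def entity2uc_strict(entity):
    """Hypothetical variant: raise instead of returning a flag."""
    result, ok = entity2uc(entity)
    if not ok:
        raise ValueError("cannot convert entity: %r" % (entity,))
    return result
```

Either form can be built on top of the other, so the choice really is about what the caller should see: a flag that can be ignored, or an error that cannot.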

Have a nice day

Martin


