Python & Unicode decimal interpretation

Scott David Daniels scott.daniels at acm.org
Sat Dec 3 01:19:13 EST 2005


In reading over the source for CPython's PyUnicode_EncodeDecimal,
I see a dance to handle characters which are neither decimal-digit
equivalents nor in Latin-1.  Does anyone know the intent of such a
conversion?
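
For concreteness, here is a minimal Python-level sketch of what I mean by a
decimal-digit equivalent. It uses the standard unicodedata module rather
than the C function itself, and the helper name to_ascii_digit is just
something made up for illustration.

```python
import unicodedata

def to_ascii_digit(ch):
    """Return the ASCII digit for a Unicode decimal character, else None."""
    value = unicodedata.decimal(ch, -1)  # -1 is our own "not a decimal" marker
    return None if value < 0 else str(value)

# ASCII 7, ARABIC-INDIC DIGIT THREE, FULLWIDTH DIGIT FIVE, EURO SIGN
samples = ["7", "\u0663", "\uff15", "\u20ac"]
mapped = [to_ascii_digit(ch) for ch in samples]
print(mapped)  # ['7', '3', '5', None]

# int() already accepts such digits directly:
print(int("\u0661\u0662\u0663"))  # 123
```

The EURO SIGN is the kind of character I am asking about: it has no decimal
value and it is outside Latin-1.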

As far as I can tell, error handling is one of:
     strict, replace, ignore, xmlcharrefreplace, or something_else
What I don't understand is whether, in the ignore or something_else
cases, digits can ever show up in the output where they would not if
each such character were simply replaced by a character like '?'.
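
As a rough analogy only, this is how the same handler names behave in the
ordinary Latin-1 codec via str.encode. I am assuming nothing here about
whether PyUnicode_EncodeDecimal matches this behavior; that is exactly the
open question.

```python
# A digit, EURO SIGN (not Latin-1, not a decimal digit), then another digit.
text = "1\u20ac2"

results = {
    handler: text.encode("latin-1", handler)
    for handler in ("replace", "ignore", "xmlcharrefreplace")
}
for handler, encoded in results.items():
    print(handler, encoded)
# replace b'1?2'
# ignore b'12'
# xmlcharrefreplace b'1&#8364;2'

# "strict" refuses to encode the string at all.
try:
    text.encode("latin-1", "strict")
    strict_raised = False
except UnicodeEncodeError:
    strict_raised = True
```

In this analogy, "ignore" makes the two digits adjacent and
"xmlcharrefreplace" introduces digits of its own, so neither result is the
same as substituting '?'.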

Can someone either give me a definitive "why not" or (preferably) give
me a test case that shows where that interpretation does not hold?

--Scott David Daniels
scott.daniels at acm.org


