[issue9769] PyUnicode_FromFormatV() doesn't handle non-ascii text correctly

Mariano Reingart report at bugs.python.org
Sun Oct 28 22:22:44 CET 2012


Mariano Reingart added the comment:

(moved from issue #16343)

While working on an internationalization proposal <http://python.org.ar/pyar/TracebackInternationalizationProposal> (issue #16344),
I got stuck on this problem (#9769): multibyte encodings (like UTF-8) are not supported by PyUnicode_FromFormatV().

Besides my proposal, I think UTF-8 should be supported for consistency with the other Unicode functions, such as PyUnicode_FromString() or even unicode_fromformat_arg().

Attached is a patch that:
- enhances the iterator to detect multibyte sequences, with sanity checks on start and continuation bytes
- replaces unicode_write_cstr with PyUnicode_DecodeUTF8Stateful
- adds tests
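
To illustrate the kind of start/continuation byte check described in the first item, here is a minimal sketch. It is written in Python for brevity and is not the code from the attached patch (which is C); the names utf8_sequence_length and split_utf8_sequences are hypothetical. It only validates lead and continuation bytes, and deliberately skips the stricter checks (overlong three- and four-byte forms, surrogates) that a full decoder performs.

```python
def utf8_sequence_length(lead):
    """Return the sequence length implied by a UTF-8 lead byte, or 0 if invalid."""
    if lead < 0x80:
        return 1          # plain ASCII
    if 0xC2 <= lead <= 0xDF:
        return 2          # two-byte sequence (0xC0/0xC1 would be overlong)
    if 0xE0 <= lead <= 0xEF:
        return 3          # three-byte sequence
    if 0xF0 <= lead <= 0xF4:
        return 4          # four-byte sequence (up to U+10FFFF)
    return 0              # stray continuation byte or invalid lead byte


def split_utf8_sequences(data):
    """Split bytes into per-character UTF-8 sequences, validating byte roles."""
    chunks = []
    i = 0
    while i < len(data):
        length = utf8_sequence_length(data[i])
        if length == 0:
            raise ValueError("invalid start byte 0x%02x at offset %d" % (data[i], i))
        seq = data[i:i + length]
        if len(seq) < length:
            raise ValueError("truncated sequence at offset %d" % i)
        # Every byte after the lead must look like 10xxxxxx.
        for b in seq[1:]:
            if b & 0xC0 != 0x80:
                raise ValueError("invalid continuation byte 0x%02x at offset %d" % (b, i))
        chunks.append(seq)
        i += length
    return chunks
```

A format-string iterator built this way can copy each validated sequence whole instead of treating every byte as a separate character, which is the failure mode this issue is about.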

I hope it helps. This is my first patch for CPython and my C skills are a bit rusty, so please excuse any newbie glitches.

----------
nosy: +reingart
Added file: http://bugs.python.org/file27771/pyunicode_fromformat_utf8.patch

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue9769>
_______________________________________

