Where is the ucs-32 codec?
"Martin v. Löwis"
martin at v.loewis.de
Mon Jun 5 04:29:37 EDT 2006
beni.cherniavsky at gmail.com wrote:
> Python seems to be missing a UCS-32 codec, even in wide builds (not
> that the build should matter).
> Is there some deep reason or should I just contribute a patch?
The only reasons are that nobody has needed one so far and that it is
quite some work to do correctly. Why do you need it?
> There should be '-le' and '-be' variants, I suppose. Should there be a
> variant without explicit endianness, using a BOM to decide (like
> 'utf-16')?
Right.
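As a rough sketch of how the three variants could fit together, here is
a decoder written in present-day Python 3. The name decode_ucs4 is
hypothetical and not part of any Python API, and treating BOM-less input
as big-endian is an assumption made only for this illustration:

```python
import struct

BOM_LE = b"\xff\xfe\x00\x00"
BOM_BE = b"\x00\x00\xfe\xff"

def decode_ucs4(data, byteorder=None):
    """Hypothetical helper: decode 4-byte code units into a str.

    byteorder is "little", "big", or None (detect from a leading BOM).
    """
    if byteorder is None:
        if data.startswith(BOM_LE):
            byteorder, data = "little", data[4:]
        elif data.startswith(BOM_BE):
            byteorder, data = "big", data[4:]
        else:
            byteorder = "big"  # assumption: no BOM means big-endian
    if len(data) % 4:
        raise ValueError("truncated input")
    prefix = "<" if byteorder == "little" else ">"
    units = struct.unpack("%s%dI" % (prefix, len(data) // 4), data)
    return "".join(chr(cp) for cp in units)
```

With an explicit byteorder the function behaves like a '-le' or '-be'
variant and leaves any BOM in the output; with byteorder=None it behaves
like the BOM-detecting variant.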
> And it should combine surrogates into valid characters (on all builds),
> like the 'utf-8' codec does, right?
Right.
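To illustrate the surrogate handling on the encoding side, here is a
minimal sketch that merges a high/low surrogate pair into one code point
before it would be written out as a 4-byte unit. The helper name
combine_surrogates is hypothetical, and passing lone surrogates through
unchanged is an assumption; a real codec might raise an error instead:

```python
def combine_surrogates(code_points):
    """Hypothetical helper: merge high/low surrogate pairs into one code point."""
    out = []
    pending = None  # a high surrogate waiting for its low partner
    for cp in code_points:
        if pending is not None:
            if 0xDC00 <= cp <= 0xDFFF:
                # Valid pair: rebuild the supplementary-plane code point.
                out.append(0x10000 + ((pending - 0xD800) << 10) + (cp - 0xDC00))
                pending = None
                continue
            out.append(pending)  # lone high surrogate, passed through unchanged
            pending = None
        if 0xD800 <= cp <= 0xDBFF:
            pending = cp
        else:
            out.append(cp)
    if pending is not None:
        out.append(pending)  # input ended on a lone high surrogate
    return out
```

For example, the pair 0xD83D, 0xDE00 becomes the single code point
0x1F600.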
Also, it should support the incremental interface (as any multi-byte
codec should).
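The core of the incremental problem is that a chunk of input may end in
the middle of a 4-byte unit, so the decoder has to hold back the
leftover bytes until the next call. A minimal sketch, assuming
little-endian input and no BOM handling; the class name is hypothetical
and it is a standalone class rather than a subclass of
codecs.IncrementalDecoder:

```python
import struct

class IncrementalUCS4Decoder:
    """Hypothetical sketch: little-endian only, no BOM handling."""

    def __init__(self):
        self._pending = b""  # bytes of an incomplete 4-byte unit

    def decode(self, data, final=False):
        data = self._pending + data
        usable = len(data) - len(data) % 4
        self._pending = data[usable:]
        if final and self._pending:
            raise ValueError("truncated input at end of stream")
        units = struct.unpack("<%dI" % (usable // 4), data[:usable])
        return "".join(chr(cp) for cp in units)
```

Feeding b"A\x00" returns an empty string; feeding the remaining
b"\x00\x00" afterwards returns "A".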
If you want it complete, it should also support line-oriented input.
Note that .readline/.readlines is particularly difficult to implement,
as you can't rely on the underlying stream's .readline implementation
to provide meaningful results.
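A small demonstration of why the byte-level .readline cannot be trusted,
written for present-day Python 3 and assuming little-endian 4-byte
units: the byte 0x0A also occurs inside the encoding of characters such
as U+010A, so the stream splits in the middle of a code unit.

```python
import io
import struct

# U+010A followed by a real line feed, as little-endian 4-byte units.
raw = struct.pack("<2I", 0x010A, 0x000A)
stream = io.BytesIO(raw)

# The byte stream stops at the first 0x0A byte, which here is only the
# low byte of U+010A, not a line feed character.
first = stream.readline()
print(repr(first), len(first) % 4)
```

The returned chunk is a single byte, so it is not even a whole number of
code units, let alone a complete line.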
While we are discussing problems: there is also the issue of whether
.readline/.readlines should take the additional Unicode linebreak
characters into account (e.g. U+2028, U+2029), and if so, whether
that should be restricted to "universal newlines" mode.
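The two candidate behaviours can be compared on an already-decoded
string. This sketch uses present-day Python 3, where str.splitlines
treats U+2028 and U+2029 as line boundaries while splitting on "\n"
alone does not; the sample text is made up for illustration:

```python
text = "one\u2028two\u2029three\nfour"

# Unicode-aware splitting: U+2028 and U+2029 count as line breaks.
unicode_aware = text.splitlines()

# Splitting on "\n" only: the separators stay inside the first line.
newline_only = text.split("\n")

print(unicode_aware)  # ['one', 'two', 'three', 'four']
print(len(newline_only))  # 2
```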
Regards,
Martin