Where is the ucs-32 codec?

Mon Jun 5 04:29:37 EDT 2006

beni.cherniavsky at gmail.com wrote:
> Python seems to be missing a UCS-32 codec, even in wide builds (not
> that the build should matter).
> Is there some deep reason or should I just contribute a patch?

The only reasons are that nobody has needed one so far, and that it is
quite a lot of work if done correctly. Why do you need it?

> There should be '-le' and '-be' variants, I suppose.  Should there be a
> variant without explicit endianness, using a BOM to decide (like
> 'utf-16')?

Right.
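For the BOM-driven variant, the decoder only has to inspect the first four bytes. The sketch below is Python 3 and uses a hypothetical function `decode_ucs4`. It treats UCS-4 and UTF-32 as the same 32-bit format the thread calls "UCS-32". Falling back to big-endian when there is no BOM is an assumption: it matches the default the Unicode Standard gives for the UTF-32 encoding scheme, but a Python codec could reasonably pick native byte order instead.

```python
import struct

# Byte-order marks: U+FEFF written as a 32-bit unit in each byte order.
BOM_LE = b"\xff\xfe\x00\x00"
BOM_BE = b"\x00\x00\xfe\xff"


def decode_ucs4(data: bytes) -> str:
    """Hypothetical decoder: a leading BOM, if present, picks the byte order."""
    if data.startswith(BOM_LE):
        order, data = "<", data[4:]
    elif data.startswith(BOM_BE):
        order, data = ">", data[4:]
    else:
        # Assumption: no BOM means big-endian.
        order = ">"
    if len(data) % 4:
        raise ValueError("truncated data: length is not a multiple of 4")
    chars = []
    for cp in struct.unpack(f"{order}{len(data) // 4}I", data):
        # Reject values that are not Unicode scalar values.
        if cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
            raise ValueError(f"invalid code point U+{cp:X}")
        chars.append(chr(cp))
    return "".join(chars)
```

The explicit '-le' and '-be' variants would be the same function with `order` fixed and the BOM check skipped.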

> And it should combine surrogates into valid characters (on all builds),
> like the 'utf-8' codec does, right?

Right.
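On a narrow (UCS-2) build, a character outside the BMP is stored as a surrogate pair, so the encoder has to merge each pair back into a single 32-bit value. A minimal sketch, assuming the input arrives as a list of 16-bit code units (modern Python strings no longer expose surrogate pairs, so the list stands in for a narrow-build string). `combine_surrogates` and `encode_ucs4_be` are hypothetical names. Passing unpaired surrogates through unchanged is also an assumption; a stricter codec would raise an error instead.

```python
import struct
from typing import Iterable, Iterator, List


def combine_surrogates(units: Iterable[int]) -> Iterator[int]:
    """Yield code points, joining each high/low surrogate pair into one."""
    pending_high = None
    for u in units:
        if pending_high is not None:
            if 0xDC00 <= u <= 0xDFFF:
                # Valid pair: rebuild the supplementary code point.
                yield 0x10000 + ((pending_high - 0xD800) << 10) + (u - 0xDC00)
                pending_high = None
                continue
            # Assumption: an unpaired high surrogate passes through as-is.
            yield pending_high
            pending_high = None
        if 0xD800 <= u <= 0xDBFF:
            pending_high = u  # wait for a possible low surrogate
        else:
            yield u
    if pending_high is not None:
        yield pending_high


def encode_ucs4_be(units: List[int]) -> bytes:
    """Encode UTF-16 code units as big-endian UCS-4."""
    cps = list(combine_surrogates(units))
    return struct.pack(f">{len(cps)}I", *cps)
```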

Also, it should support the incremental interface (as any multi-byte
codec should).
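The incremental interface matters because a chunk boundary can fall in the middle of a 4-byte unit. Python 3's `codecs.BufferedIncrementalDecoder` already keeps undecoded bytes in a buffer between calls, so a sketch only needs `_buffer_decode` to consume whole units. The class name and the "ucs-4-le" label are hypothetical, only strict error handling is shown, and code point validation is left out for brevity.

```python
import codecs
import struct


class UCS4LEIncrementalDecoder(codecs.BufferedIncrementalDecoder):
    """Hypothetical incremental decoder for little-endian UCS-4."""

    def _buffer_decode(self, data, errors, final):
        # Decode only complete 4-byte units; the base class keeps the rest.
        usable = len(data) - len(data) % 4
        if final and usable != len(data):
            raise UnicodeDecodeError(
                "ucs-4-le", bytes(data), usable, len(data), "truncated data"
            )
        code_points = struct.unpack(f"<{usable // 4}I", data[:usable])
        return "".join(map(chr, code_points)), usable
```

Feeding the decoder 3-byte chunks, none of which lines up with a unit boundary, still yields the complete string once `final=True` is passed.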

If you want it complete, it should also support line-oriented input.
Notice that .readline/.readlines is particularly difficult to implement,
as you can't rely on the underlying stream's .readline implementation
to provide meaningful results.
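The readline problem is easy to show: in UTF-32-LE, U+010A encodes as `0A 01 00 00`, so a byte-level `readline()` stops inside that character even though it is not a newline. Below is a sketch of a line reader that works on whole 4-byte units instead. `read_ucs4_line` is a hypothetical helper; it assumes `stream.read(4)` only returns a short read at end of file (true for `io.BytesIO` and blocking buffered streams), and a real implementation would read larger blocks rather than one unit at a time.

```python
import io


def read_ucs4_line(stream: io.BufferedIOBase, byteorder: str = "little") -> str:
    """Read one line from a UCS-4 byte stream, one 4-byte unit at a time."""
    chars = []
    while True:
        unit = stream.read(4)
        if not unit:
            break  # end of stream
        if len(unit) < 4:
            raise ValueError("truncated UCS-4 data")
        ch = chr(int.from_bytes(unit, byteorder))
        chars.append(ch)
        if ch == "\n":  # only a real U+000A ends the line
            break
    return "".join(chars)


data = "a\u010ab\nc".encode("utf-32-le")
# The underlying stream's readline() cuts U+010A in half:
byte_line = io.BytesIO(data).readline()
```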

While we are discussing problems: there is also the issue of whether
.readline/.readlines should take the additional Unicode line-break
characters into account (e.g. U+2028, U+2029), and if so, whether
that should be restricted to "universal newlines" mode.
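Python 3 happens to illustrate both possible answers: `str.splitlines()` treats U+2028 and U+2029 as line boundaries, while a text stream in universal newlines mode (`newline=None`) only splits on "\n", "\r" and "\r\n". The sketch uses UTF-8 for brevity; the splitting happens after decoding, so the choice of codec does not change the result.

```python
import io

text = "one\u2028two\u2029three\r\nfour\n"

# str.splitlines() counts U+2028 (LINE SEPARATOR) and U+2029
# (PARAGRAPH SEPARATOR) as line boundaries.
split = text.splitlines()

# A text stream in universal newlines mode recognizes only "\n", "\r"
# and "\r\n", translating each of them to "\n".
stream = io.TextIOWrapper(
    io.BytesIO(text.encode("utf-8")), encoding="utf-8", newline=None
)
read = stream.readlines()

print(split)
print(read)
```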

Regards,
Martin