[I18n-sig] Transparent Encoding
M.-A. Lemburg
mal@lemburg.com
Sat, 19 May 2001 14:08:08 +0200
Paul Prescod wrote:
>
> "Martin v. Loewis" wrote:
> >
> >...
> > An EncodedFile is not suitable since it has byte strings on both ends,
> > and Unicode strings only inside.
>
> EncodedFile seems to do what I want if I pass it "unicode-internal"
> as the encoding name. Furthermore, code that does that is much simpler
> than code that looks up the codec manually. I'm not a big fan of those
> codec tuples.
>
> Current:
>
> writer = codecs.lookup("utf-8")[3]
> stream = writer(fileobj)
>
> Proposed:
>
> codecs.EncodedFile(fileobj, None, "utf-8")
>
> As I understand it, you can almost always avoid looking up the
> encoder in the codec tuple, thanks to the .encode method, and you can
> almost always avoid looking up the decoder, thanks to the .decode
> method. This EncodedFile convention would let the most common cases
> of wrapping a stream for Unicode avoid the tuple lookup as well.
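The following is a minimal Python 3 sketch of the "current" idiom quoted above, plus the one-shot `.encode`/`.decode` methods. It assumes an `io.BytesIO` object in place of the real `fileobj`. In Python 3, `codecs.lookup()` returns a `CodecInfo` object that is still a 4-tuple (encoder, decoder, stream reader, stream writer), so index 3 is the stream writer class. `codecs.getwriter()` is the named shortcut for the same class.

```python
import codecs
import io

# Assumption: io.BytesIO stands in for the byte-oriented fileobj.
fileobj = io.BytesIO()

# "Current" idiom: index 3 of the lookup tuple is the StreamWriter class.
writer = codecs.lookup("utf-8")[3]
stream = writer(fileobj)
stream.write("caf\u00e9")  # Unicode in, UTF-8 bytes written to fileobj

# getwriter() returns the very same StreamWriter class.
assert codecs.getwriter("utf-8") is writer

# One-shot conversions need no codec lookup at all.
data = "caf\u00e9".encode("utf-8")
text = data.decode("utf-8")
```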
Paul, I still don't understand what you really want to achieve.
Do you want a file-like object which writes UTF-8, accepts Unicode as
input to .write() (as well as 8-bit strings, which are then handled in
the usual ASCII way), and returns Unicode from .read()?
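The object described in the question can be sketched with Python 3's `codecs.StreamReaderWriter`. The sketch uses an `io.BytesIO` stand-in for the file, which is an assumption. It also shows `codecs.EncodedFile` for contrast: that function returns a recoder with bytes on both ends, which matches Martin's point. In current Python 3, the built-in `open(..., encoding="utf-8")` is the usual way to get a text stream like this.

```python
import codecs
import io

# Assumption: io.BytesIO stands in for a real file opened in binary mode.
raw = io.BytesIO()

# Unicode on the caller's side, UTF-8 bytes on the file's side.
info = codecs.lookup("utf-8")
f = codecs.StreamReaderWriter(raw, info.streamreader, info.streamwriter)

f.write("\u00fcber")        # str in ...
written = raw.getvalue()    # ... UTF-8 bytes in the file: b'\xc3\xbcber'

f.seek(0)                   # rewind the stream and reset the codec state
text = f.read()             # str out: 'über'

# EncodedFile, by contrast, recodes bytes to bytes: here the Latin-1
# bytes in the file are handed to the caller as UTF-8 bytes.
latin1 = io.BytesIO("\u00fcber".encode("latin-1"))
recoded = codecs.EncodedFile(latin1, "utf-8", "latin-1")
recoded_bytes = recoded.read()   # b'\xc3\xbcber'
```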
The encoding 'unicode-internal' is really only meant for low-level
access to how we chose to represent Unicode at the C level. This could
well change in some future version (note that Unicode is still
evolving and probably will continue to do so for some time;
e.g. Unicode 3.1 is just out the door and adds another 50k code
points, using the non-BMP space for the first time...).
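The sketch below makes the non-BMP point concrete. It is a Python 3 example that does not come from the thread, and it uses U+1D11E MUSICAL SYMBOL G CLEF, one of the characters Unicode 3.1 added. That character needs two 16-bit surrogate units but only one 32-bit unit. As a result, any bytes that mirror the interpreter's internal storage width depend on how the interpreter was built. Later Python versions did change this representation (PEP 393, Python 3.3), and the 'unicode_internal' codec was removed in Python 3.8.

```python
# U+1D11E MUSICAL SYMBOL G CLEF lies outside the BMP.
clef = "\U0001D11E"

utf8 = clef.encode("utf-8")       # 4 bytes
utf16 = clef.encode("utf-16-be")  # surrogate pair D834 DD1E (two 16-bit units)
utf32 = clef.encode("utf-32-be")  # a single 32-bit unit

print(len(clef))    # 1 code point (Python 3.3+)
print(utf8.hex())   # f09d849e
print(utf16.hex())  # d834dd1e
print(utf32.hex())  # 0001d11e
```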
--
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting: http://www.egenix.com/
Python Software: http://www.lemburg.com/python/