different encodings for unicode() and u''.encode(), bug?

Wed Jan 2 07:16:16 EST 2008

On Jan 2, 12:28 pm, John Machin <sjmac... at lexicon.net> wrote:
> On Jan 2, 9:57 pm, mario <ma... at ruggier.org> wrote:
>
> > Do not know what the implications of encoding according to "ANSI
> > codepage (CP_ACP)" are.
>
> Neither do I. YAGNI (especially on darwin) so don't lose any sleep
> over it.
>
> > Windows only seems clear, but why does it only
> > complain when decoding a non-empty string (or when encoding the empty
> > unicode string) ?
>
> My presumption: because it doesn't need a codec to decode '' into u'';
> no failed codec look-up, so no complaint. Any realistic app will try
> to decode a non-empty string sooner or later.

Yes, I suspect I will never need it ;)

Incidentally, this came up in a script that tries to guess a file's
encoding: it bombed on the file ".svn/empty-file". The only reason it
got that far with an empty string was a bug elsewhere in the script,
trivially fixed. Still, I was curious about this asymmetric behaviour
of some encodings for the empty string.
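
For anyone who hits the same thing, here is a minimal sketch of how such a guessing loop could guard against both problems. It is not the actual script: the name guess_encoding and the candidate list are made up for illustration, and it is written in current Python 3 syntax rather than the 2.x discussed in this thread. It returns early on empty input and looks each codec up explicitly, so a codec that is missing on the platform is skipped in the same way whether or not the data is empty.

```python
import codecs


def guess_encoding(data, candidates=("ascii", "utf-8", "latin-1")):
    """Return the first candidate codec that decodes `data`, or None.

    Hypothetical helper: the name and the default candidates are
    assumptions made for this example.
    """
    # An empty file carries no evidence about its encoding, so do not guess.
    if not data:
        return None
    for name in candidates:
        try:
            # Explicit look-up: raises LookupError if the codec is not
            # available on this platform, regardless of the data.
            codecs.lookup(name)
            data.decode(name)
        except LookupError:
            continue  # codec not available here, try the next one
        except UnicodeError:
            continue  # codec exists but cannot decode these bytes
        return name
    return None
```

The order of the candidates matters: latin-1 accepts any byte sequence, so it only makes sense as the last resort.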

Anyhow, thanks a lot to both of you for the great feedback!

mario