different encodings for unicode() and u''.encode(), bug?

John Machin sjmachin at lexicon.net
Wed Jan 2 06:28:38 EST 2008


On Jan 2, 9:57 pm, mario <ma... at ruggier.org> wrote:
> On Jan 2, 10:44 am, John Machin <sjmac... at lexicon.net> wrote:
>
> > Two things for you to do:
>
> > (1) Try these at the Python interactive prompt:
>
> > unicode('', 'latin1')
> > unicode('', 'mbcs')
> > unicode('', 'raboof')
> > unicode('abc', 'latin1')
> > unicode('abc', 'mbcs')
> > unicode('abc', 'raboof')
>
> $ python
> Python 2.5.1 (r251:54869, Apr 18 2007, 22:08:04)
> [GCC 4.0.1 (Apple Computer, Inc. build 5367)] on darwin
> Type "help", "copyright", "credits" or "license" for more information.
> >>> unicode('', 'mbcs')
> u''
> >>> unicode('abc', 'mbcs')
>
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> LookupError: unknown encoding: mbcs
>
> Hmmn, strange. Same behaviour for "raboof".
>
> > (2) Read what the manual (Library Reference -> codecs module ->
> > standard encodings) has to say about mbcs.
>
> The page at http://docs.python.org/lib/standard-encodings.html gives
> the "purpose" of mbcs as:
> Windows only: Encode operand according to the ANSI codepage (CP_ACP)
>
> Do not know what the implications of encoding according to "ANSI
> codepage (CP_ACP)" are.

Neither do I. YAGNI (especially on darwin) so don't lose any sleep
over it.

> Windows only seems clear, but why does it only
> complain when decoding a non-empty string (or when encoding the empty
> unicode string)?

My presumption: because it doesn't need a codec to decode '' into u'';
no failed codec look-up, so no complaint. Any realistic app will try
to decode a non-empty string sooner or later.
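
If you would rather have a bad encoding name complain straight away,
even on empty input, one option is to look the codec up yourself before
decoding. A minimal sketch, assuming only the standard codecs module;
codec_available and decode_checked are names made up for illustration,
not anything in the library:

```python
import codecs


def codec_available(name):
    """Return True if a codec is registered under this name on this platform."""
    try:
        codecs.lookup(name)
    except LookupError:
        return False
    return True


def decode_checked(data, encoding):
    """Decode data, looking the codec up first (hypothetical helper).

    The explicit lookup means an unknown encoding name raises LookupError
    even when data is empty.
    """
    codecs.lookup(encoding)  # raises LookupError for an unknown encoding
    return data.decode(encoding)


if __name__ == '__main__':
    # 'mbcs' is documented as Windows only, so its result depends on the platform.
    for name in ('latin1', 'mbcs', 'raboof'):
        print('%s: %s' % (name, codec_available(name)))
```

Calling decode_checked('', 'raboof') then fails with LookupError rather
than quietly handing back an empty result.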





