different encodings for unicode() and u''.encode(), bug?

mario mario at ruggier.org
Thu Jan 3 16:03:08 EST 2008


On Jan 2, 2:25 pm, Piet van Oostrum <p... at cs.uu.nl> wrote:

> Apparently for the empty string the encoding is irrelevant as it will not
> be used. I guess there is an early check for this special case in the code.

In the module I am working on [*] I remember each failed encoding so
that, if necessary, I can later re-process with fewer encodings. In the
case of an empty string AND an unknown encoding this strategy
failed...
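
To illustrate the bookkeeping, here is a rough sketch only (not the
actual decodeh code; the name decode_first is made up for this example).
It assumes the standard codecs.lookup(), which raises LookupError for an
unknown codec name whatever the data is, so an empty string cannot hide
a bad encoding name:

```python
import codecs


def decode_first(data, encodings):
    """Try each candidate encoding in order (hypothetical helper).

    Returns (text, failed): the decoded text (or None if nothing worked)
    and the list of encodings that failed before the first success.
    """
    failed = []
    for enc in encodings:
        try:
            # Resolve the codec name explicitly, so that an unknown
            # encoding is reported even when `data` is empty.
            codecs.lookup(enc)
            return data.decode(enc), failed
        except (LookupError, UnicodeError):
            # Unknown codec name, or the bytes do not fit this encoding.
            failed.append(enc)
    return None, failed


text, failed = decode_first(b"", ["non-existent", "utf-8"])
```

With the explicit lookup, "non-existent" ends up in the failed list even
though the input is empty, and the remaining candidates can be retried
later.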

Anyhow, the question is, should the behaviour be the same for these
operations, and if so what should it be:

u"".encode("non-existent")
unicode("", "non-existent")

mario

[*] a module to decode heuristically, which imho is actually starting
to look quite good. It is at http://gizmojo.org/code/decodeh/ and any
comments are very welcome.


