UTF-8 problem encoding and decoding in Python3
MRAB
python at mrabarnett.plus.com
Tue Oct 12 12:04:56 EDT 2010
On 12/10/2010 15:45, Hidura wrote:
> It doesn't work. This is the error it gives me: TypeError: sequence item 0:
> expected bytes, str found. I'm still trying to figure out how to resolve
> it; if you have another idea please tell me, but thanks anyway!!!
>
> On Mon, Oct 11, 2010 at 4:27 AM, Almar Klein <almar.klein at gmail.com> wrote:
>
>
> On 10 October 2010 23:01, Hidura <hidura at gmail.com> wrote:
>
> I am trying to encode a binary file that was uploaded to a server; it is
> extracted from the wsgi.input of the environ and comes as a Unicode
> string.
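For what it's worth, PEP 3333 says that under Python 3 wsgi.input is a byte stream, so the upload can normally be read as bytes with no decoding step at all. Here is a minimal sketch, assuming a PEP 3333 compliant server that sets CONTENT_LENGTH. The helper name read_upload is hypothetical, and io.BytesIO merely stands in for the server's input stream:

```python
import io

def read_upload(environ):
    """Return the raw request body as bytes (hypothetical helper).

    Assumes a PEP 3333 server, where environ['wsgi.input'] yields bytes.
    """
    try:
        length = int(environ.get('CONTENT_LENGTH') or 0)
    except ValueError:
        length = 0
    # Read exactly CONTENT_LENGTH bytes and never decode them as text.
    return environ['wsgi.input'].read(length)

# Usage with a fake environ in place of a real server:
fake_environ = {
    'CONTENT_LENGTH': '4',
    'wsgi.input': io.BytesIO(b'\x89PNG'),
}
body = read_upload(fake_environ)
print(type(body), body)
```

If the file came from an HTML form, the body will be multipart/form-data and still has to be split into its parts; that step isn't shown here.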
>
>
> Firstly, UTF-8 is not meant to encode arbitrary binary data. But I
> guess you could have a Unicode string in which each character's code
> point represents one byte value. (But it's ugly!)
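If the string really is of that form (one character per byte, every code point in the range 0-255), the Latin-1 codec already does the conversion, because it maps code points U+0000..U+00FF one-to-one onto byte values 0x00..0xFF. A small sketch of the round trip:

```python
# Latin-1 maps code points U+0000..U+00FF one-to-one onto bytes 0x00..0xFF,
# so it round-trips arbitrary binary data stored "one byte per character".
raw = bytes(range(256))              # every possible byte value
as_text = raw.decode('latin-1')      # str whose code points equal the bytes
back = as_text.encode('latin-1')     # recover the original bytes

assert back == raw
assert all(ord(c) == b for c, b in zip(as_text, raw))

# A character above U+00FF cannot have come from a single byte:
try:
    '\u20ac'.encode('latin-1')
except UnicodeEncodeError as exc:
    print('not byte-per-character data:', exc.reason)
```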
>
> So if you can, make sure to send the file as plain bytes or, if it
> must be a string, base64 encoded. If this is not possible, you can try
> the code below to obtain the bytes. It's not a very fast solution, but
> it should work (Python 3):
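The base64 route looks like this in Python 3. It is a sketch, and the payload is made up for illustration. b64encode returns bytes containing only ASCII, so decoding them with 'ascii' gives a str that is safe to carry as text, and b64decode accepts that str back:

```python
import base64

data = b'\x00\xff\x89PNG\r\n'                   # arbitrary binary payload
text = base64.b64encode(data).decode('ascii')   # safe to carry as a str
restored = base64.b64decode(text)               # accepts an ASCII-only str

assert restored == data
print(text)
```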
>
>
> MAP = {}
> for i in range(256):
>     MAP[tmp] = eval("'\\u%04i'" % i)
>
> # Let's say 'a' is your string
> b''.join([MAP[c] for c in a])
>
I don't know what you're trying to do here.
1. 'tmp' is the same for every iteration of the 'for' loop.
2. A Unicode escape sequence expects 4 hexadecimal digits; the 'i'
format gives a decimal number.
3. Using 'eval' to make a string this way is the long (and wrong) way
to do it; chr(i) would have the same effect.
4. The result of the eval is a str, but you're joining the results with
a bytestring (b''), hence the TypeError.
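Putting those four points together, a corrected version of the table-based approach might look like the sketch below. The names MAP and to_bytes are only illustrative. It keys the table by chr(i), stores single-byte bytes objects as values so that b''.join works, and shows concretely why '%04i' is wrong. In practice a.encode('latin-1') does the same job much faster.

```python
# Point 2: '%04i' formats in decimal, but \u expects hex digits.
assert '%04i' % 10 == '0010'   # would produce '\u0010', i.e. chr(16)
assert '%04x' % 10 == '000a'   # correct hex digits for chr(10)

# Points 1 and 3: key each entry by chr(i) directly; no eval needed.
# Point 4: store bytes values, so joining with b'' works.
MAP = {chr(i): bytes([i]) for i in range(256)}

def to_bytes(a):
    """Convert a str with code points 0-255 to bytes, one byte per char."""
    return b''.join([MAP[c] for c in a])

a = 'caf\xe9\x00\xff'
assert to_bytes(a) == b'caf\xe9\x00\xff'
assert to_bytes(a) == a.encode('latin-1')   # same result, much faster

# The original error: joining str items with a bytes separator.
try:
    b''.join(['x'])
except TypeError as exc:
    print('TypeError:', exc)
```

Any character above U+00FF raises KeyError here (and UnicodeEncodeError with the Latin-1 codec), which is a useful sign that the string wasn't byte-per-character data in the first place.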