unicode problem?

hidura at gmail.com
Sun Oct 10 00:10:56 EDT 2010


I had a similar problem: I can't write the bytes of an uploaded file to
disk without damaging the data. If I use UTF-8 to encode the file, its
size doubles. I tried changing the codec to raw_unicode_escape, which
gives almost the correct size but still damages the file. I am using
Python 3, and I have to encode the file again.
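
For reference, here is a minimal sketch of what the bytes-only path might look like in Python 3. It assumes the upload is already available as a bytes object; the save_upload helper, the payload value, and the file name are all made up for illustration. It also shows one way the size can grow: if the bytes are treated as text and re-encoded as UTF-8, every byte from 0x80 upward becomes two bytes.

```python
import os
import tempfile


def save_upload(data: bytes, path: str) -> int:
    """Write the bytes unchanged; binary mode ('wb') applies no codec."""
    with open(path, "wb") as f:
        return f.write(data)


# Hypothetical payload standing in for an uploaded file: each byte value once.
payload = bytes(range(256))

path = os.path.join(tempfile.mkdtemp(), "upload.bin")
written = save_upload(payload, path)

# Read the file back in binary mode to check that nothing was altered.
with open(path, "rb") as f:
    round_trip = f.read()

# For comparison only: decoding the bytes as text and re-encoding as UTF-8
# turns each of the 128 bytes >= 0x80 into a two-byte sequence.
inflated = payload.decode("latin-1").encode("utf-8")

print(written, round_trip == payload, len(inflated))
```

If the data is binary (an image, an archive), no encode or decode step should be needed at all as long as both the read and the write use bytes.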

On Oct 9, 2010 11:39pm, Chris Rebert <crebert at ucsd.edu> wrote:
> On Sat, Oct 9, 2010 at 4:59 PM, Brian Blais <bblais at bryant.edu> wrote:

> > This may be stemming from my complete ignorance of unicode, but when
> > I do this (Python 2.6):
> >
> > s='\xc2\xa9 2008 \r\n'
> >
> > and I want the ascii version of it, ignoring any non-ascii chars, I
> > thought I could do:
> >
> > s.encode('ascii','ignore')
> >
> > but it gives the error:
> >

> > In [20]:s.encode('ascii','ignore')
> > ----------------------------------------------------------------------------
> > UnicodeDecodeError                        Traceback (most recent call last)
> >
> > /Users/bblais/python/doit100810a.py in <module>()
> > ----> 1
> >
> > UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 0:
> > ordinal not in range(128)
> >
> > Am I doing something stupid here?



> In addition to Benjamin's explanation:



> Unicode strings in Python are of type `unicode` and written with a
> leading "u"; eg u"A unicode string for ¥500". Byte strings lack the
> leading "u"; eg "A plain byte string". Note that "Unicode string"
> does not refer to strings which have been encoded using a Unicode
> encoding (eg UTF-8); such strings are still byte strings, for
> encodings emit bytes.
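
A small sketch of that distinction, written so that it should run on both Python 2.6+ and Python 3.3+ (where the text type is called str rather than unicode). The variable names are only illustrative.

```python
# Text: type `unicode` on Python 2, `str` on Python 3.
text = u"A unicode string for \u00a5500"

# Encoding text with a Unicode encoding such as UTF-8 yields a byte string.
encoded = text.encode("utf-8")

# The yen sign is one character but two bytes in UTF-8, so the byte string
# is one element longer than the text.
print(type(text).__name__, len(text))
print(type(encoded).__name__, len(encoded))
```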



> As to why you got the /exact/ error you did:
> As a backward compatibility hack, in order to satisfy your nonsensical
> encoding request, Python implicitly tried to decode the byte string
> `s` using ASCII as a default (the choice of ASCII here has nothing to
> do with the fact that you specified ASCII in your encoding request),
> so that it could then try and encode the resulting unicode string;
> hence why you got a Unicode*De*codeError as opposed to a
> Unicode*En*codeError, despite the fact you called *en*code().
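
Going by that explanation, the original goal (an ASCII-only version of s) can be reached by decoding explicitly instead of relying on the implicit ASCII decode. This is only a sketch of two possible approaches. It assumes s holds UTF-8 data, which the \xc2\xa9 sequence suggests, and the b prefix is used so the same lines should run on Python 2.6+ and Python 3.

```python
s = b'\xc2\xa9 2008 \r\n'   # byte string; assumed to be UTF-8 for "(c) 2008"

# Option 1: decode as ASCII and silently drop every non-ASCII byte.
ascii_text = s.decode('ascii', 'ignore')

# Option 2: decode with the real encoding first, then encode to ASCII,
# dropping the characters that ASCII cannot represent.
text = s.decode('utf-8')
ascii_bytes = text.encode('ascii', 'ignore')

print(repr(ascii_text))
print(repr(ascii_bytes))
```

Option 1 gives text and option 2 gives bytes; in both cases the copyright sign is dropped and the rest of the string is left as it was.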



> Highly suggested further reading:
> "The Absolute Minimum Every Software Developer Absolutely, Positively
> Must Know About Unicode and Character Sets (No Excuses!)"
> http://www.joelonsoftware.com/articles/Unicode.html



> Cheers,
> Chris

