unicode problem?
hidura at gmail.com
Sun Oct 10 00:10:56 EDT 2010
I had a similar problem, but I can't write the bytes of an uploaded file to
disk without damaging the data. If I use utf-8 to encode the file, its size
doubles. I tried changing the codec to raw_unicode_escape, which gives
roughly the correct size but still damages the file. I am using Python 3,
and I have to encode the file again.
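
A minimal Python 3 sketch of the symptom described above. The message does
not say how the uploaded bytes became text, so the Latin-1 decode step, the
`payload` data, and the `upload.bin` file name are all assumptions made for
illustration. The sketch shows that re-encoding binary data as UTF-8 inflates
it, while opening the file in binary mode writes the bytes unchanged.

```python
import os
import tempfile

# Hypothetical stand-in for an uploaded file's raw content: every byte value once.
payload = bytes(range(256))

# Assumed failure mode: the bytes are treated as text and re-encoded as UTF-8.
# Each byte >= 0x80 turns into a two-byte UTF-8 sequence, so the data grows.
as_text = payload.decode("latin-1")
reencoded = as_text.encode("utf-8")
print(len(payload), len(reencoded))  # 256 384

# Binary mode ("wb") applies no codec at all, so the bytes reach disk as-is.
path = os.path.join(tempfile.mkdtemp(), "upload.bin")
with open(path, "wb") as fh:
    fh.write(payload)

with open(path, "rb") as fh:
    round_tripped = fh.read()

print(round_tripped == payload)  # True
```

If the upload already arrives as a `bytes` object, no encode step should be
needed before writing it out.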
On Oct 9, 2010 11:39pm, Chris Rebert <crebert at ucsd.edu> wrote:
> On Sat, Oct 9, 2010 at 4:59 PM, Brian Blais <bblais at bryant.edu> wrote:
> > This may be stemming from my complete ignorance of unicode, but when
> > I do this (Python 2.6):
> >
> > s='\xc2\xa9 2008 \r\n'
> >
> > and I want the ascii version of it, ignoring any non-ascii chars, I
> > thought I could do:
> >
> > s.encode('ascii','ignore')
> >
> > but it gives the error:
> >
> > In [20]:s.encode('ascii','ignore')
> >
> > ----------------------------------------------------------------------------
> > UnicodeDecodeError                        Traceback (most recent call last)
> >
> > /Users/bblais/python/doit100810a.py in ()
> > ----> 1
> >       2
> >       3
> >       4
> >       5
> >
> > UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 0:
> > ordinal not in range(128)
> >
> > am I doing something stupid here?
> In addition to Benjamin's explanation:
> Unicode strings in Python are of type `unicode` and written with a
> leading "u"; eg u"A unicode string for ¥500". Byte strings lack the
> leading "u"; eg "A plain byte string". Note that "Unicode string"
> does not refer to strings which have been encoded using a Unicode
> encoding (eg UTF-8); such strings are still byte strings, for
> encodings emit bytes.
> As to why you got the /exact/ error you did:
> As a backward compatibility hack, in order to satisfy your nonsensical
> encoding request, Python implicitly tried to decode the byte string
> `s` using ASCII as a default (the choice of ASCII here has nothing to
> do with the fact that you specified ASCII in your encoding request),
> so that it could then try and encode the resulting unicode string;
> hence why you got a Unicode*De*codeError as opposed to a
> Unicode*En*codeError, despite the fact you called *en*code().
> Highly suggested further reading:
> "The Absolute Minimum Every Software Developer Absolutely, Positively
> Must Know About Unicode and Character Sets (No Excuses!)"
> http://www.joelonsoftware.com/articles/Unicode.html
> Cheers,
> Chris
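
The two steps Chris describes (an implicit ASCII decode followed by the
requested encode) can be spelled out explicitly. This is only a sketch, and it
is written for Python 3, where `bytes` has no `encode()` method and the
implicit step no longer exists. The variable names other than `s` are made up
for illustration, and UTF-8 is assumed as the encoding of the original bytes
(`\xc2\xa9` is the UTF-8 form of the copyright sign).

```python
# The byte string from the original question, as a Python 3 bytes literal.
s = b"\xc2\xa9 2008 \r\n"

# The step Python 2 performed implicitly: decode the bytes as ASCII.
# This is what fails, which is why the error is a UnicodeDecodeError.
try:
    s.decode("ascii")
except UnicodeDecodeError as exc:
    error_message = str(exc)
    print(error_message)

# Decode with the encoding the bytes are actually in (assumed: UTF-8)...
text = s.decode("utf-8")

# ...and only then encode to ASCII, dropping what ASCII cannot represent.
ascii_only = text.encode("ascii", "ignore")
print(ascii_only)  # b' 2008 \r\n'
```

In Python 2.6 the equivalent would be
s.decode('utf-8').encode('ascii', 'ignore').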