Novice: replacing strings with unicode variables in a list

Wed Dec 6 05:10:16 EST 2006

aine_canby at yahoo.com wrote:

> Im totally new to Python so please bare with me.

That's no problem, really. I don't use a spellchecker, either, and it
wouldn't have protected you from that particular typo...

> Data is entered into my program using the folling code -
> 
> str = raw_input(command)
> words = str.split()
> 
> for word in words:
>   word = unicode(word,'latin-1')
>   word.encode('utf8')
> 
> This gives an error:
> 
>   File "C:\Python25\lib\encodings\cp850.py", line 12, in encode
>     return codecs.charmap_encode(input,errors,encoding_map)
> UnicodeEncodeError: 'charmap' codec can't encode character u'\x94' in
> position 0
> : character maps to <undefined>
> 
> but the following works.
> 
> str = raw_input(command)
> words = str.split()
> 
> for word in words:
>   uni = u""
>   uni = unicode(word,'latin-1')
>   uni.encode('utf8')

Here you show us the same code twice, as the 

uni = u""

assignment has no effect, and a traceback that is probably generated when
you try to

print uni

Here's my guess: the encoding you actually need is cp850, the same one your
Python interpreter is trying to use for output, and in which unichr(0x94) is
undefined. In general, you are not free to pick an arbitrary encoding; rather,
you have to use the one your console expects.
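
To illustrate that guess, here is a small sketch. It assumes the console
delivered the single byte 0x94, which is what cp850 uses for o-umlaut; the
variable names are made up for the example, and it is written so that it also
runs on Python 2.6 and later.

```python
# Assumption: the console sent the byte 0x94 (o-umlaut in cp850).
raw = b"\x94"

wrong = raw.decode("latin-1")  # u'\x94', a control character
right = raw.decode("cp850")    # u'\xf6', the o-umlaut that was typed

# Encoding the latin-1 result for a cp850 console fails, because cp850
# has no slot for the character u'\x94'.
try:
    wrong.encode("cp850")
    can_encode_wrong = True
except UnicodeEncodeError:
    can_encode_wrong = False

# The cp850 result round-trips back to the original byte.
round_trip = right.encode("cp850")
```

So decoding with the wrong codec does not fail right away; it only blows up
later, when the result has to be written back to the console.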

import sys

s = raw_input(command)
s = unicode(s, sys.stdin.encoding) # trust python to find out the proper
                                   # encoding. If that fails use a constant,
                                   # probably "cp850"
words = s.split()
for word in words:
    print word # trust python, but if it doesn't work out:
    # word = word.encode("cp850")
    # print word
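
If sys.stdin.encoding turns out to be unset (it can be None when input is
piped), the "use a constant" fallback could be wrapped up roughly like this.
The helper names console_encoding and decode_console_bytes are hypothetical,
and "cp850" is only my guess for your console, not something Python guarantees.

```python
import sys

def console_encoding(stream=None, fallback="cp850"):
    """Return the stream's declared encoding, or the fallback if it has none."""
    if stream is None:
        stream = sys.stdin
    # .encoding can be missing or None, e.g. when input is piped from a file
    return getattr(stream, "encoding", None) or fallback

def decode_console_bytes(raw, encoding=None):
    """Turn raw console bytes into a unicode string."""
    return raw.decode(encoding or console_encoding())
```

With that in place, the raw input can be passed through
decode_console_bytes() before it is split into words.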

By the way, strings are immutable (cannot be altered once created): encode()
returns a new string and leaves the original untouched. So the following

> word.encode('utf8')
> print word

is actually spelt

word = word.encode("utf8")
print word
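
A quick way to convince yourself, as a sketch with a made-up sample word
(it runs on Python 2.6 and later):

```python
word = u"\xf6l"  # hypothetical sample: o-umlaut followed by "l"

# The result of encode() is thrown away here; word itself does not change.
word.encode("utf8")
still_unicode = word

# Keeping the return value is what actually gives you the encoded bytes.
encoded = word.encode("utf8")
```

After this, still_unicode is the same unicode string as before, and only
encoded holds the UTF-8 bytes.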

If your data is not read from the console and it contains characters that
cannot be printed, unicode.encode() accepts a second parameter, errors (for
example "replace" or "ignore"), to deal with them; see

>>> help(u"".encode)
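
For instance, a sketch of what that second parameter does. The sample text is
made up, and I am assuming cp850 as the target because that seems to be your
console's encoding; the euro sign has no cp850 equivalent, the o-umlaut does.

```python
text = u"\xf6 \u20ac"  # o-umlaut, space, euro sign

# "replace" substitutes a question mark for each unencodable character.
replaced = text.encode("cp850", "replace")

# "ignore" silently drops unencodable characters.
ignored = text.encode("cp850", "ignore")

# "xmlcharrefreplace" writes a numeric character reference instead.
as_xml = text.encode("cp850", "xmlcharrefreplace")

# The default, "strict", raises UnicodeEncodeError.
try:
    text.encode("cp850")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False
```

Which handler is appropriate depends on whether losing or mangling those
characters is acceptable for your output.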

Peter