Beginner python 3 unicode question

Sat Nov 16 16:19:31 EST 2013

> Why is it behaving differently on the command line? What should I do
> to fix this?
>
I was experimenting with this a bit more and found some more confusing
things. Can somebody please enlighten me?

Here is a test function:

     import hashlib
     import random

     def password_hash(self,password):
         public = bytearray([random.randint(0,255) for _ in range(5)])  # 5 random bytes
         private = bytearray([random.randint(0,255)])                   # 1 random byte
         pwd = bytearray(password.encode())
         digest = hashlib.sha1(public+pwd+private).digest()
         print("digest",digest,type(digest))
         print("de",digest.encode())  # line 134 in the traceback below
         # and some more stuff here...
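For comparison, here is a self-contained sketch of the same salted SHA-1 computation written as a plain function and run under Python 3. The name `password_hash_demo` and its optional `public`/`private` arguments are hypothetical, added only so the result can be made reproducible. Under Python 3, `hashlib.sha1(...).digest()` returns `bytes`, and `bytes` has no `.encode()` method at all; `binascii.hexlify` is one way to get a printable form.

```python
import binascii
import hashlib
import random


def password_hash_demo(password, public=None, private=None):
    """Hypothetical stand-alone version of password_hash (Python 3).

    public/private are optional extra bytes; when omitted they are
    drawn at random, as in the original method.
    """
    if public is None:
        public = bytearray(random.randint(0, 255) for _ in range(5))
    if private is None:
        private = bytearray([random.randint(0, 255)])
    pwd = bytearray(password.encode())  # text str -> UTF-8 bytes
    return hashlib.sha1(public + pwd + private).digest()  # 20 raw bytes


digest = password_hash_demo("secret", b"\x00" * 5, b"\x00")
print(type(digest))               # <class 'bytes'>
print(hasattr(digest, "encode"))  # False: bytes only has .decode()
print(binascii.hexlify(digest))   # printable hex form (still bytes)
```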

This function was called inside a script, and gave me this:

('digest', '\xa0\x98\x8b\xff\x04\xf9V;\xbd\x1eIHzh\x10-\xc5!\x14\x1b', <type 'str'>)
Traceback (most recent call last):
  File "/home/gandalf/Python/Lib/shopzeus/scripts/yaaf_pwmgr.py", line 478, in <module>
    pwmgr.run(parser,args)
  File "/home/gandalf/Python/Lib/shopzeus/scripts/yaaf_pwmgr.py", line 241, in run
    self.authdb.user_create(name,password,propvalues)
  File "/home/gandalf/Python/Lib/shopzeus/yaaf/db/authdb.py", line 205, in user_create
    "password":(password and Binary(self.password_hash(password))) or None,
  File "/home/gandalf/Python/Lib/shopzeus/yaaf/db/authdb.py", line 134, in password_hash
    print("de",digest.encode())
UnicodeDecodeError: 'ascii' codec can't decode byte 0xa0 in position 0: ordinal not in range(128)
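The tuple-style output `('digest', ..., <type 'str'>)` is what a Python 2 `print` statement produces for a parenthesized argument list (Python 3 would print `<class 'bytes'>` without the tuple), so it looks as if the script was actually run by a Python 2 interpreter; that is an assumption worth checking. In Python 2, calling `.encode()` on a byte string first decodes it implicitly with the default ASCII codec, and that hidden decode is what fails on byte 0xa0. The sketch below performs the same decode step explicitly under Python 3.

```python
# Python 3 reproduction of the implicit step Python 2 performs when
# .encode() is called on a byte string: decode it with ASCII first.
digest = b'\xa0\x98\x8b\xff\x04\xf9V;\xbd\x1eIHzh\x10-\xc5!\x14\x1b'

error_message = None
try:
    digest.decode('ascii')  # 0xa0 is outside the ASCII range 0..127
except UnicodeDecodeError as exc:
    error_message = str(exc)

print(error_message)
```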

Then I tried the very same thing in the interactive shell:

gandalf at gandalf-HP-G62-Notebook-PC:~/Python/Projects/appserver$ python3
Python 3.3.1 (default, Sep 25 2013, 19:29:01)
[GCC 4.7.3] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> digest = '\xa0\x98\x8b\xff\x04\xf9V;\xbd\x1eIHzh\x10-\xc5!\x14\x1b'
>>> digest.encode()
b'\xc2\xa0\xc2\x98\xc2\x8b\xc3\xbf\x04\xc3\xb9V;\xc2\xbd\x1eIHzh\x10-\xc3\x85!\x14\x1b'
>>>
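In Python 3 the literal `'\xa0...'` typed at the prompt is a text `str` whose characters are the code points U+00A0, U+0098 and so on, not raw bytes. UTF-8 encodes each code point from U+0080 to U+07FF as two bytes, which is why `\xa0` becomes `\xc2\xa0` and `\xff` becomes `\xc3\xbf`. The sketch below checks this, and also shows that Latin-1, which maps code points 0 to 255 one-to-one onto bytes, gives back the original 20 bytes.

```python
digest_text = '\xa0\x98\x8b\xff\x04\xf9V;\xbd\x1eIHzh\x10-\xc5!\x14\x1b'

utf8 = digest_text.encode()              # default codec: UTF-8
latin1 = digest_text.encode('latin-1')   # one byte per code point < 256

print(len(digest_text), len(utf8), len(latin1))  # 20 27 20
print(utf8[:4])                                  # b'\xc2\xa0\xc2\x98'
```

The 7 extra bytes in the UTF-8 result come from the 7 characters above U+007F, each of which needs a second byte.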

WHAT??? It seems that the default value of the encoding parameter of the
str.encode method is different when I start Python interactively. But this
contradicts its documentation:

>>> print(digest.encode.__doc__)
S.encode(encoding='utf-8', errors='strict') -> bytes

Encode S using the codec registered for encoding. Default encoding
is 'utf-8'. errors may be given to set a different error
handling scheme. Default is 'strict' meaning that encoding errors raise
a UnicodeEncodeError. Other possible values are 'ignore', 'replace' and
'xmlcharrefreplace' as well as any other name registered with
codecs.register_error that can handle UnicodeEncodeErrors.

So is the default utf-8 or not? Should the documentation be updated? Or 
do we have a bug in the interactive shell?
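One way to settle which interpreter, and which default codec, a script really runs with is to report them from inside the script itself. This is a minimal sketch; under Python 3 `sys.getdefaultencoding()` returns 'utf-8', while under Python 2 it normally returns 'ascii'. How the script gets started (for example a `#!/usr/bin/env python` line or a `python` command instead of `python3`) is a possibility to check, not something the traceback confirms.

```python
import sys

# Report which interpreter is running this file and its default codec.
print(sys.executable)
print(sys.version_info[:3])
print(sys.getdefaultencoding())

if sys.version_info[0] < 3:
    print("running under Python 2: str is a byte string here")
```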
