[New-bugs-announce] [issue13717] print fails on unicode '\udce5' surrogates not allowed

Atle Pedersen report at bugs.python.org
Thu Jan 5 21:12:55 CET 2012


New submission from Atle Pedersen <atle.pedersen at gmail.com>:

I've written a short program that traverses a file tree and prints the file names.

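import os
import sys

path = sys.argv[1]  # assumed: directory to scan, passed on the command line
for root, dirs, files in os.walk(path):
    for f in files:
        hexa = ' '.join(["%02X" % ord(x) for x in f])
        print('file is', hexa, f)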

This fails with the following file:

file is 67 72 DCE5 6B 61 6C 6C 65 6E 2E 6A 70 67 2E 68 74 6D 6C Traceback (most recent call last):
  File "/home/atle/bin/findpictures.py", line 16, in <module>
    print('file is',hexa,f)
UnicodeEncodeError: 'utf-8' codec can't encode character '\udce5' in position 2: surrogates not allowed
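The failure can be reproduced without touching the file system. This is a minimal sketch, assuming the name is stored on disk as Latin-1 bytes (so 'å' is the single byte 0xE5, which is not valid UTF-8) while Python uses UTF-8 as the file system encoding. Under PEP 383, Python 3 decodes such names with the 'surrogateescape' error handler, turning byte 0xE5 into the lone surrogate U+DCE5. print() then encodes the string with stdout's default 'strict' handler, which rejects surrogates. The variable names are illustrative only.

```python
# Assumption: file name bytes are Latin-1, file system encoding is UTF-8.
raw_name = b'gr\xe5kallen.jpg.html'

# PEP 383: undecodable bytes are mapped to U+DC80..U+DCFF,
# so 0xE5 becomes U+DCE5 (what os.walk() hands back).
name = raw_name.decode('utf-8', 'surrogateescape')
hexa = ' '.join('%02X' % ord(c) for c in name)
print(hexa)  # 67 72 DCE5 6B 61 6C 6C 65 6E 2E 6A 70 67 2E 68 74 6D 6C

# print() encodes with a strict UTF-8 codec, which refuses lone surrogates.
error_message = None
try:
    name.encode('utf-8')
except UnicodeEncodeError as exc:
    error_message = str(exc)
print(error_message)
```

The hex dump matches the one in the report, and the caught message is the same "surrogates not allowed" error raised by print().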

I don't really understand the issue, but this works with Python 2 and fails with Python 3.1.4 (Gentoo: dev-lang/python-3.1.4-r3).

The same code under Python 2.7.2 gives:
('file is', '67 72 E5 6B 61 6C 6C 65 6E 2E 6A 70 67 2E 68 74 6D 6C', 'gr\xe5kallen.jpg.html')
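Python 2.7 succeeds because os.walk() on a byte-string path returns the raw bytes ('gr\xe5kallen.jpg.html') and print writes them unchanged. A minimal sketch of one Python 3 workaround, assuming a UTF-8 file system encoding: re-encode the name with 'surrogateescape' to recover the original bytes, then decode with errors='replace' so it can be displayed. The helper display_name is hypothetical, not part of any library.

```python
def display_name(name, encoding='utf-8'):
    """Hypothetical helper: make a surrogate-escaped file name printable."""
    # 'surrogateescape' maps U+DCE5 back to the original byte 0xE5.
    raw = name.encode(encoding, 'surrogateescape')
    # Decode for display; bytes that are not valid in `encoding` become U+FFFD.
    return raw.decode(encoding, 'replace')


# The name as os.walk() returns it for the file in the report.
name = 'gr\udce5kallen.jpg.html'
shown = display_name(name)  # 'gr\ufffdkallen.jpg.html', safe to print on a UTF-8 stdout
```

The replacement character loses the original 'å'. If the exact bytes matter, Python 3 also accepts a bytes path, os.walk(b'.'), which yields bytes names much like Python 2 did, and those can be written directly with sys.stdout.buffer.write().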

----------
components: Unicode
messages: 150684
nosy: Atle.Pedersen, ezio.melotti
priority: normal
severity: normal
status: open
title: print fails on unicode '\udce5' surrogates not allowed
type: behavior
versions: Python 3.1

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue13717>
_______________________________________

