[Python-Dev] lone surrogates in utf-8

Antoine Pitrou solipsis at pitrou.net
Tue Apr 28 15:13:37 CEST 2009


Hrvoje Niksic <hrvoje.niksic <at> avl.com> writes:
> 
> "Should be considered" or "will be considered"?  Python 3.0's UTF-8 
> decoder happily accepts it and returns u'\udcff':
> 
>  >>> b'\xed\xb3\xbf'.decode('utf-8')
> '\udcff'

Yes, there is already a bug entry for it:
http://bugs.python.org/issue3672

I think we could happily fix it for 3.1 (perhaps leaving 2.7 unchanged for
compatibility reasons - I don't know if some people may rely on the current
behaviour).
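
For anyone wanting to check their own interpreter, here is a minimal sketch of what is going on with those three bytes. The helper names (decode_three_byte_utf8, is_surrogate, strict_decoder_rejects) are made up for illustration and are not part of any Python API. The hand decoder only does the bit arithmetic for the 3-byte form, so it shows which code point the sequence encodes regardless of what the built-in codec decides to do with it.

```python
def decode_three_byte_utf8(seq: bytes) -> int:
    """Return the code point encoded by a 3-byte UTF-8 sequence.

    Layout: 1110xxxx 10xxxxxx 10xxxxxx. No validity checks beyond the
    lead/continuation bit patterns are applied here (hypothetical helper).
    """
    if len(seq) != 3:
        raise ValueError("expected exactly 3 bytes")
    b0, b1, b2 = seq
    if (b0 & 0xF0) != 0xE0 or (b1 & 0xC0) != 0x80 or (b2 & 0xC0) != 0x80:
        raise ValueError("not a 3-byte UTF-8 sequence")
    # Concatenate the 4 + 6 + 6 payload bits.
    return ((b0 & 0x0F) << 12) | ((b1 & 0x3F) << 6) | (b2 & 0x3F)


def is_surrogate(code_point: int) -> bool:
    """True for code points in the surrogate range U+D800..U+DFFF."""
    return 0xD800 <= code_point <= 0xDFFF


def strict_decoder_rejects(seq: bytes) -> bool:
    """Report whether the running interpreter's UTF-8 decoder refuses seq."""
    try:
        seq.decode('utf-8')
    except UnicodeDecodeError:
        return True
    return False


data = b'\xed\xb3\xbf'
code_point = decode_three_byte_utf8(data)
print(hex(code_point), is_surrogate(code_point))  # 0xdcff True
# The result of the next line depends on the interpreter version in use.
print(strict_decoder_rejects(data))
```

The first line of output confirms the sequence encodes U+DCFF, a low surrogate, which is why a conforming UTF-8 decoder should refuse it. The second line simply reports what the interpreter at hand does.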




