doctests compatibility for python 2 & python 3

Steven D'Aprano steve+comp.lang.python at pearwood.info
Fri Jan 17 10:27:38 EST 2014


On Fri, 17 Jan 2014 12:12:35 +0000, Robin Becker wrote:

> On 17/01/2014 11:41, Steven D'Aprano wrote:
>> def func(a):
>>      """
>>      >>> print(func(u'aaa'))
>>      aaa
>>      """
>>      return a
>
> I think this approach seems to work if I turn the docstring into unicode
> 
> def func(a):
> 	u"""
> 	>>> print(func(u'aaa\u020b'))
> 	aaa\u020b
> 	"""
> 	return a

Good catch! Without the u-prefix, the \u... is not interpreted as an 
escape sequence, but as a literal backslash-u.
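
A minimal sketch of the difference, assuming Python 3, where every plain string literal is already unicode. A raw string stands in for what a Python 2 docstring without the u-prefix would contain; the names `escaped` and `literal` are only illustrative.

```python
# With the escape interpreted, \u020b is a single character.
escaped = u'aaa\u020b'

# A raw string keeps the backslash-u literally, which is what a Python 2
# byte-string docstring without the u-prefix ends up holding.
literal = r'aaa\u020b'

print(len(escaped))   # 3 letters + 1 character
print(len(literal))   # 3 letters + backslash, 'u', and four hex digits
```

The two strings differ in length and content, so a doctest comparing one against the other cannot pass.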


> If I leave the u off the docstring it goes wrong in python 2.7. I also
> tried to put an encoding onto the file and use the actual utf8
> characters ie
> 
> # -*- coding: utf-8 -*-
> def func(a):
>      """
>      >>> print(func(u'aaa\u020b'))
>      aaaȋ
>      """
>      return a

There seems to be some mojibake in your post, which confuses issues.

You refer to \u020b, which is LATIN SMALL LETTER I WITH INVERTED BREVE. 
At least, that's what it ought to be. But in your post, it shows up as 
the two-character mojibake, ╚ followed by ï (BOX DRAWINGS DOUBLE UP AND 
RIGHT followed by LATIN SMALL LETTER I WITH DIAERESIS). It appears that 
your posting software somehow got confused and inserted the two 
characters which you would have got using cp-437 while claiming that they 
are UTF-8. (Your post is correctly labelled as UTF-8.)

I'm confident that the problem isn't with my newsreader, Pan, because it 
is pretty damn good at getting encodings right, but also because your 
post shows the same mojibake in the email archive:

https://mail.python.org/pipermail/python-list/2014-January/664771.html

To clarify: you tried to show \u020B as a literal. As a literal, it ought 
to be the single character ȋ which is a lower case I with curved accent on 
top. The UTF-8 of that character is b'\xc8\x8b', which in the cp-437 code 
page is two characters ╚ ï. 

py> '\u020b'.encode('utf8').decode('cp437')
'╚ï'

Hence, mojibake.
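
The same round trip can be sketched step by step, assuming Python 3. The variable names are illustrative only; the character names come from the standard `unicodedata` module.

```python
import unicodedata

correct = u'\u020b'                  # the intended character
raw = correct.encode('utf-8')        # the two bytes 0xC8 0x8B
garbled = raw.decode('cp437')        # wrong codec: one character becomes two

# Show what each mojibake character actually is.
for ch in garbled:
    print(hex(ord(ch)), unicodedata.name(ch))

# Undoing the wrong decode recovers the original, provided no bytes were lost.
repaired = garbled.encode('cp437').decode('utf-8')
print(repaired == correct)
```

Reversing the bad step like this only works when the mis-decoding was lossless, which it is for this pair of bytes in cp437.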


> def _doctest():
>      import doctest
>      doctest.testmod()
> 
> and that works in python3, but fails in python 2 with this
>> (py27) C:\code\hg-repos>python tdt1.py
>> C:\python\Lib\doctest.py:1531: UnicodeWarning: Unicode equal comparison
>> failed to convert both arguments to Unicode - interpreting them as
>> being unequal
>>   if got == want:
>> C:\python\Lib\doctest.py:1551: UnicodeWarning: Unicode equal comparison
>> failed to convert both arguments to Unicode - interpreting them as
>> being unequal

I cannot replicate this specific exception. I think it may be a side-
effect of you being on Windows. (I'm on Linux, and everything is UTF-8.)

>>   if got == want:
>> **********************************************************************
>> File "tdt1.py", line 4, in __main__.func
>> Failed example:
>>     print(func(u'aaa\u020b'))
>> Expected:
>>     aaaȋ
>> Got:
>>     aaaȋ

The difficulty here is that it is damn near impossible to sort out which 
bits, if any, are mojibake inserted by your posting software, which by 
your editor or your terminal, which by Python, and which are artifacts of 
the doctest system.

The usual way to debug these sorts of errors is to stick a call to repr() 
just before the print.

print(repr(func(u'aaa\u020b')))
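
A hedged sketch of why this helps, assuming Python 3. There, repr() leaves printable non-ASCII characters as they are, so ascii() is used to get the escaped form that repr() gives for a unicode string on Python 2. The `bad` value is simulated mojibake, built only for comparison.

```python
def func(a):
    return a

good = func(u'aaa\u020b')
# Simulated mojibake: UTF-8 bytes mis-decoded as cp437.
bad = good.encode('utf-8').decode('cp437')

# The escaped form shows the exact code points, whatever the terminal,
# editor or mail software does to the glyphs afterwards.
print(ascii(good))
print(ascii(bad))
```

Because the output is pure ASCII, it survives any later encoding mix-up and shows unambiguously which characters the function returned.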



-- 
Steven


