doctests compatibility for python 2 & python 3

Robin Becker robin at reportlab.com
Fri Jan 17 11:17:27 EST 2014


On 17/01/2014 15:27, Steven D'Aprano wrote:
..........
>>
>> # -*- coding: utf-8 -*-
>> def func(a):
>>       """
>>       >>> print(func(u'aaa\u020b'))
>>       aaaȋ
>>       """
>>       return a
>
> There seems to be some mojibake in your post, which confuses issues.
>
> You refer to \u020b, which is LATIN SMALL LETTER I WITH INVERTED BREVE.
> At least, that's what it ought to be. But in your post, it shows up as
> the two character mojibake, ╚ followed by ï (BOX DRAWINGS DOUBLE UP AND
> RIGHT followed by LATIN SMALL LETTER I WITH DIAERESIS). It appears that
> your posting software somehow got confused and inserted the two
> characters which you would have got using cp-437 while claiming that they
> are UTF-8. (Your post is correctly labelled as UTF-8.)
>
> I'm confident that the problem isn't with my newsreader, Pan, because it
> is pretty damn good at getting encodings right, but also because your
> post shows the same mojibake in the email archive:
>
> https://mail.python.org/pipermail/python-list/2014-January/664771.html
>
> To clarify: you tried to show \u020B as a literal. As a literal, it ought
> to be the single character ȋ which is a lower case I with curved accent on
> top. The UTF-8 of that character is b'\xc8\x8b', which in the cp-437 code
> page is two characters ╚ ï.
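
The byte-level claim in the quoted text can be checked with a small sketch. It assumes only the standard library; the variable names are illustrative and not from the original post. It encodes U+020B as UTF-8 and then decodes those bytes as cp437, which should reproduce the two-character mojibake described above.

```python
# -*- coding: utf-8 -*-
# Sketch: reproduce the mojibake by decoding UTF-8 bytes with the wrong codec.
import unicodedata

ch = u'\u020b'                          # the intended single character
utf8_bytes = ch.encode('utf-8')         # its two-byte UTF-8 encoding
mojibake = utf8_bytes.decode('cp437')   # the same bytes misread as cp437

print(unicodedata.name(ch))             # name of the intended character
for c in mojibake:
    print(unicodedata.name(c))          # names of the two mojibake characters
```

The names are printed rather than the characters themselves, so the script does not depend on the terminal's encoding.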

When I edit the file in vim with utf8 encoding I do see your ȋ as the literal.
However, as you note, I'm on Windows and no amount of cajoling will get it to
work reasonably, so my printouts are broken. So on Windows

(py27) C:\code\hg-repos>python -c"print(u'aaa\u020b')"
aaaȋ

on my linux

$ python2 -c"print(u'aaa\u020b')"
aaaȋ

$ python2 tdt1.py
/usr/lib/python2.7/doctest.py:1531: UnicodeWarning: Unicode equal comparison 
failed to convert both arguments to Unicode - interpreting them as being unequal
   if got == want:
/usr/lib/python2.7/doctest.py:1551: UnicodeWarning: Unicode equal comparison 
failed to convert both arguments to Unicode - interpreting them as being unequal
   if got == want:
**********************************************************************
File "tdt1.py", line 4, in __main__.func
Failed example:
     print(func(u'aaa\u020b'))
Expected:
     aaaȋ
Got:
     aaaȋ
**********************************************************************
1 items had failures:
    1 of   1 in __main__.func
***Test Failed*** 1 failures.
robin at everest ~/tmp:
$ cat tdt1.py
# -*- coding: utf-8 -*-
def func(a):
     """
     >>> print(func(u'aaa\u020b'))
     aaaȋ
     """
     return a
def _doctest():
     import doctest
     doctest.testmod()

if __name__ == "__main__":
     _doctest()
robin at everest ~/tmp:

so the error persists with or without copying errors.
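
The UnicodeWarning in the transcript above suggests that one side of doctest's `got == want` comparison is a byte string and the other is Unicode. One possible workaround, offered only as a sketch and not as a fix confirmed in this thread, is to keep non-ASCII text out of the expected output by comparing inside the example and expecting `True`. The helper `run_docstring` is hypothetical, and the `u''` prefix assumes Python 2 or Python 3.3+.

```python
# -*- coding: utf-8 -*-
# Sketch: a doctest whose expected output is pure ASCII on Python 2 and 3.
import doctest

def func(a):
    r"""
    >>> func(u'aaa\u020b') == u'aaa\u020b'
    True
    """
    return a

def run_docstring(obj):
    """Hypothetical helper: run obj's docstring examples, return (failed, attempted)."""
    parser = doctest.DocTestParser()
    # Build a DocTest directly from the docstring so no module lookup is needed.
    test = parser.get_doctest(obj.__doc__, {obj.__name__: obj},
                              obj.__name__, None, 0)
    runner = doctest.DocTestRunner(verbose=False)
    results = runner.run(test)
    return results.failed, results.attempted

if __name__ == "__main__":
    print(run_docstring(func))
```

The raw docstring keeps `\u020b` as an escape in the example source, so the file itself contains no non-ASCII characters. The trade-off is that a failure only reports `False` rather than showing the differing text.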

Note that in my PuTTY terminal I don't see the character properly (I see the
unknown-glyph square box), but it copies OK.
-- 
Robin Becker



