[Python-Dev] doctest, unicode repr, and 2to3

Fri Mar 5 22:22:36 CET 2010

On Thu, Mar 4, 2010 at 8:11 PM, "Martin v. Löwis" <martin at v.loewis.de> wrote:
> Johan Harjano ran into an interesting problem when trying to run the
> Django test suite under Python 3.1.
>
> Django has doctests of the form
>
> >>> a6.headline
> u'Default headline'
>
> Even when converting the doctest with 2to3, the expected output is
> unmodified. However, in 3.x, the expected output will change (i.e. not
> produce a u"" prefix anymore).
>
> Now, it might be possible to reformulate the test case (e.g. use print()
> instead of relying on repr), however, this is undesirable as a) the test
> should continue to test in 2.x that the result object is a unicode
> string, and b) it makes the test less readable.
>
> I would like to find a solution where this gets automatically corrected,
> e.g. through 2to3, or through changes to doctest, or through changes of
> str.__repr__.
>
> Any proposal appreciated.

How about a heuristic rule (which you have to explicitly select) that
changes u'XXX' into 'XXX' inside triply-quoted strings given certain
contexts, e.g. only at the start of the line, only if there is a
nearby preceding line starting with '>>>'? Exactly what context is of
the right strength will have to be determined experimentally; if there
are a lot of tests outputting things like [u'...'] or {u'...': u'...'}
the context may have to be made more liberal. Possibly \bu('.*'|".*")
would do it?
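
A minimal sketch of such a heuristic, for illustration only. It is not
an existing 2to3 fixer; the names U_LITERAL and strip_u_prefixes are
made up here. It assumes that expected output is whatever follows a
'>>>' line up to the next blank line, and it makes the pattern above
non-greedy so that several literals on one line are handled
separately.

```python
import re

# The pattern suggested above, made non-greedy so that each literal in
# something like {u'a': u'b'} is matched on its own.
U_LITERAL = re.compile(r"""\bu('.*?'|".*?")""")


def strip_u_prefixes(docstring):
    """Drop the u prefix from string literals in doctest expected output.

    Hypothetical helper: only lines that follow a '>>>' prompt, up to the
    next blank line, are rewritten. Prompt lines are left untouched, on
    the assumption that the source code itself is converted separately.
    """
    result = []
    in_output = False  # True once a '>>>' line has been seen
    for line in docstring.splitlines(True):
        stripped = line.lstrip()
        if stripped.startswith(">>>"):
            in_output = True
        elif stripped.startswith("..."):
            pass  # treated as a continuation line: leave unchanged
        elif not stripped:
            in_output = False  # a blank line ends the expected output
        elif in_output:
            line = U_LITERAL.sub(r"\1", line)
        result.append(line)
    return "".join(result)
```

Applied to the Django example, the line u'Default headline' becomes
'Default headline', while a u'...' that appears in ordinary prose in
the docstring is left alone. Whether "blank line ends the output" is
the right context is exactly the kind of thing that would have to be
determined experimentally.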

The issue shows (yet again) a general problem with doctests being
overspecified -- the test shouldn't care that the output starts with
'u', it should only care that the value is unicode, but there's no
easy way to express this in doctests. But since these doctests exist I
suggest that the practical way forward is to stick with them rather
than trying to reformulate all the tests.
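
For comparison, here is a rough sketch of the other route, a change on
the doctest side, which also leaves the existing tests as they are. It
uses the real doctest.OutputChecker hook, but the class name
UnicodePrefixChecker is invented for this example, and the approach is
an assumption about what such a change could look like, not something
doctest does today.

```python
import doctest
import re

U_LITERAL = re.compile(r"""\bu('.*?'|".*?")""")


class UnicodePrefixChecker(doctest.OutputChecker):
    """Hypothetical checker: accept output that differs only by u prefixes."""

    def check_output(self, want, got, optionflags):
        # First try the normal comparison.
        if doctest.OutputChecker.check_output(self, want, got, optionflags):
            return True
        # Retry with the u prefix stripped from the expected output.
        want = U_LITERAL.sub(r"\1", want)
        return doctest.OutputChecker.check_output(self, want, got, optionflags)
```

An instance can be passed as the checker argument of
doctest.DocTestRunner. Note that under 2.x this checker would also
accept a plain str where the test expects unicode, so it would
probably have to be enabled only when running under 3.x.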

-- 
--Guido van Rossum (python.org/~guido)