UTF-8 characters in doctest

Wed Sep 19 07:47:39 EDT 2007

Hello,
I have problems running doctests when the docstring contains Czech
national characters in UTF-8 encoding.

My Python script begins with an encoding declaration:

# -*- coding: utf-8 -*-

It contains this function with a doctest:

def get_inventary_number(block):
    """
        >>> t = u'''28. České královské insignie
        ... mědirytina, grafika je zcela vyřezána z papíru - max. rozměr
        ... 420×582 neznačeno
        ... text: opis v levém medailonu: CAROL VI IMP.ELIS.CHR. AVG. P.P.'''
        >>> get_inventary_number(t)
        (u'nezna\xc4\x8deno', u'28. \xc4\x8cesk\xc3\xa9 kr\xc3\xa1lovsk\xc3\xa9 insignie\nm\xc4\x9bdirytina, grafika je zcela vy\xc5\x99ez\xc3\xa1na z pap\xc3\xadru \xe2\x80\x93 max. rozm\xc4\x9br\n420\xc3\x97582 \ntext: opis v lev\xc3\xa9m medailonu: CAROL VI IMP.ELIS.CHR. AVG. P.P.')
    """
    m = RE_INVENTARNI_CISLO.search(block)
    if m: return m.group(1), block.replace(m.group(0), '')
    else: return None, block

After running doctest.testmod() I get this error message:

  File "vizovice_03.py", line 417, in ?
    doctest.testmod()
  File "/usr/local/lib/python2.4/doctest.py", line 1841, in testmod
    for test in finder.find(m, name, globs=globs, extraglobs=extraglobs):
  File "/usr/local/lib/python2.4/doctest.py", line 851, in find
    self._find(tests, obj, name, module, source_lines, globs, {})
  File "/usr/local/lib/python2.4/doctest.py", line 910, in _find
    globs, seen)
  File "/usr/local/lib/python2.4/doctest.py", line 895, in _find
    test = self._get_test(obj, name, module, globs, source_lines)
  File "/usr/local/lib/python2.4/doctest.py", line 985, in _get_test
    filename, lineno)
  File "/usr/local/lib/python2.4/doctest.py", line 602, in get_doctest
    return DocTest(self.get_examples(string, name), globs,
  File "/usr/local/lib/python2.4/doctest.py", line 616, in get_examples
    return [x for x in self.parse(string, name)
  File "/usr/local/lib/python2.4/doctest.py", line 577, in parse
    (source, options, want, exc_msg) = \
  File "/usr/local/lib/python2.4/doctest.py", line 648, in _parse_example
    lineno + len(source_lines))
  File "/usr/local/lib/python2.4/doctest.py", line 732, in _check_prefix
    raise ValueError('line %r of the docstring for %s has '
ValueError: line 17 of the docstring for __main__.get_inventary_number has inconsistent leading whitespace: 'm\xc4\x9bdirytina, grafika je zcela vy\xc5\x99ez\xc3\xa1na z pap\xc3\xadru \xe2\x80\x93 max. rozm\xc4\x9br'
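
For reference, the same message can apparently be reproduced without
any Czech characters. The sketch below is only an illustration under one
assumption: that the \n escapes in the expected output of a non-raw
docstring become real line breaks before doctest parses it, so the rest
of the output starts at column 0. NON_RAW, RAW and try_parse are made-up
names that are not part of vizovice_03.py; only doctest.DocTestParser
and its get_examples method come from the standard library.

```python
import doctest
import sys

# Two hypothetical docstring bodies; the only difference is the r prefix.
NON_RAW = """
    >>> 'a' + chr(10) + 'b'
    'a\nb'
"""
RAW = r"""
    >>> 'a' + chr(10) + 'b'
    'a\nb'
"""


def try_parse(docstring):
    """Parse a docstring body with doctest and report what happened."""
    parser = doctest.DocTestParser()
    try:
        examples = parser.get_examples(docstring, 'demo')
    except ValueError:
        # sys.exc_info() keeps this working on old and new Python versions.
        return 'ValueError: %s' % sys.exc_info()[1]
    return 'parsed %d example(s)' % len(examples)


print(try_parse(NON_RAW))
print(try_parse(RAW))
```

If the assumption holds, the first call reports the ValueError and the
second parses one example, which would point at the backslash escapes
in the docstring rather than at UTF-8 itself.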

I tried copying the expected output into the docstring from the Python
shell and from doctest itself (if I leave it out of the docstring,
doctest tells me what it expected and what it got). I also tried defining
the variable t both as t='some text' and as t=u'some unicode text'.
Every attempt fails.

So my question is: is it possible to run doctests with UTF-8
characters? If the answer is yes, please tell me how.

Thank you for any advice.

Regards
Michal