[issue11909] Doctest sees directives in strings when it should only see them in comments

Fri Jun 24 10:40:45 CEST 2011

Devin Jeanpierre <jeanpierreda at gmail.com> added the comment:

You're right, and good catch. If a doctest starts with a "#coding:XXX" line, this should break.

One option is to replace the call to tokenize.tokenize with a call to tokenize._tokenize and pass 'utf-8' as a parameter. Downside: that's a private and undocumented API. The alternative is to manually add a coding line that specifies UTF-8, so that any coding line in the doctest would be ignored. 

My preferred option would be to add the ability to read unicode to the tokenize API, and then use that. I can file a separate ticket if that sounds good, since it's probably useful to others too.

One other thing to be worried about -- I'm not sure how doctest would treat tests with leading "coding:XXX" lines. I'd hope it ignores them, if it doesn't then this is more complicated and the above stuff wouldn't work.

I'll see if I have the time to play around with this (and add more test cases to the patch, correspondingly) this weekend.

----------

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue11909>
_______________________________________