[issue3353] make built-in tokenizer available via Python C API

Dustin J. Mitchell report at bugs.python.org
Tue Apr 14 18:08:00 CEST 2015


Dustin J. Mitchell added the comment:

Here's an updated patch for #1:

Existing Patch:
 - Move tokenizer.h from Parser/ to Include/
 - Add PyAPI_FUNC to export tokenizer functions

New:
 - Removed unused, undefined PyTokenizer_RestoreEncoding
 - Include PyTokenizer_State with limited ABI compatibility (but still undocumented)
 - Namespace the struct name (PyTokenizer_State)
 - Documentation

I'd like particular attention to the documentation for the tokenizer -- I'm not entirely confident that I have documented the functions correctly!  In particular, I'm not sure how PyTokenizer_FromString handles encodings.
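
For comparison while reviewing the encoding question, here is a minimal sketch of how the pure-Python tokenize module infers a source encoding. This is an assumption-laden point of reference only: it exercises tokenize.detect_encoding from the standard library, not the C tokenizer, and whether PyTokenizer_FromString behaves the same way is exactly what remains to be confirmed. The helper name detect is hypothetical.

```python
import io
import tokenize


def detect(source_bytes):
    """Return the encoding the pure-Python tokenizer infers for source_bytes.

    Hypothetical helper for illustration; it wraps tokenize.detect_encoding
    and says nothing definitive about the C-level PyTokenizer_FromString.
    """
    readline = io.BytesIO(source_bytes).readline
    encoding, _consumed_lines = tokenize.detect_encoding(readline)
    return encoding


# No BOM and no coding cookie: falls back to the default source encoding.
default = detect(b"x = 1\n")

# A PEP 263 coding cookie on the first line selects the encoding.
cookie = detect(b"# -*- coding: latin-1 -*-\nx = 1\n")

# A UTF-8 byte order mark is recognized and reported.
bom = detect(b"\xef\xbb\xbfx = 1\n")

print(default, cookie, bom)
```

If the C tokenizer turns out to differ from this on any of the three cases, that difference is what the new documentation should spell out.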

There's a further iteration possible here, but it's beyond my understanding of the tokenizer and of possible uses of the API. That would be to expose some of the tokenizer state fields and document them, either as part of the limited ABI or even the stable API.  In particular, there are about a half-dozen struct fields used by the parser, and those would be good candidates for addition to the public API.

If that's desirable, I'd prefer to merge a revision of my patch first, and keep the issue open for subsequent improvement.

----------
Added file: http://bugs.python.org/file38992/issue3353.patch

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue3353>
_______________________________________

