ast.parse, ast.dump, but with comment preservation?

Chris Angelico rosuav at gmail.com
Thu Dec 16 01:44:19 EST 2021


On Thu, Dec 16, 2021 at 2:47 PM samue... at gmail.com
<samuelmarks at gmail.com> wrote:
>
> I wrote a little open-source tool to expose internal constructs in OpenAPI. Along the way, I added related functionality to:
> - Generate/update a function prototype to/from a class
> - JSON schema
> - Automatically add type annotations to all function arguments, class attributes, declarations, and assignments
>
> alongside a bunch of other features. All implemented using just the builtin modules (plus astor on Python < 3.9; and optionally black).
>
> Now I'm almost at the point where I can run it—without issue—against, e.g., the entire TensorFlow codebase. Unfortunately this is causing huge `diff`s because the comments aren't preserved (and there are some whitespace issues… but I should be able to resolve the latter).
>
> Is the only viable solution to rewrite around redbaron or libcst? I don't need to parse the comments, just dump them out unedited where they're found…
>
> Thanks for any suggestions
>
> PS: Library is https://github.com/SamuelMarks/cdd-python (might relicense with CC0… anyway too early for others to use; wait for the 0.1.0 release ;])

I haven't actually used it, but you may want to try lib2to3. It keeps
comments and whitespace in its parse tree, so it can reconstruct the
full source text, which is what you're trying to do.

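Otherwise: nearly every AST node (all statements and expressions)
records its line and column position (lineno and col_offset, plus
end_lineno and end_col_offset on Python 3.8+), so you could work the
other way: keep the source code alongside the AST, and edit the
original text in place wherever you need a change.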
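
A rough sketch of that second approach, in case it helps. It uses the
AST only to find where to edit, then splices the original text, so
comments and whitespace are never regenerated. The function name
annotate_simple_assignments and the rule it applies (annotating
"name = <literal>" assignments) are made up for illustration; they are
not taken from cdd-python. It assumes Python 3.8+ for end_lineno and
end_col_offset.

```python
import ast


def annotate_simple_assignments(source: str) -> str:
    """Insert ': <type>' after the target of `name = <literal>` assignments.

    Hypothetical helper: the AST is used only to locate edit positions;
    the original text is edited in place, so comments survive.
    """
    tree = ast.parse(source)
    # ast counts lines by "\n", so split the same way to stay aligned.
    lines = source.split("\n")
    edits = []  # (zero-based line index, byte column, text to insert)

    for node in ast.walk(tree):
        if (
            isinstance(node, ast.Assign)
            and len(node.targets) == 1
            and isinstance(node.targets[0], ast.Name)
            and isinstance(node.value, ast.Constant)
            and isinstance(node.value.value, (bool, int, float, str))
        ):
            target = node.targets[0]
            type_name = type(node.value.value).__name__
            edits.append(
                (target.end_lineno - 1, target.end_col_offset, ": " + type_name)
            )

    # Apply edits from the end of the file backwards so that earlier
    # offsets are still valid after each splice.
    for line_index, column, text in sorted(edits, reverse=True):
        # Column offsets are UTF-8 byte offsets, so splice on bytes.
        raw = lines[line_index].encode("utf-8")
        raw = raw[:column] + text.encode("utf-8") + raw[column:]
        lines[line_index] = raw.decode("utf-8")

    return "\n".join(lines)


if __name__ == "__main__":
    example = "x = 1  # keep me\nname = 'a'\n"
    print(annotate_simple_assignments(example))
```

Anything the rule doesn't match is left byte-for-byte as it was, which
should keep the diffs small. If you need the original text of a whole
node rather than an insertion point, ast.get_source_segment(source,
node) returns it.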

ChrisA

