[code-quality] Jedi 0.9.0 is now a static analysis library

Kay Hayen kay.hayen at gmail.com
Sun May 3 11:40:20 CEST 2015


Hello Dave,

in my Python compiler Nuitka, I am using the "ast" module, and it's working
fine for many things, but it lends itself badly to other things. I am
condemned to do my own static analysis, as it's intended for optimization.

My interest is in the reporting and also auto-format of source code.

I would love to be able to report about source code easily for "profile
guided optimization", or make annotations about Nuitka's findings in HTML
reports from generated output.

And I want a coherent code base for my private and work projects, and would
like to be able to apply formatting, even function call style changes,
automatically, but on a programmatic basis.

> I think the two tools are very similar. The biggest difference is
> probably static analysis, which you definitely need for certain
> refactorings.
>

Static analysis is great for auto-format indeed. Being e.g. able to tell
that a method is only there for overload because it raises a "must be
overloaded" exception, that kind of thing would otherwise be too hard.

> However Jedi definitely has fewer AST functions. The node/leaf objects
> of Jedi are at the moment quite simple. I'm willing to add
> functionality there, but only if it's used. Currently there's only the
> functions there that Jedi needs internally. To support the well known
> refactorings (e.g. inline/extract name/function), we might need to add
> a few methods there.
>

With RedBaron, I can do this (bear with me on code quality, and by no means
assume it's competent RedBaron usage):

def updateString(string_node):
    # Skip doc strings for now.
    if string_node.parent.type in ("class", "def", None):
        return

    value = string_node.value

    def isQuotedWith(quote):
        return value.startswith(quote) and value.endswith(quote)

    for quote in "'''", '"""', "'", '"':
        if isQuotedWith(quote):
            break
    else:
        assert False, value

    real_value = value[len(quote):-len(quote)]
    assert quote + real_value + quote == value

    if "\n" not in real_value:
        # Single characters, should be quoted with "'"
        if len(eval(value)) == 1:
            if real_value != "'":
                string_node.value = "'" + real_value + "'"
        else:
            if '"' not in real_value:
                string_node.value = '"' + real_value + '"'

And then call this:

for node in red.find_all("StringNode"):
    try:
        updateString(node)
    except Exception:
        print("Problem with", node)
        node.help(deep = True, with_formatting = True)
        raise

It allows me to enforce some rules for strings that are not multi-line. My
rules there are: do not use triple quotes without a new-line; strings of
resulting length 1 become 'a' or '\n'; and others use "" or "ab", except
that of course "'" is valid, and 'some "quote" would be too' as well.
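
To make the rules concrete without RedBaron, here is a minimal sketch that
applies them to the text of a single string literal. The name
"normalize_quotes" is made up for this illustration, and it leaves prefixed
literals (r"", b"", and so on) untouched, which is a simplification and not
something the code above handles either.

```python
import ast

QUOTES = ("'''", '"""', "'", '"')


def normalize_quotes(literal):
    """Return the literal re-quoted per the rules; unchanged when unsure."""
    for quote in QUOTES:
        if (
            literal.startswith(quote)
            and literal.endswith(quote)
            and len(literal) >= 2 * len(quote)
        ):
            break
    else:
        # Prefixed literals (r"", b"", ...) are not handled in this sketch.
        return literal

    body = literal[len(quote):-len(quote)]

    # Strings that really span lines keep their triple quotes.
    if "\n" in body:
        return literal

    # Resulting length 1: prefer 'a', but keep "'" as it is.
    if len(ast.literal_eval(literal)) == 1:
        return literal if body == "'" else "'" + body + "'"

    # Everything else: prefer "ab", unless the body contains a double quote.
    return literal if '"' in body else '"' + body + '"'
```

Something like normalize_quotes(string_node.value) could then replace the
body of updateString, though I have not tried that against RedBaron.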

I do a bunch of these, and like to have these things, to e.g. make sure
that my multi-line calling convention lines up the "=" signs nicely, etc,
inserting and removing white space from tuples, comma separated stuff, etc.

To me, the question is not whether I should do this, but whether it can be
done.


> > Does it provide a bounding box for code constructs?
>
> I'm not really sure what you mean. Jedi knows the exact positions of
> objects. At the moment there's no method like RedBaron's
> `bounding_box`. Relative positions could be easily calculated with the
> current parser. However, I don't know what such a BoundingBox would be
> doing.
>

That is the caret, typically used in stack traces, and fits the concept of
a cursor. If I go to a report, then I will want to have the bounding box.

When e.g. highlighting a function or a call expression, or argument
expression, I am not only interested in where it starts, but where it ends.
For an XHTML report of Nuitka performance compared to CPython performance
on the same code (which is my plan for this autumn, at about the time it
starts to make actual sense), with the "ast" module and apparently "jedi",
all I get is this:

f( arg1(), arg2(), c ** d) + g()

And I would like to mouse over or highlight and know where the call to f()
ends, where the third argument ends, what the operation "+" entails.
Without a bounding box, that falls down. In fact, I would also want some
position, like the "+" or "(" to indicate which bounding box I mean.

My problem there with the "ast" module boils down to this:

>>> ast.parse("a+b").body[0].value.col_offset
0
>>> ast.dump(ast.parse("a+b").body[0])
"Expr(value=BinOp(left=Name(id='a', ctx=Load()), op=Add(),
right=Name(id='b', ctx=Load())))"

With RedBaron, I get to have a "bounding_box". I must admit I have not used
it yet, but I aspire to. And even for Nuitka, I would love to have the
position of the "+" for use in tracebacks, at least in an improved mode, as
opposed to the position of the first argument. But performance and bugs are
keeping me away from considering any alternatives to "ast" there.
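
One way the operator position might be recovered next to "ast" is the
standard library "tokenize" module (Python 3 spelling here). This is only a
sketch: the helper name "operator_positions" is invented for illustration,
and it finds tokens by text, so mapping a token back to the matching "ast"
node is still left open.

```python
import io
import tokenize


def operator_positions(source, op="+"):
    """Return (start, end) pairs, as (line, column), for each `op` token."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return [
        (tok.start, tok.end)
        for tok in tokens
        if tok.type == tokenize.OP and tok.string == op
    ]


# Lines are 1-based and columns 0-based, as with "ast" itself.
print(operator_positions("a+b"))
```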

So sure, I am asking of Jedi, if it has that bounding box, precisely to
address this.

With RedBaron, I can do this:

>>> from redbaron import RedBaron
>>> red = RedBaron("a+b")
>>> red[0]
a+b

>>> red[0].second_formatting = "  "
>>> red
0   a+  b

That of course also means, it knows the "+" location, or I can infer it:

>>> red[0].second.bounding_box
BoundingBox (Position (1, 1), Position (1, 1))
>>> red[0].first.bounding_box
BoundingBox (Position (1, 1), Position (1, 1))
>>> red[0].bounding_box
BoundingBox (Position (1, 1), Position (1, 5))

Seems bounding boxes are relative, but it also has
"get_absolute_bounding_box_of_attribute".

So, eventually I am faced with the issue of producing run time information
from expressions, and then to find the same expression again in two
different "ast" forms. But for both rendering and editing, RedBaron seems
like it might work with heavy fighting.

I would love for you to provide a code example of how to use Jedi for
editing like I did above, and to tell me whether you think that creating
such reports could be based on Jedi parsing.

My plan for Nuitka now probably entails identifying an expression uniquely
by path. That "uid" for code in a module is probably a new idea. The "uid"
could be a run time number or hash code, which is then resolvable by looking
at the original code again in a new parse, even with another tool that
provides more detail.
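
As a rough sketch of what "by path" could mean, using the plain "ast" module
and not Nuitka's own tree: a path is a tuple of (field, index) steps from the
module root, and it can be resolved again in a fresh parse of the same
source. The names "iter_paths" and "resolve" are made up for this
illustration.

```python
import ast


def iter_paths(node, path=()):
    """Yield (path, node) for the node and every node below it."""
    yield path, node
    for field, value in ast.iter_fields(node):
        if isinstance(value, ast.AST):
            yield from iter_paths(value, path + ((field, None),))
        elif isinstance(value, list):
            for index, item in enumerate(value):
                if isinstance(item, ast.AST):
                    yield from iter_paths(item, path + ((field, index),))


def resolve(tree, path):
    """Follow a path produced by iter_paths in another parse of the code."""
    node = tree
    for field, index in path:
        node = getattr(node, field)
        if index is not None:
            node = node[index]
    return node


source = "x = f(a) + g()"
for path, node in iter_paths(ast.parse(source)):
    if isinstance(node, ast.BinOp):
        # The path survives a re-parse; a hash of it could serve as the "uid".
        assert isinstance(resolve(ast.parse(source), path), ast.BinOp)
```

Such a path is only stable while both parses produce the same tree shape, so
resolving it in a different parser would need a mapping between the trees.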

Finding it then in RedBaron or Jedi again may involve some normalization.
The "ast" module hides some things from me, e.g. try/except/finally is
represented as nested statements, at least in Python 2.

Yours,
Kay