[Python-ideas] Statically checking purity of functions

Joonas Liik liik.joonas at gmail.com
Wed Jun 22 08:39:04 EDT 2016


On 22 June 2016 at 13:23, Jukka Lehtosalo <jlehtosalo at gmail.com> wrote:
> On Fri, Jun 10, 2016 at 11:57 AM, Christoph Groth <christoph at grothesque.org>
> wrote:
>>>
>>> This means that any pure function would have to accept as parameters
>>> everything it operates on, and you'd need to error on any call to a pure
>>> function with impure arguments.
>>
>>
>> You mean because one can never be sure about the purity of global
>> variables?  That's true, but exactly to the same extent that one cannot be
>> sure about the typing annotations of global variables.
>>
>> I believe that such "weak" checking for purity would still be useful, just
>> as the equally weak verification of typing by mypy is useful.
>
>
> I'm late to the discussion, but I agree with Christoph. I doubt purity
> checking can reliably enforce anything without making it impractically
> restrictive, but even best-effort, partial purity checking could be useful,
> similar to how mypy is useful. A checker could catch many (but not all)
> purity violations and purity annotations would make programs easier to
> reason about for humans and tools. I've thought about this a bit in the past
> -- mostly in the context of mypy. I'm not sure whether purity checking would
> be more than marginally useful, but at least in some use cases it might be a
> valuable a tool. For example, purity checking might help with reasoning
> about concurrency and parallelism. However, writing a purity checking tool
> for Python that is flexible enough that many programmers would actually want
> to use it sounds like a difficult (research?) project.
>
> Here are some practical considerations:
>
> 1) We'd need to annotate library functions with respect to purity, as Python
> derives much of its value from the available libraries, and we'd want pure
> functions to be able to use (pure) libraries. Perhaps stubs in typeshed
> would have purity annotations for individual functions. Alternatively, there
> could be a typeshed-like repository for purity information only, and a tool
> could use information from both repositories. The safest thing would be to
> default to impure, unless a function is explicitly annotated as pure.
>
> 2) Type annotations would likely be needed for practical checking of purity,
> as otherwise things like method calls would be hard to check for purity.
> Thus purity checking would be a stricter form of static type checking.
>
> 3) For consistency with PEP 484 style gradual typing, we may want to allow
> functions without annotations to be called from pure functions. Tools might
> make this user-configurable, as this is clearly a big hole in a purity
> checking system.
>
> 4) It may be necessary to allow impure operations to be used within a pure
> function as long the impurity won't be visible outside the function. For
> example, perhaps this function should be fine, as an external observer can't
> tell whether the function uses pure operations only:
>
> def squares(n: int) -> List[int]:
>     a = []
>     for i in range(n):
>         a.append(i * i)
>     return a
>
> I`m not sure if I know how to implement this in a useful way, though. I
> believe that there is some prior work in this area (not specific to Python).
>
> Jukka
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/

we could have eg:

def parallel_map(f, data):
    if is_pure(f):
        # <impl that does special optimisations to take advantage of
the fact f is truly pure>
        # special case that uses light weight threads or sth (maybe
can run bits of it GIL free or sth, less overhead or sth)
        # its ok to share immutable data
    else:
        #<impl that has no issue with impure f>
        # eg: multiprocessing process pools..
        #its ok to mutate everything becauuse we no longer truly share anything.

in other words, even if only a small subset of code can be proven pure
it can be useful to help optimise code (in this example to avoid
locking by knowing that no 2 parrallel instances of the callback can
mock with mutually shared state)

for some scenarios it - might be - ok even if we leave the proving to
the programmer. (eg a C extension could implement the optimised
parallel map in this example and a user defined decorator is enough to
shove a flag on to the function f to trigger the optimised code path)
if this it is not truly pure tho despite the assertion it might cause
issues so in that case it probably is not something that should ship
with python by default?


More information about the Python-ideas mailing list