[Python-ideas] Make Python code read-only

Victor Stinner victor.stinner at gmail.com
Wed May 21 01:46:34 CEST 2014


2014-05-21 0:04 GMT+02:00 Eric Snow <ericsnowcurrently at gmail.com>:
> Make __readonly__ a data descriptor (getset in the C-API) on
> ModuleType, type, and FunctionType and people could toggle it as
> needed.

In my PoC, I chose to modify the builtin "dict" type directly. I don't
think that I will keep this solution, because I would prefer not to
touch such a critical Python type; I may use a subclass instead. I
added a dict.setreadonly() method which makes a dict read-only, but a
read-only dict cannot be made modifiable again.

I added a type.setreadonly() method which calls setreadonly() on the
type's underlying dict. I had to go through the real dict because
type.__dict__ does not give access to it: for example, str.__dict__ is
a mappingproxy, not the real dictionary. This way, type.setreadonly()
also works on builtin types like str.
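The mappingproxy point can be checked with a stock CPython. The snippet below only uses standard behavior (setreadonly() exists only in the PoC and is not used); `Point` is just an illustrative class.

```python
import types

# A class namespace is exposed as a read-only view, not the real dict.
assert type(str.__dict__) is types.MappingProxyType

try:
    str.__dict__["upper"] = None
except TypeError as exc:
    print("mappingproxy blocks item assignment:", exc)

# Builtin types also refuse attribute assignment...
try:
    str.upper = None
except TypeError as exc:
    print("builtin type blocks setattr:", exc)

# ...but a user-defined class accepts it, even though its __dict__ is
# also a mappingproxy: setattr() updates the hidden dict behind the view.
class Point:
    pass

Point.z = 0
assert type(Point.__dict__) is types.MappingProxyType
assert Point.__dict__["z"] == 0
```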

> Alternately, the object structs for the 3 types (e.g. PyModuleObject)
> could each grow a "readonly" field (or an extra flag option if there
> is an appropriate flag).  The descriptor (in C) would use that instead
> of obj.__dict__['__readonly__'].  However, I'd prefer going through
> __dict__.

There is already a function.__readonly__ property (I just renamed it;
it was previously called __modifiable__, with the opposite meaning).
importlib uses it to make functions read-only.

> Either way, the 3 types would share a tp_setattro implementation that
> checked the read-only flag.  That way there's no need to make sweeping
> changes to the 3 types, nor to the dict type.
>
> def __setattr__(self, name, value):
>     if self.__readonly__:
>         raise AttributeError('readonly')
>     super().__setattr__(name, value)

Are you sure that it's not possible to retrieve the underlying
dictionary somehow? For example, functions have a func.__dict__
attribute.
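The question can be checked in pure Python. Below, the quoted __setattr__ guard is made runnable on a hypothetical `Frozenable` class (the class and its `freeze()` helper are illustrative assumptions). The guard blocks normal assignment, but a write into the instance `__dict__` goes around it, and functions expose the same kind of plain, writable `__dict__`.

```python
class Frozenable:
    """Hypothetical class using the quoted __setattr__ guard."""

    __readonly__ = False  # class-level default

    def freeze(self):
        # Bypass our own guard to store the flag on the instance.
        object.__setattr__(self, "__readonly__", True)

    def __setattr__(self, name, value):
        if self.__readonly__:
            raise AttributeError("readonly")
        super().__setattr__(name, value)


obj = Frozenable()
obj.x = 1
obj.freeze()

try:
    obj.x = 2
except AttributeError:
    print("setattr blocked")

# The underlying dictionary is still reachable and writable
# (writing __dict__["__readonly__"] = False would even unfreeze it).
obj.__dict__["x"] = 2
print(obj.x)  # 2

# Functions expose a plain dict as well.
def func():
    pass

func.__dict__["hidden"] = True
print(type(func.__dict__) is dict, func.hidden)  # True True
```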

> Read-only by default would be backwards-incompatible, but having a
> commandline flag (and/or env var) to enable it would be useful.

My PoC had a PYTHONREADONLY env var to enable the read-only mode. I
just added a -r command line option for the same purpose.

It's disabled by default for backward compatibility. Only enable it if
you want to try my optimizations :-)

> For classes a decorator could be nice, though it should wait until it
> was more obviously worth doing.  I'm not sure it would matter for
> functions, though the same decorator would probably work.

I just pushed a change that makes classes read-only by default, so
that nested classes are read-only too. I modified the builtin
__build_class__ function for that.

A decorator is only called after the class has been defined, which is
too late. That's why I chose a class attribute.
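The ordering problem is easy to observe: a class decorator runs only after the whole class body, including nested class statements, has executed. The second half is a hypothetical metaclass, `ReadOnlyMeta`, that honors a `__readonly__ = True` attribute set in the class body. It is only a pure-Python illustration of the class-attribute idea; the PoC changes `__build_class__` in C instead, and unlike the PoC this metaclass is not applied to nested classes automatically.

```python
events = []

def readonly(cls):
    # Hypothetical decorator: by the time it runs, the class and all of
    # its nested classes have already been created.
    events.append("decorator sees " + cls.__name__)
    return cls

@readonly
class Outer:
    events.append("Outer body starts")

    class Inner:
        events.append("Inner class created")

    events.append("Outer body ends")

print(events)
# ['Outer body starts', 'Inner class created', 'Outer body ends', 'decorator sees Outer']


class ReadOnlyMeta(type):
    """Hypothetical metaclass honoring a __readonly__ class attribute."""

    def __setattr__(cls, name, value):
        if cls.__dict__.get("__readonly__", False):
            raise AttributeError(f"class {cls.__name__!r} is read-only")
        super().__setattr__(name, value)


class Config(metaclass=ReadOnlyMeta):
    __readonly__ = True  # the flag is part of the class body itself
    debug = False

try:
    Config.debug = True
except AttributeError as exc:
    print(exc)  # class 'Config' is read-only
```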

>> One point remains unclear to me. There is a short time window between
>> the moment a module is loaded and the moment it is made read-only.
>> During this window, we cannot rely on the read-only property of the
>> code. Specialized code cannot be used safely before the module is
>> known to be read-only.
>
> How big a problem would this be in practice?

I have no idea right now :)

>> Issues with read-only code
>> ==========================
>>
>> * Currently, to keep my implementation simple, a module, class or
>> function cannot be made modifiable again. With a registry of
>> callbacks, it may be possible to re-enable modification and call
>> code to disable optimizations.
>
> With the data descriptor approach toggling read-only would work.
> Enabling/disabling optimizations at that point would depend on how
> they were implemented.

Hum, I should try to use your descriptor. I'm not sure that it works
for modules and classes. (Functions already have a __readonly__
property.)
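Combining the data-descriptor idea with the callback registry from the first bullet could look roughly like this in pure Python. Everything here is hypothetical: `ReadOnlyFlag`, `Namespace`, `on_unfreeze()` and the `specialized` cache are illustrative names, and real optimizations would live in C. The point is that a toggleable flag is only safe if clearing it notifies whatever relied on it.

```python
class ReadOnlyFlag:
    """Hypothetical data descriptor holding a per-instance read-only flag."""

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return obj.__dict__.get("_readonly", False)

    def __set__(self, obj, value):
        was_readonly = self.__get__(obj)
        obj.__dict__["_readonly"] = bool(value)
        if was_readonly and not value:
            # Leaving read-only mode: let every optimization that relied
            # on the flag invalidate itself before the object can change.
            for callback in obj.__dict__.pop("_unfreeze_callbacks", []):
                callback(obj)


class Namespace:
    __readonly__ = ReadOnlyFlag()

    def __setattr__(self, name, value):
        # The flag itself must stay settable, otherwise it could never
        # be cleared again.
        if name != "__readonly__" and self.__readonly__:
            raise AttributeError("readonly")
        super().__setattr__(name, value)

    def on_unfreeze(self, callback):
        self.__dict__.setdefault("_unfreeze_callbacks", []).append(callback)


ns = Namespace()
ns.answer = 42
ns.__readonly__ = True

# A hypothetical "specialization" caches a value while ns is read-only.
specialized = {"answer": ns.answer}
ns.on_unfreeze(lambda obj: specialized.clear())

ns.__readonly__ = False   # the callback drops the cached value
ns.answer = 0
print(specialized, ns.answer)  # {} 0
```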

>> * Lazy initialization of module variables does not work anymore. A
>> workaround is to use a mutable type. It can be a dict used as a
>> namespace for modifiable module variables.
>
> What do you mean by "lazy initialization of module variables"?

To reduce the memory footprint, the "large" precomputed tables of the
base64 module are only filled on the first call to the function that
needs them.

I also saw modules that only import another module the first time it
is needed. Example: "def _lazy_import_sys(): global sys; import sys"
and then "if sys is None: _lazy_import_sys(); # use sys".

>> * It is not possible yet to make the namespace of packages read-only.
>> For example, "import encodings.utf_8" adds the symbol "utf_8" to the
>> encodings namespace. A workaround is to load all submodules before
>> making the namespace read-only. This cannot reasonably be done for
>> some large packages. For example, the encodings package has a lot of
>> submodules, but only a few are needed.
>
> If read-only is only enforced via __setattr__ then the workaround is
> to bind the submodule directly via pkg.__dict__.

I don't like the idea of an "almost" read-only module object.

In one of my projects, I would like to emit machine code. If a module
is modified while the machine code relies on the module being
read-only, Python may crash.
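The package issue comes from how imports publish submodules: importing `pkg.sub` binds `sub` as an attribute of the already-loaded `pkg`. The snippet checks this with the stdlib `encodings` package (`hex_codec` is an arbitrary submodule chosen for the example), then uses a hypothetical `GuardedModule` that enforces read-only only in `__setattr__` to show why the `__dict__` workaround yields an "almost" read-only module.

```python
import sys
import types
import encodings
import encodings.hex_codec

# The import system stored the submodule in the parent's namespace.
assert vars(encodings)["hex_codec"] is sys.modules["encodings.hex_codec"]


class GuardedModule(types.ModuleType):
    """Hypothetical module type enforcing read-only only in __setattr__."""

    def __setattr__(self, name, value):
        if vars(self).get("__readonly__"):
            raise AttributeError(f"module {self.__name__!r} is read-only")
        super().__setattr__(name, value)


pkg = GuardedModule("demo_pkg")
pkg.__readonly__ = True
sub = types.ModuleType("demo_pkg.sub")

try:
    pkg.sub = sub          # roughly what the import system does
except AttributeError as exc:
    print(exc)             # module 'demo_pkg' is read-only

vars(pkg)["sub"] = sub     # the workaround: write through __dict__
assert pkg.sub is sub      # the "read-only" module changed anyway
```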

Victor

