[Python-ideas] Hooks into the IO system to intercept raw file reads/writes

Andrew Barnert abarnert at yahoo.com
Mon Feb 2 19:07:28 CET 2015


On Feb 2, 2015, at 6:53, Paul Moore <p.f.moore at gmail.com> wrote:

> There's a lot of flexibility in the new layered IO system, but one
> thing it doesn't allow is any means of adding "hooks" to the data
> flow, or manipulation of an already-created io object.

Why do you need to add hooks to an already-created object? Why not just create the subclassed (and effectively hooked) object in place of the original one? Obviously that requires building the stack of raw/buffered/text manually instead, which requires a few extra lines in subprocess or wherever else you want to do it, but that doesn't seem like a huge burden for something this uncommon.

The advantage is that you don't need to worry about exposing the buffer, making raw read-write, or anything else complicated. The disadvantage is that you can't do this if you've already started reading or writing--but that doesn't seem to apply to this use case, or to most other potential use cases.
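
A minimal sketch of building the stack by hand, assuming a readable file descriptor (here an os.pipe() standing in for the child's stdout). TeeRawIO, open_tee_reader and the echo parameter are hypothetical names, not part of io or subprocess. It subclasses io.RawIOBase rather than io.FileIO because FileIO's C read() and readall() do not go through readinto(), so overriding readinto() alone on a FileIO subclass would miss data.

```python
import io
import os


class TeeRawIO(io.RawIOBase):
    """Hypothetical raw layer over a readable fd that copies each chunk to `echo`."""

    def __init__(self, fd, echo):
        self._fd = fd
        self._echo = echo  # any binary stream, e.g. sys.stdout.buffer

    def readable(self):
        return True

    def fileno(self):
        return self._fd

    def readinto(self, b):
        # The default RawIOBase.read() and readall() both funnel through
        # readinto(), so this one override sees every byte read.
        data = os.read(self._fd, len(b))
        n = len(data)
        b[:n] = data
        if n:  # 0 means EOF
            self._echo.write(data)
            self._echo.flush()
        return n

    def close(self):
        if not self.closed:
            try:
                os.close(self._fd)  # this object owns the fd
            finally:
                super().close()


def open_tee_reader(fd, echo, encoding="utf-8"):
    # Build the raw -> buffered -> text stack by hand instead of via open().
    return io.TextIOWrapper(io.BufferedReader(TeeRawIO(fd, echo)), encoding=encoding)


# Demo: a pipe stands in for a child process's stdout.
r, w = os.pipe()
os.write(w, b"hello\n")
os.close(w)
echo = io.BytesIO()  # would be sys.stdout.buffer for the real use case
with open_tee_reader(r, echo) as f:
    text = f.read()
```

Because nothing has been read when the stack is built, there is no buffered state to reconcile.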

As for Guido's concerns about subclassing as an API mechanism: You can easily translate this into a request to replace the os.read and os.write calls used by a raw io object; then, whether you do that externally or in a subclass, you get the same result.
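
A sketch of that framing, with hypothetical names throughout: FdRawIO takes the low-level read and write callables as constructor arguments (defaulting to os.read and os.write), so a hook can be passed in from outside; a subclass overriding readinto() would get the same result.

```python
import io
import os


class FdRawIO(io.RawIOBase):
    """Hypothetical raw stream over an fd with pluggable low-level calls.

    The caller keeps ownership of the fd; close() does not close it.
    """

    def __init__(self, fd, read=os.read, write=os.write):
        self._fd = fd
        self._read = read    # callable(fd, n) -> bytes
        self._write = write  # callable(fd, data) -> int

    def readable(self):
        return True

    def writable(self):
        return True  # whether the fd really accepts writes is up to the OS

    def fileno(self):
        return self._fd

    def readinto(self, b):
        data = self._read(self._fd, len(b))
        b[:len(data)] = data
        return len(data)

    def write(self, b):
        return self._write(self._fd, bytes(b))


# Demo: inject a logging read "externally", without subclassing.
seen = []


def logging_read(fd, n):
    chunk = os.read(fd, n)
    seen.append(chunk)
    return chunk


r, w = os.pipe()
os.write(w, b"data")
os.close(w)
with io.BufferedReader(FdRawIO(r, read=logging_read)) as f:
    out = f.read()
os.close(r)
```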

The problem, either way, is that RawIOBase doesn't actually call os.read. Each implementation of the ABC does something different. For FileIO, I'm pretty sure it reads directly at the C level. A socket file (socket.SocketIO) calls recv_into on the socket. And so on. So, how does that affect your proposal?
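
A quick way to check the FileIO claim, assuming CPython's C implementation of io: monkeypatch os.read with a counting spy and read a pipe through io.FileIO. The spy and variable names are illustrative only. The second half shows that a socket's unbuffered file object (what makefile returns with buffering=0) is a separate RawIOBase subclass with its own read path.

```python
import io
import os
import socket

real_read = os.read
calls = []


def spy_read(fd, n):
    calls.append(n)
    return real_read(fd, n)


r, w = os.pipe()
os.write(w, b"hello")
os.close(w)

os.read = spy_read  # process-wide monkeypatch, restored below
try:
    with io.FileIO(r, "r") as raw:  # CPython's C FileIO
        data = raw.read()
finally:
    os.read = real_read

# `calls` is still empty: FileIO read the fd without going through os.read.

# A socket's unbuffered file is a different RawIOBase subclass entirely.
left, right = socket.socketpair()
with left, right:
    sock_raw = right.makefile("rb", buffering=0)
    sock_kind = (type(sock_raw).__name__, isinstance(sock_raw, io.RawIOBase))
    sock_raw.close()
```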

> For example, when a subprocess.Popen object uses a pipe for the
> child's stdout, the data is captured instead of writing it to the
> console. Sometimes it would be nice to capture it, but still write to
> the console. That would be easy to do if we could wrap the underlying
> RawIOBase object and intercept read() calls[1]. A subclass of
> RawIOBase can do this trivially, but there's no way of replacing the
> class on an existing stream.
> 
> The obvious approach would be to reassign the "raw" attribute of the
> BufferedIOBase object, but that's readonly. Would it be possible to
> make it read/write?

Making it read/write is a couple lines of C (plus some Python code for implementations that use _pyio instead of _io). The problem is the buffer. If the raw you're replacing happens to reference the same file descriptor (or, I guess, another fd for the same file with the same file position) it would all work, but that seems to be stretching "consenting adults" freedom a bit.
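
A sketch of why the buffer is the hard part, assuming CPython's C BufferedReader: a one-byte read asks the raw layer for a full buffer, so everything the pipe had ends up in the buffer and another raw object on the same fd would find nothing left. It also shows that the raw attribute currently refuses assignment.

```python
import io
import os

r, w = os.pipe()
os.write(w, b"hello")
os.close(w)

buffered = io.BufferedReader(io.FileIO(r, "r"))
first = buffered.read(1)    # one raw read pulls in all 5 bytes; 4 stay buffered
leftover = os.read(r, 100)  # what a new raw object on the same fd would get

try:
    buffered.raw = io.FileIO(r, "r", closefd=False)
    swapped = True
except AttributeError:      # `raw` is a read-only attribute
    swapped = False

rest = buffered.read()      # the remaining bytes only ever lived in the buffer
buffered.close()
```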

Also, you still need some way to construct your HookedRawIO subclassed object. Is HookedRawIO a wrapper that provides the RawIOBase interface but delegates to another RawIOBase? Or does it share or take over or dup the fd? Or ...
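
One possible answer to the first question, as a sketch: HookedRawIO below is hypothetical. It provides the RawIOBase interface, delegates every read to an inner RawIOBase through readinto(), and shares (does not dup) the inner object's fd. The on_read callback is an illustrative parameter.

```python
import io
import os


class HookedRawIO(io.RawIOBase):
    """Hypothetical wrapper: the RawIOBase interface, delegating to `inner`."""

    def __init__(self, inner, on_read):
        self._inner = inner      # another RawIOBase, e.g. a FileIO
        self._on_read = on_read  # called with a bytes copy of each chunk

    def readable(self):
        return self._inner.readable()

    def fileno(self):
        return self._inner.fileno()  # shares the inner fd; no dup

    def readinto(self, b):
        n = self._inner.readinto(b)
        if n:  # None: no data yet on a non-blocking stream; 0: EOF
            self._on_read(bytes(memoryview(b)[:n]))
        return n

    def close(self):
        if not self.closed:
            try:
                self._inner.close()
            finally:
                super().close()


r, w = os.pipe()
os.write(w, b"spam and eggs")
os.close(w)
chunks = []
with io.BufferedReader(HookedRawIO(io.FileIO(r, "r"), chunks.append)) as f:
    data = f.read()
```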

> Or provide another way of replacing the raw IO
> object underlying an io object?

Somewhere in the bug database, someone (I think Nick Coghlan?) suggested a rewrap method on TextIOWrapper, which gives you a new TextIOWrapper around the same buffer (allowing you to override the other params). If the same idea were extended to the buffer classes, and if it allowed you to also replace the raw or buffer object (so the only thing you're "rewrapping" is the internal state), you could construct a HookedRawIO, then call rewrap on the buffer replacing its raw, then call rewrap on the text replacing its buffer, then set the stdout attribute of the popen. But I'm not sure that's any cleaner.
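
The text-level half of that idea can be approximated today with TextIOWrapper.detach(), as a sketch: rewrap() below is a hypothetical helper, not an io method, and it only copies encoding, errors and line_buffering (the newline setting is not exposed as an attribute, so it falls back to the default). Detaching also discards any text the old wrapper had already read ahead, which is the internal state a real rewrap would have to carry over.

```python
import io


def rewrap(text, buffer=None, **overrides):
    """Hypothetical helper: a new TextIOWrapper over `buffer` (default: the
    buffer `text` was using), reusing text's settings unless overridden.
    `text` is detached and unusable afterwards."""
    params = {
        "encoding": text.encoding,
        "errors": text.errors,
        "line_buffering": text.line_buffering,
    }
    params.update(overrides)
    old_buffer = text.detach()  # flushes; read-ahead text in `text` is lost
    return io.TextIOWrapper(old_buffer if buffer is None else buffer, **params)


t = io.TextIOWrapper(io.BytesIO(b"caf\xc3\xa9\n"), encoding="latin-1")
t2 = rewrap(t, encoding="utf-8")  # done before anything has been read
result = t2.read()
```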

> I'm sure there are buffer integrity issues to work out, but are there
> any more fundamental problems with this approach?
> 
> Paul
> 
> [1] Actually, it's *not* that easy, because subprocess.Popen objects
> are insanely hard to subclass - there are no hooks into the pipe
> creation process, and no way to intercept the object before the
> subprocess gets run (that happens in the __init__ method). But that's
> a separate issue, and also the subject of a different thread here.

