Basic Python Query
Steven D'Aprano
steve+comp.lang.python at pearwood.info
Thu Aug 22 23:28:12 EDT 2013
On Thu, 22 Aug 2013 13:54:14 +0200, Ulrich Eckhardt wrote:
> Firstly, there is one observation: The Python object of type Thread is
> one thing, the actual thread is another thing. This is similar to the
> File instance and the actual file. The Python object represents the
> other thing (thread or file) but it "is not" this thing. It is rather a
> handle to the file or thread. This is different for e.g. a string, where
> the Python object is the string.
Well, not quite. To users coming from other languages, "string" has a
clear and common meaning; it's an array of characters, possibly fixed-
width in older languages, but these days usually variable-width but
prefixed with the length (as in Pascal) or suffixed with a delimiter
(usually \0, as in C). Or occasionally both.
So as far as those people are concerned, Python strings aren't just a
string. They are rich objects, with an object header. For example, we can
see that there is a whole bunch of extra "stuff" required of a Python
string before you even get to the array-of-characters:
py> import sys
py> sys.getsizeof('')
25
25 bytes to store an empty string. Even if it had a four-byte length, and
a four-byte NUL character at the end, that still leaves 17 bytes
unaccounted for. So obviously Python strings contain a whole lot more
than just low-level Pascal/C strings.
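To see the overhead for yourself, here is a minimal sketch. The exact numbers are an assumption-free zone: they differ between Python versions and builds, so I make no claim about what you will see beyond "the empty string is not free". The variable names are mine, purely for illustration.

```python
import sys

# The size of the empty string is pure per-object overhead. The exact
# figure depends on the Python version and build, so none is assumed here.
empty_size = sys.getsizeof('')

# Sizes of ASCII strings of length 0 through 4.
sizes = [sys.getsizeof('a' * n) for n in range(5)]

# Growth caused by one additional ASCII character.
per_char = sizes[4] - sizes[3]

print("empty string:", empty_size, "bytes")
print("sizes for lengths 0-4:", sizes)
print("bytes per extra ASCII character:", per_char)
```

Whatever the figures are on your build, the fixed cost of the object header should dwarf the cost of each extra character.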
So while I agree that it is sometimes useful to distinguish between a
Python Thread object and the underlying low-level thread data structure
it wraps, we can do the same with strings (and floats, and lists, and
everything really). In any case, it's rare to need to do so.
> Due to this pairing between the actual thing and the handle, there is
> also some arity involved. For a single thread or file, there could be
> multiple Python objects for handling it, or maybe even none.
I don't think this is correct for threads. I don't believe there is any
way to handle a low-level thread in Python except via an object of some
sort. (With files, you can use the os module to work with low-level OS
file descriptors, which are just integers.)
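For what it's worth, here is a small sketch of working with a file purely through its descriptor, with no file object involved. It uses tempfile.mkstemp to get a scratch file (my choice, just to keep the example self-contained); os.open would hand you a descriptor in the same way.

```python
import os
import tempfile

# mkstemp creates a scratch file and returns a raw OS-level descriptor
# (opened for reading and writing) together with the file's path.
fd, path = tempfile.mkstemp()
try:
    print(type(fd))                # a plain int, not a file object
    os.write(fd, b"hello\n")       # write bytes straight through the descriptor
    os.lseek(fd, 0, os.SEEK_SET)   # rewind to the start of the file
    data = os.read(fd, 100)        # read back up to 100 bytes
    print(data)
finally:
    os.close(fd)                   # nothing closes a bare descriptor for us
    os.remove(path)
```

Note that nothing cleans up after a bare integer: if you forget os.close, the descriptor stays open until the process exits.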
> When the
> Python object goes away, it doesn't necessarily affect the thread or
> file it represents.
That's certainly not true with file objects. When the file object is
garbage collected (in CPython, normally as soon as the last reference to
it goes away), the underlying low-level file is closed.
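A rough way to observe this, assuming CPython's reference counting (other implementations may delay the cleanup, which is why the sketch also calls gc.collect()). The scratch file and the variable names are hypothetical, chosen only for the demonstration.

```python
import gc
import os
import tempfile

# Create a scratch file and release the descriptor mkstemp gives us,
# so the only open handle is the file object created below.
fd0, path = tempfile.mkstemp()
os.close(fd0)

f = open(path, "wb")
fd = f.fileno()        # the low-level descriptor behind the file object
os.fstat(fd)           # succeeds: the descriptor is open

del f                  # drop the last reference to the file object
gc.collect()           # nudge implementations without reference counting

try:
    os.fstat(fd)       # should now fail: the descriptor has been closed
    closed = False
except OSError:
    closed = True

print("descriptor closed after the file object went away:", closed)
os.remove(path)
```

Since the timing is implementation-dependent, calling close() explicitly or using a with statement remains the portable way to release a file.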
> This already casts a doubt on the habit of deriving
> from the Thread type, just like deriving from the File type is highly
> unusual, as you are just deriving from a handle class.
In Python 3, there is no "File" type. There are *multiple* file types,
depending on whether you open a file for reading or writing in binary or
text mode:
py> open('/tmp/junk', 'wb')
<_io.BufferedWriter name='/tmp/junk'>
py> open('/tmp/junk', 'rb')
<_io.BufferedReader name='/tmp/junk'>
py> open('/tmp/junk', 'w')
<_io.TextIOWrapper name='/tmp/junk' mode='w' encoding='UTF-8'>
But even if we limit the discussion to Python 2 (where the built-in type
is spelled "file"), it is unusual to inherit from it because it already
does everything we normally want from a file. There's no need to override
methods, so why make your own subclass?
On the other hand, threads by their very nature have to be customized.
The documentation is clear that there are two acceptable ways to do this:
    This class represents an activity that is run in a separate
    thread of control. There are two ways to specify the activity:
    by passing a callable object to the constructor, or by
    overriding the run() method in a subclass.
http://docs.python.org/2/library/threading.html#thread-objects
So to some degree, it is just a matter of taste which you use.
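To make the two options concrete, here is a minimal sketch written for Python 3 (the zero-argument super() call would need changing under Python 2). The names work, Worker and results are made up for illustration and are not from the documentation.

```python
import threading

results = []

def work(label):
    # The activity to run in a separate thread.
    results.append(label)

# Way 1: pass a callable object to the constructor ("uses a thread").
t1 = threading.Thread(target=work, args=("callable",))

# Way 2: override run() in a subclass ("is a thread").
class Worker(threading.Thread):
    def __init__(self, label):
        super().__init__()
        self.label = label

    def run(self):
        work(self.label)

t2 = Worker("subclass")

for t in (t1, t2):
    t.start()
for t in (t1, t2):
    t.join()

print(sorted(results))
```

Both threads end up doing the same job; the only difference is where the activity is specified.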
[...]
> In summary, I find that modelling something to "use a thread" is much
> clearer than modelling it as "is a thread".
The rest of your arguments seem good to me, but not compelling. I think
they effectively boil down to personal taste. I write lots of non-OOP
code, but when it comes to threads, I prefer to subclass Thread.
--
Steven