[Tutor] sockets, files, threads

Wed Jan 19 02:29:17 CET 2005

Thank you so much Danny.  I know how hard it is to look at and comment
on other people's code.  You know I teach C and Python and I have to
say, though, that reading students' Python is 100 times easier than
reading their C.

And, I hope you're feeling better.  I hate to think of you struggling
through my code when you should be resting.

> 
> Hi Marilyn,
> 
> 
> [Long program comments ahead.  Please forgive me if some of the comments
> are overbearing; I'm trying to get over a cold, and I'm very grumpy.
> *grin*]
> 
> 
> Some comments on the code follow.  I'll be focusing on:
> 
> > http://www.maildance.com/python/doorman/py_daemon.py
> 
> One of the import statements:
> 
> ###
> from signal import *
> ###
> 
> 
> may not be safe.  According to:
> 
>     http://docs.python.org/tut/node8.html#SECTION008410000000000000000
> 
> """Note that in general the practice of importing * from a module or
> package is frowned upon, since it often causes poorly readable code.

Note that it says it is frowned upon, and I almost never do it,
because it 'often causes poorly readable code' -- but not in this
case.  There is only signal.signal and signal.alarm and the signal
numbers, none of which are easy to mistake for something else.  I have
never seen 'unsafe' --unless you cause yourself a bug by hiding
something from yourself, which would be hard to do in this case.

But, ok, to encourage you to make suggestions, I used the regular
import on signals.  No change to the bugs.

> However, it is okay to use it to save typing in interactive sessions, and
> certain modules are designed to export only names that follow certain
> patterns."""
> 
> 
> There's a block of code in Server.run() that looks problematic:

Darn it!  I managed to put up a hybrid version.  Here's what I thought
I should have, to wrap a lock around the creation of the file
descriptor for the client socket:

        self.server_socket.setblocking(0)
        while 1:
            if log.level & log.calls:
                log.it("fd%d:py_daemon.py: Waiting ...", self.descriptor)
            try:
                file_lock.acquire()
                try:
                    client_socket, client_addr = self.server_socket.accept()
                finally:
                    file_lock.release()
            except socket.error, msg:
                if str(msg).find('Resource temporarily unavailable') > -1:
                    time.sleep(.2)
                    continue
            except (EOFError, KeyboardInterrupt):
                self.close_up()
            Spawn(client_socket).start()

Not that it matters to your suggestions.  But without locks, this
reduces to:

        while 1:
            if log.level & log.calls:
                log.it("fd%d:py_daemon.py: Waiting ...", self.descriptor)
            try:
                client_socket, client_addr = self.server_socket.accept()
            except (EOFError, KeyboardInterrupt):
                self.close_up()
            Spawn(client_socket).start()

But, I take it, you want me to put Spawn(client_socket).start() in that
try: clause.

> 
> The problem is that, as part of program flow, it appears to run after the
> try block.  But in one particular case of program flow, 'client_socket'
> will not be set to a valid value.  It is better to put that statement a

I don't understand.  Which particular case of program flow will
'client_socket' be invalid and yet make it through the socket.accept
call?

> few lines up, right where 'client_socket' is initialized.  Like this:
> 
> ###
>             try:
>                 client_socket, client_addr = self.server_socket.accept()
>                 Spawn(client_socket).start()
>             except socket.error, msg:
>                 time.sleep(.5)
>             except (EOFError, KeyboardInterrupt):
>                 self.close_up()
> ###
> 
> Not only does this make it more clear where 'client_socket' is being used,
> but it ends up making the code shorter.  I've dropped the 'continue'
> statement, as it becomes superfluous when the Spawn() moves into the try's
> body.

But but but, that wraps the whole thread in one try/except clause.
That Spawn() call starts the thread.  If a client_socket generates an
error, we don't want the main thread to sleep.  All the socket
operations in Spawn.start() are also wrapped in their own (hopefully)
intelligent try/excepts.  I thought it was good practice to -not- put
extra stuff in a try/except.

> 
> In fact, the placement of that statement may account partially for the
> error message that you were running into earlier.  The earlier error
> message:
> 
> ###
> close failed: [Errno 9] Bad file descriptor
> ###
> 
> can easliy occur if there's an EOFError or KeyboardInterrupt under the
> original code.  And although KeyboardInterrupt might be unusual, I'm not
> so sure if EOFError is.

I thought I listed exactly the two errors that come from pounding on
the keyboard.  I try to be careful not to mask any tracebacks.

And, the original way, this error only gets triggered while we are
waiting for a client to connect, not for anything else.

> 
> Furthermore, the 'except' blocks can emit more debugging information.
> You mentioned earlier that you're not getting good error tracebacks when
> exceptions occur.  Let's fix that.
> 
> Use the 'traceback' module here for all except blocks in your program.
> This will ensure that you get a good stack trace that we can inspect.

> 
> I'm using Python 2.4's format_exc() function to get the stack trace as a
> string.  If this version is not available to you, you can use this

Yes.  I'm using 2.4.

> workaround format_exc():
> 

> Once you've made the correction and augmented all the except blocks with

ALL of them?  That seems a similar practice to putting locks
everywhere?  Is there a sort of except clause that I'm writing that
you think might be problematic?

> traceback logging, try running your program again.  We should then be
> better able to trace down the root cause of those errors.
> 
> 
> Let me just look through a little bit more, just to see what else we can
> pick out.
> 
> >From what I read of the code so far, I believe that a lot of the locking
> that the code is doing is also superfluous.  You do not need to lock local
> variables, nor do you have to usually lock things during object

Where did I lock local variables?  The things I lock are all calls to
operations that create file descriptors with the operating system, I
think.  Except I think the experiment is in there where I tried
locking os.listdir, but it didn't help.

> initialization.  For example, FileReader.__init__()  has the following
> code:
> 
> ###
> class FileReader(TokenReader):
>     def __init__(self, file_socket):
>         self.local_name = file_socket.local_name
>         self.freader_name = '/tmp/fsf%s' % self.local_name
>         file_lock.acquire()
>         self.freader = open(self.freader_name, "w+")
>         file_lock.release()
> ###
> 
> The locks around the open() calls are unnecessary: you do not need to
> synchronize file opening here, as there's no way for another thread to get
> into the same initializer of the same instance.

But I think that I'm only wrapping the calls that create file
descriptors, because those get trampled on.  

> 
> In fact, as far as I can tell, none of the Spawn() threads are
> communicating with each other.  As long as your threads are working

Yes.  None communicate with each other.

So, why was I doing this?  I thought it was the obvious architecture
for what I was doing.  I didn't know it would be such a bear.

> independently of each other --- and as long as they are not writing to
> global variables --- you do not need locks.

I took it all out and nothing seemed to change, like you said.

> 
> In summary, a lot of that locking code isn't doing anything, and may
> actually be a source of problems if we're not careful.
> 
> 
> 
> The other classes are complex enough that I actually don't trust either of
> them.  I am a very paranoid person.  *grin*
> 
> FileReader appears to reimplement a temporary-file implementation: I'd
> strongly recommend using tempfile.TemporaryFile instead of reimplementing
> its functionality.  Either that, or use StringIO() to suck the whole file
> into memory.  How large do you expect a single email message to be?

As big as an email can be.  Huge and more huge.  We don't want to
think about this program when big emails come in.

When I first made my program to read pop-servers, I used the poplib
library and Charlie immediately tested my code with some big messages
and it brought the machine to its knees.  So I rewrote it only using
_socket for the real work, and keeping nothing in memory, and: boom!
it is now so fast and takes almost no extra memory in the python part.

And since then, please don't shoot me, but I don't immediately trust
the modules.  I read them and see how many times they loop through the
data, and how many copies of the data they put into memory -- and
usually decide to write the simple things I need myself, looping zero
times and keeping only one block in memory.

> 
> There appears to be missing symmetry between the open() and the close()
> calls in the code, and that's a sign of possible trouble.  I try to
> structure most file-handling code with the following structure:
> 
> ###
> f = open(somefile)
> try:
>     ... do some stuff with f
> finally:
>     f.close()
> ###
> 
> Code that doesn't follow this idiom might be susceptable to resource
> leakage: we might forget to close() something.  This idiom's probably not
> as important in Python, since we have garbage collection, but I still try
> to follow it to prevent things like running out of file handles too
> quickly.  *grin*

I try to put the closes with the opens too, but it's not often
practical.  I do log my openings and closings, as you can see, and
count them for leakage.

> 
> The eval() architecture that Spawn uses to dispatch to either
> calls.acl_rcpt, calls.route_mail, calls.route_errors, calls.doorman, or
> calls.doorman_errors is dangerous.  I know that in prior email, you
> mentioned that you were using eval().
> 
> I had misgivings then, and I definitely have them now: the program calls
> eval() based on stuff that's coming from a socket, and that's just asking
> for trouble.

Well, remember that it's an internal socket and the calls are
hardcoded in the exim configuration file.

> 
> I strongly recommend modifying Spawn.self() to something like this:
> 
> ######
> def tokenToDispatchFunction(self, prog):
>     """Given a command, returns the correct dispatch function.  If prog
>     is incorrect, throws a LookupError."""
>     dispatchTable = { 'acl_rept' : calls.acl_rept,
>                       'route_errors' : calls.route_errors,
>                       'doorman' : calls.doorman,
>                       'doorman_errors' : calls.doorman_errors'
>                     }
>     return dispatchTable[token].main
> 
> 
> def run(self):
>      """Executes one of the calls program."""
>      argv = []
>      dispatch = self.tokenToDispatchFunction(self.exim_io.call)
>      dispatch(argv)
> ######
> 
> I've oversimplified ("broken") the code here; you'll need to reinsert the
> argument-construction stuff and the logging.  But other than that, this
> should have the same external-command calling functionality as the
> original code, with additional safety.
> 
> The danger is that the tokens that we are reading from the socket can be
> arbitrarily formed.  One potential token that can be written might be:
> 
> ###
> evil = '__name__[__import__("os").system("ls"+chr(32)+"/")].py'
> ###
> 
> 'evil' is nonsensical, but according to the tokenization code, this is a
> single token, since it does not contain spaces.
> 
> 
> Imagine that that weird string is the first token that
> FileSocket.find_call() returns to us.  Back in Spawn.run(), under the
> original code, we end up trying to evaluate a string that looks like:
> 
>     'call.__name__[__import__("os").system("ls"+chr(32)+"/")].main(argv)'
> 
> which in fact will break with a runtime error, but not before doing
> mischief.  This is exactly the kind of devious exploit that eval() opens
> up.  It's hard to close the bottle after the genie's out.
> 

I'm sorry that you didn't realize that the socket input is hard-coded
on the other side.  To get to my socket, you must break into our
machine, and by then, wrecking havoc will be much easier than this.

> 
> The revised code that uses a dispatchTable method is much more resiliant
> to this kind of attack.  If someone tries to send malformed commands, the
> worse that can happen is a KeyError.  More than that, if we look back at
> that tokenToDispatchFunction() again:
> 
> ###
> def tokenToDispatchFunction(self, prog):
>     dispatchTable = { 'acl_rept' : calls.acl_rept,
>                       'route_errors' : calls.route_errors,
>                       'doorman' : calls.doorman,
>                       'doorman_errors' : calls.doorman_errors'
>                }
>     return dispatchTable[token].main
> ###
> 
> it's much more clear what possible calls can happen.
> 

Yes.  I like it, but not because of security concerns, but because of
the clarity.  I also simplified my protocol so now the Spawn.start()
is just this:

    def start(self):
        '''Given the command, provides the function to call.'''
        global TESTING
        function =  { 'acl_rcpt.py' : calls.acl_rcpt,
                      'route_mail.py' : calls.route_mail,
                      'route_errors.py' : calls.route_errors,
                      'doorman.py' : calls.doorman,
                      'doorman_errors.py' : calls.doorman_errors,
                      'is_known_to.py' : is_known_to 
                      }
        if log.level & log.calls:
            log.it('fd%d: %s: %s', self.descriptor, self.exim_io.call, self.exim_io.collector_name)
        function[self.exim_io.call].main(self.exim_io)

So nice!  Thank you.

But, I did take out threading and the big error went away.  I'm done
with threading, unless I see a big need one day.  I don't know what
I'll tell students from here on.

> 
> Anyway, my eyes are going dim, so I'd better stop right now.  *grin* Sorry
> for taking so much space; I hope this helps!
> 
> 
> I thought about this a little bit more: there is one place where you do
> need locks.  Locking needs to be done around Spawn's setting of its 'no'
> class attribute:
> 
> ###
> class Spawn(threading.Thread):
>     no = 1
>     def __init__(self, client_socket):
>         Spawn.no = Spawn.no + 2 % 30000
>         ## some code later...
>         self.local_name = str(Spawn.no)
> ####
> 
> The problem is that it's possible that two Spawn threads will be created
> with the same local_name, because they are both using a shared resource:
> the class attribute 'no'.  Two thread executions can interleave.  Let's
> draw it out.
> 

But, notice that the call to:

        threading.Thread.__init__(self, name = self.local_name)

came after, so the Spawn.no manipulation always happens in the main
thread.

OK?

> Anyway, I hope this helps!

Of course.  Code review is always a huge help, and you gave me some
good ideas.  I'm very grateful.

One problem remains after removing the threading stuff.  I still get
those pesky:

close failed: [Errno 9] Bad file descriptor

even though my logged openings and closings match up one-to-one.  Now
then, I haven't added that line of code to each except clause yet
because 1) it doesn't seem to cause any problem 2) it's a bunch of
busy-work and 3) I'm hoping that, when your illness clears, you'll
think of something less brute-force for me to do.

What about when I do an explicit call to a close inside a __del__.  Is
that a bad idea?

Again, thank you thank you thank you.  When I meet you (we are both in
the Bay Area) I'll give you a huge hand-shake for sure.

Marilyn