Intermittent Failure on Serial Port (Other thread code)

H J van Rooyen mail at microcorp.co.za
Tue Jun 13 04:24:44 EDT 2006


I would like to publicly thank Serge Orloff for the effort he has put in so far
and his patience...
He is a Scholar and a Gentleman.

Serge Orloff wrote:

| H J van Rooyen wrote:
|
| > Note that the point of failure is not the same place in the python file, but it
| > is according to the traceback, again at a flush call...
|
| Yes, traceback is bogus. Maybe the error is raised during garbage
| collection, although the strace you've got doesn't show that. The main
| reason of the failure seems to be a workaround in python's function
| new_buffersize, it doesn't clear errno after lseek and then this errno
| pops up somewhere else. There are two places I can clearly see that
| don't clear errno: file_dealloc and get_line. Obviously this stuff
| needs to be fixed, so you'd better file a bug report.

Ouch! - I am new in this neck of the woods - what are the requirements for
something like this and where should I send it so that it's useful? - so far it's so
very vague in my mind that I am not sure that I can actually tell someone else
properly what's wrong - except for an "it does not work" bleat which is not very
illuminating...

| I'm not sure how
| to work around this bug in the meantime, since it is still not clear
| where this error is coming from. Try to pin point it.

I will put in a lot of try - except stuff looking for this errno 29 (ESPIPE) and
see what comes up and where.
Not sure if this will catch it but it may give a clue...
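
A minimal sketch of that kind of probing, written for current Python 3 rather than the Python 2 used in this thread. The helper name `probe` and its `label` argument are made up for illustration; the only fixed point is that errno 29 is ESPIPE ("Illegal seek") on Linux.

```python
import errno
import sys


def probe(label, func, *args, **kwargs):
    """Call func and report where an ESPIPE (errno 29 on Linux) shows up.

    `probe` and `label` are hypothetical names, not part of the original code.
    """
    try:
        return func(*args, **kwargs)
    except OSError as e:  # IOError is an alias of OSError in Python 3
        if e.errno == errno.ESPIPE:
            sys.stderr.write("ESPIPE caught at %s: %s\n" % (label, e))
        raise  # re-raise so the normal behaviour is unchanged
```

Wrapping each suspect call, for example `probe("port flush", port.flush)` with `port` standing in for the serial port file object, should at least show which call the error surfaces in, even if it was caused somewhere else.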

| For example, if
| your code relies on garbage collection to call file.close, try to close
| all files in your program explicitly. It seems like a good idea anyway,
| since your program is long running, errors during close are not that
| significant. Instead of standard close I'd call something like this:
|
| def soft_close(f):
|     try:
|         f.close()
|     except IOError, e:
|         print >>stderr, "Hmm, close of file failed. Error was: %s" % e.errno

As you remark - the code is long running - it's supposed to work forever and
come back up again if the power has failed - so for now the serial port is never
explicitly closed - I open and close the other files as I use them to try to
make sure the data is written to disk instead of just cached in memory.  I will
put this sort of thing in everywhere now to try and isolate whatever it is that
is biting me, not only on the close statements.
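
On getting the data to disk: closing a file hands the data to the operating system, but does not by itself force it onto the disk. A hedged sketch in current Python 3 of a write that does ask for that, using `os.fsync`; the function name `write_durably` is made up for illustration and is not part of the code in this thread.

```python
import os


def write_durably(filename, text):
    """Write text and ask the OS to push it to the disk before returning.

    `write_durably` is a hypothetical helper, not part of the original code.
    """
    with open(filename, "w") as f:
        f.write(text)
        f.flush()             # empty Python's own buffer into the OS
        os.fsync(f.fileno())  # ask the OS to write its cache to the device
```

This costs a wait for the device on every call, so it probably only makes sense for the writes that must survive a power failure.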

|
| > The "close failed" is explicable - it seems to happen during closedown, with the
| > port already broken..,
|
| It is not clear who calls lseek right before close. lseek is called by
| new_buffersize that is called by file.read. But who calls file.read
| during closedown?

When I said closedown - I meant whatever the system does after the exception was
raised - I have not yet gotten as far as writing a clean close... - so far I am
concentrating on the polling protocol, to safely get the data from the readers
to the disk - port to file.... hence the name :-)

Now there is another thread running - it accesses files (disk, and a fifo to
trigger the disk write) but not the serial port - I have not laid any stress on
it because I thought it was irrelevant, but now I am not so sure - the code
follows below -

So, a question - is this error number a process global thing or is it local to a
thread or an object? - it could be this thread that calls read while the other
one is in the process of dying after the exception - it should not access the
port, though, although it repetitively reads a fifo... - come to think of it -
it could be this thread that first raises the ESPIPE for all I know (that is if
it's global and not thread specific)...

import thread
import time

def maintain_onsite(fifoname, filename):
    """Here we keep track of who is in, and who out of the site"""

    j = thread.get_ident()
    print 'New Thread identity printed by new thread is:', j
    pfifo = open(fifoname, 'r', 1)    # Reading, line buffered
    unblock(pfifo)                    # call some magic (non blocking, see below)

    global on_site    # use top level dictionary to avoid a lot of copying

    s = ""
    d = {}

    while True:
        try:
            s = pfifo.readline()
        except IOError:               # nothing to read on the non blocking fifo
            time.sleep(1)
            continue
        if s == '':
            continue
        if s != 'goon\n':             # see if we got a go on signal
            continue
        d = on_site.copy()            # make a real copy of the on site dictionary
        pfile = open(filename, 'w', 1)    # The file of people on site
        for x in d:
            pfile.write(x + ' ' + d[x] + '\n')  # rewrite it - a bit brute force...
        pfile.close()
        s = ''                        # clean out the receive string again
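
Since the dictionary is shared with the polling thread, iterating over it while the other thread adds or removes entries can fail with "dictionary changed size during iteration". A sketch in current Python 3 of taking the snapshot under a lock; `on_site_lock`, `snapshot_on_site` and `write_on_site` are hypothetical names, and the polling thread would have to hold the same lock while it updates the dictionary.

```python
import threading

on_site = {}                      # shared dictionary, as in the code above
on_site_lock = threading.Lock()   # hypothetical lock, not in the original


def snapshot_on_site():
    """Return a private copy of on_site taken while holding the lock."""
    with on_site_lock:
        return dict(on_site)


def write_on_site(filename):
    """Rewrite the file of people on site from a snapshot."""
    snapshot = snapshot_on_site()
    with open(filename, "w") as pfile:
        for name, value in snapshot.items():
            pfile.write(name + " " + value + "\n")
```

The lock is held only for the copy, so the slow file write does not hold up the polling thread.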

Here is the unblock code:

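# Some magic to make a file non blocking - from the internet

import fcntl
import os

def unblock(f):
    """Given file 'f', sets its non blocking flag (O_NONBLOCK)."""

    fd = f.fileno()
    flags = fcntl.fcntl(fd, fcntl.F_GETFL)    # keep the flags that are already set
    fcntl.fcntl(fd, fcntl.F_SETFL, flags | os.O_NONBLOCK)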
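
A sketch for current Python 3 of what a non blocking read looks like from the caller's side, using file descriptors directly. `set_nonblocking` and `read_if_ready` are names made up for illustration, and the real fifo may well behave differently from the simple case described here.

```python
import fcntl
import os


def set_nonblocking(fd):
    """Add O_NONBLOCK to the flags already set on file descriptor fd."""
    flags = fcntl.fcntl(fd, fcntl.F_GETFL)
    fcntl.fcntl(fd, fcntl.F_SETFL, flags | os.O_NONBLOCK)


def read_if_ready(fd, size=1024):
    """Return the bytes available on fd, or None if nothing is there yet."""
    try:
        return os.read(fd, size)
    except BlockingIOError:   # EAGAIN: no data at the moment
        return None
```

With a pipe whose writing end is still open, `read_if_ready` returns None until something is written; once every writer has closed, `os.read` returns an empty bytes object instead, which is the end of file case and not an error.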


- Hendrik




