[Tutor] intercepting and recording I/O function calls

Jojo Mwebaze jojo.mwebaze at gmail.com
Fri Sep 17 10:55:28 CEST 2010


Thanks Martin, for the detailed breakdown; it actually helped me solve one
of my other problems.

My apologies to begin with; it seems I didn't state my problem clearly for
this particular case - perhaps I/O was not the best way to describe my
problem.

We have a system fully developed in Python, with thousands of lines of code.
I know I can use the logging facility for this task, but that means I have
to go into the code and edit it to log the specifics of what I need. This is
going to take a while, and it also implies that all other users adding
modules must include logging statements.

Specifically, I would like to track all inputs and outputs to
modules/functions - if a module retrieved and used files, ran some analysis
on them, and produced other files in return, I would like to take note of
this. That is, I want to record the inputs and outputs of a module, and also
to record all parameters and attribute values used by that module.
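One low-touch way to capture this without editing every module is a decorator that logs each call's arguments and return value. The sketch below uses only the standard library; `analyse` is a hypothetical function standing in for one of the real analysis routines:

```python
import functools
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("io-trace")

def record_io(func):
    """Log the arguments a function receives and the value it returns."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        log.info("call %s args=%r kwargs=%r", func.__name__, args, kwargs)
        result = func(*args, **kwargs)
        log.info("return %s -> %r", func.__name__, result)
        return result
    return wrapper

@record_io
def analyse(path, threshold=0.5):
    # Hypothetical analysis step: real code would open `path`, etc.
    return len(path) * threshold
```

Applying the decorator is a one-line change per function, so other users adding modules only need `@record_io` rather than hand-written logging statements.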

I thought I would build a wrapper around the original Python program, or
probably pick up this information at the OS level.
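For the wrapper approach, Python can hook every function call in the interpreter itself via `sys.settrace`, so the traced code needs no edits at all. A minimal sketch, where `helper` is a hypothetical function under observation and a real version would write the records somewhere persistent:

```python
import sys

events = []  # collected (event, function-name, detail) records

def tracer(frame, event, arg):
    # Record every function call (with its local arguments) and every
    # return value, without modifying the traced code itself.
    if event == "call":
        events.append(("call", frame.f_code.co_name, dict(frame.f_locals)))
    elif event == "return":
        events.append(("return", frame.f_code.co_name, arg))
    return tracer  # keep tracing inside the called frame

def helper(x):
    return x + 1

sys.settrace(tracer)       # start interpreter-wide tracing
result = helper(41)
sys.settrace(None)         # stop tracing
```

Note that a global trace function slows the whole interpreter down noticeably, so this is best kept as a switchable diagnostic mode rather than left on in production.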

Sorry for the confusion..

Jojo


On Fri, Sep 17, 2010 at 12:45 AM, Martin A. Brown <martin at linux-ip.net> wrote:

>
> [apologies in advance for an answer that is partially off topic]
>
> Hi there JoJo,
>
>  : I could begin with tracing I/O calls in my app.  If that is
>  : sufficient, I may not need I/O calls at the OS level.
>
> What do you suspect?  Filesystem I/O?
>
>  * open(), close(), opendir(), closedir() filesystem latency?
>  * read(), write() latency?
>  * low read() and write() throughput?
>
> Network I/O?
>
>  * Are name lookups taking a long time?
>  * Do you have slow network throughput?  (Consider tcpdump.)
>
> Rather than writing code (at first glance), why not use a system
> call profiler to check this out?  It is very unlikely that Python
> itself is the problem.  Could it be the filesystem/network?  Could
> it be DNS?  A system call profiler can help you find this.
>
> Are you asking this because you plan on diagnosing I/O performance
> issues in your application?  Is this a one time thing in a
> production environment that is sensitive to application latency?
> If so, you might try tickling the application and attaching to the
> process with a system call tracer.  Under CentOS you should be able
> to install 'strace'.  If you can run the proggie on the command
> line:
>
>  strace -o /tmp/trace-output-file.txt -f python yourscript.py args
>
> Then, go learn how to read the /tmp/trace-output-file.txt.
>
> Suggested options:
>
>  -f        follow children
>  -ttt      sane Unix-y timestamps
>  -T        total time spent in each system call
>  -s 256    256 byte limit on string output (default is 32)
>  -o file   store trace data in a file
>  -p pid    attach to running process of pid
>  -c        only show a summary of cumulative time per system call
>
>  : > But this is extremely dependent on the Operating System - you will
>  : > basically have to intercept the system calls. So, which OS are
>  : > you using?  And how familiar are you with its API?
>  :
>  : I am using centos, however i don't even have admin privileges.
>  : Which API are you referring to?
>
> You shouldn't need admin privileges if you can run the program as
> yourself.  If you have setuid/setgid bits, then you will need
> somebody with administrative privileges to help you.
>
> OK, so let's say that you have already done this and understand all
> of the above, you know it's not the system and you really want to
> understand where your application is susceptible to bad performance
> or I/O issues.  Now, we're back to python land.
>
>  * look at the profile module
>    http://docs.python.org/library/profile.html
>
>  * instrument your application by using the logging module
>    http://docs.python.org/library/logging.html
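As a rough sketch of the profiling suggestion above, `cProfile` can also be driven programmatically and its report captured as a string; the `work` function here is only a stand-in workload:

```python
import cProfile
import io
import pstats

def work():
    # Hypothetical workload standing in for the real application code.
    return sum(i * i for i in range(1000))

profiler = cProfile.Profile()
profiler.enable()
work()
profiler.disable()

# Render the five most expensive entries, sorted by cumulative time.
buf = io.StringIO()
pstats.Stats(profiler, stream=buf).sort_stats("cumulative").print_stats(5)
report = buf.getvalue()
```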
>
> You might ask how it is a benefit to use the logging module.  Well,
> if your program generates logging data (let's say to STDERR) and you
> do not include timestamps on each log line, you can trivially add
> timestamps to the logging data using your system's logging
> facilities:
>
>  { python thingy.py >/dev/null ; } 2>&1 | logger -ist 'thingy.py' --
>
> Or, if you like DJB tools:
>
>  { python thingy.py >/dev/null ; } 2>&1 | multilog t ./directory/
>
> Either solution leaves you (implicitly) with timing
> information.
>
>  : > Also, while you can probably do this in Python, it is likely
>  : > to have a serious impact on OS performance; it will slow
>  : > down performance quite noticeably.  I'd normally recommend
>  : > using C for something like this.
>
> Alan's admonition bears repeating.  Trapping all application I/O is
> probably just fine for development, instrumenting and diagnosing,
> but you may wish to support that in an easily removable manner,
> especially if performance is paramount.
>
> Good luck,
>
> -Martin
>
> --
> Martin A. Brown
> http://linux-ip.net/
>

