[Tutor] intercepting and recording I/O function calls
Jojo Mwebaze
jojo.mwebaze at gmail.com
Fri Sep 17 10:55:28 CEST 2010
Thanks Martin, for the detailed breakdown, it actually helped me solve one
of my other problems.
My apologies to begin with, it seems I didn't state my problem clearly for
this particular case - perhaps I/O was not the best way to describe my
problem.
We have a system fully developed in Python - with thousands of lines of code.
I know I can use the logging facility for this task, but this means I have to
go into the code and edit it to log the specifics of what I need. This is
going to take a while, and it also implies that all other users adding modules
must include logging statements.
Specifically, I would like to track all inputs/outputs to modules/functions -
if a module retrieved and used files, ran some analysis on them, and
produced other files in return, I would like to take note of this. I.e. what I
want is to record the inputs and outputs of a module, and also to record all
parameters and attribute values used by the same module.
I thought I would build a wrapper around the original Python program, or
perhaps pick this information up at the OS level.
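As a rough illustration of the wrapper idea, a decorator can record a
function's inputs and return value without editing the function body. This is
only a minimal sketch - `record_io` and `analyse` are made-up names, and a
real system would log to a file rather than to the console:

```python
import functools
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("iotrace")

def record_io(func):
    """Record a function's arguments and return value on every call."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        log.info("%s(args=%r, kwargs=%r) -> %r",
                 func.__name__, args, kwargs, result)
        return result
    return wrapper

@record_io
def analyse(path, threshold=0.5):
    # stand-in for a real analysis step that reads and writes files
    return {"input": path, "score": threshold * 2}

analyse("catalogue.fits")
```

Existing functions could be wrapped in bulk (e.g. by iterating over a
module's callables), which avoids touching the thousands of lines of code
directly.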
Sorry for the confusion..
Jojo
On Fri, Sep 17, 2010 at 12:45 AM, Martin A. Brown <martin at linux-ip.net> wrote:
>
> [apologies in advance for an answer that is partially off topic]
>
> Hi there JoJo,
>
> : I could begin with tracing I/O calls in my App.. if it's
> : sufficient I may not need I/O calls for the OS.
>
> What do you suspect? Filesystem I/O?
>
> * open(), close(), opendir() closedir() filesystem latency?
> * read(), write() latency?
> * low read() and write() throughput?
>
> Network I/O?
>
> * Are name lookups taking a long time?
> * Do you have slow network throughput? (Consider tcpdump.)
>
> Rather than writing code (at first glance), why not use a system
> call profiler to check this out? It is very unlikely that Python
> itself is the problem. Could it be the filesystem/network? Could
> it be DNS? A system call profiler can help you find this.
>
> Are you asking this because you plan on diagnosing I/O performance
> issues in your application? Is this a one time thing in a
> production environment that is sensitive to application latency?
> If so, you might try tickling the application and attaching to the
> process with a system call tracer. Under CentOS you should be able
> to install 'strace'. If you can run the proggie on the command
> line:
>
> strace -o /tmp/trace-output-file.txt -f python yourscript.py args
>
> Then, go learn how to read the /tmp/trace-output-file.txt.
>
> Suggested options:
>
> -f follow children
> -ttt sane Unix-y timestamps
> -T total time spent in each system call
> -s 256 256 byte limit on string output (default is 32)
> -o file store trace data in a file
> -p pid attach to running process of pid
> -c only show a summary of cumulative time per system call
>
> : > But this is extremely dependent on the Operating System - you will
> : > basically have to intercept the system calls. So, which OS are
> : > you using? And how familiar are you with its API?
> :
> : I am using centos, however i don't even have admin privileges.
> : Which API are you referring to?
>
> You shouldn't need admin privileges if you can run the program as
> yourself. If you have setuid/setgid bits, then you will need
> somebody with administrative privileges to help you.
>
> OK, so let's say that you have already done this and understand all
> of the above, you know it's not the system and you really want to
> understand where your application is susceptible to bad performance
> or I/O issues. Now, we're back to python land.
>
> * look at the profile module
> http://docs.python.org/library/profile.html
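A minimal sketch of the profile-module approach using cProfile - the `busy`
function below is just a stand-in workload, not part of any real application:

```python
import cProfile
import io
import pstats

def busy():
    # stand-in workload; replace with a call into your application
    return sum(i * i for i in range(100000))

profiler = cProfile.Profile()
profiler.enable()
busy()
profiler.disable()

# print the five most expensive entries by cumulative time
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
print(stream.getvalue())
```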
>
> * instrument your application by using the logging module
> http://docs.python.org/library/logging.html
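A minimal instrumentation sketch with the logging module - the logger name
and function are illustrative; note that `%(asctime)s` in the format string
already gives you a timestamp on every line:

```python
import logging

logging.basicConfig(
    level=logging.DEBUG,
    format="%(asctime)s %(name)s %(levelname)s %(message)s",
)
log = logging.getLogger("myapp")

def load_and_analyse(path):
    log.debug("reading %s", path)
    data = path.upper()  # stand-in for real file I/O and analysis
    log.debug("produced %r", data)
    return data

load_and_analyse("catalogue.txt")
```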
>
> You might ask how it is a benefit to use the logging module. Well,
> if your program generates logging data (let's say to STDERR) and you
> do not include timestamps on each log line, you can trivially add
> timestamps to the logging data using your system's logging
> facilities:
>
> { python thingy.py >/dev/null ; } 2>&1 | logger -ist 'thingy.py' --
>
> Or, if you like DJB tools:
>
> { python thingy.py >/dev/null ; } 2>&1 | multilog t ./directory/
>
> Either solution leaves you (implicitly) with timing
> information.
>
> : > Also, while you can probably do this in Python, it's likely
> : > to have a serious impact on the OS performance, it will slow
> : > down the performance quite noticeably. I'd normally recommend
> : > using C for something like this.
>
> Alan's admonition bears repeating. Trapping all application I/O is
> probably just fine for development, instrumenting and diagnosing,
> but you may wish to support that in an easily removable manner,
> especially if performance is paramount.
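One way to keep such tracing easily removable, sketched here with a made-up
`APP_TRACE` environment variable, is to make the wrapper a no-op unless the
switch is set:

```python
import functools
import os

# hypothetical switch: export APP_TRACE=1 to enable tracing
TRACE = os.environ.get("APP_TRACE") == "1"

def maybe_trace(func):
    if not TRACE:
        return func  # no wrapper at all, so zero overhead in production
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        print("%s%r -> %r" % (func.__name__, args, result))
        return result
    return wrapper

@maybe_trace
def step(x):
    return x + 1
```

With tracing disabled the decorator returns the original function untouched,
so the instrumentation costs nothing when performance matters.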
>
> Good luck,
>
> -Martin
>
> --
> Martin A. Brown
> http://linux-ip.net/
>