A design problem

Gabriel Genellina gagsl-py2 at yahoo.com.ar
Wed Jan 30 23:35:23 EST 2008


On Thu, 31 Jan 2008 01:57:41 -0200, Dan Upton <upton at virginia.edu>
wrote:

> Or: How to write Python like a Python programmer, not a Java
> programmer.  This will be a little long-winded...
>
> So I just recently started picking up Python, mostly learning the new
> bits I need via Google and otherwise cobbling together the functions
> I've already written.  It occurred to me though that one of my
> programs was still probably written very much like I would in Java
> (part of the reason I'm picking up Python is I'm tired of my coworkers
> making fun of me for writing parsing/reformatting programs in Java).

Maybe you've already read this, but I'll post the links anyway:
http://dirtsimple.org/2004/12/python-is-not-java.html
http://dirtsimple.org/2004/12/java-is-not-python-either.html

> Anyway, basically here's the problem I have:
>
> -Fork off n copies of a program, where n is a command line parameter,
> and save their PIDs.  The way I've been accomplishing this is
> basically:
>
> processes=[]
> for i in range(numProcs):
>    pid=os.fork()
>    if pid == 0:
>       # child: run/exec the worker program here
>    else:
>       processes.append(pid)

Looks fine to me.
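
Just to make it concrete, here is a minimal sketch of the whole loop (the
command run in the child is a made-up placeholder; substitute whatever you
actually launch):

import os

processes = []
for i in range(numProcs):
    pid = os.fork()
    if pid == 0:
        # child: replace this process image with the real program
        os.execvp("some_program", ["some_program"])
        os._exit(1)   # only reached if execvp itself failed
    else:
        # parent: remember the child's pid
        processes.append(pid)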

> -Every so much time (say, every second), I want to ask the OS
> something about that process from under /proc/pid (this is on Linux),
> including what core it's on.
> while 1:
>    for i in processes:
>       file = open("/proc/"+str(i)+"/stat")

(I hope there is a time.sleep(1) after processing that)
- don't use file as a variable name; you're shadowing the builtin file type.
- "i" is very much overloaded as a variable name everywhere... I'd use pid instead.
- string interpolation looks better (and is faster, but that's not so relevant here):
     for pid in processes:
         statfile = open("/proc/%d/stat" % pid)
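
If it helps, here is one way to dig the core number out of that file; a
sketch only (it assumes field 39, the "processor" field from proc(5), is
what you're after, and the helper name current_core is made up):

def current_core(pid):
    # /proc/<pid>/stat looks like: pid (comm) state ppid ... processor ...
    # comm may itself contain spaces, so split after the last ')'.
    statfile = open("/proc/%d/stat" % pid)
    try:
        data = statfile.read()
    finally:
        statfile.close()
    fields = data[data.rindex(")") + 2:].split()
    return int(fields[36])   # field 39 of the full line is "processor"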

> From that, one of the pieces of data I'll get is which core it's
> running on, which then will prompt me to open another file.
> Ultimately, I want to have n files, that are a bunch of lines:
> corenum data1 data2 ...
> corenum data1 data2 ...
> ...
>
> and so on.  The way I was going to approach it was to every time
> through the loop, read the data for one of the processes, open its
> file, write out to it, and close it, then do the same for the next
> process, and so on.  Really though I don't need to be able to look at
> the data until the processes are finished, and it would be less I/O,
> at the expense of memory, to just save all of the lists of data as I
> go along and then dump them out to disk at the end of the Python
> program's execution.  I feel like Python's lists or dictionaries
> should be useful here, but I'm not really sure how to apply them,
> particularly in a "Python-like" way.
>
> For anybody who made it all the way through that description ;) any  
> suggestions?

The simplest solution would be to use a tuple to store a row of data. You
know (implicitly) what every element contains: the first item is
"corenum", the second item is "data1", the third item is "data2", and so
on... (based on your example above).
Collect those tuples (rows) into a list (one list per process), and  
collect all lists into a dictionary indexed by pid.

That is, at the beginning, create an empty dictionary:
     info = {}

After each forking, at the same time you save the pid, create the empty  
list:
     info[pid] = []

After you read and process the /proc file to obtain what you want, append a
new element to that list:
     info[pid].append((corenum, data1, data2, ...))
(note the double parentheses: the inner pair builds the tuple)
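
Putting those pieces together, the sampling loop might look roughly like
this (a sketch: current_core is the helper above, the row contents are up
to you, and os.waitpid with WNOHANG is one way to notice a child exiting):

import os, time

running = processes[:]        # pids still alive
while running:
    for pid in running[:]:    # iterate over a copy; we remove as they exit
        pid_done, status = os.waitpid(pid, os.WNOHANG)
        if pid_done:
            running.remove(pid)
            continue
        row = (current_core(pid),)    # extend with data1, data2, ...
        info[pid].append(row)
    time.sleep(1)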

At the end, write all that info to disk. The csv module looks like a good
candidate:

import csv

for pid in processes:
    # one file per process; each row is one sample (corenum, data1, ...)
    outfile = open("process-%d.csv" % pid, "wb")
    writer = csv.writer(outfile)
    writer.writerows(info[pid])
    outfile.close()

That's all
-- 
Gabriel Genellina



