Opposite of split

Stefan Schwarzer sschwarzer at sschwarzer.net
Tue Aug 17 10:14:05 EDT 2010


Hi Alex,

On 2010-08-16 18:44, Alex van der Spek wrote:
> Anybody catches any other ways to improve my program (attached), you are 
> most welcome. Help me learn, that is one of the objectives of this 
> newsgroup, right? Or is it all about exchanging the next to impossible 
> solution to the never to happen unreal world problems?

I don't know what a concordance table is, and I haven't
looked a lot into your program, but anyway here are some
things I noticed at a glance:

| #! usr/bin/env python
| # Merge log files to autolog file
| import os
| import fileinput
| #top='C:\\Documents and Settings\\avanderspek\\My Documents\\CiDRAdata\\Syncrude\\CSL\\August2010'
| top='C:\\Users\\ZDoor\\Documents\\CiDRA\\Syncrude\CSL\\August2010'

If you have backslashes in strings, you might want to use
"raw strings". Instead of "c:\\Users\\ZDoor" you'd write
r"c:\Users\ZDoor" (notice the r in front of the string).

| i,j,k=0,0,0
| date={}

I suggest to use more spacing to make the code more
readable. Have a look at

http://www.python.org/dev/peps/pep-0008/

for more formatting (and other) tips.

| fps=0.3048
| tab='\t'
|
| bt='-999.25'+'\t''-999.25'+'\t''-999.25'+'\t''-999.25'+'\t'+'-999.25'

If these numbers are always the same, you should use
something like

NUMBER = "-999.25"
COLUMNS = 5
bt = "\t".join(COLUMNS * [NUMBER])

(with better naming, of course).

Why don't you use `tab` here?

I _highly_ recommend to use longer (unabbreviated) names.

| al='Status'+'\t'+'State'+'\t'+'-999.25'
|
| for root,dirs,files in os.walk(top):
|     #Build a concordance table of days on which data was collected
|     for name in files:
|         ext=name.split('.',1)[1]

There's a function `splitext` in `os.path`.

|         if ext=='txt':
|             dat=name.split('_')[1].split('y')[1]
|             if dat in date.keys():

You can just write `if dat in date` (in Python versions >=
2.2, I think).

|                 date[dat]+=1
|             else:
|                 date[dat]=1
|     print 'Concordance table of days:'
|     print date
|     print 'List of files processed:'
|     #Build a list of same day filenames, 5 max for a profile meter,skip first and last days
|     for f in sorted(date.keys())[2:-1]:
|         logs=[]
|         for name in files:
|             ext=name.split('.')[1]
|             if ext=='txt':
|                 dat=name.split('_')[1].split('y')[1]

I guess I'd move the parsing stuff (`x.split(s)[i]` etc.)
into small functions with meaningful names. After that I'd
probably notice there's much redundancy and refactor them. ;)

|                 if dat==f:
|                     logs.append(os.path.join(root,name))
|         #Open the files and read line by line
|         datsec=False
|         lines=[[] for i in range(5)]

One thing to watch out for: The above is different from
`[[]] * 5` which uses the _same_ empty list for all entries.
Probably the semantics you chose is correct.

|         fn=0
|         for line in fileinput.input(logs):
|             if line.split()[0]=='DataID':
|                 datsec=True
|                 ln=0
|             if datsec:
|                 lines[fn].append(line.split())
|                 ln+=1
|                 if ln==10255:

This looks like a "magic number" and should be turned into a
constant.

|                     datsec=False
|                     fileinput.nextfile()
|                     fn+=1
|                     print fileinput.filename().rsplit('\\',1)[1]
|         fileinput.close()
|         aut='000_AutoLog'+f+'.log'
|         out=os.path.join(root,aut)
|         alf=open(out,'w')
|         alf.write('Timestamp (mm/dd/yyyy hh:mm:ss)	VF 1	VF 2	VF 3	VF 4	VF 5	Q 1	Q 2	Q 3	Q 4	Q 5	Status	State	Metric	Band Temperature 1	Band Temperature 2	Band Temperature 3	Band Temperature 4	Band Temperature 5	SPL 1	SPL 2	SPL 3	SPL 4	SPL 5'+'\n')
|         for wn in range(1,10255,1):

You don't need to write the step argument if it's 1.

|             for i in range(5):
|                 lines[i][wn][2]=str(float(lines[i][wn][2])/fps)
|             tp=lines[0][wn][0]+' '+lines[0][wn][1]
|             vf=tab.join([lines[i][wn][2] for i in range(5)])
|             vq=tab.join([lines[i][wn][3] for i in range(5)])
|             vs=tab.join([lines[i][wn][4] for i in range(5)])
|             #sf=tab.join([lines[i][wn][5] for i in range(5)])
|             #sq=tab.join([lines[i][wn][6] for i in range(5)])
|             #ss=tab.join([lines[i][wn][7] for i in range(5)])

Maybe use an extra function?

def choose_a_better_name():
    return tab.join([lines[index][wn][2] for index in range(5)])

Moreover, the repetition of this line looks as if you wanted
to put the right hand sides of the assignments in a list,
instead of assigning to distinct names (`vf` etc.).

By the way, you use the number 5 a lot. I guess this should
be a constant, too.

|             alf.write(tp+'\t'+vf+'\t'+vq+'\t'+al+'\t'+bt+'\t'+vs+'\n')

Suggestion: Use

tab.join([tp, vf, vq, al, bt, vs]) + "\n"

Again, not using distinct variables would have an advantage
here.

|         alf.close()
| print "Done"

Stefan



More information about the Python-list mailing list