processing input from multiple files

John Posner jjposner at optimum.net
Fri Oct 15 11:26:32 EDT 2010


On 10/15/2010 6:59 AM, Christopher Steele wrote:
> Thanks,
>
> The issue with the times is now sorted, however I'm running into a 
> problem towards the end of the script:
>
>  File "sortoutsynop2.py", line 131, in <module>
>     newline = 
> message_type+c+str(station_id)+c+newtime+c+lat+c+lon+c+c+"-9999"+c+ 
> "002" +c+"-9999"+c+"-9999"+c+str(pressure)+c
> TypeError: cannot concatenate 'str' and 'list' objects
>
>
> I think I can see the issue here, but I'm not entirely sure how to get 
> around it. Several of my variables change either from one file to the 
> next or from each line. Time and pressure would be examples of both of 
> these types. Yet others, such as message_type, are constant. As a 
> result I have a mixture of both lists and strings. Should I then 
> create a list of the constant values?

I suggest maintaining a list for each such variable, in order to keep 
your code simpler. It won't matter that some lists contain the same 
value over and over and over.

(There's a slight possibility it would matter if you're dealing with 
massive amounts of data. But that's the kind of problem that you don't 
need to solve until you encounter it.)
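Here is a minimal sketch of the one-list-per-variable idea. The list names and values are made up for illustration; the point is only that zip() walks the lists in step, so each field is a plain value by the time you build the output line:

```python
c = ','

# One list per variable, all the same length (illustrative values only).
message_types = ['blah', 'blah', 'blah']      # constant, simply repeated
station_ids = ['03002', '03005', '03017']
pressures = [1012.3, 998.7, 1003.1]

# zip() yields one tuple per record, so no field is a list here.
lines = [c.join([mtype, sid, str(pres)])
         for mtype, sid, pres in zip(message_types, station_ids, pressures)]
```
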

Some more notes below, interspersed with your code ...

> I'm a little confused, I'll send you the script that works for a 
> single file

Yes! That's a much better approach: figure out how to handle one file, 
place the code inside a function that takes the filename as an argument, 
and call the function on each file in turn.
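A rough sketch of that structure follows. The names process_file and process_all are hypothetical, and the body of process_file is only a placeholder for the real single-file logic; the glob pattern is whatever matches your data files:

```python
import glob

def process_file(fname):
    """Read one data file and return its lines (placeholder for the real work)."""
    with open(fname, "r") as infile:
        return infile.readlines()

def process_all(pattern):
    """Call process_file() on every file matching pattern, in sorted order."""
    results = {}
    for fname in sorted(glob.glob(pattern)):
        results[fname] = process_file(fname)
    return results
```

Something like process_all('datalist_*.txt') would then handle every matching file in the current directory.
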

> and I'll see if I can come up with a more logical way around it.
>
> #!/usr/bin/python
>
> import sys
> import os
> import re
>
> #foutname = 'test.txt'
> #filelist = os.system('ls
> fname = "datalist_201081813.txt"

There's a digit missing from the above filename: the timestamp has 
nine digits, but the slicing below expects ten (YYYYMMDDHH).
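One way to catch this kind of slip early is to check the timestamp before slicing it. This is only a sketch: parse_timestamp is a hypothetical helper, and it assumes the intended layout is ten digits (YYYYMMDDHH), which is what the slicing in the script implies:

```python
def parse_timestamp(fname):
    """Return 'YYYYMMDD_HH0000' from a name like 'datalist_YYYYMMDDHH.txt'."""
    stamp = fname.split('_')[1].split('.')[0]
    # Assumption: a valid timestamp is exactly ten digits.
    if len(stamp) != 10 or not stamp.isdigit():
        raise ValueError('expected 10 digits (YYYYMMDDHH), got %r' % stamp)
    return stamp[:8] + '_' + stamp[8:] + '0000'
```

With the nine-digit name above, this raises ValueError instead of silently producing a wrong date.
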


> foutname1 = 'prestest.txt'
> foutname2 = 'temptest.txt'
> foutname3 = 'tempdtest.txt'
> foutname4 = 'wspeedtest.txt'
> foutname5 = 'winddtest.txt'
>
> time = fname.split('_')[1].split('.')[0]
> year = time[:4]
> month = time[4:6]
> day = time[6:8]
> hour = time[-2:]
>
> newtime = year+month+day+'_'+hour+'0000'
> c = ','
> file1 = open(fname,"r")
>
>
> file2 = open("uk_stations.txt","r")
> stations = file2.readlines()
> ids=[]
> names=[]
> lats=[]
> lons=[]
> for item in stations:
>     item_list = item.strip().split(',')
>     ids.append(item_list[0])
>     names.append(item_list[1])
>     lats.append(item_list[2])
>     lons.append(item_list[3])
>
>
> st = file1.readlines()
> print st
> data=[item[:item.find(' 333 ')] for item in st]

I still think there's a problem in the above statement. In the data file 
you provided in a previous message, some lines lack the ' 333 ' 
substring. In such lines, the find() method will return -1, which (I 
think) is not what you want. Ex:

 >>> item = '11111 22222 333 44444'
 >>> item[:item.find(' 333 ')]
   '11111 22222'

 >>> item = '11111 22222 44444'
 >>> item[:item.find(' 333 ')]
   '11111 22222 4444'

Note that the last digit, "4", gets dropped. I *think* you want 
something like this:

   data = []
   for item in st:
       posn = item.find(' 333 ')
       if posn != -1:
           data.append(item[:posn])
       else:
           data.append(...some other value...)
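The same idea as a small, self-contained sketch. The helper name truncate_at is hypothetical, and keeping the whole line when the marker is absent is my assumption; substitute whatever value makes sense for your data:

```python
def truncate_at(item, marker=' 333 '):
    """Return the part of item before marker, or all of item if marker is absent."""
    posn = item.find(marker)
    if posn != -1:
        return item[:posn]
    return item   # assumption: keep the whole line when there is no marker

st = ['11111 22222 333 44444', '11111 22222 44444']
data = [truncate_at(item) for item in st]
```
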


> #data=st[split:]
> print data
>
> pres_out = ''
> temp_out = ''
> dtemp_out = ''
> dir_out = ''
> speed_out = ''
>
> for line in data:
>     elements=line.split(' ')

Do you really want to specify a SPACE character argument to split()?

 >>> 'aaa bbb    ccc'.split(' ')
   ['aaa', 'bbb', '', '', '', 'ccc']

 >>> 'aaa bbb    ccc'.split()
   ['aaa', 'bbb', 'ccc']


>     station_id = elements[0]
>     try:
>         index = ids.index(station_id)
>         lat = lats[index]
>         lon = lons[index]
>         message_type = 'blah'
>     except:

It's bad form to use a "bare except", which defines a code block to be 
executed if *anything* goes wrong. You should specify what you're 
expecting to go wrong. Here, the index() method raises ValueError when 
station_id is not in the list:

   except ValueError:
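As a sketch of a narrowly targeted handler for the station lookup: list.index() signals a missing value with ValueError, so that is the exception caught here. The lookup function and the station data are made up for illustration:

```python
# Illustrative station data only; the real values come from uk_stations.txt.
ids = ['03002', '03005']
lats = ['60.75', '60.13']
lons = ['-0.85', '-1.18']

def lookup(station_id):
    """Return (lat, lon, message_type), with NaN placeholders if the ID is unknown."""
    try:
        index = ids.index(station_id)   # raises ValueError if not present
    except ValueError:
        return 'NaN', 'NaN', 'Bad_station_id'
    return lats[index], lons[index], 'blah'
```

Keeping only the index() call inside the try block means an unrelated mistake elsewhere is not silently swallowed.
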

>         print 'Station ID',station_id,'not in list!'
>         lat = lon = 'NaN'
>         message_type = 'Bad_station_id'
>
>     try:
>         temp = [item for item in elements if item.startswith('1')][0]
>         temperature = float(temp[2:])/10
>         sign = temp[1]
>         if sign == 1:
>             temperature=-temperature
>     except:
>         temperature='NaN'

What are you expecting to go wrong (i.e. what exception might occur) in 
the above try/except code?
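For what it's worth, here is a sketch of how I would name the likely failures: indexing [0] on an empty list raises IndexError, and float() raises ValueError on text that isn't a number. The function parse_temperature is a hypothetical wrapper. Note also that temp[1] is a one-character string, so the sketch compares it to '1' rather than to the integer 1:

```python
def parse_temperature(elements):
    """Return the temperature from the first group starting with '1', or 'NaN'."""
    try:
        temp = [item for item in elements if item.startswith('1')][0]
        temperature = float(temp[2:]) / 10
    except IndexError:
        # No group starts with '1', so [0] failed on an empty list.
        return 'NaN'
    except ValueError:
        # The characters after the sign were missing or not numeric.
        return 'NaN'
    if temp[1] == '1':      # compare with the string '1', not the int 1
        temperature = -temperature
    return temperature
```
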

>
>     try:
>         dtemp = [item for item in elements if item.startswith('2')][0]
>         dtemperature = float(dtemp[2:])/10
>         sign = dtemp[1]
>         if sign == 1:
>             dtemperature=-dtemperature
>     except:
>         detemperature='NaN'
>     try:
>         press = [item for item in elements[2:] if item.startswith('4')][0]
>         if press[1]=='9':
>             pressure = float(press[1:])/10
>         else:
>             pressure = float(press[1:])/10+1000
>     except:
>         pressure = 'NaN'
>
>     try:
>         wind = elements[elements.index(temp)-1]
>         direction = float(wind[1:3])*10
>         speed = float(wind[3:])*0.514444444
>     except:
>         direction=speed='NaN'
>
>
>
>     newline = 
> message_type+c+str(station_id)+c+newtime+c+lat+c+lon+c+'-9999'+c+'002'+c+'-9999'+c+'-9999'+c+str(pressure)+c

Try this:

   newline = c.join([message_type, str(station_id), newtime,
                    lat, lon, '-9999', '002',
                    '-9999', '-9999', str(pressure)]) + c

You can split a square-bracketed list onto multiple lines.
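Keep in mind that join() needs every item to be a string, so a list that sneaks into the fields will still raise a TypeError. A sketch of a helper that converts single values with str() and complains clearly about sequences; build_line is a hypothetical name, not something from your script:

```python
def build_line(fields, sep=','):
    """Join fields with sep, adding a trailing sep as in the newline above."""
    for field in fields:
        # A list or tuple here means a whole column was passed, not one value.
        if isinstance(field, (list, tuple)):
            raise TypeError('field %r is a sequence, not a single value' % (field,))
    return sep.join(str(field) for field in fields) + sep
```

For example, build_line(['blah', '03002', 1012.3]) gives 'blah,03002,1012.3,'.
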

-John



