writing results to array

Matimus mccredie at gmail.com
Mon Dec 3 17:39:39 EST 2007


On Dec 3, 12:45 pm, Bevan Jenkins <beva... at gmail.com> wrote:
> Hello,
>
> I have recently discovered the python language and am having a lot of
> fun getting my head around the basics of it.
> However, I have run into a stumbling block that I have not been able
> to overcome, so I thought I would ask for help.
> <Overview>
> I am trying to import a text file that has the following format:
> 02/01/2000 @ 00:00:00       0.983896 Q10  T2
> 03/01/2000 @ 00:00:00       0.557377 Q10  T2
> 04/01/2000 @ 00:00:00       0.508871 Q10  T2
> 05/01/2000 @ 00:00:00       0.583196 Q10  T2
> 06/01/2000 @ 00:00:00       0.518281 Q10  T2
> when there is missing data:
> 12/09/2000 @ 00:00:00                Q151 T2
> 13/09/2000 @ 00:00:00                Q151 T2
>
> I have cobbled together some code which imports the data.  The next
> step is to create an array in which each column contains a year's worth
> of values.  Thus, if I have 6 years of data (2001-2006 inclusive),
> there will be six columns, with 365 rows (not all years have a full
> data set and may only have, say, 340 days of data).
> <The question>
> In the code below
> print answer[j,1] is giving me the right answer, but I can't write it
> to an array.
> Any suggestions welcome.
>
> This is what I have:
> flow=[]
> flowdate=[]
> yeardate=[]
> uniqueyear=[]
> #flow_order=
> flow_rank=[]
> icount=[]
> p=[]
>
> filename=r"C:\Documents and Settings\bevanj\Desktop\flow_duration.tsf"
> linesep ="\n"
>
> # read in whole file
> tempdata = open( filename).read()
> # break into lines
> tempdata = string.split( tempdata, linesep )
> # for each record, get the field values
> for i in range( len( tempdata)):
>         # split into the lines
>         fields = string.split( tempdata[i])
>         if len(fields)>5:
>             flowdate.append(fields[0])
>             list =string.split(fields[0],"/")
>             yeardate.append(list[2])
>             flow.append(float(fields[3]))
>             answer=column_stack((flowdate,flow))
>
> for rows in yeardate:
>        if rows not in uniqueyear:
>           uniqueyear.append(rows)
>
> #print answer[:,0]   #date
> flow_order=empty((0,0),dtype=float)
> #for yr in enumerate(uniqueyear):
> for iyr,yr in enumerate(uniqueyear):
>     for j, val, in enumerate (answer[:,0]):
>         flowyr=string.split(val,"/")
>         if int(flowyr[2])==int(yr):
>             print answer[j,1]
>             #flow_order =

I'm not sure what you mean by `write it to an array'. `answer' is already
an array. Perhaps you could show an example that has the bad behavior you
are observing, or at least an example of what you expect to get.
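If `write it to an array' means building the table described in your
question (one column per year, with short years padded out), a sketch
like the following might help. It assumes the flows have already been
grouped into a dict mapping each year to a plain list of flow values;
year_columns is a hypothetical helper name, and the sample values are
copied from your question but split across two years purely for
illustration.

```python
from itertools import zip_longest  # Python 3 name; Python 2.6+ calls it izip_longest


def year_columns(year2flows, fill=float("nan")):
    """Return (years, rows) where rows[i][k] is the i-th flow of years[k].

    Years with fewer values are padded with `fill`, so every row has the
    same length, which is what a rectangular array needs.
    """
    years = sorted(year2flows)
    columns = [year2flows[y] for y in years]
    rows = [list(row) for row in zip_longest(*columns, fillvalue=fill)]
    return years, rows


# Values from the question's sample lines; the year split is made up.
sample = {
    "2000": [0.983896, 0.557377, 0.508871],
    "2001": [0.583196, 0.518281],
}
years, rows = year_columns(sample)
print(years)
for row in rows:
    print(row)
```

If NumPy is available, numpy.array(rows) should turn the padded rows into
a 2-D float array with one column per year, with NaN marking missing days.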

Also, just a couple of pointers:

this:

> tempdata = open( filename).read()
> # break into lines
> tempdata = string.split( tempdata, linesep )
> # for each record, get the field values
> for i in range( len( tempdata)):
>         # split into the lines
>         fields = string.split( tempdata[i])

is better (and more usually) written in Python like this:

for line in open(filename):
    fields = line.split()
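To see what that loop produces, here is a small sketch (Python 3 syntax)
run on three lines copied from your question. io.StringIO stands in for
the real file, since iterating over it yields lines just as iterating
over open(filename) does. Note that a line with a missing value splits
into five fields rather than six, which is why the `len(fields) > 5'
test skips it.

```python
import io

# Three lines from the question; the last one has no flow value.
sample = io.StringIO(
    "02/01/2000 @ 00:00:00       0.983896 Q10  T2\n"
    "03/01/2000 @ 00:00:00       0.557377 Q10  T2\n"
    "12/09/2000 @ 00:00:00                Q151 T2\n"
)

field_counts = []
for line in sample:
    fields = line.split()   # splits on runs of whitespace and drops the trailing newline
    field_counts.append(len(fields))
    print(fields)

print(field_counts)
```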

Don't use the string module; use the methods of the strings
themselves.

Don't use built-in type names as variable names, as on this line:
>             list =string.split(fields[0],"/") # list is a built-in type
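Both points in a short sketch (Python 3 syntax, using a date from your
data): the str method does the splitting, and rebinding `list' hides the
built-in type until the name is deleted again.

```python
date = "02/01/2000"

# The str method replaces string.split(date, "/") and returns the same list.
day, month, year = date.split("/")
print(year)

# Rebinding `list` hides the built-in type for the rest of this scope.
list = date.split("/")
shadow_error = None
try:
    list("abc")             # the built-in would return ['a', 'b', 'c']
except TypeError as exc:    # 'list' object is not callable
    shadow_error = exc
print(shadow_error)

# Deleting the module-level name makes the built-in visible again.
del list
print(list("abc"))
```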

You only need enumerate if you actually want the index. If you
don't need the index, just iterate over the sequence, e.g. use this:

> for yr in uniqueyear:
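To make the difference concrete, a small sketch (Python 3 syntax, example
values copied from your question). The last loop uses zip, which isn't
mentioned above: when you have two parallel lists, such as the dates and
the flows, it pairs them up so no index is needed at all.

```python
# Example data shaped like the lists built in the question.
flowdate = ["02/01/2000", "03/01/2000"]
flow = [0.983896, 0.557377]
uniqueyear = ["2000"]

# Only the items are needed: iterate over the list directly.
for yr in uniqueyear:
    print(yr)

# The index really is needed: enumerate yields (index, item) pairs.
for i, value in enumerate(flow):
    print(i, value)

# Walking two parallel lists: zip pairs them up, so no index is needed.
pairs = list(zip(flowdate, flow))
for date, value in pairs:
    print(date, value)
```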

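You don't need to re-create the column stack each time you read a value
from the file. It is very inefficient: every call copies all of the data
collected so far, so the total work grows with the square of the file
length.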

e.g. change this:

> for i in range( len( tempdata)):
>         # split into the lines
>         fields = string.split( tempdata[i])
>         if len(fields)>5:
>             flowdate.append(fields[0])
>             list =string.split(fields[0],"/")
>             yeardate.append(list[2])
>             flow.append(float(fields[3]))
>             answer=column_stack((flowdate,flow))

to this:

> for i in range( len( tempdata)):
>         # split into the lines
>         fields = string.split( tempdata[i])
>         if len(fields)>5:
>             flowdate.append(fields[0])
>             list =string.split(fields[0],"/")
>             yeardate.append(list[2])
>             flow.append(float(fields[3]))
> answer=column_stack((flowdate,flow))

or, with the other suggested changes:

> for line in open(filename):
>         # split the line into whitespace-separated fields
>         fields = line.split()
>         if len(fields) > 5:
>             flowdate.append(fields[0])
>             year = fields[0].split("/")[2]
>             yeardate.append(year)
>             flow.append(float(fields[3]))
> answer=column_stack((flowdate,flow))
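Here is the same loop as a small function (read_flows is a hypothetical
name, Python 3 syntax) that can be tried on a few lines from your question
without the real file; column_stack is left out so it runs without NumPy.
One more reason to keep the flows in their own list: a NumPy array has a
single dtype, so column_stack((flowdate, flow)) should give an array of
strings, and answer[j,1] will then be the text '0.983896' rather than a
float.

```python
def read_flows(lines):
    """Parse flow records; lines with a missing value (only 5 fields) are skipped."""
    flowdate, yeardate, flow = [], [], []
    for line in lines:
        # e.g. ['02/01/2000', '@', '00:00:00', '0.983896', 'Q10', 'T2']
        fields = line.split()
        if len(fields) > 5:
            flowdate.append(fields[0])
            yeardate.append(fields[0].split("/")[2])
            flow.append(float(fields[3]))
    return flowdate, yeardate, flow


# Sample lines from the question; in real use pass open(filename) instead.
lines = [
    "02/01/2000 @ 00:00:00       0.983896 Q10  T2",
    "03/01/2000 @ 00:00:00       0.557377 Q10  T2",
    "12/09/2000 @ 00:00:00                Q151 T2",
]
flowdate, yeardate, flow = read_flows(lines)
print(flowdate, yeardate, flow)
```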

If I were doing this, though, I would use a dictionary (dict) where the
keys are the years and the values are lists of (date, flow) pairs for
that year.

Something like this:
[code]
filename=r"C:\Documents and Settings\bevanj\Desktop\flow_duration.tsf"
year2flows = {}

fin = open(filename)
for line in fin:
    # split the line into whitespace-separated fields
    fields = line.split()
    if len(fields)>5:
        date = fields[0]
        year = fields[0].split("/")[-1]
        flow = float(fields[3])
        year2flows.setdefault(year, []).append((date, flow))
fin.close()

# This does what you were doing.
for yr in sorted(year2flows.keys()):
    for date, flow in year2flows[yr]:
        print(flow)
# If you just wanted one year, though, you could do something like this
# (the keys are strings, because they come from splitting the date text):
for date, flow in year2flows["2004"]:
    print(flow)

[/code]
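Since the block above is untested, here is a variant wrapped in a
function (group_by_year is a hypothetical name, Python 3 syntax) and fed
sample lines from your question instead of the file, so it can be run
as-is. Note the keys: they are the strings produced by split, so a lookup
needs "2000", not 2000.

```python
def group_by_year(lines):
    """Map each year (a string such as '2000') to a list of (date, flow) tuples."""
    year2flows = {}
    for line in lines:
        fields = line.split()
        if len(fields) > 5:            # lines with a missing flow have only 5 fields
            date = fields[0]
            year = date.split("/")[-1]
            year2flows.setdefault(year, []).append((date, float(fields[3])))
    return year2flows


# Sample lines copied from the question; pass open(filename) in real use.
sample_lines = [
    "02/01/2000 @ 00:00:00       0.983896 Q10  T2",
    "03/01/2000 @ 00:00:00       0.557377 Q10  T2",
    "04/01/2000 @ 00:00:00       0.508871 Q10  T2",
    "12/09/2000 @ 00:00:00                Q151 T2",
]
year2flows = group_by_year(sample_lines)

for yr in sorted(year2flows):
    for date, flow in year2flows[yr]:
        print(yr, date, flow)
```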

The above code is untested, so I make no guarantees. If you are using
Python 2.5 or later, you might look into using defaultdict (in the
collections module); it simplifies the code a bit.

Change this:
year2flows = {}
# bunch of stuff...
        year2flows.setdefault(year, []).append((date, flow))
to this:
from collections import defaultdict
year2flows = defaultdict(list)
# bunch of stuff...
        year2flows[year].append((date, flow))
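A minimal sketch of how defaultdict behaves (Python 3 syntax, with
(date, flow) pairs taken from your sample lines). It also shows one
difference from a plain dict worth knowing about: merely reading a
missing key inserts it with an empty list.

```python
from collections import defaultdict

year2flows = defaultdict(list)   # a missing key starts out as an empty list

# (date, flow) pairs from the question's sample lines
for date, flow in [("02/01/2000", 0.983896), ("03/01/2000", 0.557377)]:
    year = date.split("/")[-1]
    year2flows[year].append((date, flow))   # no setdefault needed

print(sorted(year2flows))

# Caveat: even *reading* a missing key adds it to the dict.
print(year2flows["1999"])
print(sorted(year2flows))
```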

Matt


