Reading a large csv file

Mag Gam magawake at gmail.com
Fri Jun 26 06:47:18 EDT 2009


Thank you, everyone, for the responses! I took some of your suggestions
and my loading sped up by 25%.



On Wed, Jun 24, 2009 at 3:57 PM, Lie Ryan<lie.1296 at gmail.com> wrote:
> Mag Gam wrote:
>> Sorry for the delayed response. I was trying to figure this problem
>> out. The OS is Linux, BTW
>
> Maybe I'm just being pedantic, but saying your OS is Linux means little,
> as there are hundreds of variants (distros) of Linux. (Not to mention
> that Linux is a kernel, not a full-blown OS, and people at GNU will
> insist on calling a Linux-based OS GNU/Linux.)
>
>> Here is some code I have:
>> import numpy as np
>> from numpy import *
>
> Why are you importing numpy twice, as np and as *?
>
>> import gzip
>> import h5py
>> import re
>> import sys, string, time, getopt
>> import os
>>
>> src=sys.argv[1]
>> fs = gzip.open(src)
>> x=src.split("/")
>> filename=x[len(x)-1]
>>
>> #Get YYYY/MM/DD format
>> YYYY=(filename.rsplit(".",2)[0])[0:4]
>> MM=(filename.rsplit(".",2)[0])[4:6]
>> DD=(filename.rsplit(".",2)[0])[6:8]
>
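A side note on the date parsing quoted above: the three rsplit calls repeat the same work, and os.path.basename is a more portable way to get the file name than splitting on "/". A minimal sketch, assuming the file name begins with eight digits, as in a hypothetical 20090624.csv.gz (the real naming scheme is not shown in this thread); split_date is an illustrative helper name, not something from the original script:

```python
import os
from datetime import datetime


def split_date(path):
    """Return (YYYY, MM, DD) strings taken from the start of the file name."""
    filename = os.path.basename(path)  # assumed layout: YYYYMMDD.<ext>.<ext>
    stamp = filename[:8]
    if len(stamp) != 8 or not stamp.isdigit():
        raise ValueError("expected a YYYYMMDD prefix, got %r" % filename)
    # strptime raises ValueError for impossible dates such as 20091340.
    datetime.strptime(stamp, "%Y%m%d")
    return stamp[0:4], stamp[4:6], stamp[6:8]
```

Calling split_date(sys.argv[1]) would then give the three strings used to build the group path.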
>>
>> f=h5py.File('/tmp/test_foo/FE.hdf5','w')
>
> This particular line would make it impossible to have more than one
> instance of the program open. May not be your concern...
>
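Related to the h5py.File(..., 'w') line: mode 'w' creates the file and truncates an existing one, so each run starts from an empty file. Mode 'a' opens an existing file read/write and creates it only if it is missing. The sketch below assumes a current h5py release, where groups have a require_group method that returns the group if it exists and creates it (including intermediate levels, as far as the h5py documentation describes) otherwise. open_for_append and day_group are illustrative names. This helps with repeated runs one after another; it does not make two simultaneous writers to the same file safe.

```python
def open_for_append(path):
    """Open an HDF5 file read/write, creating it only if it is missing."""
    import h5py  # imported lazily; requires the h5py package
    return h5py.File(path, "a")


def day_group(h5file, yyyy, mm, dd):
    """Return the /YYYY/MM/DD group, creating any missing level."""
    # require_group returns an existing group or creates it, so the three
    # try/except ValueError blocks from the quoted script are not needed.
    return h5file.require_group("/%s/%s/%s" % (yyyy, mm, dd))
```

With these, group = day_group(open_for_append('/tmp/test_foo/FE.hdf5'), YYYY, MM, DD) would replace the three create_group blocks.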
>>
>> grp="/"+YYYY
>> try:
>>   f.create_group(grp)
>> except ValueError:
>>   print "Year group already exists"
>>
>> grp=grp+"/"+MM
>> try:
>>   f.create_group(grp)
>> except ValueError:
>>   print "Month group already exists"
>>
>> grp=grp+"/"+DD
>> try:
>>   group=f.create_group(grp)
>> except ValueError:
>>   print "Day group already exists"
>>
>
>> str_type=h5py.new_vlen(str)
>
>> mydescriptor = {'names': ('gender','age','weight'), 'formats': ('S1',
>> 'f4', 'f4')}
>> print "Filename is: ",src
>> fs = gzip.open(src)
>
>> dset = f.create_dataset ('Foo',data=arr,compression='gzip')
>
> What is `arr`?
>
>> s=0
>>
>> #Takes the longest here
>> for y in fs:
>>      continue
>>   a=y.split(',')
>
>>   s=s+1
>>   dset.resize(s,axis=0)
>
> You increment s by 1 on each iteration; would this copy the dataset? (I
> have never worked with h5py, so I don't know how it works.)
> --
> http://mail.python.org/mailman/listinfo/python-list
>
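On the resize-per-line question: whatever h5py does internally on each resize, one way to sidestep it is to resize and write once per batch of lines instead of once per line. A minimal sketch, not the code actually used for the 25% speedup; batches, append_batch, and the batch size of 10000 are illustrative assumptions, and dset is assumed to be a resizable one-dimensional dataset:

```python
import gzip
from itertools import islice


def batches(path, size=10000):
    """Yield lists of up to `size` parsed rows from a gzip-compressed CSV."""
    with gzip.open(path, "rt") as fs:  # "rt" yields text lines on Python 3
        while True:
            chunk = [line.rstrip("\n").split(",") for line in islice(fs, size)]
            if not chunk:
                return
            yield chunk


def append_batch(dset, records):
    """Grow `dset` once, then write all `records` in a single assignment."""
    start = dset.shape[0]
    dset.resize(start + len(records), axis=0)
    dset[start:] = records
```

With h5py, records would be a NumPy array built from each batch with the dataset's dtype (for example the mydescriptor layout), and the dataset would need to be created with maxshape=(None,) so that resize is allowed.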


