Python and very large data sets?

Rad zaka07 at hotmail.com
Wed Apr 24 12:41:22 EDT 2002


I am preparing to extract data from 4 text files (fixed-width format)
whose combined size is about 80GB.  Considering deadlines, costs, and
my limited programming knowledge, I thought using Python on Windows
would be the best option for me.  However, I am worried about the
speed at which Python (and I, and my hardware) will be able to deal
with these massive data sets, though I am hoping this is still a
quicker route than learning C.
I still haven't received the above-mentioned files, so I can't test
the time needed to (for example) read a 15GB "file1", filter it by a
few variables, and write the resulting subset out as "sub_file1".
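
For illustration, this line-by-line loop is roughly what I have in
mind for the filtering step.  Reading one line at a time means the
15GB file never has to fit in memory.  The column offsets and the
filter value are invented, since I don't know the record layout yet:

    def filter_file(in_path, out_path):
        with open(in_path) as src, open(out_path, "w") as dst:
            for line in src:              # streams one record at a time
                state = line[10:12]       # hypothetical: field in columns 10-11
                if state == "NY":         # hypothetical filter condition
                    dst.write(line)       # keep only the matching records

    filter_file("file1.txt", "sub_file1.txt")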
Things would get more complicated after that, because I will have to
pull the IDs out of "sub_file1", remove duplicate IDs to create
"no_dup_sub_file1", match those against the IDs in the remaining 3
main files, and pull out the data linked to those IDs.
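
For the duplicate removal and matching, my rough plan is to load the
IDs into a set (duplicates disappear automatically) and then scan each
of the other main files against it.  Again the offsets are guesses,
and this assumes the unique IDs themselves fit in memory even if the
files don't:

    # collect the unique IDs from the filtered subset
    ids = set()
    with open("sub_file1.txt") as f:
        for line in f:
            ids.add(line[0:10])           # ID assumed in the first 10 columns

    # scan one of the other main files, keeping only matching records
    with open("file2.txt") as src, open("matched_file2.txt", "w") as dst:
        for line in src:
            if line[0:10] in ids:
                dst.write(line)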

I have a few weeks to prepare before the data arrives, and my
question is: am I going about the project the right way?  Is Python
(with humanly-written code) capable of doing this kind of stuff
reasonably quickly?

Any help and suggestions would be greatly appreciated.

Thanks

P.S. As you probably guessed, I'm new to Python.


