Sort Big File Help

Arnaud Delobelle arnodel at googlemail.com
Wed Mar 3 15:58:39 EST 2010


MRAB <python at mrabarnett.plus.com> writes:

> mk wrote:
>> John Filben wrote:
>>> I am new to Python but have used many other (mostly dead) languages
>>> in the past.  I want to be able to process *.txt and *.csv files.
>>> I can now read them and change them as needed: mostly just taking
>>> a column and doing some if-then logic to create a new variable.  My
>>> problem is sorting these files:
>>>
>>> 1.) How do I sort file1.txt by position and write out
>>> file1_sorted.txt? For example, say all the records are 100 bytes
>>> long and there is a three-digit id in positions 0-2. Here is
>>> some sample data:
>>>
>>> a. 001JohnFilben……
>>>
>>> b. 002Joe  Smith…..
>>
>> Use a dictionary:
>>
>> linedict = {}
>> for line in f:
>>     key = line[:3]
>>     # Store line[3:], or the whole 'line' if you want to keep the key.
>>     # Note: a repeated key overwrites the line stored earlier.
>>     linedict[key] = line[3:]
>>
>> sortedlines = []
>> for key in sorted(linedict):  # list.sort() works in place and returns None
>>     sortedlines.append(linedict[key])
>>
>> (untested)
>>
>> This is the simplest, and probably an inefficient, approach. But it
>> should work.
>>
> [snip]
> Simpler would be:
>
>     lines = f.readlines()
>     lines.sort(key=lambda line: line[:3])
>
> or even:
>
>     lines = sorted(f.readlines(), key=lambda line: line[:3])
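
For the original question (sort file1.txt on the three-digit id in positions 0-2 and write file1_sorted.txt), the key-function sort quoted above only needs a read step and a write step around it. A minimal sketch, assuming one record per line; the function name `sort_fixed_width` and its `key_width` parameter are hypothetical, not something from the thread:

```python
def sort_fixed_width(in_path, out_path, key_width=3):
    """Sort the records of in_path on their first key_width characters
    and write them to out_path (hypothetical helper)."""
    with open(in_path) as f:
        # Make sure every record ends in a newline, so a final record
        # without one is not glued to its neighbour after sorting.
        lines = [line if line.endswith("\n") else line + "\n" for line in f]
    # list.sort is stable: records sharing an id keep their input order.
    lines.sort(key=lambda line: line[:key_width])
    with open(out_path, "w") as out:
        out.writelines(lines)
```

Calling `sort_fixed_width("file1.txt", "file1_sorted.txt")` would then write the sorted file. Because Python's sort is stable, records with the same id keep their original relative order, and unlike the dictionary approach, no record is lost when an id repeats.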

Or even:

    lines = sorted(f)
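
`sorted(f)` and the `readlines()` variants hold the whole file in memory, which is fine for moderate files but not for one larger than the available RAM. One common alternative (not proposed in this thread) is an external merge sort: sort fixed-size chunks, spill each sorted chunk to a temporary file, then stream a merge of the sorted runs with `heapq.merge`. The sketch below assumes one record per line and keys on the three-digit id in positions 0-2; the names `external_sort` and `id_key` and the `chunk_lines` default are illustrative assumptions. The `key` argument of `heapq.merge` needs Python 3.5 or later.

```python
import contextlib
import heapq
import os
import tempfile
from itertools import islice


def id_key(line):
    # Hypothetical key: the three-digit id in positions 0-2 of each record.
    return line[:3]


def external_sort(in_path, out_path, chunk_lines=100_000, key=id_key):
    """Sort a line-oriented file that may not fit in memory (sketch).

    Phase 1 sorts chunk_lines records at a time and spills each sorted run
    to a temporary file; phase 2 streams a k-way merge of those runs.
    """
    run_paths = []
    try:
        with open(in_path) as f:
            while True:
                chunk = list(islice(f, chunk_lines))
                if not chunk:
                    break
                # Normalise line endings so the last record can be moved safely.
                chunk = [line if line.endswith("\n") else line + "\n"
                         for line in chunk]
                chunk.sort(key=key)
                fd, path = tempfile.mkstemp(suffix=".run", text=True)
                run_paths.append(path)
                with os.fdopen(fd, "w") as run:
                    run.writelines(chunk)

        with contextlib.ExitStack() as stack:
            runs = [stack.enter_context(open(p)) for p in run_paths]
            with open(out_path, "w") as out:
                # heapq.merge is lazy: it holds roughly one line per run.
                out.writelines(heapq.merge(*runs, key=key))
    finally:
        # Remove the temporary run files even if something failed.
        for p in run_paths:
            os.remove(p)
```

Each run file stays open during the merge, so a very large number of runs can hit the operating system's open-file limit; a larger `chunk_lines`, or merging in several passes, keeps the run count down.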

-- 
Arnaud


