Sorting Large File (Code/Performance)

Albert van der Horst albert at spenarnc.xs4all.nl
Sat Feb 2 09:04:44 EST 2008


In article <b1a32804-7fa2-4281-bf3a-bee95af04317 at i7g2000prf.googlegroups.com>,
 <Ira.Kovac at gmail.com> wrote:
>Thanks to all who replied. It's very appreciated.
>
>Yes, I had to double-check line counts and the number of lines is ~16
>million (instead of the stated 1.6B).
>
>Also:
>
>>What is a "Unicode text file"? How is it encoded: utf8, utf16, utf16le, utf16be, ??? If you don't know, do this:
>The file is UTF-8
>
>> Do the first two characters always belong to the ASCII subset?
>Yes, first two always belong to ASCII subset
>
>> What are you going to do with it after it's sorted?
>I need to isolate all lines that start with two specific characters (zz, to be
>particular)

You mean like this?
   grep '^zz' longfile > aapje

You will have a hard time beating that with Python, in every respect.
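For comparison, here is a minimal Python sketch of the same filter, assuming (as stated above) that the file is UTF-8 and the first two characters are always ASCII. Because UTF-8 encodes ASCII characters as the same single bytes, the prefix can be tested on raw bytes without decoding anything. It is one streaming pass over the file, so no sort is needed and memory use stays flat. The function name `filter_prefix` is hypothetical, chosen for illustration.

```python
def filter_prefix(src_path, dst_path, prefix=b"zz"):
    """Copy every line of src_path that starts with `prefix` to dst_path.

    Equivalent in intent to: grep '^zz' src_path > dst_path
    Works on raw bytes: the prefix is ASCII, and UTF-8 encodes ASCII
    as identical single bytes, so no decoding is required.
    Returns the number of lines written.
    """
    count = 0
    # Binary mode avoids decode/encode overhead and never fails on the
    # non-ASCII remainder of a line.
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        for line in src:  # streams one line at a time
            if line.startswith(prefix):
                dst.write(line)
                count += 1
    return count
```

Usage would be `filter_prefix("longfile", "aapje")`. Even so, grep does the same job in one line and is typically at least as fast.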

<SNIP>

>
>Cheers,
>
>Ira

Regards, Albert

-- 
Albert van der Horst, UTRECHT, THE NETHERLANDS
Economic growth -- like all pyramid schemes -- ultimately falters.
albert at spe&ar&c.xs4all.nl &=n http://home.hccnet.nl/a.w.m.van.der.horst


