sorting 1172026 entries

Gary Herron gary.herron at islandtraining.com
Sun May 6 12:37:53 EDT 2012


On 05/06/2012 09:29 AM, J. Mwebaze wrote:
> Sorry, see the corrected code:
>
>
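> import glob
> import os.path
>
> import dateutil.parser as parser
>
> txtfiles = glob.glob('*.txt')
>
> for filename in txtfiles:
>     temp = []
>     with open(filename) as f:          # file is closed automatically
>         for line in f:                 # iterate lazily instead of readlines()
>             fields = line.split()      # split() also drops the newline
>             temp.append((parser.parse(fields[0]), float(fields[1])))
>     temp = sorted(temp)
>     # os.path.splitext removes only the final extension
>     with open(os.path.splitext(filename)[0] + '.sorted', 'w') as p:
>         for i, j in temp:
>             p.write('%s %s\n' % (i, j))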
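
Deriving the output name with `filename.strip('.txt')` is a subtle trap: `str.strip` treats its argument as a set of characters (`.`, `t`, `x`) and removes any of them from both ends, rather than removing a `.txt` suffix. A minimal sketch using `os.path.splitext` instead (`output_name` is a hypothetical helper, not from the thread):

```python
import os.path


def output_name(filename):
    """Replace the final extension with '.sorted' (e.g. 'test.txt' -> 'test.sorted')."""
    root, _ext = os.path.splitext(filename)
    return root + '.sorted'


for name in ['data.txt', 'test.txt', 'text.txt']:
    # str.strip('.txt') strips any '.', 't' or 'x' characters from both
    # ends, so it only happens to work for some names.
    print(name, name.strip('.txt') + '.sorted', output_name(name))

# data.txt data.sorted data.sorted
# test.txt es.sorted test.sorted
# text.txt e.sorted text.sorted
```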

Don't do
     temp = sorted(temp)
That creates a *new*, sorted copy of the list, and the assignment then
frees the original list for deletion and garbage collection.

Instead do the in-place sort:
      temp.sort()
Same result, less thrashing.
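
A small illustration on toy data (a stand-in for the (datetime, float) records): `sorted()` returns a new list and leaves the original untouched, while `list.sort()` reorders the existing list and returns `None`. Only the list of references is duplicated by `sorted()`; the tuples themselves are shared. The last line gives a rough, CPython-specific size for a reference array of this thread's length.

```python
import sys

# Toy stand-in for the (datetime, float) records in the thread.
data = [(3, 0.5), (1, 2.0), (2, 1.5)]

# sorted() builds and returns a *new* list; the original keeps its order.
new_list = sorted(data)
print(new_list is data)      # False
print(data)                  # [(3, 0.5), (1, 2.0), (2, 1.5)]

# Only the list of references is new: the tuple objects are shared.
print({id(t) for t in new_list} == {id(t) for t in data})  # True

# list.sort() reorders the existing list in place and returns None.
print(data.sort())           # None
print(data == new_list)      # True

# Approximate size of the extra reference array sorted() would allocate
# for 1172026 entries (varies by Python version and platform).
print(sys.getsizeof([None] * 1172026))
```

On a typical 64-bit CPython that last figure is on the order of 9 MB, small next to the tuples and datetime objects themselves, which fits the point that the in-place sort is only a modest saving.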

This will make your program slightly more efficient; HOWEVER, it is not
the solution to your week-long sort problem.
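
Since the sort alone should not take a week, a reasonable next step is to measure where the time actually goes. This is a minimal, hypothetical harness (`make_lines` and `time_phases` are illustrative names, not from the thread) that times the parsing loop and the sort separately on synthetic `timestamp value` lines. It assumes ISO-formatted timestamps and uses `datetime.datetime.fromisoformat` (Python 3.7+) as a stand-in parser; to measure the original approach, pass `parse=dateutil.parser.parse` instead.

```python
import datetime
import random
import time


def make_lines(n, seed=0):
    """Build n synthetic 'timestamp value' lines (hypothetical test data)."""
    rng = random.Random(seed)
    start = datetime.datetime(2012, 1, 1)
    lines = []
    for _ in range(n):
        stamp = start + datetime.timedelta(seconds=rng.randrange(10**7))
        lines.append('%s %f\n' % (stamp.isoformat(), rng.random()))
    return lines


def time_phases(lines, parse=datetime.datetime.fromisoformat):
    """Return (records, timings); timings maps phase name to seconds."""
    timings = {}

    # Phase 1: split each line and parse the timestamp and value.
    t0 = time.perf_counter()
    records = []
    for line in lines:
        fields = line.split()
        records.append((parse(fields[0]), float(fields[1])))
    timings['parse'] = time.perf_counter() - t0

    # Phase 2: in-place sort of the parsed records.
    t0 = time.perf_counter()
    records.sort()
    timings['sort'] = time.perf_counter() - t0

    return records, timings


if __name__ == '__main__':
    _records, phase_times = time_phases(make_lines(100000))
    for phase, seconds in phase_times.items():
        print('%-5s %.3f s' % (phase, seconds))
```

Timings depend on the machine, so none are quoted here; if the parse phase dominates, the sort is not the bottleneck.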


Gary Herron



>
>
> On Sun, May 6, 2012 at 6:26 PM, J. Mwebaze <jmwebaze at gmail.com> wrote:
>
>     I have attached one of the files; try to sort it and let me know the
>     results. Kindly sort by date. Oops, I'm told the file exceeds 25 MB.
>
>     Below is the code:
>
>     import glob
>     import os.path
>
>     import dateutil.parser as parser
>
>     txtfiles = glob.glob('*.txt')
>
>     for filename in txtfiles:
>         temp = []
>         with open(filename) as f:
>             for line in f:
>                 fields = line.split()
>                 temp.append((parser.parse(fields[0]), float(fields[1])))
>         temp = sorted(temp)
>         with open(os.path.splitext(filename)[0] + '.sorted', 'w') as p:
>             for i, j in temp:
>                 p.write('%s %s\n' % (i, j))
>
>
>     On Sun, May 6, 2012 at 6:21 PM, Devin Jeanpierre
>     <jeanpierreda at gmail.com> wrote:
>
>         On Sun, May 6, 2012 at 12:11 PM, J. Mwebaze
>         <jmwebaze at gmail.com> wrote:
>         > [ (datetime, int) ] * 1172026
>
>         I can't reproduce the slowness; it finishes fairly quickly here.
>         Maybe you could post the specific code? Something else may be
>         making your program take forever.
>
>         >>> import datetime, random
>         >>> x = [(datetime.datetime.now() +
>         ...       datetime.timedelta(random.getrandbits(10)),
>         ...       random.getrandbits(32)) for _ in range(1172026)]
>         >>> random.shuffle(x)
>         >>> x.sort()
>         >>>
>
>         -- Devin
>
>
>
>
>     -- 
>     Mob UG: +256 (0) 70 1735800 | NL +31 (0) 6 852 841 38 | Gtalk: jmwebaze |
>     skype: mwebazej | URL: www.astro.rug.nl/~jmwebaze
>
>     /* Life runs on code */
>
>
>
>
>
>
>



More information about the Python-list mailing list