[Baypiggies] sorting a table by column

David Berthelot david.berthelot at gmail.com
Thu Jun 23 16:05:44 CEST 2011


Alternatively to bisect module which has log2(n) insertion cost, you
could look into B+trees which have an insertion cost of logB(n):
http://en.wikipedia.org/wiki/B%2B_tree

There's a Python implementation linked on that page. I used it before
and it seemed to have quite some problems while the performance over
bisect was non-existent (for my particular needs). But that being
said, it's worth checking.

On Thu, Jun 23, 2011 at 6:47 AM, David Berthelot
<david.berthelot at gmail.com> wrote:
> Looks like a typical SQL problem.
>
> If I was to solve it in Python, assuming that due to computational
> time motivations the data cannot be resorted completely with:
> sort(key=itemgetter(column))
>
> Then I would keep the table unsorted and I would create an index
> structure per column.
> index = [[] for c in xrange(columns)]
>
> When a row is added to the data table, I would add it to the index
> lists in sorted manner using the bisect module:
> row_id = len(data)
> data.append(row)
> for c in xrange(columns):
>  bisect.insort_right(index[c],(data[row_id][c],row_id))
>
> To lookup the table in sorted order according to column c, you would
> get the table indexes:
> ilist = map(itemgetter(1),index[c])
> for x in ilist:
>  print data[x]
>
> I just typed this on top of my head, so it's more to give the general
> principle than a robust implementation obviously.
>
> Similarly you could implement multi-column indexes, by replacing the
> tuple (data[row_id][x],row_id) with
> (data[row_id][col_1],data[row_id][col_2],...,data[row_id][col_n],row_id)
> assuming you desire a multi-column index on col_1,...,col_n
>
> On Thu, Jun 23, 2011 at 6:03 AM, Tripathi, Bibha
> <bibha.tripathi at jpmchase.com> wrote:
>> a huge table like an excel sheet, saved and accumulating more data more
>> rows, may be more tables
>>
>> user chooses which column to sort on
>>
>>
>>
>> what's the best python data structure to use? and which sorting method to
>> make it look like real time as the user enters her choice of column to sort
>> on?
>>
>>
>>
>> thanks.
>>
>> This communication is for informational purposes only. It is not intended as
>> an offer or solicitation for the purchase or sale of any financial
>> instrument or as an official confirmation of any transaction. All market
>> prices, data and other information are not warranted as to completeness or
>> accuracy and are subject to change without notice. Any comments or
>> statements made herein do not necessarily reflect those of JPMorgan Chase &
>> Co., its subsidiaries and affiliates. This transmission may contain
>> information that is privileged, confidential, legally privileged, and/or
>> exempt from disclosure under applicable law. If you are not the intended
>> recipient, you are hereby notified that any disclosure, copying,
>> distribution, or use of the information contained herein (including any
>> reliance thereon) is STRICTLY PROHIBITED. Although this transmission and any
>> attachments are believed to be free of any virus or other defect that might
>> affect any computer system into which it is received and opened, it is the
>> responsibility of the recipient to ensure that it is virus free and no
>> responsibility is accepted by JPMorgan Chase & Co., its subsidiaries and
>> affiliates, as applicable, for any loss or damage arising in any way from
>> its use. If you received this transmission in error, please immediately
>> contact the sender and destroy the material in its entirety, whether in
>> electronic or hard copy format. Thank you. Please refer to
>> http://www.jpmorgan.com/pages/disclosures for disclosures relating to
>> European legal entities.
>>
>> _______________________________________________
>> Baypiggies mailing list
>> Baypiggies at python.org
>> To change your subscription options or unsubscribe:
>> http://mail.python.org/mailman/listinfo/baypiggies
>>
>


More information about the Baypiggies mailing list