Custom alphabetical sort

Roy Smith roy at panix.com
Mon Dec 24 11:18:37 EST 2012


In article <40d108ec-b019-4829-a969-c8ef513866f1 at googlegroups.com>,
 Pander Musubi <pander.musubi at gmail.com> wrote:

> Hi all,
>
> I would like to sort according to this order:
>
> (' ', '.', '\'', '-', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'a',
> 'A', '?', '?', '?', '?', '?', '?', '?', '?', '?', '?', 'b', 'B', 'c', 'C',
> '?', '?', 'd', 'D', 'e', 'E', '?', '?', '?', '?', '?', '?', '?', '?', 'f',
> 'F', 'g', 'G', 'h', 'H', 'i', 'I', '?', '?', '?', '?', '?', '?', '?', '?',
> 'j', 'J', 'k', 'K', 'l', 'L', 'm', 'M', 'n', '?', 'N', '?', 'o', 'O', '?',
> '?', '?', '?', '?', '?', '?', '?', '?', '?', 'p', 'P', 'q', 'Q', 'r', 'R',
> 's', 'S', 't', 'T', 'u', 'U', '?', '?', '?', '?', '?', '?', '?', '?', 'v',
> 'V', 'w', 'W', 'x', 'X', 'y', 'Y', 'z', 'Z')
>
> How can I do this? The default sorted() does not give the desired result.

I'm assuming that doesn't correspond to some standard locale's collating 
order, so we really do need to roll our own encoding (and that you have 
a good reason for wanting to do this).  I'm also assuming that what I'm 
seeing as question marks are really accented characters in some encoding 
that my news reader just isn't dealing with (it seems to think your post 
was in ISO-2022-CN (Simplified Chinese).

I'm further assuming that you're starting with a list of unicode 
strings, the contents of which are limited to the above alphabet.  I'm 
even further assuming that the volume of data you need to sort is small 
enough that efficiency is not a huge concern.

Given all that, I would start by writing some code which turned your 
alphabet into a pair of dicts.  One maps from the code point to a 
collating sequence number (i.e. ordinals), the other maps back.  
Something like (for python 2.7):

alphabet = (' ', '.', '\'', '-', '0', '1', '2', '3', '4', '5',
            '6', '7', '8', '9', 'a', 'A', '?', '?', '?', '?',
            [...]
            'v', 'V', 'w', 'W', 'x', 'X', 'y', 'Y', 'z', 'Z')

map1 = {c: n for n, c in enumerate(alphabet)}
map2 = {n: c for n, c in enumerate(alphabet)}

Next, I would write some functions which encode your strings as lists of 
ordinals (and back again)

def encode(s):
   "encode('foo') ==> [34, 19, 19]"  # made-up ordinals
   return [map1[c] for c in s]

def decode(l):
   "decode([34, 19, 19]) ==> 'foo'"
    return ''.join(map2[i] for i in l)

Use these to convert your strings to lists of ints which will sort as 
per your specified collating order, and then back again:

encoded_strings = [encode(s) for s in original_list]
encoded_strings.sort()
sorted_strings = [decode(l) for l in encoded_strings]

That's just a rough sketch, and completely untested, but it should get 
you headed in the right direction.  Or at least one plausible direction.  
Old-time perl hackers will recognize this as the Schwartzian Transform.



More information about the Python-list mailing list