Custom alphabetical sort
Roy Smith
roy at panix.com
Mon Dec 24 11:18:37 EST 2012
In article <40d108ec-b019-4829-a969-c8ef513866f1 at googlegroups.com>,
Pander Musubi <pander.musubi at gmail.com> wrote:
> Hi all,
>
> I would like to sort according to this order:
>
> (' ', '.', '\'', '-', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'a',
> 'A', '?', '?', '?', '?', '?', '?', '?', '?', '?', '?', 'b', 'B', 'c', 'C',
> '?', '?', 'd', 'D', 'e', 'E', '?', '?', '?', '?', '?', '?', '?', '?', 'f',
> 'F', 'g', 'G', 'h', 'H', 'i', 'I', '?', '?', '?', '?', '?', '?', '?', '?',
> 'j', 'J', 'k', 'K', 'l', 'L', 'm', 'M', 'n', '?', 'N', '?', 'o', 'O', '?',
> '?', '?', '?', '?', '?', '?', '?', '?', '?', 'p', 'P', 'q', 'Q', 'r', 'R',
> 's', 'S', 't', 'T', 'u', 'U', '?', '?', '?', '?', '?', '?', '?', '?', 'v',
> 'V', 'w', 'W', 'x', 'X', 'y', 'Y', 'z', 'Z')
>
> How can I do this? The default sorted() does not give the desired result.
I'm assuming that doesn't correspond to some standard locale's collating
order, so we really do need to roll our own encoding (and that you have
a good reason for wanting to do this). I'm also assuming that what I'm
seeing as question marks are really accented characters in some encoding
that my news reader just isn't dealing with (it seems to think your post
was in ISO-2022-CN, i.e. Simplified Chinese).
I'm further assuming that you're starting with a list of unicode
strings, the contents of which are limited to the above alphabet. I'm
even further assuming that the volume of data you need to sort is small
enough that efficiency is not a huge concern.
Given all that, I would start by writing some code which turns your
alphabet into a pair of dicts. One maps from the code point to a
collating sequence number (i.e. an ordinal), the other maps back.
Something like this (for Python 2.7):
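# The '?' entries stand for the accented characters in your post. Under
# Python 2, write those as u'...' literals so they match the characters
# in your unicode strings.
alphabet = (' ', '.', '\'', '-', '0', '1', '2', '3', '4', '5',
            '6', '7', '8', '9', 'a', 'A', '?', '?', '?', '?',
            # [...]
            'v', 'V', 'w', 'W', 'x', 'X', 'y', 'Y', 'z', 'Z')
# character -> collating sequence number
map1 = {c: n for n, c in enumerate(alphabet)}
# collating sequence number -> character
map2 = {n: c for n, c in enumerate(alphabet)}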
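A runnable version of the two dicts, as a sketch only: it assumes Python 3 and substitutes a hypothetical shortened alphabet (the punctuation and digits from the post, then each ASCII letter in lowercase-then-uppercase order), since the accented characters didn't survive the trip through my news reader.

```python
import string

# Hypothetical stand-in for the full alphabet: the punctuation and digits
# from the post, then each ASCII letter followed by its uppercase form.
# The accented characters from the original post are omitted.
alphabet = (" ", ".", "'", "-") + tuple(string.digits) + tuple(
    ch for lower in string.ascii_lowercase for ch in (lower, lower.upper())
)

# Character -> collating sequence number (its position in the alphabet).
map1 = {c: n for n, c in enumerate(alphabet)}
# Collating sequence number -> character.
map2 = {n: c for n, c in enumerate(alphabet)}

# Each character must appear only once, or map1 would silently drop entries.
assert len(map1) == len(alphabet)
# The two dicts are inverses of each other.
assert all(map2[map1[c]] == c for c in alphabet)

print(map1["a"], map1["A"], map1["b"])  # → 14 15 16
```

Since the ordinals are just tuple positions, alphabet[n] would do the same job as map2[n]; the second dict mostly makes the symmetry explicit.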
Next, I would write some functions which encode your strings as lists of
ordinals (and decode them back again):
def encode(s):
    "encode('foo') ==> [34, 19, 19]"  # made-up ordinals
    return [map1[c] for c in s]

def decode(l):
    "decode([34, 19, 19]) ==> 'foo'"
    return ''.join(map2[i] for i in l)
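Filled out into a self-contained sketch (Python 3, with a hypothetical shortened alphabet of ASCII punctuation, digits and letters standing in for the real one), the two functions round-trip cleanly. Note that encode() raises KeyError for any character outside the alphabet, which is why the assumption that your strings stick to that alphabet matters.

```python
import string

# Hypothetical shortened alphabet: punctuation, digits, then each ASCII
# letter followed by its uppercase form (accented characters omitted).
alphabet = (" ", ".", "'", "-") + tuple(string.digits) + tuple(
    ch for lower in string.ascii_lowercase for ch in (lower, lower.upper())
)
map1 = {c: n for n, c in enumerate(alphabet)}
map2 = {n: c for n, c in enumerate(alphabet)}


def encode(s):
    """Turn a string into a list of collating ordinals.

    Raises KeyError if s contains a character that is not in the alphabet.
    """
    return [map1[c] for c in s]


def decode(ordinals):
    """Inverse of encode(): turn a list of ordinals back into a string."""
    return "".join(map2[i] for i in ordinals)


word = "Foo-Bar 2"
print(encode(word))  # → [25, 42, 42, 3, 17, 14, 48, 0, 6]
assert decode(encode(word)) == word
```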
Use these to convert your strings to lists of ints which will sort as
per your specified collating order, and then back again:
encoded_strings = [encode(s) for s in original_list]
encoded_strings.sort()
sorted_strings = [decode(l) for l in encoded_strings]
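Python's sorted() and list.sort() (including in Python 2.7) also accept a key function, which makes the decode step unnecessary: pass the encoder as the key and the original strings come back in the custom order. A minimal sketch, assuming Python 3, a hypothetical shortened ASCII alphabet, and made-up sample words:

```python
import string

# Hypothetical shortened alphabet: punctuation, digits, then each ASCII
# letter followed by its uppercase form (accented characters omitted).
alphabet = (" ", ".", "'", "-") + tuple(string.digits) + tuple(
    ch for lower in string.ascii_lowercase for ch in (lower, lower.upper())
)
map1 = {c: n for n, c in enumerate(alphabet)}


def encode(s):
    # One collating ordinal per character, as in encode() above.
    return [map1[c] for c in s]


# Made-up sample data.
original_list = ["banana", "Apple", "apple", "b", "a-b", "a b", "Banana"]

# Lists of ints compare element by element, so sorting on the encoded
# form gives the custom collating order without decoding anything.
sorted_strings = sorted(original_list, key=encode)
print(sorted_strings)
# → ['a b', 'a-b', 'apple', 'Apple', 'b', 'banana', 'Banana']

# For comparison, the default code-point order:
print(sorted(original_list))
# → ['Apple', 'Banana', 'a b', 'a-b', 'apple', 'b', 'banana']
```

Per the Python sorting documentation, the key is computed once per element, so this does internally what the encode/sort/decode steps do by hand.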
That's just a rough sketch, and completely untested, but it should get
you headed in the right direction. Or at least one plausible direction.
Old-time Perl hackers will recognize this as the Schwartzian Transform.
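Strictly speaking, the classic Schwartzian Transform doesn't decode anything: it pairs each original string with its computed key, sorts the pairs, and strips the keys off again (Python's sorting docs call this decorate-sort-undecorate). A sketch of that variant, assuming Python 3, a hypothetical shortened ASCII alphabet, and made-up sample words; the position stored in each tuple preserves the original order of any equal keys.

```python
import string

# Hypothetical shortened alphabet: punctuation, digits, then each ASCII
# letter followed by its uppercase form (accented characters omitted).
alphabet = (" ", ".", "'", "-") + tuple(string.digits) + tuple(
    ch for lower in string.ascii_lowercase for ch in (lower, lower.upper())
)
map1 = {c: n for n, c in enumerate(alphabet)}


def encode(s):
    # One collating ordinal per character.
    return [map1[c] for c in s]


# Made-up sample data.
original_list = ["Zebra", "zebra", "9 lives", "o'clock", "o-ring", "Oak"]

# Decorate: pair each string's key with its position and the string itself.
decorated = [(encode(s), i, s) for i, s in enumerate(original_list)]
# Sort: tuples compare by key first, then by original position.
decorated.sort()
# Undecorate: drop the keys and positions, keeping the original strings.
sorted_strings = [s for _key, _i, s in decorated]
print(sorted_strings)
# → ['9 lives', "o'clock", 'o-ring', 'Oak', 'zebra', 'Zebra']
```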
More information about the Python-list mailing list