sort order for strings of digits

DJC djc at news.invalid
Wed Oct 31 19:45:20 EDT 2012


On 31/10/12 23:09, Steven D'Aprano wrote:
> On Wed, 31 Oct 2012 15:17:14 +0000, djc wrote:
>
>> The best I can think of is to split the input sequence into two lists,
>> sort each and then join them.
>
> According to your example code, you don't have to split the input because
> you already have two lists, one filled with numbers and one filled with
> strings.

Sorry for the confusion; the two lists of strings were just a way of testing
variations on the input. The expected input is a single sequence containing
any combination of strings that can be read as numbers and strings of
characters that don't look like numbers (even if they include digits).

>
> But I think that what you actually have is a single list of strings, and
> you are supposed to sort the strings such that they come in numeric order
> first, then alphanumerical. E.g.:
>
> ['9', '1000', 'abc2', '55', '1', 'abc', '55a', '1a']
> => ['1', '1a', '9', '55', '55a', '1000', 'abc', 'abc2']

Not quite. What I want is to ensure that strings that look like numbers
are placed in numerical order, i.e. 1 2 3 10 100, not 1 10 100 2 3.
Strings that merely start with some digits can be treated like any other
string.
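A minimal sketch of that ordering as a sort key, assuming Python 3 and assuming that the number-like labels should come before the others (which group goes first is not specified above). `label_key` is a hypothetical name. A string counts as a number only if `int()` accepts the whole of it, so '1a' is sorted as ordinary text.

```python
def label_key(s):
    """Sort key for report labels (a sketch, not a definitive rule).

    Strings that int() accepts as a whole, such as '3' or '100', sort
    first in numeric order; every other string, including '1a', sorts
    after them in ordinary lexical order.
    """
    try:
        # The original string breaks ties, e.g. between '7' and '07'.
        return (0, int(s), s)
    except ValueError:
        # Not a whole number: put it in the second group, sorted as text.
        return (1, 0, s)


labels = ['10', 'b', '2', 'a1', '1', '100', '3', '1a']
print(sorted(labels))                 # plain lexical order
print(sorted(labels, key=label_key))  # numbers first, numerically
```

Note that `int()` also tolerates surrounding whitespace, a leading sign and (in Python 3.6+) underscores between digits, so ' 7' or '-3' are treated as numbers too; a stricter test such as `s.isdecimal()` before calling `int()` avoids that if it is not wanted.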

> At least that is what I would expect as the useful thing to do when
> sorting.

Well, it depends on the use case. In my case the strings are column and
row labels for a report, and I want them presented in an easy-to-read
sequence, which lexical sorting of number-like strings does not give. I
want a reasonable do-what-I-mean default sort order that can handle
whatever strings are used.


>
> The trick is to take each string and split it into a leading number and a
> trailing alphanumeric string. Either part may be "empty". Here's a pure
> Python solution:
>
> from sys import maxsize  # use maxint in Python 2
> def split(s):
>      for i, c in enumerate(s):
>          if not c.isdigit():
>              break
>      else:  # aligned with the FOR, not the IF
>          return (int(s), '')
>      return (int(s[:i] or maxsize), s[i:])
>
> Now sort using this as a key function:
>
> py> L = ['9', '1000', 'abc2', '55', '1', 'abc', '55a', '1a']
> py> sorted(L, key=split)
> ['1', '1a', '9', '55', '55a', '1000', 'abc', 'abc2']
>
>
> The above solution is not quite general:
>
> * it doesn't handle negative numbers or numbers with a decimal point;
>
> * it doesn't handle the empty string in any meaningful way;
>
> * in practice, you may or may not want to ignore leading whitespace,
>    or trailing whitespace after the number part;
>
> * there's a subtle bug if a string contains a very large numeric prefix;
>    finding and fixing that is left as an exercise.
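One possible way around the sentinel problem and the empty string, sketched under stated assumptions rather than offered as the intended answer to the exercise: instead of giving non-numeric strings the prefix `maxsize` (which a longer numeric prefix can overtake), every key starts with a 0/1 group flag, so numbers never compete with a sentinel. The sketch also accepts an optional sign and decimal fraction through a regular expression chosen for illustration, and uses `decimal.Decimal` so arbitrarily long prefixes compare exactly. `split_key` and `_NUMERIC_PREFIX` are hypothetical names, and whitespace is not stripped.

```python
import re
from decimal import Decimal

# Assumed definition of a "number": optional sign, ASCII digits, and an
# optional fractional part, e.g. "-12.5". Adjust to taste.
_NUMERIC_PREFIX = re.compile(r'[+-]?[0-9]+(?:\.[0-9]+)?')


def split_key(s):
    """Sort key: strings with a numeric prefix first, by that number.

    Ties on the number are broken by the remaining text. All other
    strings, including the empty string, come afterwards in plain
    lexical order. The leading 0/1 flag replaces the maxsize sentinel.
    """
    m = _NUMERIC_PREFIX.match(s)
    if m:
        # Decimal compares exactly, with no upper limit on size.
        return (0, Decimal(m.group()), s[m.end():])
    return (1, 0, s)


L = ['9', '1000', 'abc2', '55', '1', 'abc', '55a', '1a']
print(sorted(L, key=split_key))
```

On the example list from the thread this key gives the same order as `split`, while '' no longer raises and a label with a 30-digit prefix still sorts among the numbers.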

That looks more than general enough for my purposes! I will experiment
along those lines. Thank you.


More information about the Python-list mailing list