[Python-Dev] Python3 regret about deleting list.sort(cmp=...)

Terry Reedy tjreedy at udel.edu
Sun Mar 13 04:21:12 CET 2011


On 3/12/2011 8:47 PM, Glenn Linderman wrote:
> On 3/12/2011 2:09 PM, Terry Reedy wrote:
>> I believe that if the integer field were padded with leading blanks as
>> needed so that all are the same length, then no key would be needed.
>
> Did you mean that "if the integer field were" converted to string and
> "padded with leading blanks..."?

Guido presented a use case of a list of strings, each of form '%s,%d', 
where the %s part is a 'word'. 'Integer field' refers to the part of 
each string after the comma.

> Otherwise I'm not sure how to pad an integer with leading blanks.

The integers are already in string form. The *existing* key function his 
colleague used converted that part to an int as the second part of a 
tuple. I presume the integer field was separated by split(','), so the 
code was something like

def sikey(s):
   s,i = s.split(',')
   return s,int(i)

longlist.sort(key=sikey)

It does not matter if the splitting method is more complicated, because 
it is already part of the problem spec. I proposed instead

def sirep(s):
   s,i = s.split(',') # or whatever current key func does
   return '%s,%*s' % (s, width, i)
   # where width, the padded length of the integer field,
   # is known from the application

longlist = list(map(sirep, longlist)) # map() returns an iterator in Python 3
longlist.sort()

# or assuming that a simple split is correct

longlist = ['%s,%*s' % (w, width, i)
            for w, i in (s.split(',') for s in longlist)]
longlist.sort()
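
As a quick check, here is a minimal sketch comparing the two approaches. The sample data and the field width of 5 are made up for illustration, and the padding is written with '%*s'; the real values would come from the application.

```python
# Hypothetical sample data; width = 5 is an assumed maximum field width.
width = 5
longlist = ['b,7', 'a,1000', 'ab,3', 'a,997', 'a,12']

def sikey(s):
    # Key-function approach: compare (word, integer) tuples.
    word, num = s.split(',')
    return word, int(num)

def sirep(s):
    # Padding approach: right-justify the integer field with blanks.
    word, num = s.split(',')
    return '%s,%*s' % (word, width, num)

by_key = sorted(longlist, key=sikey)
padded = sorted(map(sirep, longlist))

# Remove the padding again so the two orderings can be compared.
unpadded = [s.replace(' ', '') for s in padded]

print(by_key)
print(padded)
print(unpadded == by_key)
```

Sorting the unpadded strings without a key puts 'a,1000' ahead of 'a,12', which is why either the key function or the padding is needed.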

> Also, what appears to be your revised data structure, strval + ',' +
> '%5.5d' % intval , assumes the strval is fixed length, also.

No it does not, and need not. ',' precedes all letters in ASCII order. 
(Ok, I assumed that the 'word' field does not include any of 
!"#$%&'()*+. If that is not true, replace comma with space or even a 
control char such as '\a' which even precedes \t and \n.) Given the 
context of Google, I assumed that 'word' meant word, as in a web 
document, while the int might be a position or doc number (or both). The 
important point is that the separator causes all word-int pairs with the 
same word to string-sort before all word-int pairs with the same word + 
a suffix. My example intentionally tested that.
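
The ordering claims above can be checked directly. This is only a sketch with made-up sample strings, assuming plain ASCII words:

```python
import string

# ',' (0x2C) sorts before every ASCII digit and letter.
comma_first = all(',' < c for c in string.digits + string.ascii_letters)

# The printable ASCII characters that sort before ',' are
# the space and the punctuation listed above.
before_comma = ''.join(chr(n) for n in range(32, 127) if chr(n) < ',')

# '\a' (0x07) precedes both '\t' (0x09) and '\n' (0x0A).
bell_first = '\a' < '\t' < '\n'

# Pairs for the word 'a' sort before pairs for 'ab' (the same word + a suffix).
pairs = sorted(['ab,    1', 'a, 1000', 'a,  997'])

print(comma_first, repr(before_comma), bell_first)
print(pairs)
```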

> Consider the following strval, intval pairs, using your syntax:
>
> ['a,997, 1','a, 1000']
>
> Nothing says the strval wouldn't contain data that look like your
> structure...

The problem as presented does. 'a,997' is not a word. In any case, as I said 
before, the method of correctly parsing the strings into two fields is 
already specified. I am only suggesting a change in how to proceed 
thereafter.

-- 
Terry Jan Reedy


