sorted() erratically fails to sort string numbers

Andre Engels andreengels at gmail.com
Tue Apr 28 04:06:24 EDT 2009


On Tue, Apr 28, 2009 at 9:47 AM, uuid <M8R-r1c6h51 at mailinator.com> wrote:
> I would be very interested in a logical explanation why this happens on
> python 2.5.1:
>
> In order to sort an etree by the .text value of one child, I adapted this
> snippet from effbot.org:
>
>> import xml.etree.ElementTree as ET
>>
>> tree = ET.parse("data.xml")
>>
>> def getkey(elem):
>>    return elem.findtext("number")
>>
>> container = tree.find("entries")
>>
>> container[:] = sorted(container, key=getkey)
>>
>> tree.write("new-data.xml")
>
> While working with a moderately sized xml file (2500 entries to sort by), I
> found that a few elements were not in order. It seems that numbers with
> seven digits were sorted correctly, while those with six digits were just
> added at the end.
>
> I fixed the problem by converting the numbers to int in the callback:
>
>> def getkey(elem):
>>    return int(elem.findtext("number"))
>
> So to my naive mind, it seems as if there was some error with the sorted()
> function. Would anyone be as kind as to explain why it could be happening?
> Thanks in advance!

When you sort strings, including strings that represent numbers, they
are compared alphabetically, one character at a time from the left.
Two numbers with the same number of digits therefore still come out in
their normal numeric order, but any number starting with "1" will come
before any number starting with "2", whether the digits denote units,
tens, hundreds or millions. Thus:

"1" < "15999" < "16" < "2"
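
To make the difference concrete, here is a small sketch. The sample
values and the in-memory XML are made up for illustration (your real
data.xml will look different); only the "entries" and "number" names are
taken from your snippet. It sorts the same values once as strings and
once with an int key, and then applies the int key to an element tree
the way your corrected getkey() does:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample values: three seven-digit-or-shorter numbers as text.
numbers = ["2000000", "150000", "1000000", "999999"]

# Plain string sort: compared character by character from the left,
# so "999999" ends up last because "9" is greater than "1" and "2".
as_strings = sorted(numbers)

# Numeric sort: each string is converted to int before comparing.
as_ints = sorted(numbers, key=int)

# The same idea applied to an element tree (made-up XML for illustration).
container = ET.fromstring(
    "<entries>"
    "<entry><number>2000000</number></entry>"
    "<entry><number>150000</number></entry>"
    "<entry><number>1000000</number></entry>"
    "<entry><number>999999</number></entry>"
    "</entries>"
)

def getkey(elem):
    # Convert the text of the <number> child to int so that
    # sorted() compares values, not characters.
    return int(elem.findtext("number"))

container[:] = sorted(container, key=getkey)
sorted_numbers = [elem.findtext("number") for elem in container]

print(as_strings)
print(as_ints)
print(sorted_numbers)
```

The string sort puts "999999" after "2000000", which matches what you
saw with the six-digit numbers landing at the end, while both int-keyed
sorts give the numeric order you expected.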




-- 
André Engels, andreengels at gmail.com


