Python's simplicity philosophy

Erik Max Francis max at alcyone.com
Thu Nov 20 20:02:17 EST 2003


Curt wrote:

> Well, changing the order of the lines in my sample to ensure the
> contiguity of identical entries _is_ sorting.

The tweak I made to your sample file wasn't sorted.  It just had two
identical adjacent lines.  The modified sample again was:

max at oxygen:~/tmp% cat > uniq.txt
flirty
curty
curty
flirty
^D
max at oxygen:~/tmp% uniq uniq.txt
flirty
curty
flirty

You don't really think the sequence [flirty, curty, curty, flirty] is
sorted, do you?

> I don't know what else
> one could call that procedure, but "non-sorted" appears to me to be
> a rather provocative description of the modified sample which you were
> constrained to alter in order that it meet a criterion whose existence
> you deny.

man uniq on GNU:

DESCRIPTION
       Discard  all  but  one  of successive identical lines from
       INPUT (or standard input), writing to OUTPUT (or  standard
       output).

This says nothing about sorting.

man uniq on Solaris 8:

DESCRIPTION
     The uniq utility will read an input file comparing  adjacent
     lines,  and write one copy of each input line on the output.
     The second and succeeding copies of repeated adjacent  input
     lines will not be written.

     Repeated lines in the input will not be detected if they are
     not adjacent.

Neither of these detailed descriptions makes any reference to sorting
whatsoever; uniq acts completely locally and doesn't care whether its
input is sorted or not.
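
For what it's worth, the behavior those man pages describe can be sketched in a few lines of Python.  This is only an illustration of the "adjacent lines" rule, not how uniq itself is implemented; the function name uniq and the list-of-strings input are my own assumptions for the sketch.

```python
from itertools import groupby

def uniq(lines):
    """Yield one copy of each run of identical adjacent lines.

    Hypothetical helper: it only ever compares neighbouring lines,
    so it never needs (or checks for) sorted input.
    """
    for line, _run in groupby(lines):
        yield line

sample = ["flirty", "curty", "curty", "flirty"]
print(list(uniq(sample)))  # ['flirty', 'curty', 'flirty']
```

Note that groupby never looks further back than the current run, which is exactly why the second "flirty" survives.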

As a more extended example, consider running uniq on a hypothetical
log file:

max at oxygen:~/tmp% cat > uniq2.txt
startup
connect from A
message from A
message from A
message from A
message from A
disconnect from B
mark
mark
connect from B
message from B
disconnect from B
shutdown
^D
max at oxygen:~/tmp% uniq uniq2.txt
startup
connect from A
message from A
disconnect from B
mark
connect from B
message from B
disconnect from B
shutdown
max at oxygen:~/tmp% uniq -c uniq2.txt # to see the number of duplicates
      1 startup
      1 connect from A
      4 message from A
      1 disconnect from B
      2 mark
      1 connect from B
      1 message from B
      1 disconnect from B
      1 shutdown

I hope you'd agree that this input is obviously not sorted in any way. 
Yet uniq works precisely as described.
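
The counting form can be sketched the same way in Python.  Again this is just an illustration under my own assumptions (the name uniq_c is hypothetical, and the 7-column count width is only meant to resemble the output shown above), not a description of uniq's internals.

```python
from itertools import groupby

def uniq_c(lines):
    """Return (count, line) pairs, one per run of identical adjacent lines."""
    return [(sum(1 for _ in run), line) for line, run in groupby(lines)]

# The same hypothetical log as in the transcript above.
log = [
    "startup",
    "connect from A",
    "message from A",
    "message from A",
    "message from A",
    "message from A",
    "disconnect from B",
    "mark",
    "mark",
    "connect from B",
    "message from B",
    "disconnect from B",
    "shutdown",
]

for count, line in uniq_c(log):
    print(f"{count:7d} {line}")
```

The two "disconnect from B" lines are each counted once, because they are not adjacent.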

Yes, obviously sending uniq sorted input is a common way it is invoked. 
But it is by no means required.
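
To make the distinction concrete, here is a small Python sketch (the helper name uniq is hypothetical, standing in for the utility) showing that sorting first changes which duplicates end up adjacent, and nothing more:

```python
from itertools import groupby

def uniq(lines):
    """Collapse runs of identical adjacent lines (hypothetical stand-in for uniq)."""
    return [line for line, _run in groupby(lines)]

sample = ["flirty", "curty", "curty", "flirty"]

# Unsorted input: only the adjacent pair of "curty" lines collapses.
print(uniq(sample))          # ['flirty', 'curty', 'flirty']

# Sorted input: all duplicates become adjacent, so each line appears once.
print(uniq(sorted(sample)))  # ['curty', 'flirty']
```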

-- 
   Erik Max Francis && max at alcyone.com && http://www.alcyone.com/max/
 __ San Jose, CA, USA && 37 20 N 121 53 W && &tSftDotIotE
/  \ 
\__/ We are victims of our circumstance.
    -- Sade Adu



