Python's simplicity philosophy

Erik Max Francis max at alcyone.com
Thu Nov 20 17:17:19 EST 2003


Curt wrote:

> curty at einstein:~$ less uniq.txt
> flirty
> curty
> flirty
> curty
> 
> curty at einstein:~$ uniq uniq.txt
> flirty
> curty
> flirty
> curty
> 
> curty at einstein:~$ sort uniq.txt | uniq
> curty
> flirty
> 
> Maybe my uniq is unique.

No, that's expected behavior, and consistent with what I said.  uniq
doesn't really care whether its input is sorted; it just takes
consecutive sequences of identical lines (identical by the criteria
you specify on the command line) and collapses each into at most one.

In your sample, there were no consecutive lines that were identical, so
uniq did nothing.  Change their order, and despite the input still being
unsorted, you'll see that uniq is working:

max at oxygen:~/tmp% cat > uniq.txt
flirty
curty
curty
flirty
^D
max at oxygen:~/tmp% uniq uniq.txt
flirty
curty
flirty

The duplicate consecutive "curty" lines got collapsed into one.
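
Since this is the Python list: the same behavior can be sketched with
itertools.groupby, which likewise groups only consecutive equal items.
This is a rough illustration, not how uniq is actually implemented, and
the helper name collapse_runs is made up for the example.

```python
from itertools import groupby


def collapse_runs(lines):
    """Keep one line from each run of consecutive identical lines."""
    # groupby starts a new group whenever the value changes, so
    # non-adjacent duplicates end up in separate groups.
    return [key for key, _run in groupby(lines)]


adjacent = ["flirty", "curty", "curty", "flirty"]
alternating = ["flirty", "curty", "flirty", "curty"]

print(collapse_runs(adjacent))             # the two adjacent "curty" lines collapse
print(collapse_runs(alternating))          # nothing is adjacent, so nothing changes
print(collapse_runs(sorted(alternating)))  # roughly what "sort | uniq" does
```

Sorting first is what makes all duplicates adjacent, which is why the
sort | uniq pipeline removes every duplicate while plain uniq does not.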

> NAME
>        uniq - remove duplicate lines from a sorted file
>                                             ******

It's true that this appears in the one-line description of uniq on some
systems, such as GNU, since sorted input is the most common usage.  But
if you look at the description of what it actually does, you'll see its
behavior doesn't require sorted input:

DESCRIPTION
       Discard  all  but  one  of successive identical lines from
       INPUT (or standard input), writing to OUTPUT (or  standard
       output).

And on some systems, the summary doesn't mention sorting at all; Solaris
8, for instance:

NAME
     uniq - report or filter out repeated lines in a file

and sort is only mentioned in the "SEE ALSO" section, nowhere in the
main description.

For an example where uniq would be useful despite the input deliberately
not being sorted, consider processing a log file with a lot of duplicate
entries, where you only want to see the first of each series of
consecutive duplicates.  (This is actually not unheard of; syslog for
instance will do this automatically.)
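
A small Python sketch of that log case, in the spirit of uniq -c: it
keeps the first line of each run of consecutive duplicates along with
how many times it repeated, without reordering anything.  The log
entries and the name summarize_runs are invented for illustration.

```python
from itertools import groupby


def summarize_runs(lines):
    """Return (count, line) pairs, one per run of consecutive duplicates."""
    return [(sum(1 for _ in run), line) for line, run in groupby(lines)]


# Made-up log entries, purely for illustration.
log = [
    "disk full",
    "disk full",
    "disk full",
    "link down",
    "disk full",
]

for count, line in summarize_runs(log):
    print(count, line)
```

Because the input is left in its original order, the later "disk full"
entry shows up as its own run, which is exactly the information that
sorting first would throw away.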

-- 
   Erik Max Francis && max at alcyone.com && http://www.alcyone.com/max/
 __ San Jose, CA, USA && 37 20 N 121 53 W && &tSftDotIotE
/  \ 
\__/ Extremes meet.
    -- John Hall Wheelock



