number of different lines in a file

Grant Edwards grante at visi.com
Fri May 19 15:04:02 EDT 2006


On 2006-05-19, Paul McGuire <ptmcg at austin.rr._bogus_.com> wrote:

>> If the log has a lot of repeated lines in its original state then
>> running uniq twice, once up front to reduce what needs to be sorted,
>> might be quicker?
>>
>>  uniq log_file | sort | uniq | wc -l
>>
>> - Pad.
>
> Why would the second running of uniq remove any additional lines that
> weren't removed in the first pass?

Because uniq only removes _adjacent_ identical lines.
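
As a rough illustration in Python (a sketch only: the uniq() helper and
the sample lines here are made up for the example, and itertools.groupby
stands in for what uniq(1) does with no options):

```python
from itertools import groupby

def uniq(lines):
    """Collapse each run of adjacent identical lines to a single line."""
    return [key for key, _ in groupby(lines)]

# Hypothetical sample data: "a" and "b" each recur, but not always adjacently.
lines = ["a", "a", "b", "a", "b", "b"]

# First pass: only adjacent repeats go away, so duplicates remain.
once = uniq(lines)

# Sorting brings the remaining duplicates together; a second pass removes them.
twice = uniq(sorted(once))

# If all you want is the count, a set gives it without sorting.
distinct = len(set(lines))

print(once)
print(twice)
print(distinct)
```

The first pass still leaves four lines because the repeats are not all
next to each other; only after sorting does the second pass get down to
the two distinct lines, which matches the set-based count.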

> For that matter, if this is a log file, won't every line have a timestamp,
> making duplicates extremely unlikely?

Probably.

-- 
Grant Edwards                   grante             Yow!  If our behavior is
                                  at               strict, we do not need fun!
                               visi.com            


