"Newbie" questions - "unique" sorting ?

John Fitzsimons xpm4senn001 at sneakemail.com
Sun Jun 22 21:26:29 EDT 2003


On Sat, 21 Jun 2003 02:13:48 -0700, "Cousin Stanley"
<CousinStanley at hotmail.com> wrote:

< snip >

>So, I re-wrote the program
>using a dictionary based mechanism
>and all dups now seem to be gone
>from the output file ...

>Usage is ...

>    python word_list.py file_in.txt file_out.txt

>Download ...
>http://fastq.com/~sckitching/Python/word_list.zip

< snip >

When I tried that on my unsorted original file all I got was :

C:\Python>python word_list.py file_in.txt file_out.txt

    word_list.py

        Indexing Words ....

Then nothing. I waited a long time but still nothing. 

I then used the first word_list file to sort the original into order.
That worked perfectly, as before, to give a sorted result.

I then fed this sorted file to the latest word_list.py and got my
word_target.txt .

Unfortunately I still had duplicate strings, e.g. :

Any
any
Any
any
Any
any
Any
any
ANY
any
ANY
any
Any
ANY
any        :-(
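
I wonder whether case is the culprit here ; "Any", "any" and "ANY"
would all be distinct dictionary keys, so a dictionary-based check
would keep every spelling. A rough, untested sketch of what I mean
(the names below are my own invention, not from word_list.py),
lowering each word before comparing :

```python
def unique_nocase(lines):
    """Drop duplicates ignoring case; keep the first spelling seen."""
    seen = {}
    for line in lines:
        key = line.strip().lower()   # fold case so Any/any/ANY collide
        if key and key not in seen:
            seen[key] = line.strip()
    return list(seen.values())

words = ["Any", "any", "Any", "ANY", "any"]
print(unique_nocase(words))   # only the first "Any" survives
```

Whether word_list.py should fold case like this, or keep the
variants, is of course a design question.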

I have uploaded my original file to :

http://members.optushome.com.au/jfweb/jfin.txt

As you can see it is text from the other newsgroup. My "original" plan
was to :

(A) Sort all strings into different lines.

(B) Sort the result into "unique" lines.

(C) "Grep" (B) for all lines starting http or ftp or www.

Perhaps my thinking was wrong ? Perhaps (A), (C), (B) would make
things much easier ? Then the unique-line sort would have a great
deal less to process and so be much faster ?
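
To show what I mean by (C) followed by (B), here is a rough sketch
of the sort of thing I imagine ; I made up all the names myself, so
please treat it as a guess rather than working code :

```python
def extract_links(lines):
    """Keep only lines starting with http, ftp or www (any case),
    drop case-insensitive duplicates, and return them sorted."""
    seen = set()
    out = []
    for line in lines:
        s = line.strip()
        if s.lower().startswith(("http", "ftp", "www")):
            key = s.lower()          # treat HTTP://x and http://x as dups
            if key not in seen:
                seen.add(key)
                out.append(s)
    return sorted(out)

sample = [
    "http://fastq.com/~sckitching/Python/word_list.zip",
    "some ordinary text",
    "www.python.org",
    "HTTP://fastq.com/~sckitching/Python/word_list.zip",
]
print(extract_links(sample))   # two unique links, sorted
```

Filtering first like this would mean the unique sort only ever sees
the http/ftp/www lines, which is the easier/faster idea above.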

The problem is that I don't have the (C) python program/script yet
unless someone here wants to help me with that too. 

Anyone able to do that please ?

I feel that you have already spent sufficient time trying to help with
(A) and (B) so don't expect that you would have time for (C).

Are the above input file and/or comments a help ? Any constructive
feedback from anyone will be appreciated.

Regards, John.




