"Newbie" questions - "unique" sorting ?

John Fitzsimons xpm4senn001 at sneakemail.com
Fri Jun 20 20:56:36 EDT 2003


On Fri, 20 Jun 2003 04:23:24 -0700, "Cousin Stanley"
<CousinStanley at hotmail.com> wrote:

Hi Cousin Stanley,

http://fastq.com/~sckitching/Python/word_list.zip

< snip >

>The version you have should run without any command-line arguments
>but requires that the input file be named word_source.txt ...

Okay, before trying the new file I re-ran the original. Since I
have some big files, I tried it on one of them. Here is what I did.

Started with word_source.txt (3,116 KB).

The run added two new files:

word_dups.txt     3,116 KB
word_target.txt     882 KB

What was I supposed to do next? Was I meant to rename one of the
above files to word_source.txt and then re-run the script to get rid
of the duplicate lines?

That didn't seem to work, so I have obviously done something wrong.
Have I got the steps right? If so, do I rename the first file or the
second?
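For comparison, here is a minimal sketch that does the whole job in one pass, assuming the goal is simply a sorted list of distinct lines. It is not the word_list script from the zip file: `sort_unique` is a hypothetical name, and the latin-1 default encoding is an assumption chosen because it accepts any byte. The whole file is read into memory at once, which should be fine for a file of a few megabytes.

```python
def sort_unique(path_in, path_out, encoding="latin-1"):
    """Write the distinct lines of path_in to path_out in sorted order.

    Hypothetical helper, not the word_list script. The whole input is
    held in memory. latin-1 is an assumed default: it maps every byte
    to a character, so any file can be read without decoding errors.
    Returns the number of distinct lines written.
    """
    with open(path_in, encoding=encoding) as src:
        # Strip the newline so a final line without one still matches its duplicates.
        unique = {line.rstrip("\n") for line in src}
    with open(path_out, "w", encoding=encoding) as dst:
        for line in sorted(unique):
            dst.write(line + "\n")
    return len(unique)
```

Calling `sort_unique("word_source.txt", "word_target.txt")` produces the sorted, duplicate-free file directly, so there is no intermediate file to rename and no second run.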

< snip >

>I've up-loaded a second version that does require arguments
>for path_in and path_out but leaves the temporary dups file
>named as word_dups.txt ...

< snip >

I want to try that option too, but I'd like to get the first one
working before that. I may well prefer the second option, since its
format should be easier for me to remember.

Though, if I can work it out, I might someday split the last part
into a separate file such as zapdups_sorted.py. At my age I don't
find remembering things easy, particularly DOS syntax and
step-by-step procedures.
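If the last part were split out, a file like zapdups_sorted.py might look roughly like this sketch. It assumes its input is already sorted, so duplicates sit next to each other, and takes path_in and path_out on the command line like the second version. The function name `zap_sorted_dups` and the latin-1 default encoding are hypothetical choices, not taken from the posted scripts.

```python
"""zapdups_sorted.py - hypothetical helper: drop repeated lines from a sorted file."""
import sys
from itertools import groupby


def zap_sorted_dups(path_in, path_out, encoding="latin-1"):
    """Copy path_in to path_out, keeping one line from each run of equal lines.

    Assumes path_in is already sorted, so equal lines are adjacent.
    Lines are streamed, so only the current line is held in memory.
    Returns the number of lines written.
    """
    kept = 0
    with open(path_in, encoding=encoding) as src, \
         open(path_out, "w", encoding=encoding) as dst:
        # groupby yields one key per run of consecutive equal lines.
        for line, _run in groupby(text.rstrip("\n") for text in src):
            dst.write(line + "\n")
            kept += 1
    return kept


if __name__ == "__main__":
    if len(sys.argv) == 3:
        print(zap_sorted_dups(sys.argv[1], sys.argv[2]), "lines kept")
    else:
        print("usage: python zapdups_sorted.py path_in path_out")
```

From a DOS prompt it would be run as `python zapdups_sorted.py word_sorted.txt word_target.txt` (file names illustrative). Because it only compares neighbouring lines, unsorted input will keep duplicates that are not next to each other.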

The first step worked amazingly well. I thought it might choke on
such a large file. Great job! Thanks again.  :-)

I wonder how file size affects things. Is Python on Windows limited
by RAM, or does it use the hard disk if there isn't enough memory?
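On the memory question: Python keeps its data in RAM and does not spill to disk by itself; beyond physical RAM it relies on the operating system's virtual memory (the Windows page file), which works but becomes slow. One way to keep memory down is to read the file one line at a time and remember only the lines already seen. The sketch below is a hypothetical helper, `unique_in_order`, with an assumed latin-1 default encoding; it keeps the first occurrence of each line in its original order.

```python
def unique_in_order(path_in, path_out, encoding="latin-1"):
    """Copy the first occurrence of each line, preserving the original order.

    Iterating over the file object reads one line at a time, so memory use
    grows with the number and length of *distinct* lines, not the file size.
    Returns the number of distinct lines written.
    """
    seen = set()
    with open(path_in, encoding=encoding) as src, \
         open(path_out, "w", encoding=encoding) as dst:
        for raw in src:
            line = raw.rstrip("\n")
            if line not in seen:
                seen.add(line)
                dst.write(line + "\n")
    return len(seen)
```

With many repeated words, such as word_source.txt shrinking from 3,116 KB to 882 KB, the set stays much smaller than the input file.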

Could I get this to work on, say, a 50 MB text file? Has anyone
here used Python on text files this large on a Windows platform?
If so, how did it go?
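Whether a 50 MB file can be sorted entirely in memory depends on the machine. When it cannot, the standard technique is an external merge sort: sort the file in chunks small enough for RAM, write each sorted chunk to a temporary file, then merge the chunks with `heapq.merge` from the standard library. The sketch below assumes that approach. `external_sort_unique`, its 500,000-line default chunk size and the latin-1 encoding are illustrative assumptions, not part of the posted scripts.

```python
import heapq
import os
import tempfile
from contextlib import ExitStack
from itertools import islice


def _write_run(lines, encoding):
    """Sort one chunk, drop duplicates inside it, and save it as a temp file."""
    fd, path = tempfile.mkstemp(suffix=".run")
    with os.fdopen(fd, "w", encoding=encoding) as out:
        for line in sorted(set(lines)):
            out.write(line + "\n")
    return path


def external_sort_unique(path_in, path_out, chunk_lines=500_000,
                         encoding="latin-1"):
    """Write the distinct lines of path_in to path_out in sorted order.

    No more than chunk_lines input lines are held in memory at once;
    the sorted chunks live in temporary files and are merged from disk.
    Returns the number of distinct lines written.
    """
    runs = []
    try:
        # Phase 1: split the input into sorted, de-duplicated chunk files.
        with open(path_in, encoding=encoding) as src:
            stripped = (raw.rstrip("\n") for raw in src)
            while True:
                chunk = list(islice(stripped, chunk_lines))
                if not chunk:
                    break
                runs.append(_write_run(chunk, encoding))

        # Phase 2: merge the chunk files; equal lines now arrive back to back.
        count = 0
        with ExitStack() as stack:
            files = [stack.enter_context(open(p, encoding=encoding))
                     for p in runs]
            streams = [(raw.rstrip("\n") for raw in f) for f in files]
            previous = None
            with open(path_out, "w", encoding=encoding) as dst:
                for line in heapq.merge(*streams):
                    if line != previous:
                        dst.write(line + "\n")
                        previous = line
                        count += 1
        return count
    finally:
        # Remove the temporary chunk files, even if something failed.
        for p in runs:
            os.remove(p)
```

A smaller `chunk_lines` lowers peak memory at the cost of more temporary files to merge.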


Regards, John.





More information about the Python-list mailing list