Lists, tuples and memory.

Larry Bates lbates at swamisoft.com
Thu Jul 15 16:15:16 EDT 2004


Any reason not to put the words into a dictionary?

Then your code becomes:

lstdict = dict([(x.lower().strip(), None) for x in
                file("D:\\CommonDictionary.txt")])


if lstdict.has_key(word):
    do something

(not tested, but should be close)
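
dict.fromkeys does the same thing a bit more directly (also
untested; assumes Python 2.3 or later, where fromkeys was added):

lstdict = dict.fromkeys([x.lower().strip() for x in
                         file("D:\\CommonDictionary.txt")])

if word in lstdict:
    do something

The "in" test is equivalent to has_key() here and reads a bit better.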

I'll bet access is faster (even if loading is slightly slower).
You also benefit from the fact that the dictionary file
doesn't need to be kept sorted.
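
If you want numbers, timeit should settle it (a rough sketch, also
untested; the test word and repeat count are arbitrary, and the
bisect version relies on the presorted file from your post):

import timeit
from bisect import bisect

words = [x.lower().strip() for x in
         file("D:\\CommonDictionary.txt")]   # presorted file
asdict = dict.fromkeys(words)

def via_dict(word):
    # average O(1) hash lookup
    return word in asdict

def via_bisect(word):
    # O(log n) binary search on the sorted list
    return words[bisect(words, word) - 1] == word

for name in ("via_dict", "via_bisect"):
    t = timeit.Timer("%s('zebra')" % name,
                     "from __main__ import %s" % name)
    print name, t.timeit(100000)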

HTH,
Larry Bates
Syscon, Inc.

"Elbert Lev" <elbertlev at hotmail.com> wrote in message
news:9418be08.0407151109.15894fe7 at posting.google.com...
> Hi, all!
>
> Here is the problem:
> I have a file which contains a common dictionary - one word per line
> (appr. 700KB, 70000 words). I have to read it into memory for future
> "spell checking" of words coming from the customer. The file is
> presorted. So here it goes:
>
> lstdict = map(lambda x: x.lower().strip(),
>               file("D:\\CommonDictionary.txt"))
>
> Works like a charm. On my machine it takes 0.7 seconds, and
> python.exe (per Task Manager data) was using 2636K before this line
> executed and 5520K after. The difference is 2884K. Not that bad,
> considering that in C I'd read the file into memory (700K), scan
> for CRs, count them, replace them with '\0', and allocate an index
> vector of the word beginnings, sized from the CR count. In this
> particular case the index vector would be almost 300K. So far so good!
>
> Then I realized that lstdict as a list is overkill. A tuple is
> enough in my case. So I modified the code:
>
> t = tuple(file("D:\\CommonDictionary.txt"))
> lstdict = map(lambda x: x.lower().strip(), t)
>
> This code works a little bit faster: 0.5 sec, but takes 5550K of
> memory. And maybe this is understandable: the first line creates a
> tuple of the raw lines and the second a list of the stripped words
> (map() returns a list), both of the same length.
>
> But then I decided to compare these pieces:
>
> t = tuple(file("D:\\CommonDictionary.txt"))        # 1
> lstdict = map(lambda x: x.lower().strip(), t)      # 2
>
> lstdict = map(lambda x: x.lower().strip(),
>               file("D:\\CommonDictionary.txt"))   # 3
>
> As expected, after line 2 memory was 5550K, but after line 3 it jumped
> to 7996K!!!
>
> The question:
>
> If reference counting is used, why did the second assignment to
> lstdict (line 3) not free the memory allocated by the first one
> (line 2) and reuse it?
>
> So one more experiment:
>
> t = tuple(file("D:\\CommonDictionary.txt"))        # 1
> lstdict = map(lambda x: x.lower().strip(), t)      # 2
> del lstdict                                        # 3
> lstdict = map(lambda x: x.lower().strip(),
>               file("D:\\CommonDictionary.txt"))   # 4
>
> In this case executing line 4 did not add memory!
>
> By the way, the search speed is very, very good when done by this
> function:
>
> from bisect import bisect
>
> def inDict(lstdict, word):
>     # lstdict must stay sorted; bisect returns the insertion point,
>     # so the candidate match sits just before it.
>     try:
>         return lstdict[bisect(lstdict, word) - 1] == word
>     except IndexError:  # empty list
>         return False
>
> 500 words are tested in less than 0.03 seconds.
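
On the memory question: reference counting does reclaim the old
list, but only after the assignment finishes. While line 3 runs,
the old list is still bound to lstdict, so the new list is built
while the old one is alive, and Task Manager shows that high-water
mark (CPython doesn't hand freed memory back to the OS right away,
which is also why your del variant adds nothing at line 4 - the
interpreter reuses the blocks it already has). Roughly, with your
own code, comments added:

lstdict = map(lambda x: x.lower().strip(),
              file("D:\\CommonDictionary.txt"))   # list A
lstdict = map(lambda x: x.lower().strip(),
              file("D:\\CommonDictionary.txt"))   # list B is built
                                                  # while A is alive

del lstdict                                       # drop A first
lstdict = map(lambda x: x.lower().strip(),
              file("D:\\CommonDictionary.txt"))   # B reuses A's blocks

One more note: map() always returns a list, so going through tuple()
first doesn't make lstdict a tuple - it just adds an extra copy of
the raw lines.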




