a pickle's pickle

temposs at gmail.com
Tue Aug 2 12:01:58 EDT 2005


I'm trying to pickle a class instance, and while I get no errors,
almost none of the instance gets pickled, and I don't know why.
Here's the pickled output:

(i__main__
TrainingMatrix
p0
(dp1
S'matrixWords'
p2
I4714
sS'numWords'
p3
I4714
sS'totalWordsProcessed'
p4
I46735
sS'numContexts'
p5
I7664
sS'estimator'
p6
(dp7
sb.

--End of output

The class TrainingMatrix has no nested classes, and none of its
methods contain nested functions. An instance of this class running in
my program for about 10 minutes can build up on the order of 100MB in
resident memory, but the pickled output is the same size regardless of
the data set size. It does not even capture all of the member
variables in the class. Here is the class code, abridged:

class TrainingMatrix:
    matrix = []
    estimator = {}
    wordInfo = {}
    contextInfo = {}
    totalWordsProcessed = 0
    numWords = 0
    numContexts = 0
    matrixWords = 0

    def AddWordInfo(self,newWordInfo,newCapScheme):
        ...
    #End AddWordInfo

    def AddNewWord(self,newCapScheme):
        ...
    #End AddNewWord

    def AddContext(self,newContext):
        ...
    #End AddContext

    def AddInstance(self,word,context):
        ...
    #End AddInstance

    def UpdateMatrix(self,wordIndex,contextIndex,isLowerCase):
        ...
    #End UpdateMatrix

    def PrintMatrix(self):
        ...
    #End PrintMatrix

    def EstimateLowerCase(self):
        ...
    #End EstimateLowerCase

    def GetWordInfo(self,wordToFind):
        ...
    #End GetWordInfo

    def GetContext(self,wordList,direction):
        ...
    #End GetContext

    def GetBestCapScheme(self,wordInfo,precedeContext,followContext):
        ...
    #End GetBestCapScheme

    def IsLowerCase(self,word):
        ...
    #End IsLowerCase

#End TrainingMatrix
###################
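
The pickled output above contains only some of the names declared in the class body. One way that can happen is sketched below, under a loud assumption: that the elided methods change the class-level containers in place (for example with append) rather than assigning new ones to self. By default pickle saves an instance's own __dict__ and not attributes that live on the class. The names Shared, PerInstance and add are hypothetical and are not taken from the original program.

```python
import pickle


class Shared:
    # Class-level attributes: one copy shared by every instance.
    matrix = []
    count = 0

    def add(self, row):
        # In-place mutation changes the class-level list, so nothing
        # is stored in this instance's __dict__.
        self.matrix.append(row)
        # Augmented assignment on an int rebinds the name on the
        # instance, so 'count' becomes a real instance attribute.
        self.count += 1


class PerInstance:
    def __init__(self):
        # Attributes created here live in the instance __dict__.
        self.matrix = []
        self.count = 0

    def add(self, row):
        self.matrix.append(row)
        self.count += 1


shared = Shared()
shared.add([1, 2, 3])
per_instance = PerInstance()
per_instance.add([1, 2, 3])

# Round-trip both objects and look at what actually came back.
shared_state = pickle.loads(pickle.dumps(shared)).__dict__
per_instance_state = pickle.loads(pickle.dumps(per_instance)).__dict__

print(shared_state)        # no 'matrix' key
print(per_instance_state)  # has both 'matrix' and 'count'
```

If the assumption holds for TrainingMatrix, creating the containers in __init__ would put them in the instance __dict__, where pickle can see them.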

And here is the pickling code:

try:
    trainDB = open(trainDBString,"r+")
except IOError:
    trainDB = open(trainDBString,"w")
    trainDB.close()
    trainDB = open(trainDBString,"r+")
#End try
...
try:
    trainerString = trainDB.read()
    trainer = loads(trainerString)

except EOFError:
    trainer = TrainingMatrix()
#End try
...
trainerString = dumps(trainer)
trainDB.write(trainerString)
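
A minimal sketch of the same load-or-create pattern, written for current Python. It reads and writes in separate steps, uses binary mode, and truncates the file on each save. In the fragments above the file is opened "r+" and read to the end before the write. If nothing in the elided code seeks back to the start, each new pickle would land after the old one and loads() would keep returning the first. That is a guess about code that is not shown. The helper names load_or_create and save are hypothetical, and a plain dict stands in for the trainer object.

```python
import os
import pickle
import tempfile


def load_or_create(path, factory):
    """Return the object pickled at path, or factory() if none is usable."""
    try:
        with open(path, "rb") as f:
            return pickle.load(f)
    except (FileNotFoundError, EOFError):
        # Missing or empty file: start from a fresh object.
        return factory()


def save(path, obj):
    # "wb" truncates the file, so the new pickle replaces the old one
    # instead of being written after it.
    with open(path, "wb") as f:
        pickle.dump(obj, f)


# Usage with a throwaway file and a dict standing in for the trainer.
path = os.path.join(tempfile.mkdtemp(), "train.db")

state = load_or_create(path, dict)   # file does not exist yet
state["numWords"] = 1
save(path, state)

state = load_or_create(path, dict)   # picks up the saved state
state["numWords"] += 1
save(path, state)

reloaded = load_or_create(path, dict)
print(reloaded)
```

Keeping the read and the write in separate open() calls avoids having to reason about the file position between them.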

I've also tried a simple shelve implementation but got similar
results, which is why I recoded it with pickle, since it's lower
level. Any help is appreciated :-)

-Andrew



