Automatically caching computationally intensive variable values?

Alec Taylor alec.taylor6 at gmail.com
Sat May 26 06:30:16 EDT 2012


I am working with a few of the corpora bundled in nltk-data, using NLTK
(http://nltk.org/), to figure out certain algorithms.

My code generally looks something like this:

	import random
	import re

	from nltk.corpus import reuters

	def find_test_and_train_data():
		"""Split the Reuters file ids into (training, test) lists."""
		fileids = reuters.fileids()
		train = [fileid for fileid in fileids if re.match(r"^training/", fileid)]
		test = [fileid for fileid in fileids if re.match(r"^test/", fileid)]
		return train, test

	def generate_random_data(train_and_test_fileids):
		"""Shuffle the training ids reproducibly; hold out 2000 as a dev set."""
		train = list(train_and_test_fileids[0])
		random.seed(348)
		random.shuffle(train)
		return train[2000:], train[:2000]

	def fileid_words(fileids):
		"""Lower-cased, purely alphabetic words from the given file id(s)."""
		return [word.lower() for word in reuters.words(fileids)
				if re.match(r"^[A-Za-z]+$", word)]

	if __name__ == '__main__':
		train_fileids, dev_fileids = generate_random_data(find_test_and_train_data())
		train_data = fileid_words(train_fileids)
		dev_data = fileid_words(dev_fileids)

If I run this in an interactive interpreter, I can then perform tasks
on `train_data`, `dev_data` and their corresponding fileids without
repopulating the variables (populating them is very time-consuming: it
takes 7 minutes each run).

However, I want to keep my work in a .py file so that I can save the
statistically interesting algorithms.

I could do this by typing everything twice: when I get a function
working in the interpreter, I copy and paste it into the .py file. But
this is quite inefficient, and I lose out on my IDE's features.
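One way to avoid the copy+paste step, sketched here under assumptions: keep the expensive variables alive in the interpreter and put the algorithms in a separate module, re-imported with the standard library's `importlib.reload` after each edit. The module name `nltk_experiments`, its `vocab_size` function and the toy `train_data` list are hypothetical stand-ins; the sketch writes the module to a temporary directory only so that it runs on its own.

```python
import importlib
import pathlib
import sys
import tempfile

# Skip writing .pyc files so a quick rewrite of the module is always recompiled.
sys.dont_write_bytecode = True

# Hypothetical module standing in for the .py file edited in the IDE.
workdir = pathlib.Path(tempfile.mkdtemp())
sys.path.insert(0, str(workdir))
module_file = workdir / "nltk_experiments.py"
module_file.write_text(
    "def vocab_size(words):\n"
    "    return len(set(words))\n"
)
importlib.invalidate_caches()
nltk_experiments = importlib.import_module("nltk_experiments")

# Stand-in for the expensive train_data: computed once, kept in memory.
train_data = ["the", "cat", "sat", "on", "the", "mat"]
print(nltk_experiments.vocab_size(train_data))  # 5

# Simulate editing the algorithm in the IDE, then reload only the module.
module_file.write_text(
    "def vocab_size(words):\n"
    "    return len({w for w in words if len(w) > 2})\n"
)
importlib.reload(nltk_experiments)
print(nltk_experiments.vocab_size(train_data))  # 4; train_data was not rebuilt
```

In a real session the pattern would be: run the data-loading script once with `python -i`, then call `importlib.reload(...)` on the algorithms module after each save. Note that `reload` only re-executes the module; objects created from the old version (for example instances of its classes) keep the old code.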

Are there any IDEs or Python modules that can automatically keep the
Python script running in memory, or store the value of a variable (such
as `train_data` or `dev_data`) in a database?
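A minimal sketch of the "store the value" idea, assuming a pickle file on disk is acceptable in place of a database. The `disk_cached` decorator, `load_data` and the cache path are hypothetical names; `load_data` stands in for the 7-minute computation. The wrapped zero-argument function runs once, its result is pickled, and later calls (including in a fresh Python process) load the pickle instead of recomputing.

```python
import functools
import os
import pickle
import tempfile

def disk_cached(path):
    """Cache a zero-argument function's result in a pickle file at `path`."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper():
            if os.path.exists(path):
                # Saved by an earlier call or an earlier run: just load it.
                with open(path, "rb") as f:
                    return pickle.load(f)
            result = func()
            with open(path, "wb") as f:
                pickle.dump(result, f, protocol=pickle.HIGHEST_PROTOCOL)
            return result
        return wrapper
    return decorator

# Hypothetical cache location; in practice, a fixed file next to the script.
CACHE = os.path.join(tempfile.mkdtemp(), "reuters_split.pickle")
compute_count = 0

@disk_cached(CACHE)
def load_data():
    # Stand-in for building the real train/dev data (the slow part).
    global compute_count
    compute_count += 1
    return ["training/1", "training/2"], ["training/3"]

train_fileids, dev_fileids = load_data()  # computes and writes the pickle
train_fileids, dev_fileids = load_data()  # loads the pickle; no recompute
```

The cache is never invalidated: if `fileid_words` or the split changes, delete the pickle file to force a recompute. The standard library's `shelve` module gives a similar dict-like on-disk store if several named values need saving. Only unpickle files from trusted sources.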

Thanks for any suggestions.
