how to organize a module that requires a data file

Steven Bethard steven.bethard at gmail.com
Thu Nov 17 14:18:51 EST 2005


Ok, so I have a module that is basically a Python wrapper around a big 
lookup table stored in a text file[1].  The module needs to provide a 
few functions::

     get_stem(word, pos, default=None)
     stem_exists(word, pos)
     ...

Because there should only ever be one lookup table, I feel like these 
functions ought to be module globals.  That way, you could just do 
something like::

     import morph
     assist = morph.get_stem('assistance', 'N')
     ...

My problem is with the text file.  Where should I keep it?  If I want to 
keep the module simple, I need to be able to identify the location of 
the file at module import time.  That way, I can read all the data into 
the appropriate Python structure, and all my module-level functions will 
work immediately after import.

I can only think of a few obvious places where I could find the text 
file at import time -- in the same directory as the module (e.g. 
lib/site-packages), in the user's home directory, or in a directory 
indicated by an environment variable.  The first seems weird because the 
text file is large (about 10MB) and I don't really see any other 
packages putting data files into lib/site-packages.  The second seems 
weird because it's not a per-user configuration - it's a data file 
shared by all users.  And the third seems weird because my 
experience with a configuration depending heavily on environment 
variables is that this is difficult to maintain.
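For illustration, the first and third options can be combined: look next to the module by default, but let an environment variable override that location. The following is only a sketch under assumptions. The variable name ``MORPH_DATA`` and the helper ``find_data_file`` are hypothetical, and the default file name is borrowed from the ``setfile()`` example later in this post.

```python
import os

# Hypothetical names, not part of any existing module.
_ENV_VAR = 'MORPH_DATA'
_DEFAULT_NAME = 'morph_english.flat'


def find_data_file(module_file, environ=None):
    """Return the first candidate data file path that exists, or None.

    module_file is normally the importing module's __file__.
    environ defaults to os.environ; passing a dict makes testing easier.
    """
    if environ is None:
        environ = os.environ
    candidates = []
    # An explicit override wins if the variable is set and non-empty.
    if environ.get(_ENV_VAR):
        candidates.append(environ[_ENV_VAR])
    # Otherwise fall back to the directory that contains the module.
    module_dir = os.path.dirname(os.path.abspath(module_file))
    candidates.append(os.path.join(module_dir, _DEFAULT_NAME))
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None
```

Inside the module this would be called as ``find_data_file(__file__)`` at import time. Most users would never set the variable, so the environment dependence stays optional rather than required.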

If I don't mind complicating the module functions a bit (e.g. by 
starting each function with "if _lookup_table is not None"), I could 
allow users to specify a location for the file after the module is 
imported, e.g.::

     import morph
     morph.setfile(r'C:\resources\morph_english.flat')
     ...

Then all the module-level functions would have to raise Exceptions until 
setfile() was called.  I don't like that the user would have to 
configure the module each time they wanted to use it, but perhaps that's 
unavoidable.
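A minimal sketch of that ``setfile()`` approach might look like the following. The file format is an assumption made purely for illustration (one tab-separated ``word``, ``pos``, ``stem`` triple per line); the real file may be laid out differently. The ``_table()`` helper is also hypothetical. It keeps the "has the table been loaded?" check in one place instead of repeating it at the top of every function.

```python
_lookup_table = None  # stays None until setfile() is called


def setfile(path):
    """Load the lookup table from path.

    Assumed format (hypothetical): word<TAB>pos<TAB>stem, one entry per line.
    Lines that do not have exactly three fields are skipped.
    """
    global _lookup_table
    table = {}
    with open(path) as f:
        for line in f:
            fields = line.rstrip('\n').split('\t')
            if len(fields) == 3:
                word, pos, stem = fields
                table[word, pos] = stem
    _lookup_table = table


def _table():
    """Return the loaded table, or raise if setfile() has not been called."""
    if _lookup_table is None:
        raise RuntimeError('call setfile(path) before using this module')
    return _lookup_table


def get_stem(word, pos, default=None):
    return _table().get((word, pos), default)


def stem_exists(word, pos):
    return (word, pos) in _table()
```

With this layout the per-function cost is a single call to ``_table()``, and the error message tells the user exactly what to do.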

Any suggestions?  Is there an obvious place to put the text file that 
I'm missing?

Thanks in advance,

STeVe

[1] In case you're curious, the file is a list of words and their 
morphological stems provided by the University of Pennsylvania.


