how to organize a module that requires a data file

Larry Bates larry.bates at websafe.com
Thu Nov 17 15:17:49 EST 2005


Personally I would do this as a class and pass the path to the data
file as an argument when instantiating it (and maybe try to locate the
file for the user if they don't pass one).  Something like:

class morph:
    def __init__(self, pathtodictionary=None):
        if pathtodictionary is None:
            #
            # Insert code here to see if it is in the current
            # directory and/or look in other directories.
            #
            pass

        try:
            fp = open(pathtodictionary, 'r')
        except (IOError, TypeError):
            # TypeError covers the case where the path is still None.
            print("unable to locate dictionary at: %s" % pathtodictionary)
        else:
            #
            # Insert code here to load data from .txt file
            #
            fp.close()

    def get_stem(self, arg1, arg2):
        #
        # Code for get_stem method
        #
        pass
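
The "look in other directories" step in __init__ above could be filled in
along these lines. This is only a rough sketch: the helper name
find_dictionary, its extra_dirs parameter, and the search order are
assumptions for illustration, not something from the original module.

```python
import os

def find_dictionary(filename, extra_dirs=()):
    """Return the first existing path to filename, or None.

    Looks in the current directory, then in the directory holding this
    module, then in any extra_dirs supplied by the caller.
    """
    candidates = [os.getcwd()]
    try:
        candidates.append(os.path.dirname(os.path.abspath(__file__)))
    except NameError:
        # __file__ is not defined in an interactive session.
        pass
    candidates.extend(extra_dirs)

    for directory in candidates:
        path = os.path.join(directory, filename)
        if os.path.isfile(path):
            return path
    return None
```

__init__ could then call find_dictionary() with whatever file name the
data file actually has when pathtodictionary is None, and report the
failure if it comes back with None.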

The other way I've done this is to have a .INI file that always lives
in the same directory as the class with an entry in it that points me
to where the .txt file lives.
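
A minimal sketch of the .INI approach, assuming a file called morph.ini
with a [morph] section and a "dictionary" entry. All three names, and both
helper functions, are made up for illustration. RawConfigParser is used so
that a '%' in a path is not treated as interpolation syntax.

```python
import os
try:
    from configparser import RawConfigParser   # Python 3
except ImportError:
    from ConfigParser import RawConfigParser   # Python 2

def ini_beside(module_file, name='morph.ini'):
    """Return the path of an .INI file in the same directory as module_file."""
    return os.path.join(os.path.dirname(os.path.abspath(module_file)), name)

def dictionary_path_from_ini(ini_path, section='morph', option='dictionary'):
    """Return the data file path recorded in ini_path, or None if absent."""
    config = RawConfigParser()
    # read() skips files it cannot open and returns the list it parsed.
    if not config.read(ini_path):
        return None
    if not config.has_option(section, option):
        return None
    return config.get(section, option)

# Inside the module itself this would be used roughly as:
#     path = dictionary_path_from_ini(ini_beside(__file__))
```

The .INI file stays small, so keeping it next to the code is not a
problem, and the large .txt file can live wherever the entry points.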

Hope this helps.

-Larry Bates

Steven Bethard wrote:
> Ok, so I have a module that is basically a Python wrapper around a big
> lookup table stored in a text file[1].  The module needs to provide a
> few functions::
> 
>     get_stem(word, pos, default=None)
>     stem_exists(word, pos)
>     ...
> 
> Because there should only ever be one lookup table, I feel like these
> functions ought to be module globals.  That way, you could just do
> something like::
> 
>     import morph
>     assist = morph.get_stem('assistance', 'N')
>     ...
> 
> My problem is with the text file.  Where should I keep it?  If I want to
> keep the module simple, I need to be able to identify the location of
> the file at module import time.  That way, I can read all the data into
> the appropriate Python structure, and all my module-level functions will
> work immediately after import.
> 
> I can only think of a few obvious places where I could find the text
> file at import time -- in the same directory as the module (e.g.
> lib/site-packages), in the user's home directory, or in a directory
> indicated by an environment variable.  The first seems weird because the
> text file is large (about 10MB) and I don't really see any other
> packages putting data files into lib/site-packages.  The second seems
> weird because it's not a per-user configuration - it's a data file
> shared by all users.  And the third seems weird because my
> experience with a configuration depending heavily on environment
> variables is that this is difficult to maintain.
> 
> If I don't mind complicating the module functions a bit (e.g. by
> starting each function with "if _lookup_table is not None"), I could
> allow users to specify a location for the file after the module is
> imported, e.g.::
> 
>     import morph
>     morph.setfile(r'C:\resources\morph_english.flat')
>     ...
> 
> Then all the module-level functions would have to raise Exceptions until
> setfile() was called.  I don't like that the user would have to
> configure the module each time they wanted to use it, but perhaps that's
> unavoidable.
> 
> Any suggestions?  Is there an obvious place to put the text file that
> I'm missing?
> 
> Thanks in advance,
> 
> STeVe
> 
> [1] In case you're curious, the file is a list of words and their
> morphological stems provided by the University of Pennsylvania.
