Storing a big amount of path names

Rob Gaddi rgaddi at highlandtechnology.invalid
Thu Feb 11 21:17:49 EST 2016


Tim Chase wrote:

> On 2016-02-12 00:31, Paulo da Silva wrote:
>> What is the best (shortest memory usage) way to store lots of
>> pathnames in memory where:
>> 
>> 1. Path names are pathname=(dirname,filename)
>> 2. There are many different dirnames, but far fewer than pathnames
>> 3. dirnames have in general many chars
>> 
>> The idea is to share the common dirnames.
>
> Well, you can create a dict that has dirname->list(filenames) which
> will reduce the dirname to a single instance.  You could store that
> dict in the class, shared by all of the instances, though that starts
> to pick up a code-smell.
>
> But unless you're talking about an obscenely large number of
> dirnames & filenames, or a severely resource-limited machine, just
> use the default built-ins.  If you start to push the boundaries of
> system resources, then I'd try the "anydbm" module or use the
> "shelve" module to marshal them out to disk.  Finally, you *could*
> create an actual sqlite database on disk if size really does exceed
> reasonable system specs.
>
> -tkc
>

It's probably more memory-efficient to make a list of lists, and just
declare that element[0] of each list is the dirname.  That way you're
not wasting memory on the unused entries of the hash table.

But unless the OP has both a) upwards of a million entries and b) let's
say at least 20 filenames per dirname, it's not worth doing.
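
For what it's worth, here is a rough sketch of the list-of-lists layout.
The names group_by_dirname and iter_paths are made up for illustration,
and the sample paths are invented.  A dict is still used while building,
but only as a temporary index that is discarded when the function
returns.

```python
def group_by_dirname(pairs):
    """Group (dirname, filename) pairs into [dirname, file1, file2, ...] lists."""
    index = {}    # temporary: dirname -> position of its list in groups
    groups = []
    for dirname, filename in pairs:
        pos = index.get(dirname)
        if pos is None:
            # First time this dirname is seen: start a new inner list
            # whose element[0] is the dirname itself.
            index[dirname] = len(groups)
            groups.append([dirname, filename])
        else:
            groups[pos].append(filename)
    return groups


def iter_paths(groups):
    """Yield (dirname, filename) pairs back out of the grouped structure."""
    for group in groups:
        dirname = group[0]
        for filename in group[1:]:
            yield dirname, filename


# Invented sample data, purely for illustration.
pairs = [
    ("/usr/share/doc", "a.txt"),
    ("/usr/share/doc", "b.txt"),
    ("/var/log", "syslog"),
]
groups = group_by_dirname(pairs)
```

The trade-off is that finding a given dirname afterwards means a linear
scan over the outer list, since the hash table is gone.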

Now, if you really do have a million entries, one thing that would help
with memory is setting __slots__ on MyFile rather than letting it
create an instance dictionary for each instance.
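
A minimal sketch of what that might look like.  The attribute names
dirname and filename are assumptions about what MyFile holds, and
MyFileWithDict exists only for comparison.

```python
class MyFile:
    # Declaring __slots__ stops Python from creating a per-instance
    # __dict__; attributes live in fixed slots instead.
    __slots__ = ("dirname", "filename")

    def __init__(self, dirname, filename):
        self.dirname = dirname      # assumed attribute name
        self.filename = filename    # assumed attribute name


class MyFileWithDict:
    # Same class without __slots__, for comparison only.
    def __init__(self, dirname, filename):
        self.dirname = dirname
        self.filename = filename


# Pass the same dirname string object to every file in that directory
# so the instances share it rather than each holding a copy.
shared_dir = "/usr/share/doc"
slotted = MyFile(shared_dir, "a.txt")
plain = MyFileWithDict(shared_dir, "a.txt")
```

The cost is that a slotted instance can't grow arbitrary new attributes
later; assigning one that isn't listed in __slots__ raises
AttributeError.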

-- 
Rob Gaddi, Highland Technology -- www.highlandtechnology.com
Email address domain is currently out of order.  See above to fix.


