[Tutor] More efficient than glob?

Yigal Duppen yduppen@xs4all.nl
Thu Nov 7 10:40:32 2002


On Thursday 07 November 2002 16:20, Doug.Shawhan@gecits.ge.com wrote:
> I have a script that parses the contents of logfiles in a directory, puts
> the content in a database, then deletes the original logfile. The files are
> produced at a rate of about one per minute.
>
> Since there is well over a year's worth of logfiles (> 31536000 files) the
> following method:
> >>> import glob
> >>> files = glob.glob('\\tmp\\*')
> ...would seem to use an undue amount of memory.
> Is there a more efficient way of doing this?

Usually I'd say generators, but in this particular case that won't work; all
functions that return a list of files do _not_ use generators.

So I can't give you a general solution, but if your logfiles have sensible
names, you could use those names to partition them.

For example, let's assume that each logfile has the form
mm-dd-hh-MM
(two digits for the month, two for the day, etc...)
We could then process all files by first looping through the files of
month 01, then those of month 02...
By using an iterator, you would get something like this (untested):

import glob

def log_glob():
    # Glob one month's worth of files at a time and yield them,
    # instead of building one huge list for the whole directory.
    for i in range(12):
        month = "%02d" % (i + 1, )
        files = glob.glob('\\tmp\\%s-*' % (month, ))
        for f in files:
            yield f

for f in log_glob():
    do_stuff(f)

This would already use 12 times less memory.
You could also iterate over both month and day, or ...
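
For instance, here is a rough sketch of the month-and-day variant (just as
untested, and still assuming the mm-dd-hh-MM naming and your own do_stuff):

def log_glob_by_day():
    # One glob per (month, day) pair, so each intermediate list holds
    # at most a day's worth of files (about 1440 at one per minute).
    for month in range(1, 13):
        for day in range(1, 32):
            pattern = '\\tmp\\%02d-%02d-*' % (month, day)
            for f in glob.glob(pattern):
                yield f

for f in log_glob_by_day():
    do_stuff(f)

Days that don't exist (like 02-30) simply match nothing, so the extra
iterations cost almost nothing.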

YDD
--
http://www.xs4all.nl/~yduppen