file locking...

Nigel Rantor wiggly at wiggly.org
Sun Mar 1 13:11:41 EST 2009


Hi Bruce,

Excuse me if I'm a little blunt below. I'm ill and grumpy...

bruce wrote:
> hi nigel...
> 
> using any kind of file locking process requires that i essentially have a
> gatekeeper, allowing only a single process at a time to enter and access
> the files...

I don't believe this is a necessary condition. That would only be the 
case if you allowed yourself a single lock.

> i can easily setup a file read/write lock process where a client app
> gets/locks a file, and then copies/moves the required files from the initial
> dir to a tmp dir. after the move/copy, the lock is released, and the client
> can go ahead and do whatever with the files in the tmp dir.. this process
> allows multiple clients to operate in a pseudo-parallel manner...
> 
> i'm trying to figure out if there's a much better/faster approach that might
> be available.. which is where the academic/research issue was raised..

I'm really not sure why you want to move the files around. Here are two 
approaches, different from the one I initially gave you, that deal 
perfectly well with a directory where files are constantly being added.

In both approaches we are going to avoid OS-specific locking mechanisms 
(advisory locking, flock, etc.), so they should work everywhere as long 
as you also have write access to the filesystem you're on.


Approach 1 - Constant Number of Processes

This requires no central manager, but each file lock costs a few 
OS calls.

Start up N processes with the same working directory WORK_DIR.

Each process then follows this algorithm (there's a Python sketch of it 
after the example below):

- sleep for some small random period.

- scan the WORK_DIR for a FILE that does not have a corresponding LOCK_FILE

- open LOCK_FILE in append mode and write our PID into it.

- close LOCK_FILE

- open LOCK_FILE

- read first line from LOCK_FILE and compare to our PID

- if the PID we just read from the LOCK_FILE matches ours, then we may 
process the corresponding FILE; otherwise another process beat us to it.

- repeat

After processing a file completely you can remove it and the lockfile at 
the same time.

As long as filenames follow some pattern, you can simply say that 
the LOCK_FILE for FILE is called FILE.lock

e.g.

WORK_DIR  : /home/wiggly/var/work
FILE      : /home/wiggly/var/work/data_2354272.dat
LOCK_FILE : /home/wiggly/var/work/data_2354272.dat.lock
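
Here's a minimal Python sketch of the above, assuming the layout in 
that example. process() is a hypothetical stand-in for your per-file 
work, and the append-then-read trick relies on small O_APPEND writes 
being atomic, which holds on local filesystems but not necessarily 
on NFS.

import os
import random
import time

WORK_DIR = "/home/wiggly/var/work"   # as in the example above
PID = os.getpid()

def process(path):
    # hypothetical stand-in for whatever you do with each file
    print("processing %s" % path)

def try_lock(path):
    # Append our PID to the lock file; several processes may do this
    # concurrently. Whoever's PID ends up on the first line wins.
    lock_path = path + ".lock"
    with open(lock_path, "a") as f:
        f.write("%d\n" % PID)
    with open(lock_path) as f:
        return f.readline().strip() == str(PID)

def run():
    while True:
        # sleep for some small random period
        time.sleep(random.uniform(0.1, 1.0))
        for name in os.listdir(WORK_DIR):
            if name.endswith(".lock"):
                continue
            path = os.path.join(WORK_DIR, name)
            if os.path.exists(path + ".lock"):
                continue              # another process claimed it already
            if try_lock(path):
                process(path)
                # remove the file and its lockfile at the same time
                os.remove(path)
                os.remove(path + ".lock")

if __name__ == "__main__":
    run()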


Approach 2 - Managed Processes

Here we have a single main process that spawns children. The children 
listen for filenames on a pipe that the parent has open to them.

The parent constantly scans the WORK_DIR for new files to process and, as 
it finds one, sends that filename to a child process.

You can either be clever about the children and ensure they tell the 
parent when they're free or just pass them work in a round-robin fashion.
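
And a minimal sketch of this approach, using multiprocessing.Queue 
(new in Python 2.6) as a stand-in for the pipes; a shared queue 
effectively gives you the children-take-work-when-free behaviour 
without explicit round-robin. Again, process() is hypothetical.

import os
import time
import multiprocessing

WORK_DIR = "/home/wiggly/var/work"   # as in the earlier example
NUM_CHILDREN = 4

def process(path):
    # hypothetical stand-in for whatever you do with each file
    print("processing %s" % path)

def child(queue):
    # Block until the parent sends us a filename; None means shut down.
    while True:
        path = queue.get()
        if path is None:
            break
        process(path)
        os.remove(path)

def parent():
    queue = multiprocessing.Queue()
    children = [multiprocessing.Process(target=child, args=(queue,))
                for _ in range(NUM_CHILDREN)]
    for c in children:
        c.start()
    queued = set()
    try:
        while True:
            # constantly scan WORK_DIR for new files to process
            for name in os.listdir(WORK_DIR):
                path = os.path.join(WORK_DIR, name)
                if path not in queued:
                    queued.add(path)
                    queue.put(path)
            time.sleep(1.0)
    finally:
        for _ in children:
            queue.put(None)           # one shutdown sentinel per child
        for c in children:
            c.join()

if __name__ == "__main__":
    parent()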

I hope the two descriptions above make sense; let me know if they don't.

    n


> the issue that i'm looking at is analogous to a FIFO, where i have lots of
> files being shoved in a dir from different processes.. on the other end, i
> want to allow multiple client processes to access unique groups of these
> files as fast as possible.. access being fetch/gather/process/delete the
> files. each file is only handled by a single client process.
> 
> thanks..
> 
> 
> 
> -----Original Message-----
> From: python-list-bounces+bedouglas=earthlink.net at python.org
> [mailto:python-list-bounces+bedouglas=earthlink.net at python.org]On Behalf
> Of Nigel Rantor
> Sent: Sunday, March 01, 2009 2:00 AM
> To: koranthala
> Cc: python-list at python.org
> Subject: Re: file locking...
> 
> 
> koranthala wrote:
>> On Mar 1, 2:28 pm, Nigel Rantor <wig... at wiggly.org> wrote:
>>> bruce wrote:
>>>> Hi.
>>>> Got a bit of a question/issue that I'm trying to resolve. I'm asking
>>>> this of a few groups so bear with me.
>>>> I'm considering a situation where I have multiple processes running,
>>>> and each process is going to access a number of files in a dir. Each
>>>> process accesses a unique group of files, and then writes the group
>>>> of files to another dir. I can easily handle this by using a form of
>>>> locking, where I have the processes lock/read a file and only access
>>>> the group of files in the dir based on the  open/free status of the
>>>> lockfile.
>>>> However, the issue with the approach is that it's somewhat
>>>> synchronous. I'm looking for something that might be more
>>>> asynchronous/parallel, in that I'd like to have multiple processes
>>>> each access a unique group of files from the given dir as fast as
>>>> possible.
>>> I don't see how this is synchronous if you have a lock per file. Perhaps
>>> you've missed something out of your description of your problem.
>>>
>>>> So.. Any thoughts/pointers/comments would be greatly appreciated. Any
>>>>  pointers to academic research, etc.. would be useful.
>>> I'm not sure you need academic papers here.
>>>
>>> One trivial solution to this problem is to have a single process
>>> determine the complete set of files that require processing then fork
>>> off children, each with a different set of files to process.
>>>
>>> The parent then just waits for them to finish and does any
>>> post-processing required.
>>>
>>> A more concrete problem statement may of course change the solution...
>>>
>>>    n
>> Using twisted might also be helpful.
>> Then you can avoid the problems associated with threading too.
> 
> No one mentioned threads.
> 
> I can't see how Twisted in this instance isn't like using a sledgehammer
> to crack a nut.
> 
>    n
> 



