lock file problems
From: Brett (NoSpam_at_grantb.org)
Date: 12/21/04
Date: 21 Dec 2004 06:56:18 -0800
I am working with a Monte Carlo simulation that outputs each run
consecutively to a file and then writes out the statistical summary at
the end. One line is a summary of the summary. This one line is also
written to a summary output file. The file i/o is set up so as to
create a lock file when this second file is opened and written to so as
to prevent file collision. In general, it appears to work.
Further, for a task, many different points in a grid are simulated, so
we use Sun's gridware to have a cluster of machines (about 50) to run
50+ simulations simultaneously. Each grid is about 2000 points. Each
machine is different - some are multiprocessors, some aren't. The
clock speeds of the computers are all different.
The problem is that while I might get 2000 output files, I may only get
XX summary output lines in the summary output file, where XX < 2000.
How many lines are missing seems to be random. As near as I can figure
out, multiple simulations try to write to the same file at the same
exact time, which causes problems. I believe I lose lines when two
executables try to create the lockfile at the same time. One
executable actually gets it, while the other thinks it gets it, but
doesn't. Sometimes I get an error, sometimes I don't.
If I get an error, it is typically:
    open: can't stat file
    apparent state: unit 36 named summary.lck
    lately writing sequential formatted external IO
which leads me to believe that some form of file collision is taking
place, as the lockfile code has a different error message.
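The suspected failure mode above (two executables both believing they hold the lock) is what happens when "does the lock file exist?" and "create the lock file" are two separate steps. A minimal sketch of the usual remedy follows, in Python rather than the simulation's f77, under these assumptions: the function names acquire_lock, release_lock and append_summary are hypothetical, the original lock code is not shown in this post so this is not a copy of it, and the file system honours exclusive creation. Whether that last assumption holds on a network file system shared by the cluster depends on the file system and its version.

```python
import os
import time


def acquire_lock(lock_path, timeout=30.0, poll=0.05):
    """Create lock_path atomically. Return True if we got it, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # O_CREAT | O_EXCL makes "check that it is absent" and "create it"
            # a single operation: the call fails if the file already exists.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
        except FileExistsError:
            # Someone else holds the lock; wait and retry until the deadline.
            if time.monotonic() >= deadline:
                return False
            time.sleep(poll)
            continue
        # Record the owner's process id to help debug stale locks.
        os.write(fd, str(os.getpid()).encode())
        os.close(fd)
        return True


def release_lock(lock_path):
    """Remove the lock file so the next writer can proceed."""
    os.remove(lock_path)


def append_summary(summary_path, lock_path, line):
    """Append one summary line while holding the lock."""
    if not acquire_lock(lock_path):
        raise TimeoutError("could not acquire " + lock_path)
    try:
        with open(summary_path, "a") as f:
            f.write(line.rstrip("\n") + "\n")
    finally:
        # Always release, even if the write fails.
        release_lock(lock_path)
```

A caller would use it as append_summary("summary.out", "summary.lck", text), where "summary.out" is a placeholder name. The design point is only that a process that loses the race gets an explicit failure and retries, instead of proceeding as if it held the lock.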
Since the summary line appears in both the output and summary files, and
the output file names are all unique, I am currently
post-processing the output files and ignoring the summary file.
However, my latest task generated 2 million+ output files and grep'ing
and sed'ing that many files is starting to take time. It would be nice
if I could just get the summary output file to work correctly, but I am
unsure as to how to do that. I have only been doing this for a few
months, so I am learning Fortran as I go. The simulation was written
ages ago (>25 years) in f77.
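For the post-processing workaround described above, a single pass over the output directory is one alternative to running grep and sed on each file. The sketch below is in Python and rests on assumptions not stated in this post: that the summary line can be recognised by a fixed prefix (the marker "SUMMARY" is hypothetical), that each output file holds at most one such line, and that all output files sit in one directory. The function name collect_summaries is also hypothetical.

```python
import os

SUMMARY_MARKER = "SUMMARY"  # hypothetical prefix; the real format is not given


def collect_summaries(output_dir, combined_path, marker=SUMMARY_MARKER):
    """Copy the first line starting with marker from each file in output_dir
    into combined_path. Return the number of lines copied."""
    combined_abs = os.path.abspath(combined_path)
    count = 0
    with open(combined_path, "w") as out:
        # scandir streams directory entries instead of building one big list.
        with os.scandir(output_dir) as entries:
            for entry in entries:
                if not entry.is_file():
                    continue
                # Do not read the combined file if it lives in output_dir.
                if os.path.abspath(entry.path) == combined_abs:
                    continue
                with open(entry.path, "r", errors="replace") as f:
                    for line in f:
                        if line.startswith(marker):
                            out.write(line if line.endswith("\n") else line + "\n")
                            count += 1
                            break  # assume one summary line per file
    return count
```

Because only one process writes the combined file, no lock file is needed for this step. The returned count can be compared against the number of output files to spot runs with no summary line.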
I would think that due to the nature of gridware, the start and stop
times of each simulation are random, although I haven't tested this.
It's hard for me to believe that two executables would try to set the
lockfile at the same exact time, as the hardware on many of the machines
is different. It seems to me that everything is random enough that
this really wouldn't be an issue, but it is.
I am not sure that this is the proper newsgroup to ask a question like
this, but I thought that I would start here.
Thanks,
Brett