lock file problems

From: Brett (NoSpam_at_grantb.org)
Date: 12/21/04


Date: 21 Dec 2004 06:56:18 -0800

I am working with a Monte Carlo simulation that writes each run
consecutively to an output file and then writes a statistical summary at
the end. One line of that summary is a summary of the summary, and this
one line is also written to a summary output file. The file I/O is set
up to create a lock file whenever this second file is opened and written
to, so as to prevent file collisions. In general, it appears to work.

Further, for a given task many different points in a grid are simulated,
so we use Sun's gridware to have a cluster of about 50 machines run 50+
simulations simultaneously. Each grid is about 2000 points. Each
machine is different - some are multiprocessors, some aren't. The
clock speeds of the computers are all different.

The problem is that while I might get 2000 output files, I may only get
XX summary lines in the summary output file, where XX < 2000. How many
lines are missing seems to be random. As near as I can figure out,
multiple simulations try to write to the same file at exactly the same
time, which causes problems. I believe I lose lines when two
executables try to create the lock file at the same time. One
executable actually gets it, while the other thinks it has it, but
doesn't. Sometimes I get an error, sometimes I don't.
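
To illustrate the race suspected above: if the lock is taken in two
steps (check whether summary.lck exists, then create it), two processes
can both pass the check before either one creates the file, and both
then believe they hold the lock. The following is a minimal sketch in
Python, not the simulation's Fortran 77 code (which is not shown here),
of one way to close that window by creating the lock file with an
exclusive-create open, so that checking and creating happen as a single
operation. The function name, the retry parameters, and the summary
file path are hypothetical. Whether exclusive creation is truly atomic
on a shared network file system depends on the file system and its
version, so treat that as an assumption to verify on the cluster.

```python
import os
import time


def append_with_lock(summary_path, lock_path, line,
                     retry_delay=0.01, max_tries=10000):
    """Append one line to summary_path while holding lock_path.

    Hypothetical helper for illustration only; the names and retry
    policy are assumptions, not taken from the original simulation.
    """
    for _ in range(max_tries):
        try:
            # O_CREAT | O_EXCL asks the OS to create the file only if it
            # does not already exist, as one operation. If the file is
            # there, the call fails instead of silently succeeding.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break
        except FileExistsError:
            # Someone else holds the lock: wait briefly and try again.
            time.sleep(retry_delay)
    else:
        raise TimeoutError("could not acquire " + lock_path)

    try:
        # Only the lock holder reaches this point, so appends do not overlap.
        with open(summary_path, "a") as f:
            f.write(line + "\n")
    finally:
        # Release the lock even if the write failed.
        os.close(fd)
        os.remove(lock_path)
```

Each simulation would call this once at the end of its run, for example
append_with_lock("summary.out", "summary.lck", line), where summary.out
is a placeholder name. A process that dies while holding the lock
leaves summary.lck behind, so stale locks would need separate handling.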

If I get an error, it is typically:

open: can't stat file
apparent state: unit 36 named summary.lck
lately writing sequential formatted external IO

which leads me to believe that some form of file collision is taking
place, as the lockfile code has a different error message.

Since the summary line appears in both the output and summary files, and
the output file names are all unique, I have been post-processing the
output files and ignoring the summary file. However, my latest task
generated 2 million+ output files, and grep'ing and sed'ing that many
files is starting to take time. It would be nice if I could just get
the summary output file to work correctly, but I am unsure how to do
that. I have only been doing this for a few months, so I am learning
Fortran as I go. The simulation was written ages ago (>25 years) in
f77.

I would think that due to the nature of gridware, the start and stop
times of each simulation are random, although I haven't tested this.
It's hard for me to believe that two executables would try to set the
lock file at exactly the same time, as the hardware on many of the
machines is different. It seems to me that everything is random enough
that this really wouldn't be an issue, but it is.

I am not sure that this is the proper newsgroup to ask a question like
this, but I thought that I would start here.

Thanks,
Brett


