Re: Memory Allocation in Java
- From: Christopher Smith <csmith@xxxxxxxxxxxxxxxxxxxxxxx>
- Date: Sun, 27 Aug 2006 11:57:31 -0500
Eric Sosman <esosman@xxxxxxxxxxxxxxxxxxx> wrote in
news:4q-dnXACh4GnKGzZnZ2dnUVZ_rKdnZ2d@xxxxxxxxxxx:
Christopher Smith wrote:
Patricia Shanahan <pats@xxxxxxx> wrote in
news:Ha7Ig.1997$bM.233@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx:
Eric Sosman wrote:
Christopher Smith wrote:
Hi All -
Problem: I have a large array of floating point numbers I need to
look for. These results come from a brut-force grid search, where
the coordinates (x,y) are non-parametric test results.
The problem is that the length of x and y, and thus the size of the
grid is quite large. The length is a minimum of 120,000 both
directions on the grid, for a total of 14,400,000,000 possible
combinations. Which obviously consumes a lot of memory --
somewhere on the order of 500 MB, if 32-bit floating point.
... for suitable values of "somewhere on the order of."
120000 * 120000 * 4 = 57600000000 ~= 55000 MB ~= 54 GB. Are
you sure the dimensions you've given are correct?
If the dimensions are correct, I hope you have a 64-bit
JVM and a pretty substantial machine to work with.
Good point. I didn't check the arithmetic on the memory size.
This raises a whole different set of issues. Maybe brute force is not
the way to go.
How many of the elements of the matrix are non-zero? Maybe this is a
case for sparse matrix techniques?
If dense, it might be better to keep it on disk. That raises a whole
set of issues of whether it is possible to batch and sort updates to
the matrix to reduce the amount of I/O.
Patricia
Thanks for straightnening out my math.
I do have horsepower, but don't want to tie up that many resources.
No, sparse matrix math won't work. Every field has a value.
It's a startlingly large number of values; may I ask where
they all came from? Just curious, really.
Sure. The numbers represent the (mean value/stdev value) of a longtudinal
time series, when tested against two conditions, x & y. In other words, it
looks to find the best result on daily basis historically when a sample is
tested against these two conditions (i.e., "in-sample"). Think of it as
looking back in time, testing against two conditions with perfect
knowledge, looking at the results for those two conditions, x & y, and find
those two conditional tests that give the best results over the time series
ex-post. I actually want to test against six conditions when I'm done, but
am working which are the the two which explain return variance most
significantly.
Amusing factoid: There are about 3.4 times as many fields
as there are distinct `float' values.
What is a "field" mean in this case?
I guess divide and conquer is the right way to go. What I can do is
splice the grid-search into quadrants, process each quadrant, and hen
and record the quadrant results (i.e., I'm searching for the Max
within the grid). From there, it's just a matter of rolling through
the quadrants.
Could you explain the nature of this search a little more?
Simply "searching for the Max" in a big collection of numbers
requires very little memory; there's no need to retain a number
that's known to be non-maximal.
I run through these test conditions over a time series, starting with the
lowest test parameter for x and for y. I then iterate over the time series,
looking at each time-record-entry for a set of conditions, including x&y,
if those conditions are true, and the values are greater than x or y, I
then record the entry. At the conclusion of the time series, I then
iterate the result to test again, with x += x + x_step and y += y + y_step.
After retesting 120k^2 times, I then look across the grid to determine
which x and y test provide the maximum value. This concludes the in-
sample testing. I then test out-of-sample to check projection error, but
that's a whole other discussion! HTH.
I guess what you're saying is that I can discard all previous numbers each
time I find a new maximum? If so, what would be the computational tradeoff
between storing the results and testing for the max result, as opposed to
testing for the max result as I go?
Chris
.
- Follow-Ups:
- Re: Memory Allocation in Java
- From: Eric Sosman
- Re: Memory Allocation in Java
- From: Patricia Shanahan
- Re: Memory Allocation in Java
- References:
- Memory Allocation in Java
- From: Christopher Smith
- Re: Memory Allocation in Java
- From: Eric Sosman
- Re: Memory Allocation in Java
- From: Patricia Shanahan
- Re: Memory Allocation in Java
- From: Christopher Smith
- Re: Memory Allocation in Java
- From: Eric Sosman
- Memory Allocation in Java
- Prev by Date: Re: Problem installing JDK in Win98 SE
- Next by Date: Re: Memory Allocation in Java
- Previous by thread: Re: Memory Allocation in Java
- Next by thread: Re: Memory Allocation in Java
- Index(es):
Relevant Pages
|
|