Re: Memory Allocation in Java



Eric Sosman <esosman@xxxxxxxxxxxxxxxxxxx> wrote in
news:4q-dnXACh4GnKGzZnZ2dnUVZ_rKdnZ2d@xxxxxxxxxxx:

Christopher Smith wrote:
Patricia Shanahan <pats@xxxxxxx> wrote in
news:Ha7Ig.1997$bM.233@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx:


Eric Sosman wrote:

Christopher Smith wrote:


Hi All -

Problem: I have a large array of floating point numbers I need to
look for. These results come from a brut-force grid search, where
the coordinates (x,y) are non-parametric test results.

The problem is that the length of x and y, and thus the size of the
grid is quite large. The length is a minimum of 120,000 both
directions on the grid, for a total of 14,400,000,000 possible
combinations. Which obviously consumes a lot of memory --
somewhere on the order of 500 MB, if 32-bit floating point.

... for suitable values of "somewhere on the order of."
120000 * 120000 * 4 = 57600000000 ~= 55000 MB ~= 54 GB. Are
you sure the dimensions you've given are correct?

If the dimensions are correct, I hope you have a 64-bit
JVM and a pretty substantial machine to work with.


Good point. I didn't check the arithmetic on the memory size.

This raises a whole different set of issues. Maybe brute force is not
the way to go.

How many of the elements of the matrix are non-zero? Maybe this is a
case for sparse matrix techniques?

If dense, it might be better to keep it on disk. That raises a whole
set of issues of whether it is possible to batch and sort updates to
the matrix to reduce the amount of I/O.

Patricia


Thanks for straightnening out my math.

I do have horsepower, but don't want to tie up that many resources.

No, sparse matrix math won't work. Every field has a value.

It's a startlingly large number of values; may I ask where
they all came from? Just curious, really.


Sure. The numbers represent the (mean value/stdev value) of a longtudinal
time series, when tested against two conditions, x & y. In other words, it
looks to find the best result on daily basis historically when a sample is
tested against these two conditions (i.e., "in-sample"). Think of it as
looking back in time, testing against two conditions with perfect
knowledge, looking at the results for those two conditions, x & y, and find
those two conditional tests that give the best results over the time series
ex-post. I actually want to test against six conditions when I'm done, but
am working which are the the two which explain return variance most
significantly.

Amusing factoid: There are about 3.4 times as many fields
as there are distinct `float' values.

What is a "field" mean in this case?


I guess divide and conquer is the right way to go. What I can do is
splice the grid-search into quadrants, process each quadrant, and hen
and record the quadrant results (i.e., I'm searching for the Max
within the grid). From there, it's just a matter of rolling through
the quadrants.

Could you explain the nature of this search a little more?
Simply "searching for the Max" in a big collection of numbers
requires very little memory; there's no need to retain a number
that's known to be non-maximal.


I run through these test conditions over a time series, starting with the
lowest test parameter for x and for y. I then iterate over the time series,
looking at each time-record-entry for a set of conditions, including x&y,
if those conditions are true, and the values are greater than x or y, I
then record the entry. At the conclusion of the time series, I then
iterate the result to test again, with x += x + x_step and y += y + y_step.
After retesting 120k^2 times, I then look across the grid to determine
which x and y test provide the maximum value. This concludes the in-
sample testing. I then test out-of-sample to check projection error, but
that's a whole other discussion! HTH.

I guess what you're saying is that I can discard all previous numbers each
time I find a new maximum? If so, what would be the computational tradeoff
between storing the results and testing for the max result, as opposed to
testing for the max result as I go?

Chris
.



Relevant Pages

  • Re: Memory Allocation in Java
    ... longtudinal time series, when tested against two conditions, x & y. ... either that the measurements are grouped into sets by some rule ... those numbers in the 120k x 120k grid were calculated. ... the quadrants. ...
    (comp.lang.java.help)
  • Re: Memory Allocation in Java
    ... The problem is that the length of x and y, and thus the size of the grid is quite large. ... sparse matrix math won't work. ... it's just a matter of rolling through the quadrants. ... Simply "searching for the Max" in a big collection of numbers ...
    (comp.lang.java.help)
  • Re: Memory Allocation in Java
    ... The problem is that the length of x and y, and thus the size of the grid is quite large. ... sparse matrix math won't work. ... it's just a matter of rolling through the quadrants. ... Simply "searching for the Max" in a big collection of numbers ...
    (comp.lang.java.help)
  • Re: Memory Allocation in Java
    ... I have a large array of floating point numbers I need to ... grid is quite large. ... sparse matrix math won't work. ... it's just a matter of rolling through the quadrants. ...
    (comp.lang.java.help)
  • Re: Memory Allocation in Java
    ... Think of it as looking back in time, testing against two conditions with perfect knowledge, looking at the results for those two conditions, x & y, and find those two conditional tests that give the best results over the time series ex-post. ... either that the measurements are grouped into sets by some rule ... 2D grid, but I'm no longer sure what you mean. ... splice the grid-search into quadrants, process each quadrant, and hen and record the quadrant results (i.e., I'm searching for the Max ...
    (comp.lang.java.help)