Re: Memory Allocation in Java



Christopher Smith wrote:
Eric Sosman <esosman@xxxxxxxxxxxxxxxxxxx> wrote:
Christopher Smith wrote:
[...]
No, sparse matrix math won't work. Every field has a value.

It's a startlingly large number of values; may I ask where
they all came from? Just curious, really.


Sure. The numbers represent the (mean value/stdev value) of a longitudinal time series, when tested against two conditions, x & y. In other words, it looks to find the best result on a daily basis historically when a sample is tested against these two conditions (i.e., "in-sample"). Think of it as looking back in time, testing against two conditions with perfect knowledge, looking at the results for those two conditions, x & y, and finding the two conditional tests that give the best results over the time series ex-post. I actually want to test against six conditions when I'm done, but am working out which are the two that explain return variance most significantly.

I'm still having trouble following your description. You
started out with 120000 by 120000 32-bit floating-point values,
referred to the assemblage as a "grid," and spoke of the data
as being "non-parametric." Okay, thus far I'm imagining a big
blob of numbers representing fourteen billion measurements on
a two-dimensional surface (not necessarily spatial, but 2D).
But now you're speaking of "mean value/stdev value," which implies
either that the measurements are grouped into sets by some rule
that hasn't been specified, or that there are in fact two sets of
fourteen billion numbers (fourteen billion means, fourteen billion
variances) that summarize a still larger set of measurements. And
as if that weren't enough, you've thrown a time axis into the mix;
the data set is now to be understood as three-dimensional? Or
perhaps more?

Amusing factoid: There are about 3.4 times as many fields
as there are distinct `float' values.

What does "field" mean in this case?

It was your term: "Every field has a value." I assumed you
meant it to refer to one of the fourteen billion points of the
2D grid, but I'm no longer sure what you mean.

I guess divide and conquer is the right way to go. What I can do is
splice the grid-search into quadrants, process each quadrant, and then record the quadrant results (i.e., I'm searching for the Max
within the grid). From there, it's just a matter of rolling through
the quadrants.

Note that each quadrant -- if by "quadrant" you mean "one
fourth" -- is still about 13.5 GB of data. (Assuming "only"
fourteen billion 4-byte numbers.)
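
For anyone who wants to check the arithmetic, here is a small sketch. It assumes a square 120000 x 120000 grid of 4-byte `float' values and nothing else; the class and method names (GridMemory, gridBytes) are made up for illustration and ignore any per-array overhead the JVM adds.

```java
public class GridMemory {
    // Raw payload size of a side x side grid, ignoring JVM array overhead.
    // long arithmetic is required: the result does not fit in an int.
    static long gridBytes(long side, int bytesPerValue) {
        return side * side * bytesPerValue;
    }

    public static void main(String[] args) {
        long total = gridBytes(120_000L, Float.BYTES); // Float.BYTES is 4
        long quadrant = total / 4;                     // "one fourth" of the grid
        double gib = 1024.0 * 1024.0 * 1024.0;

        System.out.printf("whole grid:   %d bytes (%.1f GiB)%n", total, total / gib);
        System.out.printf("one quadrant: %d bytes (%.1f GiB)%n", quadrant, quadrant / gib);
    }
}
```

Either way it is counted, a single quadrant is far more than a typical heap can hold as one in-memory array.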

Could you explain the nature of this search a little more?
Simply "searching for the Max" in a big collection of numbers
requires very little memory; there's no need to retain a number
that's known to be non-maximal.
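
To illustrate, here is a minimal sketch of a search that keeps only the best result seen so far instead of storing the whole grid. It is not the original poster's code: the class RunningMax, the method search, and the toy scoring function are all hypothetical stand-ins, and it assumes the score for one (x, y) pair can be computed independently of every other pair.

```java
import java.util.function.DoubleBinaryOperator;

public class RunningMax {
    /** Best score seen so far and the (x, y) pair that produced it. */
    static final class Best {
        double x, y;
        double score = Double.NEGATIVE_INFINITY;
    }

    // Walks every (x, y) combination once and remembers only the maximum.
    // Memory use is constant no matter how many combinations are tested.
    static Best search(double xStart, double xStep, int xCount,
                       double yStart, double yStep, int yCount,
                       DoubleBinaryOperator score) {
        Best best = new Best();
        for (int i = 0; i < xCount; i++) {
            double x = xStart + i * xStep;
            for (int j = 0; j < yCount; j++) {
                double y = yStart + j * yStep;
                double s = score.applyAsDouble(x, y); // hypothetical per-pair test
                if (s > best.score) {                 // discard anything non-maximal
                    best.score = s;
                    best.x = x;
                    best.y = y;
                }
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Toy score that peaks at x = 1.5, y = 2.0; a real run would
        // evaluate the time series here instead.
        DoubleBinaryOperator toy =
            (x, y) -> -((x - 1.5) * (x - 1.5)) - ((y - 2.0) * (y - 2.0));
        Best b = search(0.0, 0.5, 9, 0.0, 0.5, 9, toy);
        System.out.println("best x=" + b.x + " y=" + b.y + " score=" + b.score);
    }
}
```

The comparison inside the loop costs one branch per evaluation, which is negligible next to a pass over the time series, so nothing is saved by storing all the results first and scanning them afterwards.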



I run through these test conditions over a time series, starting with the lowest test parameter for x and for y. I then iterate over the time series, looking at each time-record entry for a set of conditions, including x & y. If those conditions are true, and the values are greater than x or y, I record the entry. At the conclusion of the time series, I iterate to test again, with x += x + x_step and y += y + y_step. After retesting 120k^2 times, I then look across the grid to determine which x and y tests provide the maximum value. This concludes the in-sample testing. I then test out-of-sample to check projection error, but that's a whole other discussion! HTH.

You examine "over a time series," so that's one dimension of
the search. But your search over x and y is apparently one-D also,
because you step both coordinates together to trace a "diagonal"
across the search area. And then you "re"-test fourteen billion
times; what do you mean by "re"-test? If this is the repetition,
what was the original (and why didn't it suffice)?

I guess what you're saying is that I can discard all previous numbers each time I find a new maximum? If so, what would be the computational tradeoff between storing the results and testing for the max result, as opposed to testing for the max result as I go?

I'm withdrawing all advice on the grounds that the more you
explain what you're doing, the less I understand. Good luck!

--
Eric Sosman
esosman@xxxxxxxxxxxxxxxxxxx