determining available space for Float32, for instance



I am looking for a way to determine the maximum array size I can allocate
for arrays of Float32 values (or Int32, or Int8, ...) at an arbitrary
point in the program's execution. This is needed because Python cannot
allocate enough memory for all of the data we need to process, so we
need to "chunk" the processing, as described below.

Python's memory management process makes this more complicated, since
once memory is allocated for Float32, it cannot be used for any other
data type, such as Int32. I'd like a solution that counts both memory
that has not yet been allocated and memory that was allocated for that
type but is no longer in use.

We do not want a solution that requires recompiling Python, since we
cannot expect our end users to do that.

Does anyone know how to do this?

The following describes our application context in more detail.

Our application is UrbanSim (www.urbansim.org), a micro-simulation
application for urban planning. It uses "datasets," where each dataset
may have millions of entities (e.g. households), and each entity may
have dozens of attributes (e.g. number_of_cars, income). Attributes can
be any of the standard Python "base" types,
though most attributes are Float32 or Int32 values. Our models often
create a set of 2D arrays with one dimension being agents, and the
second dimension being choices from another dataset. For instance, the
agents may be households that choose a new gridcell to live in. For our
Puget Sound application, there are 1 to 2 million households, and 800K
gridcells. Each attribute of a dataset has such a 2D array. Given that
we may have dozens of attributes, they can eat up a lot of memory,
quickly.
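
To give a sense of scale, here is a rough back-of-the-envelope sketch.
The function name is made up for illustration, and I use the upper end
of the household range. The only facts it relies on are the counts above
and the 4-byte size of a Float32 value.

```python
FLOAT32_BYTES = 4  # a Float32 value occupies 4 bytes


def full_array_bytes(n_agents, n_choices, itemsize=FLOAT32_BYTES):
    """Bytes needed for one dense 2D array of n_agents x n_choices values."""
    return n_agents * n_choices * itemsize


households = 2 * 10**6  # upper end of the Puget Sound range
gridcells = 8 * 10**5   # 800K gridcells

one_attribute = full_array_bytes(households, gridcells)
print(one_attribute / 1e12, "TB for a single Float32 attribute")
```

That is about 6.4 terabytes for one attribute, before multiplying by the
number of attributes a model uses.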

Given the sizes of these arrays, and Python's limited address space,
Python usually cannot allocate enough memory for us to create the entire
set of 2D arrays at once. Instead, we "chunk" the model along the
agents dimension, processing a chunk of agents at a time. Some of our
models can do their work in a single chunk. Others require hundreds of
chunks. It depends upon the number of agents, the number of locations,
the number of agent attributes, and the number of location attributes
used by that particular model.
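
As a minimal sketch of what "chunking along the agents dimension" means
(this is not our actual code, and chunk_ranges is a hypothetical name),
the model walks over the agents in consecutive index ranges and only
builds the 2D arrays for one range at a time:

```python
def chunk_ranges(n_agents, agents_per_chunk):
    """Yield (start, stop) index pairs that cover range(n_agents) in order."""
    if agents_per_chunk < 1:
        raise ValueError("agents_per_chunk must be at least 1")
    for start in range(0, n_agents, agents_per_chunk):
        # The last chunk may be shorter than the others.
        yield start, min(start + agents_per_chunk, n_agents)


# Toy sizes: 10 agents processed 4 at a time.
for start, stop in chunk_ranges(10, 4):
    # A real model would build its agents[start:stop] x choices arrays
    # here, run the computation, and release them before the next chunk.
    print(start, stop)
```

A model that fits in a single chunk is just the case where
agents_per_chunk is at least the number of agents.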

What we would like is for the code to determine automatically the number
of agents that can be in a single chunk. This requires that we solve two
sub-problems.

First, we need to know how many attributes of each type (Float32, Int32,
etc.) will be used by this model. We can do that.

Second, we need to know how much space is available for an array of a
particular type of values, e.g. for Float32 values. Is there a way to
get this information for Python?
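
To make the calculation concrete, here is a rough sketch of what we
would do if we had that number. The available_bytes argument is exactly
the value we do not know how to obtain, and the figures in the example
are invented. The sketch also assumes a single pool of memory shared by
all types; if freed memory really is tied to the type it was allocated
for, we would need one such budget per type. The names are hypothetical.

```python
ITEMSIZE = {"Float32": 4, "Int32": 4, "Int8": 1}  # bytes per value


def agents_per_chunk(available_bytes, n_choices, attribute_counts):
    """Largest number of agents whose 2D arrays fit in available_bytes.

    attribute_counts maps a type name to the number of attributes of that
    type; each attribute needs one agents-by-choices array.
    """
    bytes_per_agent = n_choices * sum(
        ITEMSIZE[type_name] * count
        for type_name, count in attribute_counts.items()
    )
    if bytes_per_agent == 0:
        raise ValueError("the model uses no attributes")
    return available_bytes // bytes_per_agent


# Invented figures: 10**9 bytes free, 800K gridcells,
# 10 Float32 attributes and 5 Int32 attributes.
n = agents_per_chunk(10**9, 800000, {"Float32": 10, "Int32": 5})
print(n, "agents per chunk")
```

With these made-up numbers each agent costs 48,000,000 bytes, so only 20
agents fit in a chunk, which is why some of our models need so many
chunks.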

Cheers,

David Socha
Center for Urban Simulation and Policy Analysis
University of Washington
www.urbansim.org