Re: teaching a child - console or GUI

From: Marco van de Voort (marcov_at_stack.nl)
Date: 07/30/04



On 2004-07-29, J French <erewhon@nowhere.com> wrote:
> On Wed, 28 Jul 2004 15:23:12 +0000 (UTC), Marco van de Voort

>>> It does not need to be bitmaps; sparse results (e.g. all 2003 records)
>>> can be just a list of 4-byte pointers
>>
>>This could be done for us too. We could even keep them in memory. However,
>>my own situation doesn't benefit from this.
>
> Not even if you can 'add' known data sets together?
> e.g. all 2004 transactions of corporate clients, sorted alphabetically

Not really, 2004 is a moving target anyway.

It could be done for 2002 and 2003, and that could be added to 2004.
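A minimal sketch of "adding" cached result sets, assuming each set is a sorted array of record numbers (Python's array typecode "I", usually 4 bytes per entry). The year sets and record numbers are made up for illustration; a k-way merge keeps the combined set sorted without re-sorting:

```python
from array import array
from heapq import merge


def union_sorted(*lists):
    """Union of sorted record-number lists; result stays sorted and duplicate-free."""
    out = array("I")  # typecode "I" is usually 4 bytes (check out.itemsize)
    last = None
    for rec in merge(*lists):  # lazy k-way merge of already-sorted inputs
        if rec != last:        # skip duplicates in case the sets overlap
            out.append(rec)
            last = rec
    return out


# Hypothetical cached sets for the closed years, plus a fresh result for 2004.
y2002 = array("I", [3, 17, 42])
y2003 = array("I", [5, 17, 99])
y2004 = array("I", [1, 120])
print(list(union_sorted(y2002, y2003, y2004)))  # [1, 3, 5, 17, 42, 99, 120]
```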

However, it would be a _lot_ of work (you have to take care of
max/min/avg/sum), and there are not that many recurring queries.
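The max/min/avg/sum bookkeeping is the tricky part: min, max, sum and count of cached years combine directly with the live year, but an average does not, so it has to be derived from sum and count. A small sketch under that assumption (the class name Agg and the figures are hypothetical):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Agg:
    """Partial aggregate for one period; avg is derived, never stored."""
    count: int
    total: float
    lo: float
    hi: float

    @staticmethod
    def of(values):
        vals = list(values)  # assumes a non-empty period
        return Agg(len(vals), sum(vals), min(vals), max(vals))

    def combine(self, other):
        # count/sum/min/max combine directly; averages of averages would be wrong.
        return Agg(self.count + other.count, self.total + other.total,
                   min(self.lo, other.lo), max(self.hi, other.hi))

    @property
    def avg(self):
        return self.total / self.count


cached_2002 = Agg.of([10.0, 20.0])
cached_2003 = Agg.of([30.0])
live_2004 = Agg.of([40.0, 50.0, 60.0])
overall = cached_2002.combine(cached_2003).combine(live_2004)
print(overall.avg)  # 35.0
```

Averaging the three per-year averages instead (15, 30 and 50) would give about 31.7, which is why the cached sets have to carry sum and count rather than the average itself.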

Most queries, however, have a WHERE munc=<my municipality> clause, and that
is indexed.

We originally planned to factor out the year as well, but so far the
performance is OK.
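A rough sketch of why the municipality index is enough: the index narrows the rows to one municipality and the year is then just a cheap filter on that subset. The record layout (record_no, munc, year, amount) and the names are hypothetical:

```python
from collections import defaultdict

# Hypothetical in-memory records: (record_no, munc, year, amount).
records = [
    (0, "Delft", 2003, 120.0),
    (1, "Leiden", 2004, 80.0),
    (2, "Delft", 2004, 55.0),
    (3, "Delft", 2004, 10.0),
]

# Index: municipality -> record numbers (appended in record order, so sorted).
munc_index = defaultdict(list)
for rec_no, munc, _year, _amount in records:
    munc_index[munc].append(rec_no)


def query(munc, year):
    """WHERE munc = ? uses the index; the year test scans only that subset."""
    return [r for r in munc_index.get(munc, []) if records[r][2] == year]


print(query("Delft", 2004))  # [2, 3]
```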

>>> A place I once worked used to thrive on selling databases to financial
>>> institutions (they still do) and we developed a whole load of ways of
>>> accelerating searching and sorting
>>
>>Why bother? Using an RDBMS should make things easier, not more difficult.
>
> I am not proposing an RDBMS
> - I'm not convinced they make things faster

Certainly not. I use an RDBMS to reuse existing code and optimizations.

> - mostly they just save time through working on the server rather than
> passing gigs of raw data through the network

That would not be much of a problem in this case, since most of the data
is not needed on the client. Everything is done in stored procs or similar.

>>> At its simplest an 'index' is only a list of sorted 4 byte pointers
>>> One can hold a lot of those on disk ...
>>
>>And loading them on a system under load (with constant disk I/O) takes
>>longer than the _real_ query time in our system.
>
> surprisingly little, because one is doing very few large disk reads
> rather than thousands of small disk reads
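A minimal sketch of the "index is just a sorted list of 4-byte pointers" idea: sort the record numbers by key, write them as one flat block, and load them back with a single large read. The file name and the sample keys are hypothetical, and typecode "I" is assumed to be 4 bytes on the target platform:

```python
import os
import tempfile
from array import array


def build_index(keys):
    """Record numbers ordered by key; the index holds no key data at all."""
    order = sorted(range(len(keys)), key=keys.__getitem__)
    return array("I", order)  # usually 4 bytes per entry (check .itemsize)


def save_index(idx, path):
    with open(path, "wb") as f:
        idx.tofile(f)  # one sequential write of the whole block


def load_index(path):
    idx = array("I")
    with open(path, "rb") as f:
        idx.frombytes(f.read())  # one large read instead of many small ones
    return idx


names = ["Smit", "Bakker", "Jansen", "Visser"]
idx = build_index(names)
path = os.path.join(tempfile.mkdtemp(), "name.idx")
save_index(idx, path)
print(list(load_index(path)))  # [1, 2, 0, 3]
```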

In our case: probably yes, because the writes are scarce. At least on the
timescale of modern computers.

>>Implementing tricks to make an RDBMS compete with an in-memory solution is
>>not smart, since similar tricks benefit the in-memory solution too (and
>>usually more).
>
> Sure they do - however since memory is finite ....

Everything is finite. You have to set a maximal magnitude of scaling anyway.

>>
>>Yes, of course the data is retained, but it is no longer online. Moreover,
>>the main app is a single .exe, and the CPU power needed is +/- 2GHz. The
>>memory requirement is now 800 MB, but that grows by 300 MB/year.
>
> Sounds pretty much like home user kit !

It is a normal business machine now. However, we want something beefier,
mostly because we want a decent redundant disk array (just mirroring, but
quality stuff), a quality power supply, quality memory, etc.

An entry-level server is better. The main reason it is not necessary now
is that, for migration reasons, several systems are running in parallel,
which gives us redundancy.

>>>>However that 300MB/year figure and the five year figure is the
>>>>current situation.
>>>
>>> Any chance of it going wild ?
>>
>>Not in the coming 1/2 years. The system is built to scale to 64-bit. Even
>>if it contained all of Holland, 16GB would do the trick.
>
> Right....

I mean 1..2 years. I just realized that 1/2 is usually read as a half :-)
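Some back-of-the-envelope arithmetic on the figures above (800 MB now, growing 300 MB/year), assuming purely linear growth and a 32-bit Windows process limited to the default 2 GB user address space, or 3 GB with the /3GB boot option and a large-address-aware executable. Real headroom is lower because of address-space fragmentation, so this is only an upper bound on how long a 32-bit build could last:

```python
def years_until(limit_mb, start_mb=800, growth_mb_per_year=300):
    """Whole years of linear growth that still fit under limit_mb."""
    return (limit_mb - start_mb) // growth_mb_per_year


print(years_until(2048))  # 4  (default 2 GB user address space)
print(years_until(3072))  # 7  (/3GB, large-address-aware)
```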

>>Moreover, the data is quite partitionable, so a cluster solution is also
>>possible (though not preferred).
>
> You mean a cluster of PCs - yes, I also wondered about that.
> Personally, if going down that route, I would have one for preparing
> an ordered list of selected records, and another for pumping the data
> back to the client.

Pretty much yes. One per year or per region, and one frontloader that does
client communications and bundles queries.
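A toy sketch of the "one node per year, one frontloader" layout: each partition computes a partial result over its own rows and the front node fans the query out and combines the partials. Threads stand in for separate machines here, and the partition contents and function names are hypothetical:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical partitions: one "node" per year, each holding (munc, amount) rows.
PARTITIONS = {
    2002: [("Delft", 10.0), ("Leiden", 5.0)],
    2003: [("Delft", 20.0)],
    2004: [("Delft", 30.0), ("Delft", 1.0)],
}


def node_sum(year, munc):
    """Work done on the node owning `year`: a partial SUM over its own rows."""
    return sum(amount for m, amount in PARTITIONS[year] if m == munc)


def frontloader_sum(munc, years):
    """Front node: send the query to every partition, then combine the partials."""
    with ThreadPoolExecutor() as pool:
        partials = pool.map(node_sum, years, [munc] * len(years))
        return sum(partials)


print(frontloader_sum("Delft", [2002, 2003, 2004]))  # 61.0
```

SUM combines trivially across partitions; AVG would need the same sum-and-count treatment as the cached-year aggregates.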

>>> e.g. trashing white space and tokenizing longer fields I guess
>>
>>Pretty much half of it, yes: whitespace, tokenizing, string2datetime etc.
>
> Right, I wonder whether you have looked into replacing the string
> system - I should imagine the data is pretty repetitive

Yes and no. We used every trick in the book to optimize the main entity,
which accounts for over 95% of the objects. We did some dictionary-based
elimination on the larger string fields of the two other large entities.

I don't care if a municipality record is 200 bytes if I have at most a
few hundred of them, and we are talking about GBs of RAM.
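The whitespace and string2datetime part of the load can be sketched as follows, relying on two properties of the DBF format: character fields are space-padded to the field width, and date ("D") fields are 8 ASCII digits in YYYYMMDD order, blank when empty. The cp850 code page is an assumption (typical for old DOS-era tables), and the function names are hypothetical:

```python
from datetime import datetime


def clean_char(raw, encoding="cp850"):
    """Decode a space-padded DBF character field and trim the padding."""
    return raw.decode(encoding).strip()


def parse_dbf_date(raw):
    """DBF date fields hold YYYYMMDD as ASCII digits; all blanks means no date."""
    text = raw.decode("ascii").strip()
    if not text:
        return None
    return datetime.strptime(text, "%Y%m%d").date()


print(repr(clean_char(b"Delft     ")))  # 'Delft'
print(parse_dbf_date(b"20040730"))      # 2004-07-30
print(parse_dbf_date(b"        "))      # None
```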

>>> It sounds as if you have some raw files that you crunch and slap into
>>> RAM - rather like building a CD 'database'
>
>>We start from .DBFs of the old system. What do you mean by a CD
>>database?
>
> To me a CD database is a collection of R/O files that have been
> heavily pre-processed so that one has numerous sort orders stored as
> lists of pointers on disk, extract files of frequent search fields in
> a normalized format .... basically any trick to make searching and
> sorting a matter of adding/removing/merging pre-formed sets of data
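Combining pre-formed sets, as described above, comes down to merge operations on sorted pointer lists. A sketch of the intersection case (e.g. "2004 transactions" AND "corporate clients"), assuming both sets are sorted arrays of record numbers; the set contents are hypothetical:

```python
from array import array


def intersect_sorted(a, b):
    """Two-pointer intersection of sorted record-number lists, O(len(a) + len(b))."""
    out = array("I")
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


# Hypothetical pre-formed sets stored as sorted record-number lists.
tx_2004 = array("I", [2, 5, 8, 13, 21])
corporate = array("I", [1, 5, 13, 34])
print(list(intersect_sorted(tx_2004, corporate)))  # [5, 13]
```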

There is not much that is read-only, actually. At least not now, since the
previous year is still open for corrections by accountants. Of course if the number.

Moreover, all this is far too much work. The current system is totally
straightforward programming on a EUR 1000-2000 machine, as general as the
system allows.

Keep in mind this is a one-off project. I can't sell my extra optimization
work multiple times.

>>> You mean you are, or I am, or both of us ?
>>
>>Programmers in general.
>
> I'm not so sure from looking at the horrors some coders come up with,
> but even so, I do prefer to be mean with memory.

I'm mean with memory in general too, so that I can stuff more into it.

But that is pretty much the point. The more you save, the easier this
kind of thing gets, from a scaling point of view.

>>> I agree it is a commodity, but if anything, that is the problem
>>> It stops people looking at the underlying data structure
>>
>>The underlying data structure is what is in memory. Trying to stuff it
>>into an RDBMS, and then making it more complex, is what is unnatural.
>
> I really was not advocating a conventional RDBMS

Ok. Keep in mind that the data is live, and programmers cost money too.

>>> different order.
>>
>>Sure. But there are a lot of crazy optimisations that one could do. The
>>fundamental question remains. Why would I ?
>
> It could improve performance several hundredfold.
> Just using a BChop (binary chop) on a sorted list is many times faster
> than sequentially scanning a list.
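The BChop versus sequential scan difference can be shown with the standard library's bisect module: on a million sorted keys a binary chop needs about 20 comparisons, a sequential scan up to a million. The function names are hypothetical:

```python
from bisect import bisect_left


def bchop(sorted_keys, key):
    """Binary chop: O(log n) probes on a sorted list. Returns the index or -1."""
    i = bisect_left(sorted_keys, key)
    return i if i < len(sorted_keys) and sorted_keys[i] == key else -1


def linear_scan(keys, key):
    """Sequential scan: O(n) comparisons in the worst case."""
    for i, k in enumerate(keys):
        if k == key:
            return i
    return -1


keys = list(range(0, 2_000_000, 2))  # 1,000,000 sorted even numbers
print(bchop(keys, 1_999_998), linear_scan(keys, 1_999_998))  # 999999 999999
```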

But we have a lot of different queries, with low reuse counts.

And the upper level of that is prepared in memory too (and if performance
needed it, we could add another level). It is the magnitudes at that first
level (or those levels) that matter.

>>I only have to make sure that mutations are journaled and flushed. I don't
>>> Your system has rather caught my interest, probably because it sounds
>>> similar to problems I've worked on in the past.
>>>
>>> I really do believe that the key to speed is algorithms, not RAM
>>
>>That's a common mistake. It is an equation, and algorithms are one
>>variable in it. Language, compiler speed and hardware are all variables too.
>
> True - but the wrong algorithm can have a dramatic effect

Same if you get the constants wrong.

>>If your algorithms totally suck, they are the limiting factor, sure. But it
>>makes no sense to build your own custom system when one week of wages can
>>pay for the hardware to run it.
>
> Yes - but once you have the hardware it seems to make sense to get
> things going faster

If it is fast enough, no. Minimal effort, maximal effect.

> I was interested in several things you mentioned. The Strings are
> rather interesting - I'm assuming AnsiStrings here, not effectively
> arrays of Chars.

Yes.

> From digesting data in the past I have generally found that 'String
> Fields' tend to be very repetitive, and that one is often better off
> just having a 4 byte pointer into a 'Lexicon'

I call it a dictionary :-)
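A minimal sketch of the lexicon/dictionary idea: each distinct string is stored once, and the column itself holds only small integer ids (typecode "I", usually 4 bytes). The class name and sample values are hypothetical:

```python
from array import array


class Dictionary:
    """Each distinct string is stored once; fields store its integer id."""

    def __init__(self):
        self.strings = []  # id -> string
        self.ids = {}      # string -> id

    def intern(self, s):
        i = self.ids.get(s)
        if i is None:       # first occurrence: assign the next id
            i = len(self.strings)
            self.strings.append(s)
            self.ids[s] = i
        return i

    def lookup(self, i):
        return self.strings[i]


d = Dictionary()
city_column = array("I", (d.intern(s) for s in ["Delft", "Leiden", "Delft", "Delft"]))
print(list(city_column), d.strings)  # [0, 1, 0, 0] ['Delft', 'Leiden']
```

The saving grows with repetitiveness: a column of millions of rows drawn from a few hundred distinct values costs one id per row plus the few hundred strings.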

> Another thing I think I mentioned earlier is that 'RAM' devices are
> getting very large and very cheap, one could literally stick a few
> Memory Stick devices into some USB ports and get a vast amount of very
> fast 'near RAM'

Nope. USB sticks are slower than hard disks, even normal IDE disks, let
alone decent SCSI arrays like Compaq's. However, you could use them as a
cheap redundancy/backup measure (writing mutations to both the stick and the HD).
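A sketch of journaling mutations to both the HD and a stick, assuming a simple line-per-mutation journal; the class name, file names and record format are hypothetical. Each append is flushed from Python's buffer and then fsync'ed so the OS writes it to each device before the call returns:

```python
import os
import tempfile


class MirroredJournal:
    """Append every mutation to several journal files and force it to disk."""

    def __init__(self, *paths):
        self.files = [open(p, "ab") for p in paths]

    def append(self, record):
        line = (record + "\n").encode("utf-8")
        for f in self.files:
            f.write(line)
            f.flush()             # push Python's buffer to the OS
            os.fsync(f.fileno())  # ask the OS to write it to the device

    def close(self):
        for f in self.files:
            f.close()


tmp = tempfile.mkdtemp()
hd = os.path.join(tmp, "hd.jrn")
stick = os.path.join(tmp, "stick.jrn")
j = MirroredJournal(hd, stick)
j.append("UPDATE 42 amount=10.0")
j.close()
```

Since the stick is the slow device, every mutation waits for it; that is acceptable here only because writes are scarce.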


