Re: Large Database System



On Oct 19, 7:54 am, raidv...@xxxxxxxxx wrote:
> Hi there,
>
> We have been looking for some time now for a database system that can
> fit a large distributed computing project, but we haven't been able to
> find one.
> I was hoping that someone can point us in the right direction or give
> us some advice.
>
> Here is what we need. Mind you, these are ideal requirements, so we do
> not expect to find something that fits entirely into what we need,
> but we hope to get somewhat closer to that.

> We need a database/file system:
> 1. built in C, preferably ANSI C, so that we can port it to Linux/
> Unix, Windows, Mac and various other platforms; if it can work on
> Linux only, then it is OK for now
> 2. that has a public domain or GPL/LGPL licence and source code access
> 3. uses hashing or b-trees or a similar structure
> 4. has support for files in the range of 1-10 GB; if it can get to 1
> GB only, that should still be OK
> 5. can work with an unlimited number of files on a local machine; we
> don't need access over a network, just local file access
> 6. that is fairly simple (i.e. library-style, key/data records); it
> doesn't have to have SQL support of any kind; as long as we can add,
> update, possibly delete data, browse through the records and filter/
> query them, it should be OK; no other features are required, like
> backup, restore, users & security, stored procedures...
> 7. reliable if possible
> 8. local transactional support if possible; there is no need for
> distributed transactions
> 9. fast data access if possible

> We cannot use any of the major commercial databases (e.g. Oracle, SQL
> Server, DB2 or larger systems like Daytona...), obviously because of
> licensing and source code issues. We looked more closely at MySQL and
> PostgreSQL, but they are too big and have way too many features that we
> do not need. We need to be able to install a database/file system on
> possibly tens of thousands of machines, and we also expect it to work
> without administration.

Say that last sentence out loud in front of a group of DBAs and I
guess you will get a little bit of mirth. This statement alone is
proof that your project will fail. Every database system (even simple
keysets like the Sleepycat database) needs administration.

Listen, you are going to have tens of thousands of points of failure
in your system. Is that what you really want? If you have (for
instance) 20,000 machines getting a big pile of data shoved down their
throat, you pretty much have a guarantee that a few hundred are going
to be out of space and that once a month a disk drive is going to fail
somewhere.

> On top of that, we might end up with thousands of files of different
> sizes on each machine. Are there any embedded (i.e. "lighter")
> versions of these two databases?

Do you know what happens to performance when you put thousands of
active files on a machine? Pretend that you are a disk head and
imagine the jostling you are going to receive.

> We haven't been able to find anything like that. I am not sure how
> much work would be involved in "trimming" down some of these databases,
> but that doesn't seem to be easy to do.

They are the size that they are for a reason. It's not fat that gets
trimmed off to scale things down, it's muscle.

> Berkeley DB would have been the best, but it is now in Oracle's hands and
> the licence has changed. TinyCDB was a close call, but the fact
> that we need to rebuild the database for each data update makes it
> infeasible for large files (i.e. ~1 GB). SQLite is very interesting,
> but it has many features that we don't need, like SQL support.

You do know that SQLite is a single-user database?

> Right now we are using plain XML files, so anything else would be a
> great improvement.

I'll say.

> Any suggestions or links to sites or papers or books would be welcome.
> Any help would be greatly appreciated.
>
> If this is not the proper forum, I would appreciate it if someone could
> move the post to the right location or point us to the right one.

The right thing to do is go to SourceForge and execute a few
searches. The pedagogic answer is to refer you to the newsgroup
news:comp.sources.wanted, but it's a ghost town.

I suspect that you have no idea what you are doing. Do you have any
concept about what is going to happen when your problem scales to
10GB? Get a consultant who understands the problem space or you'll be
sorry. By the way, this is definitely not the right forum for your
post -- which does not exactly make it appear that you have anything
on the ball. (Really, a newsgroup post in general is the wrong
approach here.)

I guess that FastDB or GigaBase might be suitable (WARNING! One
writer at a time). I also guess that, at some point, you are going to
badly need the capabilities that you do not think you need.
http://www.garret.ru/~knizhnik/databases.html

Another possibility is QDBM:
http://sourceforge.net/projects/qdbm/
I guess that you will like this one but also that it is the wrong
choice.

I don't know anything about your project but I think you need to
rethink your big picture of how you are going to solve it.
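To make requirements 3 and 6 of the original post concrete, here is a minimal sketch in ANSI C (the language the poster asked for) of a library-style key/data interface: add, update, delete, and browse/filter records through a hash table. It is an illustration under stated assumptions only: the names kv_open, kv_put, kv_get, kv_del, kv_each and kv_close are hypothetical and are not the API of QDBM, FastDB, GigaBase, or any other library mentioned in this thread, and the store lives entirely in memory, with no persistence, transactions, or crash recovery.

```c
/* Hypothetical in-memory key/data store: a separate-chaining hash table. */
#include <stdlib.h>
#include <string.h>
#define KV_BUCKETS 1024

typedef struct kv_entry {
    char *key, *val;
    struct kv_entry *next;
} kv_entry;
typedef struct { kv_entry *bucket[KV_BUCKETS]; } kv_store;
/* Callback used by kv_each: return nonzero if the record matches. */
typedef int (*kv_fn)(const char *key, const char *val, void *ctx);

static unsigned long kv_hash(const char *s) {   /* djb2 string hash */
    unsigned long h = 5381;
    int c;
    while ((c = (unsigned char)*s++) != 0)
        h = h * 33 + (unsigned long)c;
    return h % KV_BUCKETS;
}

static char *kv_strdup(const char *s) {   /* strdup() is not ANSI C */
    size_t n = strlen(s) + 1;
    char *p = malloc(n);
    if (p != NULL) memcpy(p, s, n);
    return p;
}

static void kv_free_entry(kv_entry *e) { free(e->key); free(e->val); free(e); }

kv_store *kv_open(void) {
    size_t i;
    kv_store *db = malloc(sizeof *db);
    for (i = 0; db != NULL && i < KV_BUCKETS; i++) db->bucket[i] = NULL;
    return db;
}

/* Add a record, or replace the value of an existing key. 0 on success. */
int kv_put(kv_store *db, const char *key, const char *val) {
    unsigned long h = kv_hash(key);
    kv_entry *e;
    char *copy;
    for (e = db->bucket[h]; e != NULL; e = e->next)
        if (strcmp(e->key, key) == 0) {
            if ((copy = kv_strdup(val)) == NULL) return -1;
            free(e->val);
            e->val = copy;
            return 0;
        }
    if ((e = malloc(sizeof *e)) == NULL) return -1;
    e->key = kv_strdup(key);
    e->val = kv_strdup(val);
    if (e->key == NULL || e->val == NULL) { kv_free_entry(e); return -1; }
    e->next = db->bucket[h];
    db->bucket[h] = e;
    return 0;
}

/* Look up a key; returns NULL if it is absent. */
const char *kv_get(const kv_store *db, const char *key) {
    const kv_entry *e;
    for (e = db->bucket[kv_hash(key)]; e != NULL; e = e->next)
        if (strcmp(e->key, key) == 0) return e->val;
    return NULL;
}

/* Delete a key: 0 if removed, -1 if not found. */
int kv_del(kv_store *db, const char *key) {
    kv_entry **pp, *e;
    for (pp = &db->bucket[kv_hash(key)]; (e = *pp) != NULL; pp = &e->next)
        if (strcmp(e->key, key) == 0) {
            *pp = e->next;   /* unlink the node from its chain */
            kv_free_entry(e);
            return 0;
        }
    return -1;
}

/* Browse every record in bucket order; returns how many fn() matched. */
size_t kv_each(const kv_store *db, kv_fn fn, void *ctx) {
    size_t i, n = 0;
    const kv_entry *e;
    for (i = 0; i < KV_BUCKETS; i++)
        for (e = db->bucket[i]; e != NULL; e = e->next)
            if (fn(e->key, e->val, ctx)) n++;
    return n;
}

void kv_close(kv_store *db) {
    size_t i;
    kv_entry *e, *next;
    for (i = 0; i < KV_BUCKETS; i++)
        for (e = db->bucket[i]; e != NULL; e = next) {
            next = e->next;
            kv_free_entry(e);
        }
    free(db);
}
```

A caller opens a store with kv_open(), uses kv_put() for both inserts and updates, and filters by passing a predicate to kv_each(). Separate chaining keeps deletion simple (unlink one node). The fixed bucket count is a simplification: a real engine would resize the table and, more importantly, lay records out on disk, which is exactly where the thread's concerns about 1-10 GB files, transactions, reliability and administration come in.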
