Re: SQL




Dmitry A. Kazakov wrote:
On 5 Feb 2006 14:00:22 -0800, topmind wrote:

Dmitry A. Kazakov wrote:

When I see an actual case with code from my domain, I might change my
mind. Types don't model biz things very well IMO.

They need not to. The whole point is that relational is typed as well. Full
stop.

For a loose or wide definition of "types", perhaps.


Whatever advantages of modern typed systems could be used to improve (sic!)
existing typed (sic!) relational systems is another question. It is up to
developers.

Go for it. Improve away.


How specificly is "types" different from "validation" in this example?

x = add("123","99.28");
y = add("foo","7");

The language does not care what "foo" is here. Only the "add" function
will care when it checks to see that the first parameter is a valid
number.

The above isn't properly typed. "123" has the type String. Strings aren't
additive.

Language can and do this. Label it however you want, it works.

Really? What is the result of add("fee", "123") ?

"1069"? "4201"? "fee123"? "sysadmin@xxxxxxxxx"?

An error. Unless perhaps we have something optional like:

x = add(a, b, valueOnError=0)

Or

x = add(a, b, onError="raiseErr('std')")


There are other things it might check such as range because it
may not be able to add large numbers. Types cannot do this very well
unless we either pick arbitrary chunk sizes, or create a type for every
possible length/size, which is dumb.

Types do it perfectly. I can have a wide set of numeric types having
different models.

And long-winded confusing code to use it.

Huh, the code you presented is 1) longer, 2) far more confused.

I don't have to define and hunt down definitions of types. You will
have something like:

a = new integer(min=-223452345, max=234234234);
b = new integer(min=-09809809809, max=242093423423);
x = a.add(b);


As I have shown, it is not type free. It is typed in a definite way.

I suspect we would need a clear definition of "types" to settle this.
Either way, it is not relying on "types" in the usual strong-typed way.
By the way, many scripting languages make no internal distinction
between:

x = 123

and

x = "123"

Some keep an internal "flag" to track how it was defined, and some
don't, treating everything more or less as strings with no internal
flags to indicate origin or history of usage.


How do
you know that this way is the proper one? Do you have use-tests to verify
that? Did you count all these tests? Where are use tests for accuracy,
rounding errors?

Rounding errors for "add"?

(I never liked how floating point did division anyhow.)


Businesses are run by scrooges: they want productivity at a low cost.

I'm sure, they know pretty well what risk management is.

Indeed, and often don't want to pay for too much of it.


Strong typing does not provide this.

SQL is typed, so I see no point.

Perhaps, but it is possible to create a nearly type-free version of
SQL.

SQL-Lite (sqlite.org) is allegedly one such tool, although I've never
tried it.



but also utterly inefficient with respect of
space and time required for may operations.

When I encounter enough of such scenarios,

Put an image in relational table, so that each pixel would be a cell.
Put a program code in a relational table and write compiler in SQL
...

Why SQL? SQL is one of an infinite possible relational languages.

Fine, show us an outline of a better language limited to solely relations
and we'll see.

Hold on, are we talking about implementation here, or the language? I
don't see how the language relates to compactly representing pixels
internally.

As far as implementation, one could create a table such as:

table: images
--------------
imageRef // image ID
x
y
pixelValue

It could be mapped directly to a RAM image if we want speed and
compactness. (Perhaps it should be called a "view" and not a table,
although philosophically they are perhaps the same thing.)


Further, relational does not dictate implementation.

Wrong, it puts definite limits of the implementation.

Show me a definite limit.

1. Row ID. It falls out of model, completely. In a truly relational model
you cannot mark either cells or rows, they have no identity of their own.

Please clarify. Relational does not limit auto-generated keys. Some say
it "encourages" one to use domain keys, but this has never been
settled, and some argue that auto-gen keys are or can be domain keys.


2. Tables of tables.

This is a *good* restriction of the relational model. Hard-nesting is
hard-wiring a particular viewpoint or (access path) into the model,
which goes against the relativistic philosophy of relational. Dr. Codd
set out to purposely avoid hard-wired access paths when he started
thinking about relational due to the navigational messes that others
were creating. However, it does not limit what can be modelled
externaly. One can still model nested stuff using non-nested tables.

It is not a technical "limit", but a philosophical guide-wall.
Relational imposes rules to be called "relational". Otherwise, it would
turn into the navigational messes that motivated the creation of
relational to begin with.


3. Generic programming in terms of sets of tables sharing common
properties.

You seem to want to model paradigm X in paradigm Y. That is the wrong
approach. You *use* paradigm X, not change it to look like another
paradigm. Relational is a paradigm. If you don't like its rules, then
leave it.

It is like saying the "no side-effects" philosophy of Functional
Programming is limiting, and thus one should dump it. If you dump it,
you don't have FP anymore.


Referential GPS resolution is about 30cm horizontally. Let's take, say, a
Germany map (not such a big country) with 30x30cm mesh... [Hint: roads have
limited width.]

One could model roads as polygons if you want that precision. GIGO in
effect here.


There are other, perhaps better, ways to do this. I added a "Point"
table so intersection info may be easier to represent. This is just a
quickie demo.

... of why this cannot work. Not to mention that GPS navigation system
calculates from a stream of longitude, latitude, height, mileage, speed
from the car, not only the road position, but relative positions of close
objects and the direction you are driving. All this is done with a high
level or error tolerance. GPS is very unreliable.

Are you bashing relational or GPS, I cannot tell here.

Forgetting for a while,
that searching the DB you proposed would require several days (years?) per
query and a large truck to load the DB onto. Just write down a SELECT
statement for the model above!

If the properties you are searching on are indexed, I see no reason why
queries should run long. Plus, it is probably practical to internally
partition the tables into states or counties to speed up
location-specific queries, which are probably the most common for
traveling. (FBI queries, for example, may be different in that aspect
because they may look all over in a given query.) I believe Oracle and
other vendors has the ability to internally partition like this without
changing the external view (such as table name.

And, I don't know where you are getting your size estimates.


One of the biggest selling points of RDBMS is that multiple languages
can use them.

You mean the languages, without which one couldn't be able to use these
RDBMS. I don't see how relying on concurrent products makes yours any
better...

Sharing lots of company info in multi languages and tools without using
a DB is a royal pain.


Show with code.

But you said that you aren't interested in static program correctness
analysis.

If you want a challenge, fine. Take any machine learning method. Training
sets are ideal tables, rows and columns, nothing else. Take any method of
your choice and implement it in SQL! For introduction to existing methods,
see excellent tutorials by Andrew Moor:

http://www.autonlab.org/tutorials/

How about something from the domain of custom biz apps. I have already
conceded out of ignorance of the domain that DB's may not be good for
heavy-duty numerical analysis.

This is not numerical analysis. This a challenge to the relational approach
in general. There is a problem specified relationally, that has an obvious
relational solution, yet nobody would even try to use RDBMS for that. How
could THAT happen?


I don't know. I do notice that academics tend to be poorly trained in
RDBMS. They could simply be afraid of them out of unfamiliarity.
Second, RDBMS had been fairly expensive until the recent advent of
usable OSS solutions. When pressed on budget, people tend to roll their
own.



[ Answer: RDBMS are completely unsuitable either for matching in
multidimensional spaces, or for best-match criteria. The idea of separating
data and using normalized search schemas simply does not work beyond
trivial cases. Now, you will tell me that all biz-applications are trivial,
but I won't buy it. And for all I won't buy a tool limited to trivial
cases! ]

Further, if there are specialized engines already built to process
things such as Bayesean networks or neural nets, obviously it makes
sense to go with those specific already-built solutions. RDBMS shine
where you have *different uses* for the *same* info. Those things you
list tend to be *same* uses for the same info. See the difference? One
simply imports the load of data into those kits, chomps on it for a
while, and then dumps out the answers.


Again, I don't expect SQL or relational to solve the *entire* problem
by itself.

From this is derive that you consider relational is a niche solution for a
limited number of problems. As such it cannot be a concurrent to OO.

Perhaps RDBMS suck outside of custom biz apps and OO sucks inside.


But actually, they are incomparable not because relational is incomplete
and thus bad. It isn't bad. They are incomparable, because relational is of
lower level. You cannot compare a vehicle with a wheel, though a circus
rider could use the later. I, personally, better take a car. Some would a
jet or ship.

No, it is actually higher-level than OOP. (This will turn into "level"
definition fight I fear, similer to the "complexity" metrics mess.)


But to give an analogy, program code is merely data to the
interpreter/compiler.

I don't care about the CPU.

I said compiler/interpreter, not the CPU. However, CPU is a similar
example: the machine code is just data to it.


A developer may think of a function as
"behavior", but the interpreter treats it more like data if we look at
other processes that read what we normally call "data".

That the point, software is developed, maintained and finally scraped by
humans.

I don't see how this relates to the relativistic viewpoint of behavior
and data. If a robot, evolution, or God wrote the interpreter instead
of humans, would that change anything?


Perhaps you should make a distinction between "is a set of" and "can be
defined as a set of". Claiming something "is" is a strong ascertion.

For you, who believes in existence of data. To me, there is no data without
something that interprets them exposing a behavior. There is no information
without the receiver.

Yes, but one can focus their efforts on one or the other. I mentally
try to use tables to codify information and (potential) behavior such
that the "interpreter" (table processor) is a fairly trivial process,
or at least one that can be mentally seperated from the data config to
help divide up the effort into clean sub-parts. This is similar to the
music composer versus the player-piano engineer, who only cares that
the holes in the paper roll are "executed" according to expected rules
and conventions of music. The music composer does not want to have to
concern him/herself with the machanics of the player-piano. In practice
the seams are not always clean, such as when the composer is limited to
a max number of notes because brand X cannot play that many holes at
the same time or cannot reach high notes. But 90% seperation is still
better than zero. OOP fails to take advantage of this concept in my
opinion. Most biz OO code is or can be about 90% attributes anyhow. The
Grand Behavior machine that OO allegedly is is *wasted* on hard-coded
navigational DB-like code.

--
Regards,
Dmitry A. Kazakov

-T-

.



Relevant Pages

  • Re: Data Definition Language
    ... SQL is a standard language that is used for Data Definition Language ... System tables are a proprietary feature of some implementations of RDBMS ...
    (comp.databases)
  • Re: OO vs. RDB challenge
    ... That is very easy with SQL Server 2005, Oracle, DB2, etc. ... >RDBMS in some other language and show its as easy as using an RDBMS". ... Constraints between tuples are among the most ...
    (comp.object)
  • Re: Generating SQL
    ... happens that SQL is the lingua franca of such RDBMS - I don't ask why, ... > builds strings in another language for some other compiler to parse? ... > "The aim of science is not to open the door to infinite wisdom, ...
    (comp.object)
  • Re: Best bit fields implementation under postgres
    ... You missed the foundations of RDBMS. ... Assembly Language, hardware dependent code you have been writing. ... the DDL and after you have the DDL, then you might worry about the ... I have thought about doing a book on Bad SQL, ...
    (comp.databases)
  • Re: Full-Text Catalog does not build...
    ... You're welcome, Oleg, ... I've not confirmed if the Arabic language will be a FTS supported ... language in SQL Server 2005, you may want to review the ...
    (microsoft.public.sqlserver.fulltext)