Re: SQL
- From: "topmind" <topmind@xxxxxxxxxxxxxxxx>
- Date: 5 Feb 2006 14:00:22 -0800
Dmitry A. Kazakov wrote:
On 4 Feb 2006 15:28:10 -0800, topmind wrote:
Dmitry A. Kazakov wrote:
On 3 Feb 2006 18:28:11 -0800, topmind wrote:
Well, like I already said, I am not sure relational requires ordering
either. If ordering is a pivotal issue for your claim, then let's
explore it further.
You still need:
1. "=", an equality relation (transitive, symmetric, reflexive)
2. Copy constructor, to have an ability to place things into cells
The above is the interface of a copyable, comparable type.
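As a rough illustration of the two requirements listed above (an equality relation and a copy operation), here is a minimal Python sketch. The `Money` type and the `place_in_cell` helper are hypothetical. Python's `copy.copy` stands in for the "copy constructor", and the dataclass-generated `__eq__` supplies the equality.

```python
import copy
from dataclasses import dataclass


@dataclass(frozen=True)
class Money:
    """Hypothetical cell value type.

    dataclass generates __eq__ by comparing fields, so equality inherits
    reflexivity, symmetry and transitivity from int and str equality.
    """
    amount_cents: int
    currency: str


def place_in_cell(row: dict, column: str, value) -> dict:
    """Put a copy of value into a cell (the role of the copy constructor)."""
    row[column] = copy.copy(value)
    return row


a = Money(1999, "EUR")
row = place_in_cell({}, "price", a)
print(row["price"] == a)  # True: the copy compares equal to the original
```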
Having the ability to be copyable and comparable does not *make* it a
"type", unless you are using a very loose definition of "type". It
indeed may be possible to view everything as a type. But that does not
necessarily make everything actually *be* types.
It is a language issue then. The point is that the relational model you
refer to can be fully described in terms of types, and therefore
represents not an alternative, but just a special case.
Perhaps. There may be other and perhaps better "everything is a"
viewpoints that are pretty much interchangeable with each other. I
remember some guy claiming that "everything is a closure". But just
because you can does not mean you should.
I agree that ADTs are pretty good for systems software (although sys
soft. is not my specialty, I should note). However, the concepts don't
appear to apply too well to business modeling. An RDBMS engine designed
so that the "expression engine" (for lack of a better name) can be
swapped without changing the relational engine may indeed use many ADT
concepts.
Same for business, same for any other modeling. Don't some biz-models have
something in common, more advanced/specific than just relations? If so,
then those can be extracted in a form of a "biz engine", more specialized
than a general "expression engine". ADTs just give you a tool for doing
such things.
As I have said many times, in the real world the patterns of
differences/changes are not hierarchical. The variations of things tend
to be a semi-random selection of all possible combinations of factors
(Cartesian join). I find sets better able to handle such differences
than hierarchies. "Types" are too tied to the idea of "is-a", when mass
"has-a" management is more appropriate for variation management.
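The "Cartesian join" view of variation described above can be sketched in a few lines of Python. This is only an illustration under assumed data: the three factors are hypothetical, each variant is a set of traits instead of a node in an is-a tree, and a "has-a" query is a subset test.

```python
from itertools import product

# Hypothetical factors; real ones would come from the business domain.
sizes = ["small", "large"]
channels = ["web", "store"]
taxes = ["exempt", "taxable"]

# Every combination of factors (the Cartesian join), each as a set of traits.
variants = [frozenset(combo) for combo in product(sizes, channels, taxes)]


def having(traits, rows):
    """Select the rows that have all of the given traits (a 'has-a' query)."""
    wanted = set(traits)
    return [r for r in rows if wanted <= r]


print(len(variants))                             # 8
print(len(having({"web", "exempt"}, variants)))  # 2 (small and large)
```

No single hierarchy is needed. Any factor can serve as the "primary" classification at query time.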
I have answered this. If you can map relations to individual objects rather
than types, do it. Nobody proposes to invent types where unnecessary. A
program with a lesser number of types is easier to understand. The problem
is that quite often this is technically impossible. If you wish to force
everything into the limited set of types SQL has, you must also accept
much higher development costs and maintenance beyond anyone's capacity.
When I see an actual case with code from my domain, I might change my
mind. Types don't model biz things very well IMO. A viewpoint one needs
tends to be relative, and types don't handle that very well, wanting a
universal classification instead. "Encapsulation" generally does not
allow for overlaps. Set theory does.
For "add" to report "not a number" means that it should know the types of
the operands. This means that you need types at least to check them.
How specifically is "types" different from "validation" in this example?
x = add("123","99.28");
y = add("foo","7");
The language does not care what "foo" is here. Only the "add" function
will care when it checks to see that the first parameter is a valid
number.
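Here is a minimal Python sketch of the run-time validation approach described above. The hypothetical `add` accepts text and checks that each argument parses as a number. It uses the standard `decimal` module so that "99.28" is not silently turned into a binary float. It also rejects "NaN" and "Infinity", which `Decimal` would otherwise accept.

```python
from decimal import Decimal, InvalidOperation


def add(a: str, b: str) -> Decimal:
    """Hypothetical add(): the caller passes text, add() validates it.

    The language does not check the arguments; this function does,
    raising ValueError for anything that is not a finite number.
    """
    try:
        x, y = Decimal(a), Decimal(b)
    except InvalidOperation:
        raise ValueError(f"not a number: {a!r} or {b!r}") from None
    if not (x.is_finite() and y.is_finite()):
        raise ValueError(f"not a finite number: {a!r} or {b!r}")
    return x + y


print(add("123", "99.28"))  # 222.28
```

Calling `add("foo", "7")` raises `ValueError`. That is the same "not a number" report, just produced by validation inside the function, not by a type checker.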
The above isn't properly typed. "123" has the type String. Strings aren't
additive.
Languages can and do this. Label it however you want, it works.
There are other things it might check, such as range, because it
may not be able to add large numbers. Types cannot do this very well
unless we either pick arbitrary chunk sizes or create a type for every
possible length/size, which is dumb.
Types do it perfectly. I can have a wide set of numeric types having
different models.
And long-winded, confusing code to use them.
It is a very important issue. Numeric types are models,
and there could be many different models of integer, real and other numbers
as found in mathematics. Note that range is only one aspect here. There is
also precision, rounding, accuracy of numeric operations etc. Further, typed
systems describe requirements on the type, and the compiler/engine is
responsible for fulfilling these requirements *automatically*. This way is far
safer than adding "123" to "99.28" and wondering whether the result is
"12399.28".
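The point about precision and rounding being part of a numeric model can be sketched with Python's standard `decimal` module. The helper names `round_cents` and `third` are illustrative only. The sketch shows string "+" concatenating, binary floats and decimals disagreeing on 0.1 + 0.2, and the rounding rule and precision changing results.

```python
from decimal import Decimal, localcontext, ROUND_HALF_EVEN, ROUND_HALF_UP

# Untyped text: "+" on two strings concatenates instead of adding.
joined = "123" + "99.28"
print(joined)  # 12399.28

# Binary floating point and decimal are different models of "real" numbers.
print(0.1 + 0.2 == 0.3)                                   # False
print(Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))  # True


def round_cents(x: Decimal, rounding: str) -> Decimal:
    """Round to two places; the rounding rule is part of the numeric model."""
    return x.quantize(Decimal("0.01"), rounding=rounding)


def third(precision: int) -> Decimal:
    """1/3 computed with the given number of significant digits."""
    with localcontext() as ctx:
        ctx.prec = precision
        return Decimal(1) / Decimal(3)


print(round_cents(Decimal("2.345"), ROUND_HALF_EVEN))  # 2.34
print(round_cents(Decimal("2.345"), ROUND_HALF_UP))    # 2.35
print(third(3), third(6))                              # 0.333 0.333333
```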
The "protection" of heavy typing is not free. Often it requires twice
as much code as type-free code and longer coding time. With a type-free
language one can spend the time saved through brevity and clarity on
testing, unit tests, etc. Type-free code reads more like pseudo-code.
Type-heavy code reads like legalese.
Businesses are run by scrooges: they want productivity at a low cost.
Strong typing does not provide this. It may perhaps provide a high
degree of accuracy if the resources are provided to spend on the
bloated red-tape typish code, but companies don't want to spend that.
They will tolerate slightly more errors if the cost is a lot lower.
Plus, the errors caused by not understanding the customer or the domain
are often a bigger component than outright program errors. It is
usually cheaper to educate a few good "scripters" in the domain than an
army of precision strong typers.
But, there are areas where the reverse might be true, such as
life-support medical equipment.
Again, I would like to see a specific scenario of OO outdoing RDBMS in a
custom biz app setting (outside of machine performance issues for now).
Technological changes don't happen overnight. Then there are serious
issues with foundations and the lack of properly typed languages. I can't
predict what a typical biz application will do in 10 years.
I am not here to argue the merits of strong/weak/none typing. (Unless,
perhaps you are one who thinks OO == Types.)
OO <= Types
First you would have to form a clear, consensus definition of OO to
prove this. Good luck with that task.
but also utterly inefficient with respect to the space and time
required for many operations.
When I encounter enough of such scenarios,
Put an image in a relational table, so that each pixel would be a cell.
Put program code in a relational table and write a compiler in SQL
...
Why SQL? SQL is only one of infinitely many possible relational languages.
Further, one does not have to actually store stuff in a physical grid
in RAM or disk (if there is such a thing).
And, how would making each pixel an ADT or Object improve the picture
(pun)?
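For concreteness, here is a small sketch of "each pixel in a relational table" using Python's built-in `sqlite3`. The `pixel` table and the 2x2 grayscale image are hypothetical. Each pixel is a row keyed by (x, y). A region query becomes an ordinary aggregate, and the raster order has to be asked for explicitly with ORDER BY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pixel (x INTEGER, y INTEGER, gray INTEGER, "
             "PRIMARY KEY (x, y))")

image = [[0, 255],
         [128, 64]]  # hypothetical 2x2 grayscale image, listed row by row
conn.executemany(
    "INSERT INTO pixel (x, y, gray) VALUES (?, ?, ?)",
    [(x, y, g) for y, row in enumerate(image) for x, g in enumerate(row)],
)

# Mean brightness of the left column (x = 0).
(mean_left,) = conn.execute("SELECT AVG(gray) FROM pixel WHERE x = 0").fetchone()
print(mean_left)  # 64.0

# Rows have no inherent order; raster order must be requested.
flat = [g for (g,) in conn.execute("SELECT gray FROM pixel ORDER BY y, x")]
print(flat)  # [0, 255, 128, 64]
```

Whether this is efficient is a separate question. Nothing here forces the engine to store the pixels as a physical grid of cells.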
Further, relational does not dictate implementation.
Wrong, it puts definite limits on the implementation.
Show me a definite limit.
If I am *searching* based on multiple factors, sure I'd be happy to use
a (good) table browser.
That's the point: you need a different paradigm, because "path" is not a
type in SQL, because result sets aren't ordered in SQL, etc. Write a GPS
car navigation system that represents its results as relations and try
to sell it to *anybody*!
Please clarify.
BTW, here is a "road" schema:
Table: Road
-----------
roadID
roadTitle
roadType (Highway, Main, Side, Mixed, etc.)
Table: Points
-------------
pointID
Longit (longitude)
Lattit (latitude)
Table: RoadSegment
-------------------
roadRef (references roadID)
segID (may not be needed)
fromPoint (references Points table)
toPoint
segmentType (Highway, Main, Side, etc.)
There are other, perhaps better, ways to do this. I added a "Point"
table so intersection info may be easier to represent. This is just a
quickie demo.
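One way to try the "quickie demo" schema is in SQLite through Python's `sqlite3`. The column types and sample rows are assumptions. The path query is a standard recursive CTE (WITH RECURSIVE), which SQLite supports. It speaks to the ordering objection by making the order of a path an explicit `step` column, since table rows carry no order. The sketch assumes one-way segments and no cycles.

```python
import sqlite3

SCHEMA = """
CREATE TABLE Road (
    roadID    INTEGER PRIMARY KEY,
    roadTitle TEXT,
    roadType  TEXT            -- Highway, Main, Side, Mixed, ...
);
CREATE TABLE Points (
    pointID INTEGER PRIMARY KEY,
    Longit  REAL,             -- longitude
    Lattit  REAL              -- latitude
);
CREATE TABLE RoadSegment (
    roadRef     INTEGER REFERENCES Road(roadID),
    segID       INTEGER,
    fromPoint   INTEGER REFERENCES Points(pointID),
    toPoint     INTEGER REFERENCES Points(pointID),
    segmentType TEXT
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO Road VALUES (1, 'Main St', 'Main')")
conn.executemany("INSERT INTO Points VALUES (?, ?, ?)",
                 [(1, 0.0, 0.0), (2, 0.0, 1.0), (3, 1.0, 1.0), (4, 1.0, 2.0)])
# Segments inserted out of order on purpose: table rows have no order.
conn.executemany("INSERT INTO RoadSegment VALUES (1, ?, ?, ?, 'Main')",
                 [(3, 3, 4), (1, 1, 2), (2, 2, 3)])

# Walk the road from point 1; 'step' makes the path order explicit.
PATH = """
WITH RECURSIVE walk(point, step) AS (
    SELECT 1, 0
    UNION ALL
    SELECT s.toPoint, w.step + 1
    FROM walk AS w JOIN RoadSegment AS s ON s.fromPoint = w.point
)
SELECT point FROM walk ORDER BY step
"""
path = [p for (p,) in conn.execute(PATH)]
print(path)  # [1, 2, 3, 4]
```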
Further, I find dynamic or type-free systems more adaptable to multiple
languages and tools. Compile-time checking tends to assume the whole
world is the same language and gags when it isn't. RDBMS info has proven
more sharable than Java, Eiffel, etc.
It isn't static vs. dynamic. Time of checking is so far irrelevant. As long
as 23 has only one type, there is nothing to check.
The problem with gotos is their power. They have too big a "norm".
Mathematically, small program changes may lead to enormous,
unpredictable effects.
Goto fans might argue that the impact of change is only unpredictable
to those who don't "get" gotos.
There aren't many goto fans left. The same goes for RDBMS: they are
already extinct, and coming generations will forget about them.
Sorry, I see no evidence of their demise. Commercial sales are down a
bit of late, but this is probably because open-source solutions are
coming of age. The only current threat seems to be OO'ers who want to
do more in Java etc. and less in SQL. However, that battle is still
raging with no clear winner.
One of the biggest selling points of RDBMS is that multiple languages
can use them. I've seen Oracle systems that lived through COBOL and are
now having web languages, MS-Access, and languages like Java all hooked
to the *same* DB. Until you find a way to make OO/ADT results or access
more sharable, this aspect alone will keep them around for practical
purposes.
That is the fate of programming as a whole.
The relational approach suffers from it very much. Just consider joins.
The beauty of specialized domain languages is not their power but, on the
contrary, their lack of power: there is much less you can do wrong.
Huh? If everybody invents their own join, how does that rein in
problems?
Should any problem be solved in terms of joins? You are trying to sell me a
wrong tool!
I never said that. I am only saying that a lot of stuff that some view
as "domain specific" is simply database-like activity disguised as
domain-specific issues. They are largely reinventing the wheel without
knowing it.
I will make domain-specific functions when needed, I would note. Using
RDBMS does not prevent this. Sometimes I even mix them to get the best
of both:
doProcessFoo(123, bar, "x > 3 and y='blah'");
[...]
function doProcessFoo(id, glog, clause) {.......}
Here I pass in a WHERE-clause expression. (Some consider this a
security risk due to potential "SQL injection", but it is usually for
narrow-use intranets where much of the key data is read-only anyhow
under the logins used by the app.)
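Here is a rough Python/`sqlite3` analogue of passing a WHERE-clause into a function like doProcessFoo(). The `foo` table and its columns are hypothetical. The `glog` argument is omitted because its role isn't described. The clause text is still spliced into the SQL, so it must come from trusted code. Values can instead be passed separately and bound with `?` placeholders, which is sqlite3's standard parameter substitution and sidesteps the injection concern for the values themselves.

```python
import sqlite3


def do_process_foo(conn, foo_id, clause, params=()):
    """Hypothetical analogue of doProcessFoo(): caller supplies a WHERE clause.

    The clause is wrapped in parentheses so an OR inside it cannot escape
    the foo_id filter. Its text is trusted; values go in params.
    """
    sql = f"SELECT x, y FROM foo WHERE foo_id = ? AND ({clause})"
    return conn.execute(sql, (foo_id, *params)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (foo_id INTEGER, x INTEGER, y TEXT)")
conn.executemany("INSERT INTO foo VALUES (?, ?, ?)",
                 [(123, 5, "blah"), (123, 1, "blah"), (123, 9, "other")])

# Literal clause, as in the post:
print(do_process_foo(conn, 123, "x > 3 and y='blah'"))            # [(5, 'blah')]
# Same filter with the values bound as parameters:
print(do_process_foo(conn, 123, "x > ? and y = ?", (3, "blah")))  # [(5, 'blah')]
```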
OO as a paradigm and ADT as its vehicle try to keep the power, but also
provide checks and balances to diminish the negative impact of exercising
that power. In particular, discipline in OO is enforced on a per-component basis.
Show with code.
But you said that you aren't interested in static program correctness
analysis.
If you want a challenge, fine. Take any machine learning method. Training
sets are ideal tables, rows and columns, nothing else. Take any method of
your choice and implement it in SQL! For an introduction to existing methods,
see excellent tutorials by Andrew Moore:
http://www.autonlab.org/tutorials/
How about something from the domain of custom biz apps? I have already
conceded, out of ignorance of the domain, that DBs may not be good for
heavy-duty numerical analysis.
You have *pure* relational data. Note that the whole of machine learning is
nothing but SELECT training_set WHERE example=x. So, don't hesitate,
give me a Support Vector Machine in SQL!
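An SVM is well beyond a short sketch. The learner closest to "SELECT from the training set" is 1-nearest-neighbour, so that simpler technique is swapped in here, not an SVM. It is a minimal Python/`sqlite3` sketch with a hypothetical two-feature `training_set`. Classification is a single query that orders training rows by squared Euclidean distance and takes the first label.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE training_set (f1 REAL, f2 REAL, label TEXT)")
conn.executemany("INSERT INTO training_set VALUES (?, ?, ?)", [
    (1.0, 1.0, "A"), (1.5, 2.0, "A"),   # hypothetical examples
    (5.0, 5.0, "B"), (6.0, 5.5, "B"),
])


def classify(f1: float, f2: float) -> str:
    """1-nearest-neighbour: label of the closest training row.

    Squared distance orders rows the same way as Euclidean distance.
    """
    row = conn.execute(
        "SELECT label FROM training_set "
        "ORDER BY (f1 - :x) * (f1 - :x) + (f2 - :y) * (f2 - :y) LIMIT 1",
        {"x": f1, "y": f2},
    ).fetchone()
    return row[0]


print(classify(1.2, 1.1))  # A
print(classify(5.5, 5.0))  # B
```

Methods that must solve an optimization problem, as SVM training does, do not reduce to one query this way. That is the substance of the challenge.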
Outside of specific languages or implementation, the two biggest
differences between relational and OO are:
1. Each "record/map" must belong to one and only one entity (table) in
relational. OO's "map" has no such restriction. Inheritance can emulate
such, but that is optional. Each object can float independently.
Do you refer to singletons here? I don't see why this should be essential,
but it is no problem to enforce that in an OOPL. Make constructors private,
if you want, and here you are. But as a principle, it is wrong - numbers,
strings aren't bound by this rule. 123 can be in any number of cells. Try
to consider a wider picture: there are things in cells, cells themselves,
rows, columns, tables, sets of tables, sets of sets of tables etc. ADT
offers you a unified way to handle that all.
No, it does not. An ADT by itself is not a language, and relational is a
bigger picture than ADTs.
Come on! SQL can be trivially described in ADTs. Try the opposite. Just
place a table into another table, flip a table column, describe a ring in
SQL, write a task scheduler in SQL...
Again, I don't expect SQL or relational to solve the *entire* problem
by itself. Remember the Yin-Yang debate? No good case was made that a
Yang-only system is inherently better.
2. Relational generally assumes a partitioning between data and
behavior, while OO tries to meld them.
Actually it tries to get rid of data. It says that there is no data, but
only behavior. The rationale is as follows. You cannot perceive data, only
the behavior of those. This is in full accordance with mathematics. There
is no such thing as number 123. There is a set of properties it and similar
things expose. Moreover, try to ask yourself what is a relation, and you
will see that a pure relational approach should care about data even less.
Well, data and behavior are just different views of the same thing.
Can you name this thing? Again, what is a relation? Formulate it, and point
me to the word "data".
No, I cannot name "it". When we start modeling things that don't exist
in the real world, English is often no longer sufficient.
But to give an analogy, program code is merely data to the
interpreter/compiler. A developer may think of a function as
"behavior", but the interpreter treats it more like data if we look at
other processes that read what we normally call "data". It is yet
another case of relativism where "is-a" flunks.
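The "code is just data to the interpreter" analogy can be demonstrated with Python's standard `ast` module. This is a small sketch with a made-up one-line program. The same text is first read as a data structure by another program (collecting the variable names it mentions) and then compiled and run as behavior.

```python
import ast

source = "total = price * qty"   # made-up one-line program
tree = ast.parse(source)          # the "behavior" as a data structure

# Another program can read it like any other data...
names = sorted({n.id for n in ast.walk(tree) if isinstance(n, ast.Name)})
print(names)  # ['price', 'qty', 'total']

# ...and then run it as behavior.
env = {"price": 2.5, "qty": 4}
exec(compile(tree, "<demo>", "exec"), env)
print(env["total"])  # 10.0
```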
This would get into a definition battle that has no hard math to say
yea or nay.
One thing is quite clear, it is impossible even to define the term "data",
without description of behavior. Look at mathematical definitions of
numbers.
"Data" is a set of computation states, characterized by definite behavior.
Perhaps you should make a distinction between "is a set of" and "can be
defined as a set of". Claiming something "is" is a strong assertion.
When it is said that a table cell contains 1, it means all states where Get
(or SELECT) would deliver result associated with the application domain
object denoted as 1.
It is the behavioral approach which makes both OO and your beloved
relational model *implementation-independent*. Otherwise, you would be
unable even to talk about "data", because, again, what do two states of
magnetic fields on the hard drives of two computers have in common? They
expose the same *behavior*, which you call "data"! Are big-endian and
little-endian encodings of 26732147 the same "data"?
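The endianness question can be checked directly with Python's standard `struct` module. The two 4-byte encodings of 26732147 are different byte patterns. Each one, read back with its matching "get" operation, yields the same integer.

```python
import struct

n = 26732147
big = struct.pack(">I", n)     # big-endian, 4 bytes
little = struct.pack("<I", n)  # little-endian, 4 bytes

print(big.hex(), little.hex())  # 0197e673 73e69701

# Different states, same behavior under the matching decode.
print(struct.unpack(">I", big)[0] == struct.unpack("<I", little)[0] == n)  # True
```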
Perhaps, but that is an implementation issue. There may be other ways to
implement it.
--
Regards,
Dmitry A. Kazakov
-T-