Re: Was: what does "serialization" mean?

From: Edward G. Nilges (spinoza1111_at_yahoo.com)
Date: 07/10/04


Date: 9 Jul 2004 19:49:37 -0700

Nick Landsberg <SPAMhukolauTRAP@SPAMworldnetTRAP.att.net> wrote in message news:<BmBHc.222121$Gx4.16309@bgtnsc04-news.ops.worldnet.att.net>...
> Edward G. Nilges wrote:
>
> > Nick Landsberg <SPAMhukolauTRAP@SPAMworldnetTRAP.att.net> wrote in message news:<UDFGc.204713$Gx4.171800@bgtnsc04-news.ops.worldnet.att.net>...
> >
> >>Edward G. Nilges wrote:
> >>
> >>
> >>>Nick Landsberg <SPAMhukolauTRAP@SPAMworldnetTRAP.att.net> wrote in message news:<z1cGc.195749$Gx4.117997@bgtnsc04-news.ops.worldnet.att.net>...
> >>>
> >>>
> >>>>Edward G. Nilges wrote:
> >>>>
> >>>>
> >>>>
> >>>>>Corey Murtagh <emonk@slingshot.no.uce> wrote in message news:<1088977185.694628@radsrv1.tranzpeer.net>...
> >>>>>
> >>>>
> >>>>[MUCH SNIPPAGE, just want to reply to one small point]
> >>>>
> >>>>
> >>>>
> >>>>>Damn straight and why the hell not? Entire businesses such as Wal
> >>>>>Mart, Amazon, Barnes and Noble and countless others did not exist
> >>>>>prior to the availability of powerful servers powered mostly by
> >>>>>Microsoft software, and not incompatible versions of Linux.
> >>>>>
> >>>>
> >>>>I don't know about the rest, but I do know that as of 10
> >>>>years or so ago, the Wal Mart database was hosted on a Teradata
> >>>
> >>>
> >>>So? First of all, many of these large servers are today dinosaurs.
> >>>What part of Moore's law don't you understand?
> >>
> >>So? What part of Newtonian physics don't you understand?
> >>
> >>The response time of large database systems is
> >>dominated by the rate at which data can be
> >>fetched off disk. Since disks are physical
> >>devices which are subject to things like
> >>inertia, momentum and rotational latency,
> >>they don't obey the so-called Moore's Law.
> >>Average access times for Disk I/O to a single
> >>disk have only improved by a factor of roughly
> >>5 during the last 25 years. (Certain RAID technology
> >>has improved this by another factor of about
> >>2.5, but even so, that's roughly only an order
> >>of magnitude in 25 years.) This kind of data
> >>(about the physical world in which the CPU's
> >>do their work) should be common knowledge
> >>among people who claim to be "software engineers"
> >>or "computer scientists" (IMO). "Moore's
> >>Law" is irrelevant in this case.
> >
> >
> > Conversely the fact that physical limits can be overcome by software
> > should be common knowledge.
> >
>
> At the point where I brought this up, it was to refute
> a claim that Wal Mart (among others) would not exist without
> Microsoft Server software. I related some personal
> knowledge that Wal Mart did in fact exist at least
> 10 years ago, and noted the kind of hardware and
> OS they were using at the time.
>
> The "in your face" reply I got was "What part of
> Moore's Law don't you understand?" (Which reply, by the
> way, is totally irrelevant to whether or not
> Wal Mart would exist without Microsoft SQL server
> or not.)
>
> I pointed out that Moore's Law does not apply
> to disk storage access speeds. Then replied
> in kind. Let's not escalate any more?

I apologize for the appearance of brusqueness. The point I
was trying to make was that hardware speeds are only part of the
story.
>
> > First of all, you did not define the size and shape of the information
> > packets being acquired. If it's a fixed block, what matters more than
> > the access time is whether that block contains all, some, or none of
> > the information desired.
> >
>
> Mostly true. The goal of the physical designer of the
> database is to maximize useful data per block. Having
> to access >1 physical disk block for the data is extremely
> bad for response times. (I will explain the "mostly"
> below.)
>
> > I do not claim that Microsoft is any better at this form of
> > organization. But when factored by the price of the software, they are
> > "better" as far as companies and their CEOs are concerned. Today,
> > these individuals and companies regard the whole issue of data
> > processing as a necessary annoyance needed merely to compete.
> >
>
> In the case of a large retail establishment, their
> "lifeblood," so to speak, is in their sales, inventory,
> accounts payable, accounts receivable, etc. data.
> The data volumes in the case of Walmart are huge
> even by today's standards. 10 or more years ago,
> it took 1,000+ disk drives just to store all that data.
> (Not counting the 1,000+ mirrored drives which were
> necessary to protect against drive failures.)
>
> In addition, the sales and inventory database has
> to support such queries as:
>
> Report the sales trends week by week
> for the last 6 months on the top and bottom
> 200 selling products ordered by product category,
> total sales, geographic region and store.
> (No, I'm not going to write the SQL for this.)

Let me try extempore, without having the time to check syntax;
corrections welcome:

SELECT MONTHLY_TOTAL_SALES FROM WHATEVER
ORDER BY PRODUCT_CATEGORY, TOTAL_SALES, REGION, STORE
FETCH FIRST 6 ROWS ONLY;

SELECT MONTHLY_TOTAL_SALES FROM WHATEVER
ORDER BY PRODUCT_CATEGORY DESC, TOTAL_SALES DESC, REGION DESC, STORE DESC
FETCH FIRST 6 ROWS ONLY;

WHATEVER would probably be a separate query to summarize total sales.
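For illustration only, here is a minimal sketch of that two-step idea (summarize into something playing the role of WHATEVER, then take the top and bottom N) using Python's built-in sqlite3 module. The table `sales`, its columns and the sample rows are hypothetical; the weekly breakdown and six-month window of the real query are left out to keep it short, and SQLite spells the row limit LIMIT rather than FETCH FIRST.

```python
import sqlite3

# Hypothetical detail table: one row per item sold.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (product_category TEXT, product TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("cold medicine", "syrup", 30.0),
        ("cold medicine", "syrup", 20.0),
        ("cold medicine", "lozenge", 5.0),
        ("toys", "yo-yo", 12.0),
        ("toys", "kite", 1.0),
    ],
)

# Plays the role of WHATEVER: total sales per product, via one GROUP BY.
SUMMARY = """
    SELECT product_category, product, SUM(amount) AS total_sales
    FROM sales
    GROUP BY product_category, product
"""


def top_and_bottom(conn, n):
    """Return the n best-selling and the n worst-selling products."""
    top = conn.execute(
        f"SELECT * FROM ({SUMMARY}) ORDER BY total_sales DESC LIMIT ?", (n,)
    ).fetchall()
    bottom = conn.execute(
        f"SELECT * FROM ({SUMMARY}) ORDER BY total_sales ASC LIMIT ?", (n,)
    ).fetchall()
    return top, bottom


top, bottom = top_and_bottom(conn, 2)
print(top)     # [('cold medicine', 'syrup', 50.0), ('toys', 'yo-yo', 12.0)]
print(bottom)  # [('toys', 'kite', 1.0), ('cold medicine', 'lozenge', 5.0)]
```

A real version would also need a rule for ties at the cut-off and would add the date range and week to the GROUP BY.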

One irony of SQL is that it was supposed to be a nonprogramming language in
which wise and serene Masters of the Universe would commune directly
with the World Computer thereby consigning programmers and programming
to the ash-can.

However it is characteristic of a formal code that one makes mistakes,
and one finds among Masters of the Universe an unwillingness to make
mistakes, in a Hobbesian society in which preservation of position is
Job One; at Princeton we found that senior tenured faculty were the
least willing to use workstations because they were afraid, in some
cases, of looking like dorks in front of mere mortals.

One may indeed define a programmer as a dork who is unafraid of being
a dork, and the concept of dork is closely related to that of writing:
cf. Derrida, who does not use the term "dork" but does address, in Of
Grammatology, Rousseau's self-hatred and the link that had with
Rousseau's privileging of speech over writing and song over
instrumental music.

But a second irony is that around SQL a number of urban legends have
grown up in communities of scribes who have in fact been assigned the
task of using SQL.

One is the privileging of "table" over "relation" when in fact the
concept of "relation" is more general and more powerful than "table",
subject to the caveat that your queries should not result in the
continual creation and re-creation of relations.

> And yes, that's a real query, and I was told
> third hand, that the results were used, in part,
> to establish price and inventory levels of
> cold medications during a flu epidemic somewhere
> in the midwest.
>
> In order to satisfy such a query in most
> database systems (including the ones you mention),
> one must visit a vast majority of the records in
> the sales and inventory database. Thus, the response
> time to such a query is dominated by disk access
> times and data transfer rates between the disk
> and the CPU's.
>
...Only if the system is poorly designed. The above query can be
optimized into straight line code as long as a stored procedure exists
for "whatever".

The whole purpose of effective data base design is to AVOID having to
visit a vast majority of records in cases like the above.

I won't say "what part of 'indexing' don't you understand" because you
understand indexing.
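As a minimal sketch of what indexing buys here, assuming a hypothetical `sales` table with a `sale_date` column (again via Python's sqlite3 module): with an index on the date, a query restricted to recent sales can seek to that range instead of reading every row. The exact wording of SQLite's EXPLAIN QUERY PLAN output varies between versions, so only the index name is worth looking for.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sale_date TEXT, product TEXT, amount REAL)")
# Index on the date column so a date-range query can seek instead of scan.
conn.execute("CREATE INDEX idx_sales_date ON sales (sale_date)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("2004-01-15", "syrup", 3.0),
        ("2004-06-01", "syrup", 4.0),
        ("2004-07-02", "kite", 2.0),
    ],
)

query = "SELECT SUM(amount) FROM sales WHERE sale_date >= ?"

# Ask SQLite how it intends to run the query; the last column of each
# plan row is a human-readable description that should name the index.
plan = conn.execute("EXPLAIN QUERY PLAN " + query, ("2004-06-01",)).fetchall()
details = [row[-1] for row in plan]
print(details)

total = conn.execute(query, ("2004-06-01",)).fetchone()[0]
print(total)  # 6.0
```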

Let's assume WalMart doesn't have a single record for total sales but
instead a single table of all sales, completely normalized, with one
row for each distinct sale of each distinct item (the actual sale, of
multiple products as the shopper makes, we hope, a series of impulsive
purchases of dreck she does not need, is broken up into distinct
items).

If the designers don't put aggregate information including total sales
somewhere as the sales come in, they should be WalMart greeters and
not designers; but even if they don't, the total sales figure need only
be computed once.

The single query in most cases therefore won't visit all records in
any halfway decent system.
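One way to keep aggregate information current as the sales come in is a trigger that updates a running-total table on every insert. This is a sketch under assumptions: SQLite via Python's sqlite3 module, with table and trigger names made up for the example; a production system would also have to handle returns, corrections and concurrent updates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (product TEXT, amount REAL);

-- Hypothetical aggregate table: one running total per product.
CREATE TABLE product_totals (
    product     TEXT PRIMARY KEY,
    total_sales REAL NOT NULL DEFAULT 0
);

-- Keep the aggregate current as each sale arrives.
CREATE TRIGGER sales_to_totals AFTER INSERT ON sales
BEGIN
    INSERT OR IGNORE INTO product_totals (product) VALUES (NEW.product);
    UPDATE product_totals
       SET total_sales = total_sales + NEW.amount
     WHERE product = NEW.product;
END;
""")

for product, amount in [("syrup", 3.0), ("kite", 2.0), ("syrup", 4.5)]:
    conn.execute("INSERT INTO sales VALUES (?, ?)", (product, amount))

# Reading a total now touches one small row, not every sale.
totals = dict(conn.execute("SELECT product, total_sales FROM product_totals"))
print(totals)  # {'syrup': 7.5, 'kite': 2.0}
```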

Indeed the psychology of hardware efficiency encourages flaccid design
but even here fails to attain its goals when processes become subject
to NP-completeness.

People speak in other words of large volumes of data at monopolistic
and rather greedy companies like WalMart because in modern society it
is intrinsically political to assert the possibility of mastership of
the whole...even when recent revelations show that the actual "masters
of the universe" like Ken Lay don't possess either the mastery of
detail, or the ability to organize concepts so they scale up, that the
language of lordship and bondage ascribes to them.

Missing is the idea, from the theory of NP-completeness, that you
can't throw raw hardware speeds at volumes but must instead organize.
The scandal becomes that the Masters of the Universe, from Ken Lay to
Don Trump to George Bush, DON'T POSSESS the skills of organization
whereas their dogsbodies do.

They master the language of organization while delegating its tasks; I
am certain that that cute blonde in The Apprentice is the real brains
of the operation (and, I think the treatment of Amoroso is racist, but
that's off-topic).

> So, to come back to your point above regarding
> the importance of whether a block contains
> all, some or none of the data necessary
> to satisfy a query, you are "mostly" correct,
> that is, correct until you run into something
> like the example query. It just so happens in this case,
> the answer is that "all blocks contain only
> some of the data."
>
Well, if I were working at WalMart above the level of greeter (which
I am not likely to be after writing the above), I'd create a
separate aggregate table as I read individual sales.

[BTW I shop at Walmart: there is even one here in Shenzhen in the
People's Republic. On my budget it makes sense DESPITE the way it
pushes out smaller players. We should keep in mind that 20 years ago,
smaller players like KMart were just as brutal as Walmart. Name of the
game is business.]
 
> Teradata (and its competition) was specifically
> designed to handle queries of this nature on
> huge databases (and in finite time). It is
> built upon a special purpose OS and using
> special purpose hardware. Beyond that
> I cannot say more because of the NDA.
>
Teradata's business model, I am certain, is making a large margin
based on vending a unique, proprietary approach based on its speed. I
am well aware that raw speed impresses CEOs and geeks alike and thus
closes sales.

At the same time, the 80% failure rate of "enterprise" systems is, it
seems, independent of the hardware vendors involved, and it affects
companies that choose Microsoft as well as companies that don't. It
affects companies that select blindingly fast proprietary services and
companies that select Windows 2000.

As a result, when I worked a basic sales job during the writing of my
tech book (in order to free up evenings and weekends) I discovered a
number of companies that were abandoning high-end servers similar to
Teradata in favor of dorky Microsoft based on price points alone.
Their senior technicians were either dusting off their resumes or
grinning and bearing the prospect of rewriting C code in C++ for
Visual Basic.
  
> Teradata is a niche player which only
> plays in the "huge database" world. At the
> time, no *general purpose* database could handle
> the sheer volume of data. I do not know if
> any general purpose system can handle it even today
> (while providing a response in finite time).
> I have not been keeping up with whatever
> changes Teradata has made during the last
> 10 years, so I can't say where they are
> with their product today.
>
Microsoft means to exploit a window of opportunity created by Moore's
Law, and whatever holds them up doesn't appear to me to be raw speeds.
It is the abilities of the development team at the customer site.

However, the situation may change if Moore's law ceases operating,
which it may as chip design approaches physical limits.
 
> > Recommending some "open source" system du jour, running on some vanity
> > architecture, is folly in this context (hey, does MySQL support stored
> > procedures yet? Just asking.)
>
> I never mentioned anything about "open source". What
> was this comment in response to?

Sorry: that was one of a basket of propositions that tend to be
advanced without due diligence, along with the fashionability of
high-end servers and open source, as opposed to mere good practice.

>
> >
> > Moore's Law can and has overcome physical limitations. What part of
> > "virtual storage" don't you understand?
>
> Virtual storage is a fancy name for "disk."
> Systems do paging because there isn't enough memory
> to store everything *in memory*, thus something
> gets paged out to disk to make room in memory.
> Again, you find that your response times are dominated
> by disk I/O rates (this time for paging) rather than
> by CPU speed.
>
This isn't necessarily the case. In fact, if you do your job, response
times are dominated by CPU times for compute-intensive tasks or
non-paging I/O for all other tasks. I think if your response times are
dominated by paging, you are in the pathological condition known as
"thrashing", and, it's time to mosey on down to CompUSA or even
Walmart and buy more memory. Take the kids.
> >
> >
> >>I don't believe
> >>
> >>>Microsoft's puffery as regards the ability of large Windows 2000
> >>>servers in all cases, but there's a grain of truth. The efficiencies
> >>>of a ten-year old Teradata are today inapplicable to the
> >>>price/performance equation.
>
> See above for a typical query. While one can imagine
> a small-to-medium server being able to do this for an individual
> store (or even for a small chain of stores), at some
> point the logistics of coordinating all this data for
> a chain as large as Wal Mart tips the scales over to
> a centralized database implementation.
>
No, I disagree. EVEN IN THE PUNCHED CARD ERA, large banks and
railroads knew how to aggregate data from individual sites into larger
and larger entities. IBM in that distant era showed how its primitive
equipment was able to create a single summary card for any volume of
data.
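The punched-card technique alluded to here is usually called a control-break report: detail records arrive sorted on a key, and a summary record is emitted each time the key changes. A minimal Python sketch, with hypothetical (branch, account, amount) records standing in for cards; it illustrates the idea only and is not a description of how any particular IBM machine was wired.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical "cards": (branch, account, amount), already sorted by branch,
# as a card sorter would have delivered them to the tabulator.
cards = [
    ("north", "a1", 100),
    ("north", "a2", 250),
    ("south", "b1", 75),
    ("south", "b2", 25),
    ("south", "b3", 50),
]


def summary_cards(detail_cards):
    """Emit one summary record per branch, then a grand-total record.

    groupby only merges adjacent records, so the input must be sorted
    on the key, exactly as with sorted card decks.
    """
    grand_total = 0
    for branch, group in groupby(detail_cards, key=itemgetter(0)):
        branch_total = sum(amount for _, _, amount in group)
        grand_total += branch_total
        yield (branch, branch_total)
    yield ("ALL", grand_total)


print(list(summary_cards(cards)))
# [('north', 350), ('south', 150), ('ALL', 500)]
```

The summary records can themselves be fed through the same pass to roll branches up into regions, which is how a single card can end up standing for any volume of data.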

[This ability could be used for good or evil. During the Second World
War, mere punched card equipment organized WalMart sized masses of men
to defeat Hitler; my father's old records include both punched cards
and identification cards obviously printed on IBM tab printers. As to
evil, Edwin Black's book shows how the Nazis were empowered, by IBM,
to organize murder on a Walmart scale.]

I repeat: you don't read all the records to get the total for the
month. At the start of the month, you clear the aggregate record in
the aggregate table and update it when a sale comes in.
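A minimal sketch of that clear-at-month-start scheme, in plain Python so it does not depend on any particular database. The class and method names are hypothetical, and it assumes sales arrive in date order.

```python
from collections import defaultdict


class MonthlyTotals:
    """Hypothetical running sales totals for the current month only."""

    def __init__(self):
        self.month = None
        self.totals = defaultdict(float)

    def record_sale(self, month, product, amount):
        """Add one sale; return the previous month's totals when a month closes."""
        closed = None
        if month != self.month:
            # First sale of a new month: hand back the old aggregate and
            # start again from zero ("clear the aggregate record").
            if self.month is not None:
                closed = (self.month, dict(self.totals))
            self.month = month
            self.totals = defaultdict(float)
        # Every later sale is a single in-place update, not a re-scan.
        self.totals[product] += amount
        return closed


agg = MonthlyTotals()
agg.record_sale("2004-06", "syrup", 3.0)
agg.record_sale("2004-06", "syrup", 4.0)
closed = agg.record_sale("2004-07", "kite", 2.0)
print(closed)            # ('2004-06', {'syrup': 7.0})
print(dict(agg.totals))  # {'kite': 2.0}
```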

Indeed, the proposition "we won't worry about good praxis, instead we
shall throw money at high-end servers" is more like logistics in
particular and military planning in general during the Vietnam War and
today in Iraq, not like the Second World War, during which individual
logistics officers and technicians could make a real
difference...because the brass hats did not have the ability to buy
from vendors (like Bell Helicopters during Vietnam, or Halliburton in
Iraq) with an inside track.

All the databases and high-end servers at the CIA and MI-5 during
March of 2003 came up with the conclusion "Saddam has WMD and is in
the sack with Osama on a nightly basis": this has been acknowledged as
the wrong conclusion by the players involved and the nonprogramming
policy makers are busy making sure that the dogsbodies are being
blamed...for telling them, in March 2003, what they wanted to hear.

This is NOT off topic. This is because the realization of something so
humble as "I don't have to read all the records to get the total", in
context, is an encounter with truth. Senior policymakers in our
society are systematically protected from these encounters by economic
and educational inequality (Princeton has only recently addressed, for
example, the problem of "grade inflation").

Because senior policy wonks and CEOs are thus protected they tend, I
think, to make both species of mistakes: the Iraq-style boner, and the
preference for hardware over software.

> >>>
> >>>Furthermore, the data is useless if it can't be accessed, and like it
> >>>or not this is primarily through Windows.
> >>
> >>In the case of large Enterprise databases, enterprises
> >>which have /existed/ longer than Windows has, the access
> >>is mostly through Windows only in that users submit
> >>their queries via PC's and then access the resulting
> >>reports as a text file via their PC's.
> >>
> >
> > SQL Server is clearly inferior to Oracle UNTIL you look at the price
> > tags.
> >
> > Technologists fail to grok Marx's insight, from the Poverty of
> > Philosophy. In the business world, all things and all men are looked
> > through a single lens: what does it cost me and what is the final
> > cost/benefit ratio.
>
> That is absolutely true. And that's the way of business
> (and probably always has been).

Marx's whole point was that the way of business doesn't have to be the
only way.

In fact, the whole lesson of the "skunkworks" was that you needed to
give geeks the illusion of being protected from the operation in real
time of the cost-benefit ratio.

Thus at certain Lockheed work groups and at Bell Labs (as well as
Bell Northern, where I worked) the geeks were given, for brief periods
of time, the illusion of security such that they did not, at all times
in a continuous, real-time fashion, have to make an auto-da-fé in the
form of "justification" of their projects.

Of course, these regimes tend to be short-lived. However, today (when
because of changes in the tax code, they are nonexistent, for the most
part) they are NOT replaced by cost-benefit production regimes.
Instead, companies still are forced in many cases to look outside the
capitalist regime to "free" or "open" solutions.

>
> >
> > For this reason, all Microsoft has had to do in the recent past was
> > cut the price "just enough" to make up for the lack of technical
> > equivalence, while at the same time keeping up the pressure to improve
> > the (inferior) quality.
> >
> > Contrast Oracle's strategy which is to push its (marginally more
> > qualified) staff as hard as possible while charging what the market
> > will bear, trusting that the market will bear a premium because of
> > those network externalities that work to Oracle's favor.
>
> I have no particular love for either of these companies.
> Gates and Ellison deserve to be locked in a room
> together and forced to "enjoy each others' company"
> for the rest of their natural lives :)
>
Ellison would pull out a joint. Gates would retreat to a corner of the
room in nameless terror.

That's mean. Both are just men. I once had a dream about Gates. I'd
signed up for an Outward Bound expedition in the Olympics. When I got
to Sea-Tac, the instructors took us to a secure location and announced
that Bill Gates had joined the expedition, and did we have a problem
with that.

We said no, and then the instructors took us to Bill's Mom's house
where she asked us to be nice to him.
  
> >
> > The Microsoft game is the classic American business scramble for the
> > dollar. The Oracle game is an equally classic, if rather arrogant,
> > California game of insisting that one is intrinsically "better" and
> > that the world will throw money at one.
> >
> > But what's ignored is that neither system works at all if some clown
> > at the customer site doesn't know how to code an inner join.
>
> I have seen this happen too many times to even think about
> refuting it. What's worse is when the same clown codes a 7 way outer
> join and wonders why it's taking so long on a database
> larger than a "toy". Your definition of "toy" may vary
> from mine, tho.
>
What companies discover is that getting more raw power does NOT, in
most cases, redress the clown's mess, while redoing the query does.

In this particular example, however, it is possible to perform the
7-way join efficiently. Just don't do it in seven stages (seven
different statements). Do it in one statement.

Don't "cascade" the join in a series. Use a single complex Boolean
statement. Duck soup.
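As a sketch of the single-statement approach, here is a four-table join (not seven, to keep it short) written as one SELECT, using Python's sqlite3 module and a hypothetical schema. Because every join condition is in one statement, the query optimizer is free to choose the join order itself rather than being forced through a chain of intermediate results.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE regions  (region_id  INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE stores   (store_id   INTEGER PRIMARY KEY, region_id INTEGER, name TEXT);
CREATE TABLE products (product_id INTEGER PRIMARY KEY, category TEXT, name TEXT);
CREATE TABLE sales    (store_id INTEGER, product_id INTEGER, amount REAL);

INSERT INTO regions  VALUES (1, 'midwest'), (2, 'south');
INSERT INTO stores   VALUES (10, 1, 'Store A'), (20, 2, 'Store B');
INSERT INTO products VALUES (100, 'cold medicine', 'syrup'), (200, 'toys', 'kite');
INSERT INTO sales    VALUES (10, 100, 3.0), (10, 100, 4.0), (20, 200, 2.0);
""")

# One statement: all join conditions are visible to the optimizer at once.
rows = conn.execute("""
SELECT r.name, p.category, SUM(s.amount) AS total
FROM sales s
JOIN stores   st ON st.store_id  = s.store_id
JOIN regions  r  ON r.region_id  = st.region_id
JOIN products p  ON p.product_id = s.product_id
GROUP BY r.name, p.category
ORDER BY total DESC
""").fetchall()
print(rows)  # [('midwest', 'cold medicine', 7.0), ('south', 'toys', 2.0)]
```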
 
> > ...What is
> > in reality a continuous labor process, down to the GUI presentation of
> > results is artificially reified, fetishized and broken up in to
> > magical chunks.
> >
> >
>
> [ REST SNIPPED ]


