Re: taking python enterprise level?...



On Wed, 2010-03-03 at 17:26 +0100, mk wrote:

So there *may* be some evidence that joins are indeed bad in
practice.
If someone has smth specific/interesting on the subject, please post.

I have found joins to cause problems in a few cases - I'm talking about
relatively large tables though - roughly order 10^8 rows.

I'm on Mysql normally, but that shouldn't make any difference - I've
seen almost the same situation on Oracle

consider this simple example:

/* Table A */
CREATE TABLE TableA (
project_id BIGINT NOT NULL,
cost INT,
date DATETIME,
PRIMARY KEY (project_id, date)
);

/* Table projects */
CREATE TABLE projects (
client_id BIGINT NOT NULL,
project_id BIGINT NOT NULL,
INDEX(client_id)
);


.... now the index on TableA has been optimised for queries against date
ranges on specific project ids which should more or less be sequential
(under a load of other assumptions) - but that reduces the efficiency of
the query under a join with the table "projects".

If you denormalise the table, and update the first index to be on
(client_id, project_id, date) it can end up running far more quickly -
assuming you can access the first mapping anyway - so you're still
storing the first table, with stored procedures to ensure you still have
correct data in all tables.

I'm definitely glossing over the details - but I've definitely got
situations where I've had to choose denormalisation over purity of data.


Rolled-up data tables are other situations - where you know half your
queries are grouping by field "A" it's sometimes a requirement to store
that.


Tim



.