Re: Best way to do so.
From: Dan Stumpus (dstumpus_NOSP_at_mindspring.com)
Date: Sat, 15 Jan 2005 23:34:48 GMT
"Bob Bedford" <bedford1@YouKnowWhatToDoHerehotmail.com> wrote
> First, thanks for replying.
Too much time on my hands...
> I'll explain differently what I'm trying to do.
> I have to store all the data needed to inform the companies that signed
> up with our website about what people are choosing.
> I used shoes as an example to keep things simple. The real items on the
> site are used cars.
> If 80% of the customers are looking for a Chevrolet, then the car dealer
> may buy Chevrolets instead of BMWs for their customers. If they know that
> 90% are looking for a car between $5000 and $8000, they would try to
> provide that kind of car. So, every time somebody makes a search, I want
> to log what he was looking for, then build statistics for the car dealers.
> Since not all fields are compulsory, one customer may choose the price,
> another the color, another just the make. I want to optimize it as much
> as possible while keeping it flexible enough for any kind of statistics.
It's a design mistake to try and optimize storage. It limits your choices
down the line. When I was younger, I did this several times, and it was
always a mistake in the long run.
Log the entire choice row. Then you can cook it down and digest it if you
need to for further analysis. 50,000 rows a month isn't much of a problem
on a fast box with fast disks. Keeping *all* fields in the log will give
you maximum flexibility and ease in conducting ad-hoc querying etc.
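For example, a one-row-per-search log might look something like this (just a
sketch; the table and column names are invented, and the syntax assumes a
MySQL-flavored database, so adjust it for whatever you actually run):

    CREATE TABLE search_log (
        search_id   INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
        logged_at   TIMESTAMP,          -- when the search happened
        source      VARCHAR(255),       -- originating URL or IP
        session_id  VARCHAR(64),        -- ties one visitor's searches together
        make        VARCHAR(50),        -- NULL when the visitor left it blank
        model       VARCHAR(50),
        color       VARCHAR(30),
        price_min   DECIMAL(10,2),
        price_max   DECIMAL(10,2),
        mileage_max INT
    );

Your dealer statistics then become single-table GROUP BYs, e.g. the share of
searches per make:

    SELECT make,
           COUNT(*) * 100.0 / (SELECT COUNT(*) FROM search_log) AS pct
    FROM search_log
    WHERE make IS NOT NULL
    GROUP BY make;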
>> For a log, just record the raw event.
> When you say I have to log the raw event, I don't know what you mean. My
> form has many fields, and I must save only the fields that were used, not
> the ones left empty.
Is someone looking over your shoulder and saying "you cannot save a field in
a row if it may be blank x% of the time"? If you want to design a
hierarchical schema with "search" at the top and many types of zero-to-1
entities beneath it (e.g., mfg, color, miles, cost, upholstery, etc.), have
at it. This will save storage but slow down querying (because you'll need
lots of joins), and it makes every query a multi-table join (a pain, if you
do a lot of them). De-normalization (e.g., one row per search event) is used
all the time where data mining needs to be quick and easy.
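If you did go the hierarchical route, it would look something like this
(again only a sketch with invented names), and you can see how every
question turns into a join:

    CREATE TABLE search (
        search_id  INT NOT NULL PRIMARY KEY,
        logged_at  TIMESTAMP
    );

    CREATE TABLE search_make (      -- row exists only if a make was chosen
        search_id  INT NOT NULL,
        make       VARCHAR(50)
    );

    CREATE TABLE search_price (     -- row exists only if a price range was given
        search_id  INT NOT NULL,
        price_min  DECIMAL(10,2),
        price_max  DECIMAL(10,2)
    );

    -- "How many searches wanted a Chevrolet under $8000?" now needs two joins
    SELECT COUNT(*)
    FROM search s
    JOIN search_make  m ON m.search_id = s.search_id AND m.make = 'Chevrolet'
    JOIN search_price p ON p.search_id = s.search_id AND p.price_max <= 8000;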
> I'll use the part ID, but how do I store the price if it is filled in by
> the customer himself? I don't provide any preset values for the price.
Leave them null, or default to zero, your pick. Nulls are sometimes a bit
of a pain to deal with syntactically, as function arguments, etc.
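For instance, with the search_log sketch above, any report that wants a
number instead of a NULL has to wrap the column, and equality tests on NULL
silently match nothing (COALESCE is standard SQL; MySQL also spells it
IFNULL):

    SELECT session_id,
           COALESCE(price_max, 0) AS price_max   -- turn NULL into 0
    FROM search_log;

    SELECT COUNT(*) FROM search_log WHERE color IS NULL;   -- works
    SELECT COUNT(*) FROM search_log WHERE color = NULL;    -- always 0 rows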
>> I'd also add timestamp, source, and session_id fields to each row.
> What do you mean by "source" for each row?
Originating url or IP address, just in case someone is messing with you...
> Why should I store session_id? This will let me know if the same person
> looked at different articles, but since there are so many searches, it
> would be hard to get an interesting result.
If you're in the design mentality, you don't prejudge what may or may not
be useful later.
> Maybe I could say: people looking for a BMW are also looking for an Audi.
Shopping behavior is very interesting to marketing types. You may get clues
for suggestive selling from analyzing search habits.
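That's where the session_id earns its keep. As a rough sketch against the
search_log table above, counting the sessions that searched for both a BMW
and an Audi:

    SELECT COUNT(DISTINCT b.session_id) AS bmw_and_audi_sessions
    FROM search_log b
    JOIN search_log a ON a.session_id = b.session_id
    WHERE b.make = 'BMW'
      AND a.make = 'Audi';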
>> Don't design to minimize physical storage. Design for maximum flexibility.
> My problem is saving "time". I mean, if my search takes 5 seconds because
> of a bad DB design or something else, it's only good for the trash bin. I
> want maximum speed and flexibility.
Waiting a few seconds for a research query is perfectly ok. In my
experience (designing fairly large systems for 25 yrs) sub-second response
times are important for high volume things like customer activity, order
entry, and the like. Research (which is what your search history table is
for) involves people thinking about what they want, looking at the results,
and thinking some more. Educate your customer that processing historical
data takes a few seconds.
> The competitor gets 7 million/month. I won't achieve this for a long
> time, but from what I see, I get about 50,000 searches/month, and I don't
> store them.
A few hundred thousand rows in a table or database is not much of a
challenge for a modern database. If you're worried, create a table with a
couple hundred k rows and run some queries against it.
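One quick-and-dirty way to do that test (MySQL-flavored, reusing the
search_log sketch above): seed a few rows, then keep doubling the table into
itself until you're past a couple hundred thousand rows, and time your
queries against it.

    INSERT INTO search_log (logged_at, session_id, make, price_max)
    VALUES (NOW(), 'seed1', 'BMW',       9000),
           (NOW(), 'seed2', 'Chevrolet', 7500),
           (NOW(), 'seed3', 'Audi',     12000);

    -- run this statement a dozen or so times; each pass doubles the row count
    INSERT INTO search_log (logged_at, session_id, make, price_max)
    SELECT logged_at, session_id, make, price_max FROM search_log;

    SELECT COUNT(*) FROM search_log;   -- check how far along you are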
> 1) Store a table with every distinct search and another one where I store
> the searchID, the timestamp and the session_ID.
> But I have to check whether the record already exists at every new search.
> 2) Store everything in the same table, but a lot of identical records will
> exist in the table.
IMO, you're over-thinking this one. Just store a row per search and be done
with it.
> Flexibility wouldn't be affected by these choices, only speed and disk
> space. So what's the best choice?
Simplicity. One row per search event.
> Is there a way to store a record in a table and, if it already exists,
> retrieve the ID? That would be the fastest way, I think.
Every insertion will be slower that way. If you want user entry speed
maximized, this isn't the way to go.
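For what it's worth, the pattern you're describing usually ends up as two
statements per search (sketch below, with an invented distinct_search table
that has an auto-increment search_id), and that extra round trip is exactly
what slows the inserts down:

    -- 1) see if this exact combination was already logged
    SELECT search_id
    FROM distinct_search
    WHERE make = 'BMW' AND price_max = 8000 AND color IS NULL;

    -- 2) only if nothing came back, insert it and grab the new ID
    INSERT INTO distinct_search (make, price_max, color)
    VALUES ('BMW', 8000, NULL);
    -- then LAST_INSERT_ID() in MySQL, or the equivalent in your database

You'd also want a unique index over those columns, or two simultaneous
searches can still slip in duplicates.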
Just the thoughts of an old designer....