Re: How to choose a Data-Structure for a Priority Queue?

In article <pan.2009.>,
sunrise@xxxxxxxxxxxxxxx says...

On Tue, 10 Nov 2009 13:19:35 +0000, Ben Bacarisse wrote:

It may well be fast enough. I have little data so it is hard to say.
You say there will typically be about 10,000 elements in the queue, so
you might think you will have to look through 5,000 elements on average
to find the insertion point, but priority queues rarely work like that.
In some applications, the majority of added elements go at either the
front or the end, so you can speed things up by special casing those.
In others, you get an exponential distribution of insertion points so
insertions far down the list are rare.

Your case is odd because you can't have duplicate priorities and that
means I can't even imagine what the application is.

Now go ahead and kill me for this new requirement that I have gotten from
my boss:

"There will be duplicates, maybe 1, maybe 10. E.g. I am adding persons
into the queue with their name, age, address and phone. Then there will
be lots of persons with one age (say 30). So you need to add all the
duplicates, but among duplicates the one added first must be removed
first (FIFO). Apart from that, everything is added according to
priority, and the person with the minimum priority is always removed
first."

So it means I now have to search twice: once to see whether there is a
duplicate, and, if there is, add the new element after the existing
duplicate(s); otherwise insert it according to its priority.

Do you have any better ideas?

Use the SQLite database in memory-mapped mode (it's written in C). It
will handle duplicates, and any other strange requirement that pops up.
I expect it will be fast enough for your purposes, and it gives you a
permanent store for the processed data for subsequent operations. The
embedded database is small enough to fit in phone applications.
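A sketch of what the schema and the queue operations might look like (table and column names are invented; `AUTOINCREMENT` keeps the ids monotonic, so among equal priorities the earliest insertion sorts first):

```sql
CREATE TABLE queue (
    id       INTEGER PRIMARY KEY AUTOINCREMENT,  -- insertion order
    priority INTEGER NOT NULL,
    name     TEXT,
    age      INTEGER,
    address  TEXT,
    phone    TEXT
);
CREATE INDEX queue_priority ON queue(priority);

-- Adding an element is a plain INSERT; duplicate priorities need
-- no special handling.
INSERT INTO queue (priority, name, age) VALUES (30, 'alice', 30);

-- "Remove the minimum, FIFO among equals": lowest priority first,
-- then lowest id, i.e. earliest insertion.
DELETE FROM queue WHERE id = (SELECT id FROM queue
                              ORDER BY priority, id LIMIT 1);
```

From C this would go through the usual `sqlite3_open()` / `sqlite3_prepare_v2()` calls.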

Something is not making sense to me. We have a huge volume of data
being processed, yet it seems that what you need to write is some sort
of pipe or filter. If you are simply discarding the data afterwards,
the exercise is pointless, so I suggest examining the system as a whole.

What is really wanted here? We have a stream of data to process. Where
is the data to reside, eventually? What sort of processing requirements
exist for the incoming stream, and what sort of processing requirements
exist for the data after it exits the pipe?

The way that this project is developing is very, very, very bad:
"Oh, by the way, -- it also has to stop a herd of stampeding yak..."
is my name for this model. As you are moving along, the requirements
keep changing. That is a formal recipe for disaster, virtually 100% of
the time. So I urge you:
1. Take a step back.
2. Collect the actual requirements for this project. (What are the
inputs? What are the outputs? What are the tasks that need to be done?
Have the workers who are going to use this tool been interviewed about
how it fits into their work flow? What are the documentation needs?)
3. After you have requirements (it is very clear that requirements
gathering has not yet taken place), then you can design a solution.

It is literally impossible to design a tool when the specification is
constantly moving.