Re: Merging mixed data sets

From: Siemel Naran (SiemelNaran_at_REMOVE.att.net)
Date: 03/16/04


Date: Tue, 16 Mar 2004 17:45:16 GMT


"Dmitry Epstein" <mitia.nospam@northwestern.edu.invalid> wrote in message
> "Siemel Naran" <SiemelNaran@REMOVE.att.net> wrote in

> > Not necessarily. The list::sort swaps the previous and next
> > pointers of the node, but not the element in the node, for a
> > total of 2 swap operations. But std::sort will call the
> > specialized std::swap(std::basic_string&, std::basic_string&)
> > which will swap the data members, usually 3 for begin and size
> > and capacity, for a total of 3 swap operations, and the length
> > of the string does not matter. Which suggests list::sort is
> > faster. Though the optimizer may use one memswp operation,
> > though the list::sort method may swap 8 bytes, and the
> > std::swap method may swap 12 bytes. Furthermore, iterating
> > across a list is more complex. It could depend on the
> > architecture. Worth an experiment on your particular
> > platform.
>
> Thank you for explaning the swapping thing - I wasn't aware of
> that. So, sorting a vector of containers may not be as slow as I
> thought it would be? As for sorting lists, IIRC the quicksort
> algorithm is not strongly dependant on random element access
> (correct me if I am wrong).

Depends. I don't know enough low level so say which method is faster, or if
random access is faster. So we can only experiment.

> > Are you sure list::merge does not work? If you sort the
> > container which isn't sorted, then both containers are sorted,
> > and list::merge should work, right? You can use list::unique
> > to eliminate duplicates.
>
> Yes, but how expensive would that be?

O(N*lg(N)) to sort, O(N) to eliminate duplicates.

> > The running time of the above algorithm is O(N*lg(N)) for
> > constructing merged from sorted of N elements, O(M*lg(M)) for
> > adding unsorted of M elements to merged. Right? Assuming M =
> > N, the running time of the whole algorithm is O(N*lg(N)).
>
> I think it's O(M*lg(N)) in the second case, otherwise see my reply
> to Mike.

Right.

> > typedef std::vector<std::string>::iterator VectorIter;
> > VectorIter vectoriter = sorted.begin();
> > const VectorIter vectorend = sorted.end();
> > std::set<std::string> merged;
> > typedef std::set<std::string>::iterator SetIter;
> > SetIter hint = merged.end();
> > for ( ; vectoriter!=vectorend; ++vectoriter) {
> > hint = merged.insert(hint, vectoriter);
> > }
>
> I understand your motivation but I don't understand how this works.
> Why is merged.end() a good hint?

For the initial element in hint is merged.end(). You have to initialize
hint to something meaningful, and it doesn't matter what for an empty
container.

> > In any case, for adding M elements from unsorted to merged, I
> > can only think of a running time of O(M*lg(M)). So if M = N,
> > the running time of the whole algorithm is still O(N*lg(N)).
> > Can we make it linear, without the use of hashtables, bucket
> > sort, radix sort, etc?
>
> I doubt that you can make it faster than O(M*log(M)).

Can you prove this?

--
+++++++++++
Siemel Naran