# Re: Merging mixed data sets

**From:** Siemel Naran (*SiemelNaran_at_REMOVE.att.net*)

**Date:** 03/16/04

**Next message:**Claudio Puviani: "Re: C# vs. C++ Calling the overridden base class members from derived classes."**Previous message:**Ali Cehreli: "Re: collection class with block allocation"**In reply to:**Dmitry Epstein: "Re: Merging mixed data sets"**Next in thread:**Claudio Puviani: "Re: Merging mixed data sets"**Messages sorted by:**[ date ] [ thread ] [ subject ] [ author ]

Date: Tue, 16 Mar 2004 17:45:16 GMT

"Dmitry Epstein" <mitia.nospam@northwestern.edu.invalid> wrote in message

*> "Siemel Naran" <SiemelNaran@REMOVE.att.net> wrote in
*

*> > Not necessarily. The list::sort swaps the previous and next
*

*> > pointers of the node, but not the element in the node, for a
*

*> > total of 2 swap operations. But std::sort will call the
*

*> > specialized std::swap(std::basic_string&, std::basic_string&)
*

*> > which will swap the data members, usually 3 for begin and size
*

*> > and capacity, for a total of 3 swap operations, and the length
*

*> > of the string does not matter. Which suggests list::sort is
*

*> > faster. Though the optimizer may use one memswp operation,
*

*> > though the list::sort method may swap 8 bytes, and the
*

*> > std::swap method may swap 12 bytes. Furthermore, iterating
*

*> > across a list is more complex. It could depend on the
*

*> > architecture. Worth an experiment on your particular
*

*> > platform.
*

*>
*

*> Thank you for explaning the swapping thing - I wasn't aware of
*

*> that. So, sorting a vector of containers may not be as slow as I
*

*> thought it would be? As for sorting lists, IIRC the quicksort
*

*> algorithm is not strongly dependant on random element access
*

*> (correct me if I am wrong).
*

Depends. I don't know enough low level so say which method is faster, or if

random access is faster. So we can only experiment.

*> > Are you sure list::merge does not work? If you sort the
*

*> > container which isn't sorted, then both containers are sorted,
*

*> > and list::merge should work, right? You can use list::unique
*

*> > to eliminate duplicates.
*

*>
*

*> Yes, but how expensive would that be?
*

O(N*lg(N)) to sort, O(N) to eliminate duplicates.

*> > The running time of the above algorithm is O(N*lg(N)) for
*

*> > constructing merged from sorted of N elements, O(M*lg(M)) for
*

*> > adding unsorted of M elements to merged. Right? Assuming M =
*

*> > N, the running time of the whole algorithm is O(N*lg(N)).
*

*>
*

*> I think it's O(M*lg(N)) in the second case, otherwise see my reply
*

*> to Mike.
*

Right.

*> > typedef std::vector<std::string>::iterator VectorIter;
*

*> > VectorIter vectoriter = sorted.begin();
*

*> > const VectorIter vectorend = sorted.end();
*

*> > std::set<std::string> merged;
*

*> > typedef std::set<std::string>::iterator SetIter;
*

*> > SetIter hint = merged.end();
*

*> > for ( ; vectoriter!=vectorend; ++vectoriter) {
*

*> > hint = merged.insert(hint, vectoriter);
*

*> > }
*

*>
*

*> I understand your motivation but I don't understand how this works.
*

*> Why is merged.end() a good hint?
*

For the initial element in hint is merged.end(). You have to initialize

hint to something meaningful, and it doesn't matter what for an empty

container.

*> > In any case, for adding M elements from unsorted to merged, I
*

*> > can only think of a running time of O(M*lg(M)). So if M = N,
*

*> > the running time of the whole algorithm is still O(N*lg(N)).
*

*> > Can we make it linear, without the use of hashtables, bucket
*

*> > sort, radix sort, etc?
*

*>
*

*> I doubt that you can make it faster than O(M*log(M)).
*

Can you prove this?

-- +++++++++++ Siemel Naran

**Next message:**Claudio Puviani: "Re: C# vs. C++ Calling the overridden base class members from derived classes."**Previous message:**Ali Cehreli: "Re: collection class with block allocation"**In reply to:**Dmitry Epstein: "Re: Merging mixed data sets"**Next in thread:**Claudio Puviani: "Re: Merging mixed data sets"**Messages sorted by:**[ date ] [ thread ] [ subject ] [ author ]