Merging mixed data sets

From: Dmitry Epstein (mitia.nospam_at_northwestern.edu.invalid)
Date: 03/14/04


Date: 14 Mar 2004 22:27:10 GMT

I have the following problem (not a "homework" problem - honest :)
I have two data sets consisting of ASCII strings (mostly 10-50
chars). One set is sorted, the other is not. I need to merge the
two into a sorted set while avoiding duplication, i.e. I want to
get a union of the two sets. The choice of containers (list,
vector, etc.) is up to me.

I can think of a few ways of solving the problem but I wonder what
would be the most efficient?

1. Use the STL, Luke! Probably the most straightforward approach
would be to sort the second set first, then use the set_union from
<algorithm> to merge the two. If I take this path, then what
container would be best? Sorting ought to be much more efficient
with lists than with vectors, right?

The downside of this approach is that I can't merge the two sets
in-place, without copying elements. I would've liked to use the
merge member-function of the list class, except it does not do what
I need. On the other hand, set_union will copy elements into a new
container regardless.

Hence,
2. Use STL to sort, then merge manually. With this approach I
would probably use the list container. After sorting the second
set, I would merge it into the first set using splice to avoid
copying elements. What's the best algorithm for such merging?

3. Roll your own. I am wondering if forgoing pre-sorting of the
second set would actually be more efficient? The problem then
would reduce to adding elements to a sorted set. What would be the
best way to do that? A binary tree perhaps? What would be a good
implementation of that?

Thanks,
Dmitry

-- 
Dmitry Epstein
Northwestern University, Evanston, IL.  USA
mitia(at)northwestern(dot)edu