Merging mixed data sets

From: Dmitry Epstein (mitia.nospam_at_northwestern.edu.invalid)
Date: 03/14/04


Date: 14 Mar 2004 22:27:10 GMT

I have the following problem (not a "homework" problem - honest :)
I have two data sets consisting of ASCII strings (mostly 10-50
chars). One set is sorted, the other is not. I need to merge the
two into a sorted set while avoiding duplication, i.e. I want to
get a union of the two sets. The choice of containers (list,
vector, etc.) is up to me.

I can think of a few ways of solving the problem but I wonder what
would be the most efficient?

1. Use the STL, Luke! Probably the most straightforward approach
would be to sort the second set first, then use the set_union from
<algorithm> to merge the two. If I take this path, then what
container would be best? Sorting ought to be much more efficient
with lists than with vectors, right?

The downside of this approach is that I can't merge the two sets
in-place, without copying elements. I would've liked to use the
merge member-function of the list class, except it does not do what
I need. On the other hand, set_union will copy elements into a new
container regardless.

Hence,
2. Use STL to sort, then merge manually. With this approach I
would probably use the list container. After sorting the second
set, I would merge it into the first set using splice to avoid
copying elements. What's the best algorithm for such merging?

3. Roll your own. I am wondering if forgoing pre-sorting of the
second set would actually be more efficient? The problem then
would reduce to adding elements to a sorted set. What would be the
best way to do that? A binary tree perhaps? What would be a good
implementation of that?

Thanks,
Dmitry

-- 
Dmitry Epstein
Northwestern University, Evanston, IL.  USA
mitia(at)northwestern(dot)edu


Relevant Pages

  • Re: How come Ada isnt more popular?
    ... are praising is, because what about trees of strings, trees of lists etc. ... A language with a Hinldey-Milner type system ... container varies, the element does not. ...
    (comp.lang.ada)
  • Re: Standard containers (Was: Wiki by the committee closed?)
    ... Since a fixed size is certainly enough for an iterator, ... bad pointers, overflows, and maybe even CONTAINER_ERROR_INDEX), why not ... be reported if a container is accessed out of bounds. ... OUR list elements never change when you add something to our lists. ...
    (comp.std.c)
  • Re: c[:]()
    ... pointer-to-PyObject (so are lists, BTW). ... [Please quit saying "a container" if you mean lists and tuples. ... doesn't break the language definition. ... The suggestion that containers broadcast a "call" operation ...
    (comp.lang.python)
  • Re: [Lxc-users] [systemd-devel] Unable to run systemd in an LXC / cgroup container.
    ... I know it's not always good to cross post between multiple lists ... problem down to some systemd internals... ... things for /dev and /dev/pts for the container to run properly and this ... on host: ...
    (Fedora)
  • Re: Why arent you upgrading?
    ... so I am forever either typecasting or creating custom TList ... desirable for lists of other types of data. ... If the polygon is the only thing that deals with lines, ... Iterators and container operations (addition, subtraction, union, ...
    (borland.public.delphi.non-technical)