Re: Removing duplicates from an array of pointers

From: Rufus V. Smith (nospam_at_nospam.com)
Date: 06/29/04


Date: Tue, 29 Jun 2004 13:25:17 GMT


"Bert" <maatjesharing@gmx.de> wrote in message
news:a65a4043.0406290307.382bd757@posting.google.com...
> I want to remove duplicate strings from an array of pointers to
> strings.
>
> Assume we have an array of pointers called "parray" of variable
> length. The pointers point to the contents of a file which is read
> into memory ("sfile"). Strings are created by replacing end of line
> characters with nuls. An array of pointers (parray) points to the
> first character of each line.
>
> Suppose sfile contains this (can be another length or other content):
> parray[0] AA
> parray[1] DDD
> parray[2] CC
> parray[3] DDD
> parray[4] EEEEEE
> parray[5] FFFF
> parray[6] DDD
>
> If viewed as a flat memory area, it will look like this:
> AA-DDD-CC-DDD-EEEEEE-FFFF-DDD- (- = '\0')
> p0.p1..p2.p3..p4.....p5...p6..
>
> The easy solution is to set the first character to NUL.
>
> That would result in this memory area:
> AA-DDD-CC--DD-EEEEEE-FFFF--DD-
> p0.p1..p2.p3..p4.....p5...p6..
>
> Two pointers (p3 and p6) now point to zero length strings. However,
> zero length strings are unusable for later operations.
>
> What I'd like to get is this:
> AA-DDD-CC--DD-EEEEEE-FFFF--DD-
> p0.p1..p2.....p3.....p4.......
>
> The order of the pointers isn't important.
>
> I'm having trouble getting this done in real C code. Can anyone help?
>
> Thanks!
>
> Bert

As you come across duplicates, move the pointer in the last available
element into the duplicate's position, and decrement your last element
index. This reduces the pointer count while maintaining pointers to the
unique or unchecked strings. If you wanted to maintain order, you could
slide all the array elements down one position, but you said that wasn't
a requirement.

e.g.

/* Needs <string.h>. uniqueindex, searchindex and lastelement are ints;
   lastelement is the index of the last valid pointer in parray. */
for (uniqueindex = 0; uniqueindex <= lastelement; uniqueindex++) {
    for (searchindex = uniqueindex + 1; searchindex <= lastelement; searchindex++) {
        while ((searchindex <= lastelement) &&
               (strcmp(parray[uniqueindex], parray[searchindex]) == 0)) {
            /* Dispose of the duplicate string only if each string was
               allocated separately. Pointers into a single buffer such
               as sfile must not be passed to free():
               free(parray[searchindex]); */
            parray[searchindex] = parray[lastelement]; /* bring down last element */
            parray[lastelement] = NULL; /* not strictly necessary, but I like to clean up my pointers */
            lastelement--;              /* decrement last element index */
        } /* while duplicate string present at searchindex */
    } /* comparing strings after reference string */
} /* for each string */

Note that as you pull an element down from the end of the array, you must
check it against the reference string again, as it may be a duplicate as
well. Hence the while statement.

Rufus


