Re: how to replace a substring in a string using C?



Netocrat wrote:
> On Sun, 30 Oct 2005 19:45:30 +0000, Mike Wahler wrote:
> > "Paul" <paul.pettus@xxxxxxxxx> wrote in message
> > news:1130687985.367948.20180@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
> >> hi, there,
> >>
> >> for example,
> >>
> >> char *mystr="##this is##a examp#le";
> >>
> >> I want to replace all the "##" in mystr with "****". How can I do this?
>
> Your code looked fine Mike but I gathered the OP wanted to deal more
> generally with string rather than character replacement, so here's an
> extended version.

There are a couple of problems here. First, this code will malfunction
on many 16 bit platforms if you pass it strings with length between 32K
and 64K (ie INT_MAX < length < SIZE_MAX). Your signed 'i' variable
will overflow (undefined behavior), and most likely wrap to -32768,
which would cause the subscripting to access memory it should not be
meddling with.

Second, this code performs poorly even though it may look at a glance
to be somewhat optimized. The first red flag is the extensive use of
subscripting. The second red flag is the unusual modification of a
loop variable ('i') within a loop, even though it is also modified by
the loop ('i++'). The third is the unusual use of an equality operator
(==) with strstr(), and the fourth is the character-by-character
parsing (which can be quite slow).

>
> Output:
>
> ##this is##a examp#le
> ****this is****a examp#le
>
> Code:
>
> #include <stdio.h>
> #include <string.h>
> #include <stdlib.h>
>
> char *replace(const char *s, const char *old, const char *new)
> {
> char *ret;
> int i, count = 0;
> size_t newlen = strlen(new);
> size_t oldlen = strlen(old);
>
> for (i = 0; s[i] != '\0'; i++) {
> if (strstr(&s[i], old) == &s[i]) {
> count++;
> i += oldlen - 1;
> }
> }

Note that strstr() can be called for nearly every character.

>
> ret = malloc(i + count * (newlen - oldlen));
> if (ret == NULL)
> exit(EXIT_FAILURE);

As other posters have noted, just return NULL to the caller.

>
> i = 0;
> while (*s) {
> if (strstr(s, old) == s) {
> strcpy(&ret[i], new);
> i += newlen;
> s += oldlen;
> } else
> ret[i++] = *s++;
> }
> ret[i] = '\0';

Again, strstr() can be called for nearly every character.

>
> return ret;
> }
>
<snip>

To give an idea of the impact of string parsing missteps can make on
program performance, I fed "Much Ado About Nothing" (the official play
of comp.lang.c) through your replace function, then through mine. The
entire file (121K) was loaded into a buffer, then fed 10 times to the
replace function as such:

/* ... */
fread(str, 1, size, fp);
str[size] = '\0';
for(i = 0; i < 10; i++) {
newstr = replace(str, "DON PEDRO", "NETOCRAT");
free(newstr);
}
/* ... */

The somewhat naieve implementation I whipped up quickly is:

char *replace(const char *str, const char *old, const char *new)
{
char *ret, *r;
const char *p, *q;
size_t len_str = strlen(str);
size_t len_old = strlen(old);
size_t len_new = strlen(new);
size_t count;

for(count = 0, p = str; (p = strstr(p, old)); p += len_old)
count++;

ret = malloc(count * (len_new - len_old) + len_str + 1);
if(!ret)
return NULL;

for(r = ret, p = str; (q = strstr(p, old)); p = q + len_old) {
count = q - p;
memcpy(r, p, count);
r += count;
strcpy(r, new);
r += len_new;
}
strcpy(r, p);
return ret;
}

Here are the timings on my machine. Both were compiled with the same
flags on gcc 4 (-Wall -O2 -ansi -pedantic)

[mark@icepick ~]$ time ./netocrat

real 0m39.901s
user 0m39.877s
sys 0m0.008s

[mark@icepick ~]$ time ./mark

real 0m0.033s
user 0m0.026s
sys 0m0.007s

Even though the somewhat naieve program traverses the input string
multiple times, it's still a 99.9% execution time decrease.


Mark F. Haigh
mfhaigh@xxxxxxxxxxxxx

.



Relevant Pages

  • Re: why I can not write to the file after initialize the MFC in a service program
    ... you don't use char, an obsolete data type ... Why do you need an intermedate buffer to write literal strings anyway? ... For example, if AfxWinInit fails, you copy a 45-character string into a ... So you are going to try to initialize MFC EACH TIME THROUGH THE LOOP? ...
    (microsoft.public.vc.mfc)
  • Re: why I can not write to the file after initialize the MFC in a service program
    ... you don't use char, an obsolete data type ... Why do you need an intermedate buffer to write literal strings anyway? ... For example, if AfxWinInit fails, you copy a 45-character string into a ... So you are going to try to initialize MFC EACH TIME THROUGH THE LOOP? ...
    (microsoft.public.vc.mfc)
  • Re: K&R2, exercise 5.4
    ... int strend(char*, char*); ... Now ps points to the first character of a string which is one character ... * don't want to FAKE the array call ... outer for loop checks for each element of 1st array. ...
    (comp.lang.c)
  • Re: Parsing between a character and sysmbol
    ... I did say was that your statement of avoiding arrays inside loops whenever ... creating new String objects. ... This is what I got inside its main loop (I'm not ... the "-" char into an array of chars. ...
    (microsoft.public.dotnet.languages.vb)
  • Re: Parsing between a character and sysmbol
    ... This is what I got inside its main loop (I'm not ... the "-" char into an array of chars. ... IL_004c: call string ToString ...
    (microsoft.public.dotnet.languages.vb)