Re: how to replace a substring in a string using C?
- From: "Mark F. Haigh" <mfhaigh@xxxxxxxxxxxxx>
- Date: 8 Nov 2005 05:20:04 -0800
Netocrat wrote:
> On Sun, 30 Oct 2005 19:45:30 +0000, Mike Wahler wrote:
> > "Paul" <paul.pettus@xxxxxxxxx> wrote in message
> > news:1130687985.367948.20180@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
> >> hi, there,
> >>
> >> for example,
> >>
> >> char *mystr="##this is##a examp#le";
> >>
> >> I want to replace all the "##" in mystr with "****". How can I do this?
>
> Your code looked fine Mike but I gathered the OP wanted to deal more
> generally with string rather than character replacement, so here's an
> extended version.
There are a couple of problems here. First, this code will malfunction
on many 16 bit platforms if you pass it strings with length between 32K
and 64K (ie INT_MAX < length < SIZE_MAX). Your signed 'i' variable
will overflow (undefined behavior), and most likely wrap to -32768,
which would cause the subscripting to access memory it should not be
meddling with.
Second, this code performs poorly even though it may look at a glance
to be somewhat optimized. The first red flag is the extensive use of
subscripting. The second red flag is the unusual modification of a
loop variable ('i') within a loop, even though it is also modified by
the loop ('i++'). The third is the unusual use of an equality operator
(==) with strstr(), and the fourth is the character-by-character
parsing (which can be quite slow).
>
> Output:
>
> ##this is##a examp#le
> ****this is****a examp#le
>
> Code:
>
> #include <stdio.h>
> #include <string.h>
> #include <stdlib.h>
>
> char *replace(const char *s, const char *old, const char *new)
> {
> char *ret;
> int i, count = 0;
> size_t newlen = strlen(new);
> size_t oldlen = strlen(old);
>
> for (i = 0; s[i] != '\0'; i++) {
> if (strstr(&s[i], old) == &s[i]) {
> count++;
> i += oldlen - 1;
> }
> }
Note that strstr() can be called for nearly every character.
>
> ret = malloc(i + count * (newlen - oldlen));
> if (ret == NULL)
> exit(EXIT_FAILURE);
As other posters have noted, just return NULL to the caller.
>
> i = 0;
> while (*s) {
> if (strstr(s, old) == s) {
> strcpy(&ret[i], new);
> i += newlen;
> s += oldlen;
> } else
> ret[i++] = *s++;
> }
> ret[i] = '\0';
Again, strstr() can be called for nearly every character.
>
> return ret;
> }
>
<snip>
To give an idea of the impact of string parsing missteps can make on
program performance, I fed "Much Ado About Nothing" (the official play
of comp.lang.c) through your replace function, then through mine. The
entire file (121K) was loaded into a buffer, then fed 10 times to the
replace function as such:
/* ... */
fread(str, 1, size, fp);
str[size] = '\0';
for(i = 0; i < 10; i++) {
newstr = replace(str, "DON PEDRO", "NETOCRAT");
free(newstr);
}
/* ... */
The somewhat naieve implementation I whipped up quickly is:
char *replace(const char *str, const char *old, const char *new)
{
char *ret, *r;
const char *p, *q;
size_t len_str = strlen(str);
size_t len_old = strlen(old);
size_t len_new = strlen(new);
size_t count;
for(count = 0, p = str; (p = strstr(p, old)); p += len_old)
count++;
ret = malloc(count * (len_new - len_old) + len_str + 1);
if(!ret)
return NULL;
for(r = ret, p = str; (q = strstr(p, old)); p = q + len_old) {
count = q - p;
memcpy(r, p, count);
r += count;
strcpy(r, new);
r += len_new;
}
strcpy(r, p);
return ret;
}
Here are the timings on my machine. Both were compiled with the same
flags on gcc 4 (-Wall -O2 -ansi -pedantic)
[mark@icepick ~]$ time ./netocrat
real 0m39.901s
user 0m39.877s
sys 0m0.008s
[mark@icepick ~]$ time ./mark
real 0m0.033s
user 0m0.026s
sys 0m0.007s
Even though the somewhat naieve program traverses the input string
multiple times, it's still a 99.9% execution time decrease.
Mark F. Haigh
mfhaigh@xxxxxxxxxxxxx
.
- Follow-Ups:
- Re: how to replace a substring in a string using C?
- From: Netocrat
- Re: how to replace a substring in a string using C?
- Prev by Date: Re: memset to initialize double array?
- Next by Date: Re: Doubt in Question From C Faq
- Previous by thread: Re: how to replace a substring in a string using C?
- Next by thread: Re: how to replace a substring in a string using C?
- Index(es):
Relevant Pages
|
|