Re: my search and replace function

From: Eric Sosman (Eric.Sosman_at_sun.com)
Date: 06/21/04


Date: Mon, 21 Jun 2004 11:18:22 -0400

pembed2003 wrote:
> Eric Sosman <Eric.Sosman@sun.com> wrote in message news:<40D33B1E.10703@sun.com>...
>
>>pembed2003 wrote:
>>
>>>Hi all,
>>>I need to write a function to search and replace part of a char*
>>>passed in to the function. I came up with the following:
>>>[code snipped; see up-thread]
>>>
>>>later I can then use it like:
>>>
>>>char source[] = "1234?5678?9";
>>>char search = '?';
>>>char* replace = "***";
>>>
>>>char* result = search_and_replace(source,search,replace);
>>>free(result);
>>>
>>>and I will end up result being "1234***5678***9"
>>
>> You will? That's not the result I'd expect to get.
>>Have you actually tried running this?
>>
>>
>>>Other than the lack of some error checkings, the above function seems
>>>to work ok but looks like it's doing alot of work. Can you think of
>>>anyway to improve the speed of the function?
>>
>> It does not "work ok." Full stop. Any attempt to
>>improve the speed simply arrives at the wrong answer sooner.
>>If that's your goal, the function can be made *much* faster.
>
>
> Hi Eric,
> Thanks for your comment and yes it does NOT work. Sorry about the
> false statement. I have made some change to it and the function now
> become:

     I haven't tested it, but it looks like you've corrected
the obvious error in the first version.

     The C language Standard says nothing about speed, and
different implementations on different machines will behave --
well, differently. The suggestions below are not guaranteed
to improve the speed, but might be worth testing. There are
also a few stylistic and "good practices" suggestions not
related to possible speed improvements.

> char* search_and_replace(char* source,char search,char* replace){

     The function does not modify either of the strings pointed
to by the first and third arguments, so I'd suggest declaring
them as `const char *source' and `const char *replace'.

> char* result;
>
> size_t l = strlen(source), r = strlen(replace), i, k;
>
> int number_of_replaces = 0;
>
> for(i = 0; i < l; i++){
> if(source[i] == search)
> number_of_replaces++;
> }

     Another way to search for the replacement character
would be to use the strchr() function in a loop, more
or less like this:

        const char *p;
        for (p = source; (p = strchr(p, search)) != NULL; ++p)
            ++number_of_replaces;

     This might or might not be faster. At a guess, the
greatest speed advantage (if any) would probably occur
with very long `source' strings containing very few
`search' characters.

> result = malloc((l - number_of_replaces) +
> number_of_replaces * r + 1);

     malloc() can return NULL if it fails to find enough
memory, and you should check for this possibility. As
the code stands, you've got a bug waiting to happen.

> i = 0; k = 0;
>
> while(k < l){
> if(source[k] == search){
> int j;
> for(j = 0; j < r; j++){
> result[i++] = replace[j];
> }

     Instead of copying the `replace' string one character
at a time, it might be faster to use memcpy():

        memcpy(result + i, replace, r);
        i += r;

     At a guess, this would probably not improve matters
for a short `replace' string but might be faster if `replace'
is fairly long.

> }else{
> result[i++] = source[k];
> }
> k++;
> }

     The outer loop could be rewritten to use strchr(), as
above but with a little extra bookkeeping to remember how
many non-`search' characters need copying. And the copying
itself could be done with memcpy() instead of one character
at a time. If there's any speed advantage to doing this,
I'd expect it to occur under much the same conditions as
before: a long `source' with few appearances of `search'.

> result[i] = 0;
> return result;
> }
>
> Can you tell me how I can improve the function in terms of speed?

     Before you go running off and start coding I'd like to
draw your attention to a lesson that this experience should
already have taught you and to another lesson long experience
has taught others.

     First, consider whether your "need for speed" is a Good
Thing or a Bad Thing. The salient fact here is that the
first version of your code was flat-out wrong, R-O-N-G, wrong.
Yet your enthusiasm for greater speed completely blinded you
to this little matter ... On balance, one would have to say
that your devotion to what Henry Spencer calls The Little Tin
God is *not* a Good Thing, at least not at this point in your
development. I recommend some self-examination.

     Second, it has been observed -- it has even been tested
under laboratory conditions -- that even very good programmers
are startlingly bad at predicting which parts of their programs
will consume the most time. Decades of experience on thousands
of programming projects have come up with an outline of how to
get the best performance:

     0. Make sure you're writing the right program. For
         example, do you really want a function that replaces
         single characters with strings, or would it be more
         useful to replace strings with strings?

     1. Choose appropriate algorithms. If you must search a
         table of five or six entries, just loop through an
         array -- but if the table has five or six thousand
         entries you'd better use another approach entirely.

     2. Write the program, implementing the chosen algorithms.
         The focus here should be on simplicity, correctness,
         and robustness. If the word "speed" so much as pops
         into your head, go take a nap.

     3. Debug the program. I've been programming professionally
         for three decades, yet I nearly always have at least
         one bug in any program longer than about a thousand
         lines of new-minted code. If you're ten times as good
         as I am, you will still write bugs and need to fix them.
         Fix them.

     4. *Measure* the performance of your program. If the
         performance is adequate, *stop*. "If it ain't broke,
         don't fix it." You have better things to do than gild
         lilies.

     5. *If* the performance is inadequate, use a profiler or
         other tools to *measure* where the time is going. You
         may study a piece of code and find that optimization
         could eliminate ninety percent of its running time --
         but if it only consumes ten microseconds per run of the
         program, who cares? When you've identified a piece of
         slow code, ask yourself this: "If I could eliminate *all*
         the time consumed by this fragment, how fast would the
         program run?" If the answer is "Still not fast enough,"
         you need to look elsewhere.

     6. Optimize the code sections whose unacceptable performance
         you have *measured*, and then *measure* the improvement.
         You may be surprised: Sometimes an "optimization" makes
         things worse -- but if you fail to measure both before
         and after, you'll never know.

     7. Immediately after optimizing, go all the way back to
         Step Three. Herein fail not. I have seen plenty of
         working programs broken by "optimization;" I have
         broken more than a few myself.

     Donald Knuth wrote that "Premature optimization is the
root of all evil." Michael Jackson expressed a similar idea
in the form of a pair of Laws:

         First Law Of Program Optimization:
         Don't do it.

         Second Law Of Program Optimization (for experts only):
         Don't do it yet.

     Ponder the message, and ignore it at your peril.

-- 
Eric.Sosman@sun.com


Relevant Pages

  • Re: I need help please!
    ... logic going here...the above is the "joke troll", ... buffers for standard input on some systems (the buffers, ... So substitute "unconsumed characters input by the user" for "buffers" and it still applies. ... My experience is also that it is wrong for the vast majority of professional programmers and a lot of very unprofessional programmers. ...
    (comp.lang.c)
  • Re: A good console font
    ... Roger Leigh wrote: ... >>stuff that programmers might find useful? ... Actually ten fonts. ... as long as you only need 512 characters. ...
    (comp.os.linux.misc)
  • Re: Help Constructing Fictional Cross-Religious Movement
    ... >>> Personal taste, but I'd rather see Spock contrasted with McCoy (or ... > book whose main point is contrasting him in chapter N with him in chapter ... it's because each book has the *same* painful lesson. ... > as OST did that, and the characters kept their rough edges and friction, ...
    (rec.arts.sf.composition)
  • Re: Help Constructing Fictional Cross-Religious Movement
    ... > story doesn't bother with characters changing, ... I don't mind characters who make *similar* mistakes to the ones they made ... > lesson but doesn't. ... but are difficult to use as protagonists. ...
    (rec.arts.sf.composition)
  • Re: "Kanji in Context" - Chinese equivalent?
    ... are in my opinion quite good for learning hanzi by exposure. ... character compounds created from these characters and characters which ... (From the Preface to DeFrancis' Advanced Chinese Reader) ... "All compounds occur at least twice in the lesson in which they are ...
    (sci.lang)

Loading