Re: Quandry with the following C code (Intermediate)

From: Richard Bos (rlb_at_hoekstra-uitgeverij.nl)
Date: 01/13/05


Date: Thu, 13 Jan 2005 14:48:43 GMT


"BMarsh" <b.marsh@gmx.net> wrote:

> The following code will compare an old string to a new one, bombing
> out if 'max' similar chars is exceeded.

It doesn't do a compare the usual way. That is, it does something
completely different from strcmp().

(Oh, btw, if you insist on posting through Google-Broken-Beta, it would
be a good thing if you could get it not to strip all indentation. Your
code is hard to read this way.)

> static
> int compare(unsigned char *old, unsigned char *new, int max)
> {
> unsigned char in_old[256];

First of all, you need to use UCHAR_MAX+1 here, instead of 256 (UCHAR_MAX
is defined in <limits.h>). If you don't, you may try to run this code on
a Unicode system some day, and be surprised when your function scribbles
all over memory when you pass it a string with Unicode characters over
255 in it.
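
A minimal sketch of the sizing point, assuming nothing beyond the standard
headers. The helper name tally() and the size_t counters are my own choices
for illustration, not part of the quoted code. The table gets one slot for
every value an unsigned char can hold, 0 through UCHAR_MAX inclusive:

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not from the quoted code: tallies how often each
   character value occurs in s. The table needs UCHAR_MAX + 1 slots, one
   for every value from 0 to UCHAR_MAX inclusive. */
static void tally(const unsigned char *s, size_t counts[UCHAR_MAX + 1])
{
    /* counts is a pointer here, so spell the size out explicitly. */
    memset(counts, 0, (UCHAR_MAX + 1) * sizeof counts[0]);

    while (*s)
        counts[*s++]++;
}
```

Whatever width unsigned char has on a given implementation, every possible
index *s then falls inside the table.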

> int equal = 0;
>
> (void)memset(in_old, 0, sizeof (in_old));

Lose the cast. It does no good, and clutters up the code.

> while (*old)
> in_old[*(old++)]++;

This tallies the number of occurrences of each separate character value
in the first string. There's a bug in it: what happens if you pass it a
string of UCHAR_MAX+1 'a's?

> while (*new) {
> if (in_old[*new])
> equal++;
> new++;

(See what I mean about the indentation?)

This checks each character in the second string, and if there were any
of the same character at all in the first string, counts it as "equal".

> }
>
> if (equal > max)
> return (1);
>
> return (0);

If the number of "equal" characters, that is, the number of chars in the
second string of which there was at least one in the first string,
exceeds the passed-in maximum, return 1, else 0. This could be more
easily written as

  return (equal>max);

> I fail to see how the 2 strings are compared for character equality,

So do I; they're not.

Note, in particular, the different treatment of "old" and "new".

For example, try to explain the discrepancy between

  compare("abc", "dbbbe", 2)

and

  compare("dbbbe", "abc", 2)

Then, when you want an exercise I can't solve, try to explain _why_
someone would write a function like that, and then call it, simply,
"compare". The logic escapes me, I'm afraid. It's reasonably clear to me
_what_ this function does, but not why.
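
For anyone who wants to try the two calls, here is a re-indented sketch of
the function. It is not the original verbatim: the name compare_sketch and
the parameter names old_s/new_s are mine, the table is sized UCHAR_MAX + 1,
and the first loop stores a flag instead of a count so it cannot wrap
around. The behaviour for ordinary inputs should be the same as the quoted
code's.

```c
#include <limits.h>
#include <string.h>

/* Sketch of the quoted function, re-indented. Returns 1 if more than max
   characters of new_s also occur somewhere in old_s, else 0. */
static int compare_sketch(const unsigned char *old_s,
                          const unsigned char *new_s, int max)
{
    unsigned char in_old[UCHAR_MAX + 1];
    int equal = 0;

    memset(in_old, 0, sizeof in_old);

    /* Mark every character value that occurs in old_s. */
    while (*old_s)
        in_old[*old_s++] = 1;

    /* Count the characters of new_s that were marked above. Repeats in
       new_s are counted each time; repeats in old_s make no difference. */
    while (*new_s) {
        if (in_old[*new_s])
            equal++;
        new_s++;
    }

    return equal > max;
}
```

Called as compare_sketch((const unsigned char *)"abc",
(const unsigned char *)"dbbbe", 2) this returns 1, because each of the
three 'b's in the second string is counted. With the arguments swapped it
returns 0, because only the single 'b' of "abc" is counted. The casts are
there because string literals are arrays of plain char.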

> especially in how the
>
> in_old[*(old++)]++;

The array element indexed by the character at the _current_ value of
old is incremented (that is, the character now under the old pointer is
tallied); and old is moved to the next character. Not necessarily in
that order, or in any order at all, but since (old++) yields the old
value of old (so to speak) no matter which order is chosen, it doesn't
matter for the result.
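
Spelled out as separate statements, one step of that loop could look like
the sketch below. The function tally_step() is hypothetical, written only
to pull the expression apart; it has the same net effect as one execution
of in_old[*(old++)]++ and returns the advanced pointer.

```c
#include <limits.h>

/* Hypothetical helper: one step of the tally loop, written out longhand. */
static const unsigned char *tally_step(const unsigned char *old,
                                       unsigned char in_old[UCHAR_MAX + 1])
{
    unsigned char c = *old;  /* the character now under the pointer */

    old = old + 1;           /* move on to the next character */
    in_old[c]++;             /* tally the character we just read */

    return old;
}
```

Swapping the last two statements gives the same result, because the
character was already read into c before either of them runs.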

Richard


