Re: Fast and Safe C Strings: User friendly C macros to Declare and use C Strings.



On 23 Apr 2007 18:27:18 GMT, Chris Torek <nospam@xxxxxxxxx> wrote:

In article <462ce4fe.44079812@xxxxxxxxxxxxx>
Richard Harter <cri@xxxxxxxx> wrote:
... if you have a length count you can unroll the [strchr-or-equivalent]
loop and avoid most of the comparisons between length and index. This
is a standard optimization; many compilers will even do it for you. If
you are scanning a \0-terminated string, loop unrolling is not available.
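
A minimal sketch of the unrolling described above, assuming a counted
search routine. The name counted_find and the unroll factor of four are
illustrative only and do not come from the thread. The point is that the
length is tested once per four character comparisons rather than once per
character.

```c
#include <stddef.h>

/* Hypothetical helper (not from the thread): find byte c in the first
   len bytes of s. Returns a pointer to the first match, or NULL. */
const char *counted_find(const char *s, size_t len, char c)
{
    size_t i = 0;

    /* Unrolled by four: one length test per four character comparisons.
       len - i cannot wrap because i never exceeds len. */
    for (; len - i >= 4; i += 4) {
        if (s[i]     == c) return s + i;
        if (s[i + 1] == c) return s + i + 1;
        if (s[i + 2] == c) return s + i + 2;
        if (s[i + 3] == c) return s + i + 3;
    }

    /* Handle the remaining 0 to 3 bytes one at a time. */
    for (; i < len; i++) {
        if (s[i] == c) return s + i;
    }
    return NULL;
}
```

Whether this beats what a given compiler or library already does is
something to measure, not assume.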

Well, yes, except that implementations can "cheat", and unroll
strchr() anyway, after first ensuring that the address is (say) 0
mod 4. On some architectures (e.g., the original Alpha or MIPS)
this is pretty much the only way to handle the loop.

Point conceded. To be fair though, once we are looking under the hood
at particular implementations, there are a lot of little tricks one can
use to juice performance with count-delimited strings. The original
posting said "in all instances"; that has to be an overstatement.

The obvious way to create an artificial situation in which
counted-length-strings underperform zero-terminated-strings
is to create a lot of single-character strings in the first
version, and re-use the zero-terminator in the second:

loop {
newstr = substring(original, pos, 1);
if (compare_strings(newstr, looking_for) == match) ...
release_string(newstr);
}

vs:

newstr[1] = '\0';
loop {
newstr[0] = char_at(original, pos);
if (compare_strings(newstr, looking_for) == match) ...
}

Of course, you can make the first one perform the same as the second
by doing character (instead of string) operations (on newstr[0]
instead of newstr) -- but in languages that have counted-length-strings
as built-in primitive types, one often finds programmers allocating
and releasing single-character "strings" inside inner loops.

How depressing.

It seems to me that for count-strings one wants a substring operation
that just points to a position in the original string. One
distinguishes between strings with modifiable content and those with
non-modifiable content. With that concept your first instance is simply

loop {
newstr = substring(original, pos, 1);
if (compare_strings(newstr, looking_for) == match) ...
}

and under the hood we have something like:

newstr.ptr = original.ptr + pos;
newstr.cnt = 1;

An advantage of count-strings is that you can refer to arbitrary
substrings of an original string. With terminated-strings you can only
refer to suffix substrings.
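
A rough C sketch of the pointer-plus-count idea above. The struct name
cstr and the function strings_equal are assumptions made for
illustration; only the ptr and cnt fields and the substring name follow
the fragments quoted in this post. The view does not own its bytes, so
there is nothing to allocate or release inside the loop.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical non-owning counted string: a pointer into some other
   buffer plus a length. The contents are treated as non-modifiable. */
struct cstr {
    const char *ptr;
    size_t cnt;
};

/* Substring as a view into the original. This sketch assumes
   pos + len <= s.cnt and does no bounds checking. */
struct cstr substring(struct cstr s, size_t pos, size_t len)
{
    struct cstr sub;
    sub.ptr = s.ptr + pos;
    sub.cnt = len;
    return sub;
}

/* Returns nonzero if the two views hold the same bytes. */
int strings_equal(struct cstr a, struct cstr b)
{
    return a.cnt == b.cnt && memcmp(a.ptr, b.ptr, a.cnt) == 0;
}
```

The catch is lifetime: a view is only valid while the original buffer is
alive and unchanged, which is why the modifiable versus non-modifiable
distinction matters.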


This may be where some of the "anti-counted-length-strings" bias
comes from. (I believe another chunk of bias comes from implementations
that limit counted-length strings to 255 bytes maximum: clearly a
bad idea, yet it occurs over and over again.)

There's a little issue involved. Commonly the count in a count-string
is packaged into the start of the string. When you do that, you're
stuck with a fixed format for the count (well, yes, you can wiggle
around it, but that has a cost). But if you don't package the count with
the string, then the count can be irretrievably lost. The advantage of
having a terminating character is that the length can't get lost.
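
To make the fixed-format point concrete, here is a small sketch of a
string with a one-byte count packaged at the start. The names
pstr_store, pstr_length and PSTR_MAX are invented for this example. Once
the count is a single byte, nothing longer than 255 bytes can be
represented without changing the layout.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical layout: one count byte followed by the characters. */
#define PSTR_MAX 255

/* Stores the n bytes at src into dst as a length-prefixed string.
   dst must have room for n + 1 bytes.
   Returns 0 on success, -1 if n does not fit in the one-byte count. */
int pstr_store(unsigned char *dst, const char *src, size_t n)
{
    if (n > PSTR_MAX)
        return -1;
    dst[0] = (unsigned char)n;
    memcpy(dst + 1, src, n);
    return 0;
}

/* The length travels with the data, so it cannot be separated from it. */
size_t pstr_length(const unsigned char *p)
{
    return p[0];
}
```

A wider count field lifts the limit but costs bytes on every short
string, and a variable-width count costs decoding time. That is the
wiggle room, and its price.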

