Re: String concatenation function, request for comments.



Thank you for your suggestions and the english lesson.

Eric Sosman wrote:
> michael.casey@xxxxxxxxx wrote:
>
> > The purpose of this post is to obtain the communities opinion of the
> > usefulness, efficiency, and most importantly the correctness of this
> > small piece of code. I thank everyone in advance for your time in
> > considering this post, and for your comments.
> >
> > I wrote this function to simplify the task of combining strings using
> > mixed sources. I often see the use of sprintf() which results in
> > several lines of code, and more processing than really for the task.
> > The function is pretty straight forward, it accepts a variable list of
> > arguments beginning with the mandatory integer containing the total
> > number of arguments.
> >
> > The function allocates a static buffer; I thought the use of malloc()
> > to be unnecessary, as the responsibility of freeing the memory would be
> > left to the client. This design choice also minimizes the amount of
> > required arguments. If the resultant buffer is needed beyond the second
> > call to the function, it can be copied to another buffer at the clients
> > digression.
>
> ITYM "discretion" -- but I digress.
>

I would hope it to be at the clients discretion, yes. :)

> > If the strings specified for concatenation exceed the buffer available,
> > the overall string will simply be truncated. I decided on this
> > behaviour because I do not believe the output to be essential to a
> > programs operation, so this error can in most cases go silent, but
> > still be apparent to the programmer once the result is examined.
>
> This strikes me as a poor design choice. Essentially, you are
> saying that long concatenated strings are only for "decorative"
> purposes, and are "unimportant" to the correct operation of the
> program. To put it another way, part of the "contract" for your
> function is "Don't use this for anything important." This limits
> the function's usefulness (one of the attributes you asked about);
> for example, it would be unsafe to use the function to generate
> database queries. It would be unsafe even for some uses where a
> human reader is involved: Imagine getting a message on your text
> pager that said "Emergency! The fertilizer has hit the air
> circulator! Drop everything and meet me right away at"
>

Good point. Besides dynamic memory allocation handled by the function,
with the responsibility placed on the client, what alternatives do I
have?

I don't believe it is too much to ask that the client ensure sufficient
space is allocated for the intended string.

> > If the defined buffer amount is insufficient or is unnecessarily large
> > for a particular application, it is left to the client to redefine the
> > buffer amount; I feel that 512 bytes is sufficient for my use.
> >
> > I find this function most useful when frequently forming messages for
> > use over a socket, which will not otherwise be reused.
> >
> > Example use of the function:
> >
> > char *tmp;
> > tmp = vast("Welcome to ", HOST, ", you are logged in as ", name, ".");
>
> This demonstrates a fragility of the interface. It's YOUR
> function and YOUR interface design, and in YOUR very first example
> of its use you've made an error. When such things happen to me
> (nobody's perfect), I start to wonder whether the interface I've
> designed should perhaps be redesigned.
>

A simple oversight. If I did not think this function should be
redesigned I would not have chosen to request comments on it. I believe
everything is subject to redesign.

> Personally, I dislike interfaces that require the coder to count
> things -- it dates from my early FORTRAN days, when I had to write
> character strings as 13HHELLO, WORLD! and hope that I hadn't blown it.
>
> > send(sockfd, tmp, strlen(tmp), 0);
> >
> > The function is defined as follows:
> >
> > /*------------------------------------------------------------------*\
> > |- vast(): Variable Argument String; concatenate arbitrary strings. -|
> > \*------------------------------------------------------------------*/
> >
> > #define VAST_BUF 512
> >
> > char *vast(int num, ...) {
> >
> > static char buf[VAST_BUF];
> >
> > int t = 0,
> > o = 0;
>
> What's the point of maintaining two variables whose values
> are always equal?
>

There is no point, I will remove it.

> > char *v;
> >
> > va_list ap;
>
> You haven't #include'd <stdarg.h>, so this won't compile.
>
> > if(num > 1) {
>
> Peculiar choice: if somebody tries to "concatenate" just one
> string (`p = vast(1, "Alpha and Omega")'), he'll get the empty
> string in return.
>

My initial thought was to save processing because the function call was
an error to begin with, return an empty string so that it will be most
obvious that the usage of this function was not intended to begin with.

This probably needs some re-thinking.

> > va_start(ap, num);
> >
> > /* Loop it while its hot. */
> >
> > while(num--) {
> > v = va_arg(ap, char *);
> >
> > /* Copy string to cumulative buffer until NUL is reached */
> > /* so long as the total does not exceed (MAX_BUF - 1). */
> >
> > while((buf[o] = *v++) && !(t >= VAST_BUF - 1)) {
>
> Can't you find a more obscure way to write `t < VAST_BUF - 1'?
>

I will try.

> > /* Condition is met, increment offset and total. */
> > t++, o++;
> > }
> > }
> >
> > va_end(ap);
> > }
> >
> > buf[o] = NUL;
>
> NUL has not been defined. Assuming you intend to define it
> as zero, the name `NUL' is a misnomer: you do *not* want some
> particular character in some particular encoding, you want the
> zero-valued character regardless of encoding -- ASCII, EBCDIC,
> you name it.
>

I did not know that NUL was undefined.

> An aside: This assignment is required only if `num' was
> less than two to begin with.
>

Good point.

> > return(buf);
> >
> > }
>
> Summary: I don't like the design -- but that's to some extent
> a matter of personal taste about which reasonable people can
> disagree. The implementation is all right, once a few missing
> pieces like <stdarg.h> and NUL are supplied, but could be made
> clearer. The documentation is inadequate; in particular, the value
> 511 is part of the external interface but is not described anywhere,
> and the surprising treatment of single-string "concatenations"
> deserves a mention. Usefulness: minimal. Efficiency: unanswerable
> in terms of Standard C. Correctness: also unanswerable due to the
> lack of a specification.
>
> --
> Eric Sosman
> esosman@xxxxxxxxxxxxxxxxxxx

Thank you for your comments.

.



Relevant Pages

  • Re: String concatenation function, request for comments.
    ... usefulness, efficiency, and most importantly the correctness of this small piece of code. ... as the responsibility of freeing the memory would be left to the client. ... If the resultant buffer is needed beyond the second call to the function, it can be copied to another buffer at the clients digression. ... If the strings specified for concatenation exceed the buffer available, ...
    (comp.lang.c)
  • Re: [Lit.] Buffer overruns
    ... >> As has been pointed out, C strings are unrelated to C buffer overflow ... > is a infrastructure source length available based on nul-terminated, ... > data-pattern based length paradigm ... ...
    (sci.crypt)
  • Re: Brian Kernighan, maybe Im not worthy, maybe Im scum
    ... I've seen a fair amount of Rob Pike's code, and it looks pretty good to me. ... opinions, I respect those opinions enough to treat them seriously, because ... strings" - to C, they're just strings, and if the implementation wants ... interface appears in C Sharp as consisting of sbyte arrays. ...
    (comp.programming)
  • Re: null terminated strings
    ... terminated strings are blamed because they were being used in the programs ... involved with the overrun situation. ... maximum length for that buffer and make sure you don't exceed it. ... Languages such as COBOL would automaically pad or truncate fields during ...
    (comp.os.vms)
  • Re: [Lit.] Buffer overruns
    ... > As has been pointed out, C strings are unrelated to C buffer overflow ... is a infrastructure source length available based on nul-terminated, ... data-pattern based length paradigm ... ...
    (sci.crypt)