Re: Sub-sub-expression evaluation order

From: Andrey Tarasevich (andreytarasevich_at_hotmail.com)
Date: 12/10/04


Date: Thu, 09 Dec 2004 17:20:36 -0800

Frank Wallingford wrote:
> ...
> I came across an interesting question when talking with my colleagues.
> According to the standard, most operators evaluate their operands in an
> unspecified order. This means that in code like this:
>
> f() + g()
>
> the two functions may be called in either order, and still be standards-
> compliant.

That's correct.

> Fine. Now, when the evaluation of one sub-expression begins, is there
> anything in the standard that says that it must complete (ignoring side-
> effects) before the other sub-expression evaluation begins? In other
> words, can pieces of the sub-expressions be evaluated in an interleaving
> fashion?

Yes. The evaluation can be implemented in any way, as long as it
produces the result that satisfies the requirements imposed by the
language specification.

> Take this example:
>
> f(i++) + g(i++)
>
> The difference from the first example is the sequence points for the
> function calls.

There are indeed sequence points at the beginning of each function's
execution. However, the standard doesn't require the evaluation of
argument subexpressions to be "tightly packed" with the execution of the
corresponding function. In other words, in this particular case the
expression might be interpreted as two consecutive 'i++' evaluations
(not separated by any sequence point) followed by calls to 'f' and 'g'.
This means that the modifications of 'i' are potentially violating the
requirements of the standard and the code produces undefined behavior.

> Now, obviously since f() and g() may be called in any order, this code
> is non-deterministic.

If by "non-deterministic" you mean that the code produces unspecified
result, you are wrong. The code produces undefined behavior.

> My question is: can the compiler evaluate the sub-
> expressions in the following order, interleaving the sub-sub-expression
> evaluations?
>
> t1 = i;
> t2 = i;
> i++;
> i++;
> f(t1);
> g(t2);

Yes. That's exactly what I said above. Note though, that there wouldn't
really be any sequence points between two modifications of 'i', which
leads to UB.

> Note that all side-effects were finished before the function-call
> sequence points, so this doesn't violate that.

What exactly do you mean by "that" here?

> If there is nothing in the standard to prevent this, then not only is
> the code "f(i++) + g(i++)" non-deterministic, but it remains as
> undefined as "i++ + i++".

Exactly.

> Here is an example where I think the above problem exists, although it's
> not obvious (please ignore the obvious problems and poor design; this is
> pseudo-code to illustrate a point):
>
> const char *a(void)
> {
> static char *x;
> if (x) free(x);
> x = (char*)malloc(2);
> strcpy(x, "a");
> return x;
> }
>
> printf("%s %s\n", a(), a());
>
> The obvious problem here is that the second call to a() (in either
> order) will free the pointer returned by the first call.

That's correct. But I'd say that the nature of the problem is very
different here.

> Here is the revised version, but I believe that the following is still
> undefined:
>
> const char *a(void) { ... same as above ... }
>
> static list *strings; // implementation not important
>
> const char *cache(const char *in)
> {
> char *out;
> out = strdup(in);
> add_to_list(out);
> return out;
> }
>
> printf("%s %s\n", cache(a()), cache(a()));
>
> The cache() function only exists so that calls like 'printf' can be made
> - that is, they copy the argument off and return the copy (and save it
> in a list to be freed later). This way, the second call to a() doesn't
> affect anything by freeing the old value.
>
> But, similar to the above, I believe that an implementation is free to
> evaluate the printf call in this order:
>
> char *t1 = a();
> char *t2 = a();
> char *t3 = cache(t1);
> char *t4 = cache(t2);
> printf("%s %s\n", t3, t4);
>
> Is this right?

Yes, that's perfectly possible.

>Is it undefined, according to the standard?

I don't see any problems with the last portion of code. It doesn't have
any violations present in the 'f(i++) + g(i++)' example. Why exactly do
you suspect that it is undefined?

-- 
Best regards,
Andrey Tarasevich


Relevant Pages

  • Re: x=(x=5,11)
    ... But the standard is not ambiguous at all. ... There is a sequence point ... This was a red herring when it first entered the conversation and is ... sequence points is to insure that the evaluation is complete. ...
    (comp.lang.c)
  • Re: Further along the evaluation order: Was: Order of evaluation.
    ... is an example from the C Standard of undefined behavior. ... There is a sequence point at the semicolon. ... Standard declares that modifying an object twice between sequence ...
    (alt.comp.lang.learn.c-cpp)
  • Re: "Variables" tutorial available (Windows, mingw/msvc)
    ... For the benefit of those on this newsgroup who haven't got the standard ... the order of evaluation of operands of individual ... previous and next sequence point a scalar object shall have its stored ... But the standard says that it _is_ undefined behaviour. ...
    (comp.lang.cpp)
  • Re: Varied results for ++ operator in VC++/Unix.
    ... > int main ... ++a" does not contain any sequence point. ... C++ standard, the behaviour is undefined, so both answers are equelly ... modified at most once by the evaluation of an expression". ...
    (microsoft.public.vc.language)
  • Re: Defined bahavior or otherwise?
    ... >in the standard? ... Since the result cannot vary due to permissible variations in evaluation ... order between sequence points, ...
    (comp.lang.c)