# Re: Warning on assigning a function-returning-a-pointer-to-arrays

In article <MPG.1f092ec2b126110b989681@xxxxxxxxxxxxxx>
I.M. !Knuth <not_knuth@xxxxxxxxxxxx> wrote:
> int *pfunc(void);

This declares pfunc as a function taking no arguments and returning
"int *", or "pointer to int". A value of type "pointer to int" can
(but does not necessarily) point to the first of one or more "int"s
in sequence, or even into the middle of such a sequence:

int *p1;
int x, y[4];

p1 = NULL; /* p1 does not point to any "int"s */
p1 = &x; /* p1 points to a single "int" */
p1 = &y[0]; /* p1 points to the first of 4 ints */
p1 = &y[1]; /* p1 points to the second of 4 ints, so p1[-1] is y[0] */

Presumably pfunc() will return a pointer to a single int, or to the
first of a sequence of "int"s.

> int main(void)
> {
>     int (*pd)[5];

(Very good definition of main(), by the way. :-) )

"pd" has type "pointer to array 5 of int". As with p1 above, it can
(not necessarily "does") point to the first of one or more "array 5
of int"s in sequence.

>     pd = pfunc();

Type mismatch: pfunc() returns a pointer to an int, or some int
within a sequence of "int"s. "pd" needs to point to an "array 5
of int", or the first of a sequence of "array 5 of int"s.
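One way to see why the compiler objects: the two pointer types step through memory in different units. The following is only a sketch; the helper names int_stride and row_stride are invented for illustration and are not part of the original code.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes between p and p + 1 when p has type "pointer to int". */
size_t int_stride(void)
{
    static int y[2][5];
    int *p = &y[0][0];

    return (size_t)((char *)(p + 1) - (char *)p);
}

/* Bytes between pd and pd + 1 when pd has type
 * "pointer to array 5 of int". */
size_t row_stride(void)
{
    static int y[2][5];
    int (*pd)[5] = y;          /* y decays to &y[0], type int (*)[5] */

    assert(sizeof *pd == sizeof y[0]);
    return (size_t)((char *)(pd + 1) - (char *)pd);
}
```

The first function should report sizeof(int) and the second 5 * sizeof(int), which is why the two types cannot be mixed without a conversion.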

>     /* Should return 20 and 9 */
>     printf("\nDigits are: %d and %d", *(*(pd + 0) + 3), *(*(pd + 1) + 2) );

This could be written more clearly as:

printf("\nDigits are: %d and %d", pd[0][3], pd[1][2]);
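The two spellings are interchangeable because C defines a[i] to mean *(a + i). A small sketch of that equivalence follows; same_element and sample_table are names made up here, with sample_table holding the same values as the original "arr".

```c
#include <assert.h>

/* Same contents as "arr" in the original post. */
int sample_table[2][5] = {{5, 10, 15, 20, 25}, {3, 6, 9, 12, 15}};

/* Returns 1 if the pointer-arithmetic spelling and the subscript
 * spelling designate the same int object. */
int same_element(int (*pd)[5], int row, int col)
{
    assert(pd != 0);
    return (*(pd + row) + col) == &pd[row][col];
}
```

Calling same_element(sample_table, 0, 3) compares the addresses produced by the two notations for the element that holds 20.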

>     return 0;

For strict portability, make sure any output is newline-terminated
(e.g., replace the above printf format "\n..." with "...\n", or
"\n...\n" if you like the extra blank line for some reason).

> }
>
> int *pfunc(void)
> {
>     static int arr[2][5] = {{5, 10, 15, 20, 25}, {3, 6, 9, 12, 15}};

"arr" has type "array 2 of array 5 of int". That is, it is an array
of 2 "things", where each of those "things" is an "array 5 of int".
arr[0] is the first of those things and arr[1] is the second.

>     return(*arr);
> }

Aside: the parentheses here are unnecessary, albeit harmless. The
syntax for "return" statements that return a value is:

return expression;

and of course expressions can be parenthesized. In any case, the
effect is to return a value.

*arr is one way to write arr[0]. This names the entire "array 5
of int" containing the sequence {5, 10, 15, 20, 25}. Whenever you
name an array object in a place where a value is required, C compilers
will produce, as a value, the address of the first element of that
array. Hence:

return arr[0];

"means" the same thing as:

return &arr[0][0];

Hence pfunc() returns a pointer to the "int" holding the value 5
(namely arr[0][0]), which is of course the first of five "int"s
in sequence.
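The "array name becomes pointer to first element" rule can be checked directly. This is a minimal sketch, with the function name first_via_decay invented for the purpose:

```c
#include <assert.h>

static int arr[2][5] = {{5, 10, 15, 20, 25}, {3, 6, 9, 12, 15}};

/* Mirrors the original pfunc(): *arr names the array arr[0], and
 * using it as a value yields a pointer to its first element. */
int *first_via_decay(void)
{
    int *p = *arr;

    assert(p == &arr[0][0]);   /* same pointer, spelled explicitly */
    return p;
}
```

The returned pointer has type "int *", not "int (*)[5]", which is the source of the diagnostic on the assignment to "pd".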

Because C's arrays are always "a sequence of identically-typed
elements with no gaps in between"[%], this sequence of 5 "int"s is
going to be followed by the sequence of 5 "int"s that make up the
object arr[1]. That is, if you can somehow "flatten out" the array,
you get 2 * 5 = 10 "int"s in a row, holding the sequence {5, 10,
15, 20, 25, 3, 6, 9, 12, 15}.
-----
% Or more precisely, if there are gaps, they have to be invisible
as long as you stick to code with defined behavior.
-----
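The "no gaps" property can be observed without stepping outside defined behavior. In the sketch below (row_distance and flatten are hypothetical helper names), the byte distance between the rows is measured through character pointers, and the whole object is copied with memcpy into an array that really is flat:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

static const int arr[2][5] = {{5, 10, 15, 20, 25}, {3, 6, 9, 12, 15}};

/* Byte distance from arr[0][0] to arr[1][0]. With no gaps between
 * the rows this is exactly the size of one row. */
size_t row_distance(void)
{
    return (size_t)((const char *)&arr[1][0] - (const char *)&arr[0][0]);
}

/* Copy the complete object into a genuinely one-dimensional array.
 * memcpy works on the bytes of the whole object, so nothing here
 * indexes past the end of arr[0]. */
void flatten(int out[10])
{
    assert(sizeof arr == 10 * sizeof(int));
    memcpy(out, arr, sizeof arr);
}
```

After flatten(), element 4 of the output is the last int of the first row and element 5 is the first int of the second row.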

(The C standard does not promise that accessing the array in this
"flattened" manner will always work. There may be some rare
situations in which a clever C compiler tracks array-bounds
information so that it "knows" that, e.g., arr[0][i] can never have
a value of "i" greater than 4, and generates machine code that
actually fails if "i" really is greater than 4. But on real C
compilers on real machines, the flattened access usually does work.
Depending on this is like skating across a frozen pond where someone
has put out a "danger -- thin ice" sign. You will probably be
fine, but if you fall in and freeze to death, you will know who to
blame. :-) )

In this case, with some luck -- it is not clear whether this is
"good luck" or "bad luck" -- if the above code compiles, the value
pfunc() returns will "flatten out" the array, and this value will
be "re-folded" by the assignment to "pd". Having folded, spindled,
and perhaps mutilated the value, the code will go on to access
arr[0][3] and arr[1][2] through the pointer value now stored in
"pd". These two elements of "arr" should contain 20 and 9
respectively; and that is what you actually saw:

> It compiles and works correctly, but the compilers I've tried it
> on warn me of "illegal conversion of pointer type" or "suspicious
> pointer conversion" regarding:
>
>     pd = pfunc();

The C standard says only that "a diagnostic" is required. A warning
suffices as a diagnostic, as does having the compiler spin the
CD-ROM really fast so that it makes a horrible whining sound (as
long as the compiler's documentation says as much). An "error"
that aborts compilation entirely is also a valid diagnostic.

> I'm stumped. I think my syntax is correct, and there's nothing
> in the literature or this group's FAQ that tells me otherwise; yet,
> if I combine the declaration and initialisation to:
>
>     int (*pd)[5] = pfunc();
>
> . . . the warnings go away. What's up with that?

That would indicate a bug in the compiler: an initializer is subject
to the same type constraints as simple assignment, so a diagnostic
is still required.

The "right" thing to do (for some version of "right" at least) is
to have pfunc() return a value of the correct type. This requires
some ugly syntax, or resorting to C's "typedef" type-alias-creating
facility.

Using a typedef -- which merely exchanges one ugly syntax for a
different ugly syntax, in my personal opinion :-) -- we get something
like this:

    #include <stdio.h>

    typedef int Zog[5];   /* Zog is now an alias for int[5] */

    Zog *pfunc(void);

    int main(void) {
        Zog *pd;

        pd = pfunc();
        printf("%d and %d\n", pd[0][3], pd[1][2]);
        return 0;
    }

    Zog *pfunc(void) {
        static int arr[2][5] = {{5, 10, 15, 20, 25}, {3, 6, 9, 12, 15}};

        return arr;   /* or return &arr[0]; */
    }

To eliminate the typedef, we just have to expand it out -- but now
we need parentheses and "[5]"s in awkward places, as with the
original definition for "pd" in main():

    #include <stdio.h>

    int (*pfunc(void))[5];

    int main(void) {
        int (*pd)[5];

        pd = pfunc();
        printf("%d and %d\n", pd[0][3], pd[1][2]);
        return 0;
    }

    int (*pfunc(void))[5] {
        static int arr[2][5] = {{5, 10, 15, 20, 25}, {3, 6, 9, 12, 15}};

        return arr;   /* or return &arr[0]; */
    }

Note that (alas) in C89, pfunc() can only return a pointer to (the
first of several of) "array 5 of int"s. So we can change "arr" to
"static int arr[123][5]", or "static int arr[42][5]", but never to
"static int arr[2][7]", for instance. C99's "variable-length
arrays" and "variably modified" types solve this particular problem.
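As a rough sketch of the C99 side, and only of the parameter-passing half of it, a function can take the row length as an ordinary argument and use it in the pointer type. The names element_at, nine_from_narrow and last_from_wide are invented here. C11 made variable-length arrays an optional feature, so this assumes a compiler that supports them.

```c
#include <assert.h>
#include <stddef.h>

/* cols is an ordinary parameter, so p has the variably modified type
 * "pointer to array cols of int". */
int element_at(size_t cols, int (*p)[cols], size_t row, size_t col)
{
    assert(col < cols);
    return p[row][col];
}

/* The same function serves rows of 5 ints ... */
int nine_from_narrow(void)
{
    static int narrow[2][5] = {{5, 10, 15, 20, 25}, {3, 6, 9, 12, 15}};

    return element_at(5, narrow, 1, 2);
}

/* ... and rows of 7 ints. */
int last_from_wide(void)
{
    static int wide[2][7] = {{1, 2, 3, 4, 5, 6, 7},
                             {8, 9, 10, 11, 12, 13, 14}};

    return element_at(7, wide, 1, 6);
}
```

The caller remains responsible for passing a cols value that matches the real row length; the compiler does not check it.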

Aside: I dislike typedefs in general, and I dislike typedefs for
array types even more because of C's peculiar treatment of arrays.
Unlike every other data type, you *must* know whether some type
is an array type in order to predict the behavior of arguments,
and know whether it is OK to return a value of that type. That
is, given the "Zog" typedef above, the following is not valid C:

    Zog f(void) {
        Zog result;

        while (need_more_work())
            fill_it_in(&result);
        return result;
    }

But if we were to replace the typedef line with, e.g.:

struct zog { int val[5]; };
typedef struct zog Zog;

then the function f() above would suddenly become valid C. Similarly,
if we have no idea whether "Morgle" is a typedef for an array type,
we cannot tell whether the following can be simplified:

    void operate(void) {
        Morgle a, b;

        init(&a);
        memcpy(&b, &a, sizeof a);   /* ??? do we need this ? */

        while (more_to_do())
            frob(a);

        /* make sure frob() did not modify "a" */
        if (memcmp(&a, &b, sizeof a) != 0)
            printf("alas! alack!\n");
    }

If Morgle is *not* an array type, frob() will be unable to modify
"a", because frob() takes the *value* of "a", not the address of
"a". In this case, the copy in "b" is pointless and the memcmp()
will never show them as different, so we do not need the copy.
But if Morgle *is* an array type, frob() receives a pointer to the
first element of "a", and is able to modify "a".

(In some cases we can use "const" to promise, weakly, that frob()
will not modify "a" even if it gets a pointer to the first element;
but this promise can be violated, and in some cases adding "const"
is inappropriate anyway. I think it is better to avoid the situation
entirely.)
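The difference between the two kinds of "Morgle" can be made concrete. In this sketch ArrZog and StructZog are hypothetical stand-ins for an array typedef and a struct typedef, and the frob_* functions are invented to show what each callee is able to do to its argument:

```c
#include <assert.h>

typedef int ArrZog[5];                      /* array typedef */
typedef struct { int val[5]; } StructZog;   /* struct wrapping the array */

/* Receives a pointer to the caller's first element, so the write is
 * visible to the caller. */
void frob_array(ArrZog a)
{
    a[0] = 99;
}

/* Receives a copy of the whole struct, so the write stays local. */
int frob_struct(StructZog s)
{
    s.val[0] = 99;
    return s.val[0];
}

/* Returning a struct by value is valid C; a function declared as
 * "ArrZog make(void)" would be rejected. */
StructZog make_struct(void)
{
    StructZog result = {{1, 2, 3, 4, 5}};

    return result;
}

void check_behaviour(void)
{
    ArrZog a = {1, 2, 3, 4, 5};
    StructZog s = make_struct();

    frob_array(a);
    frob_struct(s);
    assert(a[0] == 99);      /* array argument was modified in place */
    assert(s.val[0] == 1);   /* struct argument was only copied */
}
```

With the struct version, the defensive memcpy()/memcmp() pair in operate() would be unnecessary.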

(The heart of the problem is really that C treats arrays "specially".
Because of this, it is important to know whether some purportedly
abstract type is in fact an array type. If so, it will not behave
the way other types behave. C's structure types *do* behave
"properly", so in the limit, all abstract data types should be
"struct"s.)

The one place where even I break down and use "typedef" :-) is for
pointer-to-function types. Consider "signal", which takes two
parameters:

- one, an int specifying which signal, and
- the other, a pointer to a signal-handling function

and returns one value:

- a pointer to a signal-handling function

where the repeated type -- "pointer to signal-handling function" --
is itself a pointer-to-function-taking-int-and-returning-void,
or "void (*)(int)", complete with awkwardly-placed parentheses,
asterisks, and parameter types. If we write down one typedef for
this particular type, we can then use it twice and get:

typedef void (*Sig_func_ptr)(int);
Sig_func_ptr signal(int sig, Sig_func_ptr func);
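In user code the same typedef is handy for holding the value signal() returns. A minimal sketch, using only standard <signal.h> facilities; the names on_sigint, got_signal and handler_round_trip are invented for the example:

```c
#include <assert.h>
#include <signal.h>

typedef void (*Sig_func_ptr)(int);

static volatile sig_atomic_t got_signal = 0;

static void on_sigint(int sig)
{
    (void)sig;
    got_signal = 1;
}

/* Install on_sigint for SIGINT, deliver one SIGINT to ourselves,
 * then put the previous handler back. Returns the flag that the
 * handler sets. */
int handler_round_trip(void)
{
    Sig_func_ptr old = signal(SIGINT, on_sigint);

    assert(old != SIG_ERR);
    got_signal = 0;
    raise(SIGINT);
    signal(SIGINT, old);
    return got_signal;
}
```

Without the typedef, the declaration of "old" would have to be written as void (*old)(int).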

Of course, the standard header <signal.h> is not allowed to use
names that are in the user's namespace, so most implementors expand
the types in-line, and omit the parameter names, giving:

void (*signal(int, void (*)(int)))(int);

which is confusing at best. If you are the implementor, and
go to write the function's definition, it gets even worse:

    void (*signal(int sig, void (*func)(int)))(int) {
        void (*old)(int);

        if (sig < __MIN_SIGNO || sig >= __MAX_SIGNO)
            return SIG_ERR;

        /* some sort of signal atomicity magic here */

        old = __sigtable[sig - __MIN_SIGNO];
        /*
         * May need additional work depending on sig and/or whether
         * func == SIG_DFL or SIG_IGN. For instance, instead of the
         * crazy top-of-stack "trampoline code" that BSD systems (used
         * to?) use, we might do something like this:
         *
         *     if (func == SIG_DFL || func == SIG_IGN)
         *         kernel_entry = func;
         *     else
         *         kernel_entry = __sigtramp;
         *     __sig_syscall(sig, kernel_entry, and, any, other, args);
         *
         * Then the local signal table contains the userland handler,
         * while the kernel is told to jump to the trampoline code
         * in the library, no matter where that has been loaded.
         * Now the library always automatically matches itself, even
         * with future version changes, e.g., to save additional state.
         *
         * I believe Sun did something like this way back in SunOS 4.x
         * or 5.0 or so.
         */
        __sigtable[sig - __MIN_SIGNO] = func;

        /* more atomicity here, including a check for pending signals */

        return old;
    }

Of course, signals (with their associated atomicity issues, operating
system interactions, and hardware dependencies) are "relatively deep
magic" in the first place.
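Stripped of the magic, the table-swapping part of such a function is ordinary C. The following toy is a sketch only: every name in it (toy_signal, toy_raise, toy_table, TOY_ERR and so on) is invented, it has no atomicity protection and no contact with the operating system, and it is not how any real library implements signal().

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MIN_SIGNO 1
#define TOY_MAX_SIGNO 32   /* one past the last valid number */

typedef void (*Toy_handler)(int);

static Toy_handler toy_table[TOY_MAX_SIGNO - TOY_MIN_SIGNO];

/* Sentinel playing the role of SIG_ERR; it is never called. */
static void toy_error(int sig) { (void)sig; }
#define TOY_ERR toy_error

/* Same shape as signal(): store func, return the previous entry. */
Toy_handler toy_signal(int sig, Toy_handler func)
{
    Toy_handler old;

    if (sig < TOY_MIN_SIGNO || sig >= TOY_MAX_SIGNO)
        return TOY_ERR;
    old = toy_table[sig - TOY_MIN_SIGNO];
    toy_table[sig - TOY_MIN_SIGNO] = func;
    return old;
}

/* Call the registered handler, if any; returns 1 if one ran. */
int toy_raise(int sig)
{
    Toy_handler h;

    assert(sig >= TOY_MIN_SIGNO && sig < TOY_MAX_SIGNO);
    h = toy_table[sig - TOY_MIN_SIGNO];
    if (h == NULL)
        return 0;
    h(sig);
    return 1;
}

static int toy_last_sig = 0;

/* Example handler: records which number it was called with. */
void toy_record(int sig) { toy_last_sig = sig; }
```

Here the typedef does the work that the expanded declarator does in the real prototype, which is the whole argument for using it.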
--
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603
Reading email is like searching for food in the garbage, thanks to spammers.