Musings on C declaration syntax and style

Thu, 10 Mar 2011 21:27:08 +0000

So, today I was asked why I declared my pointer variables like:

int *f;

rather than:

int* f;

After all, I want a variable with the type pointer to int; the pointer is part of the type, so the asterisk rightly belongs with the int part. On the face of it, that is a pretty reasonable argument, and to be honest it may be valid in the overall scheme of things.

The reason for my preference really stems from the syntax of declarations in C. The mental model that many of us have is that the syntax is:

type-name variable-name

If this were the reality of the situation then int* foo would be a pretty reasonable way to declare a variable. Unfortunately, the syntax for declarations in C is actually:

declaration-specifiers init-declarator-list(opt)

At this point even an experienced C developer could be forgiven for thinking to herself: wtf is a declaration-specifier, and for that matter an init-declarator?

Well, the full answers to such questions are found in everyone’s favourite piece of literature, ISO/IEC 9899:1999. Of course, the actual standard costs money, so most of us make do with WG14 N1256, a freely available committee draft that contains the C99 text with Technical Corrigenda 1 to 3 folded in; as far as I’m aware there are no significant differences between it and the published standard. I’m going to try to give a less precise, and hopefully more readable, overview of what it means.

Declaration Specifiers

The declaration-specifiers consist of the storage-class specifier, the type-specifiers and the type-qualifiers (C99 also allows the function-specifier inline here). The storage-class specifier is, basically, your extern or static (auto, register and, syntactically, typedef belong to the same group). You can have at most one of these in your declaration-specifiers, and it should go at the front of your declaration.

Next up is the type-specifier: this is your void, char, int, short, long, signed, unsigned, float, double (plus _Bool and _Complex in C99). It can also be a struct, union or enum specifier, or a typedef name if you are feeling particularly out there.

Now, you’ve got to have at least one type-specifier in your declaration, but you can have more than one, such as unsigned int, for example. Interestingly, the order doesn’t matter, so int unsigned is the same as unsigned int, and the other kind of crazy thing is that type-specifiers can be mixed freely with the other specifiers, so int volatile long static unsigned is also perfectly valid!

Finally, you can optionally have a type-qualifier (or two), which are volatile and const (C99 also adds restrict). These can also appear anywhere in your declaration-specifiers and, as a bonus party trick, if you have more than one of the same qualifier that is fine. So int volatile volatile volatile is perfectly fine if you want to treat your C code as some form of absurdist poetry. (In C99 at least; this is not true in C89.)

OK, so now you have a pretty good idea of what goes into the magic declaration-specifiers thing. The important point here is that pointer is not mentioned at all! And neither, for that matter, are arrays or functions. The pointer and array (and function) parts of an identifier’s type don’t go in the declaration-specifiers. That just leaves us with the init-declarator-list thing, which is simply a list of declarators, each of which may be initialised. For this article we’ll not really worry about the initialisation part.

Declarators

So, a declarator contains the identifier (informally, the variable name), and additionally extra bits of type information, specifically whether the identifier is a pointer, array or function. Some example declarators:


x
*x
*const x
*const*const x
x[5]
x(int)
x(int x)

Now, for every set of declaration-specifiers we can have a list of declarators, so the standard reason for putting a space between the type-specifier and the pointer is that, when declaring multiple pointers in the same declaration, there is less chance of getting things wrong. For example:

int* x, y;

It is not clear whether the author intended for x to be a pointer to int and y to be an int, or whether the intent was for both x and y to be pointers to int. (The compiler has no such doubt: only x is a pointer.) Assuming the latter, the right way to do it with such a formatting style would be:

int* x,* y;

which is somewhat aesthetically unpleasing. By comparison:

int *x, *y;

is clearly two pointers, and

int *x, y;

is clearly one pointer without any real ambiguity. (Of course one could argue that declaring multiple identifiers with different types in the same declaration is probably not a crash hot idea anyway).

As an aside, while I’d never suggest doing this in real code, it is perfectly legal to declare a variable, pointer, array, function pointer, and function identifier within the same declaration, for example:

int *d1, d2, d3[1], (*d4)(void), d5(void);

Conclusions

So the main point here is that we shouldn’t think of declarations as type-name variable-name because that just isn’t how the language’s syntax works. Of course, there are other places where we do need to specify a full type, such as when using the cast operator. However, in the C specification a type-name is defined as “a declaration ... of that type that omits the identifier”, which is why I format casts as (int *)foo rather than (int*)foo.

So, back to the topic at hand, int* foo vs int *foo. I don’t think there is any real defense for the first approach if your coding standard allows multiple identifiers to be declared within the same declaration.

I can see an argument being made that the C declaration rules are just too damn complex, and let’s just pretend that declarations really are of the form type-name variable-name.

Of course, one problem with that approach is that it is not possible to give identifiers complex types (arrays, function pointers and so on) without the use of typedef. I guess this could be seen as a feature rather than a drawback.

Another argument for the space between type-specifier and pointer is that the pointer may include a type-qualifier, such as const. Compare int*const vs. int *const. In my opinion the latter is more aesthetically pleasing, but this is a much weaker argument (and it also motivates having a space between pointer and identifier: *const ident vs *constident... the latter is not even syntactically correct).

So, a conclusion... I’m still going with int *foo in any of my coding standards because it most closely matches the underlying syntax of the language.

Do you have an opinion one way or the other? Do you have some good reasons to back it up? Please leave a comment!

Footnotes

If you want to know how to read C declarations I’d suggest understanding The Clockwise/Spiral Rule. Or if you are lazy try cdecl.

So, really, the C declaration syntax is kind of nuts; I much prefer what is done in Go. Go’s Declaration Syntax is worth a read.
