Copyright (C) 2000 Free Software Foundation, Inc.

This file is intended to contain a few notes about writing C code
within GCC so that it compiles without error on the full range of
compilers GCC needs to be able to compile on.

The problem is that many ISO-standard constructs are not accepted by
either old or buggy compilers, and we keep getting bitten by them.
This knowledge until now has been sparsely spread around, so I
thought I'd collect it in one useful place. Please add and correct
any problems as you come across them.

I'm going to start from a base of the ISO C89 standard, since that is
probably what most people code to naturally. Obviously using
constructs introduced after that is not a good idea.

The first section of this file deals strictly with portability issues,
the second with common coding pitfalls.


Portability Issues
==================

Unary +
-------

K+R C compilers and preprocessors have no notion of unary '+'. Thus
the following code snippet contains 2 portability problems.

    int x = +2;  /* int x = 2;  */
    #if +1       /* #if 1  */
    #endif


Pointers to void
----------------

K+R C compilers did not have a void pointer, and used char * as the
pointer to anything. The macro PTR is defined as either void * or
char * depending on whether you have a standards compliant compiler or
a K+R one. Thus

    free ((void *) h->value.expansion);

should be written

    free ((PTR) h->value.expansion);

Further, an initial investigation indicates that pointers to functions
returning void are okay. Thus the example given by "Calling functions
through pointers to functions" below appears not to cause a problem.


String literals
---------------

Some SGI compilers choke on the parentheses in:-

    const char string[] = ("A string");

This is unfortunate since this is what the GNU gettext macro N_
produces. You need to find a different way to code it.

K+R C did not allow concatenation of string literals like

    "This is a " "single string literal".

Moreover, some compilers like MSVC++ have fairly low limits on the
maximum length of a string literal; 509 characters is the lowest we've
come across. You may need to break up a long printf statement into
many smaller ones.


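As a rough sketch of breaking up a long message, the text can be kept
as a table of short literals and written one piece at a time. The
program name, option text, and the names usage_lines and print_usage
are made up for illustration. The sketch is written in ISO C for
brevity; inside GCC the definition would follow the K+R rules described
under "Function prototypes".

```c
#include <stdio.h>

/* Hypothetical help text.  Each literal is far below the 509
   character limit, and no literal concatenation is used.  */
static const char *const usage_lines[] = {
  "Usage: frob [options] file...\n",
  "Options:\n",
  "  -o <file>   Place the output into <file>\n",
  "  -v          Print version information\n",
  0
};

/* Write the pieces one by one; return how many were written.  */
int
print_usage (FILE *stream)
{
  int count = 0;

  while (usage_lines[count])
    fputs (usage_lines[count++], stream);
  return count;
}
```
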
Empty macro arguments
---------------------

ISO C (6.8.3 in the 1990 standard) specifies the following:

    If (before argument substitution) any argument consists of no
    preprocessing tokens, the behavior is undefined.

This was relaxed by ISO C99, but some older compilers emit an error,
so code like

    #define foo(x, y) x y
    foo (bar, )

needs to be coded in some other way.


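One possible "other way", offered only as a sketch, is to pass a macro
that expands to nothing, so that the argument itself is never empty
before substitution. The names NOTHING, DECLARE, counter, total and
sum_declared are invented for this example, and whether every old
preprocessor accepts the trick has not been checked here.

```c
/* A placeholder that expands to no tokens at all.  */
#define NOTHING
#define DECLARE(storage, type, name) storage type name

DECLARE (static, int, counter) = 3;   /* static int counter = 3;  */
DECLARE (NOTHING, int, total) = 4;    /* int total = 4;  */

/* Use both variables so the declarations can be checked.  */
int
sum_declared (void)
{
  return counter + total;
}
```
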
signed keyword
--------------

The signed keyword did not exist in K+R compilers; it was introduced
in ISO C89, so you cannot use it. In both K+R and standard C,
unqualified char and bitfields may be signed or unsigned. There is no
way to portably declare signed chars or signed bitfields.

All other arithmetic types are signed unless you use the 'unsigned'
qualifier. For instance, it is safe to write

    short paramc;

instead of

    signed short paramc;

If you have an algorithm that depends on signed char or signed
bitfields, you must find another way to write it before it can be
integrated into GCC.


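As an illustration of "another way", an algorithm that wants signed
8-bit values can keep the bits in an unsigned quantity and sign-extend
them by hand. This is only a sketch, written in ISO C for brevity; the
function name sign_extend_8 is hypothetical.

```c
/* Interpret the low 8 bits of VALUE as a two's complement number
   without ever declaring a 'signed char'.  */
int
sign_extend_8 (unsigned int value)
{
  int low = (int) (value & 0xff);   /* 0 .. 255 */

  /* Flip the sign bit, then subtract its weight: values 128 .. 255
     map to -128 .. -1, and 0 .. 127 are unchanged.  */
  return (low ^ 0x80) - 0x80;
}
```
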
Function prototypes
-------------------

You need to provide a function prototype for every function before you
use it, and functions must be defined K+R style. The function
prototype should use the PARAMS macro, which takes a single argument.
Therefore the parameter list must be enclosed in parentheses. For
example,

    int myfunc PARAMS ((double, int *));

    int
    myfunc (var1, var2)
         double var1;
         int *var2;
    {
      ...
    }

You also need to use PARAMS when referring to function prototypes in
other circumstances, for example see "Calling functions through
pointers to functions" below.

Variable-argument functions are best described by example:-

    void cpp_ice PARAMS ((cpp_reader *, const char *msgid, ...));

    void
    cpp_ice VPARAMS ((cpp_reader *pfile, const char *msgid, ...))
    {
    #ifndef ANSI_PROTOTYPES
      cpp_reader *pfile;
      const char *msgid;
    #endif
      va_list ap;

      VA_START (ap, msgid);

    #ifndef ANSI_PROTOTYPES
      pfile = va_arg (ap, cpp_reader *);
      msgid = va_arg (ap, const char *);
    #endif

      ...
      va_end (ap);
    }

For the curious, here are the definitions of the above macros. See
ansidecl.h for these definitions and more.

    #define PARAMS(paramlist)  paramlist  /* ISO C.  */
    #define VPARAMS(args)      args

    #define PARAMS(paramlist)  ()         /* K+R C.  */
    #define VPARAMS(args)      (va_alist) va_dcl

One aspect of using K+R style function definitions is that you cannot
have arguments whose types are char, short, or float, since without
prototypes (i.e., K+R rules), these types are promoted to int, int, and
double respectively.


Calling functions through pointers to functions
-----------------------------------------------

K+R C compilers require the function pointer to be explicitly
dereferenced in the call, with parentheses around the dereference,
whereas ISO C relaxes the syntax and also accepts a call through the
bare pointer. For example

    typedef void (* cl_directive_handler) PARAMS ((cpp_reader *, const char *));
    p->handler (pfile, p->arg);

needs to become

    (*p->handler) (pfile, p->arg);


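A small self-contained sketch of the parenthesized form follows. The
struct, the handler signature and all names are hypothetical stand-ins
for the cpp_reader example above, and the definitions are written in
ISO C for brevity rather than in the K+R style GCC requires.

```c
/* A hypothetical table entry holding a handler and its argument.  */
struct directive
{
  void (*handler) (int *, int);
  int arg;
};

/* Example handler: add ARG to the running total.  */
static void
add_arg (int *total, int arg)
{
  *total += arg;
}

int
run_directive (struct directive *p, int start)
{
  int total = start;

  /* Explicit dereference in parentheses: accepted by both K+R and
     ISO compilers.  */
  (*p->handler) (&total, p->arg);
  return total;
}
```
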
Macros
------

The rules under K+R C and ISO C for achieving stringification and
token pasting are quite different. Therefore some macros have been
defined which will get it right depending upon the compiler.

    CONCAT2(a,b) CONCAT3(a,b,c) and CONCAT4(a,b,c,d)

will paste the tokens passed as arguments. You must not leave any
space around the commas. Also,

    STRINGX(x)

will stringify an argument; to get the same result on K+R and ISO
compilers x should not have spaces around it.


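To show how these macros are meant to be used, here is a sketch with
local ISO C stand-ins for CONCAT2 and STRINGX. The stand-in definitions
are an assumption made so the example compiles on its own; real GCC
code should rely on the definitions from its own headers. The names
regcount, name and name_matches are invented.

```c
#include <string.h>

/* Stand-in ISO C definitions, used only if the real ones are absent.  */
#ifndef CONCAT2
#define CONCAT2(a,b) a##b
#endif
#ifndef STRINGX
#define STRINGX(x) #x
#endif

/* No spaces around the comma or inside the parentheses.  */
static int CONCAT2(reg,count) = 3;                /* declares regcount */
static const char *const name = STRINGX(regcount);

/* Nonzero if pasting and stringification produced what we expect.  */
int
name_matches (void)
{
  return regcount == 3 && strcmp (name, "regcount") == 0;
}
```
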
Passing structures by value
---------------------------

Avoid passing structures by value, either to or from functions. It
seems some K+R compilers handle this differently or not at all.


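The usual alternative is to pass a pointer in both directions. A
minimal sketch, with a made-up struct range and made-up function names,
written in ISO C for brevity:

```c
struct range
{
  int low;
  int high;
};

/* Instead of returning a struct, fill in one supplied by the caller.  */
void
set_range (struct range *out, int low, int high)
{
  out->low = low;
  out->high = high;
}

/* Instead of taking a struct argument, take a pointer to it.  */
int
range_width (const struct range *r)
{
  return r->high - r->low;
}
```
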
Enums
-----

In K+R C, you have to cast enum types to use them as integers, and
some compilers in particular give lots of warnings for using an enum
as an array index.


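A sketch of the explicit casts, using an invented enum color and
invented helper functions (ISO C definitions for brevity):

```c
enum color { RED, GREEN, BLUE, NUM_COLORS };

static const char *const color_names[] = { "red", "green", "blue" };

/* Cast before indexing, to keep picky compilers quiet.  */
const char *
color_name (enum color c)
{
  return color_names[(int) c];
}

/* Cast before doing arithmetic on the enum value.  */
int
next_color (enum color c)
{
  return ((int) c + 1) % (int) NUM_COLORS;
}
```
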
Bitfields
---------

See also "signed keyword" above. In K+R C only unsigned integer
bitfields were defined (i.e. unsigned char, unsigned short, unsigned
long); using plain int/short/long was not allowed.


free and realloc
----------------

Some implementations crash upon attempts to free or realloc the null
pointer. Thus if mem might be null, you need to write

    if (mem)
      free (mem);


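The same guard applies to realloc. The following sketch wraps both
calls; the names grow_buffer and release_buffer are hypothetical, and
it uses void * and ISO definitions for brevity where GCC code of this
vintage would use PTR and K+R definitions.

```c
#include <stdlib.h>

/* Resize MEM to SIZE bytes, never handing a null pointer to realloc.  */
void *
grow_buffer (void *mem, size_t size)
{
  return mem ? realloc (mem, size) : malloc (size);
}

/* Free MEM, never handing a null pointer to free.  */
void
release_buffer (void *mem)
{
  if (mem)
    free (mem);
}
```
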
Reserved Keywords
-----------------

K+R C has "entry" as a reserved keyword, so you should not use it for
your variable names.


Type promotions
---------------

K+R used unsigned-preserving rules for arithmetic expressions, while
ISO uses value-preserving. This means an unsigned char compared to an
int is done as an unsigned comparison in K+R (since unsigned char
promotes to unsigned) while it is signed in ISO (since all of the
values in unsigned char fit in an int, it promotes to int).


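One way to get the same answer under both sets of rules is to make the
conversion explicit, so that the promotion rules never come into play.
A sketch, with a hypothetical function name and an ISO definition for
brevity:

```c
/* Return nonzero if BYTE is greater than LIMIT.  With the cast, the
   comparison is a signed int comparison under both K+R and ISO rules.
   Without it, byte 200 against limit -1 would compare as
   200 > (unsigned) -1 under the unsigned-preserving rules.  */
int
byte_exceeds (unsigned char byte, int limit)
{
  return (int) byte > limit;
}
```
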
Trigraphs
---------

You weren't going to use them anyway, but trigraphs were not defined
in K+R C, and some otherwise ISO C compliant compilers do not accept
them.


Suffixes on Integer Constants
-----------------------------

K+R C did not accept a 'u' suffix on integer constants. If you want
to declare a constant to be unsigned, you must use an explicit
cast.

You should never use a 'l' suffix on integer constants ('L' is fine),
since it can easily be confused with the number '1'.


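For instance, constants that would otherwise be written 0xffu or
1ul << 31 can be spelled with casts. The macro and function names
below are made up, and the definitions are ISO C for brevity.

```c
/* Cast instead of a 'u' or 'ul' suffix.  */
#define BYTE_MASK ((unsigned) 0xff)
#define HIGH_BIT  ((unsigned long) 1 << 31)

unsigned
low_byte (unsigned word)
{
  return word & BYTE_MASK;
}

unsigned long
high_bit_of (unsigned long word)
{
  return word & HIGH_BIT;
}
```
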
Common Coding Pitfalls
======================

errno
-----

errno might be declared as a macro. Do not declare it yourself with
'extern int errno;'; include <errno.h> instead.


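A sketch of using errno purely through the header, with no declaration
of our own; the function name parse_long is hypothetical and the
definition is ISO C for brevity.

```c
#include <errno.h>
#include <stdlib.h>

/* Parse TEXT as a decimal long.  Return nonzero on success and store
   the value in *RESULT.  errno is only ever touched by name, so this
   works whether it is a variable or a macro.  */
int
parse_long (const char *text, long *result)
{
  char *end;

  errno = 0;
  *result = strtol (text, &end, 10);
  return errno == 0 && end != text;
}
```
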
Implicit int
------------

In C, the 'int' keyword can often be omitted from type declarations.
For instance, you can write

    unsigned variable;

as shorthand for

    unsigned int variable;

There are several places where this can cause trouble. First, suppose
'variable' is a long; then you might think

    (unsigned) variable

would convert it to unsigned long. It does not. It converts to
unsigned int. This mostly causes problems on 64-bit platforms, where
long and int are not the same size.

Second, if you write a function definition with no return type at
all:

    operate(a, b)
        int a, b;
    {
      ...
    }

that function is expected to return int, *not* void. GCC will warn
about this. K+R C has no problem with 'void' as a return type, so you
need not worry about that.

Implicit function declarations always have return type int. So if you
correct the above definition to

    void
    operate(a, b)
        int a, b;
    ...

but operate() is called above its definition, you will get an error
about a "type mismatch with previous implicit declaration". The cure
is to prototype all functions at the top of the file, or in an
appropriate header.


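The first trap can be seen in a short sketch. The function names are
invented and the definitions are ISO C for brevity.

```c
/* The cast goes through unsigned int, so on a host where long is
   wider than int the upper bits of VARIABLE are lost.  */
unsigned long
widen_wrong (long variable)
{
  return (unsigned) variable;
}

/* Naming the full type keeps every bit.  */
unsigned long
widen_right (long variable)
{
  return (unsigned long) variable;
}
```

The two functions agree only where int and long happen to have the
same size.
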
Char vs unsigned char vs int
----------------------------

In C, unqualified 'char' may be either signed or unsigned; it is the
implementation's choice. When you are processing 7-bit ASCII, it does
not matter. But when your program must handle arbitrary binary data,
or fully 8-bit character sets, you have a problem. The most obvious
issue is if you have a look-up table indexed by characters.

For instance, the character '\341' in ISO Latin 1 is SMALL LETTER A
WITH ACUTE ACCENT. In the proper locale, isalpha('\341') will be
true. But if you read '\341' from a file and store it in a plain
char, isalpha(c) may look up character 225, or it may look up
character -31. And the ctype table has no entry at offset -31, so
your program will crash. (If you're lucky.)

It is wise to use unsigned char everywhere you possibly can. This
avoids all these problems. Unfortunately, the routines in <string.h>
take plain char arguments, so you have to remember to cast them back
and forth - or avoid the use of strxxx() functions, which is probably
a good idea anyway.

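When the data does arrive as plain char, for example from a <string.h>
style interface, a cast at the point of use keeps the ctype lookup in
range. A sketch with a hypothetical function name, in ISO C for
brevity:

```c
#include <ctype.h>

/* Count the alphabetic characters in S.  Each plain char is converted
   to unsigned char first, so isalpha never sees a negative value.  */
int
count_alpha (const char *s)
{
  int n = 0;

  for (; *s; s++)
    if (isalpha ((unsigned char) *s))
      n++;
  return n;
}
```
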
Another common mistake is to use either char or unsigned char to
receive the result of getc() or related stdio functions. They may
return EOF, which is outside the range of values representable by
char. If you use char, some legal character value may be confused
with EOF, such as '\377' (SMALL LETTER Y WITH UMLAUT, in Latin-1).
If you use unsigned char, EOF is converted to a legal character value
and the end of input is never noticed. The correct choice is int.

A more subtle version of the same mistake might look like this:

    unsigned char pushback[NPUSHBACK];
    int pbidx;
    #define unget(c) (assert(pbidx < NPUSHBACK), pushback[pbidx++] = (c))
    #define get(c) (pbidx ? pushback[--pbidx] : getchar())
    ...
    unget(EOF);

which will mysteriously turn a pushed-back EOF into a SMALL LETTER Y
WITH UMLAUT.


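A sketch of one way to repair the push-back buffer is to store int
rather than unsigned char, so that EOF survives the round trip. The
value of NPUSHBACK is an arbitrary choice for the example, and get is
written here without its unused parameter.

```c
#include <assert.h>
#include <stdio.h>

#define NPUSHBACK 8   /* arbitrary size for this sketch */

/* int elements can hold every character value and EOF as well.  */
static int pushback[NPUSHBACK];
static int pbidx;

#define unget(c) (assert (pbidx < NPUSHBACK), pushback[pbidx++] = (c))
#define get()    (pbidx ? pushback[--pbidx] : getchar ())
```
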
Other common pitfalls
---------------------

o Expecting 'plain' char to be either sign extending or unsigned
  extending.

o Shifting an item by a negative amount or by greater than or equal to
  the number of bits in a type (expecting shifts by 32 to be sensible
  has caused quite a number of bugs at least in the early days).

o Expecting ints shifted right to be sign extended.

o Modifying the same value twice between two sequence points.

o Host vs. target floating point representation, including emitting NaNs
  and Infinities in a form that the assembler handles.

o qsort being an unstable sort function (unstable in the sense that
  multiple items that sort the same may be sorted in different orders
  by different qsort functions).

o Passing incorrect types to fprintf and friends.

o Adding a declaration for a function defined in another file to
  a .c file instead of to a .h file.
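For the qsort item, one common remedy is to make the comparison total
by breaking ties on the original position, so every qsort
implementation produces the same order. This is a sketch only; the
struct layout and names are hypothetical and the definitions are ISO C
for brevity.

```c
#include <stdlib.h>

struct item
{
  int key;    /* what we really sort on */
  char tag;   /* payload, to tell equal keys apart */
  int seq;    /* original position, used only to break ties */
};

static int
compare_items (const void *a, const void *b)
{
  const struct item *x = a;
  const struct item *y = b;

  if (x->key != y->key)
    return x->key < y->key ? -1 : 1;
  /* Equal keys: fall back to the original order.  */
  return (x->seq > y->seq) - (x->seq < y->seq);
}

/* Sort ITEMS by key, keeping equal keys in their input order.  */
void
sort_items (struct item *items, size_t n)
{
  size_t i;

  for (i = 0; i < n; i++)
    items[i].seq = (int) i;
  qsort (items, n, sizeof items[0], compare_items);
}
```
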