Copyright (C) 2000, 2003 Free Software Foundation, Inc.

This file is intended to contain a few notes about writing C code
within GCC so that it compiles without error on the full range of
compilers GCC needs to be able to compile on.

The problem is that many ISO-standard constructs are not accepted by
either old or buggy compilers, and we keep getting bitten by them.
Until now this knowledge has been thinly spread around, so I
thought I'd collect it in one useful place.  Please add to it and
correct any problems as you come across them.

I'm going to start from a base of the ISO C90 standard, since that is
probably what most people code to naturally.  Obviously using
constructs introduced after that is not a good idea.

For the complete coding style conventions used in GCC, please read
http://gcc.gnu.org/codingconventions.html


String literals
---------------

Irix6 "cc -n32" and OSF4 "cc" have problems with constant string
initializers that have parentheses around them, e.g.

const char string[] = ("A string");

This is unfortunate since this is what the GNU gettext macro N_
produces.  You need to find a different way to code it.
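
One possible way to code it, sketched here under the assumption that
the marker macro is under your control, is to have the macro expand
to its argument without parentheses.  MARK_ and greeting_length are
hypothetical names used only for illustration; MARK_ is a stand-in,
not the real gettext N_.

```c
#include <string.h>

/* Hypothetical marker macro: expands to its argument with no
   surrounding parentheses, so the array initializer stays a plain
   string literal.  */
#define MARK_(msgid) msgid

static const char greeting[] = MARK_("A string");

/* Length of the message, excluding the terminating NUL.  */
size_t
greeting_length (void)
{
  return strlen (greeting);
}
```

Whether this suits a given message catalog setup depends on how the
strings are extracted, so treat it as a starting point only.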

Some compilers like MSVC++ have fairly low limits on the maximum
length of a string literal; 509 characters, the minimum that ISO C90
requires a compiler to support, is the lowest we've come across.  You
may need to break up a long printf statement into many smaller ones.
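
As a rough sketch of breaking up a long message, the text can be
kept as a table of short literals and written piece by piece.  The
help text, help_lines and print_help are invented for this example.

```c
#include <stdio.h>

/* Hypothetical help text, kept as several short literals rather than
   one long one.  */
static const char *const help_lines[] = {
  "Usage: tool [options] file...\n",
  "  -o FILE   write output to FILE\n",
  "  -v        print progress messages\n"
};

/* Write each piece in turn.  Returns the number of pieces written,
   or -1 if a write fails.  */
int
print_help (FILE *stream)
{
  size_t i;
  size_t count = sizeof help_lines / sizeof help_lines[0];

  for (i = 0; i < count; i++)
    if (fputs (help_lines[i], stream) == EOF)
      return -1;
  return (int) count;
}
```

Each literal stays far below the 509-character figure, and new lines
can be added without touching the loop.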


Empty macro arguments
---------------------

ISO C (6.8.3 in the 1990 standard) specifies the following:

If (before argument substitution) any argument consists of no
preprocessing tokens, the behavior is undefined.

This was relaxed by ISO C99, but some older compilers emit an error,
so code like

#define foo(x, y) x y
foo (bar, )

needs to be coded in some other way.
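
One such other way, shown as a sketch only, is to pass a placeholder
macro that expands to nothing, so that the argument is not empty
before substitution.  NO_QUALIFIER, DEFINE_COUNTER and bump_counters
are hypothetical names, and this has not been checked against any
particular old compiler.

```c
/* Hypothetical placeholder that expands to no tokens.  The argument
   passed to the macro is then the token NO_QUALIFIER, which is not
   empty before argument substitution.  */
#define NO_QUALIFIER

/* Hypothetical macro whose first argument is sometimes not needed.  */
#define DEFINE_COUNTER(qualifier, name) qualifier int name = 0

DEFINE_COUNTER (static, private_count);
DEFINE_COUNTER (NO_QUALIFIER, shared_count);

/* Increment both counters and return their sum.  */
int
bump_counters (void)
{
  return ++private_count + ++shared_count;
}
```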


free and realloc
----------------

Some implementations crash upon attempts to free or realloc the null
pointer.  Thus if mem might be null, you need to write

  if (mem)
    free (mem);
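
The same guard can be applied to realloc.  The following is a
minimal sketch; checked_free and checked_realloc are hypothetical
helper names, not functions provided by GCC or the C library.

```c
#include <stdlib.h>

/* Hypothetical wrapper: release 'mem' only if it is non-null.  */
void
checked_free (void *mem)
{
  if (mem)
    free (mem);
}

/* Hypothetical wrapper: resize 'mem', falling back to malloc when
   there is no existing block to resize.  */
void *
checked_realloc (void *mem, size_t size)
{
  return mem ? realloc (mem, size) : malloc (size);
}
```

Routing all calls through wrappers like these keeps the null check
in one place instead of at every call site.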


Trigraphs
---------

You weren't going to use them anyway, but some otherwise ISO C
compliant compilers do not accept trigraphs.


Suffixes on Integer Constants
-----------------------------

You should never use a 'l' suffix on integer constants ('L' is fine),
since it can easily be confused with the number '1'.


			Common Coding Pitfalls
			======================

errno
-----

errno might be declared as a macro, so never declare it yourself
(for example with "extern int errno;"); include <errno.h> instead.
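
A small sketch of code that makes no assumption about how errno is
declared follows.  parse_long is a hypothetical helper written for
this example.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: parse a decimal long from 'text'.  Returns 1
   and stores the value in '*out' on success, 0 on failure.  */
int
parse_long (const char *text, long *out)
{
  char *end;
  long value;

  errno = 0;  /* Works whether errno is a variable or a macro.  */
  value = strtol (text, &end, 10);
  if (errno != 0 || end == text)
    return 0;
  *out = value;
  return 1;
}
```

The only declaration of errno comes from <errno.h>; the file never
writes one of its own.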


Implicit int
------------

In C, the 'int' keyword can often be omitted from type declarations.
For instance, you can write

  unsigned variable;

as shorthand for

  unsigned int variable;

There are several places where this can cause trouble.  First, suppose
'variable' is a long; then you might think

  (unsigned) variable

would convert it to unsigned long.  It does not.  It converts to
unsigned int.  This mostly causes problems on 64-bit platforms, where
long and int are not the same size.
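
The difference between the two casts can be sketched as follows.
The function names are hypothetical and exist only to make the two
conversions comparable.

```c
#include <limits.h>

/* The cast names 'unsigned int', so the value is reduced to that
   type first and only then widened for the return.  */
unsigned long
narrow_then_widen (long value)
{
  return (unsigned) value;
}

/* Spelling out the full type converts directly to unsigned long.  */
unsigned long
convert_directly (long value)
{
  return (unsigned long) value;
}

/* Nonzero if the two conversions can give different results on this
   platform, i.e. if long is wider than int.  */
int
casts_can_differ (void)
{
  return ULONG_MAX != UINT_MAX;
}
```

For an argument of -1L the first function yields UINT_MAX and the
second ULONG_MAX, which are equal only where int and long have the
same width.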

Second, if you write a function definition with no return type at
all:

  operate (int a, int b)
  {
    ...
  }

that function is expected to return int, *not* void.  GCC will warn
about this.

Implicit function declarations always have return type int.  So if you
correct the above definition to

  void
  operate (int a, int b)
  ...

but operate() is called above its definition, you will get an error
about a "type mismatch with previous implicit declaration".  The cure
is to prototype all functions at the top of the file, or in an
appropriate header.
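
A minimal sketch of the cure within a single file follows.
last_result and run_operate are hypothetical names added so that the
call has an observable effect.

```c
/* Prototype placed ahead of the first call, so the compiler never
   has to invent an implicit 'int operate ()' declaration.  */
static void operate (int a, int b);

static int last_result;

/* Calls operate before its definition appears in the file.  */
int
run_operate (int a, int b)
{
  operate (a, b);
  return last_result;
}

static void
operate (int a, int b)
{
  last_result = a + b;
}
```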


Char vs unsigned char vs int
----------------------------

In C, unqualified 'char' may be either signed or unsigned; it is the
implementation's choice.  When you are processing 7-bit ASCII, it does
not matter.  But when your program must handle arbitrary binary data,
or fully 8-bit character sets, you have a problem.  The most obvious
issue is if you have a look-up table indexed by characters.

For instance, the character '\341' in ISO Latin 1 is SMALL LETTER A
WITH ACUTE ACCENT.  In the proper locale, isalpha('\341') will be
true.  But if you read '\341' from a file and store it in a plain
char, isalpha(c) may look up character 225, or it may look up
character -31.  And the ctype table has no entry at offset -31, so
your program will crash.  (If you're lucky.)

It is wise to use unsigned char everywhere you possibly can.  This
avoids all these problems.  Unfortunately, the routines in <string.h>
take plain char arguments, so you have to remember to cast them back
and forth - or avoid the use of strxxx() functions, which is probably
a good idea anyway.
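
Where plain char cannot be avoided, for example when walking a
string handed over as 'const char *', a cast at the point of the
ctype call is one way to stay safe.  count_alpha is a hypothetical
function used only to show the cast.

```c
#include <ctype.h>
#include <stddef.h>

/* Count the alphabetic characters in a NUL-terminated string.  The
   cast keeps the argument in the range 0..UCHAR_MAX even where plain
   char is signed.  */
size_t
count_alpha (const char *text)
{
  size_t count = 0;

  for (; *text != '\0'; text++)
    if (isalpha ((unsigned char) *text))
      count++;
  return count;
}
```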

Another common mistake is to use either char or unsigned char to
receive the result of getc() or related stdio functions.  They may
return EOF, which is outside the range of values representable by
char.  If you use char, some legal character value may be confused
with EOF, such as '\377' (SMALL LETTER Y WITH UMLAUT, in Latin-1).
If you use unsigned char, on typical platforms the result never
compares equal to EOF, so the end of input goes unnoticed.
The correct choice is int.
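
A short sketch of a read loop that keeps the result in an int
follows.  count_bytes is a hypothetical function name.

```c
#include <stdio.h>

/* Count the bytes remaining on 'fp'.  Holding the result of getc in
   an int keeps EOF distinct from every byte value, '\377'
   included.  */
long
count_bytes (FILE *fp)
{
  int c;
  long count = 0;

  while ((c = getc (fp)) != EOF)
    count++;
  return count;
}
```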

A more subtle version of the same mistake might look like this:

  unsigned char pushback[NPUSHBACK];
  int pbidx;
  #define unget(c) (assert(pbidx < NPUSHBACK), pushback[pbidx++] = (c))
  #define get(c) (pbidx ? pushback[--pbidx] : getchar())
  ...
  unget(EOF);

which will mysteriously turn a pushed-back EOF into a SMALL LETTER Y
WITH UMLAUT.
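
One way to repair it, sketched under the assumption that the extra
storage is acceptable, is to make the pushback buffer an array of
int.  The value chosen for NPUSHBACK and the function names
unget_char and get_char are assumptions made for this example.

```c
#include <assert.h>
#include <stdio.h>

#define NPUSHBACK 4  /* Assumed capacity, for illustration only.  */

/* Elements are int, so EOF is stored without being truncated.  */
static int pushback[NPUSHBACK];
static int pbidx;

/* Push one value, which may be EOF, back onto the input.  */
void
unget_char (int c)
{
  assert (pbidx < NPUSHBACK);
  pushback[pbidx++] = c;
}

/* Return the most recently pushed-back value if there is one,
   otherwise read from standard input.  */
int
get_char (void)
{
  return pbidx ? pushback[--pbidx] : getchar ();
}
```

With this layout a pushed-back EOF comes out of get_char unchanged.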


Other common pitfalls
---------------------

o Expecting 'plain' char to be either sign-extending or zero-extending.

o Shifting an item by a negative amount or by greater than or equal to
  the number of bits in a type (expecting shifts by 32 to be sensible
  has caused quite a number of bugs at least in the early days).

o Expecting ints shifted right to be sign extended.

o Modifying the same value twice between two sequence points.

o Host vs. target floating point representation, including emitting NaNs
  and Infinities in a form that the assembler handles.

o qsort being an unstable sort function (unstable in the sense that
  multiple items that sort the same may be sorted in different orders
  by different qsort functions).

o Passing incorrect types to fprintf and friends.

o Adding a declaration for a function defined in another file to a .c
  file instead of to a .h file.
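
For the qsort item, one common remedy is to give the comparison
function a tie-breaker so that no two distinct elements ever compare
equal.  The sketch below records each element's original position
for that purpose; struct item, compare_items and sort_items are
hypothetical names.

```c
#include <stdlib.h>

/* Hypothetical record: 'key' is what we sort on, 'seq' records the
   original position and is used only to break ties.  */
struct item
{
  int key;
  const char *name;
  int seq;
};

/* Comparison function that never reports two distinct elements as
   equal, so every qsort implementation produces the same order.  */
static int
compare_items (const void *pa, const void *pb)
{
  const struct item *a = (const struct item *) pa;
  const struct item *b = (const struct item *) pb;

  if (a->key != b->key)
    return a->key < b->key ? -1 : 1;
  if (a->seq != b->seq)
    return a->seq < b->seq ? -1 : 1;
  return 0;
}

/* Sort 'count' items by key, keeping equal keys in their original
   relative order.  */
void
sort_items (struct item *items, size_t count)
{
  size_t i;

  for (i = 0; i < count; i++)
    items[i].seq = (int) i;
  qsort (items, count, sizeof items[0], compare_items);
}
```

The extra field costs a little space, but the output no longer
depends on which qsort the host library happens to provide.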