
Copyright (C) 2000-2015 Free Software Foundation, Inc.

This file is intended to contain a few notes about writing C code
within GCC so that it compiles without error on the full range of
compilers GCC needs to be able to compile on.

The problem is that many ISO-standard constructs are not accepted by
either old or buggy compilers, and we keep getting bitten by them.
Until now this knowledge has been scattered around, so I thought
I'd collect it in one useful place.  Please add new problems and
correct existing entries as you come across them.

I'm going to start from a base of the ISO C90 standard, since that is
probably what most people code to naturally.  Obviously using
constructs introduced after that is not a good idea.

For the complete coding style conventions used in GCC, please read
http://gcc.gnu.org/codingconventions.html


String literals
---------------

Irix6 "cc -n32" and OSF4 "cc" have problems with constant string
initializers that have parentheses around the string, e.g.

const char string[] = ("A string");

This is unfortunate since this is what the GNU gettext macro N_
produces.  You need to find a different way to code it.
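One possible workaround, assuming the object can be a pointer rather
than an array, is sketched below.  The N_ definition is a hypothetical
stand-in that wraps its argument in parentheses as described above; it
is not the real gettext header.  Note that sizeof on the pointer gives
the pointer size, not the length of the string.

```c
/* Hypothetical stand-in for a gettext-style N_ that wraps its argument
   in parentheses.  */
#define N_(msgid) (msgid)

/* Avoid: const char string[] = N_("A string");
   That expands to an array initialized from a parenthesized string.
   Initializing a pointer instead makes the parentheses part of an
   ordinary constant expression.  */
static const char *const string = N_("A string");
```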

Some compilers, like MSVC++, have fairly low limits on the maximum
length of a string literal; 509 is the lowest we've come across.  You
may need to break up a long printf statement into many smaller ones.
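A minimal sketch of splitting long output, using a hypothetical usage
message and a hypothetical function name print_usage.  509 characters
is also the minimum that C90 requires implementations to support, and
that limit applies after adjacent literals are concatenated, so
writing "abc" "def" does not help; each piece has to be passed
separately.

```c
#include <stdio.h>

/* Hypothetical help text, kept as separate short literals rather than
   one literal that might exceed a compiler's length limit.  */
static const char *const usage_pieces[] = {
  "Usage: prog [options] file...\n",
  "Options:\n",
  "  -h   Display this help\n",
  "  -v   Display version information\n",
};

/* Write each piece with its own call.  */
static void
print_usage (FILE *stream)
{
  size_t i;

  for (i = 0; i < sizeof usage_pieces / sizeof usage_pieces[0]; i++)
    fputs (usage_pieces[i], stream);
}
```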


Empty macro arguments
---------------------

ISO C (6.8.3 in the 1990 standard) specifies the following:

If (before argument substitution) any argument consists of no
preprocessing tokens, the behavior is undefined.

This was relaxed by ISO C99, but some older compilers emit an error,
so code like

#define foo(x, y) x y
foo (bar, )

needs to be coded in some other way.
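Two ways to avoid the empty argument are sketched below, using a
hypothetical DECLARE macro.  The separate macro is the most
conservative choice.  The placeholder works because the argument holds
one token before substitution, so it is not empty in the sense the
standard forbids; it is expanded away only afterwards.

```c
/* Hypothetical macro: a declaration with an optional qualifier.  */
#define DECLARE(qual, decl) qual decl

/* Non-portable under C90 (empty first argument):
     DECLARE (, int counter);  */

/* Option 1: a separate macro for the case without a qualifier.  */
#define DECLARE_PLAIN(decl) decl

/* Option 2: a placeholder that expands to nothing.  */
#define NO_QUAL

DECLARE (static, int counter_a = 1);
DECLARE_PLAIN (int counter_b = 2);
DECLARE (NO_QUAL, int counter_c = 3);
```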


Avoid unnecessary test before free
----------------------------------

Since SunOS 4 stopped being a reasonable portability target (which
happened around 2007), there has been no need to guard against
"free (NULL)".  Thus, any guard like the following constitutes a
redundant test:

  if (P)
    free (P);

It is better to avoid the test.[*]
Instead, simply free P, regardless of whether it is NULL.

[*] However, if your profiling exposes a test like this in a
performance-critical loop, say where P is nearly always NULL, and
the cost of calling free on a NULL pointer would be prohibitively
high, consider using __builtin_expect, like this:

  if (__builtin_expect (P != NULL, 0))
    free (P);
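For example, a cleanup routine can free each pointer unconditionally.
The struct and function names in this sketch are hypothetical.

```c
#include <stdlib.h>

/* Hypothetical structure whose buffers may or may not be allocated.  */
struct buffers
{
  char *name;
  char *data;
};

/* Release both buffers.  free (NULL) does nothing in ISO C, so neither
   call needs an "if (...)" guard.  */
static void
release_buffers (struct buffers *b)
{
  free (b->name);
  free (b->data);
  b->name = NULL;
  b->data = NULL;
}
```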



Trigraphs
---------

You weren't going to use them anyway, but some otherwise ISO C
compliant compilers do not accept trigraphs.
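Trigraphs can also appear by accident, for example "??!" inside a
string literal, which a compiler that processes trigraphs reads as
'|'.  A sketch of the usual defence, assuming you really want the
three characters: escape one of the question marks with \?, a C90
escape sequence that gives the same result whether or not trigraphs
are processed.  The variable name is hypothetical.

```c
/* Written unescaped, these characters would become the trigraph for '|'
   under a compiler that processes trigraphs.  The escaped form is the
   three characters '?', '?', '!' either way.  */
static const char question_marks[] = "?\?!";
```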


Suffixes on Integer Constants
-----------------------------

You should never use a 'l' suffix on integer constants ('L' is fine),
since it can easily be confused with the number '1'.
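For instance (the names are hypothetical):

```c
/* "101l" is easily misread as "1011"; upper-case suffixes are not.  */
static const long limit = 101L;
static const unsigned long mask = 0xFFUL;
```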


			Common Coding Pitfalls
			======================

errno
-----

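errno might be defined as a macro (for example, one expanding to a
per-thread lvalue), so never declare it yourself with
"extern int errno;".  Include <errno.h> instead.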
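A minimal sketch of using errno only through <errno.h>, assuming a
hypothetical helper parse_long built on strtol.  errno is cleared
before the call because library functions set it on error but do not
reset it on success.

```c
#include <errno.h>
#include <stdlib.h>

/* Parse a complete decimal long from S into *RESULT.  Return 1 on
   success, 0 if S is not a number or is out of range.  errno is used
   only via <errno.h>, never via "extern int errno;".  */
static int
parse_long (const char *s, long *result)
{
  char *end;

  errno = 0;
  *result = strtol (s, &end, 10);
  if (end == s || *end != '\0')
    return 0;   /* no digits, or trailing junk */
  if (errno == ERANGE)
    return 0;   /* does not fit in a long */
  return 1;
}
```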


Implicit int
------------

In C, the 'int' keyword can often be omitted from type declarations.
For instance, you can write

  unsigned variable;

as shorthand for

  unsigned int variable;

There are several places where this can cause trouble.  First, suppose
'variable' is a long; then you might think

  (unsigned) variable

would convert it to unsigned long.  It does not.  It converts to
unsigned int.  This mostly causes problems on 64-bit platforms, where
long and int are not the same size.
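If unsigned long is what you want, spell out the whole type.  A
trivial sketch, with a hypothetical helper name:

```c
/* A bare "(unsigned) v" means "(unsigned int) v" and can discard the
   high bits of a long where long is wider than int.  Naming the full
   type keeps the value's width.  */
static unsigned long
to_unsigned_long (long v)
{
  return (unsigned long) v;
}
```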

Second, if you write a function definition with no return type at
all:

  operate (int a, int b)
  {
    ...
  }

that function is expected to return int, *not* void.  GCC will warn
about this.

Implicit function declarations always have return type int.  So if you
correct the above definition to

  void
  operate (int a, int b)
  ...

but operate() is called above its definition, you will get an error
about a "type mismatch with previous implicit declaration".  The cure
is to prototype all functions at the top of the file, or in an
appropriate header.
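A sketch of the cure, assuming the function is local to one file (the
static storage class and the last_sum variable are illustrative only):

```c
/* Prototype first, so that calls before the definition see the real
   return type rather than an implicit int.  */
static void operate (int a, int b);

static int last_sum;   /* hypothetical state, for illustration */

static void
use_operate (void)
{
  operate (2, 3);      /* called above its definition: fine */
}

static void
operate (int a, int b)
{
  last_sum = a + b;
}
```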

Char vs unsigned char vs int
----------------------------

In C, unqualified 'char' may be either signed or unsigned; it is the
implementation's choice.  When you are processing 7-bit ASCII, it does
not matter.  But when your program must handle arbitrary binary data,
or fully 8-bit character sets, you have a problem.  The most obvious
issue is if you have a look-up table indexed by characters.

For instance, the character '\341' in ISO Latin 1 is SMALL LETTER A
WITH ACUTE ACCENT.  In the proper locale, isalpha('\341') will be
true.  But if you read '\341' from a file and store it in a plain
char, isalpha(c) may look up character 225, or it may look up
character -31.  And the ctype table has no entry at offset -31, so
your program will crash.  (If you're lucky.)

It is wise to use unsigned char everywhere you possibly can.  This
avoids all these problems.  Unfortunately, the routines in <string.h>
take pointers to plain char, so you have to remember to cast them back
and forth - or avoid the use of strxxx() functions, which is probably
a good idea anyway.
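Where a plain char string has to be passed to <ctype.h>, a common idiom
is to cast each character to unsigned char at the call.  The helper
below is hypothetical.  In the default "C" locale '\341' is not
alphabetic; the point of the cast is that isalpha always receives a
value in the range it accepts.

```c
#include <ctype.h>
#include <stddef.h>

/* Count alphabetic characters in S.  Each char is converted to
   unsigned char first, so a byte such as '\341' is passed as 225
   rather than -31 where plain char is signed.  */
static size_t
count_alpha (const char *s)
{
  size_t n = 0;

  for (; *s != '\0'; s++)
    if (isalpha ((unsigned char) *s))
      n++;
  return n;
}
```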

Another common mistake is to use either char or unsigned char to
receive the result of getc() or related stdio functions.  They return
an int that is either a character converted to unsigned char or EOF, a
negative value distinct from all of those, so no char type can hold
every possible result.  If you use char, some legal character value
may be confused with EOF, such as '\377' (SMALL LETTER Y WITH UMLAUT,
in Latin-1).  If you use unsigned char, EOF is never seen at all.
The correct choice is int.
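A minimal read loop that keeps the result in an int, with a
hypothetical function name.  On typical implementations where char is
signed, declaring c as char would stop early at a '\377' byte;
declaring it as unsigned char would loop forever at end of file.

```c
#include <stdio.h>

/* Count the bytes remaining in F.  C is an int so that every byte
   value and EOF stay distinct.  */
static long
count_bytes (FILE *f)
{
  long n = 0;
  int c;

  while ((c = getc (f)) != EOF)
    n++;
  return n;
}
```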

A more subtle version of the same mistake might look like this:

  unsigned char pushback[NPUSHBACK];
  int pbidx;
  #define unget(c) (assert(pbidx < NPUSHBACK), pushback[pbidx++] = (c))
  #define get(c) (pbidx ? pushback[--pbidx] : getchar())
  ...
  unget(EOF);

which will mysteriously turn a pushed-back EOF into a SMALL LETTER Y
WITH UMLAUT.
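One fix is to make the buffer hold int, the type getchar() returns.  A
sketch using functions in place of the macros, with hypothetical names
unget_char and get_char:

```c
#include <assert.h>
#include <stdio.h>

#define NPUSHBACK 4   /* hypothetical buffer size */

/* int, not unsigned char, so a pushed-back EOF comes back as EOF.  */
static int pushback[NPUSHBACK];
static int pbidx;

static void
unget_char (int c)
{
  assert (pbidx < NPUSHBACK);
  pushback[pbidx++] = c;
}

static int
get_char (void)
{
  return pbidx ? pushback[--pbidx] : getchar ();
}
```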


Other common pitfalls
---------------------

o Expecting 'plain' char to be either sign extending or zero extending
  when it is widened to int.

o Shifting an item by a negative amount or by greater than or equal to
  the number of bits in a type (expecting shifts by 32 to be sensible
  has caused quite a number of bugs at least in the early days).

o Expecting negative ints shifted right to be sign extended (the result
  is implementation-defined).

o Modifying the same object twice without an intervening sequence point.

o Host vs. target floating point representation, including emitting NaNs
  and Infinities in a form that the assembler handles.

o qsort being an unstable sort function (unstable in the sense that
  multiple items that sort the same may be sorted in different orders
  by different qsort functions).

o Passing incorrect types to fprintf and friends.

o Adding a declaration for a function defined in another file to a .c
  file instead of to a .h file.
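For the qsort item, one common remedy is to make the comparison
function a total order, so that no two distinct elements compare equal
and every qsort implementation produces the same result.  A sketch,
assuming each element records its original position in a hypothetical
seq field:

```c
#include <stdlib.h>

/* Hypothetical record: a sort key plus its original index.  */
struct item
{
  int key;
  int seq;
};

/* Compare by key, then by original position as a tie-breaker.  */
static int
compare_items (const void *pa, const void *pb)
{
  const struct item *a = (const struct item *) pa;
  const struct item *b = (const struct item *) pb;

  if (a->key != b->key)
    return a->key < b->key ? -1 : 1;
  if (a->seq != b->seq)
    return a->seq < b->seq ? -1 : 1;
  return 0;
}
```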