As proposed at <https://gcc.gnu.org/ml/gcc/2014-11/msg00014.html>,
this patch enables -fextended-identifiers by default for all standard
versions including this feature (all C++ versions, C99 and above for
C, but not C90 / C94 / gnu89 / preprocessing assembler). It adds a
couple of tests for areas where I previously noted testsuite coverage
for extended identifiers was lacking, removes -fextended-identifiers
from existing tests, adds -g to various such tests to verify that
extended identifiers don't break debug info generation and removes the
test that was only there to verify that the feature was off by
default.
The current state of the feature may not correspond exactly to any
particular checklist from 2004/5 (see bug 9449) of what was wanted
before enabling the feature by default, but I don't think it's any
worse than plenty of other features supported by default before every
corner case is fully functional, and think problems can readily be
fixed incrementally.
The following aspects of extended identifiers could still do with more
work (and should be straightforward):
* C -aux-info (output should use UCNs).
* ObjC -gen-decls (output should use UCNs; associated diagnostics from
the ObjC front end should use extended characters or UCNs as
appropriate to the locale, via using %qE or identifier_to_locale).
* Use DW_AT_use_UTF8 in DWARF-3 debug info for compilation units built
with extended identifiers enabled (or unconditionally).
* cpplib diagnostics (outputting characters or UCNs as appropriate
depending on the locale, as done for identifiers in non-cpplib
diagnostics).
* C++ test for UCN linking with C and extern "C".
* Check GDB support / file issues for support if needed.
* Actual UTF-8 in identifiers (?). (Be careful about not affecting
performance for the normal fast path of lexing identifiers, if
possible.)
The following may be trickier:
* cpplib spelling preservation (required to diagnose macro
redefinition with different spellings of the same identifier in the
definition or argument names; different spellings of the name of the
macro itself are OK, however; also required for correct handling of
multiple stringizing in C++); correct output for -d (UCNs), DWARF
debug info for macros (UCNs), PCH and PCH tests. (Spelling
preservation is the issue that needs fixing to remove references to
corner cases in the documentation of -std=c99 and -std=c11 and in
c99status.html.) The idea would be to add a second pointer to
cpp_identifier that stores the original spelling (whether for
extended identifiers only, or for all identifiers); this does not
enlarge cpp_token because the resulting larger cpp_identifier
structure is no bigger than cpp_string.
* C++ translation of extended characters (including $@` and various
control characters) to UCNs in phase 1 (note diagnostics thus
needed, but not for C++11, for control characters in strings /
character constants as those UCNs invalid); a likely implementation
approach is to do translation when identifiers / strings / character
constants are lexed, together with errors for stray $@` / control
characters in program as not being valid UCNs in identifiers ($ only
if not accepted in identifiers); note that this translation should
not take place inside raw string literals.
Bootstrapped with no regressions on x86_64-unknown-linux-gnu.
libcpp:
PR preprocessor/9449
* init.c (lang_defaults): Enable extended identifiers for C++ and
C99-based standards.
gcc:
PR preprocessor/9449
* doc/cpp.texi (Character sets, Tokenization)
(Implementation-defined behavior): Don't refer to UCNs in
identifiers requiring -fextended-identifiers.
* doc/cppopts.texi (-fextended-identifiers): Document as enabled
by default for C99 and later and C++.
* doc/invoke.texi (-std=c99, -std=c11): Don't refer to extended
identifiers needing -fextended-identifiers.
gcc/testsuite:
PR preprocessor/9449
* lib/target-supports.exp (check_effective_target_ucn_nocache):
Don't use -fextended-identifiers.
* c-c++-common/cpp/normalize-3.c, c-c++-common/cpp/ucnid-2011-1.c,
g++.dg/cpp/ucn-1.C, g++.dg/cpp/ucnid-1.C, g++.dg/other/ucnid-1.C,
gcc.dg/cpp/normalize-1.c, gcc.dg/cpp/normalize-2.c,
gcc.dg/cpp/normalize-4.c: Don't use -fextended-identifiers.
* gcc.dg/cpp/ucnid-1.c: Don't use -fextended-identifiers. Use
-g3.
* gcc.dg/cpp/ucnid-10.c, gcc.dg/cpp/ucnid-2.c,
gcc.dg/cpp/ucnid-3.c, gcc.dg/cpp/ucnid-4.c, gcc.dg/cpp/ucnid-5.c,
gcc.dg/cpp/ucnid-7.c, gcc.dg/cpp/ucnid-9.c,
gcc.dg/cpp/warn-normalized-1.c, gcc.dg/cpp/warn-normalized-2.c,
gcc.dg/cpp/warn-normalized-3.c: Don't use -fextended-identifiers.
* gcc.dg/ucnid-1.c, gcc.dg/ucnid-2.c, gcc.dg/ucnid-3.c,
gcc.dg/ucnid-4.c, gcc.dg/ucnid-5.c, gcc.dg/ucnid-6.c: Don't use
-fextended-identifiers. Use -g.
* gcc.dg/ucnid-7.c, gcc.dg/ucnid-8.c: Don't use
-fextended-identifiers.
* gcc.dg/ucnid-9.c: Don't use -fextended-identifiers. Use -g.
* gcc.dg/ucnid-10.c: Don't use -fextended-identifiers.
* gcc.dg/ucnid-11.c, gcc.dg/ucnid-12.c: Don't use
-fextended-identifiers. Use -g.
* gcc.dg/ucnid-13.c: Don't use -fextended-identifiers.
* gcc.dg/cpp/ucnid-8.c: Remove test.
* gcc.dg/cpp/ucnid-10.c, gcc.dg/ucnid-14.c: New tests.
From-SVN: r217144
2014-10-03 Bill Schmidt <wschmidt@linux.vnet.ibm.com>
* lex.c (search_line_fast): Add new version to be used for Power8
and later targets when Altivec is enabled. Restrict the existing
Altivec version to big-endian systems so that lvsr is not used on
little endian, where it is deprecated. Remove LE-specific code
from the now-BE-only version.
From-SVN: r215873
2014-10-02 Bernd Edlinger <bernd.edlinger@hotmail.de>
Jeff Law <law@redhat.com>
* charset.c (convert_no_conversion): Reallocate memory with 25%
headroom.
Co-Authored-By: Jeff Law <law@redhat.com>
From-SVN: r215785
* charset.c (conversion): Rename to ...
(cpp_conversion): ... this one; update.
* files.c (file_hash_entry): Rename to ...
(cpp_file_hash_entry): ... this one ; update.
From-SVN: r215482
gcc/ChangeLog:
2014-09-04 Manuel López-Ibáñez <manu@gcc.gnu.org>
* doc/options.texi: Document that Var and Init are required if CPP
is given.
* optc-gen.awk: Require Var and Init if CPP is given.
* common.opt (Wpedantic): Use Init.
libcpp/ChangeLog:
2014-09-04 Manuel López-Ibáñez <manu@gcc.gnu.org>
* macro.c (replace_args): Use cpp_pedwarning, cpp_warning and
CPP_W flags.
* include/cpplib.h: Add CPP_W_C90_C99_COMPAT and CPP_W_PEDANTIC.
* init.c (cpp_create_reader): Do not init to -1 here.
* expr.c (num_binary_op): Use cpp_pedwarning.
gcc/c-family/ChangeLog:
2014-09-04 Manuel López-Ibáñez <manu@gcc.gnu.org>
* c.opt (Wc90-c99-compat,Wc++-compat,Wcomment,Wendif-labels,
Winvalid-pch,Wlong-long,Wmissing-include-dirs,Wmultichar,Wpedantic,
(Wdate-time,Wtraditional,Wundef,Wvariadic-macros): Add CPP, Var
and Init.
* c-opts.c (c_common_handle_option): Do not handle here.
(sanitize_cpp_opts): Likewise.
* c-common.c (struct reason_option_codes_t): Handle
CPP_W_C90_C99_COMPAT and CPP_W_PEDANTIC.
gcc/testsuite/ChangeLog:
2014-09-04 Manuel López-Ibáñez <manu@gcc.gnu.org>
* gcc.dg/cpp/endif-pedantic2.c: More general options do not
override specific ones, but specific ones do.
From-SVN: r214904
libcpp/ChangeLog:
2014-08-29 Manuel López-Ibáñez <manu@gcc.gnu.org>
* macro.c (warn_of_redefinition): Suppress warnings for builtins
that lack the NODE_WARN flag, unless Wbuiltin-macro-redefined.
(_cpp_create_definition): Use Wbuiltin-macro-redefined for
builtins that lack the NODE_WARN flag.
* directives.c (do_undef): Likewise.
* init.c (cpp_init_special_builtins): Do not change flags
depending on Wbuiltin-macro-redefined.
gcc/c-family/ChangeLog:
2014-08-29 Manuel López-Ibáñez <manu@gcc.gnu.org>
* c.opt (Wbuiltin-macro-redefined): Use CPP, Var and Init.
* c-opts.c (c_common_handle_option): Do not handle here.
From-SVN: r214730
libcpp/
2014-08-27 Edward Smith-Rowland <3dw4rd@verizon.net>
PR cpp/23827 - standard C++ should not have hex float preprocessor
tokens
* libcpp/init.c (lang_flags): Change CXX98 flag for extended numbers
from 1 to 0.
* libcpp/expr.c (cpp_classify_number): Weite error message for improper
use of hex floating literal.
gcc/testsuite/
2014-08-27 Edward Smith-Rowland <3dw4rd@verizon.net>
PR cpp/23827 - standard C++ should not have hex float preprocessor
tokens
* g++.dg/cpp/pr23827_cxx11.C: New.
* g++.dg/cpp/pr23827_cxx98.C: New.
* g++.dg/cpp/pr23827_cxx98_neg.C: New.
* gcc.dg/cpp/pr23827_c90.c: New.
* gcc.dg/cpp/pr23827_c90_neg.c: New.
* gcc.dg/cpp/pr23827_c99.c: New.
From-SVN: r214616
PR c/61861
* macro.c (builtin_macro): Add location parameter. Set
location of builtin macro to the expansion point.
(enter_macro_context): Pass location to builtin_macro.
* gcc.dg/pr61861.c: New test.
From-SVN: r213102
When a built-in macro is expanded, the location of the token in the
epansion list is the location of the expansion point of the built-in
macro.
This patch creates a virtual location for that token instead,
effectively tracking locations of tokens resulting from built-in macro
tokens.
libcpp/
* include/line-map.h (line_maps::builtin_location): New data
member.
(line_map_init): Add a new parameter to initialize the new
line_maps::builtin_location data member.
* line-map.c (linemap_init): Initialize the
line_maps::builtin_location data member.
* macro.c (builtin_macro): Create a macro map and track the token
resulting from the expansion of a built-in macro.
gcc/
* input.h (is_location_from_builtin_token): New function
declaration.
* input.c (is_location_from_builtin_token): New function
definition.
* toplev.c (general_init): Tell libcpp what the pre-defined
spelling location for built-in tokens is.
Signed-off-by: Dodji Seketeli <dodji@redhat.com>
From-SVN: r212637
2014-07-10 Edward Smith-Rowland <3dw4rd@verizon.net>
Jonathan Wakely <jwakely@redhat.com>
PR CPP/61389
* macro.c (_cpp_arguments_ok, parse_params, create_iso_definition):
Warning messages mention C++11 in c++ mode and C99 in c mode.
* lex.c (lex_identifier_intern, lex_identifier): Ditto
Co-Authored-By: Jonathan Wakely <jwakely@redhat.com>
From-SVN: r212441
libcpp/
2014-07-09 Edward Smith-Rowland <3dw4rd@verizon.net>
PR c++/58155 - -Wliteral-suffix warns about tokens which are skipped
by preprocessor
* lex.c (lex_raw_string ()): Do not warn about invalid suffix
if skipping. (lex_string ()): Ditto.
gcc/testsuite/
2014-07-09 Edward Smith-Rowland <3dw4rd@verizon.net>
PR c++/58155 - -Wliteral-suffix warns about tokens which are skipped
g++.dg/cpp0x/pr58155.C: New.
From-SVN: r212392
PR c++/61038
I was asked to combine the escape logic for regular chars and strings
with the escape logic for user-defined literals chars and strings.
I just forgot the first time.
I forgot the ChangeLog!
From-SVN: r211267
PR c++/61038
I was asked to combine the escape logic for regular chars and strings
with the escape logic for user-defined literals chars and strings.
I just forgot the first time.
From-SVN: r211266
2014-05-26 Richard Biener <rguenther@suse.de>
libcpp/
* configure.ac: Remove long long and __int64 type checks,
add check for uint64_t and fail if that wasn't found.
* include/cpplib.h (cpp_num_part): Use uint64_t.
* config.in: Regenerate.
* configure: Likewise.
gcc/
* configure.ac: Drop __int64 type check. Insist that we
found uint64_t and int64_t.
* hwint.h (HOST_BITS_PER___INT64): Remove.
(HOST_BITS_PER_WIDE_INT): Define to 64 and remove
__int64 case.
(HOST_WIDE_INT_PRINT_*): Remove 32bit case.
(HOST_WIDEST_INT*): Define to HOST_WIDE_INT*.
(HOST_WIDEST_FAST_INT): Remove __int64 case.
* vmsdbg.h (struct _DST_SRC_COMMAND): Use int64_t
for dst_q_src_df_rms_cdt.
* configure: Regenerate.
* config.in: Likewise.
From-SVN: r210928
2014-05-20 Richard Biener <rguenther@suse.de>
gcc/
* config.gcc: Remove need_64bit_hwint.
* configure.ac: Do not define NEED_64BIT_HOST_WIDE_INT.
* hwint.h: Do not check NEED_64BIT_HOST_WIDE_INT but assume
it to be true.
* config.in: Regenerate.
* configure: Likewise.
libcpp/
* configure.ac: Copy gcc logic of detecting a 64bit type.
Remove HOST_WIDE_INT define.
* include/cpplib.h: typedef cpp_num_part to a 64bit type,
similar to how hwint.h does it.
* config.in: Regenerate.
* configure: Likewise.
From-SVN: r210632
2014-05-09 Joey Ye <joey.ye@arm.com>
* files.c (find_file_in_dir): Always try to shorten for DOS
non-system headers.
* init.c (ENABLE_CANONICAL_SYSTEM_HEADERS): Default enabled for DOS.
From-SVN: r210264
2014-05-07 Richard Biener <rguenther@suse.de>
libcpp/
* configure.ac: Always set need_64bit_hwint to yes.
* configure: Regenerated.
* config.gcc: Always set need_64bit_hwint to yes.
From-SVN: r210149
PR preprocessor/58844
* macro.c (enter_macro_context): Only push
macro_real_token_count (macro) tokens rather than
macro->count tokens, regardless of
CPP_OPTION (pfile, track-macro-expansion).
* c-c++-common/cpp/pr58844-1.c: New test.
* c-c++-common/cpp/pr58844-2.c: New test.
From-SVN: r207871
In this problem report, the compiler is fed a (bogus) translation unit
in which some literals contain bytes whose value is zero. The
preprocessor detects that and proceeds to emit diagnostics for that
king of bogus literals. But then when the diagnostics machinery
re-reads the input file again to display the bogus literals with a
caret, it attempts to calculate the length of each of the lines it got
using fgets. The line length calculation is done using strlen. But
that doesn't work well when the content of the line can have several
zero bytes. The result is that the read_line never sees the end of
the line because strlen repeatedly reports that the line ends before
the end-of-line character; so read_line thinks its buffer for reading
the line is too small; it thus increases the buffer, leading to a huge
memory consumption and disaster.
Here is what this patch does.
location_get_source_line is modified to return the length of a source
line that can now contain bytes with zero value.
diagnostic_show_locus() is then modified to consider that a line can
have characters of value zero, and so just shows a white space when
instructed to display one of these characters.
Additionally location_get_source_line is modified to avoid re-reading
each and every line from the beginning of the file until it reaches
the line number N that it is instructed to get; this was leading to
annoying quadratic behaviour when reading adjacent lines near the end
of (big) files. So a cache is now associated to the file opened in
text mode. When the content of the file is read, that content is
stashed in the file cache. That file cache is searched for line
delimiters. A number of line positions are saved in the cache and a
number of file caches are kept in memory. That way when
location_get_source_line is asked to read line N + 1, it just has to
start reading from line N that it has already read.
libcpp/ChangeLog:
* include/line-map.h (linemap_get_file_highest_location): Declare
new function.
* line-map.c (linemap_get_file_highest_location): Define it.
gcc/ChangeLog:
* input.h (location_get_source_line): Take an additional line_size
parameter.
(void diagnostics_file_cache_fini): Declare new function.
* input.c (struct fcache): New type.
(fcache_tab_size, fcache_buffer_size, fcache_line_record_size):
New static constants.
(diagnostic_file_cache_init, total_lines_num)
(lookup_file_in_cache_tab, evicted_cache_tab_entry)
(add_file_to_cache_tab, lookup_or_add_file_to_cache_tab)
(needs_read, needs_grow, maybe_grow, read_data, maybe_read_data)
(get_next_line, read_next_line, goto_next_line, read_line_num):
New static function definitions.
(diagnostic_file_cache_fini): New function.
(location_get_source_line): Take an additional output line_len
parameter. Re-write using lookup_or_add_file_to_cache_tab and
read_line_num.
* diagnostic.c (diagnostic_finish): Call
diagnostic_file_cache_fini.
(adjust_line): Take an additional input parameter for the length
of the line, rather than calculating it with strlen.
(diagnostic_show_locus): Adjust the use of
location_get_source_line and adjust_line with respect to their new
signature. While displaying a line now, do not stop at the first
null byte. Rather, display the zero byte as a space and keep
going until we reach the size of the line.
* Makefile.in: Add vec.o to OBJS-libcommon
gcc/testsuite/ChangeLog:
* c-c++-common/cpp/warning-zero-in-literals-1.c: New test file.
Signed-off-by: Dodji Seketeli <dodji@seketeli.org>
From-SVN: r206957
PR preprocessor/55715
libcpp:
* expr.c (num_binary_op): Implement subtraction directly rather
than with negation and falling through into addition case.
gcc/testsuite:
* gcc.dg/cpp/expr-overflow-1.c: New test.
From-SVN: r205846
gcc/testsuite:
* c-c++-common/cpp/ucnid-2011-1.c: New test.
libcpp:
* ucnid.tab: Add C11 and C11NOSTART data.
* makeucnid.c (digit): Rename enum value to N99.
(C11, N11, all_languages): New enum values.
(NUM_CODE_POINTS, MAX_CODE_POINT): New macros.
(flags, decomp, combining_value): Use NUM_CODE_POINTS as array
size.
(decomp): Use unsigned int as element type.
(all_decomp): New array.
(read_ucnid): Handle C11 and C11NOSTART. Use MAX_CODE_POINT.
(read_table): Use MAX_CODE_POINT. Store all decompositions in
all_decomp.
(read_derived): Use MAX_CODE_POINT.
(write_table): Use NUM_CODE_POINTS. Print N99, C11 and N11
flags. Print whole array variable declaration rather than just
array contents.
(char_id_valid, write_context_switch): New functions.
(main): Call write_context_switch.
* ucnid.h: Regenerate.
* include/cpplib.h (struct cpp_options): Add c11_identifiers.
* init.c (struct lang_flags): Add c11_identifiers.
(cpp_set_lang): Set c11_identifiers option from selected language.
* internal.h (struct normalize_state): Document "previous" as
previous starter character.
(NORMALIZE_STATE_UPDATE_IDNUM): Take character as argument.
* charset.c (DIG): Rename enum value to N99.
(C11, N11): New enum values.
(struct ucnrange): Give name to struct. Use short for flags and
unsigned int for end of range. Include ucnid.h for whole variable
declaration.
(ucn_valid_in_identifier): Allow for characters up to 0x10FFFF.
Allow for C11 in determining valid characters and valid start
characters. Use check_nfc for non-Hangul context-dependent
checks. Only store starter characters in nst->previous.
(_cpp_valid_ucn): Pass new argument to
NORMALIZE_STATE_UPDATE_IDNUM.
* lex.c (lex_identifier): Pass new argument to
NORMALIZE_STATE_UPDATE_IDNUM. Call NORMALIZE_STATE_UPDATE_IDNUM
after initial non-UCN part of identifier.
(lex_number): Pass new argument to NORMALIZE_STATE_UPDATE_IDNUM.
From-SVN: r204886