gcc/libcpp
Jakub Jelinek c4d6dcacfc libcpp: Implement C++23 P1949R7 - C++ Identifier Syntax using Unicode Standard Annex 31
The following patch implements the
P1949R7 - C++ Identifier Syntax using Unicode Standard Annex 31
paper.  We already allow UTF-8 characters in the source, so that part
is done; IMHO all we need to do is pedwarn instead of
just warn for the (default) -Wnormalize=nfc (or for -Wnormalize={id,nfkc})
if the character is not in NFC and to use the Unicode XID_Start and
XID_Continue derived core properties to find out what characters are allowed
(the standard actually adds U+005F to XID_Start, but we are handling the
ASCII compatible characters differently already and they aren't allowed
in UCNs in identifiers).  Instead of hardcoding the large tables
in ucnid.tab, this patch makes makeucnid.c read them from the Unicode
tables (13.0.0 version at this point).
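
As a rough illustration of the table-driven check described above, here is
a minimal sketch in C.  It is not the code in charset.c or ucnid.h: the
sk_ names, the flag values and the four-entry table are assumptions made up
for this example, and the table is a tiny hand-written excerpt rather than
data generated from the Unicode files.  The return convention (0 = not
valid, 1 = valid, 2 = valid but not as first character) is likewise just a
choice for the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bits, used by this sketch only.  */
enum
{
  SK_CXX23 = 1, /* Valid in a C++23 identifier (XID_Continue).  */
  SK_NXX23 = 2  /* ...but not as its first character (not XID_Start).  */
};

struct sk_range
{
  uint32_t first, last; /* Inclusive code point range.  */
  unsigned flags;
};

/* Tiny hand-written excerpt, sorted by code point.  ASCII is left out
   on purpose, since it is handled separately.  */
static const struct sk_range sk_table[] = {
  { 0x00B7, 0x00B7, SK_CXX23 | SK_NXX23 }, /* MIDDLE DOT.  */
  { 0x00C0, 0x00D6, SK_CXX23 },            /* Latin letters.  */
  { 0x00D8, 0x00F6, SK_CXX23 },            /* Latin letters.  */
  { 0x0300, 0x036F, SK_CXX23 | SK_NXX23 }  /* Combining marks.  */
};

/* Binary search; code points outside the excerpt get no flags.  */
static unsigned
sk_flags (uint32_t c)
{
  size_t lo = 0, hi = sizeof sk_table / sizeof sk_table[0];
  while (lo < hi)
    {
      size_t mid = lo + (hi - lo) / 2;
      if (c > sk_table[mid].last)
        lo = mid + 1;
      else if (c < sk_table[mid].first)
        hi = mid;
      else
        return sk_table[mid].flags;
    }
  return 0;
}

/* Return 0 if C is not valid in an identifier, 1 if it is valid
   anywhere, 2 if it is valid but not as the first character.  */
int
sk_cxx23_class (uint32_t c)
{
  unsigned flags = sk_flags (c);
  if (!(flags & SK_CXX23))
    return 0;
  return (flags & SK_NXX23) ? 2 : 1;
}

/* Usage example.  */
void
sk_demo (void)
{
  assert (sk_cxx23_class (0x00E9) == 1); /* e with acute: may start.  */
  assert (sk_cxx23_class (0x0301) == 2); /* Combining acute: not first.  */
  assert (sk_cxx23_class (0x00D7) == 0); /* Multiplication sign.  */
}
```

Storing a "valid" bit plus a separate "not as first character" bit per
range keeps one table usable for both the start and the continuation check.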

In non-pedantic mode, we accept as the 2nd and later characters of an
identifier the union of the characters valid in all supported modes.  For
the 1st character, however, the code was still pedantic: it rejected any
character that may not start an identifier in the currently chosen standard.
This patch changes that, so that what is allowed at the start of an
identifier is likewise the union of the characters valid at the start of an
identifier in any of the pedantic modes.
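
The start-character rule above can be modelled roughly as follows.  This
is a sketch under stated assumptions, not the actual
ucn_valid_in_identifier: the M_ flag names and values, the function
m_classify and its parameters are all hypothetical, and each dialect is
reduced to a "valid" bit plus an optional "not at start" bit (with none
for the M_CXX mode in this model).

```c
#include <assert.h>

/* Hypothetical per-character flag bits, for this sketch only.  */
enum
{
  M_C99 = 1,    /* Valid in C99.  */
  M_N99 = 2,    /* ...but not at the start.  */
  M_CXX = 4,    /* Valid in older C++ modes (no start restriction here).  */
  M_C11 = 8,    /* Valid in C11.  */
  M_N11 = 16,   /* ...but not at the start.  */
  M_CXX23 = 32, /* Valid in C++23.  */
  M_NXX23 = 64  /* ...but not at the start.  */
};

/* Classify a character with the given FLAGS.
   Return 0 if not valid, 1 if valid anywhere, 2 if valid but not as
   the first character.  When PEDANTIC is nonzero, only the selected
   dialect counts: VALID is its "valid" bit and NOSTART its "not at
   start" bit.  Otherwise both checks use the union over all dialects.  */
int
m_classify (unsigned flags, int pedantic, unsigned valid, unsigned nostart)
{
  if (pedantic)
    {
      if (!(flags & valid))
        return 0;
      return (flags & nostart) ? 2 : 1;
    }

  /* Non-pedantic: valid if any dialect accepts the character.  */
  if (!(flags & (M_C99 | M_CXX | M_C11 | M_CXX23)))
    return 0;

  /* May start an identifier if any dialect allows it at the start.  */
  if ((flags & (M_C99 | M_N99)) == M_C99
      || (flags & M_CXX) != 0
      || (flags & (M_C11 | M_N11)) == M_C11
      || (flags & (M_CXX23 | M_NXX23)) == M_CXX23)
    return 1;
  return 2;
}

/* Usage example with a made-up character that may start an identifier
   in C11 but only continue one in C++23.  */
void
m_demo (void)
{
  unsigned flags = M_C11 | M_CXX23 | M_NXX23;
  assert (m_classify (flags, 1, M_CXX23, M_NXX23) == 2); /* -pedantic C++23.  */
  assert (m_classify (flags, 0, 0, 0) == 1);             /* Non-pedantic.  */
}
```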

2021-09-01  Jakub Jelinek  <jakub@redhat.com>

	PR c++/100977
libcpp/
	* include/cpplib.h (struct cpp_options): Add cxx23_identifiers.
	* charset.c (CXX23, NXX23): New enumerators.
	(CID, NFC, NKC, CTX): Renumber.
	(ucn_valid_in_identifier): Implement P1949R7 - use CXX23 and
	NXX23 flags for cxx23_identifiers.  For start character in
	non-pedantic mode, allow characters that are allowed as start
	characters in any of the supported language modes, rather than
	disallowing characters allowed only as non-start characters in
	current mode but for characters from other language modes allowing
	them even if they are never allowed at start.
	* init.c (struct lang_flags): Add cxx23_identifiers.
	(lang_defaults): Add cxx23_identifiers column.
	(cpp_set_lang): Initialize CPP_OPTION (pfile, cxx23_identifiers).
	* lex.c (warn_about_normalization): If cxx23_identifiers, use
	cpp_pedwarning_with_line instead of cpp_warning_with_line for
	"is not in NFC" diagnostics.
	* makeucnid.c: Adjust usage comment.
	(CXX23, NXX23): New enumerators.
	(all_languages): Add CXX23.
	(not_NFC, not_NFKC, maybe_not_NFC): Renumber.
	(read_derivedcore): New function.
	(write_table): Print also CXX23 and NXX23 columns.
	(main): Require 5 arguments instead of 4, call read_derivedcore.
	* ucnid.h: Regenerated using Unicode 13.0.0 files.
gcc/testsuite/
	* g++.dg/cpp23/normalize1.C: New test.
	* g++.dg/cpp23/normalize2.C: New test.
	* g++.dg/cpp23/normalize3.C: New test.
	* g++.dg/cpp23/normalize4.C: New test.
	* g++.dg/cpp23/normalize5.C: New test.
	* g++.dg/cpp23/normalize6.C: New test.
	* g++.dg/cpp23/normalize7.C: New test.
	* g++.dg/cpp23/ucnid-1-utf8.C: New test.
	* g++.dg/cpp23/ucnid-2-utf8.C: New test.
	* gcc.dg/cpp/ucnid-4.c: Don't expect
	"not valid at the start of an identifier" errors.
	* gcc.dg/cpp/ucnid-4-utf8.c: Likewise.
	* gcc.dg/cpp/ucnid-5-utf8.c: New test.
2021-09-01 22:33:06 +02:00
include libcpp: Implement C++23 P1949R7 - C++ Identifier Syntax using Unicode Standard Annex 31 2021-09-01 22:33:06 +02:00
po Daily bump. 2021-08-17 00:16:32 +00:00
ChangeLog Daily bump. 2021-09-01 00:16:58 +00:00
ChangeLog.jit
Makefile.in Update copyright years. 2021-01-04 10:26:59 +01:00
aclocal.m4
charset.c libcpp: Implement C++23 P1949R7 - C++ Identifier Syntax using Unicode Standard Annex 31 2021-09-01 22:33:06 +02:00
config.in
configure GCC_CET_HOST_FLAGS: Check if host supports multi-byte NOPs 2021-05-03 05:01:23 -07:00
configure.ac
directives.c preprocessor: Support C2X #elifdef, #elifndef 2021-05-11 23:54:01 +00:00
errors.c Update copyright years. 2021-01-04 10:26:59 +01:00
expr.c preprocessor: Fix pp-number lexing of digit separators [PR83873, PR97604] 2021-05-06 23:20:35 +00:00
files.c diagnostics: Support for -finput-charset [PR93067] 2021-08-25 11:15:28 -04:00
generated_cpp_wcwidth.h libcpp: Update cpp_wcwidth() to Unicode 13.0.0 2020-11-07 09:36:43 -05:00
identifiers.c Update copyright years. 2021-01-04 10:26:59 +01:00
init.c libcpp: Implement C++23 P1949R7 - C++ Identifier Syntax using Unicode Standard Annex 31 2021-09-01 22:33:06 +02:00
internal.h c++: header-unit build capability [PR 99023] 2021-02-18 13:22:48 -08:00
lex.c libcpp: Implement C++23 P1949R7 - C++ Identifier Syntax using Unicode Standard Annex 31 2021-09-01 22:33:06 +02:00
line-map.c libcpp: location comparison within macro [PR100796] 2021-06-16 11:41:08 -04:00
location-example.txt
macro.c libcpp: __VA_OPT__ tweak 2021-09-01 21:33:30 +02:00
makeucnid.c libcpp: Implement C++23 P1949R7 - C++ Identifier Syntax using Unicode Standard Annex 31 2021-09-01 22:33:06 +02:00
mkdeps.c preprocessor: Make quoting : [PR 95253] 2021-01-15 08:56:20 -08:00
pch.c Update copyright years. 2021-01-04 10:26:59 +01:00
symtab.c Update copyright years. 2021-01-04 10:26:59 +01:00
system.h Update copyright years. 2021-01-04 10:26:59 +01:00
traditional.c Update copyright years. 2021-01-04 10:26:59 +01:00
ucnid.h libcpp: Implement C++23 P1949R7 - C++ Identifier Syntax using Unicode Standard Annex 31 2021-09-01 22:33:06 +02:00
ucnid.tab Update copyright years. 2021-01-04 10:26:59 +01:00