gcc/contrib/unicode
Lewis Hyatt 4fda776a2f libcpp: Update ucnid.h to Unicode 14
This patch updates ucnid.h from Unicode 13 to Unicode 14.  Additionally, the
procedure detailed in contrib/unicode/README, which updates
generated_wcwidth.h, has been expanded with instructions for updating this
file as well, so that both may be done at the same time conveniently.  Two
additional Unicode data files which are needed to create ucnid.h are also
added to source control in contrib/unicode.

contrib/ChangeLog:

	* unicode/README: Added instructions for updating ucnid.h.
	* unicode/DerivedCoreProperties.txt: New file added to source
	control from Unicode 14.0 release.
	* unicode/DerivedNormalizationProps.txt: Likewise.

libcpp/ChangeLog:

	* ucnid.h: Regenerated for Unicode 14.0.
2022-06-28 17:33:37 -04:00
..
from_glibc
DerivedCoreProperties.txt libcpp: Update ucnid.h to Unicode 14 2022-06-28 17:33:37 -04:00
DerivedNormalizationProps.txt libcpp: Update ucnid.h to Unicode 14 2022-06-28 17:33:37 -04:00
EastAsianWidth.txt
gen_wcwidth.py
PropList.txt
README libcpp: Update ucnid.h to Unicode 14 2022-06-28 17:33:37 -04:00
unicode-license.txt
UnicodeData.txt
utf8-dump.py

This directory contains a mechanism for GCC to have its own internal
implementation of wcwidth functionality (cpp_wcwidth () in libcpp/charset.c),
as well as a mechanism to update the information about codepoints permitted in
identifiers, which is encoded in libcpp/ucnid.h.

The idea is to produce the necessary lookup tables
(../../libcpp/{ucnid.h,generated_cpp_wcwidth.h}) in a reproducible way, starting
from the following files that are distributed by the Unicode Consortium:

ftp://ftp.unicode.org/Public/UNIDATA/UnicodeData.txt
ftp://ftp.unicode.org/Public/UNIDATA/EastAsianWidth.txt
ftp://ftp.unicode.org/Public/UNIDATA/PropList.txt
ftp://ftp.unicode.org/Public/UNIDATA/DerivedNormalizationProps.txt
ftp://ftp.unicode.org/Public/UNIDATA/DerivedCoreProperties.txt

These files have been added to source control in this directory;
please see unicode-license.txt for the relevant copyright information.

In order to keep in sync with glibc's wcwidth as much as possible, it is
desirable for the logic that processes the Unicode data to be the same as
glibc's.  To that end, we also put in this directory, in the from_glibc/
directory, the glibc python code that implements their logic.  This code was
copied verbatim from glibc, and it can be updated at any time from the glibc
source code repository.  The files copied from that respository are:

localedata/unicode-gen/unicode_utils.py
localedata/unicode-gen/utf8_gen.py

And the most recent versions added to GCC are from glibc git commit:
f6032247061fb37d59565f2e9667e242c8a98e76

The script gen_wcwidth.py found here contains the GCC-specific code to
map glibc's output to the lookup tables we require.  This script should not need
to change, unless there are structural changes to the Unicode data files or to
the glibc code.  Similarly, makeucnid.cc in ../../libcpp contains the logic to
produce ucnid.h.

The procedure to update GCC's Unicode support is the following:

1.  Update the five Unicode data files from the above URLs.

2.  Update the two glibc files in from_glibc/ from glibc's git.  Update
    the commit number above in this README.

3.  Run ./gen_wcwidth.py X.Y > ../../libcpp/generated_cpp_wcwidth.h
    (where X.Y is the version of the Unicode standard corresponding to the
    Unicode data files being used, most recently, 14.0.0).

4.  Compile makeucnid, e.g. with:
      gcc -O2 ../../libcpp/makeucnid.cc -o ../../libcpp/makeucnid

5.  Generate ucnid.h as follows:
      ../../libcpp/makeucnid ../../libcpp/ucnid.tab UnicodeData.txt \
	DerivedNormalizationProps.txt DerivedCoreProperties.txt \
	> ../../libcpp/ucnid.h