Go to file
David Malcolm bd5e882cf6 diagnostics: escape non-ASCII source bytes for certain diagnostics
This patch adds support to GCC's diagnostic subsystem for escaping certain
bytes and Unicode characters when quoting source code.

Specifically, this patch adds a new flag rich_location::m_escape_on_output
which is a hint from a diagnostic that non-ASCII bytes in the pertinent
lines of the user's source code should be escaped when printed.

The patch sets this for the following diagnostics:
- when complaining about stray bytes in the program (when these
are non-printable)
- when complaining about "null character(s) ignored");
- for -Wnormalized= (and generate source ranges for such warnings)

The escaping is controlled by a new option:
  -fdiagnostics-escape-format=[unicode|bytes]

For example, consider a diagnostic involing a source line containing the
string "before" followed by the Unicode character U+03C0 ("GREEK SMALL
LETTER PI", with UTF-8 encoding 0xCF 0x80) followed by the byte 0xBF
(a stray UTF-8 trailing byte), followed by the string "after", where the
diagnostic highlights the U+03C0 character.

By default, this line will be printed verbatim to the user when
reporting a diagnostic at it, as:

 beforeπXafter
       ^

(using X for the stray byte to avoid putting invalid UTF-8 in this
commit message)

If the diagnostic sets the "escape" flag, it will be printed as:

 before<U+03C0><BF>after
       ^~~~~~~~

with -fdiagnostics-escape-format=unicode (the default), or as:

  before<CF><80><BF>after
        ^~~~~~~~

if the user supplies -fdiagnostics-escape-format=bytes.

This only affects how the source is printed; it does not affect
how column numbers that are printed (as per -fdiagnostics-column-unit=
and -fdiagnostics-column-origin=).

gcc/c-family/ChangeLog:
	* c-lex.c (c_lex_with_flags): When complaining about non-printable
	CPP_OTHER tokens, set the "escape on output" flag.

gcc/ChangeLog:
	* common.opt (fdiagnostics-escape-format=): New.
	(diagnostics_escape_format): New enum.
	(DIAGNOSTICS_ESCAPE_FORMAT_UNICODE): New enum value.
	(DIAGNOSTICS_ESCAPE_FORMAT_BYTES): Likewise.
	* diagnostic-format-json.cc (json_end_diagnostic): Add
	"escape-source" attribute.
	* diagnostic-show-locus.c
	(exploc_with_display_col::exploc_with_display_col): Replace
	"tabstop" param with a cpp_char_column_policy and add an "aspect"
	param.  Use these to compute m_display_col accordingly.
	(struct char_display_policy): New struct.
	(layout::m_policy): New field.
	(layout::m_escape_on_output): New field.
	(def_policy): New function.
	(make_range): Update for changes to exploc_with_display_col ctor.
	(default_print_decoded_ch): New.
	(width_per_escaped_byte): New.
	(escape_as_bytes_width): New.
	(escape_as_bytes_print): New.
	(escape_as_unicode_width): New.
	(escape_as_unicode_print): New.
	(make_policy): New.
	(layout::layout): Initialize new fields.  Update m_exploc ctor
	call for above change to ctor.
	(layout::maybe_add_location_range): Update for changes to
	exploc_with_display_col ctor.
	(layout::calculate_x_offset_display): Update for change to
	cpp_display_width.
	(layout::print_source_line): Pass policy
	to cpp_display_width_computation. Capture cpp_decoded_char when
	calling process_next_codepoint.  Move printing of source code to
	m_policy.m_print_cb.
	(line_label::line_label): Pass in policy rather than context.
	(layout::print_any_labels): Update for change to line_label ctor.
	(get_affected_range): Pass in policy rather than context, updating
	calls to location_compute_display_column accordingly.
	(get_printed_columns): Likewise, also for cpp_display_width.
	(correction::correction): Pass in policy rather than tabstop.
	(correction::compute_display_cols): Pass m_policy rather than
	m_tabstop to cpp_display_width.
	(correction::m_tabstop): Replace with...
	(correction::m_policy): ...this.
	(line_corrections::line_corrections): Pass in policy rather than
	context.
	(line_corrections::m_context): Replace with...
	(line_corrections::m_policy): ...this.
	(line_corrections::add_hint): Update to use m_policy rather than
	m_context.
	(line_corrections::add_hint): Likewise.
	(layout::print_trailing_fixits): Likewise.
	(selftest::test_display_widths): New.
	(selftest::test_layout_x_offset_display_utf8): Update to use
	policy rather than tabstop.
	(selftest::test_one_liner_labels_utf8): Add test of escaping
	source lines.
	(selftest::test_diagnostic_show_locus_one_liner_utf8): Update to
	use policy rather than tabstop.
	(selftest::test_overlapped_fixit_printing): Likewise.
	(selftest::test_overlapped_fixit_printing_utf8): Likewise.
	(selftest::test_overlapped_fixit_printing_2): Likewise.
	(selftest::test_tab_expansion): Likewise.
	(selftest::test_escaping_bytes_1): New.
	(selftest::test_escaping_bytes_2): New.
	(selftest::diagnostic_show_locus_c_tests): Call the new tests.
	* diagnostic.c (diagnostic_initialize): Initialize
	context->escape_format.
	(convert_column_unit): Update to use default character width policy.
	(selftest::test_diagnostic_get_location_text): Likewise.
	* diagnostic.h (enum diagnostics_escape_format): New enum.
	(diagnostic_context::escape_format): New field.
	* doc/invoke.texi (-fdiagnostics-escape-format=): New option.
	(-fdiagnostics-format=): Add "escape-source" attribute to examples
	of JSON output, and document it.
	* input.c (location_compute_display_column): Pass in "policy"
	rather than "tabstop", passing to
	cpp_byte_column_to_display_column.
	(selftest::test_cpp_utf8): Update to use cpp_char_column_policy.
	* input.h (class cpp_char_column_policy): New forward decl.
	(location_compute_display_column): Pass in "policy" rather than
	"tabstop".
	* opts.c (common_handle_option): Handle
	OPT_fdiagnostics_escape_format_.
	* selftest.c (temp_source_file::temp_source_file): New ctor
	overload taking a size_t.
	* selftest.h (temp_source_file::temp_source_file): Likewise.

gcc/testsuite/ChangeLog:
	* c-c++-common/diagnostic-format-json-1.c: Add regexp to consume
	"escape-source" attribute.
	* c-c++-common/diagnostic-format-json-2.c: Likewise.
	* c-c++-common/diagnostic-format-json-3.c: Likewise.
	* c-c++-common/diagnostic-format-json-4.c: Likewise, twice.
	* c-c++-common/diagnostic-format-json-5.c: Likewise.
	* gcc.dg/cpp/warn-normalized-4-bytes.c: New test.
	* gcc.dg/cpp/warn-normalized-4-unicode.c: New test.
	* gcc.dg/encoding-issues-bytes.c: New test.
	* gcc.dg/encoding-issues-unicode.c: New test.
	* gfortran.dg/diagnostic-format-json-1.F90: Add regexp to consume
	"escape-source" attribute.
	* gfortran.dg/diagnostic-format-json-2.F90: Likewise.
	* gfortran.dg/diagnostic-format-json-3.F90: Likewise.

libcpp/ChangeLog:
	* charset.c (convert_escape): Use encoding_rich_location when
	complaining about nonprintable unknown escape sequences.
	(cpp_display_width_computation::::cpp_display_width_computation):
	Pass in policy rather than tabstop.
	(cpp_display_width_computation::process_next_codepoint): Add "out"
	param and populate *out if non-NULL.
	(cpp_display_width_computation::advance_display_cols): Pass NULL
	to process_next_codepoint.
	(cpp_byte_column_to_display_column): Pass in policy rather than
	tabstop.  Pass NULL to process_next_codepoint.
	(cpp_display_column_to_byte_column): Pass in policy rather than
	tabstop.
	* errors.c (cpp_diagnostic_get_current_location): New function,
	splitting out the logic from...
	(cpp_diagnostic): ...here.
	(cpp_warning_at): New function.
	(cpp_pedwarning_at): New function.
	* include/cpplib.h (cpp_warning_at): New decl for rich_location.
	(cpp_pedwarning_at): Likewise.
	(struct cpp_decoded_char): New.
	(struct cpp_char_column_policy): New.
	(cpp_display_width_computation::cpp_display_width_computation):
	Replace "tabstop" param with "policy".
	(cpp_display_width_computation::process_next_codepoint): Add "out"
	param.
	(cpp_display_width_computation::m_tabstop): Replace with...
	(cpp_display_width_computation::m_policy): ...this.
	(cpp_byte_column_to_display_column): Replace "tabstop" param with
	"policy".
	(cpp_display_width): Likewise.
	(cpp_display_column_to_byte_column): Likewise.
	* include/line-map.h (rich_location::escape_on_output_p): New.
	(rich_location::set_escape_on_output): New.
	(rich_location::m_escape_on_output): New.
	* internal.h (cpp_diagnostic_get_current_location): New decl.
	(class encoding_rich_location): New.
	* lex.c (skip_whitespace): Use encoding_rich_location when
	complaining about null characters.
	(warn_about_normalization): Generate a source range when
	complaining about improperly normalized tokens, rather than just a
	point, and use encoding_rich_location so that the source code
	is escaped on printing.
	* line-map.c (rich_location::rich_location): Initialize
	m_escape_on_output.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>
2021-11-01 09:35:46 -04:00
c++tools Daily bump. 2021-10-27 00:16:33 +00:00
config Daily bump. 2021-09-20 00:16:21 +00:00
contrib Daily bump. 2021-10-21 00:16:29 +00:00
fixincludes Daily bump. 2021-08-31 00:16:50 +00:00
gcc diagnostics: escape non-ASCII source bytes for certain diagnostics 2021-11-01 09:35:46 -04:00
gnattools Daily bump. 2021-10-23 00:16:26 +00:00
gotools Daily bump. 2021-09-22 00:16:28 +00:00
include Daily bump. 2021-09-28 00:16:21 +00:00
INSTALL
intl Daily bump. 2021-06-15 00:16:37 +00:00
libada Daily bump. 2021-10-23 00:16:26 +00:00
libatomic Daily bump. 2021-07-22 00:16:46 +00:00
libbacktrace Daily bump. 2021-10-23 00:16:26 +00:00
libcc1 Daily bump. 2021-08-18 00:16:48 +00:00
libcody libcody: add mostlyclean Makefile target 2021-11-01 04:47:38 +01:00
libcpp diagnostics: escape non-ASCII source bytes for certain diagnostics 2021-11-01 09:35:46 -04:00
libdecnumber Daily bump. 2021-10-23 00:16:26 +00:00
libffi Daily bump. 2021-10-28 00:16:39 +00:00
libgcc Daily bump. 2021-10-28 00:16:39 +00:00
libgfortran Daily bump. 2021-10-19 00:16:23 +00:00
libgo runtime: set runtime.GOROOT value at build time 2021-09-21 14:31:10 -07:00
libgomp Daily bump. 2021-10-31 00:16:24 +00:00
libiberty Daily bump. 2021-10-23 00:16:26 +00:00
libitm Daily bump. 2021-06-18 00:16:58 +00:00
libobjc Daily bump. 2021-01-06 00:16:55 +00:00
liboffloadmic Daily bump. 2021-10-20 00:16:43 +00:00
libphobos Daily bump. 2021-11-01 00:16:20 +00:00
libquadmath Daily bump. 2021-06-09 00:16:30 +00:00
libsanitizer Daily bump. 2021-10-09 00:16:26 +00:00
libssp Daily bump. 2021-01-06 00:16:55 +00:00
libstdc++-v3 libstdc++: Fix range access for empty std::valarray [PR103022] 2021-11-01 13:26:29 +00:00
libvtv Daily bump. 2021-01-06 00:16:55 +00:00
lto-plugin Daily bump. 2021-09-14 00:16:23 +00:00
maintainer-scripts Daily bump. 2021-05-15 00:16:27 +00:00
zlib Daily bump. 2021-01-06 00:16:55 +00:00
.dir-locals.el dir-locals: Use https for bug references 2021-07-20 11:40:34 +01:00
.gitattributes
.gitignore Add cscope.out to git ignore. 2021-06-24 16:51:40 +05:30
ABOUT-NLS
ar-lib
ChangeLog Daily bump. 2021-10-29 00:16:37 +00:00
ChangeLog.jit
ChangeLog.tree-ssa
compile
config-ml.in
config.guess config.sub, config.guess : Import upstream 2021-01-25. 2021-02-23 17:21:10 +08:00
config.rpath
config.sub config.sub, config.guess : Import upstream 2021-01-25. 2021-02-23 17:21:10 +08:00
configure [PATCH 1/5] Makefile.in: Ensure build CPP/CPPFLAGS is used for build targets 2021-10-28 10:42:49 -04:00
configure.ac [PATCH 1/5] Makefile.in: Ensure build CPP/CPPFLAGS is used for build targets 2021-10-28 10:42:49 -04:00
COPYING
COPYING3
COPYING3.LIB
COPYING.LIB
COPYING.RUNTIME
depcomp
install-sh
libtool-ldflags
libtool.m4 Update GNU/Hurd configure support 2021-01-05 16:04:14 -07:00
lt~obsolete.m4
ltgcc.m4
ltmain.sh
ltoptions.m4
ltsugar.m4
ltversion.m4
MAINTAINERS Fixup MAINTAINERS file 2021-10-26 16:28:04 -04:00
Makefile.def Add install-dvi Makefile targets. 2021-10-22 15:43:50 -07:00
Makefile.in [PATCH 1/5] Makefile.in: Ensure build CPP/CPPFLAGS is used for build targets 2021-10-28 10:42:49 -04:00
Makefile.tpl [PATCH 1/5] Makefile.in: Ensure build CPP/CPPFLAGS is used for build targets 2021-10-28 10:42:49 -04:00
missing
mkdep
mkinstalldirs
move-if-change
multilib.am
README
symlink-tree
test-driver
ylwrap

This directory contains the GNU Compiler Collection (GCC).

The GNU Compiler Collection is free software.  See the files whose
names start with COPYING for copying permission.  The manuals, and
some of the runtime libraries, are under different terms; see the
individual source files for details.

The directory INSTALL contains copies of the installation information
as HTML and plain text.  The source of this information is
gcc/doc/install.texi.  The installation information includes details
of what is included in the GCC sources and what files GCC installs.

See the file gcc/doc/gcc.texi (together with other files that it
includes) for usage and porting information.  An online readable
version of the manual is in the files gcc/doc/gcc.info*.

See http://gcc.gnu.org/bugs/ for how to report bugs usefully.

Copyright years on GCC source files may be listed using range
notation, e.g., 1987-2012, indicating that every year in the range,
inclusive, is a copyrightable year that could otherwise be listed
individually.