gcc/libcpp/errors.c
David Malcolm bd5e882cf6 diagnostics: escape non-ASCII source bytes for certain diagnostics
This patch adds support to GCC's diagnostic subsystem for escaping certain
bytes and Unicode characters when quoting source code.

Specifically, this patch adds a new flag rich_location::m_escape_on_output
which is a hint from a diagnostic that non-ASCII bytes in the pertinent
lines of the user's source code should be escaped when printed.

The patch sets this for the following diagnostics:
- when complaining about stray bytes in the program (when these
are non-printable)
- when complaining about "null character(s) ignored");
- for -Wnormalized= (and generate source ranges for such warnings)

The escaping is controlled by a new option:
  -fdiagnostics-escape-format=[unicode|bytes]

For example, consider a diagnostic involing a source line containing the
string "before" followed by the Unicode character U+03C0 ("GREEK SMALL
LETTER PI", with UTF-8 encoding 0xCF 0x80) followed by the byte 0xBF
(a stray UTF-8 trailing byte), followed by the string "after", where the
diagnostic highlights the U+03C0 character.

By default, this line will be printed verbatim to the user when
reporting a diagnostic at it, as:

 beforeπXafter
       ^

(using X for the stray byte to avoid putting invalid UTF-8 in this
commit message)

If the diagnostic sets the "escape" flag, it will be printed as:

 before<U+03C0><BF>after
       ^~~~~~~~

with -fdiagnostics-escape-format=unicode (the default), or as:

  before<CF><80><BF>after
        ^~~~~~~~

if the user supplies -fdiagnostics-escape-format=bytes.

This only affects how the source is printed; it does not affect
how column numbers that are printed (as per -fdiagnostics-column-unit=
and -fdiagnostics-column-origin=).

gcc/c-family/ChangeLog:
	* c-lex.c (c_lex_with_flags): When complaining about non-printable
	CPP_OTHER tokens, set the "escape on output" flag.

gcc/ChangeLog:
	* common.opt (fdiagnostics-escape-format=): New.
	(diagnostics_escape_format): New enum.
	(DIAGNOSTICS_ESCAPE_FORMAT_UNICODE): New enum value.
	(DIAGNOSTICS_ESCAPE_FORMAT_BYTES): Likewise.
	* diagnostic-format-json.cc (json_end_diagnostic): Add
	"escape-source" attribute.
	* diagnostic-show-locus.c
	(exploc_with_display_col::exploc_with_display_col): Replace
	"tabstop" param with a cpp_char_column_policy and add an "aspect"
	param.  Use these to compute m_display_col accordingly.
	(struct char_display_policy): New struct.
	(layout::m_policy): New field.
	(layout::m_escape_on_output): New field.
	(def_policy): New function.
	(make_range): Update for changes to exploc_with_display_col ctor.
	(default_print_decoded_ch): New.
	(width_per_escaped_byte): New.
	(escape_as_bytes_width): New.
	(escape_as_bytes_print): New.
	(escape_as_unicode_width): New.
	(escape_as_unicode_print): New.
	(make_policy): New.
	(layout::layout): Initialize new fields.  Update m_exploc ctor
	call for above change to ctor.
	(layout::maybe_add_location_range): Update for changes to
	exploc_with_display_col ctor.
	(layout::calculate_x_offset_display): Update for change to
	cpp_display_width.
	(layout::print_source_line): Pass policy
	to cpp_display_width_computation. Capture cpp_decoded_char when
	calling process_next_codepoint.  Move printing of source code to
	m_policy.m_print_cb.
	(line_label::line_label): Pass in policy rather than context.
	(layout::print_any_labels): Update for change to line_label ctor.
	(get_affected_range): Pass in policy rather than context, updating
	calls to location_compute_display_column accordingly.
	(get_printed_columns): Likewise, also for cpp_display_width.
	(correction::correction): Pass in policy rather than tabstop.
	(correction::compute_display_cols): Pass m_policy rather than
	m_tabstop to cpp_display_width.
	(correction::m_tabstop): Replace with...
	(correction::m_policy): ...this.
	(line_corrections::line_corrections): Pass in policy rather than
	context.
	(line_corrections::m_context): Replace with...
	(line_corrections::m_policy): ...this.
	(line_corrections::add_hint): Update to use m_policy rather than
	m_context.
	(line_corrections::add_hint): Likewise.
	(layout::print_trailing_fixits): Likewise.
	(selftest::test_display_widths): New.
	(selftest::test_layout_x_offset_display_utf8): Update to use
	policy rather than tabstop.
	(selftest::test_one_liner_labels_utf8): Add test of escaping
	source lines.
	(selftest::test_diagnostic_show_locus_one_liner_utf8): Update to
	use policy rather than tabstop.
	(selftest::test_overlapped_fixit_printing): Likewise.
	(selftest::test_overlapped_fixit_printing_utf8): Likewise.
	(selftest::test_overlapped_fixit_printing_2): Likewise.
	(selftest::test_tab_expansion): Likewise.
	(selftest::test_escaping_bytes_1): New.
	(selftest::test_escaping_bytes_2): New.
	(selftest::diagnostic_show_locus_c_tests): Call the new tests.
	* diagnostic.c (diagnostic_initialize): Initialize
	context->escape_format.
	(convert_column_unit): Update to use default character width policy.
	(selftest::test_diagnostic_get_location_text): Likewise.
	* diagnostic.h (enum diagnostics_escape_format): New enum.
	(diagnostic_context::escape_format): New field.
	* doc/invoke.texi (-fdiagnostics-escape-format=): New option.
	(-fdiagnostics-format=): Add "escape-source" attribute to examples
	of JSON output, and document it.
	* input.c (location_compute_display_column): Pass in "policy"
	rather than "tabstop", passing to
	cpp_byte_column_to_display_column.
	(selftest::test_cpp_utf8): Update to use cpp_char_column_policy.
	* input.h (class cpp_char_column_policy): New forward decl.
	(location_compute_display_column): Pass in "policy" rather than
	"tabstop".
	* opts.c (common_handle_option): Handle
	OPT_fdiagnostics_escape_format_.
	* selftest.c (temp_source_file::temp_source_file): New ctor
	overload taking a size_t.
	* selftest.h (temp_source_file::temp_source_file): Likewise.

gcc/testsuite/ChangeLog:
	* c-c++-common/diagnostic-format-json-1.c: Add regexp to consume
	"escape-source" attribute.
	* c-c++-common/diagnostic-format-json-2.c: Likewise.
	* c-c++-common/diagnostic-format-json-3.c: Likewise.
	* c-c++-common/diagnostic-format-json-4.c: Likewise, twice.
	* c-c++-common/diagnostic-format-json-5.c: Likewise.
	* gcc.dg/cpp/warn-normalized-4-bytes.c: New test.
	* gcc.dg/cpp/warn-normalized-4-unicode.c: New test.
	* gcc.dg/encoding-issues-bytes.c: New test.
	* gcc.dg/encoding-issues-unicode.c: New test.
	* gfortran.dg/diagnostic-format-json-1.F90: Add regexp to consume
	"escape-source" attribute.
	* gfortran.dg/diagnostic-format-json-2.F90: Likewise.
	* gfortran.dg/diagnostic-format-json-3.F90: Likewise.

libcpp/ChangeLog:
	* charset.c (convert_escape): Use encoding_rich_location when
	complaining about nonprintable unknown escape sequences.
	(cpp_display_width_computation::::cpp_display_width_computation):
	Pass in policy rather than tabstop.
	(cpp_display_width_computation::process_next_codepoint): Add "out"
	param and populate *out if non-NULL.
	(cpp_display_width_computation::advance_display_cols): Pass NULL
	to process_next_codepoint.
	(cpp_byte_column_to_display_column): Pass in policy rather than
	tabstop.  Pass NULL to process_next_codepoint.
	(cpp_display_column_to_byte_column): Pass in policy rather than
	tabstop.
	* errors.c (cpp_diagnostic_get_current_location): New function,
	splitting out the logic from...
	(cpp_diagnostic): ...here.
	(cpp_warning_at): New function.
	(cpp_pedwarning_at): New function.
	* include/cpplib.h (cpp_warning_at): New decl for rich_location.
	(cpp_pedwarning_at): Likewise.
	(struct cpp_decoded_char): New.
	(struct cpp_char_column_policy): New.
	(cpp_display_width_computation::cpp_display_width_computation):
	Replace "tabstop" param with "policy".
	(cpp_display_width_computation::process_next_codepoint): Add "out"
	param.
	(cpp_display_width_computation::m_tabstop): Replace with...
	(cpp_display_width_computation::m_policy): ...this.
	(cpp_byte_column_to_display_column): Replace "tabstop" param with
	"policy".
	(cpp_display_width): Likewise.
	(cpp_display_column_to_byte_column): Likewise.
	* include/line-map.h (rich_location::escape_on_output_p): New.
	(rich_location::set_escape_on_output): New.
	(rich_location::m_escape_on_output): New.
	* internal.h (cpp_diagnostic_get_current_location): New decl.
	(class encoding_rich_location): New.
	* lex.c (skip_whitespace): Use encoding_rich_location when
	complaining about null characters.
	(warn_about_normalization): Generate a source range when
	complaining about improperly normalized tokens, rather than just a
	point, and use encoding_rich_location so that the source code
	is escaped on printing.
	* line-map.c (rich_location::rich_location): Initialize
	m_escape_on_output.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>
2021-11-01 09:35:46 -04:00

353 lines
8.5 KiB
C

/* Default error handlers for CPP Library.
Copyright (C) 1986-2021 Free Software Foundation, Inc.
Written by Per Bothner, 1994.
Based on CCCP program by Paul Rubin, June 1986
Adapted to ANSI C, Richard Stallman, Jan 1987
This program is free software; you can redistribute it and/or modify it
under the terms of the GNU General Public License as published by the
Free Software Foundation; either version 3, or (at your option) any
later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program; see the file COPYING3. If not see
<http://www.gnu.org/licenses/>.
In other words, you are welcome to use, share and improve this program.
You are forbidden to forbid anyone else to use, share and improve
what you give them. Help stamp out software-hoarding! */
#include "config.h"
#include "system.h"
#include "cpplib.h"
#include "internal.h"
/* Get a location_t for the current location in PFILE,
generally that of the previously lexed token. */
location_t
cpp_diagnostic_get_current_location (cpp_reader *pfile)
{
if (CPP_OPTION (pfile, traditional))
{
if (pfile->state.in_directive)
return pfile->directive_line;
else
return pfile->line_table->highest_line;
}
/* We don't want to refer to a token before the beginning of the
current run -- that is invalid. */
else if (pfile->cur_token == pfile->cur_run->base)
{
return 0;
}
else
{
return pfile->cur_token[-1].src_loc;
}
}
/* Print a diagnostic at the given location. */
ATTRIBUTE_FPTR_PRINTF(5,0)
static bool
cpp_diagnostic_at (cpp_reader * pfile, enum cpp_diagnostic_level level,
enum cpp_warning_reason reason, rich_location *richloc,
const char *msgid, va_list *ap)
{
bool ret;
if (!pfile->cb.diagnostic)
abort ();
ret = pfile->cb.diagnostic (pfile, level, reason, richloc, _(msgid), ap);
return ret;
}
/* Print a diagnostic at the location of the previously lexed token. */
ATTRIBUTE_FPTR_PRINTF(4,0)
static bool
cpp_diagnostic (cpp_reader * pfile, enum cpp_diagnostic_level level,
enum cpp_warning_reason reason,
const char *msgid, va_list *ap)
{
location_t src_loc = cpp_diagnostic_get_current_location (pfile);
rich_location richloc (pfile->line_table, src_loc);
return cpp_diagnostic_at (pfile, level, reason, &richloc, msgid, ap);
}
/* Print a warning or error, depending on the value of LEVEL. */
bool
cpp_error (cpp_reader * pfile, enum cpp_diagnostic_level level,
const char *msgid, ...)
{
va_list ap;
bool ret;
va_start (ap, msgid);
ret = cpp_diagnostic (pfile, level, CPP_W_NONE, msgid, &ap);
va_end (ap);
return ret;
}
/* Print a warning. The warning reason may be given in REASON. */
bool
cpp_warning (cpp_reader * pfile, enum cpp_warning_reason reason,
const char *msgid, ...)
{
va_list ap;
bool ret;
va_start (ap, msgid);
ret = cpp_diagnostic (pfile, CPP_DL_WARNING, reason, msgid, &ap);
va_end (ap);
return ret;
}
/* Print a pedantic warning. The warning reason may be given in REASON. */
bool
cpp_pedwarning (cpp_reader * pfile, enum cpp_warning_reason reason,
const char *msgid, ...)
{
va_list ap;
bool ret;
va_start (ap, msgid);
ret = cpp_diagnostic (pfile, CPP_DL_PEDWARN, reason, msgid, &ap);
va_end (ap);
return ret;
}
/* Print a warning, including system headers. The warning reason may be
given in REASON. */
bool
cpp_warning_syshdr (cpp_reader * pfile, enum cpp_warning_reason reason,
const char *msgid, ...)
{
va_list ap;
bool ret;
va_start (ap, msgid);
ret = cpp_diagnostic (pfile, CPP_DL_WARNING_SYSHDR, reason, msgid, &ap);
va_end (ap);
return ret;
}
/* As cpp_warning above, but use RICHLOC as the location of the diagnostic. */
bool cpp_warning_at (cpp_reader *pfile, enum cpp_warning_reason reason,
rich_location *richloc, const char *msgid, ...)
{
va_list ap;
bool ret;
va_start (ap, msgid);
ret = cpp_diagnostic_at (pfile, CPP_DL_WARNING, reason, richloc,
msgid, &ap);
va_end (ap);
return ret;
}
/* As cpp_pedwarning above, but use RICHLOC as the location of the
diagnostic. */
bool
cpp_pedwarning_at (cpp_reader * pfile, enum cpp_warning_reason reason,
rich_location *richloc, const char *msgid, ...)
{
va_list ap;
bool ret;
va_start (ap, msgid);
ret = cpp_diagnostic_at (pfile, CPP_DL_PEDWARN, reason, richloc,
msgid, &ap);
va_end (ap);
return ret;
}
/* Print a diagnostic at a specific location. */
ATTRIBUTE_FPTR_PRINTF(6,0)
static bool
cpp_diagnostic_with_line (cpp_reader * pfile, enum cpp_diagnostic_level level,
enum cpp_warning_reason reason,
location_t src_loc, unsigned int column,
const char *msgid, va_list *ap)
{
bool ret;
if (!pfile->cb.diagnostic)
abort ();
rich_location richloc (pfile->line_table, src_loc);
if (column)
richloc.override_column (column);
ret = pfile->cb.diagnostic (pfile, level, reason, &richloc, _(msgid), ap);
return ret;
}
/* Print a warning or error, depending on the value of LEVEL. */
bool
cpp_error_with_line (cpp_reader *pfile, enum cpp_diagnostic_level level,
location_t src_loc, unsigned int column,
const char *msgid, ...)
{
va_list ap;
bool ret;
va_start (ap, msgid);
ret = cpp_diagnostic_with_line (pfile, level, CPP_W_NONE, src_loc,
column, msgid, &ap);
va_end (ap);
return ret;
}
/* Print a warning. The warning reason may be given in REASON. */
bool
cpp_warning_with_line (cpp_reader *pfile, enum cpp_warning_reason reason,
location_t src_loc, unsigned int column,
const char *msgid, ...)
{
va_list ap;
bool ret;
va_start (ap, msgid);
ret = cpp_diagnostic_with_line (pfile, CPP_DL_WARNING, reason, src_loc,
column, msgid, &ap);
va_end (ap);
return ret;
}
/* Print a pedantic warning. The warning reason may be given in REASON. */
bool
cpp_pedwarning_with_line (cpp_reader *pfile, enum cpp_warning_reason reason,
location_t src_loc, unsigned int column,
const char *msgid, ...)
{
va_list ap;
bool ret;
va_start (ap, msgid);
ret = cpp_diagnostic_with_line (pfile, CPP_DL_PEDWARN, reason, src_loc,
column, msgid, &ap);
va_end (ap);
return ret;
}
/* Print a warning, including system headers. The warning reason may be
given in REASON. */
bool
cpp_warning_with_line_syshdr (cpp_reader *pfile, enum cpp_warning_reason reason,
location_t src_loc, unsigned int column,
const char *msgid, ...)
{
va_list ap;
bool ret;
va_start (ap, msgid);
ret = cpp_diagnostic_with_line (pfile, CPP_DL_WARNING_SYSHDR, reason, src_loc,
column, msgid, &ap);
va_end (ap);
return ret;
}
/* As cpp_error, but use SRC_LOC as the location of the error, without
a column override. */
bool
cpp_error_at (cpp_reader * pfile, enum cpp_diagnostic_level level,
location_t src_loc, const char *msgid, ...)
{
va_list ap;
bool ret;
va_start (ap, msgid);
rich_location richloc (pfile->line_table, src_loc);
ret = cpp_diagnostic_at (pfile, level, CPP_W_NONE, &richloc,
msgid, &ap);
va_end (ap);
return ret;
}
/* As cpp_error, but use RICHLOC as the location of the error, without
a column override. */
bool
cpp_error_at (cpp_reader * pfile, enum cpp_diagnostic_level level,
rich_location *richloc, const char *msgid, ...)
{
va_list ap;
bool ret;
va_start (ap, msgid);
ret = cpp_diagnostic_at (pfile, level, CPP_W_NONE, richloc,
msgid, &ap);
va_end (ap);
return ret;
}
/* Print a warning or error, depending on the value of LEVEL. Include
information from errno. */
bool
cpp_errno (cpp_reader *pfile, enum cpp_diagnostic_level level,
const char *msgid)
{
return cpp_error (pfile, level, "%s: %s", _(msgid), xstrerror (errno));
}
/* Print a warning or error, depending on the value of LEVEL. Include
information from errno. Unlike cpp_errno, the argument is a filename
that is not localized, but "" is replaced with localized "stdout". */
bool
cpp_errno_filename (cpp_reader *pfile, enum cpp_diagnostic_level level,
const char *filename,
location_t loc)
{
if (filename[0] == '\0')
filename = _("stdout");
return cpp_error_at (pfile, level, loc, "%s: %s", filename,
xstrerror (errno));
}