bd5e882cf6
This patch adds support to GCC's diagnostic subsystem for escaping certain bytes and Unicode characters when quoting source code. Specifically, this patch adds a new flag rich_location::m_escape_on_output which is a hint from a diagnostic that non-ASCII bytes in the pertinent lines of the user's source code should be escaped when printed. The patch sets this for the following diagnostics: - when complaining about stray bytes in the program (when these are non-printable) - when complaining about "null character(s) ignored"); - for -Wnormalized= (and generate source ranges for such warnings) The escaping is controlled by a new option: -fdiagnostics-escape-format=[unicode|bytes] For example, consider a diagnostic involing a source line containing the string "before" followed by the Unicode character U+03C0 ("GREEK SMALL LETTER PI", with UTF-8 encoding 0xCF 0x80) followed by the byte 0xBF (a stray UTF-8 trailing byte), followed by the string "after", where the diagnostic highlights the U+03C0 character. By default, this line will be printed verbatim to the user when reporting a diagnostic at it, as: beforeπXafter ^ (using X for the stray byte to avoid putting invalid UTF-8 in this commit message) If the diagnostic sets the "escape" flag, it will be printed as: before<U+03C0><BF>after ^~~~~~~~ with -fdiagnostics-escape-format=unicode (the default), or as: before<CF><80><BF>after ^~~~~~~~ if the user supplies -fdiagnostics-escape-format=bytes. This only affects how the source is printed; it does not affect how column numbers that are printed (as per -fdiagnostics-column-unit= and -fdiagnostics-column-origin=). gcc/c-family/ChangeLog: * c-lex.c (c_lex_with_flags): When complaining about non-printable CPP_OTHER tokens, set the "escape on output" flag. gcc/ChangeLog: * common.opt (fdiagnostics-escape-format=): New. (diagnostics_escape_format): New enum. (DIAGNOSTICS_ESCAPE_FORMAT_UNICODE): New enum value. (DIAGNOSTICS_ESCAPE_FORMAT_BYTES): Likewise. * diagnostic-format-json.cc (json_end_diagnostic): Add "escape-source" attribute. * diagnostic-show-locus.c (exploc_with_display_col::exploc_with_display_col): Replace "tabstop" param with a cpp_char_column_policy and add an "aspect" param. Use these to compute m_display_col accordingly. (struct char_display_policy): New struct. (layout::m_policy): New field. (layout::m_escape_on_output): New field. (def_policy): New function. (make_range): Update for changes to exploc_with_display_col ctor. (default_print_decoded_ch): New. (width_per_escaped_byte): New. (escape_as_bytes_width): New. (escape_as_bytes_print): New. (escape_as_unicode_width): New. (escape_as_unicode_print): New. (make_policy): New. (layout::layout): Initialize new fields. Update m_exploc ctor call for above change to ctor. (layout::maybe_add_location_range): Update for changes to exploc_with_display_col ctor. (layout::calculate_x_offset_display): Update for change to cpp_display_width. (layout::print_source_line): Pass policy to cpp_display_width_computation. Capture cpp_decoded_char when calling process_next_codepoint. Move printing of source code to m_policy.m_print_cb. (line_label::line_label): Pass in policy rather than context. (layout::print_any_labels): Update for change to line_label ctor. (get_affected_range): Pass in policy rather than context, updating calls to location_compute_display_column accordingly. (get_printed_columns): Likewise, also for cpp_display_width. (correction::correction): Pass in policy rather than tabstop. (correction::compute_display_cols): Pass m_policy rather than m_tabstop to cpp_display_width. (correction::m_tabstop): Replace with... (correction::m_policy): ...this. (line_corrections::line_corrections): Pass in policy rather than context. (line_corrections::m_context): Replace with... (line_corrections::m_policy): ...this. (line_corrections::add_hint): Update to use m_policy rather than m_context. (line_corrections::add_hint): Likewise. (layout::print_trailing_fixits): Likewise. (selftest::test_display_widths): New. (selftest::test_layout_x_offset_display_utf8): Update to use policy rather than tabstop. (selftest::test_one_liner_labels_utf8): Add test of escaping source lines. (selftest::test_diagnostic_show_locus_one_liner_utf8): Update to use policy rather than tabstop. (selftest::test_overlapped_fixit_printing): Likewise. (selftest::test_overlapped_fixit_printing_utf8): Likewise. (selftest::test_overlapped_fixit_printing_2): Likewise. (selftest::test_tab_expansion): Likewise. (selftest::test_escaping_bytes_1): New. (selftest::test_escaping_bytes_2): New. (selftest::diagnostic_show_locus_c_tests): Call the new tests. * diagnostic.c (diagnostic_initialize): Initialize context->escape_format. (convert_column_unit): Update to use default character width policy. (selftest::test_diagnostic_get_location_text): Likewise. * diagnostic.h (enum diagnostics_escape_format): New enum. (diagnostic_context::escape_format): New field. * doc/invoke.texi (-fdiagnostics-escape-format=): New option. (-fdiagnostics-format=): Add "escape-source" attribute to examples of JSON output, and document it. * input.c (location_compute_display_column): Pass in "policy" rather than "tabstop", passing to cpp_byte_column_to_display_column. (selftest::test_cpp_utf8): Update to use cpp_char_column_policy. * input.h (class cpp_char_column_policy): New forward decl. (location_compute_display_column): Pass in "policy" rather than "tabstop". * opts.c (common_handle_option): Handle OPT_fdiagnostics_escape_format_. * selftest.c (temp_source_file::temp_source_file): New ctor overload taking a size_t. * selftest.h (temp_source_file::temp_source_file): Likewise. gcc/testsuite/ChangeLog: * c-c++-common/diagnostic-format-json-1.c: Add regexp to consume "escape-source" attribute. * c-c++-common/diagnostic-format-json-2.c: Likewise. * c-c++-common/diagnostic-format-json-3.c: Likewise. * c-c++-common/diagnostic-format-json-4.c: Likewise, twice. * c-c++-common/diagnostic-format-json-5.c: Likewise. * gcc.dg/cpp/warn-normalized-4-bytes.c: New test. * gcc.dg/cpp/warn-normalized-4-unicode.c: New test. * gcc.dg/encoding-issues-bytes.c: New test. * gcc.dg/encoding-issues-unicode.c: New test. * gfortran.dg/diagnostic-format-json-1.F90: Add regexp to consume "escape-source" attribute. * gfortran.dg/diagnostic-format-json-2.F90: Likewise. * gfortran.dg/diagnostic-format-json-3.F90: Likewise. libcpp/ChangeLog: * charset.c (convert_escape): Use encoding_rich_location when complaining about nonprintable unknown escape sequences. (cpp_display_width_computation::::cpp_display_width_computation): Pass in policy rather than tabstop. (cpp_display_width_computation::process_next_codepoint): Add "out" param and populate *out if non-NULL. (cpp_display_width_computation::advance_display_cols): Pass NULL to process_next_codepoint. (cpp_byte_column_to_display_column): Pass in policy rather than tabstop. Pass NULL to process_next_codepoint. (cpp_display_column_to_byte_column): Pass in policy rather than tabstop. * errors.c (cpp_diagnostic_get_current_location): New function, splitting out the logic from... (cpp_diagnostic): ...here. (cpp_warning_at): New function. (cpp_pedwarning_at): New function. * include/cpplib.h (cpp_warning_at): New decl for rich_location. (cpp_pedwarning_at): Likewise. (struct cpp_decoded_char): New. (struct cpp_char_column_policy): New. (cpp_display_width_computation::cpp_display_width_computation): Replace "tabstop" param with "policy". (cpp_display_width_computation::process_next_codepoint): Add "out" param. (cpp_display_width_computation::m_tabstop): Replace with... (cpp_display_width_computation::m_policy): ...this. (cpp_byte_column_to_display_column): Replace "tabstop" param with "policy". (cpp_display_width): Likewise. (cpp_display_column_to_byte_column): Likewise. * include/line-map.h (rich_location::escape_on_output_p): New. (rich_location::set_escape_on_output): New. (rich_location::m_escape_on_output): New. * internal.h (cpp_diagnostic_get_current_location): New decl. (class encoding_rich_location): New. * lex.c (skip_whitespace): Use encoding_rich_location when complaining about null characters. (warn_about_normalization): Generate a source range when complaining about improperly normalized tokens, rather than just a point, and use encoding_rich_location so that the source code is escaped on printing. * line-map.c (rich_location::rich_location): Initialize m_escape_on_output. Signed-off-by: David Malcolm <dmalcolm@redhat.com> |
||
---|---|---|
.. | ||
include | ||
po | ||
aclocal.m4 | ||
ChangeLog | ||
ChangeLog.jit | ||
charset.c | ||
config.in | ||
configure | ||
configure.ac | ||
directives.c | ||
errors.c | ||
expr.c | ||
files.c | ||
generated_cpp_wcwidth.h | ||
identifiers.c | ||
init.c | ||
internal.h | ||
lex.c | ||
line-map.c | ||
location-example.txt | ||
macro.c | ||
Makefile.in | ||
makeucnid.c | ||
mkdeps.c | ||
pch.c | ||
symtab.c | ||
system.h | ||
traditional.c | ||
ucnid.h | ||
ucnid.tab |