Clang has some bugs with destructors that use constraints to be
conditionally trivial, so disable the P2231R1 constexpr changes to
std::variant unless the compiler is GCC 12 or later.
If/when P2493R0 gets accepted and implemented by G++ we can remove the
__GNUC__ check and use __cpp_concepts >= 202002 instead.
libstdc++-v3/ChangeLog:
PR libstdc++/103891
* include/bits/c++config (_GLIBCXX_HAVE_COND_TRIVIAL_SPECIAL_MEMBERS):
Define.
* include/std/variant (__cpp_lib_variant): Only define C++20
value when the compiler is known to support conditionally
trivial destructors.
* include/std/version (__cpp_lib_variant): Likewise.
This library issue was approved in the October 2021 plenary.
libstdc++-v3/ChangeLog:
* include/bits/stl_iterator.h (common_iterator): Add constexpr
to all member functions (LWG 3574).
* testsuite/24_iterators/common_iterator/1.cc: Evaluate some
tests as constant expressions.
* testsuite/24_iterators/common_iterator/2.cc: Likewise.
glibc strptime passes around some state, what fields in struct tm have been
set and what needs to be finalized through possibly recursive calls, and
at the end performs various finalizations, like applying %p so that it
works for both %I %p and %p %I orders, or applying century so that both
%C %y and %y %C works, or computation of missing fields from others
(e.g. from %Y and %j one can compute tm_mon, tm_mday and tm_wday,
from %Y %U %w, %Y %W %w, %Y %U %a, or %Y %W %w one can compute
tm_mon, tm_mday, tm_yday or e.g. from %Y %m %d one can compute tm_wday
and tm_yday.
As the finalization is quite large and doesn't need to be a template
(doesn't depend on any iterators or char types), I've put it into libstdc++,
and left some padding in the state struct, so that perhaps in the future we
can track some more state without changing ABI.
Unfortunately, there is an ugly problem that the standard mandates that
get method calls the do_get virtual method and I don't see how we can
cary on any state in between those calls (even if we did an ABI change
for the facets, the methods are const, so that I think multiple threads
could use the same time_get objects and we couldn't store state in there).
There is a hack for that for GCC (seems to work with ICC too, doesn't work
with clang++) if the do_get method isn't overriden we can pass the state
around.
For both do_get_year and per IRC discussions also for %y, the behavior is
if 1-2 digits are parsed, the year is treated according to POSIX 2008 %y
rules (0-68 is 2000-2068, 69-99 is 1969-1999), if 3-4 digits are parsed,
it is treated as %Y.
2022-01-10 Jakub Jelinek <jakub@redhat.com>
PR libstdc++/77760
* include/bits/locale_facets_nonio.h (__time_get_state): New struct.
(time_get::_M_extract_via_format): Declare new method with
__time_get_state& as an extra argument.
* include/bits/locale_facets_nonio.tcc (_M_extract_via_format): Add
__state argument, set various fields in it while parsing. Handle %j,
%U, %w and %W, fix up handling of %y, %Y and %C, don't adjust tm_hour
for %p immediately. Add a wrapper around the method without the
__state argument for backwards compatibility.
(_M_extract_num): Remove all __len == 4 special cases.
(time_get::do_get_time, time_get::do_get_date, time_get::do_get): Zero
initialize __state, pass it to _M_extract_via_format and finalize it
at the end.
(do_get_year): For 1-2 digit parsed years, map 0-68 to 2000-2068,
69-99 to 1969-1999. For 3-4 digit parsed years use that as year.
(get): If do_get isn't overloaded from the locale_facets_nonio.tcc
version, don't call do_get but call _M_extract_via_format instead to
pass around state.
* config/abi/pre/gnu.ver (GLIBCXX_3.4.30): Export _M_extract_via_format
with extra __time_get_state and __time_get_state::_M_finalize_state.
* src/c++98/locale_facets.cc (is_leap, day_of_the_week,
day_of_the_year): New functions in anon namespace.
(mon_yday): New var in anon namespace.
(__time_get_state::_M_finalize_state): Define.
* testsuite/22_locale/time_get/get/char/4.cc: New test.
* testsuite/22_locale/time_get/get/wchar_t/4.cc: New test.
* testsuite/22_locale/time_get/get_year/char/1.cc (test01): Parse 197
as year 197AD instead of error.
* testsuite/22_locale/time_get/get_year/char/5.cc (test01): Parse 1 as
year 2001 instead of error.
* testsuite/22_locale/time_get/get_year/char/6.cc: New test.
* testsuite/22_locale/time_get/get_year/wchar_t/1.cc (test01): Parse
197 as year 197AD instead of error.
* testsuite/22_locale/time_get/get_year/wchar_t/5.cc (test01): Parse
1 as year 2001 instead of error.
* testsuite/22_locale/time_get/get_year/wchar_t/6.cc: New test.
This fixes the --disable-hosted-libstdcxx build so that it works with
--without-headers. Currently you need to also use --with-newlib, which
is confusing for users who aren't actually using newlib.
The AM_PROG_LIBTOOL checks are currently skipped for --with-newlib and
--with-avrlibc builds, with this change they are also skipped when using
--without-headers. It would be nice if using --disable-hosted-libstdcxx
automatically skipped those checks, but GLIBCXX_ENABLE_HOSTED comes too
late to make the AM_PROG_LIBTOOL checks depend on $is_hosted.
The checks for EOF, SEEK_CUR etc. cause the build to fail if there is no
<stdio.h> available. Unlike most headers, which get a HAVE_FOO_H macro,
<stdio.h> is in autoconf's default includes, so every check tries to
include it unconditionally. This change skips those checks for
freestanding builds.
Similarly, the checks for <stdint.h> types done by GCC_HEADER_STDINT try
to include <stdio.h> and fail for --without-headers builds. This change
skips the use of GCC_HEADER_STDINT for freestanding. We can probably
stop using GCC_HEADER_STDINT entirely, since only one file uses the
gstdint.h header that is generated, and that could easily be changed to
use <stdint.h> instead. That can wait for stage 1.
We also need to skip the GLIBCXX_CROSSCONFIG stage if --without-headers
was used, since we don't have any of the functions it deals with.
The end result of the changes above is that it should not be necessary
for a --disable-hosted-libstdcxx --without-headers build to also use
--with-newlib.
Finally, compile libsupc++ with -ffreestanding when --without-headers is
used, so that <stdint.h> will use <gcc-stdint.h> instead of expecting it
to come from libc.
libstdc++-v3/ChangeLog:
PR libstdc++/103866
* acinclude.m4 (GLIBCXX_COMPUTE_STDIO_INTEGER_CONSTANTS): Do
nothing for freestanding builds.
(GLIBCXX_ENABLE_HOSTED): Define FREESTANDING_FLAGS.
* configure.ac: Do not use AC_LIBTOOL_DLOPEN when configured
with --without-headers. Do not use GCC_HEADER_STDINT for
freestanding builds.
* libsupc++/Makefile.am (HOSTED_CXXFLAGS): Use -ffreestanding
for freestanding builds.
* configure: Regenerate.
* Makefile.in: Regenerate.
* doc/Makefile.in: Regenerate.
* include/Makefile.in: Regenerate.
* libsupc++/Makefile.in: Regenerate.
* po/Makefile.in: Regenerate.
* python/Makefile.in: Regenerate.
* src/Makefile.in: Regenerate.
* src/c++11/Makefile.in: Regenerate.
* src/c++17/Makefile.in: Regenerate.
* src/c++20/Makefile.in: Regenerate.
* src/c++98/Makefile.in: Regenerate.
* src/filesystem/Makefile.in: Regenerate.
* testsuite/Makefile.in: Regenerate.
When building a build!=host compiler, the just-built gcc can't be used
to build the target libstdc++ (because it is built for the host triplet,
not the build triplet). The top-level configure.ac sets up the build
flags for libstdc++ (and other "raw_cxx" libs) like this:
GCC_TARGET_TOOL(c++ for libstdc++, RAW_CXX_FOR_TARGET, CXX,
[gcc/xgcc -shared-libgcc -B$$r/$(HOST_SUBDIR)/gcc -nostdinc++ -L$$r/$(TARGET_SUBDIR)/libstdc++-v3/src -L$$r/$(TARGET_SUBDIR)/libstdc++-v3/src/.libs -L$$r/$(TARGET_SUBDIR)/libstdc++-v3/libsupc++/.libs],
c++)
The -nostdinc++ flag is only used for the IN-TREE-TOOL, i.e. when using
the just-built gcc/xgcc compiler. This means that the cross-compiler
used to build libstdc++ will add its own libstdc++ headers to the
include path. That results in the #include <cfenv> in
src/c++17/floating_to_chars.cc and src/c++17/floating_from_chars.cc
doing #include_next <fenv.h> and finding the libstdc++ fenv.h wrapper
from the host compiler. Because that has the same include guard as the
<fenv.h> in the libstdc++ we're trying to build, we never reach the
underlying <fenv.h> from libc. That results in several errors of the
form:
error: 'fenv_t' has not been declared in '::'
The most correct fix would be to add -nostdinc++ to the
RAW_CXX_FOR_TARGET variable in configure.ac, or the
RAW_CXX_TARGET_EXPORTS variable in Makefile.tpl.
Another solution would be to make the libstdc++ <fenv.h> wrapper use
_GLIBCXX_INCLUDE_NEXT_C_HEADERS like our <stdlib.h> and other C header
wrappers.
For now though, the simplest and safest solution is to just add
-nostdinc++ to the CXXFLAGS used for src/c++17/*.cc, which is what this
does.
libstdc++-v3/ChangeLog:
PR libstdc++/100017
* src/c++17/Makefile.am (AM_CXXFLAGS): Add -nostdinc++.
* src/c++17/Makefile.in: Regenerate.
This implements the proposed resolution of LWG 3088, so that x.merge(x)
is a no-op, consistent with std::list::merge.
Signed-off-by: Pavel I. Kryukov <pavel.kryukov@phystech.edu>
Co-authored-by: Jonathan Wakely <jwakely@redhat.com>
libstdc++-v3/ChangeLog:
PR libstdc++/103853
* include/bits/forward_list.tcc (forward_list::merge): Check for
self-merge.
* testsuite/23_containers/forward_list/operations/merge.cc: New test.
I think this code is valid but it fails with Clang, possibly due to
https://llvm.org/PR38882
Qualifying the names makes it work for all compilers.
libstdc++-v3/ChangeLog:
* include/bits/regex.h (basic_regex, match_results): Qualify
name in friend declaration, to work around Clang bug.
This test spawns thousands of threads and so times out if the tests are
run with a low timeout value and the machine is busy.
libstdc++-v3/ChangeLog:
* testsuite/ext/rope/pthread7-rope.cc: Add dg-timeout-factor.
This avoids a potential race condition if std::setlocale is used
concurrently with std::from_chars.
libstdc++-v3/ChangeLog:
PR libstdc++/103911
* include/std/charconv (__from_chars_alpha_to_num): Return
char instead of unsigned char. Change invalid return value to
127 instead of using numeric trait.
(__from_chars_alnum): Fix comment. Do not use std::isdigit.
Change type of variable to char.
When hasher is identified as slow and the number of elements is limited in the
container use a brute-force loop on those elements to look for a given key using
the key_equal functor. For the moment the default threshold to consider the
container as small is 20.
libstdc++-v3/ChangeLog:
PR libstdc++/68303
* include/bits/hashtable_policy.h
(_Hashtable_hash_traits<_Hash>): New.
(_Hash_code_base<>::_M_hash_code(const _Hash_node_value<>&)): New.
(_Hashtable_base<>::_M_key_equals): New.
(_Hashtable_base<>::_M_equals): Use latter.
(_Hashtable_base<>::_M_key_equals_tr): New.
(_Hashtable_base<>::_M_equals_tr): Use latter.
* include/bits/hashtable.h
(_Hashtable<>::__small_size_threshold()): New, use _Hashtable_hash_traits.
(_Hashtable<>::find): Loop through elements to look for key if size is lower
than __small_size_threshold().
(_Hashtable<>::_M_emplace(true_type, _Args&&...)): Likewise.
(_Hashtable<>::_M_insert_unique(_Kt&&, _Args&&, const _NodeGenerator&)): Likewise.
(_Hashtable<>::_M_compute_hash_code(const_iterator, const key_type&)): New.
(_Hashtable<>::_M_emplace(const_iterator, false_type, _Args&&...)): Use latter.
(_Hashtable<>::_M_find_before_node(const key_type&)): New.
(_Hashtable<>::_M_erase(true_type, const key_type&)): Use latter.
(_Hashtable<>::_M_erase(false_type, const key_type&)): Likewise.
* src/c++11/hashtable_c++0x.cc: Include <bits/functional_hash.h>.
* testsuite/util/testsuite_performance.h
(report_performance): Use 9 width to display memory.
* testsuite/performance/23_containers/insert_erase/unordered_small_size.cc:
New performance test case.
The C++17 basic_string(const T&, size_t, size_t) constructor is
overconstrained, so it can't be used for a NTBS and a temporary string
gets constructed (potentially allocating memory). There is no
corresponding constructor taking an NTBS, so no need to disambiguate
from it. Accepting an NTBS avoids the temporary (and potential
allocation) and is what the standard requires.
libstdc++-v3/ChangeLog:
PR libstdc++/103919
* include/bits/basic_string.h (basic_string(const T&, size_t, size_t)):
Relax constraints on string_view parameter.
* include/bits/cow_string.h (basic_string(const T&, size_t, size_t)):
Likewise.
* testsuite/21_strings/basic_string/cons/char/103919.cc: New test.
This feature is present in the C++23 draft.
With Jakub's recent front-end changes we can implement constexpr
equality by comparing the addresses of std::type_info objects. We do not
need string comparisons, because for constant evaluation cases we know
we aren't dealing with std::type_info objects defined in other
translation units.
The ARM EABI requires that the type_info::operator== function can be
defined out-of-line (and suggests that should be the default), but to be
a constexpr function it must be defined inline (at least for C++23
mode). To meet these conflicting requirements we make the inline version
of operator== call a new __equal function when called at runtime. That
is an alias for the non-inline definition of operator== defined in
libsupc++.
libstdc++-v3/ChangeLog:
* config/abi/pre/gnu.ver (GLIBCXX_3.4.30): Export new symbol for
ARM EABI.
* include/bits/c++config (_GLIBCXX23_CONSTEXPR): Define.
* include/std/version (__cpp_lib_constexpr_typeinfo): Define.
* libsupc++/tinfo.cc: Add #error to ensure non-inline definition
is emitted.
(type_info::__equal): Define alias symbol.
* libsupc++/typeinfo (type_info::before): Combine different
implementations into one.
(type_info::operator==): Likewise. Use address equality for
constant evaluation. Call __equal for targets that require the
definition to be non-inline.
* testsuite/18_support/type_info/constexpr.cc: New test.
In r12-3860 the error categories in <system_error> were made final and
immortal, but I missed the categories for <future> and <ios>. This makes
the same changes to those.
libstdc++-v3/ChangeLog:
* src/c++11/cxx11-ios_failure.cc (io_error_category): Define
class and virtual functions as 'final'.
(io_category_instance): Use constinit union to make the object
immortal.
* src/c++11/future.cc (future_error_category): Define class and
virtual functions as 'final'.
(future_category_instance): Use constinit union.
This helps visualize the NFA states in a std::regex. It probably isn't
very useful for users, but helps when working on the implementation.
libstdc++-v3/ChangeLog:
* python/libstdcxx/v6/printers.py (StdRegexStatePrinter): New
printer for std::regex NFA states.
We don't need a preprocessor condition to decide whether to use
placement new or std::construct_at, because std::_Construct already does
that.
libstdc++-v3/ChangeLog:
* include/bits/alloc_traits.h (allocator_traits<allocator<void>>):
Use std::_Construct for construct.
This moves the last two template parameters of __regex_algo_impl to be
runtime function parameters instead, so that we don't need four
different instantiations for the possible ways to call it. Most of the
function (and what it instantiates) is the same in all cases, so making
them compile-time choices doesn't really have much benefit.
Use 'if constexpr' for conditions that check template parameters, so
that when we do depend on a compile-time condition we only instantiate
what we need to.
libstdc++-v3/ChangeLog:
* include/bits/regex.h (__regex_algo_impl): Change __policy and
__match_mode template parameters to be function parameters.
(regex_match, regex_search): Pass policy and match mode as
function arguments.
* include/bits/regex.tcc (__regex_algo_impl): Change template
parameters to function parameters.
* include/bits/regex_compiler.h (_RegexTranslatorBase): Use
'if constexpr' for conditions using template parameters.
(_RegexTranslator): Likewise.
* include/bits/regex_executor.tcc (_Executor::_M_handle_accept):
Likewise.
* testsuite/util/testsuite_regex.h (regex_match_debug)
(regex_search_debug): Move template arguments to function
arguments.
The regex_match_debug testsuite helper doesn't compare the
std::match_results objects after a failed match, but it should do. The
standard says that the effects of a failed match on the match-results
are unspecified, except that [conditions testable by operator==]. So we
can check that the two sets of results compare equal even if the match
failed.
libstdc++-v3/ChangeLog:
* testsuite/util/testsuite_regex.h (regex_match_debug): Compare
results even if the match failed.
This replaces the vague "regex_error" for std::regex_error::what() with
a string that corresponds to the error_type enum passed to the
constructor. This allows us to remove many of the strings passed to
__throw_regex_error, because the default string is at least as good.
When a string argument to __throw_regex_error is kept it should add some
context-specific detail absent from the default string.
Also remove full stops (periods) from the end of those strings, to make
it easier to include them in logs and other output. I've left them
starting with an upper-case letter, which is consistent with strerror
output for (at least) Glibc, Solaris and BSD. I'm ambivalent whether
that's the right choice.
This also adds the missing noreturn attribute to __throw_regex_error.
libstdc++-v3/ChangeLog:
* include/bits/regex_compiler.tcc: Adjust all calls to
__throw_regex_error.
* include/bits/regex_error.h (__throw_regex_error): Add noreturn
attribute.
* include/bits/regex_scanner.tcc: Likewise.
* src/c++11/regex.cc (desc): New helper function.
(regex_error::regex_error(error_type)): Use desc to get a string
corresponding to the error code.
Prefer to overload __to_address to partially specialize std::pointer_traits because
std::pointer_traits would be mostly useless. Moreover partial specialization of
pointer_traits<__normal_iterator<P, C>> fails to rebind C, so you get incorrect types
like __normal_iterator<long*, vector<int>>. In the case of __gnu_debug::_Safe_iterator
the to_pointer method is impossible to implement correctly because we are missing
the parent container to associate the iterator to.
libstdc++-v3/ChangeLog:
* include/bits/stl_iterator.h
(std::pointer_traits<__gnu_cxx::__normal_iterator<>>): Remove.
(std::__to_address(const __gnu_cxx::__normal_iterator<>&)): New for C++11 to C++17.
* include/debug/safe_iterator.h
(std::__to_address(const __gnu_debug::_Safe_iterator<__gnu_cxx::__normal_iterator<>,
_Sequence>&)): New for C++11 to C++17.
* testsuite/24_iterators/normal_iterator/to_address.cc: Add check on std::vector::iterator
to validate both __gnu_cxx::__normal_iterator<> __to_address overload in normal mode and
__gnu_debug::_Safe_iterator in _GLIBCXX_DEBUG mode.
This patch uses the same not completely correct case insensitive comparisons
as used elsewhere in the same header. Proper comparisons that would handle
even multi-byte characters would be harder, but I don't see them implemented
in __ctype's methods.
2021-12-15 Jakub Jelinek <jakub@redhat.com>
PR libstdc++/71557
* include/bits/locale_facets_nonio.tcc (_M_extract_via_format):
Compare characters other than format specifiers and whitespace
case insensitively.
(_M_extract_name): Compare characters case insensitively.
* testsuite/22_locale/time_get/get/char/71557.cc: New test.
* testsuite/22_locale/time_get/get/wchar_t/71557.cc: New test.
This checks whether the locale data for en_HK includes %p and adjusts
the string being tested accordingly. To account for Jakub's fix to make
%I parse "12" as 0 instead of 12, we need to change the expected value
for the case where the locale format doesn't include %p. Also change the
time from 12:00:00 to 12:02:01 so we can tell if the minutes and seconds
get mixed up.
libstdc++-v3/ChangeLog:
PR libstdc++/103687
* testsuite/22_locale/time_get/get_date/wchar_t/4.cc: Restore
original locale before returning.
* testsuite/22_locale/time_get/get_time/char/2.cc: Check for %p
in locale's T_FMT and adjust accordingly.
* testsuite/22_locale/time_get/get_time/wchar_t/2.cc: Likewise.
std::regex currently allows invalid bracket ranges such as [\w-a] which
are only allowed by ECMAScript when in web browser compatibility mode.
It should be an error, because the start of the range is a character
class, not a single character. The current implementation of
_Compiler::_M_expression_term does not provide a way to reject this,
because we only remember a previous character, not whether we just
processed a character class (or collating symbol etc.)
This patch replaces the pair<bool, CharT> used to emulate
optional<CharT> with a custom class closer to pair<tribool,CharT>. That
allows us to track three states, so that we can tell when we've just
seen a character class.
With this additional state the code in _M_expression_term for processing
the _S_token_bracket_dash can be improved to correctly reject the [\w-a]
case, without regressing for valid cases such as [\w-] and [----].
libstdc++-v3/ChangeLog:
PR libstdc++/102447
* include/bits/regex_compiler.h (_Compiler::_BracketState): New
class.
(_Compiler::_BrackeyMatcher): New alias template.
(_Compiler::_M_expression_term): Change pair<bool, CharT>
parameter to _BracketState. Process first character for
ECMAScript syntax as well as POSIX.
* include/bits/regex_compiler.tcc
(_Compiler::_M_insert_bracket_matcher): Pass _BracketState.
(_Compiler::_M_expression_term): Use _BracketState to store
state between calls. Improve handling of dashes in ranges.
* testsuite/28_regex/algorithms/regex_match/cstring_bracket_01.cc:
Add more tests for ranges containing dashes. Check invalid
ranges with character class at the beginning.
This removes the __syntax_option and __match_flag enumeration types,
which are only used to define enumerators with successive values that
are then used to initialize the std::regex_constants global variables.
By defining enumerators in the syntax_option_type and match_flag_type
enumeration types with the correct values for the globals we get rid of
two useless enumeration types that just count from 0 to N, and we
improve the debugging experience. Because the enumeration types now have
enumerators defined, GDB will print values in terms of those enumerators
e.g.
$6 = (std::regex_constants::_S_ECMAScript | std::regex_constants::_S_multiline)
Previously this would have been shown as simply 0x810 because there were
no enumerators of that type.
This changes the type and value of enumerators such as _S_grep, but
users should never be referring to them directly anyway.
libstdc++-v3/ChangeLog:
* include/bits/regex_constants.h (__syntax_option, __match_flag):
Remove.
(syntax_option_type, match_flag_type): Define enumerators.
Use to initialize globals. Add constexpr to compound assignment
operators.
* include/bits/regex_error.h (error_type): Add comment.
* testsuite/28_regex/constants/constexpr.cc: Remove comment.
* testsuite/28_regex/constants/error_type.cc: Improve comment.
* testsuite/28_regex/constants/match_flag_type.cc: Check bitmask
requirements.
* testsuite/28_regex/constants/syntax_option_type.cc: Likewise.
libstdc++-v3/ChangeLog:
* include/bits/regex_compiler.tcc (_Compiler::_M_match_token):
Use reserved name for parameter.
* testsuite/17_intro/names.cc: Check "token".
The scripts/make_exports.pl script used for darwin only replaces '*'
wildcards in globs, it doesn't handle '?'. This means the recent changes
to std::__timepunct exports broke darwin.
Rather than use mangled names in the linker script, this adds support
for '?' to the perl script.
This also removes some unnecessary escaping of the replacement strings
in s// substitutions.
libstdc++-v3/ChangeLog:
* scripts/make_exports.pl: Replace '?' with '.' when turning
a glob into a regex.
Passing IncompleteType(&)[] to ranges::begin produces an error outside
the immediate context, which is fine for ranges::begin, but it means
that we fail to enforce the SFINAE-able constraints for ranges::size and
ranges::size. They should not be callable for any array of unknown
bound, whether the type is complete or not. Because we don't enforce
that in their constraints, we get a hard error when they try to use
ranges::begin.
This simply adds explicit checks for arrays of unknown bound to the
constraints for ranges::size and ranges::empty. We only need to check it
for the __sentinel_size and __eq_iter_empty concepts, because those are
the ones that are relevant to arrays, and which try to use
ranges::begin.
libstdc++-v3/ChangeLog:
* include/bits/ranges_base.h (ranges::size, ranges::empty): Add
explicit check for unbounded arrays before using ranges::begin.
* testsuite/std/ranges/access/empty.cc: Check handling of unbounded
arrays.
* testsuite/std/ranges/access/size.cc: Likewise.
The overload of std::regex_replace that takes a std::basic_string as the
fmt argument (for the replacement string) is implemented in terms of the
one taking a const C*, which uses std::char_traits to find the length.
That means it stops at a null character, even though the basic_string
might have additional characters beyond that.
Rather than duplicate the implementation of the const C* one for the
std::basic_string case, this moves that implementation to a new
__regex_replace function which takes a const C* and a length. Then both
the std::basic_string and const C* overloads can call that (with the
latter using char_traits to find the length to pass to the new
function).
libstdc++-v3/ChangeLog:
PR libstdc++/103664
* include/bits/regex.h (__regex_replace): Declare.
(regex_replace): Use it.
* include/bits/regex.tcc (__regex_replace): Replace regex_replace
definition with __regex_replace.
* testsuite/28_regex/algorithms/regex_replace/char/103664.cc: New test.
In the testcase for 103534 we get a warning about append leading to memcpy
of a very large number of bytes overflowing the buffer. This turns out to
be because we weren't calling _M_check_length for string append. Rather
than do that directly, let's go through the public pointer append that calls
it.
PR c++/103534
libstdc++-v3/ChangeLog:
* include/bits/basic_string.h (append (basic_string)): Call pointer
append instead of _M_append directly.
gcc/testsuite/ChangeLog:
* g++.dg/warn/Wstringop-overflow-8.C: New test.
This incremental patch adds std::time_get %r support (%p was added already
in the previous patch). The _M_am_fm_format method previously in the header
unfortunately had wrong arguments and so was useless, so the largest
complication in this patch is exporting a new symbol in the right symbol
version.
2021-12-10 Jakub Jelinek <jakub@redhat.com>
PR libstdc++/71367
* config/locale/dragonfly/time_members.cc (_M_initialize_timepunct):
Initialize "C" _M_am_pm_format to %I:%M:%S %p rather than empty
string.
* config/locale/gnu/time_members.cc (_M_initialize_timepunct):
Likewise.
* config/locale/generic/time_members.cc (_M_initialize_timepunct):
Likewise.
* include/bits/locale_facets_nonio.h (_M_am_pm_format): New method.
* include/bits/locale_facets_nonio.tcc (_M_extract_via_format): Handle
%r.
* config/abi/pre/gnu.ver (GLIBCXX_3.4.30): Export _M_am_pm_format
with const _CharT** argument, ensure it isn't exported in GLIBCXX_3.4.
* testsuite/22_locale/time_get/get/char/71367.cc: New test.
* testsuite/22_locale/time_get/get/wchar_t/71367.cc: New test.
The following patch is an attempt to fix various time_get related issues.
Sorry, it is long...
One of them is PR78714. It seems _M_extract_via_format has been written
with how strftime behaves in mind rather than how strptime behaves.
There is a significant difference between the two, for strftime %a and %A
behave differently etc., one emits an abbreviated name, the other full name.
For strptime both should behave the same and accept both the full or
abbreviated names. This needed large changes in _M_extract_name, which
was assuming the names are unique and names aren't prefixes of other names.
The _M_extract_name changes allow to deal with those cases. As can be
seen in the new testcase, e.g. for %b and english locales we need to
accept both Apr and April. If we see Apr in the input, the code looks
at whether there is end right after those 3 chars or if the next
character doesn't match characters in the longer names; in that case
it accepts the abbreviated name. Otherwise, if the input has Apri, it
commits to a longer name and fails if it isn't April. This behavior is
different from strptime, which for %bix and Aprix accepts it, but for
an input iterator I'm afraid we can't do better, we can't go back (peek
more than the current character).
Another case is that %d and %e in strptime should work the same, while
previously the code was hardcoding that %d would be 01 to 31 and %e
1 to 31 (with leading 0 replaced by space).
strptime POSIX 2009 documentation seems to suggest for numbers it should
accept up to the specified number of digits rather than exactly that number
of digits:
The pattern "[x,y]" indicates that the value shall fall within the range
given (both bounds being inclusive), and the maximum number of characters scanned
shall be the maximum required to represent any value in the range without leading
zeros.
so by my reading "1:" is valid for "%H:".
The glibc strptime implementation actually skips any amount of whitespace
in all the cases where a number is read, my current patch skips a single
space at the start of %d/%e but not the others, but doesn't subtract the
space length from the len characters.
One option would be to do the leading whitespace skipping in _M_extract_num
but take it into account how many digits can be read.
This matters for " 12:" and "%H:", but not for " 12:" and " %H:"
as in the latter case the space in the format string results in all the
whitespace at the start to be consumed.
Note, the allowing of a single digit rather than 2 changes a behavior in
other ways, e.g. when seeing 40 in a number for range [1, 31] we reject
it as before, but previously we'd keep *ret == '4' because it was assuming
it has to be 2 digits and 40 isn't valid, so we know error already on the
4, but now we accept the 4 as value and fail iff the next format string
doesn't match the 0.
Also, previously it wasn't really checking the number was in the right
range, it would accept 00 for [1, 31] numbers, or would accept 39.
Another thing is that %I was parsing 12 as tm_hour 12 rather than as tm_hour 0
like e.g. glibc does.
Another thing is that %t was matching a single tab and %n a single newline,
while strptime docs say it skips over whitespace (again, zero or more).
Another thing is that %p wasn't handled at all, I think this was the main
cause of
FAIL: 22_locale/time_get/get_time/char/2.cc execution test
FAIL: 22_locale/time_get/get_time/char/wrapped_env.cc execution test
FAIL: 22_locale/time_get/get_time/char/wrapped_locale.cc execution test
FAIL: 22_locale/time_get/get_time/wchar_t/2.cc execution test
FAIL: 22_locale/time_get/get_time/wchar_t/wrapped_env.cc execution test
FAIL: 22_locale/time_get/get_time/wchar_t/wrapped_locale.cc execution test
before this patch, because en_HK* locales do use %I and %p in it.
The patch handles %p only if it follows %I (i.e. when the hour is parsed
first), which is the more usual case (in glibc):
grep '%I' localedata/locales/* | grep '%I.*%p' | wc -l
282
grep '%I' localedata/locales/* | grep -v '%I.*%p' | wc -l
44
grep '%I' localedata/locales/* | grep -v '%p' | wc -l
17
The last case use %P instead of %p in t_fmt_ampm, not sure if that one
is never used by strptime because %P isn't handled by strptime.
Anyway, the right thing to handle even %p%I would be to pass some state
around through all the _M_extract_via_format calls like glibc passes
struct __strptime_state
{
unsigned int have_I : 1;
unsigned int have_wday : 1;
unsigned int have_yday : 1;
unsigned int have_mon : 1;
unsigned int have_mday : 1;
unsigned int have_uweek : 1;
unsigned int have_wweek : 1;
unsigned int is_pm : 1;
unsigned int want_century : 1;
unsigned int want_era : 1;
unsigned int want_xday : 1;
enum ptime_locale_status decided : 2;
signed char week_no;
signed char century;
int era_cnt;
} s;
around. That is for the %p case used like:
if (s.have_I && s.is_pm)
tm->tm_hour += 12;
during finalization, but handles tons of other cases which it is unclear
if libstdc++ needs or doesn't need to handle, e.g. strptime if one
specifies year and yday computes wday/mon/day from it, etc. basically for
the redundant fields computes them from other fields if those have been
parsed and are sufficient to determine it.
To do this we'd need to change ABI for the _M_extract_via_format,
though sure, we could add a wrapper around the new one with the old
arguments that would just use a dummy state. And we'd need a new
_M_whatever finalizer that would do those post parsing tweaks.
Also, %% wasn't handled.
For a whitespace in the strings there was inconsistent behavior,
_M_extract_via_format would require exactly that whitespace char (say
matching space, or matching tab), while the caller follows what
https://eel.is/c++draft/locale.time.get#members-8.5 says, that
when encountering whitespace it skips whitespace in the format and
then whitespace in the input if any. I've changed _M_extract_via_format
to skip whitespace in the input (looping over format isn't IMHO necessary,
because next iteration of the loop will handle that too).
Tested on x86_64-linux by make check-target-libstdc++-v3, ok for trunk
if it passes full bootstrap/regtest?
For the new 3.cc testcases, I have included hopefully correctly
corresponding C testcase using strptime in an attachment, and to the
extent where it can be compared (e.g. strptime on failure just
returns NULL, doesn't tell where it exactly stopped) I think the
only difference is that
str = "Novembur";
format = "%bembur";
ret = strptime (str, format, &time);
case where strptime accepts it but there is no way to do it with input
operator.
I admit I don't have libc++ or other STL libraries around to be able to
check how much the new 3.cc matches or disagrees with other implementations.
Now, the things not handled by this patch but which should be fixed (I
probably need to go back to compiler work) or at least looked at:
1) seems %j, %r, %U, %w and %W aren't handled (not sure if all of them
are already in POSIX 2009 or some are later)
2) I haven't touched the %y/%Y/%C and year handling stuff, that is
definitely not matching what POSIX 2009 says:
C All but the last two digits of the year {2}; leading zeros shall be permitted but shall not be required. A leading '+' or '−' character shall be permitted before
any leading zeros but shall not be required.
y The last two digits of the year. When format contains neither a C conversion specifier nor a Y conversion specifier, values in the range [69,99] shall refer to
years 1969 to 1999 inclusive and values in the range [00,68] shall refer to years 2000 to 2068 inclusive; leading zeros shall be permitted but shall not be re‐
quired. A leading '+' or '−' character shall be permitted before any leading zeros but shall not be required.
Note: It is expected that in a future version of this standard the default century inferred from a 2-digit year will change. (This would apply to all commands
accepting a 2-digit year as input.)
Y The full year {4}; leading zeros shall be permitted but shall not be required. A leading '+' or '−' character shall be permitted before any leading zeros but
shall not be required.
I've tried to avoid making changes to _M_extract_num for these as well
to keep current status quo (the __len == 4 cases). One thing is what
to do for things with %C %y and/or %Y in the formats, another thing
is what to do in the methods that directly perform _M_extract_num
for year
3) the above question what to do for leading whitespace of any numbers
being parsed
4) the %p%I issue mentioned above and generally what to do if we
pass state and have finalizers at the end of parsing
5) _M_extract_via_format is also inconsistent with its callers on handling
the non-whitespace characters in between format specifiers, the caller
follows https://eel.is/c++draft/locale.time.get#members-8.6 and does
case insensitive comparison:
// TODO real case-insensitive comparison
else if (__ctype.tolower(*__s) == __ctype.tolower(*__fmt) ||
__ctype.toupper(*__s) == __ctype.toupper(*__fmt))
while _M_extract_via_format only compares exact characters:
// Verify format and input match, extract and discard.
if (__format[__i] == *__beg)
++__beg;
(another question is if there is a better way how to do real
case-insensitive comparison of 2 characters and whether we e.g. need
to handle the Turkish i/İ and ı/I which have different number of bytes
in UTF-8)
6) _M_extract_name does something weird for case-sensitivity,
// NB: Some of the locale data is in the form of all lowercase
// names, and some is in the form of initially-capitalized
// names. Look for both.
if (__beg != __end)
and
if (__c == __names[__i1][0]
|| __c == __ctype.toupper(__names[__i1][0]))
for the first letter while just
__name[__pos] == *__beg
on all the following letters. strptime says:
In case a text string (such as the name of a day of the week or a month
name) is to be matched, the comparison is case insensitive.
so supposedly all the _M_extract_name comparisons should be case
insensitive.
2021-12-10 Jakub Jelinek <jakub@redhat.com>
PR libstdc++/78714
* include/bits/locale_facets_nonio.tcc (_M_extract_via_format):
Mention in function comment it interprets strptime format string
rather than strftime. Handle %a and %A the same by accepting both
full and abbreviated names. Similarly handle %h, %b and %B the same.
Handle %d and %e the same by accepting possibly optional single space
and 1 or 2 digits. For %I store tm_hour 0 instead of tm_hour 12. For
%t and %n skip any whitespace. Handle %p and %%. For whitespace in
the string skip any whitespace.
(_M_extract_num): For __len == 2 accept 1 or 2 digits rather than
always 2. Don't punt early if __value * __mult is larget than __max
or smaller than __min - __mult, instead punt if __value > __max.
At the end verify __value is in between __min and __max and punt
otherwise.
(_M_extract_name): Allow non-unique names or names which are prefixes
of other names. Don't recompute lengths of names for every character.
* testsuite/22_locale/time_get/get/char/3.cc: New test.
* testsuite/22_locale/time_get/get/wchar_t/3.cc: New test.
* testsuite/22_locale/time_get/get_date/char/12791.cc (test01): Use
62 instead 60 and expect 6 to be accepted and thus *ret01 == '2'.
* testsuite/22_locale/time_get/get_date/wchar_t/12791.cc (test01):
Similarly.
* testsuite/22_locale/time_get/get_time/char/2.cc (test02): Add " PM"
to the string.
* testsuite/22_locale/time_get/get_time/char/5.cc (test01): Expect
tm_hour 1 rather than 0.
* testsuite/22_locale/time_get/get_time/wchar_t/2.cc (test02): Add
" PM" to the string.
* testsuite/22_locale/time_get/get_time/wchar_t/5.cc (test01): Expect
tm_hour 1 rather than 0.