The proposed resolution for this library issue simplifies the
constraints for compare_three_way, ranges::equal_to, ranges::less etc.
so that they do not work with types which are convertible to pointers
but which fail to meet the usual syntactic requirements for the
comparisons.
This affects the example in PR libstdc++/93628 but doesn't fix the
problem described in that report.
libstdc++-v3/ChangeLog:
* include/bits/ranges_cmp.h (__eq_builtin_ptr_cmp): Remove.
(ranges::equal_to, ranges::not_equal_to): Do not constrain
with __eq_builtin_ptr_cmp.
(ranges::less, ranges::greater, ranges::less_equal)
(ranges::greater_equal): Do not constrain with
__less_builtin_ptr_cmp.
* libsupc++/compare (compare_three_way): Do not constrain with
__3way_builtin_ptr_cmp.
* testsuite/18_support/comparisons/object/builtin-ptr-three-way.cc: Moved to...
* testsuite/18_support/comparisons/object/lwg3530.cc: ...here.
* testsuite/20_util/function_objects/range.cmp/lwg3530.cc: New test.
As can be seen on:
unsigned char f1 (unsigned char x, int y) { return std::rotl (x, y); }
unsigned char f2 (unsigned char x, int y) { return std::rotr (x, y); }
unsigned short f3 (unsigned short x, int y) { return std::rotl (x, y); }
unsigned short f4 (unsigned short x, int y) { return std::rotr (x, y); }
unsigned int f5 (unsigned int x, int y) { return std::rotl (x, y); }
unsigned int f6 (unsigned int x, int y) { return std::rotr (x, y); }
unsigned long int f7 (unsigned long int x, int y) { return std::rotl (x, y); }
unsigned long int f8 (unsigned long int x, int y) { return std::rotr (x, y); }
unsigned long long int f9 (unsigned long long int x, int y) { return std::rotl (x, y); }
unsigned long long int f10 (unsigned long long int x, int y) { return std::rotr (x, y); }
//unsigned __int128 f11 (unsigned __int128 x, int y) { return std::rotl (x, y); }
//unsigned __int128 f12 (unsigned __int128 x, int y) { return std::rotr (x, y); }
constexpr auto a = std::rotl (1234U, 0);
constexpr auto b = std::rotl (1234U, 5);
constexpr auto c = std::rotl (1234U, -5);
constexpr auto d = std::rotl (1234U, -__INT_MAX__ - 1);
the current <bit> definitions of std::__rot[lr] aren't pattern recognized
as rotates; they are too long/complex for that, starting with signed modulo,
a special case for 0 and different cases for positive and negative.
For types with power of two bits the following patch adds definitions that
the compiler can pattern recognize and turn e.g. on x86_64 into ro[lr][bwlq]
instructions. For weirdo types like unsigned __int20 etc. it keeps the
current definitions.
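As a rough illustration of the kind of definition a compiler can recognize as a rotate, the sketch below handles only power-of-two widths. The name rotl_sketch is hypothetical and the body is not claimed to match the patch; it reduces the shift count with an unsigned modulo, so no special case for 0 and no branch on the sign of the count is needed.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <type_traits>

// Hypothetical helper for illustration, not the libstdc++ definition.
template<typename T>
constexpr T rotl_sketch(T x, int s) noexcept
{
  static_assert(std::is_unsigned<T>::value, "unsigned types only");
  constexpr unsigned nd = std::numeric_limits<T>::digits;
  static_assert((nd & (nd - 1)) == 0, "width must be a power of two");

  // Converting to unsigned makes the modulo well defined for negative
  // counts; because nd divides 2^32, r % nd is the correct residue.
  const unsigned r = static_cast<unsigned>(s);
  return static_cast<T>((x << (r % nd)) | (x >> ((0u - r) % nd)));
}
```

A negative count rotates in the opposite direction, so rotl_sketch<std::uint32_t>(1u, -1) yields 0x80000000.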
2021-03-06 Jakub Jelinek <jakub@redhat.com>
PR libstdc++/99396
* include/std/bit (__rotl, __rotr): Add optimized variants for power of
two _Nd which the compiler can pattern match as rotates.
This seems to be a typo/thinko in the definition of the arrays used as
storage.
libstdc++-v3/ChangeLog:
PR libstdc++/99382
* testsuite/20_util/specialized_algorithms/uninitialized_default_n/sizes.cc:
Make storage larger than required. Verify no write to the last
element.
* testsuite/20_util/specialized_algorithms/uninitialized_value_construct_n/sizes.cc:
Likewise.
The following patch updates the Solaris baselines for GCC 11.1. There's
only one caveat: comparing the Solaris 11.3 and 11.4 baselines, I find
+FUNC:_ZSt10from_charsPKcS0_RdSt12chars_format@@GLIBCXX_3.4.29
+FUNC:_ZSt10from_charsPKcS0_ReSt12chars_format@@GLIBCXX_3.4.29
+FUNC:_ZSt10from_charsPKcS0_RfSt12chars_format@@GLIBCXX_3.4.29
i.e.
std::from_chars(char const*, char const*, double&, std::chars_format)
and similarly for long double, float. Those are from
src/c++17/floating_from_chars.cc and only defined if
_GLIBCXX_HAVE_USELOCALE, i.e. depend on the XPG7 addition. Given that
only Solaris 11.4 supports XPG7, I've taken the 11.3 baselines to avoid
having separate ones for 11.3 and 11.4.
Tested on i386-pc-solaris2.11 and sparc-sun-solaris2.11 (sparc and x86,
32 and 64-bit, 11.3 and 11.4).
2021-02-10 Rainer Orth <ro@CeBiTec.Uni-Bielefeld.DE>
libstdc++-v3:
* config/abi/post/i386-solaris/baseline_symbols.txt: Regenerate.
* config/abi/post/i386-solaris/amd64/baseline_symbols.txt:
Likewise.
* config/abi/post/sparc-solaris/baseline_symbols.txt: Likewise.
* config/abi/post/sparc-solaris/sparcv9/baseline_symbols.txt:
Likewise.
Two simd tests FAIL on Solaris, both SPARC and x86:
FAIL: experimental/simd/standard_abi_usable.cc -msse2 -O2 -Wno-psabi (test for excess errors)
FAIL: experimental/simd/standard_abi_usable_2.cc -msse2 -O2 -Wno-psabi (test for excess errors)
This happens because the simd headers use identifiers documented in the
libstdc++ manual as reserved by system headers.
Fixed as follows, tested on i386-pc-solaris2.11, sparc-sun-solaris2.11,
and x86_64-pc-linux-gnu.
2021-02-01 Rainer Orth <ro@CeBiTec.Uni-Bielefeld.DE>
libstdc++-v3:
* include/experimental/bits/simd.h: Replace reserved _X, _B by
_Xp, _Bp.
* include/experimental/bits/simd_builtin.h: Likewise.
* include/experimental/bits/simd_x86.h: Likewise.
The conversions to integer types are explicit, so need to use the
correct type. Converting to uint32_t only works if that is the same type
as unsigned.
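The effect can be reproduced with a small stand-in type. In this sketch month_like is hypothetical (it is not chrono::month) and offers only an explicit conversion to unsigned; a direct cast to a different integer type is then unavailable, so the value has to go through unsigned first.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for a type whose only integer conversion is an
// explicit conversion to unsigned.
struct month_like
{
  unsigned value;
  constexpr explicit operator unsigned() const noexcept { return value; }
};

// The explicit conversion is usable only when the target is unsigned...
static_assert(std::is_constructible<unsigned, month_like>::value, "");
static_assert(!std::is_constructible<unsigned long long, month_like>::value, "");

// ...so convert to unsigned first, then to the wanted integer type.
constexpr unsigned long long widen_sketch(month_like m) noexcept
{ return static_cast<unsigned long long>(static_cast<unsigned>(m)); }
```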
libstdc++-v3/ChangeLog:
PR libstdc++/99301
* include/std/chrono (year_month_day::_M_days_since_epoch()):
Convert chrono::month and chrono::day to unsigned before
converting to uint32_t.
Implement P1682R2 as just approved for C++23.
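For readers unfamiliar with the facility, a minimal sketch of what such a function does follows. The names to_underlying_sketch and colour_sketch are hypothetical, and the code is an illustration rather than the definition added to <utility>.

```cpp
#include <cassert>
#include <type_traits>

// Converts an enumerator to the value of its underlying integer type.
template<typename E>
constexpr typename std::underlying_type<E>::type
to_underlying_sketch(E e) noexcept
{ return static_cast<typename std::underlying_type<E>::type>(e); }

// Hypothetical enumeration used only for the example.
enum class colour_sketch : unsigned char { red = 1, green = 2 };

static_assert(to_underlying_sketch(colour_sketch::green) == 2, "");
```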
libstdc++-v3/ChangeLog:
* include/std/utility (to_underlying): Define.
* include/std/version (__cpp_lib_to_underlying): Define.
* testsuite/20_util/to_underlying/1.cc: New test.
* testsuite/20_util/to_underlying/version.cc: New test.
The code path in __floating_to_chars_precision for handling long double
by going through printf now also handles __float128, so the condition
that guards this code path needs to get updated accordingly.
libstdc++-v3/ChangeLog:
* src/c++17/floating_to_chars.cc (__floating_to_chars_precision):
Relax the condition that guards the printf code path to accept
F128_type as well as long double.
This patch reimplements std::chrono::year_month_day_last::day() which yields the
last day of a particular month. The current implementation uses a look-up table
implemented as an unsigned[12] array. The new implementation instead is based on
the fact that a month m in [1, 12], except for m == 2 (February), is either 31 or
30 days long and m's length depends on two things: m's parity and whether m >= 8
or not. These two conditions are determined by the 0th and 3rd bits of m and,
therefore, cheap and straightforward bit-twiddling can provide the right result.
Measurements in x86_64 [1] suggest a 10% performance boost. Although this does
not seem to be huge, notice that measurements are done in hot L1 cache
conditions which might not be very representative of production runs. Also
freeing L1 cache from holding the look-up table might allow performance
improvements elsewhere.
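The bit-twiddling described above can be sketched as follows. The function names are hypothetical and the leap-year flag is passed in by the caller; this is one way to express the idea, not necessarily the exact expression used in the patch.

```cpp
#include <cassert>

// Hypothetical free function; `is_leap` is supplied by the caller.
constexpr unsigned last_day_sketch(unsigned m, bool is_leap) noexcept
{
  // For m != 2: bit 0 of m (parity) XOR bit 3 of m (m >= 8) is 1 for
  // 31-day months and 0 for 30-day months; OR-ing with 30 gives the length.
  return m != 2 ? (((m ^ (m >> 3)) & 1u) | 30u)
                : (is_leap ? 29u : 28u);
}

// Cross-check against a plain look-up table.
inline bool matches_table_sketch() noexcept
{
  const unsigned table[12] = {31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31};
  for (unsigned m = 1; m <= 12; ++m)
    if (last_day_sketch(m, false) != table[m - 1])
      return false;
  return last_day_sketch(2, true) == 29;
}
```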
References:
[1] https://github.com/cassioneri/calendar
libstdc++-v3/ChangeLog:
* include/std/chrono (year_month_day_last::day): New
implementation.
This patch reimplements std::chrono::year::is_leap(). Leap year check is
ubiquitously implemented (including here) as:
y % 4 == 0 && (y % 100 != 0 || y % 400 == 0).
The rationale being that testing divisibility by 4 first implies an earlier
return for 75% of the cases, therefore, avoiding the needless calculations of
y % 100 and y % 400. Although this fact is true, it does not take into account
the cost of branching. This patch, instead, tests divisibility by 100 first:
(y % 100 != 0 || y % 400 == 0) && y % 4 == 0.
It is certainly counterintuitive that this could be more efficient since among
the three divisibility tests (4, 100 and 400) the one by 100 is the only one
that can never provide a definitive answer and a second divisibility test (by 4
or 400) is always required. However, measurements [1] in x86_64 suggest this is
3x more efficient! A possible explanation is that checking divisibility by 100
first implies a split in the execution path with probabilities of (1%, 99%)
rather than (25%, 75%) when divisibility by 4 is checked first. This decreases
the entropy of the branching distribution which seems to help prediction.
Given that y belongs to [-32767, 32767] [time.cal.year.members], a more
efficient algorithm [2] to check divisibility by 100 is used (instead of
y % 100 != 0). Measurements suggest that this optimization improves performance
by 20%.
The patch adds a test that exhaustively compares the result of this
implementation with the ubiquitous one for all y in [-32767, 32767]. Despite
its completeness, the test completes in a matter of seconds.
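A sketch of both ideas (testing divisibility by 100 first, and replacing y % 100 with a multiply-and-compare) is given below. The function names are hypothetical, and the constants are one set that works for the year range above; they are an assumption for this illustration and are not claimed to be the ones in the patch.

```cpp
#include <cassert>
#include <cstdint>

// The ubiquitous formulation, used here as the reference.
constexpr bool is_leap_classic(int y) noexcept
{ return y % 4 == 0 && (y % 100 != 0 || y % 400 == 0); }

// Hypothetical reordered check; valid for y in [-32767, 32767].
constexpr bool is_leap_sketch(int y) noexcept
{
  constexpr std::uint32_t multiplier = 42949673;  // ceil(2^32 / 100)
  constexpr std::uint32_t bound      = 42949669;
  constexpr std::uint32_t offset     = 536870800; // multiple of 100

  // Adding a multiple of 100 keeps divisibility and makes the value
  // non-negative; the product wraps modulo 2^32 and is below `bound`
  // only for multiples of 100 in this range.
  const bool multiple_of_100
    = multiplier * (static_cast<std::uint32_t>(y) + offset) < bound;
  return (!multiple_of_100 || y % 400 == 0) && y % 4 == 0;
}

// Exhaustive comparison over the full year range.
inline bool agrees_on_year_range_sketch() noexcept
{
  for (int y = -32767; y <= 32767; ++y)
    if (is_leap_sketch(y) != is_leap_classic(y))
      return false;
  return true;
}
```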
References:
[1] https://stackoverflow.com/a/60646967/1137388
[2] https://accu.org/journals/overload/28/155/overload155.pdf#page=16
libstdc++-v3/ChangeLog:
* include/std/chrono (year::is_leap): New implementation.
* testsuite/std/time/year/2.cc: New test.
This patch reimplements std::chrono::year_month_day::_M_days_since_epoch()
which calculates the number of elapsed days since 1970/01/01. The new
implementation is based on Proposition 6.2 of Neri and Schneider, "Euclidean
Affine Functions and Applications to Calendar Algorithms" available at
https://arxiv.org/abs/2102.06959.
The aforementioned paper benchmarks the implementation against several
counterparts, including libc++'s (which is identical to the current
implementation). The results, shown in Figure 3, indicate the new algorithm is
1.7 times faster than the current one.
The patch adds a test which loops through all dates in [-32767/01/01,
32767/12/31], and for each of them, gets the number of days and compares the
result against its expected value. The latter is calculated using a much
simpler and easier to understand algorithm which is also much slower.
The dates used in the test cover the full range of possible values
[time.cal.year.members]. Despite its completeness the test runs in a matter of
seconds.
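The general shape of such a calculation can be sketched as below: move to a year that starts on 1 March, then count the days before the year, before the month and within the month with simple affine expressions. The function name and the particular shift constants are assumptions chosen for this illustration (any multiple of 400 years that makes all intermediates non-negative would do); they are not claimed to match the patch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical free function; intended for years in [-32767, 32767].
constexpr std::int32_t
days_since_epoch_sketch(int y, unsigned m, unsigned d) noexcept
{
  // Shift by 82 * 400 years so every intermediate value is non-negative.
  constexpr std::uint32_t year_shift = 400u * 82u;
  // Days in the shifted eras plus days from 0000-03-01 to 1970-01-01.
  constexpr std::uint32_t day_shift = 146097u * 82u + 719468u;

  const std::uint32_t y1 = static_cast<std::uint32_t>(y) + year_shift;
  const std::uint32_t j  = m < 3;           // 1 for January and February
  const std::uint32_t y0 = y1 - j;          // year starting on 1 March
  const std::uint32_t m0 = j ? m + 12 : m;  // month in [3, 14]
  const std::uint32_t d0 = d - 1;

  const std::uint32_t q1 = y0 / 100;
  const std::uint32_t yc = 1461 * y0 / 4 - q1 + q1 / 4; // days before year
  const std::uint32_t mc = (979 * m0 - 2919) / 32;      // days before month
  return static_cast<std::int32_t>(yc + mc + d0 - day_shift);
}
```

For example, days_since_epoch_sketch(1970, 1, 1) is 0 and days_since_epoch_sketch(1969, 12, 31) is -1.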
libstdc++-v3/ChangeLog:
* include/std/chrono (year_month_day::_M_days_since_epoch):
New implementation.
* testsuite/std/time/year_month_day/4.cc: New test.
This patch reimplements std::chrono::year_month_day::_S_from_days() which
retrieves a date from the number of elapsed days since 1970/01/01. The new
implementation is based on Proposition 6.3 of Neri and Schneider, "Euclidean
Affine Functions and Applications to Calendar Algorithms" available at
https://arxiv.org/abs/2102.06959.
The aforementioned paper benchmarks the implementation against several
counterparts, including libc++'s (which is identical to the current
implementation). The results, shown in Figure 4, indicate the new algorithm is
2.2 times faster than the current one.
The patch adds a test which loops through all integers in [-12687428, 11248737],
and for each of them, gets the corresponding date and compares the result
against its expected value. The latter is calculated using a much simpler and
easier to understand algorithm which is also much slower.
The interval used in the test covers the full range of values for which a
roundtrip must work [time.cal.ymd.members]. Despite its completeness the test
runs in a matter of seconds.
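The inverse direction can be sketched in the same spirit: shift the day count so it is non-negative, then peel off the century, the year of the century, and the month and day with divisions of affine expressions. The names, the shift constants and the use of plain division for the year step are assumptions made for this illustration; the patch may use faster multiply-and-shift forms.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical result type for the example.
struct civil_sketch { int year; unsigned month; unsigned day; };

// Hypothetical free function; intended for [-12687428, 11248737].
constexpr civil_sketch from_days_sketch(std::int32_t days) noexcept
{
  constexpr std::uint32_t year_shift = 400u * 82u;
  constexpr std::uint32_t day_shift  = 146097u * 82u + 719468u;

  // Days since 1 March of the shifted year 0.
  const std::uint32_t r0 = static_cast<std::uint32_t>(days) + day_shift;
  // Century and day of century.
  const std::uint32_t n1 = 4 * r0 + 3;
  const std::uint32_t q1 = n1 / 146097;
  const std::uint32_t r1 = n1 % 146097 / 4;
  // Year of century and day of (March-based) year.
  const std::uint32_t n2 = 4 * r1 + 3;
  const std::uint32_t q2 = n2 / 1461;
  const std::uint32_t r2 = n2 % 1461 / 4;
  // Month in [3, 14] and zero-based day of month.
  const std::uint32_t n3 = 2141 * r2 + 197913;
  const std::uint32_t q3 = n3 / 65536;
  const std::uint32_t r3 = n3 % 65536 / 2141;

  // January and February belong to the next calendar year.
  const std::uint32_t j = r2 >= 306;
  return civil_sketch{
    static_cast<int>(100 * q1 + q2 + j) - static_cast<int>(year_shift),
    j ? q3 - 12 : q3,
    r3 + 1};
}
```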
libstdc++-v3/ChangeLog:
* include/std/chrono (year_month_day::_S_from_days): New
implementation.
* testsuite/std/time/year_month_day/3.cc: New test.
The long double std::to_chars testcase currently verifies the
correctness of its output by comparing it to that of printf, so if
there's a mismatch between to_chars and printf, the test FAILs. This
works well for the scientific, fixed and general formatting modes,
because the corresponding printf conversion specifiers (%e, %f and %g)
are rigidly specified.
But this doesn't work well for the hex formatting mode because the
corresponding printf conversion specifier %a is more flexibly specified.
For instance, the hexadecimal forms 0x1p+0, 0x2p-1, 0x4p-2 and 0x8p-3
are all equivalent and valid outputs of the %a specifier for the number 1.
The apparent freedom here is the choice of leading hex digit -- the
standard just requires that the leading hex digit is nonzero for
normalized numbers.
Currently, our hexadecimal formatting implementation uses 0/1/2 as the
leading hex digit for floating point types that have an implicit leading
mantissa bit, which in practice means all supported floating point types
except x86 long double. The latter type has a 64 bit mantissa with an
explicit leading mantissa bit, and for this type our implementation uses
the most significant four bits of the mantissa as leading hex digit.
This seems to be consistent with most printf implementations, but not
all, as PR98384 illustrates.
In order to avoid false-positive FAILs due to arbitrary disagreement
between to_chars and printf about the choice of leading hex digit, this
patch makes the testcase's verification via printf conditional on the
leading hex digits first agreeing. An additional verification step is
also added: round-tripping the output of to_chars through from_chars
should recover the value exactly.
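The equivalence of the different hexadecimal forms, and the kind of pre-check described above, can be illustrated with a short sketch. Both helper names are hypothetical, the second helper assumes non-negative values whose text starts with "0x", and neither is taken from the testcase.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Parses a hexadecimal floating-point string with the C library.
inline double parse_hex_sketch(const char* s) noexcept
{ return std::strtod(s, nullptr); }

// True if both strings start with "0x" followed by the same hex digit.
// Only in that case would a character-by-character comparison of two
// formatters' outputs be meaningful.
inline bool same_leading_hex_digit_sketch(const char* a, const char* b) noexcept
{
  return std::strncmp(a, "0x", 2) == 0
      && std::strncmp(b, "0x", 2) == 0
      && a[2] == b[2];
}
```

All four spellings of the number 1 mentioned above parse to the same value, yet only identical leading digits make a textual comparison fair.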
libstdc++-v3/ChangeLog:
PR libstdc++/98384
* testsuite/20_util/to_chars/long_double.cc: Include <optional>.
(test01): Simplify verifying the nearby values by using a
2-iteration loop and a dedicated output buffer to check that the
nearby values are different. Factor out the printf-based
verification into a local function, and check that the leading
hex digits agree before comparing to the output of printf. Also
verify the output by round-tripping it through from_chars.
This adds overloads of std::to_chars for powerpc64's __ieee128, so that
std::to_chars can be used for long double when -mabi=ieeelongdouble is
in use.
Eventually we'll want to extend these new overloads to work for
__float128 on all targets that support that type. For now, we're only
doing it for powerpc64 when the new long double type is supported in
parallel to the old long double type.
Additionally the existing std::to_chars overloads for long double
are given the right symbol version, resolving PR libstdc++/98389.
libstdc++-v3/ChangeLog:
PR libstdc++/98389
* config/abi/pre/gnu.ver (GLIBCXX_3.4.29): Do not match to_chars
symbols for long double arguments mangled as 'g'.
* config/os/gnu-linux/ldbl-extra.ver: Likewise.
* config/os/gnu-linux/ldbl-ieee128-extra.ver: Likewise.
* src/c++17/Makefile.am [GLIBCXX_LDBL_ALT128_COMPAT_TRUE]:
Use -mabi=ibmlongdouble for floating_to_chars.cc.
* src/c++17/Makefile.in: Regenerate.
* src/c++17/floating_to_chars.cc (floating_type_traits_binary128):
New type defining type traits of IEEE binary128 format.
(floating_type_traits<__float128>): Define specialization.
(floating_type_traits<long double>): Define in terms of
floating_type_traits_binary128 when appropriate.
(floating_to_shortest_scientific): Handle __float128.
(sprintf_ld): New function template for printing a long double
or __ieee128 value using sprintf.
(__floating_to_chars_shortest, __floating_to_chars_precision):
Use sprintf_ld.
(to_chars): Define overloads for __float128.
libstdc++-v3/ChangeLog:
PR c++/99074
* libsupc++/dyncast.cc (__dynamic_cast): Return null when
first argument is null.
gcc/testsuite/ChangeLog:
PR c++/99074
* g++.dg/warn/Wnonnull11.C: New test.
Because of LWG 467, std::char_traits<char>::lt compares the values
cast to unsigned char rather than char, so even when char is signed
we get unsigned comparison. std::char_traits<char>::compare uses
__builtin_memcmp and that works the same, but during constexpr evaluation
we were calling __gnu_cxx::char_traits<char_type>::compare. As
char_traits::lt is not virtual, __gnu_cxx::char_traits<char_type>::compare
used __gnu_cxx::char_traits<char_type>::lt rather than
std::char_traits<char>::lt and thus compared chars as signed if char is
signed.
This change fixes it by inlining __gnu_cxx::char_traits<char_type>::compare
into std::char_traits<char>::compare by hand, so that it calls the right
lt method.
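A minimal sketch of a comparison loop that orders characters as unsigned char, and is usable during constant evaluation, might look like this. The names lt_sketch and compare_sketch are hypothetical and the code is only an illustration of the approach, not the code in char_traits.h.

```cpp
#include <cassert>
#include <cstddef>

// Orders characters by their value as unsigned char (per LWG 467).
constexpr bool lt_sketch(char a, char b) noexcept
{ return static_cast<unsigned char>(a) < static_cast<unsigned char>(b); }

// Comparison loop written out directly so that it always uses lt_sketch.
constexpr int
compare_sketch(const char* s1, const char* s2, std::size_t n) noexcept
{
  for (std::size_t i = 0; i < n; ++i)
    {
      if (lt_sketch(s1[i], s2[i]))
        return -1;
      if (lt_sketch(s2[i], s1[i]))
        return 1;
    }
  return 0;
}

// 0x7f orders before 0x80 even where plain char is signed.
static_assert(compare_sketch("\x7f", "\x80", 1) < 0, "");
```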
2021-02-23 Jakub Jelinek <jakub@redhat.com>
PR libstdc++/99181
* include/bits/char_traits.h (char_traits<char>::compare): For
constexpr evaluation don't call
__gnu_cxx::char_traits<char_type>::compare but do the comparison loop
directly.
* testsuite/21_strings/char_traits/requirements/char/99181.cc: New
test.
In GCC 10, parallel_backend.h just included parallel_backend_{serial,tbb}.h and
did nothing beyond that, and parallel_backend_tbb.h provided directly
namespace __pstl { namespace __par_backend { ... } }
and defined everything in there, while parallel_backend_serial.h did:
namespace __pstl { namespace __serial { ... } } and had this
namespace __pstl { namespace __par_backend { using namespace __pstl::__serial; } }
at the end.
In GCC 11, parallel_backend.h does:
namespace __pstl { namespace __par_backend = __serial_backend; }
after including parallel_backend_serial.h or
namespace __pstl { namespace __par_backend = __tbb_backend; }
after including parallel_backend_tbb.h. The latter then has:
namespace __pstl { namespace __tbb_backend { ... } }
and no using etc. at the end, while parallel_backend_serial.h changed to:
namespace __pstl { namespace __serial_backend { ... } }
but has this leftover block from the GCC 10 times. Even changing that
using namespace __pstl::__serial;
to
using namespace __pstl::__serial_backend;
doesn't work, as it clashes with
namespace __pstl { namespace __par_backend = __serial_backend; }
in parallel_backend.h.
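The GCC 11 arrangement can be sketched with hypothetical names as follows. The comment marks where a leftover namespace definition of the same name would conflict with the alias.

```cpp
#include <cassert>

// Hypothetical names; they only mirror the structure described above.
namespace pstl_sketch { namespace serial_backend_sketch {
  inline int backend_id() noexcept { return 1; }
} }

// The generic name is an alias for the chosen backend.
namespace pstl_sketch { namespace par_backend_sketch = serial_backend_sketch; }

// Also defining a real namespace with the alias's name in the same scope,
// for instance
//   namespace pstl_sketch { namespace par_backend_sketch { /* ... */ } }
// is rejected by the compiler, because the name cannot be both an alias
// and a namespace of its own.
```

Code that uses the generic name, such as pstl_sketch::par_backend_sketch::backend_id(), then reaches the selected backend through the alias alone.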
2021-02-23 Jakub Jelinek <jakub@redhat.com>
PR libstdc++/97549
* include/pstl/parallel_backend_serial.h: Remove __pstl::__par_backend.
The code in std::to_chars for extracting the high- and low-order parts
of an IBM long double value does the right thing on powerpc64le, but not
on powerpc64be. This patch makes the extraction endian-agnostic, which
fixes the execution FAIL of to_chars/long_double.cc on powerpc64be.
libstdc++-v3/ChangeLog:
PR libstdc++/98384
* src/c++17/floating_to_chars.cc (get_ieee_repr): Extract
the high- and low-order parts from an IBM long double value
in an endian-agnostic way.
My recent change to the preprocessor conditions in __thread_relax() was
supposed to also change the __gthread_yield() call to __thread_yield(),
which has the right preprocessor checks. Instead I just removed the
check for _GLIBCXX_USE_SCHED_YIELD which means the __gthread_yield()
call will be ill-formed for non-gthreads targets, and targets without
sched_yield(). This fixes it properly.
libstdc++-v3/ChangeLog:
* include/bits/atomic_wait.h (__thread_relax()): Call
__thread_yield() not __gthread_yield().
The __gthread_yield() function is only defined for gthreads targets, so
check _GLIBCXX_HAS_GTHREADS before using it.
Also reorder __thread_relax and __thread_yield so that the former can
use the latter instead of repeating the same preprocessor checks.
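The layering can be sketched as below. SKETCH_HAS_YIELD is a hypothetical macro standing in for the library's configuration checks, std::this_thread::yield() stands in for the gthreads call, and the x86 pause builtin is an assumption about what a relax operation might use; none of this is the code in atomic_wait.h.

```cpp
#include <cassert>
#include <thread>

// Hypothetical configuration macro standing in for the library's checks.
#ifndef SKETCH_HAS_YIELD
# define SKETCH_HAS_YIELD 1
#endif

// All knowledge about whether yielding is available lives here.
inline bool thread_yield_sketch() noexcept
{
#if SKETCH_HAS_YIELD
  std::this_thread::yield();
  return true;
#else
  return false;
#endif
}

// Defined after thread_yield_sketch so it can reuse it instead of
// repeating the same preprocessor checks.
inline void thread_relax_sketch() noexcept
{
#if defined __i386__ || defined __x86_64__
  __builtin_ia32_pause();
#else
  thread_yield_sketch();
#endif
}
```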
libstdc++-v3/ChangeLog:
* include/bits/atomic_wait.h (__thread_yield()): Check
_GLIBCXX_HAS_GTHREADS before using __gthread_yield.
(__thread_relax()): Use __thread_yield() instead of repeating
the preprocessor checks for __gthread_yield.
The once_flag::_M_activate() function is only ever called immediately
after a call to once_flag::_M_passive(), and so in the non-gthreads case
it is impossible for _M_passive() to be true in the body of
_M_activate(). Add a check for it anyway, to avoid warnings about
missing return.
Also replace a non-reserved name with a reserved one.
libstdc++-v3/ChangeLog:
* include/std/mutex (once_flag::_M_activate()): Add explicit
return statement for passive case.
(once_flag::_M_finish(bool)): Use reserved name for parameter.
I forgot that the workaround is present in both filesystem::status and
filesystem::symlink_status. This restores it in the latter.
libstdc++-v3/ChangeLog:
PR libstdc++/88881
* src/c++17/fs_ops.cc (fs::symlink_status): Re-enable workaround.
The _wrename function won't overwrite an existing file, so use
MoveFileEx instead. That allows renaming directories over files, which
POSIX doesn't allow, so check for that case explicitly and report an
error.
Also document the deviation from the expected behaviour, and add a test
for filesystem::rename which was previously missing.
The Filesystem TS experimental::filesystem::rename doesn't have that
extra code to handle directories correctly, so the relevant parts of the
new test are not run on Windows.
libstdc++-v3/ChangeLog:
* doc/xml/manual/status_cxx2014.xml: Document implementation
specific properties of std::experimental::filesystem::rename.
* doc/xml/manual/status_cxx2017.xml: Document implementation
specific properties of std::filesystem::rename.
* doc/html/*: Regenerate.
* src/c++17/fs_ops.cc (fs::rename): Implement correct behaviour
for directories on Windows.
* src/filesystem/ops-common.h (__gnu_posix::rename): Use
MoveFileExW on Windows.
* testsuite/27_io/filesystem/operations/rename.cc: New test.
* testsuite/experimental/filesystem/operations/rename.cc: New test.
The helper function for creating new paths doesn't work well on Windows,
because the PID of a process started by Wine is very consistent and so
the same path gets created each time.
libstdc++-v3/ChangeLog:
* testsuite/util/testsuite_fs.h (nonexistent_path): Add
random number to the path.
libstdc++-v3/ChangeLog:
* include/experimental/internet (address_v6::to_string): Include
scope ID in string.
* testsuite/experimental/net/internet/address/v6/members.cc:
Test to_string() results.
This avoids some warnings when building with -fno-rtti because the
function parameters are only used when RTTI is enabled.
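The situation can be reproduced with a small hypothetical function whose parameters are only read when RTTI is available. The sketch uses the standard [[maybe_unused]] attribute; the patch itself may spell the attribute differently.

```cpp
#include <cassert>
#include <typeinfo>

// Hypothetical helper: the parameters are only used when RTTI is enabled,
// so they are marked to avoid unused-parameter warnings with -fno-rtti.
inline bool
same_type_sketch([[maybe_unused]] const std::type_info& a,
                 [[maybe_unused]] const std::type_info& b) noexcept
{
#if __cpp_rtti
  return a == b;
#else
  return false;
#endif
}
```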
libstdc++-v3/ChangeLog:
* include/bits/shared_ptr_base.h (__shared_ptr::_M_get_deleter):
Add unused attribute to parameter.
* src/c++11/shared_ptr.cc (_Sp_make_shared_tag::_S_eq):
Likewise.
The std::emit_on_flush manipulator depends on dynamic_cast, so fails
without RTTI.
The std::async code can't catch a forced_unwind exception when RTTI is
disabled, so it can't rethrow it either, and the test aborts.
libstdc++-v3/ChangeLog:
* testsuite/27_io/basic_ostream/emit/1.cc: Expect test to fail
if -fno-rtti is used.
* testsuite/30_threads/async/forced_unwind.cc: Expect test
to abort if -fno-rtti is used.