7259a9d577
Bunch of micro optimizations for std::to_chars:

* For base == 8, replacing the lookup in the __digits table with arithmetic computations leads to the same number of CPU cycles per loop iteration (exchanges two movzx with 3 bit ops). However, this saves 129 bytes of data and totally avoids any chance of cache misses on __digits.
* For base == 16, replacing the lookup in the __digits table with arithmetic computations leads to a few additional instructions, but totally avoids any chance of cache misses on __digits (about 9 fewer cache misses in the worst case) and saves 513 bytes of const data.
* Replacing __first[pos] and __first[pos - 1] with __first[1] and __first[0] on the final iterations saves ~2% of code size.
* Removing the trailing '\0' from the arrays of digits allows the linker to merge the symbols (so that "0123456789abcdefghijklmnopqrstuvwxyz" and "0123456789abcdef" could share the same address). This improves data locality and reduces binary sizes.
* Using __detail::__to_chars_len_2 instead of the generic __detail::__to_chars_len makes the operation O(1) instead of O(N). It also makes the code two times shorter.

In sum: this significantly reduces the size of a binary (by about 4 KB for base-8 conversion alone) and deals with latency (CPU cache misses) without changing the iteration count and without adding costly instructions to the loops.

2019-08-30  Antony Polukhin  <antoshkka@gmail.com>

	* include/std/charconv (__detail::__to_chars_8,
	__detail::__to_chars_16): Replace array of precomputed digits with
	arithmetic operations to avoid CPU cache misses. Remove zero
	termination from array of digits to allow symbol merge with generic
	implementation of __detail::__to_chars. Replace final offsets with
	constants. Use __detail::__to_chars_len_2 instead of a generic
	__detail::__to_chars_len.
	(__detail::__to_chars): Remove zero termination from array of digits.
	(__detail::__to_chars_2): Leading digit is always '1'.

From-SVN: r275205
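The ideas in this message can be illustrated with a small standalone sketch. It is not the libstdc++ code: the names bit_width_u32, to_octal and to_hex are hypothetical, the functions return std::string instead of writing into a caller-supplied buffer, and __builtin_clz is assumed to be available (GCC or Clang). It shows an O(1) length computation from the bit width, base-8 digits produced by arithmetic alone, and base-16 digits taken from a 16-byte table without a trailing '\0'.

```cpp
#include <climits>
#include <string>

// Hypothetical helper: number of significant bits in val (0 for val == 0).
// Assumes the GCC/Clang builtin __builtin_clz; this plays the role that the
// message assigns to __detail::__to_chars_len_2.
inline unsigned bit_width_u32(unsigned val) {
    if (val == 0)
        return 0;
    return unsigned(sizeof(unsigned) * CHAR_BIT) - unsigned(__builtin_clz(val));
}

// Base 8: each digit is '0' plus three bits, so no lookup table is needed.
inline std::string to_octal(unsigned val) {
    if (val == 0)
        return "0";
    // O(1) length: one octal digit per started group of 3 bits.
    const unsigned len = (bit_width_u32(val) + 2) / 3;
    std::string out(len, '0');
    unsigned pos = len;
    while (val != 0) {
        out[--pos] = char('0' + (val & 7u));
        val >>= 3;
    }
    return out;
}

// Base 16: a 16-byte table of single digits, declared without a trailing
// '\0' (a brace list rather than a string literal).
inline std::string to_hex(unsigned val) {
    static const char digits[16] = {'0', '1', '2', '3', '4', '5', '6', '7',
                                    '8', '9', 'a', 'b', 'c', 'd', 'e', 'f'};
    if (val == 0)
        return "0";
    // O(1) length: one hex digit per started group of 4 bits.
    const unsigned len = (bit_width_u32(val) + 3) / 4;
    std::string out(len, '0');
    unsigned pos = len;
    while (val != 0) {
        out[--pos] = digits[val & 0xFu];
        val >>= 4;
    }
    return out;
}
```

Under these assumptions, to_octal(512) yields "1000" and to_hex(255) yields "ff". The sketch emits one digit per iteration for clarity and does not attempt to reproduce the loop structure or the constant final offsets mentioned in the message.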
||
---|---|---|
algorithm | ||
any | ||
array | ||
atomic | ||
bit | ||
bitset | ||
charconv | ||
chrono | ||
codecvt | ||
complex | ||
condition_variable | ||
deque | ||
execution | ||
filesystem | ||
forward_list | ||
fstream | ||
functional | ||
future | ||
iomanip | ||
ios | ||
iosfwd | ||
iostream | ||
istream | ||
iterator | ||
limits | ||
list | ||
locale | ||
map | ||
memory | ||
memory_resource | ||
mutex | ||
numbers | ||
numeric | ||
optional | ||
ostream | ||
queue | ||
random | ||
ratio | ||
regex | ||
scoped_allocator | ||
set | ||
shared_mutex | ||
sstream | ||
stack | ||
stdexcept | ||
streambuf | ||
string | ||
string_view | ||
system_error | ||
thread | ||
tuple | ||
type_traits | ||
typeindex | ||
unordered_map | ||
unordered_set | ||
utility | ||
valarray | ||
variant | ||
vector | ||
version |