5d630cef4f
Don't invalidate the TB that contains the end of a zero-overhead loop when LBEG or LEND changes. Instead, encode the distance from the start of the page where the TB starts to LEND in the TB cs_base, and generate loopback code when the next PC matches the encoded LEND. The distance to a destination within the same page, or up to one maximum instruction length into the next page, is encoded literally; otherwise it is zero. The distance from LEND to LBEG is also encoded in cs_base: literally when it is less than 256, or as 0 otherwise. This allows TB chaining for the loopback branch at the end of a loop for the most common loop sizes.

With this change the resulting emulation speed is about 10% higher in softmmu mode on uClibc-ng and LTP tests. Emulation speed in linux user mode is a few percent lower because there is no direct TB chaining between different memory pages. Testing with a lower limit on the direct TB chaining range shows a gradual slowdown, reaching ~15% for a block size of 64 bytes and ~50% for a block size of 32 bytes.

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
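The encoding described in the commit message can be illustrated with a minimal, standalone C sketch. This is not the code from the commit: the function names, the bit layout of the two fields (`CSBASE_*` macros), and the `page_bits` and `max_insn_size` parameters are all assumptions made for illustration. Nesting the LBEG field inside the LEND range check is also a choice of this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical field layout, chosen for this sketch only. */
#define CSBASE_LEND_MASK      0x0000ffffu /* LEND offset from the TB's page start */
#define CSBASE_LBEG_OFF_MASK  0x00ff0000u /* LEND - LBEG, when below 256 */
#define CSBASE_LBEG_OFF_SHIFT 16

/*
 * Build a cs_base-like value for a TB that starts at pc.
 * page_bits and max_insn_size are assumed parameters, not QEMU names.
 */
static uint32_t encode_loop_state(uint32_t pc, uint32_t lbeg, uint32_t lend,
                                  unsigned page_bits, uint32_t max_insn_size)
{
    /* Keep the page small enough that the LEND field fits in 16 bits. */
    assert(page_bits >= 1 && page_bits <= 15 && max_insn_size < 256);

    uint32_t page_size = 1u << page_bits;
    uint32_t page_start = pc & ~(page_size - 1);
    /* Unsigned subtraction wraps to a huge value if LEND is below the page. */
    uint32_t lend_dist = lend - page_start;
    uint32_t cs_base = 0;

    /* Encode LEND only if it lies in this page or just past its end. */
    if (lend_dist < page_size + max_insn_size) {
        uint32_t lbeg_off = lend - lbeg;

        cs_base = lend_dist;
        /* Encode the loop length only when it fits in the 8-bit field. */
        if (lbeg_off < 256) {
            cs_base |= lbeg_off << CSBASE_LBEG_OFF_SHIFT;
        }
    }
    return cs_base;
}

/*
 * Return nonzero when next_pc matches the LEND encoded in cs_base, which is
 * the point where loopback code would be generated. A zero LEND field means
 * that no loopback is possible for this TB.
 */
static int next_pc_is_lend(uint32_t cs_base, uint32_t pc, uint32_t next_pc,
                           unsigned page_bits)
{
    uint32_t page_start = pc & ~((1u << page_bits) - 1);
    uint32_t lend_field = cs_base & CSBASE_LEND_MASK;

    return lend_field != 0 && next_pc - page_start == lend_field;
}
```

Because the value depends only on where LEND sits relative to the TB's page and on the loop length, a guest that rewrites LBEG or LEND ends up with a different lookup key rather than an invalidated TB, which is the effect the commit message describes.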
||
---|---|---|
core-dc232b | ||
core-dc233c | ||
core-de212 | ||
core-fsf | ||
core-sample_controller | ||
core-test_kc705_be | ||
core-dc232b.c | ||
core-dc233c.c | ||
core-de212.c | ||
core-fsf.c | ||
core-sample_controller.c | ||
core-test_kc705_be.c | ||
cpu-qom.h | ||
cpu.c | ||
cpu.h | ||
gdbstub.c | ||
helper.c | ||
helper.h | ||
import_core.sh | ||
Makefile.objs | ||
monitor.c | ||
op_helper.c | ||
overlay_tool.h | ||
translate.c | ||
xtensa-isa-internal.h | ||
xtensa-isa.c | ||
xtensa-isa.h | ||
xtensa-semi.c |