Chain the temporaries together via pointers intstead of indices.
The mem_reg value is now mem_base->reg. This will be important later.
This does require that the frame pointer have a global temporary
allocated for it. This is simple bar the existing reserved_regs check.
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Thus, use cpu_env as the parameter, not TCG_AREG0 directly.
Update all uses in the translators.
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Undo the workaround at b17a6d3390.
If there are lots of memory operations in a TB, the slow path code
can exceed the highwater reservation. Add a check within the loop.
Tested-by: Aurelien Jarno <aurelien@aurel32.net>
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Split the bits that require it to exec/log.h.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Message-id: 1452174932-28657-8-git-send-email-den@openvz.org
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Clean up includes so that osdep.h is included first and headers
which it implies are not included manually.
This commit was created with scripts/clean-includes.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 1453832250-766-16-git-send-email-peter.maydell@linaro.org
If there are a lot of guest memory ops in the TB, the amount of
code generated by tcg_out_tb_finalize could be well more than 1k.
In the short term, increase the reservation larger than any TB
seen in practice.
Reported-by: Aurelien Jarno <aurelien@aurel32.net>
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Tested-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
A simple typo in the variable to use when comparing vs the highwater mark.
Reports are that qemu can in fact segfault occasionally due to this mistake.
Signed-off-by: John Clarke <johnc@kirriwa.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Extend MIPS movcond implementation to support the SELNEZ/SELEQZ
instructions introduced in MIPS r6 (where MOVN/MOVZ have been removed).
Whereas the "MOVN/MOVZ rd, rs, rt" instructions have the following
semantics:
rd = [!]rt ? rs : rd
The "SELNEZ/SELEQZ rd, rs, rt" instructions are slightly different:
rd = [!]rt ? rs : 0
First we ensure that if one of the movcond input values is zero that it
comes last (we can swap the input arguments if we invert the condition).
This is so that it can exactly match one of the SELNEZ/SELEQZ
instructions and avoid the need to emit the other one.
Otherwise we emit the opposite instruction first into a temporary
register, and OR that into the result:
SELNEZ/SELEQZ TMP1, v2, c1
SELEQZ/SELNEZ ret, v1, c1
OR ret, ret, TMP1
Which does the following:
ret = cond ? v1 : v2
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Message-Id: <1443788657-14537-7-git-send-email-james.hogan@imgtec.com>
MIPSr6 adds several new integer multiply, divide, and modulo
instructions, and removes several pre-r6 encodings, along with the HI/LO
registers which were the implicit operands of some of those
instructions. Update TCG to use the new instructions when built for r6.
The new instructions actually map much more directly to the TCG ops, as
they only provide a single 32-bit half of the result and in a normal
general purpose register instead of HI or LO.
The mulu2_i32 and muls2_i32 operations are no longer appropriate for r6,
so they are removed from the TCG opcode table. This is because they
would need to emit two separate host instructions anyway (for the high
and low half of the result), which TCG can arrange automatically for us
in the absense of mulu2_i32/muls2_i32 by splitting it into mul_i32 and
mul*h_i32 TCG ops.
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Message-Id: <1443788657-14537-6-git-send-email-james.hogan@imgtec.com>
MIPSr6 encodes JR as JALR with zero as the link register, and the pre-r6
JR encoding is removed. Update TCG to use the new encoding when built
for r6.
We still use the old encoding for pre-r6, so as not to confuse return
prediction stack hardware which may detect only particular encodings of
the return instruction.
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Message-Id: <1443788657-14537-5-git-send-email-james.hogan@imgtec.com>
Add definition use_mips32r6_instructions to the MIPS TCG backend which
is constant 1 when built for MIPS release 6. This will be used to decide
between pre-R6 and R6 instruction encodings.
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Message-Id: <1443788657-14537-4-git-send-email-james.hogan@imgtec.com>
We already have a TLADDR_ARGS definition, so rearrange the order
slightly and use it in the definition of insn_start, instead of
having an #ifdef.
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Message-Id: <1443788657-14537-2-git-send-email-james.hogan@imgtec.com>
Restrict the size of code_gen_buffer to 2GB on ppc64, which
lets us assert that everything is reachable with addis+addi
from tb_ret_addr. This lets us use a max of 4 insns for goto_tb
instead of 7.
Emit the indirect branch portion of goto_tb up front, which
means we only have to update two insns to update any link.
With a 64-bit store, we can update the link atomically, which
may be required in future.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Changing the prologue to the beginning of the code_gen_buffer
changes the direction of the "return" branch. Need to change
the logic to match.
Signed-off-by: Richard Henderson <rth@twiddle.net>
We currently pre-compute an worst case code size for any TB, which
works out to be 122kB. Since the average TB size is near 1kB, this
wastes quite a lot of storage.
Instead, check for overflow in between generating code for each opcode.
The overhead of the check isn't measurable and wastage is minimized.
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
By putting the prologue at the end, we risk overwriting the
prologue should our estimate of maximum TB size. Given the
two different placements of the call to tcg_prologue_init,
move the high water mark computation into tcg_prologue_init.
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
It's no longer used, so tidy up everything reached by it.
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
It is no longer used, so tidy up everything reached by it.
This includes the gen_opc_* arrays, the search_pc parameter
and the inline gen_intermediate_code_internal functions.
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
We can now restore state without retranslation.
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
The gen_opc_* arrays are already redundant with the data stored in
the insn_start arguments. Transition restore_state_to_opc to use
data from the latter.
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Adjust all translators to respect it.
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
With an eye toward having this data replace the gen_opc_* arrays
that each target collects in order to enable restore_state_from_tb.
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
With an eye toward making it mandatory.
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Instead of computing mem_index and s_bits in both tcg_out_qemu_ld and
tcg_out_qemu_st function and passing them to tcg_out_tlb_load, directly
pass oi to the tcg_out_tlb_load function and compute mem_index and
s_bits there.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Somehow the tcg_out_addsub2 function ended-up in the middle of the
qemu_ld/st related functions. Move it with other arithmetics related
functions.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
The MIPS TCG backend implements qemu_ld with 64-bit targets using the v0
register (base) as a temporary to load the upper half of the QEMU TLB
comparator (see line 5 below), however this happens before the input
address is used (line 8 to mask off the low bits for the TLB
comparison, and line 12 to add the host-guest offset). If the input
address (addrl) also happens to have been placed in v0 (as in the second
column below), it gets clobbered before it is used.
addrl in t2 addrl in v0
1 srl a0,t2,0x7 srl a0,v0,0x7
2 andi a0,a0,0x1fe0 andi a0,a0,0x1fe0
3 addu a0,a0,s0 addu a0,a0,s0
4 lw at,9136(a0) lw at,9136(a0) set TCG_TMP0 (at)
5 lw v0,9140(a0) lw v0,9140(a0) set base (v0)
6 li t9,-4093 li t9,-4093
7 lw a0,9160(a0) lw a0,9160(a0) set addend (a0)
8 and t9,t9,t2 and t9,t9,v0 use addrl
9 bne at,t9,0x836d8c8 bne at,t9,0x836d838 use TCG_TMP0
10 nop nop
11 bne v0,t8,0x836d8c8 bne v0,a1,0x836d838 use base
12 addu v0,a0,t2 addu v0,a0,v0 use addrl, addend
13 lw t0,0(v0) lw t0,0(v0)
Fix by using TCG_TMP0 (at) as the temporary instead of v0 (base),
pushing the load on line 5 forward into the delay slot of the low
comparison (line 10). The early load of the addend on line 7 also needs
pushing even further for 64-bit targets, or it will clobber a0 before
we're done with it. The output for 32-bit targets is unaffected.
srl a0,v0,0x7
andi a0,a0,0x1fe0
addu a0,a0,s0
lw at,9136(a0)
-lw v0,9140(a0) load high comparator
li t9,-4093
-lw a0,9160(a0) load addend
and t9,t9,v0
bne at,t9,0x836d838
- nop
+ lw at,9140(a0) load high comparator
+lw a0,9160(a0) load addend
-bne v0,a1,0x836d838
+bne at,a1,0x836d838
addu v0,a0,v0
lw t0,0(v0)
Cc: qemu-stable@nongnu.org
Reviewed-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
This requires global visibility to common code. Move to tcg-common.
Cc: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Peter Crosthwaite <crosthwaite.peter@gmail.com>
Message-Id: <cb0340eba225ab4945aa6cf7c9013f33aa05bcf8.1441614289.git.crosthwaite.peter@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
tcg_op_defs (and the _max) are both needed by the TCI disassembler. For
multi-arch, tcg.c will be multiple-compiled (arch-obj) with its symbols
hidden from common code. So split the definition off to new file,
tcg-common.c which will remain a regular obj-y for use by both the TCI
disas as well as the multiple tcg.c's.
Cc: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Peter Crosthwaite <crosthwaite.peter@gmail.com>
Message-Id: <4b607425886d85aee65878e4935dfad46b3e6085.1441614289.git.crosthwaite.peter@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* qemu_mutex_lock_iothread "No such process" fix
* cutils: qemu_strto* wrappers
* iohandler.c simplification
* Many other fixes and misc patches.
And some MTTCG work (with Emilio's fixes squashed):
* Signal-free TCG kick
* Removing spinlock in favor of QemuMutex
* User-mode emulation multi-threading fixes/docs
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQEcBAABCAAGBQJV8Tk7AAoJEL/70l94x66Ds3QH/3bi0RRR2NtKIXAQrGo5tfuD
NPMu1K5Hy+/26AC6mEVNRh4kh7dPH5E4NnDGbxet1+osvmpjxAjc2JrxEybhHD0j
fkpzqynuBN6cA2Gu5GUNoKzxxTmi2RrEYigWDZqCftRXBeO2Hsr1etxJh9UoZw5H
dgpU3j/n0Q8s08jUJ1o789knZI/ckwL4oXK4u2KhSC7ZTCWhJT7Qr7c0JmiKReaF
JEYAsKkQhICVKRVmC8NxML8U58O8maBjQ62UN6nQpVaQd0Yo/6cstFTZsRrHMHL3
7A2Tyg862cMvp+1DOX3Bk02yXA+nxnzLF8kUe0rYo6llqDBDStzqyn1j9R0qeqA=
=nB06
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging
* Support for jemalloc
* qemu_mutex_lock_iothread "No such process" fix
* cutils: qemu_strto* wrappers
* iohandler.c simplification
* Many other fixes and misc patches.
And some MTTCG work (with Emilio's fixes squashed):
* Signal-free TCG kick
* Removing spinlock in favor of QemuMutex
* User-mode emulation multi-threading fixes/docs
# gpg: Signature made Thu 10 Sep 2015 09:03:07 BST using RSA key ID 78C7AE83
# gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>"
# gpg: aka "Paolo Bonzini <pbonzini@redhat.com>"
* remotes/bonzini/tags/for-upstream: (44 commits)
cutils: work around platform differences in strto{l,ul,ll,ull}
cpu-exec: fix lock hierarchy for user-mode emulation
exec: make mmap_lock/mmap_unlock globally available
tcg: comment on which functions have to be called with mmap_lock held
tcg: add memory barriers in page_find_alloc accesses
remove unused spinlock.
replace spinlock by QemuMutex.
cpus: remove tcg_halt_cond and tcg_cpu_thread globals
cpus: protect work list with work_mutex
scripts/dump-guest-memory.py: fix after RAMBlock change
configure: Add support for jemalloc
add macro file for coccinelle
configure: factor out adding disas configure
vhost-scsi: fix wrong vhost-scsi firmware path
checkpatch: remove tests that are not relevant outside the kernel
checkpatch: adapt some tests to QEMU
CODING_STYLE: update mixed declaration rules
qmp: Add example usage of strto*l() qemu wrapper
cutils: Add qemu_strtoull() wrapper
cutils: Add qemu_strtoll() wrapper
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
This patch introduces several helpers to pass return address
which points to the TB. Correct return address allows correct
restoring of the guest PC and icount. These functions should be used when
helpers embedded into TB invoke memory operations.
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru>
Message-Id: <20150710095650.13280.32255.stgit@PASHA-ISP>
Signed-off-by: Richard Henderson <rth@twiddle.net>
spinlock is only used in two cases:
* cpu-exec.c: to protect TranslationBlock
* mem_helper.c: for lock helper in target-i386 (which seems broken).
It's a pthread_mutex_t in user-mode, so we can use QemuMutex directly,
with an #ifdef. The #ifdef will be removed when multithreaded TCG
will need the mutex as well.
Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com>
Message-Id: <1439220437-23957-5-git-send-email-fred.konrad@greensocs.com>
Signed-off-by: Emilio G. Cota <cota@braap.org>
[Merge Emilio G. Cota's patch to remove volatile. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
When computing the TLB address we are likely to mask out the high
32-bits by using shr + and. We can use 32-bit instructions in that
case. This saves 2 bytes per TLB access.
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Message-Id: <1437306632-20655-1-git-send-email-aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
In ffc6372851, we swapped the guest
base to the address base register from the address index register.
Except that 31 in the base slot is SP not XZR, so we need to be
more intelligent about which reg gets placed in which slot.
Cc: qemu-stable@nongnu.org (v2.4.0)
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reported-by: Andreas Färber <afaerber@suse.de>
Signed-off-by: Richard Henderson <rth@twiddle.net>
guest_base must be used only in linux-user mode.
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Message-id: 1440757421-9674-1-git-send-email-laurent@vivier.eu
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
As we have removed CONFIG_USE_GUEST_BASE, we always use a guest base
and the macros GUEST_BASE and RESERVED_VA become useless: replace
them by their values.
Reviewed-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Message-Id: <1440420834-8388-1-git-send-email-laurent@vivier.eu>
Signed-off-by: Richard Henderson <rth@twiddle.net>
All tcg host architectures now support the guest base and as
there is no real performance lost, it can be always enabled.
Anyway, guest base use can be disabled lively by setting guest
base to 0.
CONFIG_USE_GUEST_BASE is defined as (USE_GUEST_BASE && USER_ONLY),
it should have to be replaced by CONFIG_USER_ONLY in non CONFIG_USER_ONLY
parts, but as some other parts are using !CONFIG_SOFTMMU I have chosen to
use !CONFIG_SOFTMMU instead.
Reviewed-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Message-Id: <1440373328-9788-2-git-send-email-laurent@vivier.eu>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Currently, we get to the slow path for any unaligned access in the
backend, because we effectively preserve the bottom address bits
below the alignment requirement when comparing with the TLB entry,
so any non-0 bit there will cause the compare to fail.
For the same number of instructions, we can instead add the access
size - 1 to the address and stick to clearing all the bottom bits.
That means that normal unaligned accesses will not fallback (the HW
will handle them fine). Only when crossing a page boundary well we
end up having a mismatch because we'll end up pointing to the next
page which cannot possibly be in that same TLB entry.
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Message-Id: <1437455978.5809.2.camel@kernel.crashing.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Softmmu unaligned load/stores currently goes through through the slow
path for two reasons:
- to support unaligned access on host with strict alignement
- to correctly handle accesses crossing pages
x86 is only concerned by the second reason. Unaligned accesses are
avoided by compilers, but are not uncommon. We therefore would like
to see them going through the fast path, if they don't cross pages.
For that we can use the fact that two adjacent TLB entries can't contain
the same page. Therefore accessing the TLB entry corresponding to the
first byte, but comparing its content to page address of the last byte
ensures that we don't cross pages. We can do this check without adding
more instructions in the TLB code (but increasing its length by one
byte) by using the LEA instruction to combine the existing move with the
size addition.
On an x86-64 host, this gives a 3% boot time improvement for a powerpc
guest and 4% for an x86-64 guest.
[rth: Tidied calculation of the offset mask]
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Message-Id: <1436467197-2183-1-git-send-email-aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Rather than allow arbitrary shift+trunc, only concern ourselves
with low and high parts. This is all that was being used anyway.
Signed-off-by: Richard Henderson <rth@twiddle.net>
They behave the same as ext32s_i64 and ext32u_i64 from the constant
folding and zero propagation point of view, except that they can't
be replaced by a mov, so we don't compute the affected value.
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Implement real ext_i32_i64 and extu_i32_i64 ops. They ensure that a
32-bit value is always converted to a 64-bit value and not propagated
through the register allocator or the optimizer.
Cc: Andrzej Zaborowski <balrogg@gmail.com>
Cc: Alexander Graf <agraf@suse.de>
Cc: Blue Swirl <blauwirbel@gmail.com>
Cc: Stefan Weil <sw@weilnetz.de>
Acked-by: Claudio Fontana <claudio.fontana@huawei.com>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
The tcg_gen_trunc_shr_i64_i32 function takes a 64-bit argument and
returns a 32-bit value. Directly call tcg_gen_op3 with the correct
types instead of calling tcg_gen_op3i_i32 and abusing the TCG types.
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
The op is sometimes named trunc_shr_i32 and sometimes trunc_shr_i64_i32,
and the name in the README doesn't match the name offered to the
frontends.
Always use the long name to make it clear it is a size changing op.
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Instead of using an enum which could be either a copy or a const, track
them separately. This will be used in the next patch.
Constants are tracked through a bool. Copies are tracked by initializing
temp's next_copy and prev_copy to itself, allowing to simplify the code
a bit.
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Add two accessor functions temp_is_const and temp_is_copy, to make the
code more readable and make code change easier.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
The tcg_temp_info structure uses 24 bytes per temp. Now that we emulate
vector registers on most guests, it's not uncommon to have more than 100
used temps. This means we have initialize more than 2kB at least twice
per TB, often more when there is a few goto_tb.
Instead used a TCGTempSet bit array to track which temps are in used in
the current basic block. This means there are only around 16 bytes to
initialize.
This improves the boot time of a MIPS guest on an x86-64 host by around
7% and moves out tcg_optimize from the the top of the profiler list.
[rth: Handle TCG_CALL_DUMMY_ARG]
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
By convention, on a 64-bit host TCG internally stores 32-bit constants
as sign-extended. This is not the case in the optimizer when a 32-bit
constant is folded.
This doesn't seem to have more consequences than suboptimal code
generation. For instance the x86 backend assumes sign-extended constants,
and in some rare cases uses a 32-bit unsigned immediate 0xffffffff
instead of a 8-bit signed immediate 0xff for the constant -1. This is
with a ppc guest:
before
------
---- 0x9f29cc
movi_i32 tmp1,$0xffffffff
movi_i32 tmp2,$0x0
add2_i32 tmp0,CA,CA,tmp2,r6,tmp2
add2_i32 tmp0,CA,tmp0,CA,tmp1,tmp2
mov_i32 r10,tmp0
0x7fd8c7dfe90c: xor %ebp,%ebp
0x7fd8c7dfe90e: mov %ebp,%r11d
0x7fd8c7dfe911: mov 0x18(%r14),%r9d
0x7fd8c7dfe915: add %r9d,%r10d
0x7fd8c7dfe918: adc %ebp,%r11d
0x7fd8c7dfe91b: add $0xffffffff,%r10d
0x7fd8c7dfe922: adc %ebp,%r11d
0x7fd8c7dfe925: mov %r11d,0x134(%r14)
0x7fd8c7dfe92c: mov %r10d,0x28(%r14)
after
-----
---- 0x9f29cc
movi_i32 tmp1,$0xffffffffffffffff
movi_i32 tmp2,$0x0
add2_i32 tmp0,CA,CA,tmp2,r6,tmp2
add2_i32 tmp0,CA,tmp0,CA,tmp1,tmp2
mov_i32 r10,tmp0
0x7f37010d490c: xor %ebp,%ebp
0x7f37010d490e: mov %ebp,%r11d
0x7f37010d4911: mov 0x18(%r14),%r9d
0x7f37010d4915: add %r9d,%r10d
0x7f37010d4918: adc %ebp,%r11d
0x7f37010d491b: add $0xffffffffffffffff,%r10d
0x7f37010d491f: adc %ebp,%r11d
0x7f37010d4922: mov %r11d,0x134(%r14)
0x7f37010d4929: mov %r10d,0x28(%r14)
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Message-Id: <1436544211-2769-2-git-send-email-aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
The add2 code in the tcg_out_addsub2 function doesn't take into account
the case where rl == al == bl. In that case we can't compute the carry
after the addition. As it corresponds to a multiplication by 2, the
carry bit is the bit 31.
While this is a corner case, this prevents x86-64 guests to boot on a
MIPS host.
Cc: qemu-stable@nongnu.org
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Commit 2b7ec66f fixed TCGMemOp masking following the MO_AMASK addition,
but two cases were forgotten in the TCG S390 backend.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Commit 2b7ec66f fixed TCGMemOp masking following the MO_AMASK addition,
but two cases were forgotten in the TCG MIPS backend.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
For 32-bit guest, we load a 32-bit address from the TLB, so there is no
need to compensate for the low or high part. This fixes 32-bit guests on
big-endian hosts.
Cc: qemu-stable@nongnu.org
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
When a constant has to be loaded in a mov op, we fail to set
mem_coherent = 0. This patch fixes that.
Cc: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Message-Id: <1437994568-7825-3-git-send-email-aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
When tcg_reg_alloc_mov propagate a constant, we failed to correctly mark
a temp as dead if the liveness analysis hints so. This fixes the
following assert when configure with --enable-debug-tcg:
qemu-x86_64: tcg/tcg.c:1827: tcg_reg_alloc_bb_end: Assertion `ts->val_type == TEMP_VAL_DEAD' failed.
Cc: Richard Henderson <rth@twiddle.net>
Reported-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Message-Id: <1437994568-7825-2-git-send-email-aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Due to a copy&paste, the new op value is tested against mov_i32 instead
of movi_i32. The test is therefore always false. Fix that.
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Message-Id: <1436544211-2769-1-git-send-email-aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Similar to the same fix for user-mode, except this instance
occurs on the softmmu path. Again, the tlb addend must be
the base register, while the guest address is the index.
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Thanks to the previous patch, it is now easy for tcg_out_qemu_ld and
tcg_out_qemu_st to use a 32-bit zero extended offset. However, the
guest base register x28 must be the base and addr_reg must be the
index.
Reported-by: Leon Alrae <leon.alrae@imgtec.com>
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <1436974021-28978-3-git-send-email-pbonzini@redhat.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
The new argument lets you pick uxtw or uxtx mode for the offset
register. For now, all callers pass TCG_TYPE_I64 so that uxtx
is generated. The bits for uxtx are removed from I3312_TO_I3310.
Reported-by: Leon Alrae <leon.alrae@imgtec.com>
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <1436974021-28978-2-git-send-email-pbonzini@redhat.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Commit 59227d5d45 did not update the
code in tcg/tci/tcg-target.c for those two cases.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Message-id: 1436556159-3002-1-git-send-email-sw@weilnetz.de
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Commit 3972ef6f83 ("tcg: Push merged memop+mmu_idx parameter to
softmmu routines") caused the following build errors when building TCG
for MIPS:
In file included from tcg/tcg.c:258:0:
tcg/mips/tcg-target.c In function ‘tcg_out_qemu_ld_slow_path’:
tcg/mips/tcg-target.c:1015:22: error: ‘lb’ undeclared (first use in this function)
tcg/mips/tcg-target.c In function ‘tcg_out_qemu_st_slow_path’:
tcg/mips/tcg-target.c:1058:22: error: ‘lb’ undeclared (first use in this function)
It looks like lb was meant to refer to the TCGLabelQemuLdst *l
parameter, so fix both references to lb to refer to just l.
Fixes: 3972ef6f83 ("tcg: Push merged memop+mmu_idx parameter to softmmu routines")
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Reviewed-by: Leon Alrae <leon.alrae@imgtec.com>
Message-id: 1436433435-24898-2-git-send-email-james.hogan@imgtec.com
Cc: Aurelien Jarno <aurelien@aurel32.net>
Cc: Leon Alrae <leon.alrae@imgtec.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Make sure to not modify the branch target. This ensure that the
branch target is not corrupted during partial retranslation.
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Tested-by: Alexander Graf <agraf@suse.de>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Alexander Graf <agraf@suse.de>
The usages of this define are pure TCG and there is no architecture
specific variation of the value. Localise it to the TCG engine to
remove another architecture agnostic piece from cpu-defs.h.
This follows on from a28177820a where
temp_buf was moved out of the CPU_COMMON obsoleting the need for
the super early definition.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Peter Crosthwaite <crosthwaite.peter@gmail.com>
Message-Id: <498e8e5325c1a1aff79e5bcfc28cb760ef6b214e.1433052532.git.crosthwaite.peter@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The tcg_constant_folding folding ends up doing all the optimizations
(which is a good thing to avoid looping on all ops multiple time), so
make it clear and just rename it tcg_optimize.
Cc: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Message-Id: <1433447607-31184-6-git-send-email-aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Most of the calls to tcg_opt_gen_mov are preceeded by a test to check if
the source temp is a constant. Fold that into the tcg_opt_gen_mov
function.
Cc: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Message-Id: <1433495958-9508-1-git-send-email-aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Each call to tcg_opt_gen_mov is preceeded by a test to check if the
source and destination temps are copies. Fold that into the
tcg_opt_gen_mov function.
Cc: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Message-Id: <1433447607-31184-4-git-send-email-aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
We can get the opcode using the TCGOp pointer. It needs to be
dereferenced, but it's anyway done a few lines below to write
the new value.
Cc: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Message-Id: <1433447607-31184-3-git-send-email-aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
We can get the opcode using the TCGOp pointer. It needs to be
dereferenced, but it's anyway done a few lines below to write
the new value.
Cc: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Message-Id: <1433447607-31184-2-git-send-email-aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
When the same temp is used twice or more as an input argument to a TCG
instruction, the dead computation code doesn't recognize the second use
as a dead temp. This is because the temp is marked as live in the same
loop where dead inputs are checked.
The fix is to split the loop in two parts. This avoid emitting a move
and using a register for the movcond instruction when used as "move if
true" on x86-64. This might bring more improvements on RISC TCG targets
which don't have outputs aliased to inputs.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Message-Id: <1433447228-29425-3-git-send-email-aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
For TCG ops with two outputs registers (add2, sub2, div2, div2u), when
the same input temp is used for the two inputs aliased to the two
outputs, and when these inputs are both dead, the register allocation
code wrongly assigned the same register to the same output.
This happens for example with sub2 t1, t2, t3, t3, t4, t5, when t3 is
not used anymore after the TCG op. In that case the same register is
used for t1, t2 and t3.
The fix is to look for already allocated aliased input when allocating
a dead aliased input and check that the register is not already
used.
Cc: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Message-Id: <1433447228-29425-2-git-send-email-aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
The addition of MO_AMASK means that places that used inverted masks
need to be changed to use positive masks, and places that failed to
mask the intended bits need updating.
Reviewed-by: Yongbok Kim <yongbok.kim@imgtec.com>
Tested-by: Yongbok Kim <yongbok.kim@imgtec.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
This will be used to size the TLB when more than 8 MMU modes are
used by the target. Limitations come from the limited size of
the immediate fields (which sometimes, as in the case of Aarch64,
extend to instructions that shift the immediate).
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <1424436345-37924-2-git-send-email-pbonzini@redhat.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Alexander Graf <agraf@suse.de>
tcg-target.h does not use any QEMU-specific symbols, save for tci's usage
of CPUArchState. Pull that up to tcg/tcg.h.
This will make it possible to include tcg-target.h in cpu-defs.h.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Alexander Graf <agraf@suse.de>
These modifiers control, on a per-memory-op basis, whether
unaligned memory accesses are allowed. The default setting
reflects the target's definition of ALIGNED_ONLY.
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
The extra information is not yet used but it is now available.
This requires minor changes through all of the tcg backends.
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
At the tcg opcode level, not at the tcg-op.h generator level.
This requires minor changes through all of the tcg backends,
but none of the cpu translators.
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
No code uses the cpu_pc_from_tb() function. Delete from tricore and
arm which each provide an unused implementation. Update the comment
in tcg.h to reflect that this is obsoleted by synchronize_from_tb.
Signed-off-by: Peter Crosthwaite <crosthwaite.peter@gmail.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Commit 951c6300f7 out-of-lined the 32-bit-host versions of
tcg_gen_{ld,st}_i64, but in the process it inadvertently changed
an #ifdef HOST_WORDS_BIGENDIAN to #ifdef TCG_TARGET_WORDS_BIGENDIAN.
Since the latter doesn't get defined anywhere this meant we always
took the "LE host" codepath, and stored the two halves of the value
in the wrong order on BE hosts. This typically breaks any 64-bit
guest on a 32-bit BE host completely, and will have possibly more
subtle effects even for 32-bit guests.
Switch the ifdef back to HOST_WORDS_BIGENDIAN.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Tested-by: Andreas Färber <afaerber@suse.de>
Message-id: 1428523029-13620-1-git-send-email-peter.maydell@linaro.org
As seen with ubuntu-5.10-live-powerpc.iso.
Reported-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Tested-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Bastian Koppelmann <kbastian@mail.uni-paderborn.de>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Pre-allocating 512 of them per TB is a waste.
Reviewed-by: Bastian Koppelmann <kbastian@mail.uni-paderborn.de>
Signed-off-by: Richard Henderson <rth@twiddle.net>
This is less about improved type checking than enabling a
subsequent change to the representation of labels.
Acked-by: Claudio Fontana <claudio.fontana@huawei.com>
Tested-by: Claudio Fontana <claudio.fontana@huawei.com>
Cc: Andrzej Zaborowski <balrogg@gmail.com>
Cc: Peter Maydell <peter.maydell@linaro.org>
Cc: Aurelien Jarno <aurelien@aurel32.net>
Cc: Blue Swirl <blauwirbel@gmail.com>
Cc: Stefan Weil <sw@weilnetz.de>
Reviewed-by: Bastian Koppelmann <kbastian@mail.uni-paderborn.de>
Signed-off-by: Richard Henderson <rth@twiddle.net>
This is improved type checking for the translators -- it's no longer
possible to accidentally swap arguments to the branch functions.
Note that the code generating backends still manipulate labels as int.
With notable exceptions, the scope of the change is just a few lines
for each target, so it's not worth building extra machinery to do this
change in per-target increments.
Cc: Peter Maydell <peter.maydell@linaro.org>
Cc: Edgar E. Iglesias <edgar.iglesias@gmail.com>
Cc: Michael Walle <michael@walle.cc>
Cc: Leon Alrae <leon.alrae@imgtec.com>
Cc: Anthony Green <green@moxielogic.com>
Cc: Jia Liu <proljc@gmail.com>
Cc: Alexander Graf <agraf@suse.de>
Cc: Aurelien Jarno <aurelien@aurel32.net>
Cc: Blue Swirl <blauwirbel@gmail.com>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Reviewed-by: Bastian Koppelmann <kbastian@mail.uni-paderborn.de>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Pre-allocating 640 of them per TB is a waste.
Reviewed-by: Bastian Koppelmann <kbastian@mail.uni-paderborn.de>
Signed-off-by: Richard Henderson <rth@twiddle.net>
We no longer need INDEX_op_end to terminate the list, nor do we
need 5 forms of nop, since we just remove the TCGOp instead.
Reviewed-by: Bastian Koppelmann <kbastian@mail.uni-paderborn.de>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Rather reserving space in the op stream for optimization,
let the optimizer add ops as necessary.
Reviewed-by: Bastian Koppelmann <kbastian@mail.uni-paderborn.de>
Signed-off-by: Richard Henderson <rth@twiddle.net>
With the linked list scheme we need not leave nops in the stream
that we need to process later.
Reviewed-by: Bastian Koppelmann <kbastian@mail.uni-paderborn.de>
Signed-off-by: Richard Henderson <rth@twiddle.net>
The previous setup required ops and args to be completely sequential,
and was error prone when it came to both iteration and optimization.
Reviewed-by: Bastian Koppelmann <kbastian@mail.uni-paderborn.de>
Signed-off-by: Richard Henderson <rth@twiddle.net>
The method by which we count the number of ops emitted
is going to change. Abstract that away into some inlines.
Reviewed-by: Bastian Koppelmann <kbastian@mail.uni-paderborn.de>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Almost completely eliminates the ifdefs in this file, improving
confidence in the lesser used 32-bit builds.
Reviewed-by: Bastian Koppelmann <kbastian@mail.uni-paderborn.de>
Signed-off-by: Richard Henderson <rth@twiddle.net>