No change to the ultimate load/store routines yet, so some atomicity
conditions not yet honored, but plumbs the change to alignment through
the relevant functions.
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
No change to the ultimate load/store routines yet, so some atomicity
conditions not yet honored, but plumbs the change to alignment through
the relevant functions.
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Examine MemOp for atomicity and alignment, adjusting alignment
as required to implement atomicity on the host.
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Now that tcg_out_helper_load_regs is not recursive, we can
merge it into its only caller, tcg_out_helper_load_slots.
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
With x86_64 as host, we do not have any temporaries with which to
resolve cycles, but we do have xchg. As a side bonus, the set of
graphs that can be made with 3 nodes and all nodes conflicting is
small: two. We can solve the cycle with a single temp.
This is required for x86_64 to handle stores of i128: 1 address
register and 2 data registers.
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Add opcodes for backend support for 128-bit memory operations.
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Replace the unparameterized TCG_TARGET_HAS_MEMORY_BSWAP macro
with a function with a memop argument.
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
The system is required to emulate unaligned accesses, even if the
hardware does not support it. The resulting trap may or may not
be more efficient than the qemu slow path. There are linux kernel
patches in flight to allow userspace to query hardware support;
we can re-evaluate whether to enable this by default after that.
In the meantime, softmmu now matches useronly, where we already
assumed that unaligned accesses are supported.
Reviewed-by: LIU Zhiwei <zhiwei_liu@linux.alibaba.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Test the final byte of an unaligned access.
Use BSTRINS.D to clear the range of bits, rather than AND.
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
This should be true of all loongarch64 running Linux.
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Drop the target-specific trampolines for the standard slow path.
This lets us use tcg_out_helper_{ld,st}_args, and handles the new
atomicity bits within MemOp.
At the same time, use the full load/store helpers for user-only mode.
Drop inline unaligned access support for user-only mode, as it does
not handle atomicity.
Use TCG_REG_T[1-3] in the tlb lookup, instead of TCG_REG_O[0-2].
This allows the constraints to be simplified.
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Emphasize that the constant is unsigned.
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Shuffle the order in tcg_out_movi_int to check s13 first, and
drop this check from tcg_out_movi_imm32. This might make the
sequence for in_prologue larger, but not worth worrying about.
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Emphasize that the constant is signed.
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Instead of using helper_unaligned_{ld,st}, use the full load/store helpers.
This will allow the fast path to increase alignment to implement atomicity
while not immediately raising an alignment exception.
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Instead of using helper_unaligned_{ld,st}, use the full load/store helpers.
This will allow the fast path to increase alignment to implement atomicity
while not immediately raising an alignment exception.
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Instead of using helper_unaligned_{ld,st}, use the full load/store helpers.
This will allow the fast path to increase alignment to implement atomicity
while not immediately raising an alignment exception.
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Always reserve r3 for tlb softmmu lookup. Fix a bug in user-only
ALL_QLDST_REGS, in that r14 is clobbered by the BLNE that leads
to the misaligned trap. Remove r0+r1 from user-only ALL_QLDST_REGS;
I believe these had been reserved for bswap, which we no longer
perform during qemu_st.
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Instead of using helper_unaligned_{ld,st}, use the full load/store helpers.
This will allow the fast path to increase alignment to implement atomicity
while not immediately raising an alignment exception.
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Instead of using helper_unaligned_{ld,st}, use the full load/store helpers.
This will allow the fast path to increase alignment to implement atomicity
while not immediately raising an alignment exception.
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Instead of using helper_unaligned_{ld,st}, use the full load/store helpers.
This will allow the fast path to increase alignment to implement atomicity
while not immediately raising an alignment exception.
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Instead of using helper_unaligned_{ld,st}, use the full load/store helpers.
This will allow the fast path to increase alignment to implement atomicity
while not immediately raising an alignment exception.
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Instead of using helper_unaligned_{ld,st}, use the full load/store helpers.
This will allow the fast path to increase alignment to implement atomicity
while not immediately raising an alignment exception.
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
These features are present for Apple M1.
Tested-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Notice when the host has additional atomic instructions.
The new variables will also be used in generated code.
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Notice when Intel or AMD have guaranteed that vmovdqa is atomic.
The new variable will also be used in generated code.
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
We can now fold these two pieces of code.
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
TCG backends may need to defer to a helper to implement
the atomicity required by a given operation. Mirror the
interface used in system mode.
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
With the current structure of cputlb.c, there is no difference
between the little-endian and big-endian entry points, aside
from the assert. Unify the pairs of functions.
Hoist the qemu_{ld,st}_helpers arrays to tcg.c.
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
This field may be used to describe the precise atomicity requirements
of the guest, which may then be used to constrain the methods by which
it may be emulated by the host.
For instance, the AArch64 LDP (32-bit) instruction changes semantics
with ARMv8.4 LSE2, from
MO_64 | MO_ATOM_IFALIGN_PAIR
(64-bits, single-copy atomic only on 4 byte units,
nonatomic if not aligned by 4),
to
MO_64 | MO_ATOM_WITHIN16
(64-bits, single-copy atomic within a 16 byte block)
The former may be implemented with two 4 byte loads, or a single 8 byte
load if that happens to be efficient on the host. The latter may not
be implemented with two 4 byte loads and may also require a helper when
misaligned.
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
All uses have now been expunged.
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Adjust the softmmu tlb to use R0+R1, not any of the normally available
registers. Since we handle overlap betwen inputs and helper arguments,
we can allow any allocatable reg.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Rather than zero-extend the guest address into a register,
use an add instruction which zero-extends the second input.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
The softmmu tlb uses TCG_REG_TMP[0-2], not any of the normally available
registers. Now that we handle overlap betwen inputs and helper arguments,
we can allow any allocatable reg.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Never used since its introduction.
Fixes: 3d582c6179 ("tcg-ppc64: Rearrange integer constant constraints")
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
These constraints have not been used for quite some time.
Fixes: 77b73de676 ("Use rem/div[u]_i32 drop div[u]2_i32")
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
The softmmu tlb uses TCG_REG_{TMP1,TMP2,R0}, not any of the normally
available registers. Now that we handle overlap betwen inputs and
helper arguments, we can allow any allocatable reg.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Allocate TCG_REG_TMP2. Use R0, TMP1, TMP2 instead of any of
the normally allocated registers for the tlb load.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
The softmmu tlb uses TCG_REG_TMP[0-3], not any of the normally available
registers. Now that we handle overlap betwen inputs and helper arguments,
and have eliminated use of A0, we can allow any allocatable reg.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Compare the address vs the tlb entry with sign-extended values.
This simplifies the page+alignment mask constant, and the
generation of the last byte address for the misaligned test.
Move the tlb addend load up, and the zero-extension down.
This frees up a register, which allows us use TMP3 as the returned base
address register instead of A0, which we were using as a 5th temporary.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
While performing the load in the delay slot of the call to the common
bswap helper function is cute, it is not worth the added complexity.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
The softmmu tlb uses TCG_REG_TMP[0-2], not any of the normally available
registers. Now that we handle overlap betwen inputs and helper arguments,
we can allow any allocatable reg.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Use tcg_out_ld_helper_args, tcg_out_ld_helper_ret,
and tcg_out_st_helper_args.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Use tcg_out_ld_helper_args, tcg_out_ld_helper_ret,
and tcg_out_st_helper_args.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>