qemu-e2k

Commit Graph

Author	SHA1	Message	Date
David Hildenbrand	ca86cf328c	tcg: Enforce single page access in probe_write() Let's enforce the interface restriction. Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20190826075112.25637-5-david@redhat.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2019-09-03 08:34:18 -07:00
David Hildenbrand	03a981893c	tcg: Check for watchpoints in probe_write() Let size > 0 indicate a promise to write to those bytes. Check for write watchpoints in the probed range. Suggested-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20190823100741.9621-10-david@redhat.com> [rth: Recompute index after tlb_fill; check TLB_WATCHPOINT.] Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2019-09-03 08:34:18 -07:00
Richard Henderson	50b107c5d6	cputlb: Handle watchpoints via TLB_WATCHPOINT The raising of exceptions from check_watchpoint, buried inside of the I/O subsystem, is fundamentally broken. We do not have the helper return address with which we can unwind guest state. Replace PHYS_SECTION_WATCH and io_mem_watch with TLB_WATCHPOINT. Move the call to cpu_check_watchpoint into the cputlb helpers where we do have the helper return address. This allows watchpoints on RAM to bypass the full i/o access path. Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2019-09-03 08:30:39 -07:00
Richard Henderson	5787585d04	cputlb: Remove double-alignment in store_helper We have already aligned page2 to the start of the next page. There is no reason to do that a second time. Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2019-09-03 08:30:39 -07:00
Richard Henderson	8f7cd2ad4a	cputlb: Fix size operand for tlb_fill on unaligned store We are currently passing the size of the full write to the tlb_fill for the second page. Instead pass the real size of the write to that page. This argument is unused within all tlb_fill, except to be logged via tracing, so in practice this makes no difference. But in a moment we'll need the value of size2 for watchpoints, and if we've computed the value we might as well use it. Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2019-09-03 08:30:39 -07:00
Richard Henderson	30d7e098d5	cputlb: Fold TLB_RECHECK into TLB_INVALID_MASK We had two different mechanisms to force a recheck of the tlb. Before TLB_RECHECK was introduced, we had a PAGE_WRITE_INV bit that would immediate set TLB_INVALID_MASK, which automatically means that a second check of the tlb entry fails. We can use the same mechanism to handle small pages. Conserve TLB_* bits by removing TLB_RECHECK. Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2019-09-03 08:30:39 -07:00
Tony Nguyen	a26fc6f515	cputlb: Byte swap memory transaction attribute Notice new attribute, byte swap, and force the transaction through the memory slow path. Required by architectures that can invert endianness of memory transaction, e.g. SPARC64 has the Invert Endian TTE bit. Suggested-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Tony Nguyen <tony.nguyen@bt.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <2a10a1f1c00a894af1212c8f68ef09c2966023c1.1566466906.git.tony.nguyen@bt.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2019-09-03 08:30:39 -07:00
Tony Nguyen	9bf825bf3d	memory: Single byte swap along the I/O path Now that MemOp has been pushed down into the memory API, and callers are encoding endianness, we can collapse byte swaps along the I/O path into the accelerator and target independent adjust_endianness. Collapsing byte swaps along the I/O path enables additional endian inversion logic, e.g. SPARC64 Invert Endian TTE bit, with redundant byte swaps cancelling out. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Suggested-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Tony Nguyen <tony.nguyen@bt.com> Message-Id: <911ff31af11922a9afba9b7ce128af8b8b80f316.1566466906.git.tony.nguyen@bt.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2019-09-03 08:30:39 -07:00
Tony Nguyen	be5c4787e9	cputlb: Replace size and endian operands for MemOp Preparation for collapsing the two byte swaps adjust_endianness and handle_bswap into the former. Signed-off-by: Tony Nguyen <tony.nguyen@bt.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <755b7104410956b743e1f1e9c34ab87db113360f.1566466906.git.tony.nguyen@bt.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2019-09-03 08:30:39 -07:00
Tony Nguyen	d5d680cacc	memory: Access MemoryRegion with endianness Preparation for collapsing the two byte swaps adjust_endianness and handle_bswap into the former. Call memory_region_dispatch_{read\|write} with endianness encoded into the "MemOp op" operand. This patch does not change any behaviour as memory_region_dispatch_{read\|write} is yet to handle the endianness. Once it does handle endianness, callers with byte swaps can collapse them into adjust_endianness. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Tony Nguyen <tony.nguyen@bt.com> Message-Id: <8066ab3eb037c0388dfadfe53c5118429dd1de3a.1566466906.git.tony.nguyen@bt.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2019-09-03 08:30:39 -07:00
Tony Nguyen	4cbb198eef	cputlb: Access MemoryRegion with MemOp The memory_region_dispatch_{read\|write} operand "unsigned size" is being converted into a "MemOp op". Convert interfaces by using no-op size_memop. After all interfaces are converted, size_memop will be implemented and the memory_region_dispatch_{read\|write} operand "unsigned size" will be converted into a "MemOp op". As size_memop is a no-op, this patch does not change any behaviour. Signed-off-by: Tony Nguyen <tony.nguyen@bt.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <c4571c76467ade83660970f7ef9d7292297f1908.1566466906.git.tony.nguyen@bt.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2019-09-03 08:30:38 -07:00
Tony Nguyen	14776ab5a1	tcg: TCGMemOp is now accelerator independent MemOp Preparation for collapsing the two byte swaps, adjust_endianness and handle_bswap, along the I/O path. Target dependant attributes are conditionalized upon NEED_CPU_H. Signed-off-by: Tony Nguyen <tony.nguyen@bt.com> Acked-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Acked-by: Cornelia Huck <cohuck@redhat.com> Message-Id: <81d9cd7d7f5aaadfa772d6c48ecee834e9cf7882.1566466906.git.tony.nguyen@bt.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2019-09-03 08:30:38 -07:00
Alex Bennée	ab7a2009df	cputlb: cast size_t to target_ulong before using for address masks While size_t is defined to happily access the biggest host object this isn't the case when generating masks for 64 bit guests on 32 bit hosts. Otherwise we end up truncating the address when we fall back to our unaligned helper. Fixes: https://bugs.launchpad.net/qemu/+bug/1831545 Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Tested-by: Andrew Randrianasulu <randrianasulu@gmail.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>	2019-06-12 17:53:23 +01:00
Alex Bennée	8c79b28851	cputlb: use uint64_t for interim values for unaligned load When running on 32 bit TCG backends a wide unaligned load ends up truncating data before returning to the guest. We specifically have the return type as uint64_t to avoid any premature truncation so we should use the same for the interim types. Fixes: https://bugs.launchpad.net/qemu/+bug/1830872 Fixes: `eed5664238` Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Tested-by: Laszlo Ersek <lersek@redhat.com> Tested-by: Igor Mammedov <imammedo@redhat.com>	2019-06-12 17:53:22 +01:00
Richard Henderson	29a0af618d	cpu: Replace ENV_GET_CPU with env_cpu Now that we have both ArchCPU and CPUArchState, we can define this generically instead of via macro in each target's cpu.h. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Acked-by: Alistair Francis <alistair.francis@wdc.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2019-06-10 07:03:34 -07:00
Richard Henderson	a40ec84ee2	tcg: Create struct CPUTLB Move all softmmu tlb data into this structure. Arrange the members so that we are able to place mask+table together and at a smaller absolute offset from ENV. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Acked-by: Alistair Francis <alistair.francis@wdc.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2019-06-10 07:03:34 -07:00
Richard Henderson	79e4208506	tcg: Fold CPUTLBWindow into CPUTLBDesc Both structures are allocated once per mmu_idx. There is no reason for them to be separate. Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2019-06-10 07:03:34 -07:00
Peter Maydell	d8276573da	Add CPUClass::tlb_fill. Improve tlb_vaddr_to_host for use by ARM SVE no-fault loads. -----BEGIN PGP SIGNATURE----- iQFRBAABCgA7FiEEekgeeIaLTbaoWgXAZN846K9+IV8FAlzVx4UdHHJpY2hhcmQu aGVuZGVyc29uQGxpbmFyby5vcmcACgkQZN846K9+IV+U1Af/b3cV5d5a1LWRdLgR 71JCPK/M3o43r2U9wCSikteXkmNBEdEoc5+WRk2SuZFLW/JB1DHDY7/gISPIhfoB ZIza2TxD/QK1CQ5/mMWruKBlyygbYYZgsYaaNsMJRJgicgOSjTN0nuHMbIfv3tAN mu+IlkD0LdhVjP0fz30Jpew3b3575RCjYxEPM6KQI3RxtQFjZ3FhqV5hKR4vtdP5 yLWJQzwAbaCB3SZUvvp7TN1ZsmeyLpc+Yz/YtRTqQedo7SNWWBKldLhqq4bZnH1I AkzHbtWIOBrjWJ34ZMAgI5Q56Du9TBbBvCdM9azmrQjSu/2kdsPBPcUyOpnUCsCx NyXo9g== =x71l -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/rth/tags/pull-tcg-20190510' into staging Add CPUClass::tlb_fill. Improve tlb_vaddr_to_host for use by ARM SVE no-fault loads. # gpg: Signature made Fri 10 May 2019 19:48:37 BST # gpg: using RSA key 7A481E78868B4DB6A85A05C064DF38E8AF7E215F # gpg: issuer "richard.henderson@linaro.org" # gpg: Good signature from "Richard Henderson <richard.henderson@linaro.org>" [full] # Primary key fingerprint: 7A48 1E78 868B 4DB6 A85A 05C0 64DF 38E8 AF7E 215F * remotes/rth/tags/pull-tcg-20190510: (27 commits) tcg: Use tlb_fill probe from tlb_vaddr_to_host tcg: Remove CPUClass::handle_mmu_fault tcg: Use CPUClass::tlb_fill in cputlb.c target/xtensa: Convert to CPUClass::tlb_fill target/unicore32: Convert to CPUClass::tlb_fill target/tricore: Convert to CPUClass::tlb_fill target/tilegx: Convert to CPUClass::tlb_fill target/sparc: Convert to CPUClass::tlb_fill target/sh4: Convert to CPUClass::tlb_fill target/s390x: Convert to CPUClass::tlb_fill target/riscv: Convert to CPUClass::tlb_fill target/ppc: Convert to CPUClass::tlb_fill target/openrisc: Convert to CPUClass::tlb_fill target/nios2: Convert to CPUClass::tlb_fill target/moxie: Convert to CPUClass::tlb_fill target/mips: Convert to CPUClass::tlb_fill target/mips: Tidy control flow in mips_cpu_handle_mmu_fault target/mips: Pass a valid error to raise_mmu_exception for user-only target/microblaze: Convert to CPUClass::tlb_fill target/m68k: Convert to CPUClass::tlb_fill ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2019-05-16 13:15:08 +01:00
Richard Henderson	4601f8d10d	cputlb: Do unaligned store recursion to outermost function This is less tricky than for loads, because we always fall back to single byte stores to implement unaligned stores. Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Tested-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>	2019-05-10 20:23:21 +01:00
Richard Henderson	2dd9260678	cputlb: Do unaligned load recursion to outermost function If we attempt to recurse from load_helper back to load_helper, even via intermediary, we do not get all of the constants expanded away as desired. But if we recurse back to the original helper (or a shim that has a consistent function signature), the operands are folded away as desired. Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Tested-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>	2019-05-10 20:23:21 +01:00
Richard Henderson	fc1bc77791	cputlb: Drop attribute flatten Going to approach this problem via __attribute__((always_inline)) instead, but full conversion will take several steps. Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Tested-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>	2019-05-10 20:23:21 +01:00
Richard Henderson	f1be36969d	cputlb: Move TLB_RECHECK handling into load/store_helper Having this in io_readx/io_writex meant that we forgot to re-compute index after tlb_fill. It also means we can use the normal aligned memory load path. It also fixes a bug in that we had cached a use of index across a tlb_fill. Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Tested-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>	2019-05-10 20:23:21 +01:00
Alex Bennée	eed5664238	accel/tcg: demacro cputlb Instead of expanding a series of macros to generate the load/store helpers we move stuff into common functions and rely on the compiler to eliminate the dead code for each variant. Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Tested-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>	2019-05-10 20:23:06 +01:00
Richard Henderson	4811e9095c	tcg: Use tlb_fill probe from tlb_vaddr_to_host Most of the existing users would continue around a loop which would fault the tlb entry in via a normal load/store. But for AArch64 SVE we have an existing emulation bug wherein we would mark the first element of a no-fault vector load as faulted (within the FFR, not via exception) just because we did not have its address in the TLB. Now we can properly only mark it as faulted if there really is no valid, readable translation, while still not raising an exception. (Note that beyond the first element of the vector, the hardware may report a fault for any reason whatsoever; with at least one element loaded, forward progress is guaranteed.) Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2019-05-10 11:12:50 -07:00
Richard Henderson	c319dc1357	tcg: Use CPUClass::tlb_fill in cputlb.c We can now use the CPUClass hook instead of a named function. Create a static tlb_fill function to avoid other changes within cputlb.c. This also isolates the asserts within. Remove the named tlb_fill function from all of the targets. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2019-05-10 11:12:50 -07:00
Shahab Vahedi	ef5dae6805	cputlb: Fix io_readx() to respect the access_type This change adapts io_readx() to its input access_type. Currently io_readx() treats any memory access as a read, although it has an input argument "MMUAccessType access_type". This results in: 1) Calling the tlb_fill() only with MMU_DATA_LOAD 2) Considering only entry->addr_read as the tlb_addr Buglink: https://bugs.launchpad.net/qemu/+bug/1825359 Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Shahab Vahedi <shahab.vahedi@gmail.com> Message-Id: <20190420072236.12347-1-shahab.vahedi@gmail.com> [rth: Remove assert; fix expression formatting.] Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2019-04-25 10:40:06 -07:00
Emilio G. Cota	6d967cb86d	cputlb: update TLB entry/index after tlb_fill We are failing to take into account that tlb_fill() can cause a TLB resize, which renders prior TLB entry pointers/indices stale. Fix it by re-doing the TLB entry lookups immediately after tlb_fill. Fixes: `86e1eff8bc` ("tcg: introduce dynamic TLB sizing", 2019-01-28) Reported-by: Max Filippov <jcmvbkbc@gmail.com> Tested-by: Max Filippov <jcmvbkbc@gmail.com> Signed-off-by: Emilio G. Cota <cota@braap.org> Message-Id: <20190209162745.12668-3-cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2019-02-11 08:52:44 -08:00
Thomas Huth	fb0343d5b4	tcg: Fix LGPL version number It's either "GNU Library General Public version 2" or "GNU Lesser General Public version 2.1", but there was no "version 2.0" of the "Lesser" library. So assume that version 2.1 is meant here. Cc: Richard Henderson <rth@twiddle.net> Signed-off-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <1548252536-6242-5-git-send-email-thuth@redhat.com> Signed-off-by: Laurent Vivier <laurent@vivier.eu>	2019-01-30 11:01:52 +01:00
Richard Henderson	e77c89fb08	cputlb: Remove static tlb sizing Now that all tcg backends support TCG_TARGET_IMPLEMENTS_DYN_TLB, remove the define and the old code. Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2019-01-28 07:04:35 -08:00
Emilio G. Cota	86e1eff8bc	tcg: introduce dynamic TLB sizing Disabled in all TCG backends for now. Tested-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Emilio G. Cota <cota@braap.org> Message-Id: <20190116170114.26802-3-cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2019-01-28 07:03:34 -08:00
Emilio G. Cota	3cea94bbc9	cputlb: do not evict empty entries to the vtlb Currently we evict an entry to the victim TLB when it doesn't match the current address. But it could be that there's no match because the current entry is empty (i.e. all -1's, for instance via tlb_flush). Do not evict the entry to the vtlb in that case. This change will help us keep track of the TLB's use rate, which we'll use to implement a policy for dynamic TLB sizing. Tested-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Emilio G. Cota <cota@braap.org> Message-Id: <20190116170114.26802-2-cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2019-01-28 07:03:34 -08:00
Richard Henderson	ab65110530	cputlb: Remove tlb_c.pending_flushes This is essentially redundant with tlb_c.dirty. Tested-by: Emilio G. Cota <cota@braap.org> Reviewed-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-10-31 12:16:39 +00:00
Richard Henderson	3d1523ced6	cputlb: Filter flushes on already clean tlbs Especially for guests with large numbers of tlbs, like ARM or PPC, we may well not use all of them in between flush operations. Remember which tlbs have been used since the last flush, and avoid any useless flushing. Tested-by: Emilio G. Cota <cota@braap.org> Reviewed-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-10-31 12:16:35 +00:00
Richard Henderson	e09de0a20d	cputlb: Count "partial" and "elided" tlb flushes Our only statistic so far was "full" tlb flushes, where all mmu_idx are flushed at the same time. Now count "partial" tlb flushes where sets of mmu_idx are flushed, but the set is not maximal. Account one per mmu_idx flushed, as that is the unit of work performed. We don't actually count elided flushes yet, but go ahead and change the interface presented to the monitor all at once. Tested-by: Emilio G. Cota <cota@braap.org> Reviewed-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-10-31 12:16:30 +00:00
Richard Henderson	f8144c6c1e	cputlb: Merge tlb_flush_page into tlb_flush_page_by_mmuidx The difference between the two sets of APIs is now miniscule. Tested-by: Emilio G. Cota <cota@braap.org> Reviewed-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-10-31 12:16:26 +00:00
Richard Henderson	64f2674bbc	cputlb: Merge tlb_flush_nocheck into tlb_flush_by_mmuidx_async_work The difference between the two sets of APIs is now miniscule. This allows tlb_flush, tlb_flush_all_cpus, and tlb_flush_all_cpus_synced to be merged with their corresponding by_mmuidx functions as well. For accounting, consider mmu_idx_bitmask = ALL_MMUIDX_BITS to be a full flush. Tested-by: Emilio G. Cota <cota@braap.org> Reviewed-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-10-31 12:16:16 +00:00
Richard Henderson	d5363e5849	cputlb: Move env->vtlb_index to env->tlb_d.vindex The rest of the tlb victim cache is per-tlb, the next use index should be as well. Tested-by: Emilio G. Cota <cota@braap.org> Reviewed-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-10-31 12:16:12 +00:00
Richard Henderson	1308e02671	cputlb: Split large page tracking per mmu_idx The set of large pages in the kernel is probably not the same as the set of large pages in the application. Forcing one range to cover both will flush more often than necessary. This allows tlb_flush_page_async_work to flush just the one mmu_idx implicated, which in turn allows us to remove tlb_check_page_and_flush_by_mmuidx_async_work. Tested-by: Emilio G. Cota <cota@braap.org> Reviewed-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-10-31 12:16:08 +00:00
Richard Henderson	60a2ad7d86	cputlb: Move cpu->pending_tlb_flush to env->tlb_c.pending_flush Protect it with the tlb_lock instead of using atomics. The move puts it in or near the same cacheline as the lock; using the lock means we don't need a second atomic operation in order to perform the update. Which makes it cheap to also update pending_flush in tlb_flush_by_mmuidx_async_work. Tested-by: Emilio G. Cota <cota@braap.org> Reviewed-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-10-31 12:16:02 +00:00
Richard Henderson	8ab102667e	cputlb: Remove tcg_enabled hack from tlb_flush_nocheck The bugs this was working around were fixed with commits `022d6378c7` target/unicore32: remove tlb_flush from uc32_init_fn `6e11beecfd` target/alpha: remove tlb_flush from alpha_cpu_initfn Tested-by: Emilio G. Cota <cota@braap.org> Reviewed-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-10-31 12:15:57 +00:00
Richard Henderson	53d284554c	cputlb: Move tlb_lock to CPUTLBCommon This is the first of several moves to reduce the size of the CPU_COMMON_TLB macro and improve some locality of refernce. Tested-by: Emilio G. Cota <cota@braap.org> Reviewed-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-10-31 12:15:28 +00:00
Emilio G. Cota	403f290c06	cputlb: read CPUTLBEntry.addr_write atomically Updates can come from other threads, so readers that do not take tlb_lock must use atomic_read to avoid undefined behaviour (UB). This completes the conversion to tlb_lock. This conversion results on average in no performance loss, as the following experiments (run on an Intel i7-6700K CPU @ 4.00GHz) show. 1. aarch64 bootup+shutdown test: - Before: Performance counter stats for 'taskset -c 0 ../img/aarch64/die.sh' (10 runs): 7487.087786 task-clock (msec) # 0.998 CPUs utilized ( +- 0.12% ) 31,574,905,303 cycles # 4.217 GHz ( +- 0.12% ) 57,097,908,812 instructions # 1.81 insns per cycle ( +- 0.08% ) 10,255,415,367 branches # 1369.747 M/sec ( +- 0.08% ) 173,278,962 branch-misses # 1.69% of all branches ( +- 0.18% ) 7.504481349 seconds time elapsed ( +- 0.14% ) - After: Performance counter stats for 'taskset -c 0 ../img/aarch64/die.sh' (10 runs): 7462.441328 task-clock (msec) # 0.998 CPUs utilized ( +- 0.07% ) 31,478,476,520 cycles # 4.218 GHz ( +- 0.07% ) 57,017,330,084 instructions # 1.81 insns per cycle ( +- 0.05% ) 10,251,929,667 branches # 1373.804 M/sec ( +- 0.05% ) 173,023,787 branch-misses # 1.69% of all branches ( +- 0.11% ) 7.474970463 seconds time elapsed ( +- 0.07% ) 2. SPEC06int: SPEC06int (test set) [Y axis: Speedup over master] 1.15 +-+----+------+------+------+------+------+-------+------+------+------+------+------+------+----+-+ \| \| 1.1 +-+.................................+++.............................+ tlb-lock-v2 (m+++x) +-+ \| +++ \| +++ tlb-lock-v3 (spinl\|ck) \| \| +++ \| \| +++ +++ \| \| \| 1.05 +-+....+++...........####.........\|####.+++.\|......\|.....###....+++...........+++....###.........+-+ \| ### ++#\| # \|# \|# *### +++### +++#+# \| +++ \| #\|# ### \| 1 +-++++#++++####+++#++#++++++++++#++#++++#++++#+#+**+#++++###++++###++++###++++#+#++++#+#+++-+ \| +* # #++# * # #### * # * ++# **+# \| * # ***\|# \|# # #\|# #+# # # \| 0.95 +-+....#....#..#.\|..#...#..#.\|..#....#.\|..#.++.#.+++#.**.#....#+#....#.#..++#.#..+-+ \| * # # # \| # # # \| # * * # ++ # * * # * * # * \|* # ++# # # # *** # \| \| * * # ++# # + # # # \| # * * # * * # * * # * * # ++ # **** # ++# # * * # \| 0.9 +-+....#...\|#..#....#.++#..#.\|..#....#....#....#....#....#..\|.#...\|#.#....#..+-+ \| * * # *** # * * # \|# # + # * * # * * # * * # * * # * * # ++ # \|# # * * # \| 0.85 +-+....#..\|..#....#.**..#....#....#....#....#....#....#....#.**.#....#..+-+ \| * # + # * * # \| # * * # * * # * * # * * # * * # * * # * * # * \|* # * * # \| \| * * # * * # * * # + # * * # * * # * * # * * # * * # * * # * * # * \|* # * * # \| 0.8 +-+....#.....#....#....#....#....#....#....#....#....#....#.++.#....#..+-+ \| * * # * * # * * # * * # * * # * * # * * # * * # * * # * * # * * # * * # * * # \| 0.75 +-+--*##--###-###-###-###-###-*##-##-##-##-##-##--*##--+-+ 400.perlben401.bzip2403.gcc429.m445.gob456.hmme45462.libqua464.h26471.omnet473483.xalancbmkgeomean png: https://imgur.com/a/BHzpPTW Notes: - tlb-lock-v2 corresponds to an implementation with a mutex. - tlb-lock-v3 corresponds to the current implementation, i.e. a spinlock and a single lock acquisition in tlb_set_page_with_attrs. Signed-off-by: Emilio G. Cota <cota@braap.org> Message-Id: <20181016153840.25877-1-cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-10-18 19:46:53 -07:00
Richard Henderson	e6cd4bb59b	tcg: Split CONFIG_ATOMIC128 GCC7+ will no longer advertise support for 16-byte __atomic operations if only cmpxchg is supported, as for x86_64. Fortunately, x86_64 still has support for __sync_compare_and_swap_16 and we can make use of that. AArch64 does not have, nor ever has had such support, so open-code it. Reviewed-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-10-18 19:46:36 -07:00
Richard Henderson	383beda9cf	tcg: Add tlb_index and tlb_entry helpers Isolate the computation of an index from an address into a helper before we change that function. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> [ cota: convert tlb_vaddr_to_host; use atomic_read on addr_write ] Signed-off-by: Emilio G. Cota <cota@braap.org> Message-Id: <20181009175129.17888-2-cota@braap.org>	2018-10-18 18:58:10 -07:00
Emilio G. Cota	71aec3541d	cputlb: serialize tlb updates with env->tlb_lock Currently we rely on atomic operations for cross-CPU invalidations. There are two cases that these atomics miss: cross-CPU invalidations can race with either (1) vCPU threads flushing their TLB, which happens via memset, or (2) vCPUs calling tlb_reset_dirty on their TLB, which updates .addr_write with a regular store. This results in undefined behaviour, since we're mixing regular and atomic ops on concurrent accesses. Fix it by using tlb_lock, a per-vCPU lock. All updaters of tlb_table and the corresponding victim cache now hold the lock. The readers that do not hold tlb_lock must use atomic reads when reading .addr_write, since this field can be updated by other threads; the conversion to atomic reads is done in the next patch. Note that an alternative fix would be to expand the use of atomic ops. However, in the case of TLB flushes this would have a huge performance impact, since (1) TLB flushes can happen very frequently and (2) we currently use a full memory barrier to flush each TLB entry, and a TLB has many entries. Instead, acquiring the lock is barely slower than a full memory barrier since it is uncontended, and with a single lock acquisition we can flush the entire TLB. Tested-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Emilio G. Cota <cota@braap.org> Message-Id: <20181009174557.16125-6-cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-10-18 18:58:10 -07:00
Emilio G. Cota	ea9025cb49	cputlb: fix assert_cpu_is_self macro Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Emilio G. Cota <cota@braap.org> Message-Id: <20181009174557.16125-5-cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-10-18 18:58:10 -07:00
Emilio G. Cota	5005e2537d	exec: introduce tlb_init Paves the way for the addition of a per-TLB lock. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Emilio G. Cota <cota@braap.org> Message-Id: <20181009174557.16125-4-cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-10-18 18:58:10 -07:00
Peter Maydell	55a7cb144d	accel/tcg: Check whether TLB entry is RAM consistently with how we set it up We set up TLB entries in tlb_set_page_with_attrs(), where we have some logic for determining whether the TLB entry is considered to be RAM-backed, and thus has a valid addend field. When we look at the TLB entry in get_page_addr_code(), we use different logic for determining whether to treat the page as RAM-backed and use the addend field. This is confusing, and in fact buggy, because the code in tlb_set_page_with_attrs() correctly decides that rom_device memory regions not in romd mode are not RAM-backed, but the code in get_page_addr_code() thinks they are RAM-backed. This typically results in "Bad ram pointer" assertion if the guest tries to execute from such a memory region. Fix this by making get_page_addr_code() just look at the TLB_MMIO bit in the code_address field of the TLB, which tlb_set_page_with_attrs() sets if and only if the addend field is not valid for code execution. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Message-id: 20180713150945.12348-1-peter.maydell@linaro.org	2018-08-14 17:17:19 +01:00
Peter Maydell	20cb6ae472	accel/tcg: Return -1 for execution from MMIO regions in get_page_addr_code() Now that all the callers can handle get_page_addr_code() returning -1, remove all the code which tries to handle execution from MMIO regions or small-MMU-region RAM areas. This will mean that we can correctly execute from these areas, rather than ending up either aborting QEMU or delivering an incorrect guest exception. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Tested-by: Cédric Le Goater <clg@kaod.org> Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Message-id: 20180710160013.26559-6-peter.maydell@linaro.org	2018-08-14 17:17:19 +01:00
Peter Maydell	dbea78a4d6	accel/tcg: Pass read access type through to io_readx() The io_readx() function needs to know whether the load it is doing is an MMU_DATA_LOAD or an MMU_INST_FETCH, so that it can pass the right value to the cpu_transaction_failed() function. Plumb this information through from the softmmu code. This is currently not often going to give the wrong answer, because usually instruction fetches go via get_page_addr_code(). However once we switch over to handling execution from non-RAM by creating single-insn TBs, the path for an insn fetch to generate a bus error will be through cpu_ld*_code() and io_readx(), so without this change we will generate a d-side fault when we should generate an i-side fault. We also have to pass the access type via a CPU struct global down to unassigned_mem_read(), for the benefit of the targets which still use the cpu_unassigned_access() hook (m68k, mips, sparc, xtensa). Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Tested-by: Cédric Le Goater <clg@kaod.org> Message-id: 20180710160013.26559-2-peter.maydell@linaro.org	2018-08-14 17:17:19 +01:00

1 2

74 Commits