Commit Graph

60 Commits

Author SHA1 Message Date
Richard Henderson 29a0af618d cpu: Replace ENV_GET_CPU with env_cpu
Now that we have both ArchCPU and CPUArchState, we can define
this generically instead of via macro in each target's cpu.h.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Acked-by: Alistair Francis <alistair.francis@wdc.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-06-10 07:03:34 -07:00
Richard Henderson a40ec84ee2 tcg: Create struct CPUTLB
Move all softmmu tlb data into this structure.  Arrange the
members so that we are able to place mask+table together and
at a smaller absolute offset from ENV.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Acked-by: Alistair Francis <alistair.francis@wdc.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-06-10 07:03:34 -07:00
Richard Henderson 79e4208506 tcg: Fold CPUTLBWindow into CPUTLBDesc
Both structures are allocated once per mmu_idx.
There is no reason for them to be separate.

Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-06-10 07:03:34 -07:00
Peter Maydell d8276573da Add CPUClass::tlb_fill.
Improve tlb_vaddr_to_host for use by ARM SVE no-fault loads.
 -----BEGIN PGP SIGNATURE-----
 
 iQFRBAABCgA7FiEEekgeeIaLTbaoWgXAZN846K9+IV8FAlzVx4UdHHJpY2hhcmQu
 aGVuZGVyc29uQGxpbmFyby5vcmcACgkQZN846K9+IV+U1Af/b3cV5d5a1LWRdLgR
 71JCPK/M3o43r2U9wCSikteXkmNBEdEoc5+WRk2SuZFLW/JB1DHDY7/gISPIhfoB
 ZIza2TxD/QK1CQ5/mMWruKBlyygbYYZgsYaaNsMJRJgicgOSjTN0nuHMbIfv3tAN
 mu+IlkD0LdhVjP0fz30Jpew3b3575RCjYxEPM6KQI3RxtQFjZ3FhqV5hKR4vtdP5
 yLWJQzwAbaCB3SZUvvp7TN1ZsmeyLpc+Yz/YtRTqQedo7SNWWBKldLhqq4bZnH1I
 AkzHbtWIOBrjWJ34ZMAgI5Q56Du9TBbBvCdM9azmrQjSu/2kdsPBPcUyOpnUCsCx
 NyXo9g==
 =x71l
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/rth/tags/pull-tcg-20190510' into staging

Add CPUClass::tlb_fill.
Improve tlb_vaddr_to_host for use by ARM SVE no-fault loads.

# gpg: Signature made Fri 10 May 2019 19:48:37 BST
# gpg:                using RSA key 7A481E78868B4DB6A85A05C064DF38E8AF7E215F
# gpg:                issuer "richard.henderson@linaro.org"
# gpg: Good signature from "Richard Henderson <richard.henderson@linaro.org>" [full]
# Primary key fingerprint: 7A48 1E78 868B 4DB6 A85A  05C0 64DF 38E8 AF7E 215F

* remotes/rth/tags/pull-tcg-20190510: (27 commits)
  tcg: Use tlb_fill probe from tlb_vaddr_to_host
  tcg: Remove CPUClass::handle_mmu_fault
  tcg: Use CPUClass::tlb_fill in cputlb.c
  target/xtensa: Convert to CPUClass::tlb_fill
  target/unicore32: Convert to CPUClass::tlb_fill
  target/tricore: Convert to CPUClass::tlb_fill
  target/tilegx: Convert to CPUClass::tlb_fill
  target/sparc: Convert to CPUClass::tlb_fill
  target/sh4: Convert to CPUClass::tlb_fill
  target/s390x: Convert to CPUClass::tlb_fill
  target/riscv: Convert to CPUClass::tlb_fill
  target/ppc: Convert to CPUClass::tlb_fill
  target/openrisc: Convert to CPUClass::tlb_fill
  target/nios2: Convert to CPUClass::tlb_fill
  target/moxie: Convert to CPUClass::tlb_fill
  target/mips: Convert to CPUClass::tlb_fill
  target/mips: Tidy control flow in mips_cpu_handle_mmu_fault
  target/mips: Pass a valid error to raise_mmu_exception for user-only
  target/microblaze: Convert to CPUClass::tlb_fill
  target/m68k: Convert to CPUClass::tlb_fill
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2019-05-16 13:15:08 +01:00
Richard Henderson 4601f8d10d cputlb: Do unaligned store recursion to outermost function
This is less tricky than for loads, because we always fall
back to single byte stores to implement unaligned stores.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Tested-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
2019-05-10 20:23:21 +01:00
Richard Henderson 2dd9260678 cputlb: Do unaligned load recursion to outermost function
If we attempt to recurse from load_helper back to load_helper,
even via intermediary, we do not get all of the constants
expanded away as desired.

But if we recurse back to the original helper (or a shim that
has a consistent function signature), the operands are folded
away as desired.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Tested-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
2019-05-10 20:23:21 +01:00
Richard Henderson fc1bc77791 cputlb: Drop attribute flatten
Going to approach this problem via __attribute__((always_inline))
instead, but full conversion will take several steps.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Tested-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
2019-05-10 20:23:21 +01:00
Richard Henderson f1be36969d cputlb: Move TLB_RECHECK handling into load/store_helper
Having this in io_readx/io_writex meant that we forgot to
re-compute index after tlb_fill.  It also means we can use
the normal aligned memory load path.  It also fixes a bug
in that we had cached a use of index across a tlb_fill.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Tested-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
2019-05-10 20:23:21 +01:00
Alex Bennée eed5664238 accel/tcg: demacro cputlb
Instead of expanding a series of macros to generate the load/store
helpers we move stuff into common functions and rely on the compiler
to eliminate the dead code for each variant.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Tested-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
2019-05-10 20:23:06 +01:00
Richard Henderson 4811e9095c tcg: Use tlb_fill probe from tlb_vaddr_to_host
Most of the existing users would continue around a loop which
would fault the tlb entry in via a normal load/store.

But for AArch64 SVE we have an existing emulation bug wherein we
would mark the first element of a no-fault vector load as faulted
(within the FFR, not via exception) just because we did not have
its address in the TLB.  Now we can properly only mark it as faulted
if there really is no valid, readable translation, while still not
raising an exception.  (Note that beyond the first element of the
vector, the hardware may report a fault for any reason whatsoever;
with at least one element loaded, forward progress is guaranteed.)

Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-05-10 11:12:50 -07:00
Richard Henderson c319dc1357 tcg: Use CPUClass::tlb_fill in cputlb.c
We can now use the CPUClass hook instead of a named function.

Create a static tlb_fill function to avoid other changes within
cputlb.c.  This also isolates the asserts within.  Remove the
named tlb_fill function from all of the targets.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-05-10 11:12:50 -07:00
Shahab Vahedi ef5dae6805 cputlb: Fix io_readx() to respect the access_type
This change adapts io_readx() to its input access_type. Currently
io_readx() treats any memory access as a read, although it has an
input argument "MMUAccessType access_type". This results in:

1) Calling the tlb_fill() only with MMU_DATA_LOAD
2) Considering only entry->addr_read as the tlb_addr

Buglink: https://bugs.launchpad.net/qemu/+bug/1825359
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Shahab Vahedi <shahab.vahedi@gmail.com>
Message-Id: <20190420072236.12347-1-shahab.vahedi@gmail.com>
[rth: Remove assert; fix expression formatting.]
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-04-25 10:40:06 -07:00
Emilio G. Cota 6d967cb86d cputlb: update TLB entry/index after tlb_fill
We are failing to take into account that tlb_fill() can cause a
TLB resize, which renders prior TLB entry pointers/indices stale.
Fix it by re-doing the TLB entry lookups immediately after tlb_fill.

Fixes: 86e1eff8bc ("tcg: introduce dynamic TLB sizing", 2019-01-28)
Reported-by: Max Filippov <jcmvbkbc@gmail.com>
Tested-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <20190209162745.12668-3-cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-02-11 08:52:44 -08:00
Thomas Huth fb0343d5b4 tcg: Fix LGPL version number
It's either "GNU *Library* General Public version 2" or "GNU Lesser
General Public version *2.1*", but there was no "version 2.0" of the
"Lesser" library. So assume that version 2.1 is meant here.

Cc: Richard Henderson <rth@twiddle.net>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <1548252536-6242-5-git-send-email-thuth@redhat.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2019-01-30 11:01:52 +01:00
Richard Henderson e77c89fb08 cputlb: Remove static tlb sizing
Now that all tcg backends support TCG_TARGET_IMPLEMENTS_DYN_TLB,
remove the define and the old code.

Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-01-28 07:04:35 -08:00
Emilio G. Cota 86e1eff8bc tcg: introduce dynamic TLB sizing
Disabled in all TCG backends for now.

Tested-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <20190116170114.26802-3-cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-01-28 07:03:34 -08:00
Emilio G. Cota 3cea94bbc9 cputlb: do not evict empty entries to the vtlb
Currently we evict an entry to the victim TLB when it doesn't match
the current address. But it could be that there's no match because
the current entry is empty (i.e. all -1's, for instance via tlb_flush).
Do not evict the entry to the vtlb in that case.

This change will help us keep track of the TLB's use rate, which
we'll use to implement a policy for dynamic TLB sizing.

Tested-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <20190116170114.26802-2-cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-01-28 07:03:34 -08:00
Richard Henderson ab65110530 cputlb: Remove tlb_c.pending_flushes
This is essentially redundant with tlb_c.dirty.

Tested-by: Emilio G. Cota <cota@braap.org>
Reviewed-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-10-31 12:16:39 +00:00
Richard Henderson 3d1523ced6 cputlb: Filter flushes on already clean tlbs
Especially for guests with large numbers of tlbs, like ARM or PPC,
we may well not use all of them in between flush operations.
Remember which tlbs have been used since the last flush, and
avoid any useless flushing.

Tested-by: Emilio G. Cota <cota@braap.org>
Reviewed-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-10-31 12:16:35 +00:00
Richard Henderson e09de0a20d cputlb: Count "partial" and "elided" tlb flushes
Our only statistic so far was "full" tlb flushes, where all mmu_idx
are flushed at the same time.

Now count "partial" tlb flushes where sets of mmu_idx are flushed,
but the set is not maximal.  Account one per mmu_idx flushed, as
that is the unit of work performed.

We don't actually count elided flushes yet, but go ahead and change
the interface presented to the monitor all at once.

Tested-by: Emilio G. Cota <cota@braap.org>
Reviewed-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-10-31 12:16:30 +00:00
Richard Henderson f8144c6c1e cputlb: Merge tlb_flush_page into tlb_flush_page_by_mmuidx
The difference between the two sets of APIs is now miniscule.

Tested-by: Emilio G. Cota <cota@braap.org>
Reviewed-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-10-31 12:16:26 +00:00
Richard Henderson 64f2674bbc cputlb: Merge tlb_flush_nocheck into tlb_flush_by_mmuidx_async_work
The difference between the two sets of APIs is now miniscule.

This allows tlb_flush, tlb_flush_all_cpus, and tlb_flush_all_cpus_synced
to be merged with their corresponding by_mmuidx functions as well.  For
accounting, consider mmu_idx_bitmask = ALL_MMUIDX_BITS to be a full flush.

Tested-by: Emilio G. Cota <cota@braap.org>
Reviewed-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-10-31 12:16:16 +00:00
Richard Henderson d5363e5849 cputlb: Move env->vtlb_index to env->tlb_d.vindex
The rest of the tlb victim cache is per-tlb,
the next use index should be as well.

Tested-by: Emilio G. Cota <cota@braap.org>
Reviewed-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-10-31 12:16:12 +00:00
Richard Henderson 1308e02671 cputlb: Split large page tracking per mmu_idx
The set of large pages in the kernel is probably not the same
as the set of large pages in the application.  Forcing one
range to cover both will flush more often than necessary.

This allows tlb_flush_page_async_work to flush just the one
mmu_idx implicated, which in turn allows us to remove
tlb_check_page_and_flush_by_mmuidx_async_work.

Tested-by: Emilio G. Cota <cota@braap.org>
Reviewed-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-10-31 12:16:08 +00:00
Richard Henderson 60a2ad7d86 cputlb: Move cpu->pending_tlb_flush to env->tlb_c.pending_flush
Protect it with the tlb_lock instead of using atomics.
The move puts it in or near the same cacheline as the lock;
using the lock means we don't need a second atomic operation
in order to perform the update.  Which makes it cheap to also
update pending_flush in tlb_flush_by_mmuidx_async_work.

Tested-by: Emilio G. Cota <cota@braap.org>
Reviewed-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-10-31 12:16:02 +00:00
Richard Henderson 8ab102667e cputlb: Remove tcg_enabled hack from tlb_flush_nocheck
The bugs this was working around were fixed with commits
022d6378c7  target/unicore32: remove tlb_flush from uc32_init_fn
6e11beecfd  target/alpha: remove tlb_flush from alpha_cpu_initfn

Tested-by: Emilio G. Cota <cota@braap.org>
Reviewed-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-10-31 12:15:57 +00:00
Richard Henderson 53d284554c cputlb: Move tlb_lock to CPUTLBCommon
This is the first of several moves to reduce the size of the
CPU_COMMON_TLB macro and improve some locality of refernce.

Tested-by: Emilio G. Cota <cota@braap.org>
Reviewed-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-10-31 12:15:28 +00:00
Emilio G. Cota 403f290c06 cputlb: read CPUTLBEntry.addr_write atomically
Updates can come from other threads, so readers that do not
take tlb_lock must use atomic_read to avoid undefined
behaviour (UB).

This completes the conversion to tlb_lock. This conversion results
on average in no performance loss, as the following experiments
(run on an Intel i7-6700K CPU @ 4.00GHz) show.

1. aarch64 bootup+shutdown test:

- Before:
 Performance counter stats for 'taskset -c 0 ../img/aarch64/die.sh' (10 runs):

       7487.087786      task-clock (msec)         #    0.998 CPUs utilized            ( +-  0.12% )
    31,574,905,303      cycles                    #    4.217 GHz                      ( +-  0.12% )
    57,097,908,812      instructions              #    1.81  insns per cycle          ( +-  0.08% )
    10,255,415,367      branches                  # 1369.747 M/sec                    ( +-  0.08% )
       173,278,962      branch-misses             #    1.69% of all branches          ( +-  0.18% )

       7.504481349 seconds time elapsed                                          ( +-  0.14% )

- After:
 Performance counter stats for 'taskset -c 0 ../img/aarch64/die.sh' (10 runs):

       7462.441328      task-clock (msec)         #    0.998 CPUs utilized            ( +-  0.07% )
    31,478,476,520      cycles                    #    4.218 GHz                      ( +-  0.07% )
    57,017,330,084      instructions              #    1.81  insns per cycle          ( +-  0.05% )
    10,251,929,667      branches                  # 1373.804 M/sec                    ( +-  0.05% )
       173,023,787      branch-misses             #    1.69% of all branches          ( +-  0.11% )

       7.474970463 seconds time elapsed                                          ( +-  0.07% )

2. SPEC06int:
                                              SPEC06int (test set)
                                           [Y axis: Speedup over master]
  1.15 +-+----+------+------+------+------+------+-------+------+------+------+------+------+------+----+-+
       |                                                                                                  |
   1.1 +-+.................................+++.............................+  tlb-lock-v2 (m+++x)       +-+
       |                                +++ |                   +++        tlb-lock-v3 (spinl|ck)         |
       |                    +++          |  |     +++    +++     |                           |            |
  1.05 +-+....+++...........####.........|####.+++.|......|.....###....+++...........+++....###.........+-+
       |      ###         ++#| #         |# |# ***### +++### +++#+#     |     +++     |     #|#    ###    |
     1 +-+++***+#++++####+++#++#++++++++++#++#+*+*++#++++#+#+****+#++++###++++###++++###++++#+#++++#+#+++-+
       |    *+* #    #++# ***  #   #### ***  # * *++# ****+# *| * # ****|#   |# #    #|#    #+#    # #    |
  0.95 +-+..*.*.#....#..#.*|*..#...#..#.*|*..#.*.*..#.*|.*.#.*++*.#.*++*+#.****.#....#+#....#.#..++#.#..+-+
       |    * * #    #  # *|*  #   #  # *|*  # * *  # *++* # *  * # *  * # * |* #  ++# #    # #  *** #    |
       |    * * #  ++#  # *+*  #   #  # *|*  # * *  # *  * # *  * # *  * # *++* # **** #  ++# #  * * #    |
   0.9 +-+..*.*.#...|#..#.*.*..#.++#..#.*|*..#.*.*..#.*..*.#.*..*.#.*..*.#.*..*.#.*.|*.#...|#.#..*.*.#..+-+
       |    * * #  ***  # * *  #  |#  # *+*  # * *  # *  * # *  * # *  * # *  * # *++* #   |# #  * * #    |
  0.85 +-+..*.*.#..*|*..#.*.*..#.***..#.*.*..#.*.*..#.*..*.#.*..*.#.*..*.#.*..*.#.*..*.#.****.#..*.*.#..+-+
       |    * * #  *+*  # * *  # *|*  # * *  # * *  # *  * # *  * # *  * # *  * # *  * # * |* #  * * #    |
       |    * * #  * *  # * *  # *+*  # * *  # * *  # *  * # *  * # *  * # *  * # *  * # * |* #  * * #    |
   0.8 +-+..*.*.#..*.*..#.*.*..#.*.*..#.*.*..#.*.*..#.*..*.#.*..*.#.*..*.#.*..*.#.*..*.#.*++*.#..*.*.#..+-+
       |    * * #  * *  # * *  # * *  # * *  # * *  # *  * # *  * # *  * # *  * # *  * # *  * #  * * #    |
  0.75 +-+--***##--***###-***###-***###-***###-***###-****##-****##-****##-****##-****##-****##--***##--+-+
 400.perlben401.bzip2403.gcc429.m445.gob456.hmme45462.libqua464.h26471.omnet473483.xalancbmkgeomean

  png: https://imgur.com/a/BHzpPTW

Notes:
- tlb-lock-v2 corresponds to an implementation with a mutex.
- tlb-lock-v3 corresponds to the current implementation, i.e.
  a spinlock and a single lock acquisition in tlb_set_page_with_attrs.

Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <20181016153840.25877-1-cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-10-18 19:46:53 -07:00
Richard Henderson e6cd4bb59b tcg: Split CONFIG_ATOMIC128
GCC7+ will no longer advertise support for 16-byte __atomic operations
if only cmpxchg is supported, as for x86_64.  Fortunately, x86_64 still
has support for __sync_compare_and_swap_16 and we can make use of that.
AArch64 does not have, nor ever has had such support, so open-code it.

Reviewed-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-10-18 19:46:36 -07:00
Richard Henderson 383beda9cf tcg: Add tlb_index and tlb_entry helpers
Isolate the computation of an index from an address into a
helper before we change that function.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
[ cota: convert tlb_vaddr_to_host; use atomic_read on addr_write ]
Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <20181009175129.17888-2-cota@braap.org>
2018-10-18 18:58:10 -07:00
Emilio G. Cota 71aec3541d cputlb: serialize tlb updates with env->tlb_lock
Currently we rely on atomic operations for cross-CPU invalidations.
There are two cases that these atomics miss: cross-CPU invalidations
can race with either (1) vCPU threads flushing their TLB, which
happens via memset, or (2) vCPUs calling tlb_reset_dirty on their TLB,
which updates .addr_write with a regular store. This results in
undefined behaviour, since we're mixing regular and atomic ops
on concurrent accesses.

Fix it by using tlb_lock, a per-vCPU lock. All updaters of tlb_table
and the corresponding victim cache now hold the lock.
The readers that do not hold tlb_lock must use atomic reads when
reading .addr_write, since this field can be updated by other threads;
the conversion to atomic reads is done in the next patch.

Note that an alternative fix would be to expand the use of atomic ops.
However, in the case of TLB flushes this would have a huge performance
impact, since (1) TLB flushes can happen very frequently and (2) we
currently use a full memory barrier to flush each TLB entry, and a TLB
has many entries. Instead, acquiring the lock is barely slower than a
full memory barrier since it is uncontended, and with a single lock
acquisition we can flush the entire TLB.

Tested-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <20181009174557.16125-6-cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-10-18 18:58:10 -07:00
Emilio G. Cota ea9025cb49 cputlb: fix assert_cpu_is_self macro
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <20181009174557.16125-5-cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-10-18 18:58:10 -07:00
Emilio G. Cota 5005e2537d exec: introduce tlb_init
Paves the way for the addition of a per-TLB lock.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <20181009174557.16125-4-cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-10-18 18:58:10 -07:00
Peter Maydell 55a7cb144d accel/tcg: Check whether TLB entry is RAM consistently with how we set it up
We set up TLB entries in tlb_set_page_with_attrs(), where we have
some logic for determining whether the TLB entry is considered
to be RAM-backed, and thus has a valid addend field. When we
look at the TLB entry in get_page_addr_code(), we use different
logic for determining whether to treat the page as RAM-backed
and use the addend field. This is confusing, and in fact buggy,
because the code in tlb_set_page_with_attrs() correctly decides
that rom_device memory regions not in romd mode are not RAM-backed,
but the code in get_page_addr_code() thinks they are RAM-backed.
This typically results in "Bad ram pointer" assertion if the
guest tries to execute from such a memory region.

Fix this by making get_page_addr_code() just look at the
TLB_MMIO bit in the code_address field of the TLB, which
tlb_set_page_with_attrs() sets if and only if the addend
field is not valid for code execution.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20180713150945.12348-1-peter.maydell@linaro.org
2018-08-14 17:17:19 +01:00
Peter Maydell 20cb6ae472 accel/tcg: Return -1 for execution from MMIO regions in get_page_addr_code()
Now that all the callers can handle get_page_addr_code() returning -1,
remove all the code which tries to handle execution from MMIO regions
or small-MMU-region RAM areas. This will mean that we can correctly
execute from these areas, rather than ending up either aborting QEMU
or delivering an incorrect guest exception.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Tested-by: Cédric Le Goater <clg@kaod.org>
Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20180710160013.26559-6-peter.maydell@linaro.org
2018-08-14 17:17:19 +01:00
Peter Maydell dbea78a4d6 accel/tcg: Pass read access type through to io_readx()
The io_readx() function needs to know whether the load it is
doing is an MMU_DATA_LOAD or an MMU_INST_FETCH, so that it
can pass the right value to the cpu_transaction_failed()
function. Plumb this information through from the softmmu
code.

This is currently not often going to give the wrong answer,
because usually instruction fetches go via get_page_addr_code().
However once we switch over to handling execution from non-RAM by
creating single-insn TBs, the path for an insn fetch to generate
a bus error will be through cpu_ld*_code() and io_readx(),
so without this change we will generate a d-side fault when we
should generate an i-side fault.

We also have to pass the access type via a CPU struct global
down to unassigned_mem_read(), for the benefit of the targets
which still use the cpu_unassigned_access() hook (m68k, mips,
sparc, xtensa).

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Tested-by: Cédric Le Goater <clg@kaod.org>
Message-id: 20180710160013.26559-2-peter.maydell@linaro.org
2018-08-14 17:17:19 +01:00
Peter Maydell 3474c98a2a accel/tcg: Assert that tlb fill gave us a valid TLB entry
In commit 4b1a3e1e34 we added a check for whether the TLB entry
we had following a tlb_fill had the INVALID bit set.  This could
happen in some circumstances because a stale or wrong TLB entry was
pulled out of the victim cache.  However, after commit
68fea03855 (which prevents stale entries being in the victim
cache) and the previous commit (which ensures we don't incorrectly
hit in the victim cache)) this should never be possible.

Drop the check on TLB_INVALID_MASK from the "is this a TLB_RECHECK?"
condition, and instead assert that the tlb fill procedure has given
us a valid TLB entry (or longjumped out with a guest exception).

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20180713141636.18665-3-peter.maydell@linaro.org
2018-07-16 17:26:01 +01:00
Peter Maydell b493ccf1fc accel/tcg: Use correct test when looking in victim TLB for code
In get_page_addr_code(), we were incorrectly looking in the victim
TLB for an entry which matched the target address for reads, not
for code accesses. This meant that we could hit on a victim TLB
entry that indicated that the address was readable but not
executable, and incorrectly bypass the call to tlb_fill() which
should generate the guest MMU exception. Fix this bug.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20180713141636.18665-2-peter.maydell@linaro.org
2018-07-16 17:26:01 +01:00
Richard Henderson 68fea03855 accel/tcg: Avoid caching overwritten tlb entries
When installing a TLB entry, remove any cached version of the
same page in the VTLB.  If the existing TLB entry matches, do
not copy into the VTLB, but overwrite it.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-07-02 08:05:16 -07:00
Peter Maydell 4b1a3e1e34 accel/tcg: Don't treat invalid TLB entries as needing recheck
In get_page_addr_code() when we check whether the TLB entry
is marked as TLB_RECHECK, we should not go down that code
path if the TLB entry is not valid at all (ie the TLB_INVALID
bit is set).

Tested-by: Laurent Vivier <laurent@vivier.eu>
Reported-by: Laurent Vivier <laurent@vivier.eu>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20180629161731.16239-1-peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-07-02 08:02:20 -07:00
Peter Maydell e4c967a720 accel/tcg: Correct "is this a TLB miss" check in get_page_addr_code()
In commit 71b9a45330 we changed the condition we use
to determine whether we need to refill the TLB in
get_page_addr_code() to
    if (unlikely(env->tlb_table[mmu_idx][index].addr_code !=
                 (addr & (TARGET_PAGE_MASK | TLB_INVALID_MASK)))) {

This isn't the right check (it will falsely fail if the
input addr happens to have the low bit corresponding to
TLB_INVALID_MASK set, for instance). Replace it with a
use of the new tlb_hit() function, which is the correct test.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20180629162122.19376-3-peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-07-02 08:02:20 -07:00
Peter Maydell 334692bce7 tcg: Define and use new tlb_hit() and tlb_hit_page() functions
The condition to check whether an address has hit against a particular
TLB entry is not completely trivial. We do this in various places, and
in fact in one place (get_page_addr_code()) we have got the condition
wrong. Abstract it out into new tlb_hit() and tlb_hit_page() inline
functions (one for a known-page-aligned address and one for an
arbitrary address), and use them in all the places where we had the
condition correct.

This is a no-behaviour-change patch; we leave fixing the buggy
code in get_page_addr_code() to a subsequent patch.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20180629162122.19376-2-peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-07-02 08:02:20 -07:00
Peter Maydell 55df6fcf54 tcg: Support MMU protection regions smaller than TARGET_PAGE_SIZE
Add support for MMU protection regions that are smaller than
TARGET_PAGE_SIZE. We do this by marking the TLB entry for those
pages with a flag TLB_RECHECK. This flag causes us to always
take the slow-path for accesses. In the slow path we can then
special case them to always call tlb_fill() again, so we have
the correct information for the exact address being accessed.

This change allows us to handle reading and writing from small
regions; we cannot deal with execution from the small region.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20180620130619.11362-2-peter.maydell@linaro.org
2018-06-26 17:50:41 +01:00
Emilio G. Cota b7542f7fe8 cputlb: remove tb_lock from tlb_flush functions
The acquisition of tb_lock was added when the async tlb_flush
was introduced in e3b9ca810 ("cputlb: introduce tlb_flush_* async work.")

tb_lock was there to allow us to do memset() on the tb_jmp_cache's.
However, since f3ced3c592 ("tcg: consistently access cpu->tb_jmp_cache
atomically") all accesses to tb_jmp_cache are atomic, so tb_lock
is not needed here. Get rid of it.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-06-15 08:18:48 -10:00
Peter Maydell 1f871c5e6b exec.c: Handle IOMMUs in address_space_translate_for_iotlb()
Currently we don't support board configurations that put an IOMMU
in the path of the CPU's memory transactions, and instead just
assert() if the memory region fonud in address_space_translate_for_iotlb()
is an IOMMUMemoryRegion.

Remove this limitation by having the function handle IOMMUs.
This is mostly straightforward, but we must make sure we have
a notifier registered for every IOMMU that a transaction has
passed through, so that we can flush the TLB appropriately
when any of the IOMMUs change their mappings.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-id: 20180604152941.20374-5-peter.maydell@linaro.org
2018-06-15 15:23:34 +01:00
Peter Maydell 2d54f19401 cputlb: Pass cpu_transaction_failed() the correct physaddr
The API for cpu_transaction_failed() says that it takes the physical
address for the failed transaction. However we were actually passing
it the offset within the target MemoryRegion. We don't currently
have any target CPU implementations of this hook that require the
physical address; fix this bug so we don't get confused if we ever
do add one.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20180611125633.32755-3-peter.maydell@linaro.org
2018-06-15 15:23:34 +01:00
Peter Maydell ace4109011 cpu-defs.h: Document CPUIOTLBEntry 'addr' field
The 'addr' field in the CPUIOTLBEntry struct has a rather non-obvious
use; add a comment documenting it (reverse-engineered from what
the code that sets it is doing).

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20180611125633.32755-2-peter.maydell@linaro.org
2018-06-15 15:23:34 +01:00
Laurent Vivier 98670d47cd accel/tcg: add size paremeter in tlb_fill()
The MC68040 MMU provides the size of the access that
triggers the page fault.

This size is set in the Special Status Word which
is written in the stack frame of the access fault
exception.

So we need the size in m68k_cpu_unassigned_access() and
m68k_cpu_handle_mmu_fault().

To be able to do that, this patch modifies the prototype of
handle_mmu_fault handler, tlb_fill() and probe_write().
do_unassigned_access() already includes a size parameter.

This patch also updates handle_mmu_fault handlers and
tlb_fill() of all targets (only parameter, no code change).

Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20180118193846.24953-2-laurent@vivier.eu>
2018-01-25 16:02:24 +01:00
Peter Maydell 34d49937e4 accel/tcg: Handle atomic accesses to notdirty memory correctly
To do a write to memory that is marked as notdirty, we need
to invalidate any TBs we have cached for that memory, and
update the cpu physical memory dirty flags for VGA and migration.
The slowpath code in notdirty_mem_write() does all this correctly,
but the new atomic handling code in atomic_mmu_lookup() doesn't
do anything at all, it just clears the dirty bit in the TLB.

The effect of this bug is that if the first write to a notdirty
page for which we have cached TBs is by a guest atomic access,
we fail to invalidate the TBs and subsequently will execute
incorrect code. This can be seen by trying to run 'javac' on AArch64.

Use the new notdirty_call_before() and notdirty_call_after()
functions to correctly handle the update to notdirty memory
in the atomic codepath.

Cc: qemu-stable@nongnu.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 1511201308-23580-3-git-send-email-peter.maydell@linaro.org
2017-11-21 12:09:25 +00:00
Richard Henderson ec603b5584 tcg: Record code_gen_buffer address for user-only memory helpers
When we handle a signal from a fault within a user-only memory helper,
we cannot cpu_restore_state with the PC found within the signal frame.
Use a TLS variable, helper_retaddr, to record the unwind start point
to find the faulting guest insn.

Tested-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reported-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2017-11-15 10:33:27 +01:00