Commit Graph

42 Commits

Author SHA1 Message Date
Xin Tong 88e89a57f9 implementing victim TLB for QEMU system emulated TLB
QEMU system mode page table walks are expensive. Taken by running QEMU
qemu-system-x86_64 system mode on Intel PIN , a TLB miss and walking a
4-level page tables in guest Linux OS takes ~450 X86 instructions on
average.

QEMU system mode TLB is implemented using a directly-mapped hashtable.
This structure suffers from conflict misses. Increasing the
associativity of the TLB may not be the solution to conflict misses as
all the ways may have to be walked in serial.

A victim TLB is a TLB used to hold translations evicted from the
primary TLB upon replacement. The victim TLB lies between the main TLB
and its refill path. Victim TLB is of greater associativity (fully
associative in this patch). It takes longer to lookup the victim TLB,
but its likely better than a full page table walk. The memory
translation path is changed as follows :

Before Victim TLB:
1. Inline TLB lookup
2. Exit code cache on TLB miss.
3. Check for unaligned, IO accesses
4. TLB refill.
5. Do the memory access.
6. Return to code cache.

After Victim TLB:
1. Inline TLB lookup
2. Exit code cache on TLB miss.
3. Check for unaligned, IO accesses
4. Victim TLB lookup.
5. If victim TLB misses, TLB refill
6. Do the memory access.
7. Return to code cache

The advantage is that victim TLB can offer more associativity to a
directly mapped TLB and thus potentially fewer page table walks while
still keeping the time taken to flush within reasonable limits.
However, placing a victim TLB before the refill path increase TLB
refill path as the victim TLB is consulted before the TLB refill. The
performance results demonstrate that the pros outweigh the cons.

some performance results taken on SPECINT2006 train
datasets and kernel boot and qemu configure script on an
Intel(R) Xeon(R) CPU  E5620  @ 2.40GHz Linux machine are shown in the
Google Doc link below.

https://docs.google.com/spreadsheets/d/1eiItzekZwNQOal_h-5iJmC4tMDi051m9qidi5_nwvH4/edit?usp=sharing

In summary, victim TLB improves the performance of qemu-system-x86_64 by
11% on average on SPECINT2006, kernelboot and qemu configscript and with
highest improvement of in 26% in 456.hmmer. And victim TLB does not result
in any performance degradation in any of the measured benchmarks. Furthermore,
the implemented victim TLB is architecture independent and is expected to
benefit other architectures in QEMU as well.

Although there are measurement fluctuations, the performance
improvement is very significant and by no means in the range of
noises.

Signed-off-by: Xin Tong <trent.tong@gmail.com>
Message-id: 1407202523-23553-1-git-send-email-trent.tong@gmail.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2014-09-01 17:43:06 +01:00
Paolo Bonzini f08b617018 softmmu: introduce cpu_ldst.h
This will collect all load and store helpers soon.  For now
it is just a replacement for softmmu_exec.h, which this patch
stops including directly, but we also include it where this will
be necessary in order to simplify the next patch.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-06-05 16:10:33 +02:00
Paolo Bonzini 58ed270df9 softmmu: move softmmu_template.h out of include/
It is only included in cputlb.c now.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-06-05 16:10:33 +02:00
Paolo Bonzini 0f590e749f softmmu: commonize helper definitions
They do not need to be in op_helper.c.  Because cputlb.c now includes
softmmu_template.h twice for each size, io_readX must be elided the
second time through.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-06-05 16:10:33 +02:00
Stefan Weil 7e4e88656c cputlb: Fix regression with TCG interpreter (bug 1310324)
Commit 0f842f8a24 replaced GETPC_EXT() which
was derived from GETPC() by GETRA_EXT() without fixing cputlb.c. A later
patch replaced GETRA_EXT() by GETRA() in exec/softmmu_template.h which
is included in cputlb.c.

The TCG interpreter failed because the values returned by GETRA() were no
longer explicitly set to 0. The redefinition of GETRA() introduced here
fixes this.

In addition, GETPC_ADJ which is also used in exec/softmmu_template.h is
set to 0. Both changes reduce the compiled code size for cputlb.c by more
than 100 bytes, so the normal TCG without interpreter also profits from
the reduced code size and slightly faster code.

Cc: qemu-stable@nongnu.org
Reported-by: Giovanni Mascellani <gio@debian.org>
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-06-05 16:03:37 +02:00
Andreas Färber 0c591eb0a9 cputlb: Change tlb_set_page() argument to CPUState
Signed-off-by: Andreas Färber <afaerber@suse.de>
2014-03-13 19:52:47 +01:00
Andreas Färber 00c8cb0a36 cputlb: Change tlb_flush() argument to CPUState
Signed-off-by: Andreas Färber <afaerber@suse.de>
2014-03-13 19:52:47 +01:00
Andreas Färber 31b030d4ab cputlb: Change tlb_flush_page() argument to CPUState
Signed-off-by: Andreas Färber <afaerber@suse.de>
2014-03-13 19:52:47 +01:00
Andreas Färber a47dddd734 exec: Change cpu_abort() argument to CPUState
Signed-off-by: Andreas Färber <afaerber@suse.de>
2014-03-13 19:52:28 +01:00
Andreas Färber bb0e627a84 exec: Change memory_region_section_get_iotlb() argument to CPUState
It no longer needs CPUArchState since moving watchpoints to CPUState.

Signed-off-by: Andreas Färber <afaerber@suse.de>
2014-03-13 19:20:48 +01:00
Andreas Färber baea4fae7b cputlb: Change tlb_unprotect_code_phys() argument to CPUState
Note that the argument is unused.

Signed-off-by: Andreas Färber <afaerber@suse.de>
2014-03-13 19:20:48 +01:00
Andreas Färber 611d4f996f translate-all: Change tb_flush_jmp_cache() argument to CPUState
Signed-off-by: Andreas Färber <afaerber@suse.de>
2014-03-13 19:20:48 +01:00
Andreas Färber 8cd70437f3 cpu: Move tb_jmp_cache field from CPU_COMMON to CPUState
Clear it on reset.

Signed-off-by: Andreas Färber <afaerber@suse.de>
2014-03-13 19:20:46 +01:00
Edgar E. Iglesias 09daed848c cpu: Add per-cpu address space
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
2014-02-11 22:56:37 +10:00
Edgar E. Iglesias 777170946f exec: Make iotlb_to_region input an AS
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
2014-02-11 22:56:09 +10:00
Juan Quintela 220c3ebddb memory: split cpu_physical_memory_* functions to its own include
All the functions that use ram_addr_t should be here.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Orit Wasserman <owasserm@redhat.com>
2014-01-13 14:04:54 +01:00
Juan Quintela a2f4d5bef2 memory: make cpu_physical_memory_reset_dirty() take a length parameter
We have an end parameter in all the callers, and this make it coherent
with the rest of cpu_physical_memory_* functions, that also take a
length parameter.

Once here, move the start/end calculation to
tlb_reset_dirty_range_all() as we don't need it here anymore.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Orit Wasserman <owasserm@redhat.com>
2014-01-13 14:04:54 +01:00
Juan Quintela a2cd8c852d memory: s/dirty/clean/ in cpu_physical_memory_is_dirty()
All uses except one really want the other meaning.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Orit Wasserman <owasserm@redhat.com>
2014-01-13 14:04:54 +01:00
Juan Quintela 5215919291 memory: cpu_physical_memory_mask_dirty_range() always clears a single flag
Document it

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Orit Wasserman <owasserm@redhat.com>
2014-01-13 14:04:54 +01:00
Juan Quintela a1390db4df memory: create function to set a single dirty bit
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Orit Wasserman <owasserm@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2014-01-13 14:04:53 +01:00
Richard Henderson eb2535f411 cputlb: Tidy memset() of arrays
Don't duplicate the array length computation in the memset()
when plain sizeof() can produce the correct results.

Signed-off-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Andreas Färber <afaerber@suse.de>
2013-12-23 15:32:36 +01:00
Richard Henderson 4fadb3bb57 cputlb: Use memset() when flushing entries
The size of tlb_table is 4k on a 64-bit host.  For overwriting
memory at this size, cacheline tricks can help.

Signed-off-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Andreas Färber <afaerber@suse.de>
2013-12-23 15:31:19 +01:00
liguang 812586405c cputlb: Remove dead function tlb_update_dirty()
Signed-off-by: liguang <lig.fnst@cn.fujitsu.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Andreas Färber <afaerber@suse.de>
2013-10-07 11:48:03 +02:00
Andreas Färber bdc44640cb cpu: Use QTAILQ for CPU list
Introduce CPU_FOREACH(), CPU_FOREACH_SAFE() and CPU_NEXT() shorthand
macros.

Signed-off-by: Andreas Färber <afaerber@suse.de>
2013-09-03 12:25:55 +02:00
Andreas Färber 182735efaf cpu: Make first_cpu and next_cpu CPUState
Move next_cpu from CPU_COMMON to CPUState.
Move first_cpu variable to qom/cpu.h.

gdbstub needs to use CPUState::env_ptr for now.
cpu_copy() no longer needs to save and restore cpu_next.

Acked-by: Paolo Bonzini <pbonzini@redhat.com>
[AF: Rebased, simplified cpu_copy()]
Signed-off-by: Andreas Färber <afaerber@suse.de>
2013-07-09 21:32:54 +02:00
Paolo Bonzini 1b5ec23467 memory: return MemoryRegion from qemu_ram_addr_from_host
It will be needed in the next patch.

Reviewed-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-07-04 17:42:46 +02:00
Paolo Bonzini 7443b43758 exec: move qemu_ram_addr_from_host_nofail to cputlb.c
After the next patch it would not be used elsewhere anyway.  Also,
the _nofail and the standard versions of this function return different
things, which is confusing.  Removing the function from the public headers
limits the confusion.

Reviewed-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-07-04 17:42:45 +02:00
Andreas Färber c658b94f6e cpu: Turn cpu_unassigned_access() into a CPUState hook
Use it for all targets, but be careful not to pass invalid CPUState.
cpu_single_env can be NULL, e.g. on Xen.

Signed-off-by: Andreas Färber <afaerber@suse.de>
2013-06-28 13:25:13 +02:00
Jan Kiszka 90260c6c09 exec: Resolve subpages in one step except for IOTLB fills
Except for the case of setting the IOTLB entry in TCG mode, we can avoid
the subpage dispatching handlers and do the resolution directly on
address_space_lookup_region. An IOTLB entry describes a full page, not
only the region that the first access to a sub-divided page may return.

This patch therefore introduces a special translation function,
address_space_translate_for_iotlb, that avoids the subpage resolutions.
In contrast, callers of the existing address_space_translate service
will now always receive the terminal memory region section. This will be
important for breaking the BQL and for enabling unaligned memory region.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-06-20 16:32:46 +02:00
Hervé Poussineau 54b949d270 cputlb: fix debug logs
'pd' variable has been removed in 06ef3525e1.

Signed-off-by: Hervé Poussineau <hpoussin@reactos.org>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2013-06-14 14:29:04 +04:00
Paolo Bonzini 149f54b53b memory: add address_space_translate
Using phys_page_find to translate an AddressSpace to a MemoryRegionSection
is unwieldy.  It requires to pass the page index rather than the address,
and later memory_region_section_addr has to be called.  Replace
memory_region_section_addr with a function that does all of it: call
phys_page_find, compute the offset within the region, and check how
big the current mapping is.  This way, a large flat region can be written
with a single lookup rather than a page at a time.

address_space_translate will also provide a single point where IOMMU
forwarding is implemented.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-05-29 16:26:50 +02:00
Paolo Bonzini 8f3e03cb73 cputlb: simplify tlb_set_page
The same "if" condition is repeated twice.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-05-29 16:26:39 +02:00
Andreas Färber d77953b94f cpu: Move current_tb field to CPUState
Explictly NULL it on CPU reset since it was located before breakpoints.

Change vapic_report_tpr_access() argument to CPUState. This also
resolves the use of void* for cpu.h independence.
Change vAPIC patch_instruction() argument to X86CPU.

Signed-off-by: Andreas Färber <afaerber@suse.de>
2013-02-16 14:51:00 +01:00
Paolo Bonzini 022c62cbbc exec: move include files to include/exec/
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-12-19 08:31:31 +01:00
Avi Kivity a8170e5e97 Rename target_phys_addr_t to hwaddr
target_phys_addr_t is unwieldly, violates the C standard (_t suffixes are
reserved) and its purpose doesn't match the name (most target_phys_addr_t
addresses are not target specific).  Replace it with a finger-friendly,
standards conformant hwaddr.

Outstanding patchsets can be fixed up with the command

  git rebase -i --exec 'find -name "*.[ch]"
                        | xargs s/target_phys_addr_t/hwaddr/g' origin

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2012-10-23 08:58:25 -05:00
Avi Kivity ac1970fbe8 memory: per-AddressSpace dispatch
Currently we use a global radix tree to dispatch memory access.  This only
works with a single address space; to support multiple address spaces we
make the radix tree a member of AddressSpace (via an intermediate structure
AddressSpaceDispatch to avoid exposing too many internals).

A side effect is that address_space_io also gains a dispatch table.  When
we remove all the pre-memory-API I/O registrations, we can use that for
dispatching I/O and get rid of the original I/O dispatch.

Signed-off-by: Avi Kivity <avi@redhat.com>
2012-10-22 14:50:08 +02:00
Avi Kivity 7762c2c1e0 memory: rename 'exec-obsolete.h'
exec-obsolete.h used to hold pre-memory-API functions that were used from
device code prior to the transition to the memory API.  Now that the
transition is complete, the name no longer describes the file.  The
functions still need to be merged better into the memory core, but there's
no danger of anyone using them.

Reviewed-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-10-15 11:43:05 +02:00
Blue Swirl 89c33337fd Remove unused CONFIG_TCG_PASS_AREG0 and dead code
Now that CONFIG_TCG_PASS_AREG0 is enabled for all targets,
remove dead code and support for !CONFIG_TCG_PASS_AREG0 case.

Remove dyngen-exec.h and all references to it. Although included by
hw/spapr_hcall.c, it does not seem to use it.

Remove unused HELPER_CFLAGS.

Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
2012-09-15 17:51:14 +00:00
Peter Maydell 116aae36ae cputlb.c: Fix out of date comment
The comment about the return address from get_page_addr_code() was
well out of date as phys_ram_base has not existed for some time.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
2012-08-15 15:18:54 +01:00
Max Filippov 56eb21e158 cputlb: fix watchpoints handling
Cleanup commit e554861766 have changed
code_address calculation in the tlb_set_page function in case of access
to a page with a watchpoint. This caused QEMU segfault in the xtensa
test_break unit test. Fix it by moving code_address assignment above
memory_region_section_get_iotlb call.

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2012-05-12 09:14:38 +00:00
Blue Swirl cc5bea608d cputlb: prepare private memory API for public consumption
Fold is_ram_rom and is_ram_rom_romd() into callers.

Change is_romd() and section_addr() to take MemoryRegion
instead of MemoryRegionSection for consistency and
use memory_region_ prefix.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2012-05-01 10:45:05 +00:00
Blue Swirl 0cac1b66c8 cputlb: move TLB handling to a separate file
Move TLB handling and softmmu code load helpers to cputlb.c,
compile only for softmmu targets.

Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2012-05-01 10:45:04 +00:00