Commit Graph

62239 Commits

Author SHA1 Message Date
Greg Kurz 9986ddec4c spapr_cpu_core: add missing rollback on realization path
The spapr_realize_vcpu() function doesn't rollback in case of error.
This isn't a problem with coldplugged CPUs because the machine won't
start and QEMU will exit. Hotplug is a different story though: the
CPU thread is started under object_property_set_bool() and it assumes
it can access the CPU object.

If icp_create() fails, we return an error without unregistering the
reset handler for this CPU, and we let the underlying QEMU thread for
this CPU alive. Since spapr_cpu_core_realize() doesn't care to unrealize
already realized CPUs either, but happily frees all of them anyway, the
CPU thread crashes instantly:

(qemu) device_add host-spapr-cpu-core,core-id=1,id=gku
GKU: failing icp_create (cpu 0x11497fd0)
                             ^^^^^^^^^^
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffee3feaa0 (LWP 24725)]
0x00000000104c8374 in object_dynamic_cast_assert (obj=0x11497fd0,
                                                  ^^^^^^^^^^^^^^
                                             pointer to the CPU object
623         trace_object_dynamic_cast_assert(obj ? obj->class->type->name
(gdb) p obj->class->type
$1 = (Type) 0x0
(gdb) p * obj
$2 = {class = 0x10ea9c10, free = 0x11244620,
                                 ^^^^^^^^^^
                              should be g_free
(gdb) p g_free
$3 = {<text variable, no debug info>} 0x7ffff282bef0 <g_free>

obj is a dangling pointer to the CPU that was just destroyed in
spapr_cpu_core_realize().

This patch adds proper rollback to both spapr_realize_vcpu() and
spapr_cpu_core_realize().

Signed-off-by: Greg Kurz <groug@kaod.org>
[dwg: Fixed a conflict due to a change in my tree]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-06-16 16:32:33 +10:00
Greg Kurz 27607c1cdc spapr_cpu_core: fix potential leak in spapr_cpu_core_realize()
Commit 94ad93bd97 (QEMU 2.12) switched to instantiate CPUs separately
but it missed to adapt the error path accordingly. If something fails in
the CPU creation loop, then the CPU object that was just created is leaked.

The error paths in this function are a bit obfuscated, and adding
yet another label to free this CPU object makes it worse. We should
move the block of the loop to a separate function, with a proper
rollback path, but this is a bigger cleanup.

For now, let's just fix the bug by adding the missing calls to
object_unref(). This will allow easier backport to older QEMU
versions.

Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-06-16 16:32:33 +10:00
Greg Kurz dbb3e8d5da spapr_cpu_core: convert last snprintf() to g_strdup_printf()
Because this is the preferred practice in QEMU.

Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-06-16 16:32:33 +10:00
David Gibson 5e22e29201 pnv: Add cpu unrealize path
Currently we don't have any unrealize path for pnv cpu cores.  We get away
with this because we don't yet support cpu hotplug for pnv.

However, we're going to want it eventually, and in the meantime, it makes
it non-obvious why there are a bunch of allocations on the realize() path
that don't have matching frees.

So, implement the missing unrealize path.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
2018-06-16 16:32:33 +10:00
David Gibson 3a24752112 pnv: Clean up cpu realize path
pnv_cpu_init() is only called from the the pnv cpu core realize path, and
really only can be called from there.  So fold it into its caller, which
we also rename for brevity.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
2018-06-16 16:32:33 +10:00
David Gibson 08304a8689 pnv_core: Allocate cpu thread objects individually
Currently, we allocate space for all the cpu objects within a single core
in one big block.  This was copied from an older version of the spapr code
and requires some ugly pointer manipulation to extract the individual
objects.

This design was due to a misunderstanding of qemu lifetime conventions and
has already been changed in spapr (in 94ad93bd "spapr_cpu_core: instantiate
CPUs separately".

Make an equivalent change in pnv_core to get rid of the nasty pointer
arithmetic.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
2018-06-16 16:32:33 +10:00
David Gibson 937c2146a6 pnv: Fix some error handling cpu realize()
In pnv_core_realize() we call two functions with an Error * parameter in
succession, which will go badly if they both cause errors.  In fact, a
failure in either of them indicates a qemu internal error, so we can just
use &error_abort in both cases.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
2018-06-16 16:32:33 +10:00
David Gibson b1d40d6e09 spapr: Clean up cpu realize/unrealize paths
spapr_cpu_init() and spapr_cpu_destroy() are only called from the spapr
cpu core realize/unrealize paths, and really can only be called from there.

Those are all short functions, so fold the pairs together for simplicity.
While we're there rename some functions and change some parameter types
for brevity and clarity.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
2018-06-16 16:32:33 +10:00
BALATON Zoltan 2100b6b21e sm501: Do not clear read only bits when writing registers
When writing registers that have read only bits we have to avoid
changing these bits as they may have non zero values. Make sure we use
the correct masks to mask out read only and reserved bits when
changing registers.

Also remove extra spaces from dram_control and arbitration_control
assignments.

Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-06-16 16:32:33 +10:00
Mark Cave-Ayland b6c7e42f74 mos6522: expose mos6522_update_irq() through MOS6522DeviceClass
In the case where we have an interrupt generated externally from inputs to
bits 1 and 2 of port A and/or port B, it is necessary to expose
mos6522_update_irq() so it can be called by the interrupt source.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-06-16 16:32:33 +10:00
Mark Cave-Ayland 32a8c27b5d mos6522: remove additional interrupt flag filter from mos6522_update_irq()
The datasheet indicates that the interrupt is generated by ANDing the
interrupt flags register (IFR) with the interrupt enable register (IER)
but currently there is an extra filter for the SR and timer interrupts.

Remove this extra filter to allow interrupts to be generated by external
inputs on bits 1 and 2 of ports A and B.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-06-16 16:32:33 +10:00
Mark Cave-Ayland 7f5d6517e3 mos6522: only clear the shift register interrupt upon write
According to the 6522 datasheet the shift register (SR) interrupt flag is
cleared upon write with no mention of any other interrupt flags.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-06-16 16:32:33 +10:00
Cédric Le Goater 52b438815e xics_kvm: fix a build break
On CentOS 7.5, gcc-4.8.5-28.el7_5.1.ppc64le fails to build QEMU due to :

  hw/intc/xics_kvm.c: In function ‘ics_set_kvm_state’:
  hw/intc/xics_kvm.c:281:13: error: ‘ret’ may be used uninitialized in this
    function [-Werror=maybe-uninitialized]
             return ret;

Fix the breakage and also remove the extra error reporting as
kvm_device_access() already provides a substantial error message.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-06-16 16:32:33 +10:00
Mark Cave-Ayland d811d61fbc mac_newworld: add PMU device
The PMU device supercedes the CUDA device found on older New World Macs and
is supported by a larger number of guest OSs from OS 9 to OS X 10.5.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-06-16 16:32:33 +10:00
Mark Cave-Ayland 84051eb400 adb: add property to disable direct reg 3 writes
MacOS 9 has a bug in its PMU driver whereby after configuring the ADB bus
devices it sends another write to reg 3 on both devices resetting them
both back to the same address.

Add a new disable_direct_reg3_writes property to ADBDevice to disable these
direct writes which can enabled just for the upcoming pmu-adb support.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-06-16 16:32:33 +10:00
Mark Cave-Ayland fb6649f172 adb: fix read reg 3 byte ordering
According to the Apple ADB documentation, register 3 is a 2-byte register
with the device address in the first byte, and the handler ID in the second
byte.

This is currently the opposite away to which QEMU returns them so switch the
order around.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-06-16 16:32:33 +10:00
Mark Cave-Ayland 8f55ac1304 mac_newworld: wire up programmer switch to NMI handler
The programmer switch is wired up via an external GPIO pin and can be used
to aid debugging Mac guests.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-06-16 16:32:33 +10:00
Mark Cave-Ayland 7c4166a971 mac_newworld: add gpios to macio devices with PMU enabled
PMU-enabled New World Macs expose their GPIOs via a separate memory region
within the macio device.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-06-16 16:32:33 +10:00
Mark Cave-Ayland f1114c17ee mac_newworld: add via machine option to control mac99 VIA/ADB configuration
This option allows the VIA configuration to be controlled between 3
different possible setups: cuda, pmu-adb and pmu with USB rather than ADB
keyboard/mouse.

For the moment we don't do anything with the configuration except to pass
it to the macio device (the via-cuda parent) and also to the firmware via
the fw_cfg interface so that it can present the correct device tree.

The default is cuda which is the current default and so will have no
change in behaviour.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-06-16 16:32:33 +10:00
Mark Cave-Ayland 06fe3a5bf1 ppc: introduce Core99MachinesState for the mac99 machine
This is in preparation for adding configuration controlled via machine
options.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-06-16 16:32:33 +10:00
Greg Kurz 2c9dfdacc5 spapr: fix leak in h_client_architecture_support()
If the negotiated compat mode can't be set, but raw mode is supported,
we decide to ignore the error. An so, we should free it to prevent a
memory leak.

Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-06-16 16:32:33 +10:00
Greg Kurz e493786c95 target/ppc: drop empty #if/#endif block
Commit 9d6f106552 moved the last line in this block to somewhere else,
but it forgot to remove the now useless #if/#endif.

Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-06-16 16:32:33 +10:00
Suraj Jitindar Singh b2540203bd ppc/spapr_caps: Don't disable cap_cfpc on POWER8 by default
In default_caps_with_cpu() we set spapr_cap_cfpc to broken for POWER8
processors and before.

Since we no longer require private l1d cache on POWER8 for this cap to
be set to workaround change this to default to broken for POWER7
processors and before.

Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-06-16 16:32:33 +10:00
Suraj Jitindar Singh 072f416a53 target/ppc: Don't require private l1d cache on POWER8 for cap_ppc_safe_cache
For cap_ppc_safe_cache to be set to workaround, we require both a l1d
cache flush instruction and private l1d cache.

On POWER8 don't require private l1d cache. This means a guest on a
POWER8 machine can make use of the cache flush workarounds.

Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-06-16 16:32:33 +10:00
Richard Henderson 9f75462065 tcg: Reduce max TB opcode count
Also, assert that we don't overflow any of two different offsets into
the TB. Both unwind and goto_tb both record a uint16_t for later use.

This fixes an arm-softmmu test case utilizing NEON in which there is
a TB generated that runs to 7800 opcodes, and compiles to 96k on an
x86_64 host.  This overflows the 16-bit offset in which we record the
goto_tb reset offset.  Because of that overflow, we install a jump
destination that goes to neverland.  Boom.

With this reduced op count, the same TB compiles to about 48k for
aarch64, ppc64le, and x86_64 hosts, and neither assertion fires.

Cc: qemu-stable@nongnu.org
Reported-by: "Jason A. Donenfeld" <Jason@zx2c4.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-06-15 09:39:53 -10:00
Philippe Mathieu-Daudé 1b145d59b7 configure: Enable out-of-tree acceptance tests
Currently to run Avocado acceptance tests in an out-of-tree
build directory, we need to use the full path to the test:

  build_dir$ avocado run /full/path/to/sources/qemu/tests/acceptance/boot_linux_console.py

This patch adds a symlink in the build tree to simplify the
tests invocation, allowing the same command than in in-tree builds:

  build_dir$ avocado run tests/acceptance/boot_linux_console.py

Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20180612173437.14462-1-f4bug@amsat.org>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2018-06-15 16:10:11 -03:00
Cleber Rosa c1cc73f407 Acceptance tests: add Linux kernel boot and console checking test
This test boots a Linux kernel, and checks that the given command
line was effective in two ways:

 * It makes the kernel use the set "console device" as a console
 * The kernel records the command line as expected in the console

Given that way too many error conditions may occur, and detecting the
kernel boot progress status may not be trivial, this test relies on a
timeout to handle unexpected situations.  Also, it's *not* tagged as a
quick test for obvious reasons.

It may be useful, while interactively running/debugging this test, or
tests similar to this one, to show some of the logging channels.
Example:

 $ avocado --show=QMP,console run boot_linux_console.py

Signed-off-by: Cleber Rosa <crosa@redhat.com>
Message-Id: <20180530184156.15634-6-crosa@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2018-06-15 16:10:11 -03:00
Cleber Rosa 22dea9db2b scripts/qemu.py: introduce set_console() method
The set_console() method is intended to ease higher level use cases
that require a console device.

The amount of intelligence is limited on purpose, requiring either the
device type explicitly, or the existence of a machine (pattern)
definition.

Because of the console device type selection criteria (by machine
type), users should also be able to define that.  It'll then be used
for both '-machine' and for the console device type selection.

Users of the set_console() method will certainly be interested in
accessing the console device, and for that a console_socket property
has been added.

Signed-off-by: Cleber Rosa <crosa@redhat.com>
Message-Id: <20180530184156.15634-5-crosa@redhat.com>
Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2018-06-15 16:10:11 -03:00
Cleber Rosa 7b1bd11cff Acceptance tests: add quick VNC tests
This patch adds a few simple behavior tests for VNC.

Signed-off-by: Cleber Rosa <crosa@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20180530184156.15634-4-crosa@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2018-06-15 16:10:11 -03:00
Cleber Rosa 572a824383 scripts/qemu.py: allow adding to the list of extra arguments
Tests will often need to add extra arguments to QEMU command
line arguments.

Signed-off-by: Cleber Rosa <crosa@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20180530184156.15634-3-crosa@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2018-06-15 16:10:11 -03:00
Cleber Rosa c3d7e8c90d Add functional/acceptance tests infrastructure
This patch adds the very minimum infrastructure necessary for writing
and running functional/acceptance tests, including:

 * Documentation
 * The avocado_qemu.Test base test class
 * One example tests (version.py)

Additional functionality is expected to be added along the tests that
require them.

Signed-off-by: Cleber Rosa <crosa@redhat.com>
Message-Id: <20180530184156.15634-2-crosa@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
[ehabkost: fix typo on testing.rst]
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2018-06-15 16:10:11 -03:00
Eduardo Habkost 93bc3b29fa Remove COPYING.PYTHON
The COPYING.PYTHON file was added when we added the compatibility
argparse.py module, which was licensed under the Python Software
Foundation License Version 2.

Now the compatibility argparse.py module was removed, and we are
not carrying any code under that license anymore.  Remove
COPYING.PYTHON.

Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <20180611180152.2681-1-ehabkost@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2018-06-15 16:10:11 -03:00
Emilio G. Cota 0ac20318ce tcg: remove tb_lock
Use mmap_lock in user-mode to protect TCG state and the page descriptors.
In !user-mode, each vCPU has its own TCG state, so no locks needed.
Per-page locks are used to protect the page descriptors.

Per-TB locks are used in both modes to protect TB jumps.

Some notes:

- tb_lock is removed from notdirty_mem_write by passing a
  locked page_collection to tb_invalidate_phys_page_fast.

- tcg_tb_lookup/remove/insert/etc have their own internal lock(s),
  so there is no need to further serialize access to them.

- do_tb_flush is run in a safe async context, meaning no other
  vCPU threads are running. Therefore acquiring mmap_lock there
  is just to please tools such as thread sanitizer.

- Not visible in the diff, but tb_invalidate_phys_page already
  has an assert_memory_lock.

- cpu_io_recompile is !user-only, so no mmap_lock there.

- Added mmap_unlock()'s before all siglongjmp's that could
  be called in user-mode while mmap_lock is held.
  + Added an assert for !have_mmap_lock() after returning from
    the longjmp in cpu_exec, just like we do in cpu_exec_step_atomic.

Performance numbers before/after:

Host: AMD Opteron(tm) Processor 6376

                 ubuntu 17.04 ppc64 bootup+shutdown time

  700 +-+--+----+------+------------+-----------+------------*--+-+
      |    +    +      +            +           +           *B    |
      |         before ***B***                            ** *    |
      |tb lock removal ###D###                         ***        |
  600 +-+                                           ***         +-+
      |                                           **         #    |
      |                                        *B*          #D    |
      |                                     *** *         ##      |
  500 +-+                                ***           ###      +-+
      |                             * ***           ###           |
      |                            *B*          # ##              |
      |                          ** *          #D#                |
  400 +-+                      **            ##                 +-+
      |                      **           ###                     |
      |                    **           ##                        |
      |                  **         # ##                          |
  300 +-+  *           B*          #D#                          +-+
      |    B         ***        ###                               |
      |    *       **       ####                                  |
      |     *   ***      ###                                      |
  200 +-+   B  *B     #D#                                       +-+
      |     #B* *   ## #                                          |
      |     #*    ##                                              |
      |    + D##D#     +            +           +            +    |
  100 +-+--+----+------+------------+-----------+------------+--+-+
           1    8      16      Guest CPUs       48           64
  png: https://imgur.com/HwmBHXe

              debian jessie aarch64 bootup+shutdown time

  90 +-+--+-----+-----+------------+------------+------------+--+-+
     |    +     +     +            +            +            +    |
     |         before ***B***                                B    |
  80 +tb lock removal ###D###                              **D  +-+
     |                                                   **###    |
     |                                                 **##       |
  70 +-+                                             ** #       +-+
     |                                             ** ##          |
     |                                           **  #            |
  60 +-+                                       *B  ##           +-+
     |                                       **  ##               |
     |                                    ***  #D                 |
  50 +-+                               ***   ##                 +-+
     |                             * **   ###                     |
     |                           **B*  ###                        |
  40 +-+                     ****  # ##                         +-+
     |                   ****     #D#                             |
     |             ***B**      ###                                |
  30 +-+    B***B**        ####                                 +-+
     |    B *   *     # ###                                       |
     |     B       ###D#                                          |
  20 +-+   D  ##D##                                             +-+
     |      D#                                                    |
     |    +     +     +            +            +            +    |
  10 +-+--+-----+-----+------------+------------+------------+--+-+
          1     8     16      Guest CPUs        48           64
  png: https://imgur.com/iGpGFtv

The gains are high for 4-8 CPUs. Beyond that point, however, unrelated
lock contention significantly hurts scalability.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-06-15 08:18:48 -10:00
Emilio G. Cota 705ad1ff0c translate-all: remove tb_lock mention from cpu_restore_state_from_tb
tb_lock was needed when the function did retranslation. However,
since fca8a500d5 ("tcg: Save insn data and use it in
cpu_restore_state_from_tb") we don't do retranslation.

Get rid of the comment.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-06-15 08:18:48 -10:00
Emilio G. Cota b7542f7fe8 cputlb: remove tb_lock from tlb_flush functions
The acquisition of tb_lock was added when the async tlb_flush
was introduced in e3b9ca810 ("cputlb: introduce tlb_flush_* async work.")

tb_lock was there to allow us to do memset() on the tb_jmp_cache's.
However, since f3ced3c592 ("tcg: consistently access cpu->tb_jmp_cache
atomically") all accesses to tb_jmp_cache are atomic, so tb_lock
is not needed here. Get rid of it.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-06-15 08:18:48 -10:00
Emilio G. Cota 194125e3eb translate-all: protect TB jumps with a per-destination-TB lock
This applies to both user-mode and !user-mode emulation.

Instead of relying on a global lock, protect the list of incoming
jumps with tb->jmp_lock. This lock also protects tb->cflags,
so update all tb->cflags readers outside tb->jmp_lock to use
atomic reads via tb_cflags().

In order to find the destination TB (and therefore its jmp_lock)
from the origin TB, we introduce tb->jmp_dest[].

I considered not using a linked list of jumps, which simplifies
code and makes the struct smaller. However, it unnecessarily increases
memory usage, which results in a performance decrease. See for
instance these numbers booting+shutting down debian-arm:
                      Time (s)  Rel. err (%)  Abs. err (s)  Rel. slowdown (%)
------------------------------------------------------------------------------
 before                  20.88          0.74      0.154512                 0.
 after                   20.81          0.38      0.079078        -0.33524904
 GTree                   21.02          0.28      0.058856         0.67049808
 GHashTable + xxhash     21.63          1.08      0.233604          3.5919540

Using a hash table or a binary tree to keep track of the jumps
doesn't really pay off, not only due to the increased memory usage,
but also because most TBs have only 0 or 1 jumps to them. The maximum
number of jumps when booting debian-arm that I measured is 35, but
as we can see in the histogram below a TB with that many incoming jumps
is extremely rare; the average TB has 0.80 incoming jumps.

n_jumps: 379208; avg jumps/tb: 0.801099
dist: [0.0,1.0)|▄█▁▁▁▁▁▁▁▁▁▁▁ ▁▁▁▁▁▁ ▁▁▁  ▁▁▁     ▁|[34.0,35.0]

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-06-15 08:18:48 -10:00
Emilio G. Cota 95590e24af translate-all: discard TB when tb_link_page returns an existing matching TB
Use the recently-gained QHT feature of returning the matching TB if it
already exists. This allows us to get rid of the lookup we perform
right after acquiring tb_lock.

Suggested-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-06-15 08:18:42 -10:00
Emilio G. Cota faa9372c07 translate-all: introduce assert_no_pages_locked
The appended adds assertions to make sure we do not longjmp with page
locks held. Note that user-mode has nothing to check, since page_locks
are !user-mode only.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-06-15 07:42:55 -10:00
Emilio G. Cota 6d9abf85d5 translate-all: add page_locked assertions
This is only compiled under CONFIG_DEBUG_TCG to avoid
bloating the binary.

In user-mode, assert_page_locked is equivalent to assert_mmap_lock.

Note: There are some tb_lock assertions left that will be
removed by later patches.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Suggested-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-06-15 07:42:55 -10:00
Emilio G. Cota 0b5c91f74f translate-all: use per-page locking in !user-mode
Groundwork for supporting parallel TCG generation.

Instead of using a global lock (tb_lock) to protect changes
to pages, use fine-grained, per-page locks in !user-mode.
User-mode stays with mmap_lock.

Sometimes changes need to happen atomically on more than one
page (e.g. when a TB that spans across two pages is
added/invalidated, or when a range of pages is invalidated).
We therefore introduce struct page_collection, which helps
us keep track of a set of pages that have been locked in
the appropriate locking order (i.e. by ascending page index).

This commit first introduces the structs and the function helpers,
to then convert the calling code to use per-page locking. Note
that tb_lock is not removed yet.

While at it, rename tb_alloc_page to tb_page_add, which pairs with
tb_page_remove.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-06-15 07:42:55 -10:00
Emilio G. Cota 45c73de594 translate-all: move tb_invalidate_phys_page_range up in the file
This greatly simplifies next commit's diff.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-06-15 07:42:55 -10:00
Emilio G. Cota ae5486e273 translate-all: work page-by-page in tb_invalidate_phys_range_1
So that we pass a same-page range to tb_invalidate_phys_page_range,
instead of always passing an end address that could be on a different
page.

As discussed with Peter Maydell on the list [1], tb_invalidate_phys_page_range
doesn't actually do much with 'end', which explains why we have never
hit a bug despite going against what the comment on top of
tb_invalidate_phys_page_range requires:

> * Invalidate all TBs which intersect with the target physical address range
> * [start;end[. NOTE: start and end must refer to the *same* physical page.

The appended honours the comment, which avoids confusion.

While at it, rework the loop into a for loop, which is less error prone
(e.g. "continue" won't result in an infinite loop).

[1] https://lists.gnu.org/archive/html/qemu-devel/2017-07/msg09165.html

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-06-15 07:42:55 -10:00
Emilio G. Cota 94da9aec2a translate-all: remove hole in PageDesc
Groundwork for supporting parallel TCG generation.

Move the hole to the end of the struct, so that a u32
field can be added there without bloating the struct.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-06-15 07:42:55 -10:00
Emilio G. Cota 78722ed0b8 translate-all: make l1_map lockless
Groundwork for supporting parallel TCG generation.

We never remove entries from the radix tree, so we can use cmpxchg
to implement lockless insertions.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-06-15 07:42:55 -10:00
Emilio G. Cota 1e05197f24 translate-all: iterate over TBs in a page with PAGE_FOR_EACH_TB
This commit does several things, but to avoid churn I merged them all
into the same commit. To wit:

- Use uintptr_t instead of TranslationBlock * for the list of TBs in a page.
  Just like we did in (c37e6d7e "tcg: Use uintptr_t type for
  jmp_list_{next|first} fields of TB"), the rationale is the same: these
  are tagged pointers, not pointers. So use a more appropriate type.

- Only check the least significant bit of the tagged pointers. Masking
  with 3/~3 is unnecessary and confusing.

- Introduce the TB_FOR_EACH_TAGGED macro, and use it to define
  PAGE_FOR_EACH_TB, which improves readability. Note that
  TB_FOR_EACH_TAGGED will gain another user in a subsequent patch.

- Update tb_page_remove to use PAGE_FOR_EACH_TB. In case there
  is a bug and we attempt to remove a TB that is not in the list, instead
  of segfaulting (since the list is NULL-terminated) we will reach
  g_assert_not_reached().

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-06-15 07:42:55 -10:00
Emilio G. Cota 128ed2278c tcg: move tb_ctx.tb_phys_invalidate_count to tcg_ctx
Thereby making it per-TCGContext. Once we remove tb_lock, this will
avoid an atomic increment every time a TB is invalidated.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-06-15 07:42:55 -10:00
Emilio G. Cota be2cdc5e35 tcg: track TBs with per-region BST's
This paves the way for enabling scalable parallel generation of TCG code.

Instead of tracking TBs with a single binary search tree (BST), use a
BST for each TCG region, protecting it with a lock. This is as scalable
as it gets, since each TCG thread operates on a separate region.

The core of this change is the introduction of struct tcg_region_tree,
which contains a pointer to a GTree and an associated lock to serialize
accesses to it. We then allocate an array of tcg_region_tree's, adding
the appropriate padding to avoid false sharing based on
qemu_dcache_linesize.

Given a tc_ptr, we first find the corresponding region_tree. This
is done by special-casing the first and last regions first, since they
might be of size != region.size; otherwise we just divide the offset
by region.stride. I was worried about this division (several dozen
cycles of latency), but profiling shows that this is not a fast path.
Note that region.stride is not required to be a power of two; it
is only required to be a multiple of the host's page size.

Note that with this design we can also provide consistent snapshots
about all region trees at once; for instance, tcg_tb_foreach
acquires/releases all region_tree locks before/after iterating over them.
For this reason we now drop tb_lock in dump_exec_info().

As an alternative I considered implementing a concurrent BST, but this
can be tricky to get right, offers no consistent snapshots of the BST,
and performance and scalability-wise I don't think it could ever beat
having separate GTrees, given that our workload is insert-mostly (all
concurrent BST designs I've seen focus, understandably, on making
lookups fast, which comes at the expense of convoluted, non-wait-free
insertions/removals).

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-06-15 07:42:55 -10:00
Emilio G. Cota 32359d529f qht: return existing entry when qht_insert fails
The meaning of "existing" is now changed to "matches in hash and
ht->cmp result". This is saner than just checking the pointer value.

Suggested-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by:  Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-06-15 07:42:55 -10:00
Emilio G. Cota 61b8cef1d4 qht: require a default comparison function
qht_lookup now uses the default cmp function. qht_lookup_custom is defined
to retain the old behaviour, that is a cmp function is explicitly provided.

qht_insert will gain use of the default cmp in the next patch.

Note that we move qht_lookup_custom's @func to be the last argument,
which makes the new qht_lookup as simple as possible.
Instead of this (i.e. keeping @func 2nd):
0000000000010750 <qht_lookup>:
   10750:       89 d1                   mov    %edx,%ecx
   10752:       48 89 f2                mov    %rsi,%rdx
   10755:       48 8b 77 08             mov    0x8(%rdi),%rsi
   10759:       e9 22 ff ff ff          jmpq   10680 <qht_lookup_custom>
   1075e:       66 90                   xchg   %ax,%ax

We get:
0000000000010740 <qht_lookup>:
   10740:       48 8b 4f 08             mov    0x8(%rdi),%rcx
   10744:       e9 37 ff ff ff          jmpq   10680 <qht_lookup_custom>
   10749:       0f 1f 80 00 00 00 00    nopl   0x0(%rax)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-06-15 07:42:55 -10:00
John Arbuckle 1019242af1 tcg/i386: Use byte form of xgetbv instruction
The assembler in most versions of Mac OS X is pretty old and does not
support the xgetbv instruction.  To go around this problem, the raw
encoding of the instruction is used instead.

Signed-off-by: John Arbuckle <programmingkidx@gmail.com>
Message-Id: <20180604215102.11002-1-programmingkidx@gmail.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-06-15 07:42:55 -10:00