Commit Graph

272 Commits

Author SHA1 Message Date
Alexey Kardashevskiy
f682e9c244 memory: Add reporting of supported page sizes
Every IOMMU has some granularity which MemoryRegionIOMMUOps::translate
uses when translating, however this information is not available outside
the translate context for various checks.

This adds a get_min_page_size callback to MemoryRegionIOMMUOps and
a wrapper for it so IOMMU users (such as VFIO) can know the minimum
actual page size supported by an IOMMU.

As IOMMU MR represents a guest IOMMU, this uses TARGET_PAGE_SIZE
as fallback.

This removes vfio_container_granularity() and uses new helper in
memory_region_iommu_replay() when replaying IOMMU mappings on added
IOMMU memory region.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
[dwg: Removed an unnecessary calculation]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-06-22 11:13:09 +10:00
Paolo Bonzini
0878d0e11b exec: hide mr->ram_addr from qemu_get_ram_ptr users
Let users of qemu_get_ram_ptr and qemu_ram_ptr_length pass in an
address that is relative to the MemoryRegion.  This basically means
what address_space_translate returns.

Because the semantics of the second parameter change, rename the
function to qemu_map_ram_ptr.

Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-29 09:11:12 +02:00
Paolo Bonzini
07bdaa4196 memory: split memory_region_from_host from qemu_ram_addr_from_host
Move the old qemu_ram_addr_from_host to memory_region_from_host and
make it return an offset within the region.  For qemu_ram_addr_from_host
return the ram_addr_t directly, similar to what it was before
commit 1b5ec23 ("memory: return MemoryRegion from qemu_ram_addr_from_host",
2013-07-04).

Reviewed-by: Marc-André Lureau <marcandre.lureau@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-29 09:11:12 +02:00
Paolo Bonzini
4ff87573df memory: remove qemu_get_ram_fd, qemu_set_ram_fd, qemu_ram_block_host_ptr
Remove direct uses of ram_addr_t and optimize memory_region_{get,set}_fd
now that a MemoryRegion knows its RAMBlock directly.

Reviewed-by: Marc-André Lureau <marcandre.lureau@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-29 09:11:12 +02:00
Paolo Bonzini
b138e654a0 Revert "memory: Drop FlatRange.romd_mode"
This reverts commit 5b5660adf1,
as it breaks the UEFI guest firmware (known as ArmVirtPkg or AAVMF)
running in the "virt" machine type of "qemu-system-aarch64":

Contrary to the commit message, (a->mr == b->mr) does *not* imply
that (a->romd_mode == b->romd_mode): the pflash device model calls
memory_region_rom_device_set_romd() -- for switching between the above
modes --, and that function changes mr->romd_mode but the current
AddressSpaceDispatch's FlatRange keeps the old value.  Therefore
region_del/region_add are not called on the KVM MemoryListener.

Reported-by: Drew Jones <drjones@redhat.com>
Tested-by: Drew Jones <drjones@redhat.com>
Analyzed-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-29 09:11:10 +02:00
Paolo Bonzini
e4e697940d memory: remove unnecessary masking of MemoryRegion ram_addr
mr->ram_block->offset is already aligned to both host and target size
(see qemu_ram_alloc_internal).  Remove further masking as it is
unnecessary.

Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-23 16:53:45 +02:00
Fam Zheng
5b5660adf1 memory: Drop FlatRange.romd_mode
Its value is alway set to mr->romd_mode, so the removed comparisons are
fully superseded by "a->mr == b->mr".

Signed-off-by: Fam Zheng <famz@redhat.com>
Message-Id: <1458900629-2334-3-git-send-email-famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-23 16:53:44 +02:00
Fam Zheng
b613597819 memory: Remove code for mr->may_overlap
The collision check does nothing and hasn't been used. Remove the
variable together with related code.

Signed-off-by: Fam Zheng <famz@redhat.com>
Message-Id: <1458900629-2334-2-git-send-email-famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-23 16:53:44 +02:00
Gonglei
fa53a0e53e memory: drop find_ram_block()
On the one hand, we have already qemu_get_ram_block() whose function
is similar. On the other hand, we can directly use mr->ram_block but
searching RAMblock by ram_addr which is a kind of waste.

Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-Id: <1462845901-89716-2-git-send-email-arei.gonglei@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-23 16:53:44 +02:00
Paolo Bonzini
33c11879fd qemu-common: push cpu.h inclusion out of qemu-common.h
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-19 16:42:29 +02:00
Markus Armbruster
da34e65cb4 include/qemu/osdep.h: Don't include qapi/error.h
Commit 57cb38b included qapi/error.h into qemu/osdep.h to get the
Error typedef.  Since then, we've moved to include qemu/osdep.h
everywhere.  Its file comment explains: "To avoid getting into
possible circular include dependencies, this file should not include
any other QEMU headers, with the exceptions of config-host.h,
compiler.h, os-posix.h and os-win32.h, all of which are doing a
similar job to this file and are under similar constraints."
qapi/error.h doesn't do a similar job, and it doesn't adhere to
similar constraints: it includes qapi-types.h.  That's in excess of
100KiB of crap most .c files don't actually need.

Add the typedef to qemu/typedefs.h, and include that instead of
qapi/error.h.  Include qapi/error.h in .c files that need it and don't
get it now.  Include qapi-types.h in qom/object.h for uint16List.

Update scripts/clean-includes accordingly.  Update it further to match
reality: replace config.h by config-target.h, add sysemu/os-posix.h,
sysemu/os-win32.h.  Update the list of includes in the qemu/osdep.h
comment quoted above similarly.

This reduces the number of objects depending on qapi/error.h from "all
of them" to less than a third.  Unfortunately, the number depending on
qapi-types.h shrinks only a little.  More work is needed for that one.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
[Fix compilation without the spice devel packages. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-22 22:20:15 +01:00
Hollis Blanchard
f2d089425d trace: separate MMIO tracepoints from TB-access tracepoints
Memory accesses to code which has previously been translated into a TB show up
in the MMIO path, so that they may invalidate the TB. It's extremely confusing
to mix those in with device MMIOs, so split them into their own tracepoint.

Signed-off-by: Hollis Blanchard <hollis_blanchard@mentor.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1456949575-1633-2-git-send-email-hollis_blanchard@mentor.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-03-14 09:34:30 +00:00
Hollis Blanchard
5a68be94ac trace: include CPU index in trace_memory_region_*()
Knowing which CPU performed an action is essential for understanding SMP guest
behavior.

However, cpu_physical_memory_rw() may be executed by a machine init function,
before any VCPUs are running, when there is no CPU running ('current_cpu' is
NULL). In this case, store -1 in the trace record as the CPU index. Trace
analysis tools may need to be aware of this special case.

Signed-off-by: Hollis Blanchard <hollis_blanchard@mentor.com>
Message-id: 1456949575-1633-1-git-send-email-hollis_blanchard@mentor.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-03-14 09:34:30 +00:00
Fam Zheng
f1060c55bf exec: Pass RAMBlock pointer to qemu_ram_free
The only caller now knows exactly which RAMBlock to free, so it's not
necessary to do the lookup.

Reviewed-by: Gonglei <arei.gonglei@huawei.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Message-Id: <1456813104-25902-6-git-send-email-famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-07 13:26:37 +01:00
Fam Zheng
8e41fb63c5 memory: Drop MemoryRegion.ram_addr
All references to mr->ram_addr are replaced by
memory_region_get_ram_addr(mr) (except for a few assertions that are
replaced with mr->ram_block).

Reviewed-by: Gonglei <arei.gonglei@huawei.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Message-Id: <1456813104-25902-5-git-send-email-famz@redhat.com>
Acked-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-07 13:26:29 +01:00
Fam Zheng
7ebb2745ac memory: Implement memory_region_get_ram_addr with mr->ram_block
Signed-off-by: Fam Zheng <famz@redhat.com>
Message-Id: <1456813104-25902-4-git-send-email-famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-07 13:18:28 +01:00
Fam Zheng
0a75601853 memory: Move assignment to ram_block to memory_region_init_*
We don't force "const" qualifiers with pointers in QEMU, but it's still
good to keep a clean function interface. Assigning to mr->ram_block is
in this sense ugly - one initializer mutating its owning object's state.

Move it to memory_region_init_*, where mr->ram_addr is assigned.

Reviewed-by: Gonglei <arei.gonglei@huawei.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Message-Id: <1456813104-25902-3-git-send-email-famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-07 13:18:28 +01:00
Fam Zheng
528f46af6e exec: Return RAMBlock pointer from allocating functions
Previously we return RAMBlock.offset; now return the pointer to the
whole structure.

ram_block_add returns void now, error is completely passed with errp.

Reviewed-by: Gonglei <arei.gonglei@huawei.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Message-Id: <1456813104-25902-2-git-send-email-famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-07 13:18:28 +01:00
Hollis Blanchard
4779dc1d19 trace: use addresses instead of offsets in memory tracepoints
When memory_region_ops tracepoints are enabled, calculate and record the
absolute address being accessed. Otherwise, we only get offsets into the
memory region instead of addresses.

[Fixed "offset" -> "addr" in trace event format strings.
--Stefan]

Signed-off-by: Hollis Blanchard <hollis_blanchard@mentor.com>
Message-id: 1454976185-30095-3-git-send-email-hollis_blanchard@mentor.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-03-01 13:20:15 +00:00
Hollis Blanchard
23d92d68e7 trace: split subpage MMIOs into their own trace events.
Previously, a single MMIO could trigger the memory_region_ops tracepoint twice:
once on its way into subpage ops, then later on its way into the model's ops.

Also, the fields previously called "addr" are actually offsets into the memory
region. Rename them to "offset" while we're editing the tracepoint definitions.

Signed-off-by: Hollis Blanchard <hollis_blanchard@mentor.com>
Message-id: 1454976185-30095-2-git-send-email-hollis_blanchard@mentor.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-03-01 13:20:15 +00:00
Gonglei
3655cb9c73 memory: optimize qemu_get_ram_ptr and qemu_ram_ptr_length
these two functions consume too much cpu overhead to
find the RAMBlock by ram address.

After this patch, we can pass the RAMBlock pointer
to them so that they don't need to find the RAMBlock
anymore most of the time. We can get better performance
in address translation processing.

Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Message-Id: <1455935721-8804-3-git-send-email-arei.gonglei@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-02-25 16:11:29 +01:00
Gonglei
58eaa2174e exec: store RAMBlock pointer into memory region
Each RAM memory region has a unique corresponding RAMBlock.
In the current realization, the memory region only stored
the ram_addr which means the offset of RAM address space,
We need to qurey the global ram.list to find the ram block
by ram_addr if we want to get the ram block, which is very
expensive.

Now, we store the RAMBlock pointer into memory region
structure. So, if we know the mr, we can easily get the
RAMBlock.

Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Message-Id: <1456130097-4208-2-git-send-email-arei.gonglei@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-02-25 16:11:26 +01:00
Eric Blake
d7bce9999d qom: Swap 'name' next to visitor in ObjectPropertyAccessor
Similar to the previous patch, it's nice to have all functions
in the tree that involve a visitor and a name for conversion to
or from QAPI to consistently stick the 'name' parameter next
to the Visitor parameter.

Done by manually changing include/qom/object.h and qom/object.c,
then running this Coccinelle script and touching up the fallout
(Coccinelle insisted on adding some trailing whitespace).

    @ rule1 @
    identifier fn;
    typedef Object, Visitor, Error;
    identifier obj, v, opaque, name, errp;
    @@
     void fn
    - (Object *obj, Visitor *v, void *opaque, const char *name,
    + (Object *obj, Visitor *v, const char *name, void *opaque,
       Error **errp) { ... }

    @@
    identifier rule1.fn;
    expression obj, v, opaque, name, errp;
    @@
     fn(obj, v,
    -   opaque, name,
    +   name, opaque,
        errp)

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <1454075341-13658-20-git-send-email-eblake@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2016-02-08 17:29:56 +01:00
Eric Blake
51e72bc1dd qapi: Swap visit_* arguments for consistent 'name' placement
JSON uses "name":value, but many of our visitor interfaces were
called with visit_type_FOO(v, &value, name, errp).  This can be
a bit confusing to have to mentally swap the parameter order to
match JSON order.  It's particularly bad for visit_start_struct(),
where the 'name' parameter is smack in the middle of the
otherwise-related group of 'obj, kind, size' parameters! It's
time to do a global swap of the parameter ordering, so that the
'name' parameter is always immediately after the Visitor argument.

Additional reason in favor of the swap: the existing include/qjson.h
prefers listing 'name' first in json_prop_*(), and I have plans to
unify that file with the qapi visitors; listing 'name' first in
qapi will minimize churn to the (admittedly few) qjson.h clients.

Later patches will then fix docs, object.h, visitor-impl.h, and
those clients to match.

Done by first patching scripts/qapi*.py by hand to make generated
files do what I want, then by running the following Coccinelle
script to affect the rest of the code base:
 $ spatch --sp-file script `git grep -l '\bvisit_' -- '**/*.[ch]'`
I then had to apply some touchups (Coccinelle insisted on TAB
indentation in visitor.h, and botched the signature of
visit_type_enum() by rewriting 'const char *const strings[]' to
the syntactically invalid 'const char*const[] strings').  The
movement of parameters is sufficient to provoke compiler errors
if any callers were missed.

    // Part 1: Swap declaration order
    @@
    type TV, TErr, TObj, T1, T2;
    identifier OBJ, ARG1, ARG2;
    @@
     void visit_start_struct
    -(TV v, TObj OBJ, T1 ARG1, const char *name, T2 ARG2, TErr errp)
    +(TV v, const char *name, TObj OBJ, T1 ARG1, T2 ARG2, TErr errp)
     { ... }

    @@
    type bool, TV, T1;
    identifier ARG1;
    @@
     bool visit_optional
    -(TV v, T1 ARG1, const char *name)
    +(TV v, const char *name, T1 ARG1)
     { ... }

    @@
    type TV, TErr, TObj, T1;
    identifier OBJ, ARG1;
    @@
     void visit_get_next_type
    -(TV v, TObj OBJ, T1 ARG1, const char *name, TErr errp)
    +(TV v, const char *name, TObj OBJ, T1 ARG1, TErr errp)
     { ... }

    @@
    type TV, TErr, TObj, T1, T2;
    identifier OBJ, ARG1, ARG2;
    @@
     void visit_type_enum
    -(TV v, TObj OBJ, T1 ARG1, T2 ARG2, const char *name, TErr errp)
    +(TV v, const char *name, TObj OBJ, T1 ARG1, T2 ARG2, TErr errp)
     { ... }

    @@
    type TV, TErr, TObj;
    identifier OBJ;
    identifier VISIT_TYPE =~ "^visit_type_";
    @@
     void VISIT_TYPE
    -(TV v, TObj OBJ, const char *name, TErr errp)
    +(TV v, const char *name, TObj OBJ, TErr errp)
     { ... }

    // Part 2: swap caller order
    @@
    expression V, NAME, OBJ, ARG1, ARG2, ERR;
    identifier VISIT_TYPE =~ "^visit_type_";
    @@
    (
    -visit_start_struct(V, OBJ, ARG1, NAME, ARG2, ERR)
    +visit_start_struct(V, NAME, OBJ, ARG1, ARG2, ERR)
    |
    -visit_optional(V, ARG1, NAME)
    +visit_optional(V, NAME, ARG1)
    |
    -visit_get_next_type(V, OBJ, ARG1, NAME, ERR)
    +visit_get_next_type(V, NAME, OBJ, ARG1, ERR)
    |
    -visit_type_enum(V, OBJ, ARG1, ARG2, NAME, ERR)
    +visit_type_enum(V, NAME, OBJ, ARG1, ARG2, ERR)
    |
    -VISIT_TYPE(V, OBJ, NAME, ERR)
    +VISIT_TYPE(V, NAME, OBJ, ERR)
    )

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <1454075341-13658-19-git-send-email-eblake@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2016-02-08 17:29:56 +01:00
Peter Maydell
d38ea87ac5 all: Clean up includes
Clean up includes so that osdep.h is included first and headers
which it implies are not included manually.

This commit was created with scripts/clean-includes.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 1454089805-5470-16-git-send-email-peter.maydell@linaro.org
2016-02-04 17:41:30 +00:00
Peter Crosthwaite
f0c02d15b5 memory: Add address_space_init_shareable()
This will either create a new AS or return a pointer to an
already existing equivalent one, if we have already created
an AS for the specified root memory region.

The motivation is to reuse address spaces as much as possible.
It's going to be quite common that bus masters out in device land
have pointers to the same memory region for their mastering yet
each will need to create its own address space. Let the memory
API implement sharing for them.

Aside from the perf optimisations, this should reduce the amount
of redundant output on info mtree as well.

Thee returned value will be malloced, but the malloc will be
automatically freed when the AS runs out of refs.

Signed-off-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com>
[PMM: dropped check for NULL root as unused; added doc-comment;
 squashed Peter C's reference-counting patch into this one;
 don't compare name string when deciding if we can share ASes;
 read as->malloced before the unref of as->root to avoid possible
 read-after-free if as->root was the owner of as]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Acked-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
2016-01-21 14:15:06 +00:00
Paolo Bonzini
1619d1fe73 memory: inline a few small accessors
These are used in the address_space_* fast paths.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-12-17 17:33:49 +01:00
Paolo Bonzini
612263cf33 memory: avoid unnecessary object_ref/unref
For the common case of DMA into non-hotplugged RAM, it is unnecessary
but expensive to do object_ref/unref.  Add back an owner field to
MemoryRegion, so that these memory regions can skip the reference
counting.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-12-17 17:33:48 +01:00
Paolo Bonzini
49b24afcb1 exec: always call qemu_get_ram_ptr within rcu_read_lock
Simplify the code and document the assumption.  The only caller
that is not within rcu_read_lock is memory_region_get_ram_ptr.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-12-17 17:33:48 +01:00
Pavel Fedin
8c56c1a592 memory: emulate ioeventfd
The ioeventfd mechanism is used by vhost, dataplane, and virtio-pci to
turn guest MMIO/PIO writes into eventfd file descriptor events.  This
allows arbitrary threads to be notified when the guest writes to a
specific MMIO/PIO address.

qtest and TCG do not support ioeventfd because memory writes are not
checked against registered ioeventfds in QEMU.  This patch implements
this in memory_region_dispatch_write() so qtest can use ioeventfd.

Also this patch fixes vhost aborting on some misconfigured old kernels
like 3.18.0 on ARM. It is possible to explicitly enable CONFIG_EVENTFD
in expert settings, while MMIO binding support in KVM will still be
missing.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Pavel Fedin <p.fedin@samsung.com>
Message-Id: <006e01d12377$0b9c2d40$22d487c0$@samsung.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-12-17 15:24:34 +01:00
Eduardo Habkost
fc3e7665d7 memory: Eliminate memory_region_destructor_ram_from_ptr()
The function is equivalent to memory_region_destructor_ram(), so
it's not needed anymore.

Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <1446844805-14492-3-git-send-email-ehabkost@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-12-17 15:24:34 +01:00
Eduardo Habkost
a29ac16632 exec: Eliminate qemu_ram_free_from_ptr()
Replace qemu_ram_free_from_ptr() with qemu_ram_free().

The only difference between qemu_ram_free_from_ptr() and
qemu_ram_free() is that g_free_rcu() is used instead of
call_rcu(reclaim_ramblock). We can safely replace it because:

* RAM blocks allocated by qemu_ram_alloc_from_ptr() always have
  RAM_PREALLOC set;
* reclaim_ramblock(block) will do nothing except g_free(block)
  if RAM_PREALLOC is set at block->flags.

Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <1446844805-14492-2-git-send-email-ehabkost@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-12-17 15:24:33 +01:00
Jason Wang
b8aecea23a memory: don't try to adjust endianness for zero length eventfd
There's no need to adjust endianness for zero length eventfd since the
data wrote was actually ignored by kernel. So skip the adjust in this
case to fix a possible crash when trying to use wildcard mmio eventfd
in ppc.

Cc: Greg Kurz <gkurz@linux.vnet.ibm.com>
Cc: Peter Maydell <peter.maydell@linaro.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2015-11-12 15:49:32 +02:00
Paolo Bonzini
680a4783dc memory: call begin, log_start and commit when registering a new listener
This ensures that cpu_reload_memory_map() is called as soon as
tcg_cpu_address_space_init() is called, and before cpu->memory_dispatch
is used.  qemu-system-s390x never changes the address spaces after
tcg_cpu_address_space_init() is called, and thus tcg_commit() is never
called.  This causes a SIGSEGV.

Because memory_map_init() will now call mem_commit(), we have to
initialize io_mem_* before address_space_memory and friends.

Reported-by: Philipp Kern <pkern@debian.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Fixes: 0a1c71cec6
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-04 15:56:01 +01:00
Paolo Bonzini
2e2b8eb70f memory: allow destroying a non-empty MemoryRegion
This is legal; the MemoryRegion will simply unreference all the
existing subregions and possibly bring them down with it as well.
However, it requires a bit of care to avoid an infinite loop.
Finalizing a memory region cannot trigger an address space update,
but memory_region_del_subregion errs on the side of caution and
might trigger a spurious update: avoid that by resetting mr->enabled
first.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <1443689999-12182-2-git-send-email-armbru@redhat.com>
2015-10-09 15:25:56 +02:00
David Gibson
a788f227ef memory: Allow replay of IOMMU mapping notifications
When we have guest visible IOMMUs, we allow notifiers to be registered
which will be informed of all changes to IOMMU mappings.  This is used by
vfio to keep the host IOMMU mappings in sync with guest IOMMU mappings.

However, unlike with a memory region listener, an iommu notifier won't be
told about any mappings which already exist in the (guest) IOMMU at the
time it is registered.  This can cause problems if hotplugging a VFIO
device onto a guest bus which had existing guest IOMMU mappings, but didn't
previously have an VFIO devices (and hence no host IOMMU mappings).

This adds a memory_region_iommu_replay() function to handle this case.  It
replays any existing mappings in an IOMMU memory region to a specified
notifier.  Because the IOMMU memory region doesn't internally remember the
granularity of the guest IOMMU it has a small hack where the caller must
specify a granularity at which to replay mappings.

If there are finer mappings in the guest IOMMU these will be reported in
the iotlb structures passed to the notifier which it must handle (probably
causing it to flag an error).  This isn't new - the VFIO iommu notifier
must already handle notifications about guest IOMMU mappings too short
for it to represent in the host IOMMU.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2015-10-05 12:39:03 -06:00
Markus Armbruster
0bdaa3a429 memory: Fix bad error handling in memory_region_init_ram_ptr()
Commit ef701d7 screwed up handling of out-of-memory conditions.
Before the commit, we report the error and exit(1), in one place.  The
commit lifts the error handling up the call chain some, to three
places.  Fine.  Except it uses &error_abort in these places, changing
the behavior from exit(1) to abort(), and thus undoing the work of
commit 3922825 "exec: Don't abort when we can't allocate guest
memory".

The previous two commits fixed one of the three places, another one
was fixed in commit 33e0eb5.  This commit fixes the third one.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <1441983105-26376-5-git-send-email-armbru@redhat.com>
Reviewed-by: Peter Crosthwaite <crosthwaite.peter@gmail.com>
2015-09-18 14:39:39 +02:00
Pavel Fedin
6d6d2abf2c Merge memory_region_init_reservation() into memory_region_init_io()
Just specifying ops = NULL in some cases can be more convenient than having
two functions.

Signed-off-by: Pavel Fedin <p.fedin@samsung.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 78a379ab1b6b30ab497db7971ad336dad1dbee76.1438758065.git.p.fedin@samsung.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-08-13 11:26:21 +01:00
Paolo Bonzini
52c91dac6b memory: do not add a reference to the owner of aliased regions
Very often the owner of the aliased region is the same as the owner of the alias
region itself.  When this happens, the reference count can never go back to 0 and
the owner is leaked.  This is for example breaking hot-unplug of virtio-pci
devices (the device cannot be plugged back again with the same id).

Another common use for alias is to transform the system I/O address space
into an MMIO regions; in this case the aliased region never dies, so there
is no problem.  Otherwise the owner is always the same for aliasing
and aliased region.

I checked all calls to memory_region_init_alias introduced after commit
dfde4e6 (memory: add ref/unref calls, 2013-05-06) and they do not need the
reference in order to keep the owner of the aliased region alive.

Reported-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-07-27 23:05:49 +02:00
Paolo Bonzini
deb809edb8 memory: count number of active VGA logging clients
For a board that has multiple framebuffer devices, both of them
might want to use DIRTY_MEMORY_VGA on the same memory region.
The lack of reference counting in memory_region_set_log makes
this very awkward to implement.

Suggested-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-07-24 13:57:45 +02:00
Paolo Bonzini
c6742b14fe memory: fix refcount leak in memory_region_present
memory_region_present() leaks a reference to a MemoryRegion in the
case "mr == container".  While fixing it, avoid reference counting
altogether for memory_region_present(), by using RCU only.

The return value could in principle be already invalid immediately
after memory_region_present returns, but presumably the caller knows
that and it's using memory_region_present to probe for devices that
are unpluggable, or something like that.  The RCU critical section
is needed anyway, because it protects as->current_map.

Reported-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-07-16 20:00:20 +02:00
Paolo Bonzini
125b380666 exec: pull qemu_flush_coalesced_mmio_buffer() into address_space_rw/ld*/st*
As memory_region_read/write_accessor will now be run also without BQL held,
we need to move coalesced MMIO flushing earlier in the dispatch process.

Cc: Frederic Konrad <fred.konrad@greensocs.com>
Message-Id: <1434646046-27150-5-git-send-email-pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-07-01 15:45:50 +02:00
Jan Kiszka
196ea13104 memory: Add global-locking property to memory regions
This introduces the memory region property "global_locking". It is true
by default. By setting it to false, a device model can request BQL-free
dispatching of region accesses to its r/w handlers. The actual BQL
break-up will be provided in a separate patch.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Cc: Frederic Konrad <fred.konrad@greensocs.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <1434646046-27150-4-git-send-email-pbonzini@redhat.com>
2015-07-01 15:45:50 +02:00
Paolo Bonzini
ec05ec26f9 memory: use mr->ram_addr in "is this RAM?" assertions
mr->terminates alone doesn't guarantee that we are looking at a RAM region.
mr->ram_addr also has to be checked, in order to distinguish RAM and I/O
regions.

So, do the following:

1) add a new define RAM_ADDR_INVALID, and test it in the assertions
instead of mr->terminates

2) IOMMU regions were not setting mr->ram_addr to a bogus value, initialize
it in the instance_init function so that the new assertions would fire
for IOMMU regions as well.

Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-06-05 17:10:00 +02:00
Stefan Hajnoczi
03eebc9e32 memory: replace cpu_physical_memory_reset_dirty() with test-and-clear
The cpu_physical_memory_reset_dirty() function is sometimes used
together with cpu_physical_memory_get_dirty().  This is not atomic since
two separate accesses to the dirty memory bitmap are made.

Turn cpu_physical_memory_reset_dirty() and
cpu_physical_memory_clear_dirty_range_type() into the atomic
cpu_physical_memory_test_and_clear_dirty().

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <1417519399-3166-6-git-send-email-stefanha@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-06-05 17:10:00 +02:00
Paolo Bonzini
58d2707e87 exec: pass client mask to cpu_physical_memory_set_dirty_range
This cuts in half the cost of bitmap operations (which will become more
expensive when made atomic) during migration on non-VRAM regions.

Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-06-05 17:09:59 +02:00
Paolo Bonzini
6f6a5ef3e4 memory: include DIRTY_MEMORY_MIGRATION in the dirty log mask
The separate handling of DIRTY_MEMORY_MIGRATION, which does not
call log_start/log_stop callbacks when it changes in a region's
dirty logging mask, has caused several bugs.

One recent example is commit 4cc856f (kvm-all: Sync dirty-bitmap from
kvm before kvm destroy the corresponding dirty_bitmap, 2015-04-02).
Another performance problem is that KVM keeps tracking dirty pages
after a failed live migration, which causes bad performance due to
disallowing huge page mapping.

This patch removes the root cause of the problem by reporting
DIRTY_MEMORY_MIGRATION changes via log_start and log_stop.
Note that we now have to rebuild the FlatView when global dirty
logging is enabled or disabled; this ensures that log_start and
log_stop callbacks are invoked.

This will also be used to make the setting of bitmaps conditional.
In general, this patch lets users of the memory API ignore the
global state of dirty logging if they handle dirty logging
generically per region.

Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-06-05 17:09:59 +02:00
Paolo Bonzini
677e7805cf memory: track DIRTY_MEMORY_CODE in mr->dirty_log_mask
DIRTY_MEMORY_CODE is only needed for TCG.  By adding it directly to
mr->dirty_log_mask, we avoid testing for TCG everywhere a region is
checked for the enabled/disabled state of dirty logging.

Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-06-05 17:09:59 +02:00
Paolo Bonzini
b2dfd71c48 memory: prepare for multiple bits in the dirty log mask
When the dirty log mask will also cover other bits than DIRTY_MEMORY_VGA,
some listeners may be interested in the overall zero/non-zero value of
the dirty log mask; others may be interested in the value of single bits.

For this reason, always call log_start/log_stop if bits have respectively
appeared or disappeared, and pass the old and new values of the dirty log
mask so that listeners can distinguish the kinds of change.

For example, KVM checks if dirty logging used to be completely disabled
(in log_start) or is now completely disabled (in log_stop).  On the
other hand, Xen has to check manually if DIRTY_MEMORY_VGA changed,
since that is the only bit it cares about.

Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-06-05 17:09:59 +02:00
Paolo Bonzini
2d1a35bef0 memory: differentiate memory_region_is_logging and memory_region_get_dirty_log_mask
For now memory regions only track DIRTY_MEMORY_VGA individually, but
this will change soon.  To support this, split memory_region_is_logging
in two functions: one that returns a given bit from dirty_log_mask,
and one that returns the entire mask.  memory_region_is_logging gets an
extra parameter so that the compiler flags misuse.

While VGA-specific users (including the Xen listener!) will want to keep
checking that bit, KVM and vhost check for "any bit except migration"
(because migration is handled via the global start/stop listener
callbacks).

Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-06-05 17:09:58 +02:00