Our memory API MMIO regions know the concept of device endianness. This
is used to automatically swap endianness between devices and host CPU,
depending on whether buses in between would swizzle the bits.
The ioeventfd value comparison does not adhere to that semantic though.
Probably because nobody has been running ioeventfd on a BE platform and
the only device implementing ioeventfd right now is LE (PCI) based.
So add swizzling to ioeventfd registration / deletion to make the rest
of the code as consistent as possible.
Thanks a lot to Michael Tsirkin to point me towards the right direction.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Many listeners don't need to respond to all MemoryListener callbacks;
provide suitable no-op defaults instead.
Signed-off-by: Avi Kivity <avi@redhat.com>
Instead of embedding knowledge of the memory and I/O address spaces in the
memory core, maintain a list of all address spaces. This list will later
be extended dynamically for other bus masters.
Reviewed-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
The DMA API will use an AddressSpace to differentiate among different
initiators.
Reviewed-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
AddressSpace contains a member, current_map, of type FlatView. Since we
want to limit the leakage of internal types to public headers, switch to
a pointer to a FlatView. There is no performance impact as this isn't used
during lookups, only address space reconfigurations.
Reviewed-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
exec-obsolete.h used to hold pre-memory-API functions that were used from
device code prior to the transition to the memory API. Now that the
transition is complete, the name no longer describes the file. The
functions still need to be merged better into the memory core, but there's
no danger of anyone using them.
Reviewed-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Flush pending coalesced MMIO before performing mapping or state changes
that could affect the event orderings or route the buffered requests to
a wrong region.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Simplify the code as we are using now only a subset of the original
features of memory_region_update_topology.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Wrap also simple operations consisting only of a single step with
memory_region_transaction_begin/commit. This allows to perform
additional steps like coalesced MMIO flushing from a single place.
This requires dropping some micro-optimizations: The skipping of
topology updates after updating disabled or unregistered regions.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Instead of flushing pending coalesced MMIO requests on every vmexit,
this provides a mechanism to selectively flush when memory regions
related to the coalesced one are accessed. This first of all includes
the coalesced region itself but can also applied to other regions, e.g.
of the same device, by calling memory_region_set_flush_coalesced.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
The last argument of find_portio is "write", so this must be true here.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Under Win32, EventNotifiers will not have event_notifier_get_fd, so we
cannot call it in common code such as hw/virtio-pci.c. Pass a pointer to
the notifier, and only retrieve the file descriptor in kvm-specific code.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
This patch resolves a bug in memory listener registration.
"range_add" callback was called on each section of the both
address space (IO and memory space) even if it doesn't match
the address space filter.
Signed-off-by: Julien Grall <julien.grall@citrix.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
The return value of cpu_register_io_memory() is no longer used anywhere, so
we can remove it and all associated data and code.
Signed-off-by: Avi Kivity <avi@redhat.com>
Instead of indirecting via io_mem_region, dispatch directly
through the MemoryRegion obtained from the iotlb or phys_page_find().
Signed-off-by: Avi Kivity <avi@redhat.com>
Commit e58ac72b6a0 ("ioport: change portio_list not to use
memory_region_set_offset()") started using aliases of I/O memory
regions. Since the IORange used for the I/O was contained in the
target region, the alias information (specifically, the offset
into the region) was lost. This broke -vga std.
Fix by allocating an independent object to hold the IORange and
also the new offset.
Note that I/O memory regions were conceptually broken wrt aliases
in a different way: an alias can cause the same region to appear
twice in an address space, but we had just one IORange to service it.
This patch fixes that problem as well, since we can now have multiple
IORange/MemoryRegion associations.
Signed-off-by: Avi Kivity <avi@redhat.com>
Current memory listeners are incremental; that is, they are expected to
maintain their own state, and receive callbacks for changes to that state.
This patch adds support for stateless listeners; these work by receiving
a ->begin() callback (which tells them that new state is coming), a
sequence of ->region_add() and ->region_nop() callbacks, and then a
->commit() callback which signifies the end of the new state. They should
ignore ->region_del() callbacks.
Signed-off-by: Avi Kivity <avi@redhat.com>
All functionality has been moved to various MemoryListeners.
Signed-off-by: Avi Kivity <avi@redhat.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
This transforms memory.c into a library which can then be unit tested
easily, by feeding it inputs and listening to its outputs.
Signed-off-by: Avi Kivity <avi@redhat.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
It can be derived from the MemoryRegion itself (which is why it is not
used there).
Signed-off-by: Avi Kivity <avi@redhat.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
.readonly cannot be obtained from the MemoryRegion, since it is
inherited from aliases (so you can have a MemoryRegion mapped RW
at one address and RO at another). Record it in a MemoryRegionSection
for listeners.
Signed-off-by: Avi Kivity <avi@redhat.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
This allows reverse iteration, which in turns allows consistent ordering
among multiple listeners:
l1->add
l2->add
l2->del
l1->del
Signed-off-by: Avi Kivity <avi@redhat.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
memory_region_set_offset() complicates the API, and has been deprecated
since its introduction. Now that it is no longer used, remove it.
Signed-off-by: Avi Kivity <avi@redhat.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Helpful to understand guest configurations of things like the i440FX's
PAM or the state of ROM devices.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Instead of each device knowing or guessing the guest page size,
just pass the desired size of dirtied memory area.
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Instead of each target knowing or guessing the guest page size,
just pass the desired size of dirtied memory area.
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Introduce a memory region type that can reserve I/O space. Such regions
are useful for modeling I/O that is only handled outside of QEMU, i.e.
in the context of an accelerator like KVM.
Any access to such a region from QEMU is a bug, but could theoretically
be triggered by guest code (DMA to reserved region). So only warning
about such events once, then ignore them.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
All files under GPLv2 will get GPLv2+ changes starting tomorrow.
event_notifier.c and exec-obsolete.h were only ever touched by Red Hat
employees and can be relicensed now.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Commit a621f38de8 (Direct dispatch
through MemoryRegion) moved byte swaps to a central function.
Add a missing break, so that long-sized byte swaps don't abort.
Signed-off-by: Andreas Färber <afaerber@suse.de>
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Avi Kivity <avi@redhat.com>
Since commit be675c9720 (memory: move
endianness compensation to memory core) it was checking for
TARGET_BIG_ENDIAN instead of TARGET_WORDS_BIGENDIAN, thereby not
swapping correctly for Big Endian targets.
Signed-off-by: Andreas Färber <afaerber@suse.de>
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Avi Kivity <avi@redhat.com>
Unlike ->readonly, ->readable is not inherited from aliase, so we can simply
query the memory region.
Signed-off-by: Avi Kivity <avi@redhat.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Now that all mmio goes through MemoryRegions, we can convert
io_mem_opaque to be a MemoryRegion pointer, and remove the thunks
that convert from old-style CPU{Read,Write}MemoryFunc to MemoryRegionOps.
Signed-off-by: Avi Kivity <avi@redhat.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Convert the fixed-address IO_MEM_RAM, IO_MEM_ROM, IO_MEM_UNASSIGNED,
and IO_MEM_NOTDIRTY io handlers to MemoryRegions. These aren't real
regions, since they are never added to the memory hierarchy, but they
allow reuse of the dispatch functionality.
Signed-off-by: Avi Kivity <avi@redhat.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
The code sometimes uses range comparisons on io indexes (e.g.
index =< IO_MEM_ROM). Avoid these as they make moving to objects harder.
Signed-off-by: Avi Kivity <avi@redhat.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
backend_registered was used to lazify the process of registering an
mmio region, since the it is different for the I/O address space and
the memory address space. However, it also makes registration dependent
on the region being visible in the address space. This is not the case
for "fake" regions, like watchpoints or IO_MEM_UNASSIGNED.
Remove backend_registered and always initialize the region. If it turns
out to be part of the I/O address space, we've wasted an I/O slot, but
that's not too bad. In any case this will be optimized later on.
Signed-off-by: Avi Kivity <avi@redhat.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Currently mmio access goes directly to the io_mem_{read,write} arrays.
In preparation for eliminating them, add indirection via a function.
Signed-off-by: Avi Kivity <avi@redhat.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Instead of doing device endianness compensation in cpu_register_io_memory(),
do it in the memory core.
Signed-off-by: Avi Kivity <avi@redhat.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
The getter is no longer used, so it is completely removed.
Reviewed-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Currently creating a memory region automatically registers it for
live migration. This differs from other state (which is enumerated
in a VMStateDescription structure) and ties the live migration code
into the memory core.
Decouple the two by introducing a separate API, vmstate_register_ram(),
for registering a RAM block for migration. Currently the same
implementation is reused, but later it can be moved into a separate list,
and registrations can be moved to VMStateDescription blocks.
Signed-off-by: Avi Kivity <avi@redhat.com>
This is a layering violation, but needed while the code contains
naked calls to qemu_get_ram_ptr() and the like.
Signed-off-by: Avi Kivity <avi@redhat.com>
Add an API that allows a client to observe changes in the global
memory map:
- region added (possibly with logging enabled)
- region removed (possibly with logging enabled)
- logging started on a region
- logging stopped on a region
- global logging started
- global logging removed
This API will eventually replace cpu_register_physical_memory_client().
Signed-off-by: Avi Kivity <avi@redhat.com>
Given an address space (represented by the top-level memory region),
returns the memory region that maps a given range. Useful for implementing
DMA.
The implementation is a simplistic binary search. Once we have a tree
representation this can be optimized.
Signed-off-by: Avi Kivity <avi@redhat.com>
Currently xen_ram_alloc() relies on ram_addr, which is going away.
Give it something else to use as a cookie.
Signed-off-by: Avi Kivity <avi@redhat.com>
The mutating memory APIs can easily cause empty transactions,
where the mutators don't actually change anything, or perhaps
only modify disabled regions. Detect these conditions and
avoid regenerating the memory topology.
Signed-off-by: Avi Kivity <avi@redhat.com>
Add an API to update an alias offset of an active alias. This can be
used to simplify implementation of dynamic memory banks.
Signed-off-by: Avi Kivity <avi@redhat.com>
This allows users to disable a memory region without removing
it from the hierarchy, simplifying the implementation of
memory routers.
Signed-off-by: Avi Kivity <avi@redhat.com>
MemoryRegionOps::valid tries to declaratively specify which transactions
are accepted by the device/bus, however it is not completely generic. Add
a callback for special cases.
Signed-off-by: Avi Kivity <avi@redhat.com>
'info mtree' accesses invalid memory in two cases, both due to incorrect
(and unsafe) usage of QTAILQ_FOREACH_SAFE().
Reported-by: Andreas Färber <afaerber@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
As we register old portio regions via ioport_register, we are also
responsible for providing the word access wrapper.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Add a type and methods for manipulating a list of disjoint I/O ports,
used in some older hardware devices.
Based on original patch by Richard Henderson.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Avi Kivity <avi@redhat.com>
Add a monitor command 'info mtree' to show the memory hierarchy
much like /proc/iomem in Linux.
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
The property is inheritable, but only if set to true. This is so
that memory routers can mark sections of RAM as read-only via aliases.
Signed-off-by: Avi Kivity <avi@redhat.com>
Instead of the offset property use the proper addr property to calculate
the offsets.
Additionally, be a little more verbose on the warning and print the
subregion name.
Signed-off-by: Michael Walle <michael@walle.cc>
Signed-off-by: Avi Kivity <avi@redhat.com>
It is quite common to have a MemoryRegion with size of INT64_MAX.
When processing alias regions in render_memory_region() it's quite
easy to find a case where it will construct a temporary AddrRange with
a non-zero start, and size still of INT64_MAX. When means attempting
to compute the end of such a range as start + size will result in
signed integer overflow.
This integer overflow means that addrrange_intersects() can
incorrectly report regions as not intersecting when they do. For
example consider the case of address ranges {0x10000000000,
0x7fffffffffffffff} and {0x10010000000, 0x10000000} where the second
is in fact included completely in the first.
This patch rearranges addrrange_intersects() to avoid the integer
overflow, correcting this behaviour.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Avi Kivity <avi@redhat.com>
Mask out the sub-page bits that are used by ROM device for storing the
io-index and the IO_MEM_ROMD flag.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
When adding a rom_device in I/O mode, we incorrectly masked off the low
bits, resulting in a pure RAM map. Fix my masking off the high bits and
IO_MEM_ROMD, yielding a pure I/O map.
Signed-off-by: Avi Kivity <avi@redhat.com>
The legacy functions that we're wrapping expect that offset
to be included in the register. Indeed, they generally
expect the absolute address and then mask off the "high" bits.
The FDC is the first converted device with a non-zero offset.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Avi Kivity <avi@redhat.com>
After 312b4234, the APIC and PCI devices are colliding with each other. This
is harmless in practice because the APIC accesses are special cased and never
make there way onto the bus.
Avi is working on a proper fix, but until that's ready, avoid printing the
warning.
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
The memory API supports cracking wide accesses into narrower ones
when needed; but this was no implemented for the pio address space,
causing lsi53c895a's IO BAR to malfunction.
Fix by correctly cracking wide accesses when needed.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
The memory API automatically cracks large reads and writes into smaller
ones when needed. Factor out this mechanism, which is now duplicated between
memory reads and memory writes, into a function.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
ROM/device regions act as mapped RAM for reads, can I/O memory for
writes. This allow emulation of flash devices.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
When trying to map an alias of a ram region, where the alias starts at
address A and we map it into address B, and A > B, we had an arithmetic
underflow. Because we use unsigned arithmetic, the underflow converted
into a large number which failed addrrange_intersects() tests.
The concrete example which triggered this was cirrus vga mapping
the framebuffer at offsets 0xc0000-0xc7fff (relative to the start of
the framebuffer) into offsets 0xa0000 (relative to system addres space
start).
With our favorite analogy of a windowing system, this is equivalent to
dragging a subwindow off the left edge of the screen, and failing to clip
it into its parent window which is on screen.
Fix by switching to signed arithmetic.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
When a range is being unmapped, ask accelerators (e.g. kvm) to synchronize the
dirty bitmap to avoid losing information forever.
Fixes grub2 screen update.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Allow changes to the memory hierarchy to be accumulated and
made visible all at once. This reduces computational effort,
especially when an accelerator (e.g. kvm) is involved.
Useful when a single register update causes multiple changes
to an address space.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Instead of adding and deleting regions in one pass, do a delete
pass followed by an add pass. This fixes the following case:
from:
0x0000-0x0fff ram (a1)
0x1000-0x1fff mmio (a2)
0x2000-0x2fff ram (a3)
to:
0x0000-0x2fff ram (b1)
The single pass algorithm removed a1, added b2, then removed a2 and a3,
which caused the wrong memory map to be built. The two pass algorithm
removes a1, a2, and a3, then adds b1.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
As with the rest of the memory API, the caller associates an eventfd
with an address, and the memory API takes care of registering or
unregistering when the address is made visible or invisible to the
guest.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
This eases the transition to the new API.
Reviewed-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Allow registering I/O ports via the same mechanism as mmio ranges.
Reviewed-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
For non-RAM memory regions, we cannot tell whether this is an I/O region
or an MMIO region. Since the qemu backing registration is different for
the two, we have to defer initialization until we know which address
space we are in.
These shenanigans will be removed once the backing registration is unified
with the memory API.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
I/O regions will not have ram_addrs, so this is a better name.
Reviewed-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Prepare for multiple address space support by abstracting away the details
of registering a memory range with qemu's flat representation into an
AddressSpace object.
Note operations which are memory specific are not abstracted, since they will
never be called on I/O address spaces anyway.
Reviewed-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
get_system_memory() provides the root of the memory hierarchy.
This interface is intended to be private between memory.c and exec.c.
If this file is included elsewhere, it should be regarded as a bug (or
TODO item). However, it will be temporarily needed for the conversion
to hierarchical memory routing.
Reviewed-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Simple implementations of memory routers, for example the Cirrus VGA memory banks
or the 440FX PAM registers can generate adjacent memory regions which are contiguous.
Detect these and merge them; this saves kvm memory slots and shortens lookup times.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Currently dirty tracking is implemented by passing through
all calls to the underlying cpu_physical_memory_*() calls.
Reviewed-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
The memory API separates the attributes of a memory region (its size, how
reads or writes are handled, dirty logging, and coalescing) from where it
is mapped and whether it is enabled. This allows a device to configure
a memory region once, then hand it off to its parent bus to map it according
to the bus configuration.
Hierarchical registration also allows a device to compose a region out of
a number of sub-regions with different properties; for example some may be
RAM while others may be MMIO.
Reviewed-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>