qemu-e2k

Author	SHA1	Message	Date
David Gibson	ce2918cbc3	spapr: Use CamelCase properly The qemu coding standard is to use CamelCase for type and structure names, and the pseries code follows that... sort of. There are quite a lot of places where we bend the rules in order to preserve the capitalization of internal acronyms like "PHB", "TCE", "DIMM" and most commonly "sPAPR". That was a bad idea - it frequently leads to names ending up with hard to read clusters of capital letters, and means they don't catch the eye as type identifiers, which is kind of the point of the CamelCase convention in the first place. In short, keeping type identifiers look like CamelCase is more important than preserving standard capitalization of internal "words". So, this patch renames a heap of spapr internal type names to a more standard CamelCase. In addition to case changes, we also make some other identifier renames: VIOsPAPR* -> SpaprVio* The reverse word ordering was only ever used to mitigate the capital cluster, so revert to the natural ordering. VIOsPAPRVTYDevice -> SpaprVioVty VIOsPAPRVLANDevice -> SpaprVioVlan Brevity, since the "Device" didn't add useful information sPAPRDRConnector -> SpaprDrc sPAPRDRConnectorClass -> SpaprDrcClass Brevity, and makes it clearer this is the same thing as a "DRC" mentioned in many other places in the code This is 100% a mechanical search-and-replace patch. It will, however, conflict with essentially any and all outstanding patches touching the spapr code. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-03-12 14:33:05 +11:00
Alexey Kardashevskiy	5f36666722	spapr_iommu: Do not replay mappings from just created DMA window On sPAPR vfio_listener_region_add() is called in 2 situations: 1. a new listener is registered from vfio_connect_container(); 2. a new IOMMU Memory Region is added from rtas_ibm_create_pe_dma_window(). In both cases vfio_listener_region_add() calls memory_region_iommu_replay() to notify newly registered IOMMU notifiers about existing mappings which is totally desirable for case 1. However for case 2 it is nothing but noop as the window has just been created and has no valid mappings so replaying those does not do anything. It is barely noticeable with usual guests but if the window happens to be really big, such no-op replay might take minutes and trigger RCU stall warnings in the guest. For example, a upcoming GPU RAM memory region mapped at 64TiB (right after SPAPR_PCI_LIMIT) causes a 64bit DMA window to be at least 128TiB which is (128<<40)/0x10000=2.147.483.648 TCEs to replay. This mitigates the problem by adding an "skipping_replay" flag to sPAPRTCETable and defining sPAPR own IOMMU MR replay() hook which does exactly the same thing as the generic one except it returns early if @skipping_replay==true. Another way of fixing this would be delaying replay till the very first H_PUT_TCE but this does not work if in-kernel H_PUT_TCE handler is enabled (a likely case). When "ibm,create-pe-dma-window" is complete, the guest will map only required regions of the huge DMA window. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Message-Id: <20190307050518.64968-2-aik@ozlabs.ru> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-03-12 14:33:04 +11:00
Alexey Kardashevskiy	8994e91e96	spapr-iommu: Always advertise the maximum possible DMA window size When deciding about the huge DMA window, the typical Linux pseries guest uses the maximum allowed RAM size as the upper limit. We did the same on QEMU side to match that logic. Now we are going to support a GPU RAM pass through which is not available at the guest boot time as it requires the guest driver interaction. As the result, the guest requests a smaller window than it should. Therefore the guest needs to be patched to understand this new memory and so does QEMU. Instead of reimplementing here whatever solution we choose for the guest, this advertises the biggest possible window size limited by 32 bit (as defined by LoPAPR). Since the window size has to be power-of-two (the create rtas call receives a window shift, not a size), this uses 0x8000.0000 as the maximum number of TCEs possible (rather than 32bit maximum of 0xffff.ffff). This is safe as: 1. The guest visible emulated table is allocated in KVM (actual pages are allocated in page fault handler) and QEMU (actual pages are allocated when updated); 2. The hardware table (and corresponding userspace address table) supports sparse allocation and also checks for locked_vm limit so it is unable to cause the host any damage. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2018-12-21 09:37:38 +11:00
David Hildenbrand	e017da370b	machine: rename MemoryHotplugState to DeviceMemoryState Rename it to better match the new terminology. Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20180423165126.15441-9-david@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2018-05-07 10:00:02 -03:00
David Hildenbrand	b0c14ec4ef	machine: make MemoryHotplugState accessible via the machine Let's allow to query the MemoryHotplugState directly from the machine. If the pointer is NULL, the machine does not support memory devices. If the pointer is !NULL, the machine supports memory devices and the data structure contains information about the applicable physical guest address space region. This allows us to generically detect if a certain machine has support for memory devices, and to generically manage it (find free address range, plug/unplug a memory region). We will rename "MemoryHotplugState" to something more meaningful ("DeviceMemory") after we completed factoring out the pc-dimm code into MemoryDevice code. Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20180423165126.15441-3-david@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> [ehabkost: rebased series, solved conflicts at spapr.c] [ehabkost: squashed fix to use g_malloc0()] Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2018-05-07 10:00:02 -03:00
Alexey Kardashevskiy	ae4de14cd3	spapr_pci/spapr_pci_vfio: Support Dynamic DMA Windows (DDW) This adds support for Dynamic DMA Windows (DDW) option defined by the SPAPR specification which allows to have additional DMA window(s) The "ddw" property is enabled by default on a PHB but for compatibility the pseries-2.6 machine and older disable it. This also creates a single DMA window for the older machines to maintain backward migration. This implements DDW for PHB with emulated and VFIO devices. The host kernel support is required. The advertised IOMMU page sizes are 4K and 64K; 16M pages are supported but not advertised by default, in order to enable them, the user has to specify "pgsz" property for PHB and enable huge pages for RAM. The existing linux guests try creating one additional huge DMA window with 64K or 16MB pages and map the entire guest RAM to. If succeeded, the guest switches to dma_direct_ops and never calls TCE hypercalls (H_PUT_TCE,...) again. This enables VFIO devices to use the entire RAM and not waste time on map/unmap later. This adds a "dma64_win_addr" property which is a bus address for the 64bit window and by default set to 0x800.0000.0000.0000 as this is what the modern POWER8 hardware uses and this allows having emulated and VFIO devices on the same bus. This adds 4 RTAS handlers: * ibm,query-pe-dma-window * ibm,create-pe-dma-window * ibm,remove-pe-dma-window * ibm,reset-pe-dma-window These are registered from type_init() callback. These RTAS handlers are implemented in a separate file to avoid polluting spapr_iommu.c with PCI. This changes sPAPRPHBState::dma_liobn to an array to allow 2 LIOBNs and updates all references to dma_liobn. However this does not add 64bit LIOBN to the migration stream as in fact even 32bit LIOBN is rather pointless there (as it is a PHB property and the management software can/should pass LIOBNs via CLI) but we keep it for the backward migration support. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2016-07-05 14:31:08 +10:00

6 Commits