a90ab95a95
This is a large patch but the normal code path is not affected. For non-pSeries platforms the code is ifdef'ed out, and for non-CMO enabled pSeries systems this does not affect the normal code path. Devices that do not perform DMA operations do not need modification with this patch. The function get_desired_dma was renamed from get_io_entitlement for clarity.

Overview

Cooperative Memory Overcommitment (CMO) allows a set of OS partitions to be run with less RAM than the aggregate needs of the group of partitions. The firmware will balance memory between the partitions and page memory in and out as needed. Based on the number and type of IO adapters present, each partition is allocated an amount of memory for DMA operations, and this allocation is guaranteed to the partition; this is referred to as the partition's 'entitlement'.

Partitions running in a CMO environment can only have virtual IO devices present. The VIO bus layer will manage the IO entitlement for the system. Accounting, at a system and per-device level, is tracked in the VIO bus code and exposed via sysfs. A set of dma_ops functions is added to the bus to allow for this accounting.

Bus initialization

At initialization, the bus will calculate the minimum needs of the system based on providing each device present with a standard minimum entitlement, along with a spare allocation for the bus to handle hotplug events. If the minimum needs cannot be met, the system boot will be halted.

Device changes

The significant changes for devices while running under CMO are that the devices must specify how much dedicated IO entitlement they desire and must also handle DMA mapping errors that can occur due to constrained IO memory. The virtual IO drivers are modified to silence errors when DMA mappings fail for CMO and to handle these failures gracefully.

Each device will be guaranteed a minimum entitlement that can always be mapped. Devices will specify how much entitlement they desire and the VIO bus will attempt to provide for this. Devices can change their desired entitlement level at any point in time to address particular needs (via vio_cmo_set_dev_desired()), not just at device probe time.

VIO bus changes

The system will have a particular entitlement level available from which it can provide memory to the devices. The bus defines two pools of memory within this entitlement, the reserve and excess pools. Each device is provided with its own entitlement, no less than a system defined minimum entitlement and no greater than what the device has specified as its desired entitlement. The entitlement provided to devices comes from the reserve pool. The reserve pool can also contain a spare allocation as large as the system defined minimum entitlement, which is used for device hotplug events. Any entitlement not needed to fulfill the needs of the reserve pool is placed in the excess pool. Each device is guaranteed that it can map up to its entitled level; additional mappings are possible as long as there is unmapped memory in the excess pool.

Bus probe

As the system starts, each device is given an entitlement equal only to the system defined minimum entitlement. The reserve pool is equal to the sum of these entitlements, plus a spare allocation. The VIO bus also tracks the aggregate desired entitlement of all the devices. If the system desired entitlement is greater than the size of the reserve pool, then when devices unmap IO memory it will be reserved and a balance operation will be scheduled for some time in the future.

Entitlement balancing

The balance function tries to fairly distribute entitlement between the devices in the system, with the goal of providing each device with its desired amount of entitlement. Devices using more than what would be ideal will have their entitled set-point adjusted; this effectively sets a goal for lower IO memory usage, as future mappings can fail and deallocations will trigger a balance operation to distribute the newly unmapped memory. A fair distribution of entitlement can take several balance operations to achieve. Entitlement changes and device DLPAR events will alter the state of CMO and will trigger balance operations.

Hotplug events

The VIO bus allows for changes in system entitlement at run-time via 'vio_cmo_entitlement_update()'. When devices are added, the hotplug device event will be preceded by a system entitlement increase, and this is reversed when devices are removed.

The following changes are made to the VIO bus layer for CMO:
 * add IO memory accounting per device structure.
 * add IO memory entitlement query function to driver structure.
 * during vio bus probe, if CMO is enabled, check that driver has memory entitlement query function defined. Fail if function not defined.
 * fail to register driver if IO entitlement function not defined.
 * create set of dma_ops at vio level for CMO that will track allocations and return DMA failures once entitlement is reached. Entitlement will be limited by overall system entitlement. Devices will have a reserved quantity of memory that is guaranteed; the rest can be used as available.
 * expose entitlement, current allocation, desired allocation, and the allocation error counter for devices to the user through sysfs
 * provide mechanism for changing a device's desired entitlement at run time as an exported function and sysfs tunable
 * track any DMA failures for entitled IO memory for each vio device.
 * check entitlement against available system entitlement on device add
 * track entitlement metrics (high water mark, current usage)
 * provide function to reset high water mark
 * provide minimum and desired entitlement numbers at a bus level
 * provide drivers with a minimum guaranteed entitlement
 * balance available entitlement between devices to satisfy their needs
 * handle system entitlement changes and device hotplug

Signed-off-by: Robert Jennings <rcj@linux.vnet.ibm.com>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
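
The pool arithmetic described in the commit message can be illustrated with a small user-space model. This is only a sketch under stated assumptions: the `model_*` names, the struct layouts, and the exact accounting order are hypothetical and are not taken from the kernel's vio.c. It shows the reserve pool being sized as one minimum entitlement per device plus a spare, the remainder going to the excess pool, and a mapping succeeding only while it fits within the device's entitlement or the unmapped part of the excess pool.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model constant; same value as VIO_CMO_MIN_ENT in vio.h. */
#define MODEL_MIN_ENT ((size_t)1562624)

/* Hypothetical per-device accounting, loosely mirroring vio_dev.cmo. */
struct model_dev {
	size_t entitled;            /* bytes guaranteed to this device */
	size_t allocated;           /* bytes currently mapped */
	unsigned int allocs_failed; /* mappings refused for lack of entitlement */
};

/* Hypothetical bus-level pools. */
struct model_bus {
	size_t reserve_size; /* sum of device minimums plus one spare */
	size_t excess_size;  /* entitlement not needed by the reserve pool */
	size_t excess_used;  /* excess bytes currently mapped */
};

/*
 * Size the pools at bus initialization. Returns false when the system
 * entitlement cannot cover one minimum per device plus the spare, which
 * is the case where the commit message says boot is halted.
 */
static bool model_bus_init(struct model_bus *bus, size_t system_entitlement,
			   size_t ndevs)
{
	size_t reserve = (ndevs + 1) * MODEL_MIN_ENT;

	if (system_entitlement < reserve)
		return false;
	bus->reserve_size = reserve;
	bus->excess_size = system_entitlement - reserve;
	bus->excess_used = 0;
	return true;
}

/* Account for a mapping of size bytes; fail once entitlement is exhausted. */
static bool model_map(struct model_bus *bus, struct model_dev *dev, size_t size)
{
	size_t from_reserve = 0;
	size_t from_excess;

	/* Use whatever is left of the device's own entitlement first. */
	if (dev->allocated < dev->entitled) {
		from_reserve = dev->entitled - dev->allocated;
		if (from_reserve > size)
			from_reserve = size;
	}
	from_excess = size - from_reserve;

	/* The remainder must fit in the unmapped part of the excess pool. */
	if (from_excess > bus->excess_size - bus->excess_used) {
		dev->allocs_failed++;
		return false;
	}
	bus->excess_used += from_excess;
	dev->allocated += size;
	return true;
}

/* Undo a mapping; the caller must not unmap more than it has mapped. */
static void model_unmap(struct model_bus *bus, struct model_dev *dev, size_t size)
{
	/* Bytes above the entitlement were taken from the excess pool. */
	size_t over = dev->allocated > dev->entitled ?
		      dev->allocated - dev->entitled : 0;
	size_t to_excess = size < over ? size : over;

	bus->excess_used -= to_excess;
	dev->allocated -= size;
}
```

The model leaves out balancing entirely; it only captures why a device can always map up to its entitled level while further mappings depend on what other devices have taken from the excess pool.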
119 lines
3.2 KiB
C
/*
 * IBM PowerPC Virtual I/O Infrastructure Support.
 *
 * Copyright (c) 2003 IBM Corp.
 *  Dave Engebretsen engebret@us.ibm.com
 *  Santiago Leon santil@us.ibm.com
 *
 * This program is free software; you can redistribute it and/or
 * modify it under the terms of the GNU General Public License
 * as published by the Free Software Foundation; either version
 * 2 of the License, or (at your option) any later version.
 */

#ifndef _ASM_POWERPC_VIO_H
#define _ASM_POWERPC_VIO_H
#ifdef __KERNEL__

#include <linux/init.h>
#include <linux/errno.h>
#include <linux/device.h>
#include <linux/dma-mapping.h>
#include <linux/mod_devicetable.h>

#include <asm/hvcall.h>
#include <asm/scatterlist.h>

/*
 * Architecture-specific constants for drivers to
 * extract attributes of the device using vio_get_attribute()
 */
#define VETH_MAC_ADDR "local-mac-address"
#define VETH_MCAST_FILTER_SIZE "ibm,mac-address-filters"

/* End architecture-specific constants */

#define h_vio_signal(ua, mode) \
	plpar_hcall_norets(H_VIO_SIGNAL, ua, mode)

#define VIO_IRQ_DISABLE		0UL
#define VIO_IRQ_ENABLE		1UL

/*
 * VIO CMO minimum entitlement for all devices and spare entitlement
 */
#define VIO_CMO_MIN_ENT 1562624

struct iommu_table;

/**
 * vio_dev - This structure is used to describe virtual I/O devices.
 *
 * @desired: set from return of driver's get_desired_dma() function
 * @entitled: bytes of IO data that has been reserved for this device.
 * @allocated: bytes of IO data currently in use by the device.
 * @allocs_failed: number of DMA failures due to insufficient entitlement.
 */
struct vio_dev {
	const char *name;
	const char *type;
	uint32_t unit_address;
	unsigned int irq;
	struct {
		size_t desired;
		size_t entitled;
		size_t allocated;
		atomic_t allocs_failed;
	} cmo;
	struct device dev;
};

struct vio_driver {
	const struct vio_device_id *id_table;
	int (*probe)(struct vio_dev *dev, const struct vio_device_id *id);
	int (*remove)(struct vio_dev *dev);
	/* A driver must have a get_desired_dma() function to
	 * be loaded in a CMO environment if it uses DMA.
	 */
	unsigned long (*get_desired_dma)(struct vio_dev *dev);
	struct device_driver driver;
};

extern int vio_register_driver(struct vio_driver *drv);
extern void vio_unregister_driver(struct vio_driver *drv);

extern int vio_cmo_entitlement_update(size_t);
extern void vio_cmo_set_dev_desired(struct vio_dev *viodev, size_t desired);

extern void __devinit vio_unregister_device(struct vio_dev *dev);

struct device_node;

extern struct vio_dev *vio_register_device_node(
		struct device_node *node_vdev);
extern const void *vio_get_attribute(struct vio_dev *vdev, char *which,
		int *length);
#ifdef CONFIG_PPC_PSERIES
extern struct vio_dev *vio_find_node(struct device_node *vnode);
extern int vio_enable_interrupts(struct vio_dev *dev);
extern int vio_disable_interrupts(struct vio_dev *dev);
#else
static inline int vio_enable_interrupts(struct vio_dev *dev)
{
	return 0;
}
#endif

static inline struct vio_driver *to_vio_driver(struct device_driver *drv)
{
	return container_of(drv, struct vio_driver, driver);
}

static inline struct vio_dev *to_vio_dev(struct device *dev)
{
	return container_of(dev, struct vio_dev, dev);
}

#endif /* __KERNEL__ */
#endif /* _ASM_POWERPC_VIO_H */
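
To illustrate how the `get_desired_dma()` hook in `struct vio_driver` relates to the `cmo` fields of `struct vio_dev`, here is a minimal user-space sketch. Everything prefixed `mock_` or `example_` is hypothetical: the structures are cut-down stand-ins for the kernel types, the 16 x 4 KiB buffer count is invented, and raising a too-small request to the minimum is an assumption based on the commit message's statement that a device's entitlement is never below the system minimum. It is not the kernel's probe code.

```c
#include <stddef.h>

/* Same value as VIO_CMO_MIN_ENT in the header; renamed to mark it as a mock. */
#define MOCK_CMO_MIN_ENT ((size_t)1562624)

/* Cut-down stand-in for struct vio_dev: only the CMO accounting fields. */
struct mock_vio_dev {
	struct {
		size_t desired;
		size_t entitled;
		size_t allocated;
	} cmo;
};

/* Cut-down stand-in for struct vio_driver: only the CMO hook. */
struct mock_vio_driver {
	unsigned long (*get_desired_dma)(struct mock_vio_dev *dev);
};

/* Hypothetical driver callback: asks for 16 DMA buffers of 4 KiB each. */
static unsigned long example_get_desired_dma(struct mock_vio_dev *dev)
{
	(void)dev; /* a real driver might size its request per device */
	return 16UL * 4096UL;
}

/*
 * Hypothetical probe step under CMO. Fails with -1 when the driver has no
 * get_desired_dma() hook; otherwise records the desired level and starts
 * the device at the minimum entitlement.
 */
static int mock_cmo_probe(struct mock_vio_dev *dev,
			  const struct mock_vio_driver *drv)
{
	if (!drv->get_desired_dma)
		return -1;

	dev->cmo.desired = drv->get_desired_dma(dev);
	/* Assumption: a request below the minimum is raised to the minimum. */
	if (dev->cmo.desired < MOCK_CMO_MIN_ENT)
		dev->cmo.desired = MOCK_CMO_MIN_ENT;

	dev->cmo.entitled = MOCK_CMO_MIN_ENT;
	dev->cmo.allocated = 0;
	return 0;
}
```

A driver that needs more IO memory after probe would, per the commit message, call `vio_cmo_set_dev_desired()` rather than rely on the value returned at probe time.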