Commit Graph

130 Commits

Author SHA1 Message Date
Frederic Weisbecker abf917cd91 cputime: Generic on-demand virtual cputime accounting
If we want to stop the tick further idle, we need to be
able to account the cputime without using the tick.

Virtual based cputime accounting solves that problem by
hooking into kernel/user boundaries.

However implementing CONFIG_VIRT_CPU_ACCOUNTING require
low level hooks and involves more overhead. But we already
have a generic context tracking subsystem that is required
for RCU needs by archs which plan to shut down the tick
outside idle.

This patch implements a generic virtual based cputime
accounting that relies on these generic kernel/user hooks.

There are some upsides of doing this:

- This requires no arch code to implement CONFIG_VIRT_CPU_ACCOUNTING
if context tracking is already built (already necessary for RCU in full
tickless mode).

- We can rely on the generic context tracking subsystem to dynamically
(de)activate the hooks, so that we can switch anytime between virtual
and tick based accounting. This way we don't have the overhead
of the virtual accounting when the tick is running periodically.

And one downside:

- There is probably more overhead than a native virtual based cputime
accounting. But this relies on hooks that are already set anyway.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Li Zhong <zhong@linux.vnet.ibm.com>
Cc: Namhyung Kim <namhyung.kim@lge.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
2013-01-27 19:23:27 +01:00
Ian Munsie cedddd812a powerpc: Disable relocation on exceptions when kexecing
Since we don't know if they new kernel we are kexecing into has been
built to support relocation on exceptions, we disable them before we
kexec.

We do NOT disable them if we are execing a kdump kernel, because we
want to change as little state as possible and it is likely that we are
execing ourselves and will be able to handle them anyway.

Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-11-15 15:08:08 +11:00
Ian Munsie fc8effa4e4 powerpc: Enable relocation on during exceptions at boot
We currently do this synchronously at boot from setup_arch. On a large
system this could hypothetically take a little while to complete, so
currently we will give up if we are asked to wait for more than a second
in total.

If we actually start hitting that timeout in practice we can always move
this code into a kernel thread to take care of it in the background.

Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-11-15 15:08:08 +11:00
Nathan Fontenot 1cf3d8b3d2 powerpc+of: Add of node/property notification chain for adds and removes
This patch moves the notification chain for updates to the device tree
from the powerpc/pseries code to the base OF code. This makes this
functionality available to all architectures.

Additionally the notification chain is updated to allow notifications
for property add/remove/update. To make this work a pointer to a new
struct (of_prop_reconfig) is passed to the routines in the notification chain.
The of_prop_reconfig property contains a pointer to the node containing the
property and a pointer to the property itself. In the case of property
updates, the property pointer refers to the new property.

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Acked-by: Rob Herring <rob.herring@calxeda.com>
Acked-by: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-11-15 12:56:41 +11:00
Michael Neuling cd14457304 powerpc: Dynamically calculate the dabrx based on kernel/user/hypervisor
Currently we mark the DABRX to interrupt on all matches
(hypervisor/kernel/user and then filter in software.  We can be a lot
smarter now that we can set the DABRX dynamically.

This sets the DABRX based on the flags passed by the user.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-09-10 09:59:13 +10:00
Michael Neuling 4474ef055c powerpc: Rework set_dabr so it can take a DABRX value as well
Rework set_dabr to take a DABRX value as well.

Both the pseries and PS3 hypervisors do some checks on the DABRX
values that are passed in the hcall.  This patch stops bogus values
from being passed to hypervisor.  Also, in the case where we are
clearing the breakpoint, where DABR and DABRX are zero, we modify the
DABRX value to make it valid so that the hcall won't fail.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-09-10 09:59:10 +10:00
Gavin Shan 35e5cfe27e powerpc/eeh: Move EEH initialization around
Currently, we have 3 phases for EEH initialization on pSeries platform.
All of them are done through builtin functions: platform initialization,
EEH device creation, and EEH subsystem enablement. All of them are done
no later than ppc_md.setup_arch. That means that the slab/slub isn't ready
yet, so we have to allocate memory chunks on basis of PAGE_SIZE for those
dynamically created EEH devices. That's pretty expensive.

In order to utilize slab/slub for memory allocation, we have to move the EEH
initialization functions around, but all of them should be called after slab
is ready.

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-09-10 09:35:27 +10:00
Michael Neuling 06c8876668 powerpc: Use the XDABR hcall
We never use the XDABR hcall since we check for DABR hcall first.
XDABR syscall is better since it allows us to also set the DABRX.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-09-07 11:44:42 +10:00
Linus Torvalds 475c77edf8 Merge branch 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci
Pull PCI changes (including maintainer change) from Jesse Barnes:
 "This pull has some good cleanups from Bjorn and Yinghai, as well as
  some more code from Yinghai to better handle resource re-allocation
  when enabled.

  There's also a new initcall_debug feature from Arjan which will print
  out quirk timing information to help identify slow quirks for fixing
  or refinement (Yinghai sent in a few patches to do just that once the
  new debug code landed).

  Beyond that, I'm handing off PCI maintainership to Bjorn Helgaas.
  He's been a core PCI and Linux contributor for some time now, and has
  kindly volunteered to take over.  I just don't feel I have the time
  for PCI review and work that it deserves lately (I've taken on some
  other projects), and haven't been as responsive lately as I'd like, so
  I approached Bjorn asking if he'd like to manage things.  He's going
  to give it a try, and I'm confident he'll do at least as well as I
  have in keeping the tree managed, patches flowing, and keeping things
  stable."

Fix up some fairly trivial conflicts due to other cleanups (mips device
resource fixup cleanups clashing with list handling cleanup, ppc iseries
removal clashing with pci_probe_only cleanup etc)

* 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci: (112 commits)
  PCI: Bjorn gets PCI hotplug too
  PCI: hand PCI maintenance over to Bjorn Helgaas
  unicore32/PCI: move <asm-generic/pci-bridge.h> include to asm/pci.h
  sparc/PCI: convert devtree and arch-probed bus addresses to resource
  powerpc/PCI: allow reallocation on PA Semi
  powerpc/PCI: convert devtree bus addresses to resource
  powerpc/PCI: compute I/O space bus-to-resource offset consistently
  arm/PCI: don't export pci_flags
  PCI: fix bridge I/O window bus-to-resource conversion
  x86/PCI: add spinlock held check to 'pcibios_fwaddrmap_lookup()'
  PCI / PCIe: Introduce command line option to disable ARI
  PCI: make acpihp use __pci_remove_bus_device instead
  PCI: export __pci_remove_bus_device
  PCI: Rename pci_remove_behind_bridge to pci_stop_and_remove_behind_bridge
  PCI: Rename pci_remove_bus_device to pci_stop_and_remove_bus_device
  PCI: print out PCI device info along with duration
  PCI: Move "pci reassigndev resource alignment" out of quirks.c
  PCI: Use class for quirk for usb host controller fixup
  PCI: Use class for quirk for ti816x class fixup
  PCI: Use class for quirk for intel e100 interrupt fixup
  ...
2012-03-23 14:02:12 -07:00
Gavin Shan eb740b5f3e powerpc/eeh: Introduce EEH device
Original EEH implementation depends on struct pci_dn heavily. However,
EEH shouldn't depend on that actually because EEH needn't share much
information with other PCI components. That's to say, EEH should have
worked independently.

The patch introduces struct eeh_dev so that EEH core components needn't
be working based on struct pci_dn in future. Also, struct pci_dn, struct
eeh_dev instances are created in dynamic fasion and the binding with EEH
device, OF node, PCI device is implemented as well.

The EEH devices are created after PHBs are detected and initialized, but
PCI emunation hasn't started yet. Apart from that, PHB might be created
dynamically through DLPAR component and the EEH devices should be creatd
as well. Another case might be OF node is created dynamically by DR
(Dynamic Reconfiguration), which has been defined by PAPR. For those OF
nodes created by DR, EEH devices should be also created accordingly. The
binding between EEH device and OF node is done while the EEH device is
initially created.

The binding between EEH device and PCI device should be done after PCI
emunation is done. Besides, PCI hotplug also needs the binding so that
the EEH devices could be traced from the newly coming PCI buses or PCI
devices.

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-03-09 11:39:29 +11:00
Gavin Shan aa1e6374ae powerpc/eeh: Platform dependent EEH operations
EEH has been implemented on RTAS-compliant pSeries platform.
That's to say, the EEH operations will be implemented through RTAS
calls eventually. The situation limited feasible extension on EEH.
In order to support EEH on multiple platforms like pseries and powernv
simutaneously. We have to split the platform dependent EEH options
up out of current implementation.

The patch addresses supporting EEH on multiple platforms. The pseries
platform dependent EEH operations will be abstracted by struct eeh_ops.
EEH core components will be built based on the registered EEH operations.
With the mechanism, what the individual platform needs to do is implement
platform dependent EEH operations.

For now, the pseries platform is covered under the mechanism. That means
we have to think about other platforms to support EEH, like powernv.
Besides, we only have framework for the mechanism and we have to implement
it for pseries platform later.

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-03-09 11:08:54 +11:00
Bjorn Helgaas 673c975624 powerpc/PCI: replace pci_probe_only with pci_flags
We already use pci_flags, so this just sets pci_flags directly and removes
the intermediate step of figuring out pci_probe_only, then using it to set
pci_flags.

The PCI core provides a pci_flags definition (currently __weak), so drop
the powerpc definitions in favor of that.

CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2012-02-23 20:18:58 -07:00
Bjorn Helgaas 3c13be017a powerpc/PCI: make pci_probe_only default to 0
pci_probe_only is set on ppc64 to prevent resource re-allocation
by the core. It's meant to be used in very specific circumstances
such as when operating under a hypervisor that may prevent such
re-allocation.

Instead of default to 1, we make it default to 0 and explicitly
set it in the few cases where we need it.

This fixes FSL PCI which wants it clear among others.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2012-02-23 20:18:58 -07:00
Kyle Moffett e55d7f737d powerpc/mpic: Remove duplicate MPIC_WANTS_RESET flag
There are two separate flags controlling whether or not the MPIC is
reset during initialization, which is completely unnecessary, and only
one of them can be specified in the device tree.

Also, most platforms in-tree right now do actually want to reset the
MPIC during initialization anyways, which means lots of duplicate code
passing the MPIC_WANTS_RESET flag.

Fix all of the callers which currently do not pass the MPIC_WANTS_RESET
flag to pass the MPIC_NO_RESET flag, then remove the MPIC_WANTS_RESET
flag and make the code reset the MPIC by default.

Signed-off-by: Kyle Moffett <Kyle.D.Moffett@boeing.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-02-23 10:50:00 +11:00
Kyle Moffett 5019609fce powerpc/mpic: Remove MPIC_BROKEN_FRR_NIRQS and duplicate irq_count
The mpic->irq_count variable is only used as a software error-checking
limit to determine whether or not an IRQ number is valid.  In board code
which does not manually specify an IRQ count to mpic_alloc(), i.e. 0, it
is automatically detected from the number of ISUs and the ISU size.

In practice, all hardware ends up with irq_count == num_sources, so all
of the runtime checks on mpic->irq_count should just check the value of
mpic->num_sources instead.

When platform hardware does not correctly report the number of IRQs,
which only happens on the MPC85xx/MPC86xx, the MPIC_BROKEN_FRR_NIRQS
flag is used to override the detected value of num_sources with the
manual irq_count parameter.  Since there's no need to manually specify
the number of IRQs except in this case, the extra flag can be eliminated
and the test changed to "irq_count != 0".

Signed-off-by: Kyle Moffett <Kyle.D.Moffett@boeing.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-02-23 10:49:59 +11:00
Benjamin Herrenschmidt 43ca5d347a Merge branch 'kexec' into next 2011-12-16 11:09:21 +11:00
Anton Blanchard a934904d8a powerpc: Reduce pseries panic timeout from 180s to 10s
We've had a 180 second panic timeout on ppc64 for as long as I
can remember. This patch reduces it to 10 seconds on pseries for a few
reasons:

- Almost all pseries machines have a hypervisor console so panic
  output will be available in a scrollback buffer.

- The 180 seconds impacts our availability, users (other than
  kernel hackers) just want the box to come back around so it
  can continue its work.

- I spend a lot of my life staring at the 180 second panic timeout.
  Many pseries machines take minutes to power cycle, so it's quicker
  to sit through the 180 seconds than it is to power cycle.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-12-08 14:02:23 +11:00
Deepthi Dharwar e179816ce6 powerpc/cpuidle: Enable cpuidle and directly call cpuidle_idle_call() for pSeries
This patch enables cpuidle for pSeries and pSeries_idle is
directly called from the idle loop. As a result of pSeries_idle, cpuidle
driver registered with cpuidle subsystem comes into action. On
failure of loading of the driver or cpuidle framework default idle
is executed as part of the function. This patch
also removes the routines pseries_shared_idle_sleep and
pseries_dedicated_idle_sleep as they are now implemented as part of
pseries_idle cpuidle driver.

Signed-off-by: Deepthi Dharwar <deepthi@linux.vnet.ibm.com>
Signed-off-by: Trinabh Gupta <g.trinabh@gmail.com>
Signed-off-by: Arun R Bharadwaj <arun.r.bharadwaj@gmail.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-12-08 13:57:20 +11:00
Deepthi Dharwar 707827f338 powerpc/cpuidle: cpuidle driver for pSeries
This patch implements a back-end cpuidle driver for pSeries
based on pseries_dedicated_idle_loop and pseries_shared_idle_loop
routines.  The driver is built only if CONFIG_CPU_IDLE is set. This
cpuidle driver uses global registration of idle states and
not per-cpu.

Signed-off-by: Deepthi Dharwar <deepthi@linux.vnet.ibm.com>
Signed-off-by: Trinabh Gupta <g.trinabh@gmail.com>
Signed-off-by: Arun R Bharadwaj <arun.r.bharadwaj@gmail.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-12-08 13:56:31 +11:00
Kyle Moffett be8bec56df powerpc/mpic: Invert the meaning of MPIC_PRIMARY
It turns out that there are only 2 in-tree platforms which use MPICs
which are not "primary":  IBM Cell and PowerMac.  To reduce the
complexity of the typical board setup code, invert the MPIC_PRIMARY bit
into MPIC_SECONDARY.

Signed-off-by: Kyle Moffett <Kyle.D.Moffett@boeing.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-12-07 13:43:08 +11:00
Paul Gortmaker 4b16f8e2d6 powerpc: various straight conversions from module.h --> export.h
All these files were including module.h just for the basic
EXPORT_SYMBOL infrastructure.  We can shift them off to the
export.h header which is a way smaller footprint and thus
realize some compile time gains.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-10-31 19:30:44 -04:00
Anton Blanchard 711ef84e80 powerpc/pseries: Cleanup VPA registration and deregistration errors
Make the VPA, SLB shadow and DTL registration and deregistration
functions print consistent messages on error. I needed the firmware
error code while chasing a kexec bug but we weren't printing it.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-08-05 14:47:57 +10:00
Benjamin Herrenschmidt 4d2bb3f500 powerpc/pseries: Re-implement HVSI as part of hvc_vio
On pseries machines, consoles are provided by the hypervisor using
a low level get_chars/put_chars type interface. However, this is
really just a transport to the service processor which implements
them either as "raw" console (networked consoles, HMC, ...) or as
"hvsi" serial ports.

The later is a simple packet protocol on top of the raw character
interface that is supposed to convey additional "serial port" style
semantics. In practice however, all it does is provide a way to
read the CD line and set/clear our DTR line, that's it.

We currently implement the "raw" protocol as an hvc console backend
(/dev/hvcN) and the "hvsi" protocol using a separate tty driver
(/dev/hvsi0).

However this is quite impractical. The arbitrary difference between
the two type of devices has been a major source of user (and distro)
confusion. Additionally, there's an additional mini -hvsi implementation
in the pseries platform code for our low level debug console and early
boot kernel messages, which means code duplication, though that low
level variant is impractical as it's incapable of doing the initial
protocol negociation to establish the link to the FSP.

This essentially replaces the dedicated hvsi driver and the platform
udbg code completely by extending the existing hvc_vio backend used
in "raw" mode so that:

 - It now supports HVSI as well
 - We add support for hvc backend providing tiocm{get,set}
 - It also provides a udbg interface for early debug and boot console

This is overall less code, though this will only be obvious once we
remove the old "hvsi" driver, which is still available for now. When
the old driver is enabled, the new code still kicks in for the low
level udbg console, replacing the old mini implementation in the platform
code, it just doesn't provide the higher level "hvc" interface.

In addition to producing generally simler code, this has several benefits
over our current situation:

 - The user/distro only has to deal with /dev/hvcN for the hypervisor
console, avoiding all sort of confusion that has plagued us in the past

 - The tty, kernel and low level debug console all use the same code
base which supports the full protocol establishment process, thus the
console is now available much earlier than it used to be with the
old HVSI driver. The kernel console works much earlier and udbg is
available much earlier too. Hackers can enable a hard coded very-early
debug console as well that works with HVSI (previously that was only
supported for the "raw" mode).

I've tried to keep the same semantics as hvsi relative to how I react
to things like CD changes, with some subtle differences though:

 - I clear DTR on close if HUPCL is set

 - Current hvsi triggers a hangup if it detects a up->down transition
   on CD (you can still open a console with CD down). My new implementation
   triggers a hangup if the link to the FSP is severed, and severs it upon
   detecting a up->down transition on CD.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-06-29 17:48:35 +10:00
Nishanth Aravamudan af442a1baa powerpc: Ensure dtl buffers do not cross 4k boundary
Future releases of fimrware will enforce a requirement that DTL buffers
do not cross a 4k boundary. Commit
127493d5dc satisfies this requirement for
CONFIG_VIRT_CPU_ACCOUNTING=y kernels, but if !CONFIG_VIRT_CPU_ACCOUNTING
&& CONFIG_DTL=y, the current code will fail at dtl registration time.
Fix this by making the kmem cache from
127493d5dc visible outside of setup.c and
using the same cache in both dtl.c and setup.c. This requires a bit of
reorganization to ensure ordering of the kmem cache and buffer
allocations.

Note: Since firmware now limits the size of the buffer, I made
dtl_buf_entries read-only in debugfs.

Tested with upcoming firmware with the 4 combinations of
CONFIG_VIRT_CPU_ACCOUNTING and CONFIG_DTL.

Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Anton Blanchard <anton@samba.org>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-05-19 14:30:41 +10:00
Brian King 9ee820fa00 powerpc/pseries: Add page coalescing support
Adds support for page coalescing, which is a feature on IBM Power servers
which allows for coalescing identical pages between logical partitions.
Hint text pages as coalesce candidates, since they are the most likely
pages to be able to be coalesced between partitions. This patch also
exports some page coalescing statistics available from firmware via
lparcfg.

[BenH: Moved a couple of things around to fix compile problems]

Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-05-04 16:02:21 +10:00
Benjamin Herrenschmidt 0b05ac6e24 powerpc/xics: Rewrite XICS driver
This is a significant rework of the XICS driver, too significant to
conveniently break it up into a series of smaller patches to be honest.

The driver is moved to a more generic location to allow new platforms
to use it, and is broken up into separate ICP and ICS "backends". For
now we have the native and "hypervisor" ICP backends and one common
RTAS ICS backend.

The driver supports one ICP backend instanciation, and many ICS ones,
in order to accomodate future platforms with multiple possibly different
interrupt "sources" mechanisms.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-04-20 11:02:35 +10:00
Nishanth Aravamudan 127493d5dc powerpc/pseries: Use a kmem cache for DTL buffers
PAPR specifies that DTL buffers can not cross AMS environments (aka CMO
in the PAPR) and can not cross a memory entitlement granule boundary
(4k). This is found in section 14.11.3.2 H_REGISTER_VPA of the PAPR.
kmalloc does not guarantee an alignment of the allocation, though,
beyond 8 bytes (at least in my understanding). Create a special kmem
cache for DTL buffers with the alignment requirement.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-04-18 13:08:08 +10:00
Benjamin Herrenschmidt f86d6b9b36 powerpc/pseries: Don't register global initcall
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-04-05 16:22:11 +10:00
Thomas Gleixner ec775d0e70 powerpc: Convert to new irq_* function names
Scripted with coccinelle.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-03-29 14:48:12 +02:00
Lennert Buytenhek 79f26c268e powerpc: platforms/pseries irq_data conversion.
Signed-off-by: Lennert Buytenhek <buytenh@secretlab.ca>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-03-10 11:04:01 +11:00
Paul Mackerras cf9efce0ce powerpc: Account time using timebase rather than PURR
Currently, when CONFIG_VIRT_CPU_ACCOUNTING is enabled, we use the
PURR register for measuring the user and system time used by
processes, as well as other related times such as hardirq and
softirq times.  This turns out to be quite confusing for users
because it means that a program will often be measured as taking
less time when run on a multi-threaded processor (SMT2 or SMT4 mode)
than it does when run on a single-threaded processor (ST mode), even
though the program takes longer to finish.  The discrepancy is
accounted for as stolen time, which is also confusing, particularly
when there are no other partitions running.

This changes the accounting to use the timebase instead, meaning that
the reported user and system times are the actual number of real-time
seconds that the program was executing on the processor thread,
regardless of which SMT mode the processor is in.  Thus a program will
generally show greater user and system times when run on a
multi-threaded processor than on a single-threaded processor.

On pSeries systems on POWER5 or later processors, we measure the
stolen time (time when this partition wasn't running) using the
hypervisor dispatch trace log.  We check for new entries in the
log on every entry from user mode and on every transition from
kernel process context to soft or hard IRQ context (i.e. when
account_system_vtime() gets called).  So that we can correctly
distinguish time stolen from user time and time stolen from system
time, without having to check the log on every exit to user mode,
we store separate timestamps for exit to user mode and entry from
user mode.

On systems that have a SPURR (POWER6 and POWER7), we read the SPURR
in account_system_vtime() (as before), and then apportion the SPURR
ticks since the last time we read it between scaled user time and
scaled system time according to the relative proportions of user
time and system time over the same interval.  This avoids having to
read the SPURR on every kernel entry and exit.  On systems that have
PURR but not SPURR (i.e., POWER5), we do the same using the PURR
rather than the SPURR.

This disables the DTL user interface in /sys/debug/kernel/powerpc/dtl
for now since it conflicts with the use of the dispatch trace log
by the time accounting code.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-09-02 14:07:31 +10:00
Anton Blanchard b878dc0059 powerpc: Use smt_snooze_delay=-1 to always busy loop
Right now if we want to busy loop and not give up any time to the hypervisor
we put a very large value into smt_snooze_delay. This is sometimes useful
when running a single partition and you want to avoid any latencies due
to the hypervisor or CPU power state transitions. While this works, it's a bit
ugly - how big a number is enough now we have NO_HZ and can be idle for a very
long time.

The patch below makes smt_snooze_delay signed, and a negative value means loop
forever:

echo -1 > /sys/devices/system/cpu/cpu0/smt_snooze_delay

This change shouldn't affect the existing userspace tools (eg ppc64_cpu), but
I'm cc-ing Nathan just to be sure.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-05-21 17:31:12 +10:00
Tejun Heo 5a0e3ad6af include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files.  percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.

percpu.h -> slab.h dependency is about to be removed.  Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability.  As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.

  http://userweb.kernel.org/~tj/misc/slabh-sweep.py

The script does the followings.

* Scan files for gfp and slab usages and update includes such that
  only the necessary includes are there.  ie. if only gfp is used,
  gfp.h, if slab is used, slab.h.

* When the script inserts a new include, it looks at the include
  blocks and try to put the new include such that its order conforms
  to its surrounding.  It's put in the include block which contains
  core kernel includes, in the same order that the rest are ordered -
  alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
  doesn't seem to be any matching order.

* If the script can't find a place to put a new include (mostly
  because the file doesn't have fitting include block), it prints out
  an error message indicating which .h file needs to be added to the
  file.

The conversion was done in the following steps.

1. The initial automatic conversion of all .c files updated slightly
   over 4000 files, deleting around 700 includes and adding ~480 gfp.h
   and ~3000 slab.h inclusions.  The script emitted errors for ~400
   files.

2. Each error was manually checked.  Some didn't need the inclusion,
   some needed manual addition while adding it to implementation .h or
   embedding .c file was more appropriate for others.  This step added
   inclusions to around 150 files.

3. The script was run again and the output was compared to the edits
   from #2 to make sure no file was left behind.

4. Several build tests were done and a couple of problems were fixed.
   e.g. lib/decompress_*.c used malloc/free() wrappers around slab
   APIs requiring slab.h to be added manually.

5. The script was run on all .h files but without automatically
   editing them as sprinkling gfp.h and slab.h inclusions around .h
   files could easily lead to inclusion dependency hell.  Most gfp.h
   inclusion directives were ignored as stuff from gfp.h was usually
   wildly available and often used in preprocessor macros.  Each
   slab.h inclusion directive was examined and added manually as
   necessary.

6. percpu.h was updated not to include slab.h.

7. Build test were done on the following configurations and failures
   were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
   distributed build env didn't work with gcov compiles) and a few
   more options had to be turned off depending on archs to make things
   build (like ipr on powerpc/64 which failed due to missing writeq).

   * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
   * powerpc and powerpc64 SMP allmodconfig
   * sparc and sparc64 SMP allmodconfig
   * ia64 SMP allmodconfig
   * s390 SMP allmodconfig
   * alpha SMP allmodconfig
   * um on x86_64 SMP allmodconfig

8. percpu.h modifications were reverted so that it could be applied as
   a separate patch and serve as bisection point.

Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.

Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
2010-03-30 22:02:32 +09:00
Paul Mackerras a6dbf93a2a powerpc: Fix bug where perf_counters breaks oprofile
Currently there is a bug where if you use oprofile on a pSeries
machine, then use perf_counters, then use oprofile again, oprofile
will not work correctly; it will lose the PMU configuration the next
time the hypervisor does a partition context switch, and thereafter
won't count anything.

Maynard Johnson identified the sequence causing the problem:
- oprofile setup calls ppc_enable_pmcs(), which calls
  pseries_lpar_enable_pmcs, which tells the hypervisor that we want
  to use the PMU, and sets the "PMU in use" flag in the lppaca.
  This flag tells the hypervisor whether it needs to save and restore
  the PMU config.
- The perf_counter code sets and clears the "PMU in use" flag directly
  as it context-switches the PMU between tasks, and leaves it clear
  when it finishes.
- oprofile setup, called for a new oprofile run, calls ppc_enable_pmcs,
  which does nothing because it has already been called.  In particular
  it doesn't set the "PMU in use" flag.

This fixes the problem by arranging for ppc_enable_pmcs to always set
the "PMU in use" flag.  It makes the perf_counter code call
ppc_enable_pmcs also rather than calling the lower-level function
directly, and removes the setting of the "PMU in use" flag from
pseries_lpar_enable_pmcs, since that is now done in its caller.

This also removes the declaration of pasemi_enable_pmcs because it
isn't defined anywhere.

Reported-by: Maynard Johnson <mpjohn@us.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Cc: <stable@kernel.org)
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-09-11 11:27:58 +10:00
Kumar Gala 2eb4afb69f powerpc/pci: Move pseries code into pseries platform specific area
There doesn't appear to be any specific reason that we need to setup the
pseries specific notifier in generic arch pci code.  Move it into pseries
land.

Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-05-21 15:44:24 +10:00
Adrian Bunk 9d5a9e7465 Remove asm/a.out.h files for all architectures without a.out support.
This patch also includes the required removal of (unused) inclusion of
<asm/a.out.h> <linux/a.out.h>'s in the arch/ code for these
architectures.

[dwmw2: updated for 2.6.27-rc]
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2008-09-06 19:30:24 +01:00
Andrew Morton d617a40227 powerpc: Export CMO_PageSize
This fixes an error building powerpc allmodconfig:

ERROR: "CMO_PageSize" [arch/powerpc/platforms/pseries/cmm.ko] undefined!

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-08-26 10:24:47 +10:00
Robert Jennings 81f14997e8 powerpc: Make CMO paging space pool ID and page size available
During platform setup, save off the primary/secondary paging space
pool IDs and the page size.  Added accessors in hvcall.h for these
variables.  This is needed for a subsequent fix.

Submitted-by: Robert Jennings <rcj@linux.vnet.ibm.com>

Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-08-18 14:22:34 +10:00
Robert Jennings e46de429cb powerpc/pseries: Enable CMO feature during platform setup
For Cooperative Memory Overcommitment (CMO), set the FW_FEATURE_CMO
flag in powerpc_firmware_features from the rtas ibm,get-system-parameters
table prior to calling iommu_init_early_pSeries.

With this, any CMO specific functionality can be controlled by checking:
 firmware_has_feature(FW_FEATURE_CMO)

Signed-off-by: Robert Jennings <rcj@linux.vnet.ibm.com>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2008-07-25 15:44:42 +10:00
Michael Ellerman 541b2755c2 [POWERPC] Fix sparse warnings in arch/powerpc/platforms/pseries
Don't return void in pseries/iommu.c
Make mce_data_buf static in pseries/ras.c
Make things static in pseries/rtasd.c
Make things static in pseries/setup.c
vtermno may as well be static in platforms/pseries/lpar.c

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-05-14 22:32:02 +10:00
Michael Ellerman 36f8a2c4c6 [POWERPC] Add CONFIG_PPC_PSERIES_DEBUG to enable debugging for platforms/pseries
Add a DEBUG config setting which turns on all (most) of the debugging
under platforms/pseries.

To have this take effect we need to remove all the #undef DEBUG's, in
various files. We leave the #undef DEBUG in platforms/pseries/lpar.c,
as this enables debugging printks from the low-level hash table routines,
and tends to make your system unusable. If you want those enabled you
still have to turn them on by hand.

Also some of the RAS code has a DEBUG block which causes a functional
change, so I've keyed this off a different (non-existant) debug #define.

This is only enabled if you have PPC_EARLY_DEBUG enabled also.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-04-24 21:08:12 +10:00
Michael Ellerman f7ebf352b2 [POWERPC] Convert from DBG() to pr_debug() in platforms/pseries/
In pseries/lpar.c, fix some printf specifier mismatches, and add
a newline to one printk.

In pseries/rtasd.c add "rtasd" to some messages to make it clear
where they're coming from.

In pseries/scanlog.c remove the hand-rolled runtime debugging support
in there. This file has been largely unchanged for eons, if we need to
debug it in future we can recompile.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-04-24 21:08:12 +10:00
Michael Ellerman f01567d6d5 [POWERPC] Use pseries_setup_i8259_cascade() in pseries_mpic_init_IRQ()
pseries_mpic_init_IRQ() implements the same logic as the xics code did to
find the i8259 cascade irq.  Now that we've pulled that logic out into
pseries_setup_i8259_cascade() we can use it in the mpic code.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-04-18 15:36:10 +10:00
Michael Ellerman 30d6ad251b [POWERPC] Turn xics_setup_8259_cascade() into a generic pseries_setup_i8259_cascade()
Remove the xics references from xics_setup_8259_cascade(), and merge the
good bits from the almost identical logic in pseries_mpic_init_IRQ().

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-04-18 15:36:10 +10:00
Michael Ellerman 032ace7e17 [POWERPC] Move xics_setup_8259_cascade() into platforms/pseries/setup.c
The code in xics.c to setup the i8259 cascaded irq handler is not really
xics specific, so move it into setup.c - we will clean this up further in
a subsequent patch.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-04-18 15:36:10 +10:00
Michael Ellerman 21cf91338f [POWERPC] Move prototype for find_udbg_vterm() into a header file
Move the prototype for find_udbg_vterm() into pseries.h, removing
it from setup.c.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-04-17 10:00:59 +10:00
Tony Breeds 96366a8d3f [POWERPC] Update wait_state_cycles in the VPA
The hypervisor can look at the value in the wait_state_cycles field of
the VPA for an estimate of how busy dedicated processors are.
Currently, as the kernel never touches this field, we appear to be
100% busy.  This records the duration the kernel is in powersave and
passes that to the HV to provide a reasonable indication of
utilisation.

Signed-off-by: Tony Breeds <tony@bakeyournoodle.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-03-26 08:44:05 +11:00
Paul Mackerras 8f51506164 Revert "[POWERPC] Fix RTAS os-term usage on kernel panic"
This reverts commit a2b51812a4.

It turns out that this change caused some machines to fail to come
back up when being rebooted, and generated an error in the hypervisor
error log on some machines.  The platform architecture (PAPR) is a
little unclear on exactly when the RTAS ibm,os-term function should be
called.  Until that is clarified I'm reverting this commit.

Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-12-03 09:39:45 +11:00
Linas Vepstas a2b51812a4 [POWERPC] Fix RTAS os-term usage on kernel panic
The rtas_os_term() routine was being called at the wrong time.
The actual rtas call "os-term" will not ever return, and so
calling it from the panic notifier is too early.  Instead,
call it from the machine_reset() call.

This splits the rtas_os_term() routine into two: one part to capture
the kernel panic message, invoked during the panic notifier, and
another part that is invoked during machine_reset().

Prior to this patch, the os-term call was never being made,
because panic_timeout was always non-zero.  Calling os-term
helps keep the hypervisor happy!  We have to keep the hypervisor
happy to avoid service, dump and error reporting problems.

Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-11-20 16:10:09 +11:00
Grant Likely 745e102775 [POWERPC] Platforms shouldn't mess with ROOT_DEV
There is no good reason for board platform code to mess with the
ROOT_DEV.  Remove it from all in-tree platforms except powermac.

Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-10-11 20:40:43 +10:00