Commit Graph

85 Commits

Author SHA1 Message Date
Michael Ellerman f1079d3a3d Merge branch 'fixes' into next
We ended up with an ugly conflict between fixes and next in ftrace.h
involving multiple nested ifdefs, and the automatic resolution is
wrong. So merge fixes into next so we can fix it up.
2018-06-03 20:32:02 +10:00
Michael Ellerman c4ec1f0353 powerpc/64: Remove unused paca->soft_enabled
In commit 4e26bc4a4e ("powerpc/64: Rename soft_enabled to
irq_soft_mask") we renamed paca->soft_enabled. But then in commit
8e0b634b13 ("powerpc/64s: Do not allocate lppaca if we are not
virtualized") we added it back. Oops. This happened because the two
patches were in flight at the same time and rebased vs each other
multiple times, and we missed it in review.

Fixes: 8e0b634b13 ("powerpc/64s: Do not allocate lppaca if we are not virtualized")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-07 21:22:26 +10:00
Naveen N. Rao ea678ac627 powerpc64/ftrace: Add a field in paca to disable ftrace in unsafe code paths
We have some C code that we call into from real mode where we cannot
take any exceptions. Though the C functions themselves are mostly safe,
if these functions are traced, there is a possibility that we may take
an exception. For instance, in certain conditions, the ftrace code uses
WARN(), which uses a 'trap' to do its job.

For such scenarios, introduce a new field in paca 'ftrace_enabled',
which is checked on ftrace entry before continuing. This field can then
be set to zero to disable/pause ftrace, and set to a non-zero value to
resume ftrace.

Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-03 22:32:25 +10:00
Michael Ellerman f437c51748 Merge branch 'topic/paca' into next
Bring in yet another series that touches KVM code, and might need to
be merged into the kvm-ppc branch to resolve conflicts.

This required some changes in pnv_power9_force_smt4_catch/release()
due to the paca array becomming an array of pointers.
2018-03-31 09:09:36 +11:00
Nicholas Piggin 59f577743d powerpc/64: Defer paca allocation until memory topology is discovered
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Rename the dummy allocate_pacas() to fix 32-bit build]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-30 23:34:28 +11:00
Nicholas Piggin d2e60075a3 powerpc/64: Use array of paca pointers and allocate pacas individually
Change the paca array into an array of pointers to pacas. Allocate
pacas individually.

This allows flexibility in where the PACAs are allocated. Future work
will allocate them node-local. Platforms that don't have address limits
on PACAs would be able to defer PACA allocations until later in boot
rather than allocate all possible ones up-front then freeing unused.

This is slightly more overhead (one additional indirection) for cross
CPU paca references, but those aren't too common.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-30 23:34:23 +11:00
Nicholas Piggin 8e0b634b13 powerpc/64s: Do not allocate lppaca if we are not virtualized
The "lppaca" is a structure registered with the hypervisor. This is
unnecessary when running on non-virtualised platforms. One field from
the lppaca (pmcregs_in_use) is also used by the host, so move the host
part out into the paca (lppaca field is still updated in
guest mode).

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Fix non-pseries build with some #ifdefs]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-30 23:34:22 +11:00
Michael Ellerman a26cf1c9fe Merge branch 'topic/ppc-kvm' into next
This brings in two series from Paul, one of which touches KVM code and
may need to be merged into the kvm-ppc tree to resolve conflicts.
2018-03-24 08:43:18 +11:00
Paul Mackerras 7672691a08 powerpc/powernv: Provide a way to force a core into SMT4 mode
POWER9 processors up to and including "Nimbus" v2.2 have hardware
bugs relating to transactional memory and thread reconfiguration.
One of these bugs has a workaround which is to get the core into
SMT4 state temporarily.  This workaround is only needed when
running bare-metal.

This patch provides a function which gets the core into SMT4 mode
by preventing threads from going to a stop state, and waking up
those which are already in a stop state.  Once at least 3 threads
are not in a stop state, the core will be in SMT4 and we can
continue.

To do this, we add a "dont_stop" flag to the paca to tell the
thread not to go into a stop state.  If this flag is set,
power9_idle_stop() just returns immediately with a return value
of 0.  The pnv_power9_force_smt4_catch() function does the following:

1. Set the dont_stop flag for each thread in the core, except
   ourselves (in fact we use an atomic_inc() in case more than
   one thread is calling this function concurrently).
2. See how many threads are awake, indicated by their
   requested_psscr field in the paca being 0.  If this is at
   least 3, skip to step 5.
3. Send a doorbell interrupt to each thread that was seen as
   being in a stop state in step 2.
4. Until at least 3 threads are awake, scan the threads to which
   we sent a doorbell interrupt and check if they are awake now.

This relies on the following properties:

- Once dont_stop is non-zero, requested_psccr can't go from zero to
  non-zero, except transiently (and without the thread doing stop).
- requested_psscr being zero guarantees that the thread isn't in
  a state-losing stop state where thread reconfiguration could occur.
- Doing stop with a PSSCR value of 0 won't be a state-losing stop
  and thus won't allow thread reconfiguration.
- Once threads_per_core/2 + 1 (i.e. 3) threads are awake, the core
  must be in SMT4 mode, since SMT modes are powers of 2.

This does add a sync to power9_idle_stop(), which is necessary to
provide the correct ordering between setting requested_psscr and
checking dont_stop.  The overhead of the sync should be unnoticeable
compared to the latency of going into and out of a stop state.

Because some objected to incurring this extra latency on systems where
the XER[SO] bug is not relevant, I have put the test in
power9_idle_stop inside a feature section.  This means that
pnv_power9_force_smt4_catch() WILL NOT WORK correctly on systems
without the CPU_FTR_P9_TM_XER_SO_BUG feature bit set, and will
probably hang the system.

In order to cater for uses where the caller has an operation that
has to be done while the core is in SMT4, the core continues to be
kept in SMT4 after pnv_power9_force_smt4_catch() function returns,
until the pnv_power9_force_smt4_release() function is called.
It undoes the effect of step 1 above and allows the other threads
to go into a stop state.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-24 00:39:11 +11:00
Christophe Leroy 15472423ce powerpc/mm/slice: Allow up to 64 low slices
While the implementation of the "slices" address space allows
a significant amount of high slices, it limits the number of
low slices to 16 due to the use of a single u64 low_slices_psize
element in struct mm_context_t

On the 8xx, the minimum slice size is the size of the area
covered by a single PMD entry, ie 4M in 4K pages mode and 64M in
16K pages mode. This means we could have at least 64 slices.

In order to override this limitation, this patch switches the
handling of low_slices_psize to char array as done already for
high_slices_psize.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-06 09:21:23 +11:00
Nicholas Piggin bdcb1aefc5 powerpc/64s: Improve RFI L1-D cache flush fallback
The fallback RFI flush is used when firmware does not provide a way
to flush the cache. It's a "displacement flush" that evicts useful
data by displacing it with an uninteresting buffer.

The flush has to take care to work with implementation specific cache
replacment policies, so the recipe has been in flux. The initial
slow but conservative approach is to touch all lines of a congruence
class, with dependencies between each load. It has since been
determined that a linear pattern of loads without dependencies is
sufficient, and is significantly faster.

Measuring the speed of a null syscall with RFI fallback flush enabled
gives the relative improvement:

P8 - 1.83x
P9 - 1.75x

The flush also becomes simpler and more adaptable to different cache
geometries.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-23 16:16:33 +11:00
Michael Ellerman ebf0b6a8b1 Merge branch 'fixes' into next
Merge our fixes branch from the 4.15 cycle.

Unusually the fixes branch saw some significant features merged,
notably the RFI flush patches, so we want the code in next to be
tested against that, to avoid any surprises when the two are merged.

There's also some other work on the panic handling that was reverted
in fixes and we now want to do properly in next, which would conflict.

And we also fix a few other minor merge conflicts.
2018-01-21 23:21:14 +11:00
Madhavan Srinivasan 4e26bc4a4e powerpc/64: Rename soft_enabled to irq_soft_mask
Rename the paca->soft_enabled to paca->irq_soft_mask as it is no
longer used as a flag for interrupt state, but a mask.

Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-19 22:37:01 +11:00
Michael Ellerman aa8a5e0062 powerpc/64s: Add support for RFI flush of L1-D cache
On some CPUs we can prevent the Meltdown vulnerability by flushing the
L1-D cache on exit from kernel to user mode, and from hypervisor to
guest.

This is known to be the case on at least Power7, Power8 and Power9. At
this time we do not know the status of the vulnerability on other CPUs
such as the 970 (Apple G5), pasemi CPUs (AmigaOne X1000) or Freescale
CPUs. As more information comes to light we can enable this, or other
mechanisms on those CPUs.

The vulnerability occurs when the load of an architecturally
inaccessible memory region (eg. userspace load of kernel memory) is
speculatively executed to the point where its result can influence the
address of a subsequent speculatively executed load.

In order for that to happen, the first load must hit in the L1,
because before the load is sent to the L2 the permission check is
performed. Therefore if no kernel addresses hit in the L1 the
vulnerability can not occur. We can ensure that is the case by
flushing the L1 whenever we return to userspace. Similarly for
hypervisor vs guest.

In order to flush the L1-D cache on exit, we add a section of nops at
each (h)rfi location that returns to a lower privileged context, and
patch that with some sequence. Newer firmwares are able to advertise
to us that there is a special nop instruction that flushes the L1-D.
If we do not see that advertised, we fall back to doing a displacement
flush in software.

For guest kernels we support migration between some CPU versions, and
different CPUs may use different flush instructions. So that we are
prepared to migrate to a machine with a different flush instruction
activated, we may have to patch more than one flush instruction at
boot if the hypervisor tells us to.

In the end this patch is mostly the work of Nicholas Piggin and
Michael Ellerman. However a cast of thousands contributed to analysis
of the issue, earlier versions of the patch, back ports testing etc.
Many thanks to all of them.

Tested-by: Jon Masters <jcm@redhat.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-10 21:27:06 +11:00
Nicholas Piggin 4722476bce powerpc/64s: mm_context.addr_limit is only used on hash
Radix keeps no meaningful state in addr_limit, so remove it from radix
code and rename to slb_addr_limit to make it clear it applies to hash
only.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-11-13 23:35:43 +11:00
Michael Ellerman 4e00374704 powerpc/64s: Replace CONFIG_PPC_STD_MMU_64 with CONFIG_PPC_BOOK3S_64
CONFIG_PPC_STD_MMU_64 indicates support for the "standard" powerpc MMU
on 64-bit CPUs. The "standard" MMU refers to the hash page table MMU
found in "server" processors, from IBM mainly.

Currently CONFIG_PPC_STD_MMU_64 is == CONFIG_PPC_BOOK3S_64. While it's
annoying to have two symbols that always have the same value, it's not
quite annoying enough to bother removing one.

However with the arrival of Power9, we now have the situation where
CONFIG_PPC_STD_MMU_64 is enabled, but the kernel is running using the
Radix MMU - *not* the "standard" MMU. So it is now actively confusing
to use it, because it implies that code is disabled or inactive when
the Radix MMU is in use, however that is not necessarily true.

So s/CONFIG_PPC_STD_MMU_64/CONFIG_PPC_BOOK3S_64/, and do some minor
formatting updates of some of the affected lines.

This will be a pain for backports, but c'est la vie.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-11-06 16:48:14 +11:00
Michael Neuling 5080332c2c powerpc/64s: Add workaround for P9 vector CI load issue
POWER9 DD2.1 and earlier has an issue where some cache inhibited
vector load will return bad data. The workaround is two part, one
firmware/microcode part triggers HMI interrupts when hitting such
loads, the other part is this patch which then emulates the
instructions in Linux.

The affected instructions are limited to lxvd2x, lxvw4x, lxvb16x and
lxvh8x.

When an instruction triggers the HMI, all threads in the core will be
sent to the HMI handler, not just the one running the vector load.

In general, these spurious HMIs are detected by the emulation code and
we just return back to the running process. Unfortunately, if a
spurious interrupt occurs on a vector load that's to normal memory we
have no way to detect that it's spurious (unless we walk the page
tables, which is very expensive). In this case we emulate the load but
we need do so using a vector load itself to ensure 128bit atomicity is
preserved.

Some additional debugfs emulated instruction counters are added also.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
[mpe: Switch CONFIG_PPC_BOOK3S_64 to CONFIG_VSX to unbreak the build]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-09-27 08:23:22 +10:00
Gautham R. Shenoy e1c1cfed54 powerpc/powernv: Save/Restore additional SPRs for stop4 cpuidle
The stop4 idle state on POWER9 is a deep idle state which loses
hypervisor resources, but whose latency is low enough that it can be
exposed via cpuidle.

Until now, the deep idle states which lose hypervisor resources (eg:
winkle) were only exposed via CPU-Hotplug.  Hence currently on wakeup
from such states, barring a few SPRs which need to be restored to
their older value, rest of the SPRS are reinitialized to their values
corresponding to that at boot time.

When stop4 is used in the context of cpuidle, we want these additional
SPRs to be restored to their older value, to ensure that the context
on the CPU coming back from idle is same as it was before going idle.

In this patch, we define a SPR save area in PACA (since we have used
up the volatile register space in the stack) and on POWER9, we restore
SPRN_PID, SPRN_LDBAR, SPRN_FSCR, SPRN_HFSCR, SPRN_MMCRA, SPRN_MMCR1,
SPRN_MMCR2 to the values they had before entering stop.

Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-01 21:01:20 +10:00
Nicholas Piggin 8c38851415 powerpc/64s: Add EX_SIZE definition for paca exception save areas
Rather than open-coding it 4 times.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Move __ASSEMBLY__ guards into head-64.h where they're really needed]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-20 22:22:00 +10:00
Gautham R. Shenoy 22c6663dc6 powerpc/powernv/idle: Use Requested Level for restoring state on P9 DD1
On Power9 DD1 due to a hardware bug the Power-Saving Level Status
field (PLS) of the PSSCR for a thread waking up from a deep state can
under-report if some other thread in the core is in a shallow stop
state. The scenario in which this can manifest is as follows:

   1) All the threads of the core are in deep stop.
   2) One of the threads is woken up. The PLS for this thread will
      correctly reflect that it is waking up from deep stop.
   3) The thread that has woken up now executes a shallow stop.
   4) When some other thread in the core is woken, its PLS will reflect
      the shallow stop state.

Thus, the subsequent thread for which the PLS is under-reporting the
wakeup state will not restore the hypervisor resources.

Hence, on DD1 systems, use the Requested Level (RL) field as a
workaround to restore the contents of the hypervisor resources on the
wakeup from the stop state.

Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-05-30 14:59:51 +10:00
Nicholas Piggin b1ee8a3de5 powerpc/64s: Dedicated system reset interrupt stack
The system reset interrupt is used for crash/debug situations, so it is
desirable to have as little impact on the normal state of the system as
possible.

Currently it uses the current kernel stack to process the exception.
This stores into the stack which may be involved with the crash. The
stack pointer may be corrupted, or it may have overflowed.

Avoid or minimise these problems by creating a dedicated NMI stack for
the system reset interrupt to use.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-28 21:02:25 +10:00
Nicholas Piggin c4f3b52ce7 powerpc/64s: Disallow system reset vs system reset reentrancy
In preparation for using a dedicated stack for system reset interrupts,
prevent a nested system reset from recovering, in order to simplify
code that is called in crash/debug path. This allows a system reset
interrupt to just use the base stack pointer.

Keep an in_nmi nesting counter similarly to the in_mce counter. Consider
the interrrupt non-recoverable if it is taken inside another system
reset.

Interrupt nesting could be allowed similarly to MCE, but system reset
is a special case that's not for normal operation, so simplicity wins
until there is requirement for nested system reset interrupts.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-28 21:02:25 +10:00
Nicholas Piggin a3d96f70c1 powerpc/64s: Fix system reset vs general interrupt reentrancy
The system reset interrupt can occur when MSR_EE=0, and it currently
uses the PACA_EXGEN save area.

Some PACA_EXGEN interrupts have a window where MSR_RI=1 and MSR_EE=0
when the save area is still in use. A system reset interrupt in this
window can lead to undetected corruption when the save area gets
overwritten.

This patch introduces PACA_EXNMI save area for system reset exceptions,
which closes this corruption window. It's also helpful to retain the
EXGEN state for debugging situations, even if not considering the
recoverability aspect.

This patch also moves the PACA_EXMC area down to a less frequently used
part of the paca with the new save area.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-28 21:02:25 +10:00
Gautham R. Shenoy 17ed4c8f81 powerpc/powernv: Recover correct PACA on wakeup from a stop on P9 DD1
POWER9 DD1.0 hardware has a bug where the SPRs of a thread waking up
from stop 0,1,2 with ESL=1 can endup being misplaced in the core. Thus
the HSPRG0 of a thread waking up from can contain the paca pointer of
its sibling.

This patch implements a context recovery framework within threads of a
core, by provisioning space in paca_struct for saving every sibling
threads's paca pointers. Basically, we should be able to arrive at the
right paca pointer from any of the thread's existing paca pointer.

At bootup, during powernv idle-init, we save the paca address of every
CPU in each one its siblings paca_struct in the slot corresponding to
this CPU's index in the core.

On wakeup from a stop, the thread will determine its index in the core
from the TIR register and recover its PACA pointer by indexing into
the correct slot in the provisioned space in the current PACA.

Furthermore, ensure that the NVGPRs are restored from the stack on the
way out by setting the NAPSTATELOST in paca.

[Changelog written with inputs from svaidy@linux.vnet.ibm.com]
Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Call it a bug]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-11 08:45:09 +10:00
Aneesh Kumar K.V bb1832217a powerpc/mm/hash: Store addr_limit in PACA
We optmize the slice page size array copy to paca by copying only the
range based on addr_limit. This will require us to not look at page size
array beyond addr_limit in PACA on slb fault. To enable that copy task
size to paca which will be used during slb fault.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
[mpe: Rename from task_size to addr_limit, consolidate #ifdefs]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-01 21:12:27 +11:00
Aneesh Kumar K.V 52b1e66587 powerpc/mm: Move copy_mm_to_paca to paca.c
We also update the function arg to struct mm_struct. Move this so that function
finds the definition of struct mm_struct. No functional change in this patch.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-03-31 23:09:54 +11:00
Frederic Weisbecker f828c3d0ae sched/cputime, powerpc: Migrate stolen_time field to the accounting structure
That in order to gather all cputime accumulation to the same place.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Wanpeng Li <wanpeng.li@hotmail.com>
Link: http://lkml.kernel.org/r/1483636310-6557-7-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-01-14 09:54:12 +01:00
Paolo Bonzini 7c379526d7 powerpc: move hmi.c to arch/powerpc/kvm/
hmi.c functions are unused unless sibling_subcore_state is nonzero, and
that in turn happens only if KVM is in use.  So move the code to
arch/powerpc/kvm/, putting it under CONFIG_KVM_BOOK3S_HV_POSSIBLE
rather than CONFIG_PPC_BOOK3S_64.  The sibling_subcore_state is also
included in struct paca_struct only if KVM is supported by the kernel.

Cc: Daniel Axtens <dja@axtens.net>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: kvm-ppc@vger.kernel.org
Cc: kvm@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2016-08-22 11:09:33 +10:00
Linus Torvalds 221bb8a46e - ARM: GICv3 ITS emulation and various fixes. Removal of the old
VGIC implementation.
 
 - s390: support for trapping software breakpoints, nested virtualization
 (vSIE), the STHYI opcode, initial extensions for CPU model support.
 
 - MIPS: support for MIPS64 hosts (32-bit guests only) and lots of cleanups,
 preliminary to this and the upcoming support for hardware virtualization
 extensions.
 
 - x86: support for execute-only mappings in nested EPT; reduced vmexit
 latency for TSC deadline timer (by about 30%) on Intel hosts; support for
 more than 255 vCPUs.
 
 - PPC: bugfixes.
 
 The ugly bit is the conflicts.  A couple of them are simple conflicts due
 to 4.7 fixes, but most of them are with other trees. There was definitely
 too much reliance on Acked-by here.  Some conflicts are for KVM patches
 where _I_ gave my Acked-by, but the worst are for this pull request's
 patches that touch files outside arch/*/kvm.  KVM submaintainers should
 probably learn to synchronize better with arch maintainers, with the
 latter providing topic branches whenever possible instead of Acked-by.
 This is what we do with arch/x86.  And I should learn to refuse pull
 requests when linux-next sends scary signals, even if that means that
 submaintainers have to rebase their branches.
 
 Anyhow, here's the list:
 
 - arch/x86/kvm/vmx.c: handle_pcommit and EXIT_REASON_PCOMMIT was removed
 by the nvdimm tree.  This tree adds handle_preemption_timer and
 EXIT_REASON_PREEMPTION_TIMER at the same place.  In general all mentions
 of pcommit have to go.
 
 There is also a conflict between a stable fix and this patch, where the
 stable fix removed the vmx_create_pml_buffer function and its call.
 
 - virt/kvm/kvm_main.c: kvm_cpu_notifier was removed by the hotplug tree.
 This tree adds kvm_io_bus_get_dev at the same place.
 
 - virt/kvm/arm/vgic.c: a few final bugfixes went into 4.7 before the
 file was completely removed for 4.8.
 
 - include/linux/irqchip/arm-gic-v3.h: this one is entirely our fault;
 this is a change that should have gone in through the irqchip tree and
 pulled by kvm-arm.  I think I would have rejected this kvm-arm pull
 request.  The KVM version is the right one, except that it lacks
 GITS_BASER_PAGES_SHIFT.
 
 - arch/powerpc: what a mess.  For the idle_book3s.S conflict, the KVM
 tree is the right one; everything else is trivial.  In this case I am
 not quite sure what went wrong.  The commit that is causing the mess
 (fd7bacbca4, "KVM: PPC: Book3S HV: Fix TB corruption in guest exit
 path on HMI interrupt", 2016-05-15) touches both arch/powerpc/kernel/
 and arch/powerpc/kvm/.  It's large, but at 396 insertions/5 deletions
 I guessed that it wasn't really possible to split it and that the 5
 deletions wouldn't conflict.  That wasn't the case.
 
 - arch/s390: also messy.  First is hypfs_diag.c where the KVM tree
 moved some code and the s390 tree patched it.  You have to reapply the
 relevant part of commits 6c22c98637, plus all of e030c1125e, to
 arch/s390/kernel/diag.c.  Or pick the linux-next conflict
 resolution from http://marc.info/?l=kvm&m=146717549531603&w=2.
 Second, there is a conflict in gmap.c between a stable fix and 4.8.
 The KVM version here is the correct one.
 
 I have pushed my resolution at refs/heads/merge-20160802 (commit
 3d1f53419842) at git://git.kernel.org/pub/scm/virt/kvm/kvm.git.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJXoGm7AAoJEL/70l94x66DugQIAIj703ePAFepB/fCrKHkZZia
 SGrsBdvAtNsOhr7FQ5qvvjLxiv/cv7CymeuJivX8H+4kuUHUllDzey+RPHYHD9X7
 U6n1PdCH9F15a3IXc8tDjlDdOMNIKJixYuq1UyNZMU6NFwl00+TZf9JF8A2US65b
 x/41W98ilL6nNBAsoDVmCLtPNWAqQ3lajaZELGfcqRQ9ZGKcAYOaLFXHv2YHf2XC
 qIDMf+slBGSQ66UoATnYV2gAopNlWbZ7n0vO6tE2KyvhHZ1m399aBX1+k8la/0JI
 69r+Tz7ZHUSFtmlmyByi5IAB87myy2WQHyAPwj+4vwJkDGPcl0TrupzbG7+T05Y=
 =42ti
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM updates from Paolo Bonzini:

 - ARM: GICv3 ITS emulation and various fixes.  Removal of the
   old VGIC implementation.

 - s390: support for trapping software breakpoints, nested
   virtualization (vSIE), the STHYI opcode, initial extensions
   for CPU model support.

 - MIPS: support for MIPS64 hosts (32-bit guests only) and lots
   of cleanups, preliminary to this and the upcoming support for
   hardware virtualization extensions.

 - x86: support for execute-only mappings in nested EPT; reduced
   vmexit latency for TSC deadline timer (by about 30%) on Intel
   hosts; support for more than 255 vCPUs.

 - PPC: bugfixes.

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (302 commits)
  KVM: PPC: Introduce KVM_CAP_PPC_HTM
  MIPS: Select HAVE_KVM for MIPS64_R{2,6}
  MIPS: KVM: Reset CP0_PageMask during host TLB flush
  MIPS: KVM: Fix ptr->int cast via KVM_GUEST_KSEGX()
  MIPS: KVM: Sign extend MFC0/RDHWR results
  MIPS: KVM: Fix 64-bit big endian dynamic translation
  MIPS: KVM: Fail if ebase doesn't fit in CP0_EBase
  MIPS: KVM: Use 64-bit CP0_EBase when appropriate
  MIPS: KVM: Set CP0_Status.KX on MIPS64
  MIPS: KVM: Make entry code MIPS64 friendly
  MIPS: KVM: Use kmap instead of CKSEG0ADDR()
  MIPS: KVM: Use virt_to_phys() to get commpage PFN
  MIPS: Fix definition of KSEGX() for 64-bit
  KVM: VMX: Add VMCS to CPU's loaded VMCSs before VMPTRLD
  kvm: x86: nVMX: maintain internal copy of current VMCS
  KVM: PPC: Book3S HV: Save/restore TM state in H_CEDE
  KVM: PPC: Book3S HV: Pull out TM state save/restore into separate procedures
  KVM: arm64: vgic-its: Simplify MAPI error handling
  KVM: arm64: vgic-its: Make vgic_its_cmd_handle_mapi similar to other handlers
  KVM: arm64: vgic-its: Turn device_id validation into generic ID validation
  ...
2016-08-02 16:11:27 -04:00
Christophe Leroy c223c90386 powerpc32: provide VIRT_CPU_ACCOUNTING
This patch provides VIRT_CPU_ACCOUTING to PPC32 architecture.
PPC32 doesn't have the PACA structure, so we use the task_info
structure to store the accounting data.

In order to reuse on PPC32 the PPC64 functions, all u64 data has
been replaced by 'unsigned long' so that it is u32 on PPC32 and
u64 on PPC64

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <oss@buserror.net>
2016-07-09 01:43:50 -05:00
Mahesh Salgaonkar fd7bacbca4 KVM: PPC: Book3S HV: Fix TB corruption in guest exit path on HMI interrupt
When a guest is assigned to a core it converts the host Timebase (TB)
into guest TB by adding guest timebase offset before entering into
guest. During guest exit it restores the guest TB to host TB. This means
under certain conditions (Guest migration) host TB and guest TB can differ.

When we get an HMI for TB related issues the opal HMI handler would
try fixing errors and restore the correct host TB value. With no guest
running, we don't have any issues. But with guest running on the core
we run into TB corruption issues.

If we get an HMI while in the guest, the current HMI handler invokes opal
hmi handler before forcing guest to exit. The guest exit path subtracts
the guest TB offset from the current TB value which may have already
been restored with host value by opal hmi handler. This leads to incorrect
host and guest TB values.

With split-core, things become more complex. With split-core, TB also gets
split and each subcore gets its own TB register. When a hmi handler fixes
a TB error and restores the TB value, it affects all the TB values of
sibling subcores on the same core. On TB errors all the thread in the core
gets HMI. With existing code, the individual threads call opal hmi handle
independently which can easily throw TB out of sync if we have guest
running on subcores. Hence we will need to co-ordinate with all the
threads before making opal hmi handler call followed by TB resync.

This patch introduces a sibling subcore state structure (shared by all
threads in the core) in paca which holds information about whether sibling
subcores are in Guest mode or host mode. An array in_guest[] of size
MAX_SUBCORE_PER_CORE=4 is used to maintain the state of each subcore.
The subcore id is used as index into in_guest[] array. Only primary
thread entering/exiting the guest is responsible to set/unset its
designated array element.

On TB error, we get HMI interrupt on every thread on the core. Upon HMI,
this patch will now force guest to vacate the core/subcore. Primary
thread from each subcore will then turn off its respective bit
from the above bitmap during the guest exit path just after the
guest->host partition switch is complete.

All other threads that have just exited the guest OR were already in host
will wait until all other subcores clears their respective bit.
Once all the subcores turn off their respective bit, all threads will
will make call to opal hmi handler.

It is not necessary that opal hmi handler would resync the TB value for
every HMI interrupts. It would do so only for the HMI caused due to
TB errors. For rest, it would not touch TB value. Hence to make things
simpler, primary thread would call TB resync explicitly once for each
core immediately after opal hmi handler instead of subtracting guest
offset from TB. TB resync call will restore the TB with host value.
Thus we can be sure about the TB state.

One of the primary threads exiting the guest will take up the
responsibility of calling TB resync. It will use one of the top bits
(bit 63) from subcore state flags bitmap to make the decision. The first
primary thread (among the subcores) that is able to set the bit will
have to call the TB resync. Rest all other threads will wait until TB
resync is complete.  Once TB resync is complete all threads will then
proceed.

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2016-06-20 14:11:25 +10:00
Michael Ellerman c33e54fafa powerpc: Fix build break due to paca mm_context_t changes
Commit 2fc251a8dd ("powerpc: Copy only required pieces of the
mm_context_t to the paca") broke the build for CONFIG_PPC_STD_MMU_64=y
and CONFIG_PPC_MM_SLICES=n.

That only happens for a kernel built with 4K pages and HUGETLB disabled,
which is why we missed it.

Fix it by adding a mm_ctx_user_psize member to the paca and populating
it in the appropriate places.

Fixes: 2fc251a8dd ("powerpc: Copy only required pieces of the mm_context_t to the paca")
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-01-09 08:28:44 +11:00
Michael Neuling 2fc251a8dd powerpc: Copy only required pieces of the mm_context_t to the paca
Currently we copy the whole mm_context_t to the paca but only access a
few bits of it.  This is wasteful of space paca and also takes quite
some time in the hot path of context switching.

This patch pulls in only the required bits from the mm_context_t to
the paca and on context switch, copies only those.

Benchmarking this (On top of Anton's recent MSR context switching
changes [1]) using processes and yield shows an improvement of almost
3% on POWER8:

  http://ozlabs.org/~anton/junkcode/context_switch2.c
  ./context_switch2 --test=yield --process 0 0

1. https://lists.ozlabs.org/pipermail/linuxppc-dev/2015-October/135700.html

Signed-off-by: Michael Neuling <mikey@neuling.org>
[mpe: Rename paca fields to be mm_ctx_foo rather than context_foo]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-12-27 19:12:14 +11:00
Michael Neuling c395465da6 powerpc: Add function to copy mm_context_t to the paca
This adds a function to copy the mm->context to the paca.  This is
only a basic conversion for now but will be used more extensively in
the next patch.

This also adds #ifdef CONFIG_PPC_BOOK3S around this code since it's
not used elsewhere.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-12-19 22:13:12 +11:00
Kevin Hao 016f8cf0d8 powerpc: book3e_64: fix the align size for paca_struct
All the cache line size of the current book3e 64bit SoCs are 64 bytes.
So we should use this size to align the member of paca_struct.
This only change the paca_struct's members which are private to book3e
CPUs, and should not have any effect to book3s ones. With this, we save
192 bytes. Also change it to __aligned(size) since it is preferred over
__attribute__((aligned(size))).

Before:
	/* size: 1920, cachelines: 30, members: 46 */
	/* sum members: 1667, holes: 6, sum holes: 141 */
	/* padding: 112 */

After:
	/* size: 1728, cachelines: 27, members: 46 */
	/* sum members: 1667, holes: 4, sum holes: 13 */
	/* padding: 48 */

Signed-off-by: Kevin Hao <haokexin@gmail.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
2015-03-31 22:23:17 -05:00
Shreyas B. Prabhu 77b54e9f21 powernv/powerpc: Add winkle support for offline cpus
Winkle is a deep idle state supported in power8 chips. A core enters
winkle when all the threads of the core enter winkle. In this state
power supply to the entire chiplet i.e core, private L2 and private L3
is turned off. As a result it gives higher powersavings compared to
sleep.

But entering winkle results in a total hypervisor state loss. Hence the
hypervisor context has to be preserved before entering winkle and
restored upon wake up.

Power-on Reset Engine (PORE) is a dedicated engine which is responsible
for powering on the chiplet during wake up. It can be programmed to
restore the register contests of a few specific registers. This patch
uses PORE to restore register state wherever possible and uses stack to
save and restore rest of the necessary registers.

With hypervisor state restore things fall under three categories-
per-core state, per-subcore state and per-thread state. To manage this,
extend the infrastructure introduced for sleep. Mainly we add a paca
variable subcore_sibling_mask. Using this and the core_idle_state we can
distingush first thread in core and subcore.

Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-12-15 10:46:41 +11:00
Shreyas B. Prabhu 7cba160ad7 powernv/cpuidle: Redesign idle states management
Deep idle states like sleep and winkle are per core idle states. A core
enters these states only when all the threads enter either the
particular idle state or a deeper one. There are tasks like fastsleep
hardware bug workaround and hypervisor core state save which have to be
done only by the last thread of the core entering deep idle state and
similarly tasks like timebase resync, hypervisor core register restore
that have to be done only by the first thread waking up from these
state.

The current idle state management does not have a way to distinguish the
first/last thread of the core waking/entering idle states. Tasks like
timebase resync are done for all the threads. This is not only is
suboptimal, but can cause functionality issues when subcores and kvm is
involved.

This patch adds the necessary infrastructure to track idle states of
threads in a per-core structure. It uses this info to perform tasks like
fastsleep workaround and timebase resync only once per core.

Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
Originally-by: Preeti U. Murthy <preeti@linux.vnet.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: linux-pm@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-12-15 10:46:40 +11:00
Mahesh Salgaonkar 6d626c5ea3 powerpc/powernv: Cleanup unused MCE definitions/declarations.
Cleanup OpalMCE_* definitions/declarations and other related code which
is not used anymore.

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Acked-by: Benjamin Herrrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-12-02 11:03:45 +11:00
Mahesh Salgaonkar 0ef95b411e powerpc/powernv: Invoke opal call to handle hmi.
When we hit the HMI in Linux, invoke opal call to handle/recover from HMI
errors in real mode and then in virtual mode during check_irq_replay()
invoke opal_poll_events()/opal_do_notifier() to retrieve HMI event from
OPAL and act accordingly.

Now that we are ready to handle HMI interrupt directly in linux, remove
the HMI interrupt registration with firmware.

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-08-05 16:33:52 +10:00
Michael Ellerman 376af5947c powerpc: Remove STAB code
Old cpus didn't have a Segment Lookaside Buffer (SLB), instead they had
a Segment Table (STAB). Now that we've dropped support for those cpus,
we can remove the STAB support entirely.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-07-28 14:10:22 +10:00
Sam bobroff 1739ea9e13 powerpc: Fix regression of per-CPU DSCR setting
Since commit "efcac65 powerpc: Per process DSCR + some fixes (try#4)"
it is no longer possible to set the DSCR on a per-CPU basis.

The old behaviour was to minipulate the DSCR SPR directly but this is no
longer sufficient: the value is quickly overwritten by context switching.

This patch stores the per-CPU DSCR value in a kernel variable rather than
directly in the SPR and it is used whenever a process has not set the DSCR
itself. The sysfs interface (/sys/devices/system/cpu/cpuN/dscr) is unchanged.

Writes to the old global default (/sys/devices/system/cpu/dscr_default)
now set all of the per-CPU values and reads return the last written value.

The new per-CPU default is added to the paca_struct and is used everywhere
outside of sysfs.c instead of the old global default.

Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-05-28 13:35:40 +10:00
Scott Wood 609af38f8f powerpc/booke64: Critical and machine check exception support
Add special state saving for critical and machine check exceptions.

Most of this code could be used to handle debug exceptions taken from
kernel space, but actually doing so is outside the scope of this patch.

The various critical and machine check exceptions now point to their
real handlers, rather than hanging the kernel.

Signed-off-by: Scott Wood <scottwood@freescale.com>
2014-03-19 19:57:27 -05:00
Scott Wood 9d378dfac8 powerpc/booke64: Use SPRG7 for VDSO
Previously SPRG3 was marked for use by both VDSO and critical
interrupts (though critical interrupts were not fully implemented).

In commit 8b64a9dfb0 ("powerpc/booke64:
Use SPRG0/3 scratch for bolted TLB miss & crit int"), Mihai Caraman
made an attempt to resolve this conflict by restoring the VDSO value
early in the critical interrupt, but this has some issues:

 - It's incompatible with EXCEPTION_COMMON which restores r13 from the
   by-then-overwritten scratch (this cost me some debugging time).
 - It forces critical exceptions to be a special case handled
   differently from even machine check and debug level exceptions.
 - It didn't occur to me that it was possible to make this work at all
   (by doing a final "ld r13, PACA_EXCRIT+EX_R13(r13)") until after
   I made (most of) this patch. :-)

It might be worth investigating using a load rather than SPRG on return
from all exceptions (except TLB misses where the scratch never leaves
the SPRG) -- it could save a few cycles.  Until then, let's stick with
SPRG for all exceptions.

Since we cannot use SPRG4-7 for scratch without corrupting the state of
a KVM guest, move VDSO to SPRG7 on book3e.  Since neither SPRG4-7 nor
critical interrupts exist on book3s, SPRG3 is still used for VDSO
there.

Signed-off-by: Scott Wood <scottwood@freescale.com>
Cc: Mihai Caraman <mihai.caraman@freescale.com>
Cc: Anton Blanchard <anton@samba.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: kvm-ppc@vger.kernel.org
2014-03-19 19:57:14 -05:00
Benjamin Herrenschmidt fac515db45 Merge remote-tracking branch 'scott/next' into next
Freescale updates from Scott:

<<
Highlights include 32-bit booke relocatable support, e6500 hardware
tablewalk support, various e500 SPE fixes, some new/revived boards, and
e6500 deeper idle and altivec powerdown modes.
>>
2014-01-15 14:22:35 +11:00
Paul Gortmaker c141611fb1 powerpc: Delete non-required instances of include <linux/init.h>
None of these files are actually using any __init type directives
and hence don't need to include <linux/init.h>.  Most are just a
left over from __devinit and __cpuinit removal, or simply due to
code getting copied from one driver to the next.

The one instance where we add an include for init.h covers off
a case where that file was implicitly getting it from another
header which itself didn't need it.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-01-15 13:46:44 +11:00
Scott Wood 28efc35fe6 powerpc/e6500: TLB miss handler with hardware tablewalk support
There are a few things that make the existing hw tablewalk handlers
unsuitable for e6500:

 - Indirect entries go in TLB1 (though the resulting direct entries go in
   TLB0).

 - It has threads, but no "tlbsrx." -- so we need a spinlock and
   a normal "tlbsx".  Because we need this lock, hardware tablewalk
   is mandatory on e6500 unless we want to add spinlock+tlbsx to
   the normal bolted TLB miss handler.

 - TLB1 has no HES (nor next-victim hint) so we need software round robin
   (TODO: integrate this round robin data with hugetlb/KVM)

 - The existing tablewalk handlers map half of a page table at a time,
   because IBM hardware has a fixed 1MiB indirect page size.  e6500
   has variable size indirect entries, with a minimum of 2MiB.
   So we can't do the half-page indirect mapping, and even if we
   could it would be less efficient than mapping the full page.

 - Like on e5500, the linear mapping is bolted, so we don't need the
   overhead of supporting nested tlb misses.

Note that hardware tablewalk does not work in rev1 of e6500.
We do not expect to support e6500 rev1 in mainline Linux.

Signed-off-by: Scott Wood <scottwood@freescale.com>
Cc: Mihai Caraman <mihai.caraman@freescale.com>
2014-01-09 17:52:19 -06:00
Mahesh Salgaonkar 729b0f7153 powerpc/book3s: Introduce exclusive emergency stack for machine check exception.
This patch introduces exclusive emergency stack for machine check exception.
We use emergency stack to handle machine check exception so that we can save
MCE information (srr1, srr0, dar and dsisr) before turning on ME bit and be
ready for re-entrancy. This helps us to prevent clobbering of MCE information
in case of nested machine checks.

The reason for using emergency stack over normal kernel stack is that the
machine check might occur in the middle of setting up a stack frame which may
result into improper use of kernel stack.

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-12-05 16:02:05 +11:00
Aneesh Kumar K.V 7aa79938f7 kvm: powerpc: book3s: pr: Rename KVM_BOOK3S_PR to KVM_BOOK3S_PR_POSSIBLE
With later patches supporting PR kvm as a kernel module, the changes
that has to be built into the main kernel binary to enable PR KVM module
is now selected via KVM_BOOK3S_PR_POSSIBLE

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-10-17 15:17:49 +02:00
Anton Blanchard 54bb7f4bda powerpc: Make rwlocks endian safe
Our ppc64 spinlocks and rwlocks use a trick where a lock token and
the paca index are placed in the lock with a single store. Since we
are using two u16s they need adjusting for little endian.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-08-14 15:33:40 +10:00
Michael Neuling bc2e6c6ac2 powerpc: Avoid link stack corruption for MMU on exceptions
When we have MMU on exceptions (POWER8) and a relocatable kernel, we
need to branch from the initial exception vectors at 0x0 to up high
where the kernel might be located.  Currently we do this using the link
register.

Unfortunately this corrupts the link stack and instead we should use the
count register.  We did this for the syscall entry path in:
  6a40480 powerpc: Avoid link stack corruption in MMU on syscall entry path
but I stupidly forgot to do the same for other exceptions.

This patch changes the initial exception vectors to use the count
register instead of the link register when we need to branch up to the
relocated kernel.

I have a dodgy userspace test which loops calling a function that reads
the PVR (mfpvr in userspace will be emulated by the kernel via the
program check exception).  On POWER8 and with CONFIG_RELOCATABLE=y, I
get a ~10% performance improvement with my userspace test with this
patch.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-08-14 15:33:18 +10:00