Linux requires a number of atomic operations to provide full barrier
semantics, that is no memory accesses after the operation can be
observed before any accesses up to and including the operation in
program order.
On arm64, these operations have been incorrectly implemented as follows:
// A, B, C are independent memory locations
<Access [A]>
// atomic_op (B)
1: ldaxr x0, [B] // Exclusive load with acquire
<op(B)>
stlxr w1, x0, [B] // Exclusive store with release
cbnz w1, 1b
<Access [C]>
The assumption here being that two half barriers are equivalent to a
full barrier, so the only permitted ordering would be A -> B -> C
(where B is the atomic operation involving both a load and a store).
Unfortunately, this is not the case by the letter of the architecture
and, in fact, the accesses to A and C are permitted to pass their
nearest half barrier resulting in orderings such as Bl -> A -> C -> Bs
or Bl -> C -> A -> Bs (where Bl is the load-acquire on B and Bs is the
store-release on B). This is a clear violation of the full barrier
requirement.
The simple way to fix this is to implement the same algorithm as ARMv7
using explicit barriers:
<Access [A]>
// atomic_op (B)
dmb ish // Full barrier
1: ldxr x0, [B] // Exclusive load
<op(B)>
stxr w1, x0, [B] // Exclusive store
cbnz w1, 1b
dmb ish // Full barrier
<Access [C]>
but this has the undesirable effect of introducing *two* full barrier
instructions. A better approach is actually the following, non-intuitive
sequence:
<Access [A]>
// atomic_op (B)
1: ldxr x0, [B] // Exclusive load
<op(B)>
stlxr w1, x0, [B] // Exclusive store with release
cbnz w1, 1b
dmb ish // Full barrier
<Access [C]>
The simple observations here are:
- The dmb ensures that no subsequent accesses (e.g. the access to C)
can enter or pass the atomic sequence.
- The dmb also ensures that no prior accesses (e.g. the access to A)
can pass the atomic sequence.
- Therefore, no prior access can pass a subsequent access, or
vice-versa (i.e. A is strictly ordered before C).
- The stlxr ensures that no prior access can pass the store component
of the atomic operation.
The only tricky part remaining is the ordering between the ldxr and the
access to A, since the absence of the first dmb means that we're now
permitting re-ordering between the ldxr and any prior accesses.
From an (arbitrary) observer's point of view, there are two scenarios:
1. We have observed the ldxr. This means that if we perform a store to
[B], the ldxr will still return older data. If we can observe the
ldxr, then we can potentially observe the permitted re-ordering
with the access to A, which is clearly an issue when compared to
the dmb variant of the code. Thankfully, the exclusive monitor will
save us here since it will be cleared as a result of the store and
the ldxr will retry. Notice that any use of a later memory
observation to imply observation of the ldxr will also imply
observation of the access to A, since the stlxr/dmb ensure strict
ordering.
2. We have not observed the ldxr. This means we can perform a store
and influence the later ldxr. However, that doesn't actually tell
us anything about the access to [A], so we've not lost anything
here either when compared to the dmb variant.
This patch implements this solution for our barriered atomic operations,
ensuring that we satisfy the full barrier requirements where they are
needed.
Cc: <stable@vger.kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Update wall-to-monotonic fields in the VDSO data page
unconditionally. These are used to service CLOCK_MONOTONIC_COARSE,
which is not guarded by use_syscall.
Signed-off-by: Nathan Lynch <nathan_lynch@mentor.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
When __kernel_clock_gettime is called with a CLOCK_MONOTONIC_COARSE or
CLOCK_REALTIME_COARSE clock id, it returns incorrectly to whatever the
caller has placed in x2 ("ret x2" to return from the fast path). Fix
this by saving x30/LR to x2 only in code that will call
__do_get_tspec, restoring x30 afterward, and using a plain "ret" to
return from the routine.
Also: while the resulting tv_nsec value for CLOCK_REALTIME and
CLOCK_MONOTONIC must be computed using intermediate values that are
left-shifted by cs_shift (x12, set by __do_get_tspec), the results for
coarse clocks should be calculated using unshifted values
(xtime_coarse_nsec is in units of actual nanoseconds). The current
code shifts intermediate values by x12 unconditionally, but x12 is
uninitialized when servicing a coarse clock. Fix this by setting x12
to 0 once we know we are dealing with a coarse clock id.
Signed-off-by: Nathan Lynch <nathan_lynch@mentor.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Whilst the text segment for our VDSO is marked as PT_LOAD in the ELF
headers, it is mapped by the kernel and not actually subject to
demand-paging. ld doesn't realise this, and emits a p_align field of 64k
(the maximum supported page size), which conflicts with the load address
picked by the kernel on 4k systems, which will be 4k aligned. This
causes GDB to fail with "Failed to read a valid object file image from
memory" when attempting to load the VDSO.
This patch passes the -n option to ld, which prevents it from aligning
PT_LOAD segments to the maximum page size.
Cc: <stable@vger.kernel.org>
Reported-by: Kyle McMartin <kyle@redhat.com>
Acked-by: Kyle McMartin <kyle@redhat.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
- Introduction of PTE_WRITE to distinguish between writable but clean
and truly read-only pages
- FIQs enabling/disabling clean-up (they aren't used on arm64)
- CPU resume fix for the per-cpu offset restoring
- Code comment typos
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)
iQIcBAABAgAGBQJS6+SLAAoJEGvWsS0AyF7xk60P/RBEAXqBdlNkTPQ9eIBs8P9b
txv7T6k6MnQt6vK+3Kepgnoadeaw4WRCqKPduA90v7tnrw8fVLmuAM29o0uITkDH
4Cz69/8A9B2Y+ouiODS6MrEvlz99BfbSZdUZd89eNSO7tTzv1YckpepNQCL3Wf+S
1nXa4dZrciCR2vMmkF01jvc3yTjCJ0W7IduUhFQevpMfEzq2bLhvS5eGQd618pBc
Ih44IYO6RfBeYB3CxWiSJeHgSXzlSijUeoPQbiDEhFsKXO7SaZN5oQ5Ra8v+YAMr
fhcblX9rxH2Z1AfYizPssQ0VEnm3cN4nKurr7CA5R+LeHuR38lD9WWABhkYgQMRB
V43/3wvozRyWhLBkS9V0NjObfpPhDuLR6pdbWRQnCO97mH/0HVd2ql49gvsOcdgb
qa2XTF+iOJOQ3325SGdruTkbzkccFoKniiiOsndhx48VGLhHD9U0sCYLfb5X6NJC
ZFI3qQx+3+q32jkDoShqQZ/4fnb8TqzH/KqNtZdZPJ7dlkw/2B8wcnAIoF+BfgbH
2TRfbndSrNGLJSc4bgFQ3e7SCHSdm/PUR/l4c+xN9Ia4tF9614ndfPAaSKaMaKca
O6IwX8+IBo94JDhtPsypplYllBlBeXzMjZFO6b60sBF9UaxgdS1ZQ3NP9pumGN80
2czoY2kwV68DScgxBFLx
=tWas
-----END PGP SIGNATURE-----
Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pyll ARM64 patches from Catalin Marinas:
- Build fix with DMA_CMA enabled
- Introduction of PTE_WRITE to distinguish between writable but clean
and truly read-only pages
- FIQs enabling/disabling clean-up (they aren't used on arm64)
- CPU resume fix for the per-cpu offset restoring
- Code comment typos
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: mm: Introduce PTE_WRITE
arm64: mm: Remove PTE_BIT_FUNC macro
arm64: FIQs are unused
arm64: mm: fix the function name in comment of cpu_do_switch_mm
arm64: fix build error if DMA_CMA is enabled
arm64: kernel: fix per-cpu offset restore on resume
arm64: mm: fix the function name in comment of __flush_dcache_area
arm64: mm: use ubfm for dcache_line_size
So any FIQ handling is superfluous at the moment. The functions to
disable/enable FIQs is kept around if ever someone needs them in the
future, but existing calling sites including arch_cpu_idle_prepare()
may go for now.
Signed-off-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The introduction of percpu offset optimisation through tpidr_el1 in:
Commit id :7158627686f02319c50c8d9d78f75d4c8
"arm64: percpu: implement optimised pcpu access using tpidr_el1"
requires cpu_{suspend/resume} to restore the tpidr_el1 register upon resume
so that percpu variables can be addressed correctly when a CPU comes out
of reset from warm-boot.
This patch fixes cpu_{suspend}/{resume} tpidr_el1 restoration on resume, by
calling the set_my_cpu_offset C API, as it is done on primary and secondary
CPUs on cold boot, so that, even if the register used to store the percpu
offset is changed, the save and restore of general purpose registers does not
have to be updated.
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Interface)
- Jump label support
- CMA can now be enabled on arm64
- HWCAP bits for crypto and CRC32 extensions
- Optimised percpu using tpidr_el1 register
- Code cleanup
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)
iQIcBAABAgAGBQJS3WLBAAoJEGvWsS0AyF7xxx8P+wUYzu04rKtz4BOj5IHl8TxB
xsRw8ce9MdxIVgdtCjDzmdpkd0s8ZPTEweJnVGYYlB9O9Pmz0VSX4Z1y6W6k0P1f
GCKDMa+hn2uQYnw3bS022Zji6OjUfad9XUfe3f61YdA7GrSdjTVMapXuloASRcfl
0XkfpXwbfLPGpuNp4q/QaA9K/y93T/gc6O/ctJh3OUJDOWJXZGsUTRIKXTF9GrWn
/gPEK9MiatAPpcS7iO283a3vllDalNoEGpt+a4cYCc8il2kCWUpX6W2c9m3Ua26k
mAvkoUErfb3cW/PzqDZzr8M3XbnXb8Je99HBbcjQluL6zyw+0hdUHJpCFOamsz5m
pEpT1e0Hvxb6yNbjqyituiYFPwOUZHP/HeZpH4l1njhN7sIyZP5cUEV4f53VN4lB
KL3HSGzUTaNzT5UpD35CA4vXRwRKrV8YsAVhB0p53KgkUreKA6wbJHSXHorjBZaE
uuP7kqOMGQ494+f6h+yvZqwIcObQGaHYNQJLY3Yhzg3WAs59s/bX/s4yWhgkte0U
yfxKpxTSiLjv5LmZrVQer04DIf9duNkEpI/DAKUbXagHJ7RCHjOneg9F5ZvJ598o
umCo+ok9hV+vLUhagh4t5guSk2ehW7qoZOG44XkYcCLXTIlMV1AQA6oJr804DUm2
71UbGFi01OY0Jtp8prO3
=TPaC
-----END PGP SIGNATURE-----
Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull ARM64 updates from Catalin Marinas:
- CPU suspend support on top of PSCI (firmware Power State Coordination
Interface)
- jump label support
- CMA can now be enabled on arm64
- HWCAP bits for crypto and CRC32 extensions
- optimised percpu using tpidr_el1 register
- code cleanup
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (42 commits)
arm64: fix typo in entry.S
arm64: kernel: restore HW breakpoint registers in cpu_suspend
jump_label: use defined macros instead of hard-coding for better readability
arm64, jump label: optimize jump label implementation
arm64, jump label: detect %c support for ARM64
arm64: introduce aarch64_insn_gen_{nop|branch_imm}() helper functions
arm64: move encode_insn_immediate() from module.c to insn.c
arm64: introduce interfaces to hotpatch kernel and module code
arm64: introduce basic aarch64 instruction decoding helpers
arm64: dts: Reduce size of virtio block device for foundation model
arm64: Remove unused __data_loc variable
arm64: Enable CMA
arm64: Warn on NULL device structure for dma APIs
arm64: Add hwcaps for crypto and CRC32 extensions.
arm64: drop redundant macros from read_cpuid()
arm64: Remove outdated comment
arm64: cmpxchg: update macros to prevent warnings
arm64: support single-step and breakpoint handler hooks
ARM64: fix framepointer check in unwind_frame
ARM64: check stack pointer in get_wchan
...
Commit 64681787 (arm64: let the core code deal with preempt_count)
changed the code, but left the comments unchanged, fix it.
Signed-off-by: Neil Zhang <zhangwm@marvell.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
When a CPU resumes from low-power, it restores HW breakpoint and
watchpoint slots through a CPU PM notifier. Since we want to enable
debugging as early as possible in the resume path, the mdscr content
is restored along the general purpose registers in the cpu_suspend API
and debug exceptions are reenabled when cpu_suspend returns. Since the
CPU PM notifier is run after a CPU has been resumed, we cannot expect
HW breakpoint registers to contain sane values till the notifier is run,
since the HW breakpoints registers content is unknown at reset; this means
that the CPU might run with debug exceptions enabled, mdscr restored but HW
breakpoint registers containing junk values that can trigger spurious
debug exceptions.
This patch fixes current HW breakpoints restore by moving the HW breakpoints
registers restoration to the cpu_suspend API, before the debug exceptions are
enabled. This way, as soon as the cpu_suspend function returns the
kernel can resume debugging with sane values in HW breakpoint registers.
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Introduce aarch64_insn_gen_{nop|branch_imm}() helper functions, which
will be used to implement jump label on ARM64.
Reviewed-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Jiang Liu <liuj97@gmail.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Function encode_insn_immediate() will be used by other instruction
manipulate related functions, so move it into insn.c and rename it
as aarch64_insn_encode_immediate().
Reviewed-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Jiang Liu <liuj97@gmail.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Introduce three interfaces to patch kernel and module code:
aarch64_insn_patch_text_nosync():
patch code without synchronization, it's caller's responsibility
to synchronize all CPUs if needed.
aarch64_insn_patch_text_sync():
patch code and always synchronize with stop_machine()
aarch64_insn_patch_text():
patch code and synchronize with stop_machine() if needed
Reviewed-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Jiang Liu <liuj97@gmail.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The __data_loc variable is an unused left over from the 32 bit arm implementation.
Remove that variable and adjust the __mmap_switched startup routine accordingly.
Signed-off-by: Geoff Levand <geoff@infradead.org> for Huawei, Linaro
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Advertise the optional cryptographic and CRC32 instructions to
user space where present. Several hwcap bits [3-7] are allocated.
Signed-off-by: Steve Capper <steve.capper@linaro.org>
[bit 2 is taken now so use bits 3-7 instead]
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Code referenced in the comment has moved to arch/arm64/kernel/cputable.c
Signed-off-by: Liviu Dudau <Liviu.Dudau@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
AArch64 Single Steping and Breakpoint debug exceptions will be
used by multiple debug framworks like kprobes & kgdb.
This patch implements the hooks for those frameworks to register
their own handlers for handling breakpoint and single step events.
Reworked the debug exception handler in entry.S: do_dbg to route
software breakpoint (BRK64) exception to do_debug_exception()
Signed-off-by: Sandeepa Prabhu <sandeepa.prabhu@linaro.org>
Signed-off-by: Deepak Saxena <dsaxena@linaro.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
We need at least 24 bytes above frame pointer.
Signed-off-by: Konstantin Khlebnikov <k.khlebnikov@samsung.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
get_wchan() is lockless. Task may wakeup at any time and change its own stack,
thus each next stack frame may be overwritten and filled with random stuff.
Signed-off-by: Konstantin Khlebnikov <k.khlebnikov@samsung.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
This patch implements the word-at-a-time interface for arm64 using the
same algorithm as ARM. We use the fls64 macro, which expands to a clz
instruction via a compiler builtin. Big-endian configurations make use
of the implementation from asm-generic.
With this implemented, we can replace our byte-at-a-time strnlen_user
and strncpy_from_user functions with the optimised generic versions.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
This patch implements optimised percpu variable accesses using the
el1 r/w thread register (tpidr_el1) along the same lines as arch/arm/.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Add support for irq registration when pmu interrupt is percpu.
Signed-off-by: Vinayak Kale <vkale@apm.com>
Signed-off-by: Tuan Phan <tphan@apm.com>
[will: tidied up cross-calling to pass &irq]
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
We currently try to emit .comment twice, once in STABS_DEBUG, and once
in the line immediately following it. As the two section definitions are
identical, the latter is redundant and can be dropped.
This patch drops the redundant .comment section definition.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Commit 8f34a1da35 ("arm64: ptrace: use HW_BREAKPOINT_EMPTY type for
disabled breakpoints") fixed an issue with GDB trying to zero breakpoint
control registers. The problem there is that the arch hw_breakpoint code
will attempt to create a (disabled), execute breakpoint of length 0.
This will fail validation and report unexpected failure to GDB. To avoid
this, we treated disabled breakpoints as HW_BREAKPOINT_EMPTY, but that
seems to have broken with recent kernels, causing watchpoints to be
treated as TYPE_INST in the core code and returning ENOSPC for any
further breakpoints.
This patch fixes the problem by prioritising the `enable' field of the
breakpoint: if it is cleared, we simply update the perf_event_attr to
indicate that the thing is disabled and don't bother changing either the
type or the length. This reinforces the behaviour that the breakpoint
control register is essentially read-only apart from the enable bit
when disabling a breakpoint.
Cc: <stable@vger.kernel.org>
Reported-by: Aaron Liu <liucy214@gmail.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
This patch adds the required makefile and kconfig entries to enable PM
for arm64 systems.
The kernel relies on the cpu_{suspend}/{resume} infrastructure to
properly save the context for a CPU and put it to sleep, hence this
patch adds the config option required to enable cpu_{suspend}/{resume}
API.
In order to rely on the CPU PM implementation for saving and restoring
of CPU subsystems like GIC and PMU, the arch Kconfig must be also
augmented to select the CONFIG_CPU_PM option when SUSPEND or CPU_IDLE
kernel implementations are selected.
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
When CPU idle is enabled, the architectural idle call should go through
the idle subsystem to allow CPUs to enter idle states defined
by the platform CPU idle back-end operations.
This patch, mirroring other archs behaviour, adds the CPU idle call to the
architectural arch_cpu_idle implementation for arm64.
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
On platforms with power management capabilities, timers that are shut
down when a CPU enters deep C-states must be emulated using an always-on
timer and a timer IPI to relay the timer IRQ to target CPUs on an SMP
system.
This patch enables the generic clockevents broadcast infrastructure for
arm64, by providing the required Kconfig entries and adding the timer
IPI infrastructure.
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
When a CPU is shutdown either through CPU idle or suspend to RAM, the
content of HW breakpoint registers must be reset or restored to proper
values when CPU resume from low power states. This patch adds debug register
restore operations to the HW breakpoint control function and implements a
CPU PM notifier that allows to restore the content of HW breakpoint registers
to allow proper suspend/resume operations.
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Most of the code executed to install and uninstall breakpoints is
common and can be factored out in a function that through a runtime
operations type provides the requested implementation.
This patch creates a common function that can be used to install/uninstall
breakpoints and defines the set of operations that can be carried out
through it.
Reviewed-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
When a CPU enters a low power state, its FP register content is lost.
This patch adds a notifier to save the FP context on CPU shutdown
and restore it on CPU resume. The context is saved and restored only
if the suspending thread is not a kernel thread, mirroring the current
context switch behaviour.
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Kernel subsystems like CPU idle and suspend to RAM require a generic
mechanism to suspend a processor, save its context and put it into
a quiescent state. The cpu_{suspend}/{resume} implementation provides
such a framework through a kernel interface allowing to save/restore
registers, flush the context to DRAM and suspend/resume to/from
low-power states where processor context may be lost.
The CPU suspend implementation relies on the suspend protocol registered
in CPU operations to carry out a suspend request after context is
saved and flushed to DRAM. The cpu_suspend interface:
int cpu_suspend(unsigned long arg);
allows to pass an opaque parameter that is handed over to the suspend CPU
operations back-end so that it can take action according to the
semantics attached to it. The arg parameter allows suspend to RAM and CPU
idle drivers to communicate to suspend protocol back-ends; it requires
standardization so that the interface can be reused seamlessly across
systems, paving the way for generic drivers.
Context memory is allocated on the stack, whose address is stashed in a
per-cpu variable to keep track of it and passed to core functions that
save/restore the registers required by the architecture.
Even though, upon successful execution, the cpu_suspend function shuts
down the suspending processor, the warm boot resume mechanism, based
on the cpu_resume function, makes the resume path operate as a
cpu_suspend function return, so that cpu_suspend can be treated as a C
function by the caller, which simplifies coding the PM drivers that rely
on the cpu_suspend API.
Upon context save, the minimal amount of memory is flushed to DRAM so
that it can be retrieved when the MMU is off and caches are not searched.
The suspend CPU operation, depending on the required operations (eg CPU vs
Cluster shutdown) is in charge of flushing the cache hierarchy either
implicitly (by calling firmware implementations like PSCI) or explicitly
by executing the required cache maintainance functions.
Debug exceptions are disabled during cpu_{suspend}/{resume} operations
so that debug registers can be saved and restored properly preventing
preemption from debug agents enabled in the kernel.
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
On ARM64 SMP systems, cores are identified by their MPIDR_EL1 register.
The MPIDR_EL1 guidelines in the ARM ARM do not provide strict enforcement of
MPIDR_EL1 layout, only recommendations that, if followed, split the MPIDR_EL1
on ARM 64 bit platforms in four affinity levels. In multi-cluster
systems like big.LITTLE, if the affinity guidelines are followed, the
MPIDR_EL1 can not be considered a linear index. This means that the
association between logical CPU in the kernel and the HW CPU identifier
becomes somewhat more complicated requiring methods like hashing to
associate a given MPIDR_EL1 to a CPU logical index, in order for the look-up
to be carried out in an efficient and scalable way.
This patch provides a function in the kernel that starting from the
cpu_logical_map, implement collision-free hashing of MPIDR_EL1 values by
checking all significative bits of MPIDR_EL1 affinity level bitfields.
The hashing can then be carried out through bits shifting and ORing; the
resulting hash algorithm is a collision-free though not minimal hash that can
be executed with few assembly instructions. The mpidr_el1 is filtered through a
mpidr mask that is built by checking all bits that toggle in the set of
MPIDR_EL1s corresponding to possible CPUs. Bits that do not toggle do not
carry information so they do not contribute to the resulting hash.
Pseudo code:
/* check all bits that toggle, so they are required */
for (i = 1, mpidr_el1_mask = 0; i < num_possible_cpus(); i++)
mpidr_el1_mask |= (cpu_logical_map(i) ^ cpu_logical_map(0));
/*
* Build shifts to be applied to aff0, aff1, aff2, aff3 values to hash the
* mpidr_el1
* fls() returns the last bit set in a word, 0 if none
* ffs() returns the first bit set in a word, 0 if none
*/
fs0 = mpidr_el1_mask[7:0] ? ffs(mpidr_el1_mask[7:0]) - 1 : 0;
fs1 = mpidr_el1_mask[15:8] ? ffs(mpidr_el1_mask[15:8]) - 1 : 0;
fs2 = mpidr_el1_mask[23:16] ? ffs(mpidr_el1_mask[23:16]) - 1 : 0;
fs3 = mpidr_el1_mask[39:32] ? ffs(mpidr_el1_mask[39:32]) - 1 : 0;
ls0 = fls(mpidr_el1_mask[7:0]);
ls1 = fls(mpidr_el1_mask[15:8]);
ls2 = fls(mpidr_el1_mask[23:16]);
ls3 = fls(mpidr_el1_mask[39:32]);
bits0 = ls0 - fs0;
bits1 = ls1 - fs1;
bits2 = ls2 - fs2;
bits3 = ls3 - fs3;
aff0_shift = fs0;
aff1_shift = 8 + fs1 - bits0;
aff2_shift = 16 + fs2 - (bits0 + bits1);
aff3_shift = 32 + fs3 - (bits0 + bits1 + bits2);
u32 hash(u64 mpidr_el1) {
u32 l[4];
u64 mpidr_el1_masked = mpidr_el1 & mpidr_el1_mask;
l[0] = mpidr_el1_masked & 0xff;
l[1] = mpidr_el1_masked & 0xff00;
l[2] = mpidr_el1_masked & 0xff0000;
l[3] = mpidr_el1_masked & 0xff00000000;
return (l[0] >> aff0_shift | l[1] >> aff1_shift | l[2] >> aff2_shift |
l[3] >> aff3_shift);
}
The hashing algorithm relies on the inherent properties set in the ARM ARM
recommendations for the MPIDR_EL1. Exotic configurations, where for instance
the MPIDR_EL1 values at a given affinity level have large holes, can end up
requiring big hash tables since the compression of values that can be achieved
through shifting is somewhat crippled when holes are present. Kernel warns if
the number of buckets of the resulting hash table exceeds the number of
possible CPUs by a factor of 4, which is a symptom of a very sparse HW
MPIDR_EL1 configuration.
The hash algorithm is quite simple and can easily be implemented in assembly
code, to be used in code paths where the kernel virtual address space is
not set-up (ie cpu_resume) and instruction and data fetches are strongly
ordered so code must be compact and must carry out few data accesses.
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
The refactoring of el2_setup split code setting up EL2 and detecting the
CPU boot mode in separate chunks. This allows the code that sets up EL2 to
run in an endian independent way - ie before the endianess is set up in
the respective sctlr registers.
This patch brings secondary_entry up-to-date so that CPUs entering the
kernel through this code path set-up EL2 and the cpu boot mode properly.
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Acked-by: Mark Rutland <mark.rutand@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The current breakpoint instruction checking code for A32 is not endian
clean. Fix this with appropriate byte-swapping when retrieving
instructions.
Signed-off-by: Matthew Leach <matthew.leach@arm.com>
Reviewed-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
On a BE system the wrong half of the X registers is retrieved/written
when attempting to get/set the value of aarch32 registers through
ptrace.
Ensure that types are the correct width so that the relevant
casting occurs.
Signed-off-by: Matthew Leach <matthew.leach@arm.com>
Reviewed-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The asynchronous aborts are generally fatal for the kernel but they can
be masked via the pstate A bit. If a system error happens while in
kernel mode, it won't be visible until returning to user space. This
patch enables this kind of abort early to help identifying the cause.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Commit f27dde8dee (sched: Add NEED_RESCHED to the preempt_count)
introduced the use of bit 31 in preempt_count for obscure scheduling
purposes.
This causes interrupts taken from EL0 to hit the (open coded) BUG when
this flag is flipped while handling the interrupt (we compare the
values before and after, and kill the kernel if they are different).
The fix is to stop messing with the preempt count entirely, as this
is already being dealt with in the generic code (irq_enter/irq_exit).
Tested on a dual A53 FPGA running cyclictest.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Pull ARM updates from Russell King:
"Included in this series are:
1. BE8 (modern big endian) changes for ARM from Ben Dooks
2. big.Little support from Nicolas Pitre and Dave Martin
3. support for LPAE systems with all system memory above 4GB
4. Perf updates from Will Deacon
5. Additional prefetching and other performance improvements from Will.
6. Neon-optimised AES implementation fro Ard.
7. A number of smaller fixes scattered around the place.
There is a rather horrid merge conflict in tools/perf - I was never
notified of the conflict because it originally occurred between Will's
tree and other stuff. Consequently I have a resolution which Will
forwarded me, which I'll forward on immediately after sending this
mail.
The other notable thing is I'm expecting some build breakage in the
crypto stuff on ARM only with Ard's AES patches. These were merged
into a stable git branch which others had already pulled, so there's
little I can do about this. The problem is caused because these
patches have a dependency on some code in the crypto git tree - I
tried requesting a branch I can pull to resolve these, and all I got
each time from the crypto people was "we'll revert our patches then"
which would only make things worse since I still don't have the
dependent patches. I've no idea what's going on there or how to
resolve that, and since I can't split these patches from the rest of
this pull request, I'm rather stuck with pushing this as-is or
reverting Ard's patches.
Since it should "come out in the wash" I've left them in - the only
build problems they seem to cause at the moment are with randconfigs,
and since it's a new feature anyway. However, if by -rc1 the
dependencies aren't in, I think it'd be best to revert Ard's patches"
I resolved the perf conflict roughly as per the patch sent by Russell,
but there may be some differences. Any errors are likely mine. Let's
see how the crypto issues work out..
* 'for-linus' of git://git.linaro.org/people/rmk/linux-arm: (110 commits)
ARM: 7868/1: arm/arm64: remove atomic_clear_mask() in "include/asm/atomic.h"
ARM: 7867/1: include: asm: use 'int' instead of 'unsigned long' for 'oldval' in atomic_cmpxchg().
ARM: 7866/1: include: asm: use 'long long' instead of 'u64' within atomic.h
ARM: 7871/1: amba: Extend number of IRQS
ARM: 7887/1: Don't smp_cross_call() on UP devices in arch_irq_work_raise()
ARM: 7872/1: Support arch_irq_work_raise() via self IPIs
ARM: 7880/1: Clear the IT state independent of the Thumb-2 mode
ARM: 7878/1: nommu: Implement dummy early_paging_init()
ARM: 7876/1: clear Thumb-2 IT state on exception handling
ARM: 7874/2: bL_switcher: Remove cpu_hotplug_driver_{lock,unlock}()
ARM: footbridge: fix build warnings for netwinder
ARM: 7873/1: vfp: clear vfp_current_hw_state for dying cpu
ARM: fix misplaced arch_virt_to_idmap()
ARM: 7848/1: mcpm: Implement cpu_kill() to synchronise on powerdown
ARM: 7847/1: mcpm: Factor out logical-to-physical CPU translation
ARM: 7869/1: remove unused XSCALE_PMU Kconfig param
ARM: 7864/1: Handle 64-bit memory in case of 32-bit phys_addr_t
ARM: 7863/1: Let arm_add_memory() always use 64-bit arguments
ARM: 7862/1: pcpu: replace __get_cpu_var_uses
ARM: 7861/1: cacheflush: consolidate single-CPU ARMv7 cache disabling code
...
Merge first patch-bomb from Andrew Morton:
"Quite a lot of other stuff is banked up awaiting further
next->mainline merging, but this batch contains:
- Lots of random misc patches
- OCFS2
- Most of MM
- backlight updates
- lib/ updates
- printk updates
- checkpatch updates
- epoll tweaking
- rtc updates
- hfs
- hfsplus
- documentation
- procfs
- update gcov to gcc-4.7 format
- IPC"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (269 commits)
ipc, msg: fix message length check for negative values
ipc/util.c: remove unnecessary work pending test
devpts: plug the memory leak in kill_sb
./Makefile: export initial ramdisk compression config option
init/Kconfig: add option to disable kernel compression
drivers: w1: make w1_slave::flags long to avoid memory corruption
drivers/w1/masters/ds1wm.cuse dev_get_platdata()
drivers/memstick/core/ms_block.c: fix unreachable state in h_msb_read_page()
drivers/memstick/core/mspro_block.c: fix attributes array allocation
drivers/pps/clients/pps-gpio.c: remove redundant of_match_ptr
kernel/panic.c: reduce 1 byte usage for print tainted buffer
gcov: reuse kbasename helper
kernel/gcov/fs.c: use pr_warn()
kernel/module.c: use pr_foo()
gcov: compile specific gcov implementation based on gcc version
gcov: add support for gcc 4.7 gcov format
gcov: move gcov structs definitions to a gcc version specific file
kernel/taskstats.c: return -ENOMEM when alloc memory fails in add_del_listener()
kernel/taskstats.c: add nla_nest_cancel() for failure processing between nla_nest_start() and nla_nest_end()
kernel/sysctl_binary.c: use scnprintf() instead of snprintf()
...
Pull vfs updates from Al Viro:
"All kinds of stuff this time around; some more notable parts:
- RCU'd vfsmounts handling
- new primitives for coredump handling
- files_lock is gone
- Bruce's delegations handling series
- exportfs fixes
plus misc stuff all over the place"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (101 commits)
ecryptfs: ->f_op is never NULL
locks: break delegations on any attribute modification
locks: break delegations on link
locks: break delegations on rename
locks: helper functions for delegation breaking
locks: break delegations on unlink
namei: minor vfs_unlink cleanup
locks: implement delegations
locks: introduce new FL_DELEG lock flag
vfs: take i_mutex on renamed file
vfs: rename I_MUTEX_QUOTA now that it's not used for quotas
vfs: don't use PARENT/CHILD lock classes for non-directories
vfs: pull ext4's double-i_mutex-locking into common code
exportfs: fix quadratic behavior in filehandle lookup
exportfs: better variable name
exportfs: move most of reconnect_path to helper function
exportfs: eliminate unused "noprogress" counter
exportfs: stop retrying once we race with rename/remove
exportfs: clear DISCONNECTED on all parents sooner
exportfs: more detailed comment for path_reconnect
...
Use more appropriate NUMA_NO_NODE instead of -1 in all archs' module_alloc()
Signed-off-by: Jianguo Wu <wujianguo@huawei.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
usual for this cycle with lots of clean-up.
- Cross arch clean-up and consolidation of early DT scanning code.
- Clean-up and removal of arch prom.h headers. Makes arch specific
prom.h optional on all but Sparc.
- Addition of interrupts-extended property for devices connected to
multiple interrupt controllers.
- Refactoring of DT interrupt parsing code in preparation for deferred
probe of interrupts.
- ARM cpu and cpu topology bindings documentation.
- Various DT vendor binding documentation updates.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iQEcBAABAgAGBQJSgPQ4AAoJEMhvYp4jgsXif28H/1WkrXq5+lCFQZF8nbYdE2h0
R8PsfiJJmAl6/wFgQTsRel+ScMk2hiP08uTyqf2RLnB1v87gCF7MKVaLOdONfUDi
huXbcQGWCmZv0tbBIklxJe3+X3FIJch4gnyUvPudD1m8a0R0LxWXH/NhdTSFyB20
PNjhN/IzoN40X1PSAhfB5ndWnoxXBoehV/IVHVDU42vkPVbVTyGAw5qJzHW8CLyN
2oGTOalOO4ffQ7dIkBEQfj0mrgGcODToPdDvUQyyGZjYK2FY2sGrjyquir6SDcNa
Q4gwatHTu0ygXpyphjtQf5tc3ZCejJ/F0s3olOAS1ahKGfe01fehtwPRROQnCK8=
=GCbY
-----END PGP SIGNATURE-----
Merge tag 'devicetree-for-3.13' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux
Pull devicetree updates from Rob Herring:
"DeviceTree updates for 3.13. This is a bit larger pull request than
usual for this cycle with lots of clean-up.
- Cross arch clean-up and consolidation of early DT scanning code.
- Clean-up and removal of arch prom.h headers. Makes arch specific
prom.h optional on all but Sparc.
- Addition of interrupts-extended property for devices connected to
multiple interrupt controllers.
- Refactoring of DT interrupt parsing code in preparation for
deferred probe of interrupts.
- ARM cpu and cpu topology bindings documentation.
- Various DT vendor binding documentation updates"
* tag 'devicetree-for-3.13' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: (82 commits)
powerpc: add missing explicit OF includes for ppc
dt/irq: add empty of_irq_count for !OF_IRQ
dt: disable self-tests for !OF_IRQ
of: irq: Fix interrupt-map entry matching
MIPS: Netlogic: replace early_init_devtree() call
of: Add Panasonic Corporation vendor prefix
of: Add Chunghwa Picture Tubes Ltd. vendor prefix
of: Add AU Optronics Corporation vendor prefix
of/irq: Fix potential buffer overflow
of/irq: Fix bug in interrupt parsing refactor.
of: set dma_mask to point to coherent_dma_mask
of: add vendor prefix for PHYTEC Messtechnik GmbH
DT: sort vendor-prefixes.txt
of: Add vendor prefix for Cadence
of: Add empty for_each_available_child_of_node() macro definition
arm/versatile: Fix versatile irq specifications.
of/irq: create interrupts-extended property
microblaze/pci: Drop PowerPC-ism from irq parsing
of/irq: Create of_irq_parse_and_map_pci() to consolidate arch code.
of/irq: Use irq_of_parse_and_map()
...
Pull timer changes from Ingo Molnar:
"Main changes in this cycle were:
- Updated full dynticks support.
- Event stream support for architected (ARM) timers.
- ARM clocksource driver updates.
- Move arm64 to using the generic sched_clock framework & resulting
cleanup in the generic sched_clock code.
- Misc fixes and cleanups"
* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (50 commits)
x86/time: Honor ACPI FADT flag indicating absence of a CMOS RTC
clocksource: sun4i: remove IRQF_DISABLED
clocksource: sun4i: Report the minimum tick that we can program
clocksource: sun4i: Select CLKSRC_MMIO
clocksource: Provide timekeeping for efm32 SoCs
clocksource: em_sti: convert to clk_prepare/unprepare
time: Fix signedness bug in sysfs_get_uname() and its callers
timekeeping: Fix some trivial typos in comments
alarmtimer: return EINVAL instead of ENOTSUPP if rtcdev doesn't exist
clocksource: arch_timer: Do not register arch_sys_counter twice
timer stats: Add a 'Collection: active/inactive' line to timer usage statistics
sched_clock: Remove sched_clock_func() hook
arch_timer: Move to generic sched_clock framework
clocksource: tcb_clksrc: Remove IRQF_DISABLED
clocksource: tcb_clksrc: Improve driver robustness
clocksource: tcb_clksrc: Replace clk_enable/disable with clk_prepare_enable/disable_unprepare
clocksource: arm_arch_timer: Use clocksource for suspend timekeeping
clocksource: dw_apb_timer_of: Mark a few more functions as __init
clocksource: Put nodes passed to CLOCKSOURCE_OF_DECLARE callbacks centrally
arm: zynq: Enable arm_global_timer
...
The non-IPI interrupts are displayed only for the online cpus from
show_interrupts in kernel/irq/proc.c before calling arch_show_interrupts().
As a result, the column headers and the IPI count don't match if any
CPU is offline.
This patch fixes show_ipi_list to display IPIs for online CPUs only.
Signed-off-by: Sudeep KarkadaNagesha <sudeep.karkadanagesha@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The ARM architecture reference specifies that the IT state bits in the
PSR must be all zeros in ARM mode or behavior is unspecified. If an ARM
function is registered as a signal handler, and that signal is delivered
inside a block of instructions following an IT instruction, some of the
instructions at the beginning of the signal handler may be skipped if
the IT state bits of the Program Status Register are not cleared by the
kernel.
Signed-off-by: T.J. Purtell <tj@mobisocial.us>
[catalin.marinas@arm.com: code comment and commit log updated]
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>