Our TLB invalidation routines may require a barrier before the
maintenance (in order to ensure pending page table writes are visible to
the hardware walker) and barriers afterwards (in order to ensure
completion of the maintenance and visibility in the instruction stream).
Whilst this is expensive, the cost can be reduced somewhat by reducing
the scope of the barrier instructions:
- The barrier before only needs to apply to stores (pte writes)
- Local ops are required only to affect the non-shareable domain
- Global ops are required only to affect the inner-shareable domain
This patch makes these changes for the TLB flushing code.
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Now that the ASID allocator doesn't require inner-shareable maintenance,
we can convert the local_bp_flush_all function to perform only
non-shareable flushing, in a similar manner to the TLB invalidation
routines.
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Branch predictor maintenance is only required when we are either
changing the kernel's view of memory (switching tables completely) or
dealing with ASID rollover.
Both of these use-cases require subsequent TLB invalidation, which has
the relevant barrier instructions to ensure completion and visibility
of the maintenance, so this patch removes the instruction barrier from
[local_]flush_bp_all.
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Inner-shareable TLB invalidation is typically more expensive than local
(non-shareable) invalidation, so performing the broadcasting for
local_flush_tlb_* operations is a waste of cycles and needlessly
clobbers entries in the TLBs of other CPUs.
This patch introduces __flush_tlb_* versions for many of the TLB
invalidation functions, which only respect inner-shareable variants of
the invalidation instructions when presented with the TLB_V7_UIS_FULL
flag. The local version is also inlined to prevent SMP_ON_UP kernels
from missing flushes, where the __flush variant would be called with
the UP flags.
This gains us around 0.5% in hackbench scores for a dual-core A15, but I
would expect this to improve as more cores (and clusters) are added to
the equation.
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reported-by: Albin Tonnerre <Albin.Tonnerre@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Commit 93dc688 (ARM: 7684/1: errata: Workaround for Cortex-A15 erratum 798181 (TLBI/DSB operations)) causes the following undefined instruction error on a mx53 (Cortex-A8):
Internal error: Oops - undefined instruction: 0 [#1] SMP ARM
CPU: 0 PID: 275 Comm: modprobe Not tainted 3.11.0-rc2-next-20130722-00009-g9b0f371 #881
task: df46cc00 ti: df48e000 task.ti: df48e000
PC is at check_and_switch_context+0x17c/0x4d0
LR is at check_and_switch_context+0xdc/0x4d0
This problem happens because check_and_switch_context() calls dummy_flush_tlb_a15_erratum() without checking if we are really running on a Cortex-A15 or not.
To avoid this issue, only call dummy_flush_tlb_a15_erratum() inside
check_and_switch_context() if erratum_a15_798181() returns true, which means that we are really running on a Cortex-A15.
Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Roger Quadros <rogerq@ti.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
These changes bring both HugeTLB support and Transparent HugePage
(THP) support to ARM. Only long descriptors (LPAE) are supported
in this series.
The code has been tested on an Arndale board (Exynos 5250).
Since the merging of Will's tlb-ops branch, specifically 89c7e4b8bb
(ARM: 7661/1: mm: perform explicit branch predictor maintenance when required),
building SMP without CONFIG_MMU has been broken.
The local_flush_bp_all function is only called for operations related to
changing the kernel's view of memory and ASID rollover - both of which are
irrelevant to an !MMU kernel.
This patch adds a stub local_flush_bp_all() function to the other tlb
maintenance stubs and restores the ability to build an SMP !MMU kernel.
Signed-off-by: Jonathan Austin <jonathan.austin@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
nommu platforms do not perform address translation and therefore clearly
don't have TLBs. However, some SMP code assumes the presence of the TLB
flushing routines and will therefore fail to compile for a nommu system.
This patch defines dummy local_* TLB operations and #defines
tlb_ops_need_broadcast() as 0, therefore causing the usual ARM SMP TLB
operations to call the local variants instead.
Signed-off-by: Will Deacon <will.deacon@arm.com>
CC: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
CC: Nicolas Pitre <nico@linaro.org>
The patch adds support for THP (transparent huge pages) to LPAE
systems. When this feature is enabled, the kernel tries to map
anonymous pages as 2MB sections where possible.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
[steve.capper@linaro.org: symbolic constants used, value of
PMD_SECT_SPLITTING adjusted, tlbflush.h included in pgtable.h,
added PROT_NONE support.]
Signed-off-by: Steve Capper <steve.capper@linaro.org>
Reviewed-by: Will Deacon <will.deacon@arm.com>
Many ARMv7 cores have hardware page table walkers that can read the L1
cache. This is discoverable from the ID_MMFR3 register, although this
can be expensive to access from the low-level set_pte functions and is a
pain to cache, particularly with multi-cluster systems.
A useful observation is that the multi-processing extensions for ARMv7
require coherent table walks, meaning that we can make use of ALT_SMP
patching in proc-v7-* to patch away the cache flush safely for these
cores.
Reported-by: Albin Tonnerre <Albin.Tonnerre@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
On Cortex-A15 (r0p0..r3p2) the TLBI/DSB are not adequately shooting down
all use of the old entries. This patch implements the erratum workaround
which consists of:
1. Dummy TLBIMVAIS and DSB on the CPU doing the TLBI operation.
2. Send IPI to the CPUs that are running the same mm (and ASID) as the
one being invalidated (or all the online CPUs for global pages).
3. CPU receiving the IPI executes a DMB and CLREX (part of the exception
return code already).
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
The ARM architecture requires explicit branch predictor maintenance
when updating an instruction stream for a given virtual address. In
reality, this isn't so much of a burden because the branch predictor
is flushed during the cache maintenance required to make the new
instructions visible to the I-side of the processor.
However, there are still some cases where explicit flushing is required,
so add a local_bp_flush_all operation to deal with this.
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
This patch removes support for ARMv3 CPUs, which haven't worked properly
for quite some time (see the FIXME comment in arch/arm/mm/fault.c). The
only V3 parts left is the cache model for ARMv3, which is needed for some
odd reason by ARM740T CPUs, and being able to build with -march=armv3,
which is required for the RiscPC platform due to its bus structure.
Acked-by: Will Deacon <will.deacon@arm.com>
Acked-by: Jean-Christophe PLAGNIOL-VILLARD <plagnioj@jcrosoft.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
This patch defines the (pte|pmd)val_t as u32 and changes the page table
types to be based on these. The PMD bits are converted to the
corresponding type using the _AT macro.
The flush_pmd_entry/clean_pmd_entry argument was changed to (void *) to
allow them to be used with both PGD and PMD pointers and avoid code
duplication.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Kernel space needs very little in the way of BTC maintanence as most
mappings which are created and destroyed are non-executable, and so
could never enter the instruction stream.
The case which does warrant BTC maintanence is when a module is loaded.
This creates a new executable mapping, but at that point the pages have
not been initialized with code and data, so at that point they contain
unpredictable information. Invalidating the BTC at this stage serves
little useful purpose.
Before we execute module code, we call flush_icache_range(), which deals
with the BTC maintanence requirements. This ensures that we have a BTC
maintanence operation before we execute code via the newly created
mapping.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
There's no need to noMMU to put tlb_flush() in asm/tlbflush.h - it's
part of the tlb shootdown interface. Move it to asm/tlb.h instead, as
per x86.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
UP systems do not implement all the instructions that SMP systems have,
so in order to boot a SMP kernel on a UP system, we need to rewrite
parts of the kernel.
Do this using an 'alternatives' scheme, where the kernel code and data
is modified prior to initialization to replace the SMP instructions,
thereby rendering the problematical code ineffectual. We use the linker
to generate a list of 32-bit word locations and their replacement values,
and run through these replacements when we detect a UP system.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
On SMP systems, there is a small chance of a PTE becoming visible to a
different CPU before the current cache maintenance operations in
update_mmu_cache(). To avoid this, cache maintenance must be handled in
set_pte_at() (similar to IA-64 and PowerPC).
This patch provides a unified VIPT cache handling mechanism and
implements the __sync_icache_dcache() function for ARMv6 onwards
architectures. It is called from set_pte_at() and replaces the
update_mmu_cache(). The latter is still used on VIVT hardware where a
vm_area_struct is required.
Tested-by: Rabin Vincent <rabin.vincent@stericsson.com>
Cc: Nicolas Pitre <nicolas.pitre@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
There are places in Linux where writes to newly allocated page cache
pages happen without a subsequent call to flush_dcache_page() (several
PIO drivers including USB HCD). This patch changes the meaning of
PG_arch_1 to be PG_dcache_clean and always flush the D-cache for a newly
mapped page in update_mmu_cache().
The patch also sets the PG_arch_1 bit in the DMA cache maintenance
function to avoid additional cache flushing in update_mmu_cache().
Tested-by: Rabin Vincent <rabin.vincent@stericsson.com>
Cc: Nicolas Pitre <nicolas.pitre@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
On versions of the Cortex-A9 prior to r2p0, performing TLB invalidations by
ASID match can result in the incorrect ASID being broadcast to other CPUs.
As a consequence of this, the targetted TLB entries are not invalidated
across the system.
This workaround changes the TLB flushing routines to invalidate entries
regardless of the ASID.
Cc: <stable@kernel.org>
Tested-by: Rob Clark <rob@ti.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
The standard I-cache Invalidate All (ICIALLU) and Branch Predication
Invalidate All (BPIALL) operations are not automatically broadcast to
the other CPUs in an ARMv7 MP system. The patch adds the Inner Shareable
variants, ICIALLUIS and BPIALLIS, if ARMv7 and SMP.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
On VIVT ARM, when we have multiple shared mappings of the same file
in the same MM, we need to ensure that we have coherency across all
copies. We do this via make_coherent() by making the pages
uncacheable.
This used to work fine, until we allowed highmem with highpte - we
now have a page table which is mapped as required, and is not available
for modification via update_mmu_cache().
Ralf Beache suggested getting rid of the PTE value passed to
update_mmu_cache():
On MIPS update_mmu_cache() calls __update_tlb() which walks pagetables
to construct a pointer to the pte again. Passing a pte_t * is much
more elegant. Maybe we might even replace the pte argument with the
pte_t?
Ben Herrenschmidt would also like the pte pointer for PowerPC:
Passing the ptep in there is exactly what I want. I want that
-instead- of the PTE value, because I have issue on some ppc cases,
for I$/D$ coherency, where set_pte_at() may decide to mask out the
_PAGE_EXEC.
So, pass in the mapped page table pointer into update_mmu_cache(), and
remove the PTE value, updating all implementations and call sites to
suit.
Includes a fix from Stephen Rothwell:
sparc: fix fallout from update_mmu_cache API change
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
This patch fixes the BUG: using smp_processor_id() in preemptible
Below is the stripped backtrace.
BUG: using smp_processor_id() in preemptible [00000000] code: init/1
caller is flush_tlb_mm+0x44/0x70
Backtrace:
[<c00225c4>] (dump_backtrace+0x0/0x110) from [<c01713a0>] (dump_stack+0x18/0x1c)
r7:00000000 r6:c00234f0 r5:00000001 r4:c7828000
[<c0171388>] (dump_stack+0x0/0x1c) from [<c0135364>] (debug_smp_processor_id+0xc0/0xf0)
[<c01352a4>] (debug_smp_processor_id+0x0/0xf0) from [<c00234f0>] (flush_tlb_mm+0x44/0x70)
r7:00000000 r6:c60b41a0 r5:c60b4154 r4:00000001
[<c00234ac>] (flush_tlb_mm+0x0/0x70) from [<c0039568>] (dup_mm+0x304/0x38c)
r5:c1f09058 r4:00000000
[<c0039264>] (dup_mm+0x0/0x38c) from [<c0039de4>] (copy_process+0x7b8/0xeb0)
[<c003962c>] (copy_process+0x0/0xeb0) from [<c003a638>] (do_fork+0x15c/0x29c)
[<c003a4dc>] (do_fork+0x0/0x29c) from [<c0021df0>] (sys_clone+0x34/0x3c)
[<c0021dbc>] (sys_clone+0x0/0x3c) from [<c001efa0>] (ret_fast_syscall+0x0/0x2c)
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Makes code futureproof against the impending change to mm->cpu_vm_mask.
It's also a chance to use the new cpumask_ ops which take a pointer
(the older ones are deprecated, but there's no hurry for arch code).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
ARMv7 SMP hardware can handle the TLB maintenance operations
broadcasting in hardware so that the software can avoid the costly IPIs.
This patch adds the necessary checks (the MMFR3 CPUID register) to avoid
the broadcasting if already supported by the hardware.
(this patch is based on the work done by Tony Thompson @ ARM)
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Adds support for Faraday FA526 core. This core is used at least by:
Cortina Systems Gemini and Centroid family
Cavium Networks ECONA family
Grain Media GM8120
Pixelplus ImageARM
Prolific PL-1029
Faraday IP evaluation boards
v2:
- move TLB_BTB to separate patch
- update copyrights
Signed-off-by: Paulius Zaleckas <paulius.zaleckas@teltonika.lt>
Commit 2ccdd1e77d doesn't add
v7wbi_possible_flags and v7wbi_always_flags to possible_tlb_flags and
always_tlb_flags. This causes the L2 cache flush in clean_pmd_entry()
(intended for Feroceon only) to execute on ARMv7, and the CPU hangs.
This patch is required for OMAP3 boards to boot.
Signed-off-by: Paul Walmsley <paul@pwsan.com>
Acked-by: Lennert Buytenhek <buytenh@wantstofly.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Move platform independent header files to arch/arm/include/asm, leaving
those in asm/arch* and asm/plat* alone.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>