Avoid section mismatch involving arch_register_cpu.
Marking arch_register_cpu as __init and removing the export
for non-hotplug-cpu configurations makes the following warning
go away:
Section mismatch in reference from the function
arch_register_cpu() to the function .devinit.text:register_cpu()
The function arch_register_cpu() references
the function __devinit register_cpu().
This is often because arch_register_cpu lacks a __devinit
annotation or the annotation of register_cpu is wrong.
The only external user of arch_register_cpu in the tree is
in drivers/acpi/processor_core.c where it is guarded by
ACPI_HOTPLUG_CPU (which depends on HOTPLUG_CPU).
Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
CC: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The casts will always be needed, may as well make them the right
signedness. The ebx variables can easily be unsigned, may as well.
arch/x86/kernel/cpu/common.c:261:21: warning: incorrect type in argument 2 (different signedness)
arch/x86/kernel/cpu/common.c:261:21: expected unsigned int *eax
arch/x86/kernel/cpu/common.c:261:21: got int *<noident>
arch/x86/kernel/cpu/common.c:262:9: warning: incorrect type in argument 3 (different signedness)
arch/x86/kernel/cpu/common.c:262:9: expected unsigned int *ebx
arch/x86/kernel/cpu/common.c:262:9: got int *<noident>
arch/x86/kernel/cpu/common.c:263:9: warning: incorrect type in argument 4 (different signedness)
arch/x86/kernel/cpu/common.c:263:9: expected unsigned int *ecx
arch/x86/kernel/cpu/common.c:263:9: got int *<noident>
arch/x86/kernel/cpu/common.c:264:9: warning: incorrect type in argument 5 (different signedness)
arch/x86/kernel/cpu/common.c:264:9: expected unsigned int *edx
arch/x86/kernel/cpu/common.c:264:9: got int *<noident>
arch/x86/kernel/cpu/common.c:293:30: warning: incorrect type in argument 3 (different signedness)
arch/x86/kernel/cpu/common.c:293:30: expected unsigned int *ebx
arch/x86/kernel/cpu/common.c:293:30: got int *<noident>
arch/x86/kernel/cpu/common.c:350:22: warning: incorrect type in argument 2 (different signedness)
arch/x86/kernel/cpu/common.c:350:22: expected unsigned int *eax
arch/x86/kernel/cpu/common.c:350:22: got int *<noident>
arch/x86/kernel/cpu/common.c:351:10: warning: incorrect type in argument 3 (different signedness)
arch/x86/kernel/cpu/common.c:351:10: expected unsigned int *ebx
arch/x86/kernel/cpu/common.c:351:10: got int *<noident>
arch/x86/kernel/cpu/common.c:352:10: warning: incorrect type in argument 4 (different signedness)
arch/x86/kernel/cpu/common.c:352:10: expected unsigned int *ecx
arch/x86/kernel/cpu/common.c:352:10: got int *<noident>
arch/x86/kernel/cpu/common.c:353:10: warning: incorrect type in argument 5 (different signedness)
arch/x86/kernel/cpu/common.c:353:10: expected unsigned int *edx
arch/x86/kernel/cpu/common.c:353:10: got int *<noident>
arch/x86/kernel/cpu/common.c:362:30: warning: incorrect type in argument 3 (different signedness)
arch/x86/kernel/cpu/common.c:362:30: expected unsigned int *ebx
arch/x86/kernel/cpu/common.c:362:30: got int *<noident>
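A minimal user-space sketch of the pattern behind these warnings. The helper `mock_cpuid()` and the `struct cpu_info` layout are stand-ins invented for illustration, not the kernel's code; the point is only that unsigned locals need no cast, and a cast that must stay should target `unsigned int *`.

```c
/* Stand-in for a cpuid() helper whose four outputs are unsigned. */
static void mock_cpuid(unsigned int op, unsigned int *eax, unsigned int *ebx,
		       unsigned int *ecx, unsigned int *edx)
{
	(void)op;
	/* Fake register contents; real code would execute CPUID. */
	*eax = 10;
	*ebx = 0x756e6547u;
	*ecx = 0x6c65746eu;
	*edx = 0x49656e69u;
}

/* Hypothetical structure with a signed field that receives eax. */
struct cpu_info {
	int cpuid_level;
};

static unsigned int detect(struct cpu_info *c)
{
	/* Unsigned locals match the parameter types, so no cast is needed. */
	unsigned int ebx, ecx, edx;

	/* The field is signed, so a cast stays, now with matching signedness. */
	mock_cpuid(0, (unsigned int *)&c->cpuid_level, &ebx, &ecx, &edx);
	return ebx;
}
```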
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Not necessary to expose it, also fixes sparse warning.
arch/x86/kernel/early_printk.c:196:16: warning: symbol 'early_console' was not declared. Should it be static?
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Fix following warning:
WARNING: arch/x86/kernel/built-in.o(.text+0x1eb41): Section mismatch in reference from the function calgary_handle_quirks() to the function .init.text:calgary_set_split_completion_timeout()
calgary_handle_quirks() is only called at
__init time (in calgary_init_one() via handle_quirks ops).
So annotate this function and its sister function __init.
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Fix following warning:
WARNING: o-x86_64/arch/x86/kernel/built-in.o(.text+0x13d15): Section mismatch in reference from the function acpi_map_lsapic() to the function .cpuinit.text:mp_register_lapic()
The function acpi_map_lsapic() is exported and thus not annotated.
But the sole user is acpi/processor_core.c in a __cpuinit path.
So create a small wrapper and put back the annotation thus
avoiding the warning.
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Fix the following warnings:
WARNING: arch/x86/kernel/built-in.o(.exit.text+0xf8): Section mismatch in reference from the function msr_exit() to the variable .cpuinit.data:msr_class_cpu_notifier
WARNING: arch/x86/kernel/built-in.o(.exit.text+0x158): Section mismatch in reference from the function cpuid_exit() to the variable .cpuinit.data:cpuid_class_cpu_notifier
WARNING: arch/x86/kernel/built-in.o(.exit.text+0x171): Section mismatch in reference from the function microcode_exit() to the variable .cpuinit.data:mc_cpu_notifier
In all three cases there was a function annotated __exit
that referenced a variable annotated __cpuinitdata.
The fix was to replace the annotation of the notifier
with __refdata to tell modpost that the references to
__cpuinit functions in the notifier are OK.
The unregister call that references the notifier
variable will simply delete the function pointer
so there is no problem ignoring the reference.
Note: This looks like another case where __cpuinit
has been used as a replacement for proper use
of CONFIG_HOTPLUG_CPU to decide which code is used for
HOTPLUG_CPU.
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Silence the following warning:
WARNING: o-x86_64/arch/x86/kernel/built-in.o(.text+0x17cd3): Section mismatch in reference from the function remove_cpu_from_maps() to the variable .cpuinit.data:cpu_initialized
remove_cpu_from_maps() had a single user: __cpu_disable() so
mark it static and annotate it with __ref to silence the
warning from modpost.
__cpu_disable() has a single user in kernel/cpu.c:
=> take_cpu_down()
which again has a single user in the following call:
=> __stop_machine_run(take_cpu_down, &tcd_param, cpu);
Here a kthread is created.
So maybe the warning is correct and the right fix is to
remove the __cpuinitdata annotation of cpu_initialized?
Note: The analysis was disturbed by the fact that we had a variable
with the same name in cpu/common.c - but this is 32 bit only.
Note: Should smpboot_64 use cpu_clear()?
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
For bzImage, vmlinux_64.lds still has 32-bit code, and startup_32
should be at 0. Fix the comment.
Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Change latencytop Kconfig entry so it doesn't list the architectures
that support it. Instead introduce HAVE_LATENCY_SUPPORT which any
architecture can set. Should reduce patch conflicts.
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Holger Wolf <wolf@linux.vnet.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch moves the micro-assembler into a separate implementation, as
it is useful for further run-time optimizations. The only change in
behaviour is cutting down printk noise at kernel startup time.
Checkpatch complains about macro parameters which aren't protected by
parentheses. I believe this is a flaw in checkpatch, the paste operator
used in those macros won't work with parenthesised parameters.
Signed-off-by: Thiemo Seufer <ths@networkno.de>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Commit 312b1485fb made __INIT_REFOK expand
into .section .section ".ref.text", "ax". Since the assembler doesn't
tolerate stuttering in the source that broke all MIPS builds.
Since with this change Sam downgraded __INIT_REFOK, the best fix is to
replace it with its modern-day equivalent. With MIPS being the only user
of __INIT_REFOK, and __INITDATA_REFOK (which was equally broken) being
unused anyway, these can be deleted, but that's the subject of a separate
commit.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Doing a make randconfig I came across this error in the Makefile.
This patch makes a directory out of arch/x86/mach-default for
CONFIG_X86_RDC321X
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[POWERPC] pasemi: Fix thinko in dma_direct_ops setup
The first patch will just fall through and still set dma_data to a bad
value, make it return directly instead.
Signed-off-by: Olof Johansson <olof@lixom.net>
Acked-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Remove all the dead timer interrupt checking functions for the ColdFire
CPU "timers" hardware that are not used after switching to GENERIC_TIME.
Signed-off-by: Greg Ungerer <gerg@uclinux.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Switching to GENERIC_TIME means we no longer need the empty timer offset
function for the 68360 CPU.
Signed-off-by: Greg Ungerer <gerg@uclinux.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Remove unused local gettimeofday functions, now that we are using
GENERIC_TIME.
Signed-off-by: Greg Ungerer <gerg@uclinux.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Switch the 68328 CPU timer code to using GENERIC_TIME.
Signed-off-by: Greg Ungerer <gerg@uclinux.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Move the ColdFire DMA address table into its own file, and out
of each of the different CPU config files. No need to have a copy
of it in each of the config setup files.
Signed-off-by: Greg Ungerer <gerg@uclinux.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Modify Makefiles to support separate coldfire platform directory.
Currently the common ColdFire CPU family code sits in the
arch/m68knommu/platform/5307 directory. This is confusing, the files
containing this common code are in no way specific to the 5307 ColdFire.
Create an arch/m68knommu/platform/coldfire directory to contain this
common code. Other m68knommu CPU variants do not need to use this code
though, so it doesn't make sense to move it to arch/m68knommu/kernel.
Signed-off-by: Greg Ungerer <gerg@uclinux.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Move common ColdFire CPU vectors.c to common coldfire platform directory.
Currently the common ColdFire CPU family code sits in the
arch/m68knommu/platform/5307 directory. This is confusing, the files
containing this common code are in no way specific to the 5307 ColdFire.
Create an arch/m68knommu/platform/coldfire directory to contain this
common code. Other m68knommu CPU variants do not need to use this code
though, so it doesn't make sense to move it to arch/m68knommu/kernel.
Signed-off-by: Greg Ungerer <gerg@uclinux.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Move common ColdFire CPU timers.c to common coldfire platform directory.
Currently the common ColdFire CPU family code sits in the
arch/m68knommu/platform/5307 directory. This is confusing, the files
containing this common code are in no way specific to the 5307 ColdFire.
Create an arch/m68knommu/platform/coldfire directory to contain this
common code. Other m68knommu CPU variants do not need to use this code
though, so it doesn't make sense to move it to arch/m68knommu/kernel.
Signed-off-by: Greg Ungerer <gerg@uclinux.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Move common ColdFire CPU pit.c to common coldfire platform directory.
Currently the common ColdFire CPU family code sits in the
arch/m68knommu/platform/5307 directory. This is confusing, the files
containing this common code are in no way specific to the 5307 ColdFire.
Create an arch/m68knommu/platform/coldfire directory to contain this
common code. Other m68knommu CPU variants do not need to use this code
though, so it doesn't make sense to move it to arch/m68knommu/kernel.
Signed-off-by: Greg Ungerer <gerg@uclinux.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Move common ColdFire CPU head.S to common coldfire platform directory.
Currently the common ColdFire CPU family code sits in the
arch/m68knommu/platform/5307 directory. This is confusing, the files
containing this common code are in no way specific to the 5307 ColdFire.
Create an arch/m68knommu/platform/coldfire directory to contain this
common code. Other m68knommu CPU variants do not need to use this code
though, so it doesn't make sense to move it to arch/m68knommu/kernel.
Signed-off-by: Greg Ungerer <gerg@uclinux.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Move common ColdFire CPU entry.S to common coldfire platform directory.
Currently the common ColdFire CPU family code sits in the
arch/m68knommu/platform/5307 directory. This is confusing, the files
containing this common code are in no way specific to the 5307 ColdFire.
Create an arch/m68knommu/platform/coldfire directory to contain this
common code. Other m68knommu CPU variants do not need to use this code
though, so it doesn't make sense to move it to arch/m68knommu/kernel.
Signed-off-by: Greg Ungerer <gerg@uclinux.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Switch to platform style configuration for 5407 ColdFire parts.
Initial support is for the UARTs. DMA support moved to common code
for all ColdFire parts.
Signed-off-by: Greg Ungerer <gerg@uclinux.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Switch to platform style configuration for 532x ColdFire parts.
Initial support is for the UARTs. DMA support moved to common code
for all ColdFire parts.
Signed-off-by: Greg Ungerer <gerg@uclinux.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Switch to platform style configuration for 527x ColdFire parts.
Initial support is for the UARTs. DMA support moved to common code
for all ColdFire parts.
Signed-off-by: Greg Ungerer <gerg@uclinux.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Switch to platform style configuration for 5307 ColdFire parts.
Initial support is for the UARTs. DMA support moved to common code
for all ColdFire parts.
Signed-off-by: Greg Ungerer <gerg@uclinux.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Switch to platform style configuration for 528x ColdFire parts.
Initial support is for the UARTs. DMA support moved to common code
for all ColdFire parts.
Signed-off-by: Greg Ungerer <gerg@uclinux.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Switch to platform style configuration for 5249 ColdFire parts.
Initial support is for the UARTs. DMA support moved to common code
for all ColdFire parts.
Signed-off-by: Greg Ungerer <gerg@uclinux.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Switch to platform style configuration for 5272 ColdFire parts.
Initial support is for the UARTs. DMA support moved to common code
for all ColdFire parts.
Signed-off-by: Greg Ungerer <gerg@uclinux.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Switch to platform style configuration for 520x ColdFire parts.
Initial support is for the UARTs. DMA support moved to common code
for all ColdFire parts.
Signed-off-by: Greg Ungerer <gerg@uclinux.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Switch to platform style configuration for 523x ColdFire parts.
Initial support is for the UARTs. DMA support moved to common code
for all ColdFire parts.
Signed-off-by: Greg Ungerer <gerg@uclinux.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Switch to platform style configuration for 5206e ColdFire parts.
Initial support is for the UARTs. DMA support is moved to common code.
Signed-off-by: Greg Ungerer <gerg@uclinux.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Switch to platform style configuration for 5206 ColdFire parts.
Initial support is for the UARTs. DMA support moved to common code
for all ColdFire parts.
Signed-off-by: Greg Ungerer <gerg@uclinux.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
A few places missed the "a" specifier for the __ex_table section. Add
these so we avoid generating an additional section at link time.
Latest modpost would otherwise complain like this:
WARNING: vmlinux.o (__ex_table.2): section name inconsistency.
(.[number]+) following section name.
Did you forget to use "ax"/"aw" in a .S file?
Note that for example <linux/init.h> contains
section definitions for use in .S files.
WARNING: vmlinux.o (__ex_table.4): section name inconsistency.
(.[number]+) following section name.
Did you forget to use "ax"/"aw" in a .S file?
Note that for example <linux/init.h> contains
section definitions for use in .S files.
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
ld will generate a uniquely named section when the assembler does not use
"ax" but gcc does. Add the missing annotation.
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
* 'task_killable' of git://git.kernel.org/pub/scm/linux/kernel/git/willy/misc: (22 commits)
Remove commented-out code copied from NFS
NFS: Switch from intr mount option to TASK_KILLABLE
Add wait_for_completion_killable
Add wait_event_killable
Add schedule_timeout_killable
Use mutex_lock_killable in vfs_readdir
Add mutex_lock_killable
Use lock_page_killable
Add lock_page_killable
Add fatal_signal_pending
Add TASK_WAKEKILL
exit: Use task_is_*
signal: Use task_is_*
sched: Use task_contributes_to_load, TASK_ALL and TASK_NORMAL
ptrace: Use task_is_*
power: Use task_is_*
wait: Use TASK_NORMAL
proc/base.c: Use task_is_*
proc/array.c: Use TASK_REPORT
perfmon: Use task_is_*
...
Fixed up conflicts in NFS/sunrpc manually..
This patch fixes a bug of early_ioremap_reset(), which had been fixed
before by "convert the boot time page table to the kernels native
format" patch. But that patch has been reverted now.
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
In the i386 early boot stage, boot_cpu_data may not be available yet,
which sends clflush_cache_range() (called by change_page_attr()) into an
infinite loop. This patch fixes this by setting
boot_cpu_data.x86_clflush_size in early_cpu_detect().
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch replaces __change_page_attr_set_clr() with
change_page_attr_set_clr() in change_page_attr_clear() to flush the
TLB/cache properly.
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
arch/x86/kernel/cpu/intel_cacheinfo.c:355:7: warning: symbol 'i' shadows an earlier one
arch/x86/kernel/cpu/intel_cacheinfo.c:296:39: originally declared here
arch/x86/kernel/cpu/intel_cacheinfo.c:367:18: warning: incorrect type in argument 2 (different signedness)
arch/x86/kernel/cpu/intel_cacheinfo.c:367:18: expected unsigned int *eax
arch/x86/kernel/cpu/intel_cacheinfo.c:367:18: got int *
arch/x86/kernel/cpu/intel_cacheinfo.c:367:28: warning: incorrect type in argument 3 (different signedness)
arch/x86/kernel/cpu/intel_cacheinfo.c:367:28: expected unsigned int *ebx
arch/x86/kernel/cpu/intel_cacheinfo.c:367:28: got int *
arch/x86/kernel/cpu/intel_cacheinfo.c:367:38: warning: incorrect type in argument 4 (different signedness)
arch/x86/kernel/cpu/intel_cacheinfo.c:367:38: expected unsigned int *ecx
arch/x86/kernel/cpu/intel_cacheinfo.c:367:38: got int *
arch/x86/kernel/cpu/intel_cacheinfo.c:367:48: warning: incorrect type in argument 5 (different signedness)
arch/x86/kernel/cpu/intel_cacheinfo.c:367:48: expected unsigned int *edx
arch/x86/kernel/cpu/intel_cacheinfo.c:367:48: got int *
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
More late-caught fallout from the mainline merge. Commit
35e4a6e26d ("[POWERPC] Use
archdata.dma_data in dma_direct_ops and add the offset") claimed
"Now that all platforms using dma_direct_offset setup the
archdata.dma_data correctly, ..."
..but nope -- the pasemi iommu setup code that disables translation on
the DMA pci device didn't set dma_data correctly.
This fixes it.
Signed-off-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'for-2.6.25' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc: (454 commits)
[POWERPC] Cell IOMMU fixed mapping support
[POWERPC] Split out the ioid fetching/checking logic
[POWERPC] Add support to cell_iommu_setup_page_tables() for multiple windows
[POWERPC] Split out the IOMMU logic from cell_dma_dev_setup()
[POWERPC] Split cell_iommu_setup_hardware() into two parts
[POWERPC] Split out the logic that allocates struct iommus
[POWERPC] Allocate the hash table under 1G on cell
[POWERPC] Add set_dma_ops() to match get_dma_ops()
[POWERPC] 83xx: Clean up / convert mpc83xx board DTS files to v1 format.
[POWERPC] 85xx: Only invalidate TLB0 and TLB1
[POWERPC] 83xx: Fix typo in mpc837x compatible entries
[POWERPC] 85xx: convert sbc85* boards to use machine_device_initcall
[POWERPC] 83xx: rework platform Kconfig
[POWERPC] 85xx: rework platform Kconfig
[POWERPC] 86xx: Remove unused IRQ defines
[POWERPC] QE: Explicitly set address-cells and size cells for muram
[POWERPC] Convert StorCenter DTS file to /dts-v1/ format.
[POWERPC] 86xx: Convert all 86xx DTS files to /dts-v1/ format.
[PPC] Remove 85xx from arch/ppc
[PPC] Remove 83xx from arch/ppc
...
This patch adds support for setting up a fixed IOMMU mapping on certain
cell machines. For 64-bit devices this avoids the performance overhead of
mapping and unmapping pages at runtime. 32-bit devices are unable to use
the fixed mapping.
The fixed mapping is established at boot, and maps all of physical memory
1:1 into device space at some offset. On machines with < 30 GB of memory
we set up the fixed mapping immediately above the normal IOMMU window.
For example a machine with 4GB of memory would end up with the normal
IOMMU window from 0-2GB and the fixed mapping window from 2GB to 6GB. In
this case a 64-bit device wishing to DMA to 1GB would be told to DMA to
3GB, plus any offset required by firmware. The firmware offset is encoded
in the "dma-ranges" property.
On machines with 30GB or more of memory, we are unable to place the fixed
mapping above the normal IOMMU window as we would run out of address space.
Instead we move the normal IOMMU window to coincide with the hash page
table. This region does not need to be part of the fixed mapping as no
device should ever be DMA'ing to it. We then set up the fixed mapping
from 0 to 32GB.
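The address arithmetic in the 4GB example can be sketched as follows. This is an illustration only: the structure, its field names, and both functions are hypothetical, and the 2GB dynamic window is simply the figure used in the example above.

```c
#include <stdint.h>

#define GB (1ULL << 30)

/* Hypothetical description of the window layout (names are illustrative). */
struct fixed_map {
	uint64_t dyn_base;	/* start of the normal (dynamic) IOMMU window */
	uint64_t dyn_size;	/* size of the normal IOMMU window */
	uint64_t fixed_base;	/* device-space start of the 1:1 fixed mapping */
	uint64_t fw_offset;	/* firmware offset from "dma-ranges" */
};

/* Small-memory case: the fixed window sits directly above the dynamic one. */
static struct fixed_map layout_small(uint64_t fw_offset)
{
	struct fixed_map m = { 0, 2 * GB, 2 * GB, fw_offset };
	return m;
}

/* Address a 64-bit device is told to use for a given physical address. */
static uint64_t fixed_dma_addr(const struct fixed_map *m, uint64_t phys)
{
	return phys + m->fixed_base + m->fw_offset;
}
```

With a firmware offset of 0, a DMA to physical 1GB maps to device address 3GB, matching the example in the text.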
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Split out the ioid fetching and checking logic so we can use it elsewhere
in a subsequent patch.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Add support to cell_iommu_setup_page_tables() for handling two windows,
the dynamic window and the fixed window. A fixed window size of 0
indicates that there is no fixed window at all.
Currently there are no callers who pass a non-zero fixed window, but the
upcoming fixed IOMMU mapping patch will change that.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Split the IOMMU logic out from cell_dma_dev_setup() into a separate
function. If we're not using dma_direct_ops or dma_iommu_ops we don't
know what the hell's going on, so BUG.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Split cell_iommu_setup_hardware() into two parts. Split the page table
setup into cell_iommu_setup_page_tables() and the bits that kick the
hardware into cell_iommu_enable_hardware().
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Split out the logic that allocates a struct iommu into a separate
function. This can fail; however, the calling code has never cared, so
just return if we can't allocate an iommu.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Paul Mackerras <paulus@samba.org>
In order to support the fixed IOMMU mapping (in a subsequent patch),
we need the hash table to be inside the IOMMU's DMA window. This is
usually 2G, but let's make sure the hash table is under 1G as that
will satisfy the IOMMU requirements and also means the hash table will
be on node 0.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Paul Mackerras <paulus@samba.org>
* git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86:
alpha: fix x86.git merge build error
ia64: on UP percpu variables are not small memory model
x86: fix arch/x86/kernel/test_nx.c modular build bug
s390: use generic percpu linux-2.6.git
POWERPC: use generic per cpu
ia64: use generic percpu
SPARC64: use generic percpu
percpu: change Kconfig to HAVE_SETUP_PER_CPU_AREA
modules: fold percpu_modcopy into module.c
x86: export copy_from_user_ll_nocache[_nozero]
x86: fix duplicated TIF on 64-bit
* git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus: (27 commits)
lguest: use __PAGE_KERNEL instead of _PAGE_KERNEL
lguest: Use explicit includes rather than indirect
lguest: get rid of lg variable assignments
lguest: change gpte_addr header
lguest: move changed bitmap to lg_cpu
lguest: move last_pages to lg_cpu
lguest: change last_guest to last_cpu
lguest: change spte_addr header
lguest: per-vcpu lguest pgdir management
lguest: make pending notifications per-vcpu
lguest: makes special fields be per-vcpu
lguest: per-vcpu lguest task management
lguest: replace lguest_arch with lg_cpu_arch.
lguest: make registers per-vcpu
lguest: make emulate_insn receive a vcpu struct.
lguest: map_switcher_in_guest() per-vcpu
lguest: per-vcpu interrupt processing.
lguest: per-vcpu lguest timers
lguest: make hypercalls use the vcpu struct
lguest: make write() operation smp aware
...
Manual conflict resolved (maybe even correctly, who knows) in
drivers/lguest/x86/core.c
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-2.6:
PPC: Fix powerpc vio_find_name to not use devices_subsys
Driver core: add bus_find_device_by_name function
Module: check to see if we have a built in module with the same name
x86: fix runtime error in arch/x86/kernel/cpu/mcheck/mce_amd_64.c
Driver core: Fix up build when CONFIG_BLOCK=N
Sparc64 has a way of providing the base address for the per cpu area of the
currently executing processor in a global register.
Sparc64 also provides a way to calculate the address of a per cpu area
from a base address instead of performing an array lookup.
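A rough user-space sketch of the difference between the two schemes, assuming equally sized per cpu areas spaced by a power of two. All names here (`offset_table`, `percpu_base`, `percpu_shift`, and the functions) are hypothetical and chosen for illustration only.

```c
#define NR_CPUS 4

/* Generic scheme: one offset per CPU, found by array lookup. */
static unsigned long offset_table[NR_CPUS];

/* Computed scheme: a base address plus a per-CPU stride of 1 << shift. */
static unsigned long percpu_base;
static unsigned int percpu_shift;

static void percpu_setup(unsigned long base, unsigned int shift)
{
	percpu_base = base;
	percpu_shift = shift;
	for (unsigned int cpu = 0; cpu < NR_CPUS; cpu++)
		offset_table[cpu] = base + ((unsigned long)cpu << shift);
}

/* Needs a memory load from the table. */
static unsigned long percpu_offset_lookup(unsigned int cpu)
{
	return offset_table[cpu];
}

/* Needs only a shift and an add. */
static unsigned long percpu_offset_computed(unsigned int cpu)
{
	return percpu_base + ((unsigned long)cpu << percpu_shift);
}
```

Both functions return the same value; the computed form avoids the table load. For the currently executing processor the text describes keeping the base in a global register, which this sketch does not model.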
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
percpu_modcopy() is defined multiple times in arch files. However, the only
user is module.c. Put a static definition into module.c and remove
the definitions from the arch files.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
With the sg table code, every SCSI driver is now either chain capable
or broken (or has sg_tablesize set so chaining is never activated), so
there's no need to have a check in the host template.
Also tidy up the code by moving the scatterlist size defines into the
SCSI includes and permit the last entry of the scatterlist pools not
to be a power of two.
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Migrating the apic timer in the critical section is not very nice, and is
absolutely horrible with the real-time port. Move migration to the regular
vcpu execution path, triggered by a new bitflag.
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Avi Kivity <avi@qumranet.com>
When preparing to enter the guest, if an interrupt comes in while
preemption is disabled but interrupts are still enabled, we miss a
preemption point. Fix by explicitly checking whether we need to
reschedule.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Otherwise we re-initialize the mmu caches, which will fail since the
caches are already registered, which will cause us to deinitialize said caches.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Right now rmap_remove won't set the page as dirty if the shadow pte
that pointed to this page had write access and then became readonly.
This patch fixes that by setting the page as dirty when an spte changes
from write to readonly access.
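The rule can be sketched in isolation like this. It is a simplified model, not the KVM code: `struct page_state` and `set_spte()` are hypothetical, and the writable bit is assumed to be bit 1 as in x86 page table entries.

```c
#include <stdbool.h>
#include <stdint.h>

#define SPTE_WRITABLE (1ULL << 1)	/* assumed: x86 PTE writable bit */

/* Hypothetical stand-in for the page's dirty tracking. */
struct page_state {
	bool dirty;
};

/*
 * Install a new spte value. If the old spte was writable and the new
 * one is not, the guest may have written the page, so mark it dirty.
 */
static void set_spte(uint64_t *sptep, uint64_t new_spte,
		     struct page_state *page)
{
	bool was_writable = (*sptep & SPTE_WRITABLE) != 0;
	bool is_writable = (new_spte & SPTE_WRITABLE) != 0;

	if (was_writable && !is_writable)
		page->dirty = true;
	*sptep = new_spte;
}
```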
Signed-off-by: Izik Eidus <izike@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
When executing a test program called "crashme", we found the KVM guest cannot
survive more than ten seconds before encountering a kernel panic. The basic concept
of "crashme" is generating random assembly code and trying to execute it.
After some fixes to the emulator's instruction validity checks, we found it hard
to get the current emulator to handle invalid instructions correctly, because the
#UD trap used for hypercall patching caused trouble. The problem is, if the opcode
itself was OK, but the combination of opcode and modrm_reg was invalid, and one
operand of the opcode was memory (SrcMem or DstMem), the emulator will fetch
the memory operand first rather than checking the validity, and may encounter
an error there. For example, ".byte 0xfe, 0x34, 0xcd" has this problem.
In the patch, we simply check whether the invalid opcode is vmcall/vmmcall;
if it is not, we return from emulate_instruction() and inject a #UD into the
guest. With the patch, the guest ran for more than 12 hours.
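A small sketch of the decision described above, outside of any emulator. The function and enum names are hypothetical; the byte sequences are the standard encodings of vmcall (0f 01 c1) and vmmcall (0f 01 d9).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum ud_action {
	UD_PATCH_HYPERCALL,	/* let the hypercall patching path handle it */
	UD_INJECT_TO_GUEST,	/* reflect the #UD back into the guest */
};

/* True if the bytes at insn are a vmcall or vmmcall instruction. */
static bool is_hypercall_insn(const uint8_t *insn, size_t len)
{
	static const uint8_t vmcall[] = { 0x0f, 0x01, 0xc1 };
	static const uint8_t vmmcall[] = { 0x0f, 0x01, 0xd9 };

	if (len < sizeof(vmcall))
		return false;
	return memcmp(insn, vmcall, sizeof(vmcall)) == 0 ||
	       memcmp(insn, vmmcall, sizeof(vmmcall)) == 0;
}

/* Decide what to do with a #UD trap taken on the given instruction bytes. */
static enum ud_action handle_ud(const uint8_t *insn, size_t len)
{
	return is_hypercall_insn(insn, len) ? UD_PATCH_HYPERCALL
					    : UD_INJECT_TO_GUEST;
}
```

The ".byte 0xfe, 0x34, 0xcd" sequence from the text is not a hypercall, so under this sketch it would be reflected to the guest rather than emulated.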
Signed-off-by: Feng (Eric) Liu <eric.e.liu@intel.com>
Signed-off-by: Sheng Yang <sheng.yang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
If some other cpu steals mmu pages between our check and an attempt to
allocate, we can run out of mmu pages. Fix by moving the check into the
same critical section as the allocation.
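The shape of the fix, sketched with a toy pool and a C11 `atomic_flag` spinlock. Everything here is hypothetical and simplified: the real code deals with mmu pages and its own locking, and does not necessarily fail the way this sketch does when the pool is empty.

```c
#include <stdatomic.h>

/* Hypothetical pool guarded by a tiny spinlock. */
struct mmu_pool {
	atomic_flag lock;
	int free_pages;
};

static void pool_lock(struct mmu_pool *p)
{
	while (atomic_flag_test_and_set_explicit(&p->lock,
						 memory_order_acquire))
		;	/* spin until the lock is released */
}

static void pool_unlock(struct mmu_pool *p)
{
	atomic_flag_clear_explicit(&p->lock, memory_order_release);
}

/*
 * Take one page from the pool. The availability check and the
 * allocation happen under the same lock, so no other CPU can take
 * the last page between the two steps. Returns 0 on success, -1 if
 * the pool is empty.
 */
static int pool_alloc_page(struct mmu_pool *p)
{
	int ret = -1;

	pool_lock(p);
	if (p->free_pages > 0) {
		p->free_pages--;
		ret = 0;
	}
	pool_unlock(p);
	return ret;
}
```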
Signed-off-by: Avi Kivity <avi@qumranet.com>
Convert the synchronization of the shadow handling to a separate mmu_lock
spinlock.
Also guard fetch() by mmap_sem in read-mode to protect against alias
and memslot changes.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Since gfn_to_page() is a sleeping function, and we want to make the core mmu
spinlocked, we need to pass the page from the walker context (which can sleep)
to the shadow context (which cannot).
[marcelo: avoid recursive locking of mmap_sem]
Signed-off-by: Avi Kivity <avi@qumranet.com>
In preparation for a mmu spinlock, add kvm_read_guest_atomic()
and use it in fetch() and prefetch_page().
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Do not hold kvm->lock mutex across the entire pagefault code,
only acquire it in places where it is necessary, such as mmu
hash list, active list, rmap and parent pte handling.
Allow concurrent guest walkers by switching walk_addr() to use
mmap_sem in read-mode.
And get rid of the lockless __gfn_to_page.
[avi: move kvm_mmu_pte_write() locking inside the function]
[avi: add locking for real mode]
[avi: fix cmpxchg locking]
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
This adds a mechanism for exposing the virtual apic tpr to the guest, and a
protocol for letting the guest update the tpr without causing a vmexit if
conditions allow (e.g. there is no interrupt pending with a higher priority
than the new tpr).
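
A rough sketch of the example condition, not the actual protocol: it assumes the guest-side check only compares priority classes (the upper four bits of a vector or of the tpr). The function name and the NO_PENDING_IRQ sentinel are made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Sentinel for "no interrupt pending" (assumption of this sketch). */
#define NO_PENDING_IRQ (-1)

/*
 * Decide whether a tpr write has to trap. An exit is only needed when
 * a pending interrupt has a higher priority class than the new tpr,
 * because only then does the write make an interrupt deliverable.
 */
bool tpr_write_needs_exit(uint8_t new_tpr, int highest_pending_vector)
{
    if (highest_pending_vector == NO_PENDING_IRQ)
        return false;
    /* Priority class is bits 7:4 of both the vector and the tpr. */
    return (highest_pending_vector >> 4) > (new_tpr >> 4);
}
```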
Signed-off-by: Avi Kivity <avi@qumranet.com>
Add a facility to report on accesses to the local apic tpr even if the
local apic is emulated in the kernel. This is basically a hack that
allows userspace to patch Windows which tends to bang on the tpr a lot.
Signed-off-by: Avi Kivity <avi@qumranet.com>
This can help diagnosing what the guest is trying to do. In many cases
we can get away with partial emulation of msrs.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Host side TLB flush can be merged together if multiple
spte need to be write-protected.
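
A minimal sketch of the batching idea, assuming a flat array of sptes and a stand-in flush_remote_tlbs() that only counts calls. The names are illustrative and do not mirror the KVM rmap code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SPTE_WRITABLE (1ull << 1)   /* x86 R/W bit */

static int tlb_flush_count;         /* counts stand-in flushes */

/* Stand-in for the expensive host-side TLB flush. */
static void flush_remote_tlbs(void)
{
    tlb_flush_count++;
}

/*
 * Write-protect every spte in the array, but flush at most once,
 * after the loop, instead of once per modified entry.
 */
void write_protect_sptes(uint64_t *sptes, size_t n)
{
    bool need_flush = false;

    for (size_t i = 0; i < n; i++) {
        if (sptes[i] & SPTE_WRITABLE) {
            sptes[i] &= ~SPTE_WRITABLE;
            need_flush = true;
        }
    }
    if (need_flush)
        flush_remote_tlbs();
}
```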
Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Moving kvm_vcpu_kick() to x86.c. Since it should be
common for all archs, put its declarations in <linux/kvm_host.h>
Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Move ioapic code to common, since IA64 also needs it.
Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
This paves the way for multiple architecture support. Note that while
ioapic.c could potentially be shared with ia64, it is also moved.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Currently, make headers_check barfs due to <asm/kvm.h>, which <linux/kvm.h>
includes, not existing. Rather than add a zillion <asm/kvm.h>s, export kvm.h
only if the arch actually supports it.
Signed-off-by: Avi Kivity <avi@qumranet.com>
memnode.map is an s16 array now because nodeid is 16 bits wide,
so nodemap_size needs to be increased to match that width.
Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
one early crash on one 8 node 256g machine:
Command line: console=uart8250,io,0x3f8,115200n8 initrd=kernel.org/mydisk11_x86_64.gz rw root=/dev/ram0 debug initcall_debug apic=debug acpi.debug_level=0x0000000f pci=routeirq ip=dhcp load_ramdisk=1 ramdisk_size=131072 BOOT_IMAGE=kernel.org/bzImage_2.6.25_k8.1
BIOS-provided physical RAM map:
BIOS-e820: 0000000000000000 - 000000000009bc00 (usable)
BIOS-e820: 000000000009bc00 - 00000000000a0000 (reserved)
BIOS-e820: 00000000000e6000 - 0000000000100000 (reserved)
BIOS-e820: 0000000000100000 - 00000000dffe0000 (usable)
BIOS-e820: 00000000dffe0000 - 00000000dffee000 (ACPI data)
BIOS-e820: 00000000dffee000 - 00000000dffff050 (ACPI NVS)
BIOS-e820: 00000000dffff050 - 00000000e0000000 (reserved)
BIOS-e820: 00000000fec00000 - 00000000fec01000 (reserved)
BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
BIOS-e820: 00000000ff700000 - 0000000100000000 (reserved)
BIOS-e820: 0000000100000000 - 0000004020000000 (usable)
Early serial console at I/O port 0x3f8 (options '115200n8')
console [uart0] enabled
end_pfn_map = 67239936
Kernel panic - not syncing: Duplicated early reservation d40000-e42000
Pid: 0, comm: swapper Not tainted 2.6.24-smp-g5a514e21-dirty #3
Call Trace:
[<ffffffff80221545>] lapic_get_maxlvt+0x0/0x10
[<ffffffff80221657>] clear_local_APIC+0x5/0xcf
[<ffffffff80221726>] disable_local_APIC+0x5/0x17
[<ffffffff8021fe16>] smp_send_stop+0x46/0x4c
[<ffffffff80235293>] panic+0x94/0x13e
[<ffffffff80bc3b03>] sctp_eps_proc_init+0x12/0x34
[<ffffffff80b9f1c5>] reserve_early+0x30/0x6c
[<ffffffff80803925>] init_memory_mapping+0x2cd/0x2dc
[<ffffffff80b9dc01>] setup_arch+0x21f/0x44e
[<ffffffff80b978be>] start_kernel+0x6f/0x2c7
[<ffffffff80b971cc>] _sinittext+0x1cc/0x1d3
it turns out there is overlap between pgtable and bss...
in System.map we have
ffffffff80d40420 b rsi_table
ffffffff80d40620 B krb5_seq_lock
ffffffff80d40628 b i.20437
ffffffff80d40630 b xprt_rdma_inline_write_padding
ffffffff80d40638 b sunrpc_table_header
ffffffff80d40640 b zero
ffffffff80d40644 b min_memreg
ffffffff80d40648 b rpcrdma_tk_lock_g
ffffffff80d40650 B sctp_assocs_id_lock
ffffffff80d40658 B proc_net_sctp
ffffffff80d40660 B sctp_assocs_id
ffffffff80d40680 B sysctl_sctp_mem
ffffffff80d40690 B sysctl_sctp_rmem
ffffffff80d406a0 B sysctl_sctp_wmem
ffffffff80d406b0 b sctp_ctl_socket
ffffffff80d406b8 b sctp_pf_inet6_specific
ffffffff80d406c0 b sctp_pf_inet_specific
ffffffff80d406c8 b sctp_af_v4_specific
ffffffff80d406d0 b sctp_af_v6_specific
ffffffff80d406d8 b sctp_rand.33270
ffffffff80d406dc b sctp_memory_pressure
ffffffff80d406e0 b sctp_sockets_allocated
ffffffff80d406e4 b sctp_memory_allocated
ffffffff80d406e8 b sctp_sysctl_header
ffffffff80d406f0 b zero
ffffffff80d406f4 A __bss_stop
ffffffff80d406f4 A _end
need to round up table_start to PAGE_SIZE.
also make the panic more informative.
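
The rounding can be checked with a small sketch. It assumes 4 KiB pages and that the kernel image is mapped at ffffffff80000000 with a physical base of 0, so physical = virtual - base. The helper names are illustrative, not the kernel's.

```c
#include <stdint.h>

#define PAGE_SIZE_BYTES 4096ull
/* Assumed x86_64 kernel image mapping base (physical base taken as 0). */
#define KERNEL_MAP_BASE 0xffffffff80000000ull

/* Round addr up to the next page boundary (no-op if already aligned). */
uint64_t round_up_page(uint64_t addr)
{
    return (addr + PAGE_SIZE_BYTES - 1) & ~(PAGE_SIZE_BYTES - 1);
}

/* Translate a kernel image virtual address under the assumption above. */
uint64_t kernel_virt_to_phys(uint64_t vaddr)
{
    return vaddr - KERNEL_MAP_BASE;
}
```

With the System.map value above, _end at ffffffff80d406f4 maps to physical 0xd406f4 under that assumption, and rounding up gives 0xd41000, the first page boundary past bss. The reservation in the panic starts at d40000, inside the page that still holds the tail of bss.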
Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This just adds the PCI IDs of AMD's family 10h and 11h CPU's northbridges to
k8topology discovery.
Signed-off-by: Joachim Deguara <joachim.deguara@amd.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Acked-by: Yinghai Lu <yinghai.lu@sun.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Put appropriate pagetable update hooks in so that paravirt knows
what's going on in there.
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Use a standard list threaded through page->lru for maintaining the pgd
list on PAE. This is the same as 64-bit, and seems saner than using a
non-standard list via page->index.
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This patch adds a new configuration option, which adds support for a new
early_param which gets checked in arch/x86/kernel/setup_{32,64}.c:setup_arch()
to decide whether OHCI-1394 FireWire controllers should be initialized and
enabled for physical DMA access to allow remote debugging of early problems
like issues in ACPI or other subsystems which are executed very early.
If the config option is not enabled, no code is changed, and if the boot
parameter is not given, no new code is executed, and independent of that,
all new code is freed after boot, so the config option can be even enabled
in standard, non-debug kernels.
With specialized tools, it is then possible to get debugging information
from machines which have no serial ports (notebooks) such as the printk
buffer contents, or any data which can be referenced from global pointers,
if it is stored below the 4GB limit and even memory dumps of the physical
RAM region below the 4GB limit can be taken without any cooperation from the
CPU of the host, so the machine can be crashed early, it does not matter.
In the extreme, even kernel debuggers can be accessed in this way. I wrote
a small kgdb module and an accompanying gdb stub for FireWire which allows
gdb to talk to kgdb using remote memory reads and writes over FireWire.
A version of the gdb stub for FireWire is able to read all global data
from a system which is running a normal kernel without any kernel debugger,
without any interruption or support of the system's CPU. That way, e.g. the
task struct and so on can be read and even manipulated when the physical DMA
access is granted.
A HOWTO is included in this patch, in Documentation/debugging-via-ohci1394.txt
and I've put a copy online at
ftp://ftp.suse.de/private/bk/firewire/docs/debugging-via-ohci1394.txt
It also has links to all the tools which are available to make use of it.
Another copy of it is online at:
ftp://ftp.suse.de/private/bk/firewire/kernel/ohci1394_dma_early-v2.diff
Signed-Off-By: Bernhard Kaindl <bk@suse.de>
Tested-By: Thomas Renninger <trenn@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
In x86 PAE mode, stop treating pmds as a special case. Previously
they were always allocated and freed with the pgd. This modifies the
code to be the same as 64-bit mode, where they are allocated on
demand.
This is a step on the way to unifying 32/64-bit pagetable allocation
as much as possible.
There is a complicating wart, however. When you install a new
reference to a pmd in the pgd, the processor isn't guaranteed to see
it unless you reload cr3. Since reloading cr3 also has the
side-effect of flushing the tlb, this is an expense that we want to
avoid wherever possible.
This patch simply avoids reloading cr3 unless the update is to the
current pagetable. Later patches will optimise this further.
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: William Irwin <wli@holomorphy.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The change from current to tsk in do_page_fault is safe as
this is set at the very beginning of the function.
Removes a likely() annotation from the 64-bit version, this
could have instead been added to 32-bit.
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
When changing a kernel page from RO->RW, it's OK to leave stale TLB
entries around, since doing a global flush is expensive and they pose
no security problem. They can, however, generate a spurious fault,
which we should catch and simply return from (which will have the
side-effect of reloading the TLB to the current PTE).
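
A minimal sketch of such a check, limited to the RO->RW case described here. It assumes the caller has already looked up the current PTE for the faulting address; the function name is illustrative, and the bit positions are the architectural x86 ones.

```c
#include <stdbool.h>
#include <stdint.h>

/* x86 page-fault error code bits */
#define PF_PROT  (1u << 0)   /* fault was a protection violation */
#define PF_WRITE (1u << 1)   /* fault was caused by a write */

/* x86 PTE bits */
#define PTE_PRESENT (1ull << 0)
#define PTE_RW      (1ull << 1)

/*
 * A write protection fault is spurious when the current PTE already
 * allows the write: the CPU acted on a stale read-only TLB entry.
 * Returning from the fault is enough, the retry reloads the TLB.
 */
bool fault_is_spurious(uint32_t error_code, uint64_t current_pte)
{
    /* Only a write that hit a protection violation fits this case. */
    if ((error_code & (PF_PROT | PF_WRITE)) != (PF_PROT | PF_WRITE))
        return false;
    return (current_pte & (PTE_PRESENT | PTE_RW)) == (PTE_PRESENT | PTE_RW);
}
```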
This can occur when running under Xen, because it frequently changes
kernel pages from RW->RO->RW to implement Xen's pagetable semantics.
It could also occur when using CONFIG_DEBUG_PAGEALLOC, since it avoids
doing a global TLB flush after changing page permissions.
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
On !PAE 32-bit, _PAGE_NX will be 0, making is_prefetch always
return early. The test is sufficient on PAE as __supported_pte_mask
is updated in the same places as nx_enabled in init_32.c which also
takes disable_nx into account.
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Unify includes in moved fault.c.
Modify Makefiles to pick up unified file.
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Elimination of these ifdefs can be done in a unified file.
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
It's about time to get on with unifying these files, elimination
of the ugly ifdefs can occur in the unified file.
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
printk fixes. NOP in terms of functionality, but strings got
a bit larger due to the KERN_ markers that were added.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This patch fixes some bugs in the EFI memory handling code.
- On x86_64, it is possible that EFI memory map can not be mapped via
identity map, so efi_map_memmap is removed, just use early_ioremap.
- On i386, the EFI memory map mapping takes effect across paging_init,
so it is not necessary to use efi_map_memmap.
- EFI memory map is unmapped in efi_enter_virtual_mode to avoid
early_ioremap leak.
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This patch makes the BOOT_EFI reboot_type usable on i386 too, because the
corresponding reboot code of i386 and x86_64 has been merged.
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This changes the oops dumping format for page faults to
be similar between X86_32 and 64.
This is the first user of printk_address on X86_32.
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This will help when unifying the oops dumping code on 32/64
bit. No functional changes.
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Further towards unifying these files, add another helper
in same spirit as is_errata93.
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Further towards unifying these files, add another helper
in same spirit as is_errata93.
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cleanup the address calculations, which are necessary to identify the
high/low alias mappings of the kernel on 64 bit machines. Instead of
calling __pa/__va back and forth, calculate the physical address once
and base the other calculations on it. Add understandable constants so
we can use the already available within() helper. Also add comments,
which help mere mortals to understand what this code does.
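
For reference, a range helper of this kind can be as small as the sketch below. It assumes half-open ranges; the body is an illustration rather than a copy of the patch.

```c
#include <stdbool.h>

/* True when addr lies in the half-open range [start, end). */
bool within(unsigned long addr, unsigned long start, unsigned long end)
{
    return addr >= start && addr < end;
}
```

Computing the physical address once and passing it to a helper like this keeps each alias check to a single readable comparison.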
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
make CONFIG_DEBUG_PAGEALLOC universally available.
CONFIG_HIBERNATION and CONFIG_HUGETLBFS were disabling it, for no
particular reason.
If there are any unfixed bugs here we'll fix it, but do not disable
vital debugging facilities like that ..
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
debug incorrect/late access to init memory, by permanently unmapping
the init memory ranges. Depends on CONFIG_DEBUG_PAGEALLOC=y.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
It looks like a mismerge put the rodata self-check in the wrong spot; move
it to the right place after marking the .rodata section read only.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
clflush is sufficient to be issued on one CPU. The invalidation is
broadcast throughout the coherence domain.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
clflush is an unordered operation with respect to other memory
traffic, including other CLFLUSH instructions. This needs proper
fencing with mfence.
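
A userspace sketch of the fencing pattern, assuming an x86 compiler with SSE2 intrinsics and a 64-byte cache line (real code would take the line size from CPUID). The function name is illustrative and this is not the kernel implementation.

```c
#include <stddef.h>
#include <stdint.h>
#include <emmintrin.h>  /* _mm_clflush, _mm_mfence (SSE2) */

#define CACHE_LINE 64   /* assumed line size for this sketch */

/*
 * Flush every cache line that overlaps [addr, addr + size).
 * clflush is not ordered against other memory traffic, so the
 * loop is bracketed by mfence on both sides.
 */
void flush_cache_range(const void *addr, size_t size)
{
    uintptr_t p = (uintptr_t)addr & ~(uintptr_t)(CACHE_LINE - 1);
    uintptr_t end = (uintptr_t)addr + size;

    _mm_mfence();
    for (; p < end; p += CACHE_LINE)
        _mm_clflush((const void *)p);
    _mm_mfence();
}
```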
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The function name global_flush_tlb() suggests something different from
what the function really does. Rename it to cpa_flush_all(), which is an
understandable counterpart to cpa_flush_range().
no global visibility of the old API anymore.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Use clflush on CPUs which support this.
clflush is only used when the page attribute operation has been
successful. On CPUs which do not support clflush and in the case of
error the old fashioned global_flush_tlb() is called.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Convert cpa_set and cpa_clear to call the new set_clr function.
Separate out the debug helpers.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Create a set_and_clr function to avoid the duplicate loops. This also
allows combined operations as an optimization.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
To avoid the modification of the flush code for the clflush
implementation, move the flush into the set and clear functions and
provide helper functions for the debugging code.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Latest update; I now have 4 NX tests, but 2 fail so they're #if 0'd.
I also cleaned up the NX test code quite a bit, and got rid of the ugly
exception table sorting stuff.
From: Arjan van de Ven <arjan@linux.intel.com>
This patch adds testcases for the CONFIG_DEBUG_RODATA configuration option
as well as the NX CPU feature/mappings. Both testcases can move to tests/
once that patch gets merged into mainline.
(I'm half considering moving the rodata test into mm/init.c but I'll
wait with that until init.c is unified)
As part of this I had to fix a not-quite-right alignment in the vmlinux.lds.h
for the RODATA sections, which led to 1 page less being marked read only.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
When we free initmem, various rodata and CPA checks may have left
memory read only. This patch ensures that the memory is writable
before we free it.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
In Ingo's testing, he found a bug in the CPA selftest code. What would
happen is that the test would call change_page_attr_addr on a range of
memory, part of which was read only, part of which was writable. The
only thing the test wanted to change was the global bit...
What actually happened was that the selftest would take the permissions
of the first page, and then the change_page_attr_addr call would then
set the permissions of the entire range to this first page. In the
rodata section case, this resulted in pages after the .rodata becoming
read only... which made the kernel rather unhappy in many interesting
ways.
This is just another example of how dangerous the cpa API is (was); this
patch changes the test to use the incremental clear/set APIs
instead, and it changes the clear/set implementation to work on a 1 page
at a time basis.
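
The failure mode can be reproduced in a toy model. The sketch below assumes an array holding one protection value per page and uses made-up function names; it contrasts the "copy the first page's permissions" pattern with a per-page clear of just the global bit.

```c
#include <stddef.h>
#include <stdint.h>

#define PTE_RW     (1ull << 1)
#define PTE_GLOBAL (1ull << 8)

/* Buggy pattern: derive one value from page 0 and write it everywhere. */
void clear_global_copy_first(uint64_t *prot, size_t n)
{
    if (n == 0)
        return;
    uint64_t v = prot[0] & ~PTE_GLOBAL;
    for (size_t i = 0; i < n; i++)
        prot[i] = v;
}

/* Incremental pattern: touch only the requested bit, page by page. */
void clear_global_per_page(uint64_t *prot, size_t n)
{
    for (size_t i = 0; i < n; i++)
        prot[i] &= ~PTE_GLOBAL;
}
```

With a read-only first page followed by a writable one, the first function silently drops the RW bit of the second page, which is the effect described above. The second function leaves it alone.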
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The set_memory_* and set_pages_* family of API's currently requires the
callers to do a global tlb flush after the function call; forgetting this is
a very nasty deathtrap. This patch moves the global tlb flush into
each of the callers.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
change_page_attr_add is only used in pageattr.c now, so we can
make this function static.
change_page_attr() isn't used anywhere at all anymore; this function
is a really bad API anyway so just remove the bloat entirely.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
page_is_ram has had a FIXME for ages, which is a reminder to sanity check the
BIOS area between 640k and 1M, which is sometimes falsely reported as
RAM in the e820 tables.
Implement the sanity check. Move the BIOS range defines from
pageattr.c into e820.h to avoid duplicate defines.
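
A sketch of the sanity check, assuming 4 KiB pages and that the e820 lookup result is passed in as a flag. The function name and the flag parameter are illustrative; the real page_is_ram() walks the e820 table itself.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT_4K 12
#define BIOS_BEGIN 0x000a0000ull  /* 640k */
#define BIOS_END   0x00100000ull  /* 1M */

/*
 * e820_says_ram is the firmware's claim for this page. Pages inside
 * the BIOS area are never treated as RAM, whatever e820 reports.
 */
bool page_is_ram_checked(uint64_t pfn, bool e820_says_ram)
{
    uint64_t addr = pfn << PAGE_SHIFT_4K;

    if (addr >= BIOS_BEGIN && addr < BIOS_END)
        return false;
    return e820_says_ram;
}
```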
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
With the introduction of the new API, no driver or non-archcore code needs
to use c-p-a anymore, so this patch also deprecates the EXPORT_SYMBOL of CPA
(it's a horrible API after all).
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch converts various users of change_page_attr() to the new,
more intent driven set_page_*/set_memory_* API set.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Right now, if drivers or other code want to change, say, a cache attribute of a
page, the only API they have is change_page_attr(). c-p-a is a really bad API
for this, because it forces the caller to know *ALL* the attributes he wants
for the page, not just the 1 thing he wants to change. So code that wants to
set a page uncachable, needs to be aware of the NX status as well etc etc etc.
This patch introduces a set of new APIs for this, set_pages_<attr> and
set_memory_<attr>, that offer a logical change to the user, and leave all
attributes not implied by the requested logical change alone.
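
The difference to c-p-a can be sketched with plain bit operations. The helper names below are hypothetical and operate on a single protection value rather than on real page tables; the flag bit positions are the architectural x86 ones, and the choice of PCD alone for "uncached" is a simplification.

```c
#include <stdint.h>

#define PTE_RW  (1ull << 1)
#define PTE_PCD (1ull << 4)   /* page-level cache disable */
#define PTE_NX  (1ull << 63)

/* Set and clear only the requested bits; everything else is kept. */
uint64_t prot_set_clr(uint64_t prot, uint64_t set, uint64_t clr)
{
    return (prot | set) & ~clr;
}

/* Intent-driven wrappers: the caller names one logical change. */
uint64_t prot_set_uncached(uint64_t prot)
{
    return prot_set_clr(prot, PTE_PCD, 0);
}

uint64_t prot_set_readonly(uint64_t prot)
{
    return prot_set_clr(prot, 0, PTE_RW);
}
```

A caller that wants an uncached page no longer has to know, or restate, the NX or RW state of that page.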
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Unify the now identical ioremap_32.c and ioremap_64.c into the
same ioremap.c file. No code changed.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
When ioremap_page_range fails, then we can use remove_vm_area instead
of vunmap safely.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Use change_page_attr_addr() instead of change_page_attr(), which
simplifies the code significantly and matches the 64bit
implementation.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Make c_p_a unconditional for ioremap and iounmap. This ensures
complete consistency of the flags which are handed to
ioremap_page_range and the real flags in the mappings.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
64bit uses end_pfn_map and 32bit uses max_low_pfn. There are several
files which have #ifdef'ed defines which map either to end_pfn_map or
max_low_pfn. Replace this by a universal define and clean up all the
other instances.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Get rid of the duplicate define of ISA_START/END_ADDRESS and use the
same headers in 32 and 64 bit code.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The pgprot flags which are handed into ioremap_page_range() are
different to those which are set in change_page_attr(). The
ioremap_page_range flags are executable, while the c_p_a flags are
not.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The pgprot flags which are handed into ioremap_page_range() are
different to those which are set in change_page_attr(). The
ioremap_page_range flags are executable, while the c_p_a flags are
not. Also make the mappings global (which is a NOP currently on 32bit,
although CPUs from PPRO+ onwards support it, but that's a separate
fix.)
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
What the check_exec() function really is trying to do is enforce certain
bits in the pgprot that are required by the x86 architecture, but that
callers might not be aware of (such as NX bit exclusion of the BIOS
area for BIOS based PCI access; it's not uncommon to ioremap the BIOS
region for various purposes and normally ioremap() memory has the NX bit
set).
This patch turns the check_exec() function into static_protections()
which also is now used to make sure the kernel text area remains non-NX
and that the .rodata section remains read-only. If the architecture
ends up requiring more such mandatory prot settings for specific areas,
this is now a reasonable place to add these.
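
A rough sketch of the idea, assuming the BIOS, kernel text and .rodata areas are given as half-open address ranges in a hypothetical struct layout. The real function works on kernel symbols and pgprot_t; the names here are for illustration only.

```c
#include <stdint.h>

#define PTE_RW (1ull << 1)
#define PTE_NX (1ull << 63)

struct range { uint64_t start, end; };      /* half-open [start, end) */
struct layout { struct range bios, text, rodata; };

static int in_range(uint64_t addr, const struct range *r)
{
    return addr >= r->start && addr < r->end;
}

/*
 * Force the bits that the architecture requires for certain areas,
 * regardless of what the caller asked for.
 */
uint64_t static_protections_sketch(uint64_t prot, uint64_t addr,
                                   const struct layout *l)
{
    /* BIOS area and kernel text must stay executable. */
    if (in_range(addr, &l->bios) || in_range(addr, &l->text))
        prot &= ~PTE_NX;
    /* .rodata must stay read-only. */
    if (in_range(addr, &l->rodata))
        prot &= ~PTE_RW;
    return prot;
}
```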
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This patch fixes a bug of ioremap_nocache. ioremap_nocache() will call
__ioremap() with flags != 0 to do the real work, which will call
change_page_attr_addr() if phys_addr + size - 1 < (end_pfn_map << PAGE_SHIFT).
But some pages between 0 and end_pfn_map << PAGE_SHIFT are not mapped by
the identity map, which makes change_page_attr_addr fail.
This patch is based on latest x86 git and has been tested on x86_64 platform.
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This patch fixes a bug of change_page_attr/change_page_attr_addr on
Intel i386/x86_64 CPUs. After changing page attribute to be
executable with these functions, the page remains un-executable on
Intel i386/x86_64 CPU. Because on Intel i386/x86_64 CPU, only if the
"NX" bits of all three level page tables are cleared (PAE is enabled),
the corresponding page is executable (refer to section 4.13.2 of Intel
64 and IA-32 Architectures Software Developer's Manual). So, the bug
is fixed through clearing the "NX" bit of PMD when splitting the huge
PMD.
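
The rule can be written down as a small sketch. It assumes the entries of one page table walk are handed in as an array, top level first; the function name is illustrative.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PTE_NX (1ull << 63)

/*
 * A page is executable only if no entry along the walk has NX set.
 * Clearing NX in the PTE alone is not enough when the PMD above it
 * still carries the bit.
 */
bool walk_is_executable(const uint64_t *entries, size_t levels)
{
    for (size_t i = 0; i < levels; i++) {
        if (entries[i] & PTE_NX)
            return false;
    }
    return true;
}
```

A walk of { 0, PTE_NX, 0 } models the bug: the PTE was made executable, but the PMD created by the split kept NX, so the page stays non-executable.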
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
do some leftover cleanups in the now unified arch/x86/mm/pageattr.c
file.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
unify the now perfectly identical pageattr_32/64.c files - no code changed.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
backmerge 64-bit details into 32-bit pageattr.c.
the pageattr_32.c and pageattr_64.c files are now identical.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
careful: might change driver behavior - but this is the right
return value.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
prepare for the unification of the cpa code, by unifying the
lookup_address() logic between 32-bit and 64-bit.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
prepare for the unification of the cpa code, by unifying the
lookup_address() logic between 32-bit and 64-bit.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
get more testing of the c_p_a() code done by not turning off
PSE on DEBUG_PAGEALLOC.
this simplifies the early pagetable setup code, and tests
the largepage-splitup code quite heavily.
In the end, all the largepages will be split up pretty quickly,
so there's no difference to how DEBUG_PAGEALLOC worked before.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
further simplify cpa locking: since the largepage-split is a
slowpath, use the pgd_lock for the whole operation, instead
of the mmap_sem.
This also makes it suitable for DEBUG_PAGEALLOC purposes again.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
cpa self-test fixes. change_page_attr_addr() was buggy, it
passed in a virtual address as a physical one.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
further cpa largepage-split cleanups: make the splitup isolated
functionality, without leaking details back into __change_page_attr().
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
simplify 32-bit cpa largepage splitting: do a pure split and repeat
the pte lookup to get the new pte modified.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>