Commit Graph

8 Commits

Author SHA1 Message Date
Matthieu Castet 5bd5a45266 x86: Add NX protection for kernel data
This patch expands functionality of CONFIG_DEBUG_RODATA to set main
(static) kernel data area as NX.

The following steps are taken to achieve this:

 1. Linker script is adjusted so .text always starts and ends on a page bound
 2. Linker script is adjusted so .rodata always start and end on a page boundary
 3. NX is set for all pages from _etext through _end in mark_rodata_ro.
 4. free_init_pages() sets released memory NX in arch/x86/mm/init.c
 5. bios rom is set to x when pcibios is used.

The results of patch application may be observed in the diff of kernel page
table dumps:

pcibios:

 -- data_nx_pt_before.txt       2009-10-13 07:48:59.000000000 -0400
 ++ data_nx_pt_after.txt        2009-10-13 07:26:46.000000000 -0400
  0x00000000-0xc0000000           3G                           pmd
  ---[ Kernel Mapping ]---
 -0xc0000000-0xc0100000           1M     RW             GLB x  pte
 +0xc0000000-0xc00a0000         640K     RW             GLB NX pte
 +0xc00a0000-0xc0100000         384K     RW             GLB x  pte
 -0xc0100000-0xc03d7000        2908K     ro             GLB x  pte
 +0xc0100000-0xc0318000        2144K     ro             GLB x  pte
 +0xc0318000-0xc03d7000         764K     ro             GLB NX pte
 -0xc03d7000-0xc0600000        2212K     RW             GLB x  pte
 +0xc03d7000-0xc0600000        2212K     RW             GLB NX pte
  0xc0600000-0xf7a00000         884M     RW         PSE GLB NX pmd
  0xf7a00000-0xf7bfe000        2040K     RW             GLB NX pte
  0xf7bfe000-0xf7c00000           8K                           pte

No pcibios:

 -- data_nx_pt_before.txt       2009-10-13 07:48:59.000000000 -0400
 ++ data_nx_pt_after.txt        2009-10-13 07:26:46.000000000 -0400
  0x00000000-0xc0000000           3G                           pmd
  ---[ Kernel Mapping ]---
 -0xc0000000-0xc0100000           1M     RW             GLB x  pte
 +0xc0000000-0xc0100000           1M     RW             GLB NX pte
 -0xc0100000-0xc03d7000        2908K     ro             GLB x  pte
 +0xc0100000-0xc0318000        2144K     ro             GLB x  pte
 +0xc0318000-0xc03d7000         764K     ro             GLB NX pte
 -0xc03d7000-0xc0600000        2212K     RW             GLB x  pte
 +0xc03d7000-0xc0600000        2212K     RW             GLB NX pte
  0xc0600000-0xf7a00000         884M     RW         PSE GLB NX pmd
  0xf7a00000-0xf7bfe000        2040K     RW             GLB NX pte
  0xf7bfe000-0xf7c00000           8K                           pte

The patch has been originally developed for Linux 2.6.34-rc2 x86 by
Siarhei Liakh <sliakh.lkml@gmail.com> and Xuxian Jiang <jiang@cs.ncsu.edu>.

 -v1:  initial patch for 2.6.30
 -v2:  patch for 2.6.31-rc7
 -v3:  moved all code into arch/x86, adjusted credits
 -v4:  fixed ifdef, removed credits from CREDITS
 -v5:  fixed an address calculation bug in mark_nxdata_nx()
 -v6:  added acked-by and PT dump diff to commit log
 -v7:  minor adjustments for -tip
 -v8:  rework with the merge of "Set first MB as RW+NX"

Signed-off-by: Siarhei Liakh <sliakh.lkml@gmail.com>
Signed-off-by: Xuxian Jiang <jiang@cs.ncsu.edu>
Signed-off-by: Matthieu CASTET <castet.matthieu@free.fr>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: James Morris <jmorris@namei.org>
Cc: Andi Kleen <ak@muc.de>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Dave Jones <davej@redhat.com>
Cc: Kees Cook <kees.cook@canonical.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
LKML-Reference: <4CE2F82E.60601@free.fr>
[ minor cleanliness edits ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-11-18 12:52:04 +01:00
Thomas Gleixner d19f61f098 x86/PCI: Convert pci_config_lock to raw_spinlock
pci_config_lock must be a real spinlock in preempt-rt. Convert it to
raw_spinlock. No change for !RT kernels.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-05-11 12:01:09 -07:00
Tejun Heo 5a0e3ad6af include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files.  percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.

percpu.h -> slab.h dependency is about to be removed.  Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability.  As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.

  http://userweb.kernel.org/~tj/misc/slabh-sweep.py

The script does the followings.

* Scan files for gfp and slab usages and update includes such that
  only the necessary includes are there.  ie. if only gfp is used,
  gfp.h, if slab is used, slab.h.

* When the script inserts a new include, it looks at the include
  blocks and try to put the new include such that its order conforms
  to its surrounding.  It's put in the include block which contains
  core kernel includes, in the same order that the rest are ordered -
  alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
  doesn't seem to be any matching order.

* If the script can't find a place to put a new include (mostly
  because the file doesn't have fitting include block), it prints out
  an error message indicating which .h file needs to be added to the
  file.

The conversion was done in the following steps.

1. The initial automatic conversion of all .c files updated slightly
   over 4000 files, deleting around 700 includes and adding ~480 gfp.h
   and ~3000 slab.h inclusions.  The script emitted errors for ~400
   files.

2. Each error was manually checked.  Some didn't need the inclusion,
   some needed manual addition while adding it to implementation .h or
   embedding .c file was more appropriate for others.  This step added
   inclusions to around 150 files.

3. The script was run again and the output was compared to the edits
   from #2 to make sure no file was left behind.

4. Several build tests were done and a couple of problems were fixed.
   e.g. lib/decompress_*.c used malloc/free() wrappers around slab
   APIs requiring slab.h to be added manually.

5. The script was run on all .h files but without automatically
   editing them as sprinkling gfp.h and slab.h inclusions around .h
   files could easily lead to inclusion dependency hell.  Most gfp.h
   inclusion directives were ignored as stuff from gfp.h was usually
   wildly available and often used in preprocessor macros.  Each
   slab.h inclusion directive was examined and added manually as
   necessary.

6. percpu.h was updated not to include slab.h.

7. Build test were done on the following configurations and failures
   were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
   distributed build env didn't work with gcov compiles) and a few
   more options had to be turned off depending on archs to make things
   build (like ipr on powerpc/64 which failed due to missing writeq).

   * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
   * powerpc and powerpc64 SMP allmodconfig
   * sparc and sparc64 SMP allmodconfig
   * ia64 SMP allmodconfig
   * s390 SMP allmodconfig
   * alpha SMP allmodconfig
   * um on x86_64 SMP allmodconfig

8. percpu.h modifications were reverted so that it could be applied as
   a separate patch and serve as bisection point.

Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.

Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
2010-03-30 22:02:32 +09:00
Ingo Molnar 1164dd0099 x86: move mach-default/*.h files to asm/
We are getting rid of subarchitecture support - move the hook files
to asm/. (These are now stale and should be replaced with more explicit
runtime mechanisms - but the transition is simpler this way.)

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-29 14:16:51 +01:00
Jaswinder Singh Rajput 824877111c x86, pci: move arch/x86/pci/pci.h to arch/x86/include/asm/pci_x86.h
Impact: cleanup

Now that arch/x86/pci/pci.h is used in a number of other places as well,
move the lowlevel x86 pci definitions into the architecture include files.
(not to be confused with the existing arch/x86/include/asm/pci.h file,
which provides public details about x86 PCI)

Tested on: X86_32_UP, X86_32_SMP and X86_64_SMP

Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-29 18:17:36 +01:00
Greg Kroah-Hartman 1ba6ab11d8 PCI: remove initial bios sort of PCI devices on x86
We currently keep 2 lists of PCI devices in the system, one in the
driver core, and one all on its own.  This second list is sorted at boot
time, in "BIOS" order, to try to remain compatible with older kernels
(2.2 and earlier days).  There was also a "nosort" option to turn this
sorting off, to remain compatible with even older kernel versions, but
that just ends up being what we have been doing from 2.5 days...

Unfortunately, the second list of devices is not really ever used to 
determine the probing order of PCI devices or drivers[1].  That is done
using the driver core list instead.  This change happened back in the
early 2.5 days.

Relying on BIOS ording for the binding of drivers to specific device
names is problematic for many reasons, and userspace tools like udev
exist to properly name devices in a persistant manner if that is needed,
no reliance on the BIOS is needed.

Matt Domsch and others at Dell noticed this back in 2006, and added a
boot option to sort the PCI device lists (both of them) in a
breadth-first manner to help remain compatible with the 2.4 order, if
needed for any reason.  This option is not going away, as some systems
rely on them.

This patch removes the sorting of the internal PCI device list in "BIOS"
mode, as it's not needed at all anymore, and hasn't for many years.
I've also removed the PCI flags for this from some other arches that for
some reason defined them, but never used them.

This should not change the ordering of any drivers or device probing.

[1] The old-style pci_get_device and pci_find_device() still used this
sorting order, but there are very few drivers that use these functions,
as they are deprecated for use in this manner.  If for some reason, a
driver rely on the order and uses these functions, the breadth-first
boot option will resolve any problem.

Cc: Matt Domsch <Matt_Domsch@dell.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-04-20 21:46:58 -07:00
Ingo Molnar f5dbb55b99 fix BIOS PCI config cycle buglet causing ACPI boot regression
I figured out another ACPI related regression today.

randconfig testing triggered an early boot-time hang on a laptop of mine
(32-bit x86, config attached) - the screen was scrolling ACPI AML
exceptions [with no serial port and no early debugging available].

v2.6.24 works fine on that laptop with the same .config, so after a few
hours of bisection (had to restart it 3 times - other regressions
interacted), it honed in on this commit:

| 10270d4838 is first bad commit
|
| Author: Linus Torvalds <torvalds@woody.linux-foundation.org>
| Date:   Wed Feb 13 09:56:14 2008 -0800
|
|     acpi: fix acpi_os_read_pci_configuration() misuse of raw_pci_read()

reverting this commit ontop of -rc5 gave a correctly booting kernel.

But this commit fixes a real bug so the real question is, why did it
break the bootup?

After quite some head-scratching, the following change stood out:

-                               pci_id->bus = tu8;
+                               pci_id->bus = val;

pci_id->bus is defined as u16:

   struct acpi_pci_id {
           u16 segment;
           u16 bus;
   ...

and 'tu8' changed from u8 to u32. So previously we'd unconditionally
mask the return value of acpi_os_read_pci_configuration()
(raw_pci_read()) to 8 bits, but now we just trust whatever comes back
from the PCI access routines and only crop it to 16 bits.

But if the high 8 bits of that result contains any noise then we'll
write that into ACPI's PCI ID descriptor and confuse the heck out of the
rest of ACPI.

So lets check the PCI-BIOS code on that theory. We have this codepath
for 8-bit accesses (arch/x86/pci/pcbios.c:pci_bios_read()):

        switch (len) {
        case 1:
                __asm__("lcall *(%%esi); cld\n\t"
                        "jc 1f\n\t"
                        "xor %%ah, %%ah\n"
                        "1:"
                        : "=c" (*value),
                          "=a" (result)
                        : "1" (PCIBIOS_READ_CONFIG_BYTE),
                          "b" (bx),
                          "D" ((long)reg),
                          "S" (&pci_indirect));

Aha! The "=a" output constraint puts the full 32 bits of EAX into
*value. But if the BIOS's routines set any of the high bits to nonzero,
we'll return a value with more set in it than intended.

The other, more common PCI access methods (v1 and v2 PCI reads) clear
out the high bits already, for example pci_conf1_read() does:

        switch (len) {
        case 1:
                *value = inb(0xCFC + (reg & 3));

which explicitly converts the return byte up to 32 bits and zero-extends
it.

So zero-extending the result in the PCI-BIOS read routine fixes the
regression on my laptop. ( It might fix some other long-standing issues
we had with PCI-BIOS during the past decade ... ) Both 8-bit and 16-bit
accesses were buggy.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-03-10 18:09:05 -07:00
Thomas Gleixner fb9aa6f1d4 i386: move pci
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-10-11 11:16:36 +02:00