2005-06-23 00:07:49 -07:00
|
|
|
config SELECT_MEMORY_MODEL
|
|
|
|
def_bool y
|
|
|
|
depends on EXPERIMENTAL || ARCH_SELECT_MEMORY_MODEL
|
|
|
|
|
2005-06-23 00:07:42 -07:00
|
|
|
choice
|
|
|
|
prompt "Memory model"
|
2005-06-23 00:07:49 -07:00
|
|
|
depends on SELECT_MEMORY_MODEL
|
|
|
|
default DISCONTIGMEM_MANUAL if ARCH_DISCONTIGMEM_DEFAULT
|
[PATCH] sparsemem memory model
Sparsemem abstracts the use of discontiguous mem_maps[]. This kind of
mem_map[] is needed by discontiguous memory machines (like in the old
CONFIG_DISCONTIGMEM case) as well as memory hotplug systems. Sparsemem
replaces DISCONTIGMEM when enabled, and it is hoped that it can eventually
become a complete replacement.
A significant advantage over DISCONTIGMEM is that it's completely separated
from CONFIG_NUMA. When producing this patch, it became apparent in that NUMA
and DISCONTIG are often confused.
Another advantage is that sparse doesn't require each NUMA node's ranges to be
contiguous. It can handle overlapping ranges between nodes with no problems,
where DISCONTIGMEM currently throws away that memory.
Sparsemem uses an array to provide different pfn_to_page() translations for
each SECTION_SIZE area of physical memory. This is what allows the mem_map[]
to be chopped up.
In order to do quick pfn_to_page() operations, the section number of the page
is encoded in page->flags. Part of the sparsemem infrastructure enables
sharing of these bits more dynamically (at compile-time) between the
page_zone() and sparsemem operations. However, on 32-bit architectures, the
number of bits is quite limited, and may require growing the size of the
page->flags type in certain conditions. Several things might force this to
occur: a decrease in the SECTION_SIZE (if you want to hotplug smaller areas of
memory), an increase in the physical address space, or an increase in the
number of used page->flags.
One thing to note is that, once sparsemem is present, the NUMA node
information no longer needs to be stored in the page->flags. It might provide
speed increases on certain platforms and will be stored there if there is
room. But, if out of room, an alternate (theoretically slower) mechanism is
used.
This patch introduces CONFIG_FLATMEM. It is used in almost all cases where
there used to be an #ifndef DISCONTIG, because SPARSEMEM and DISCONTIGMEM
often have to compile out the same areas of code.
Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Martin Bligh <mbligh@aracnet.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Signed-off-by: Bob Picco <bob.picco@hp.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-23 00:07:54 -07:00
|
|
|
default SPARSEMEM_MANUAL if ARCH_SPARSEMEM_DEFAULT
|
2005-06-23 00:07:49 -07:00
|
|
|
default FLATMEM_MANUAL
|
2005-06-23 00:07:42 -07:00
|
|
|
|
2005-06-23 00:07:49 -07:00
|
|
|
config FLATMEM_MANUAL
|
2005-06-23 00:07:42 -07:00
|
|
|
bool "Flat Memory"
|
2006-01-06 00:12:07 -08:00
|
|
|
depends on !(ARCH_DISCONTIGMEM_ENABLE || ARCH_SPARSEMEM_ENABLE) || ARCH_FLATMEM_ENABLE
|
2005-06-23 00:07:42 -07:00
|
|
|
help
|
|
|
|
This option allows you to change some of the ways that
|
|
|
|
Linux manages its memory internally. Most users will
|
|
|
|
only have one option here: FLATMEM. This is normal
|
|
|
|
and a correct option.
|
|
|
|
|
[PATCH] sparsemem memory model
Sparsemem abstracts the use of discontiguous mem_maps[]. This kind of
mem_map[] is needed by discontiguous memory machines (like in the old
CONFIG_DISCONTIGMEM case) as well as memory hotplug systems. Sparsemem
replaces DISCONTIGMEM when enabled, and it is hoped that it can eventually
become a complete replacement.
A significant advantage over DISCONTIGMEM is that it's completely separated
from CONFIG_NUMA. When producing this patch, it became apparent in that NUMA
and DISCONTIG are often confused.
Another advantage is that sparse doesn't require each NUMA node's ranges to be
contiguous. It can handle overlapping ranges between nodes with no problems,
where DISCONTIGMEM currently throws away that memory.
Sparsemem uses an array to provide different pfn_to_page() translations for
each SECTION_SIZE area of physical memory. This is what allows the mem_map[]
to be chopped up.
In order to do quick pfn_to_page() operations, the section number of the page
is encoded in page->flags. Part of the sparsemem infrastructure enables
sharing of these bits more dynamically (at compile-time) between the
page_zone() and sparsemem operations. However, on 32-bit architectures, the
number of bits is quite limited, and may require growing the size of the
page->flags type in certain conditions. Several things might force this to
occur: a decrease in the SECTION_SIZE (if you want to hotplug smaller areas of
memory), an increase in the physical address space, or an increase in the
number of used page->flags.
One thing to note is that, once sparsemem is present, the NUMA node
information no longer needs to be stored in the page->flags. It might provide
speed increases on certain platforms and will be stored there if there is
room. But, if out of room, an alternate (theoretically slower) mechanism is
used.
This patch introduces CONFIG_FLATMEM. It is used in almost all cases where
there used to be an #ifndef DISCONTIG, because SPARSEMEM and DISCONTIGMEM
often have to compile out the same areas of code.
Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Martin Bligh <mbligh@aracnet.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Signed-off-by: Bob Picco <bob.picco@hp.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-23 00:07:54 -07:00
|
|
|
Some users of more advanced features like NUMA and
|
|
|
|
memory hotplug may have different options here.
|
|
|
|
DISCONTIGMEM is an more mature, better tested system,
|
|
|
|
but is incompatible with memory hotplug and may suffer
|
|
|
|
decreased performance over SPARSEMEM. If unsure between
|
|
|
|
"Sparse Memory" and "Discontiguous Memory", choose
|
|
|
|
"Discontiguous Memory".
|
|
|
|
|
|
|
|
If unsure, choose this option (Flat Memory) over any other.
|
2005-06-23 00:07:42 -07:00
|
|
|
|
2005-06-23 00:07:49 -07:00
|
|
|
config DISCONTIGMEM_MANUAL
|
2005-09-16 19:27:54 -07:00
|
|
|
bool "Discontiguous Memory"
|
2005-06-23 00:07:42 -07:00
|
|
|
depends on ARCH_DISCONTIGMEM_ENABLE
|
|
|
|
help
|
2005-06-23 00:07:50 -07:00
|
|
|
This option provides enhanced support for discontiguous
|
|
|
|
memory systems, over FLATMEM. These systems have holes
|
|
|
|
in their physical address spaces, and this option provides
|
|
|
|
more efficient handling of these holes. However, the vast
|
|
|
|
majority of hardware has quite flat address spaces, and
|
2007-10-20 02:46:58 +02:00
|
|
|
can have degraded performance from the extra overhead that
|
2005-06-23 00:07:50 -07:00
|
|
|
this option imposes.
|
|
|
|
|
|
|
|
Many NUMA configurations will have this as the only option.
|
|
|
|
|
2005-06-23 00:07:42 -07:00
|
|
|
If unsure, choose "Flat Memory" over this option.
|
|
|
|
|
[PATCH] sparsemem memory model
Sparsemem abstracts the use of discontiguous mem_maps[]. This kind of
mem_map[] is needed by discontiguous memory machines (like in the old
CONFIG_DISCONTIGMEM case) as well as memory hotplug systems. Sparsemem
replaces DISCONTIGMEM when enabled, and it is hoped that it can eventually
become a complete replacement.
A significant advantage over DISCONTIGMEM is that it's completely separated
from CONFIG_NUMA. When producing this patch, it became apparent in that NUMA
and DISCONTIG are often confused.
Another advantage is that sparse doesn't require each NUMA node's ranges to be
contiguous. It can handle overlapping ranges between nodes with no problems,
where DISCONTIGMEM currently throws away that memory.
Sparsemem uses an array to provide different pfn_to_page() translations for
each SECTION_SIZE area of physical memory. This is what allows the mem_map[]
to be chopped up.
In order to do quick pfn_to_page() operations, the section number of the page
is encoded in page->flags. Part of the sparsemem infrastructure enables
sharing of these bits more dynamically (at compile-time) between the
page_zone() and sparsemem operations. However, on 32-bit architectures, the
number of bits is quite limited, and may require growing the size of the
page->flags type in certain conditions. Several things might force this to
occur: a decrease in the SECTION_SIZE (if you want to hotplug smaller areas of
memory), an increase in the physical address space, or an increase in the
number of used page->flags.
One thing to note is that, once sparsemem is present, the NUMA node
information no longer needs to be stored in the page->flags. It might provide
speed increases on certain platforms and will be stored there if there is
room. But, if out of room, an alternate (theoretically slower) mechanism is
used.
This patch introduces CONFIG_FLATMEM. It is used in almost all cases where
there used to be an #ifndef DISCONTIG, because SPARSEMEM and DISCONTIGMEM
often have to compile out the same areas of code.
Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Martin Bligh <mbligh@aracnet.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Signed-off-by: Bob Picco <bob.picco@hp.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-23 00:07:54 -07:00
|
|
|
config SPARSEMEM_MANUAL
|
|
|
|
bool "Sparse Memory"
|
|
|
|
depends on ARCH_SPARSEMEM_ENABLE
|
|
|
|
help
|
|
|
|
This will be the only option for some systems, including
|
|
|
|
memory hotplug systems. This is normal.
|
|
|
|
|
|
|
|
For many other systems, this will be an alternative to
|
2005-09-16 19:27:54 -07:00
|
|
|
"Discontiguous Memory". This option provides some potential
|
[PATCH] sparsemem memory model
Sparsemem abstracts the use of discontiguous mem_maps[]. This kind of
mem_map[] is needed by discontiguous memory machines (like in the old
CONFIG_DISCONTIGMEM case) as well as memory hotplug systems. Sparsemem
replaces DISCONTIGMEM when enabled, and it is hoped that it can eventually
become a complete replacement.
A significant advantage over DISCONTIGMEM is that it's completely separated
from CONFIG_NUMA. When producing this patch, it became apparent in that NUMA
and DISCONTIG are often confused.
Another advantage is that sparse doesn't require each NUMA node's ranges to be
contiguous. It can handle overlapping ranges between nodes with no problems,
where DISCONTIGMEM currently throws away that memory.
Sparsemem uses an array to provide different pfn_to_page() translations for
each SECTION_SIZE area of physical memory. This is what allows the mem_map[]
to be chopped up.
In order to do quick pfn_to_page() operations, the section number of the page
is encoded in page->flags. Part of the sparsemem infrastructure enables
sharing of these bits more dynamically (at compile-time) between the
page_zone() and sparsemem operations. However, on 32-bit architectures, the
number of bits is quite limited, and may require growing the size of the
page->flags type in certain conditions. Several things might force this to
occur: a decrease in the SECTION_SIZE (if you want to hotplug smaller areas of
memory), an increase in the physical address space, or an increase in the
number of used page->flags.
One thing to note is that, once sparsemem is present, the NUMA node
information no longer needs to be stored in the page->flags. It might provide
speed increases on certain platforms and will be stored there if there is
room. But, if out of room, an alternate (theoretically slower) mechanism is
used.
This patch introduces CONFIG_FLATMEM. It is used in almost all cases where
there used to be an #ifndef DISCONTIG, because SPARSEMEM and DISCONTIGMEM
often have to compile out the same areas of code.
Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Martin Bligh <mbligh@aracnet.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Signed-off-by: Bob Picco <bob.picco@hp.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-23 00:07:54 -07:00
|
|
|
performance benefits, along with decreased code complexity,
|
|
|
|
but it is newer, and more experimental.
|
|
|
|
|
|
|
|
If unsure, choose "Discontiguous Memory" or "Flat Memory"
|
|
|
|
over this option.
|
|
|
|
|
2005-06-23 00:07:42 -07:00
|
|
|
endchoice
|
|
|
|
|
2005-06-23 00:07:49 -07:00
|
|
|
config DISCONTIGMEM
|
|
|
|
def_bool y
|
|
|
|
depends on (!SELECT_MEMORY_MODEL && ARCH_DISCONTIGMEM_ENABLE) || DISCONTIGMEM_MANUAL
|
|
|
|
|
[PATCH] sparsemem memory model
Sparsemem abstracts the use of discontiguous mem_maps[]. This kind of
mem_map[] is needed by discontiguous memory machines (like in the old
CONFIG_DISCONTIGMEM case) as well as memory hotplug systems. Sparsemem
replaces DISCONTIGMEM when enabled, and it is hoped that it can eventually
become a complete replacement.
A significant advantage over DISCONTIGMEM is that it's completely separated
from CONFIG_NUMA. When producing this patch, it became apparent in that NUMA
and DISCONTIG are often confused.
Another advantage is that sparse doesn't require each NUMA node's ranges to be
contiguous. It can handle overlapping ranges between nodes with no problems,
where DISCONTIGMEM currently throws away that memory.
Sparsemem uses an array to provide different pfn_to_page() translations for
each SECTION_SIZE area of physical memory. This is what allows the mem_map[]
to be chopped up.
In order to do quick pfn_to_page() operations, the section number of the page
is encoded in page->flags. Part of the sparsemem infrastructure enables
sharing of these bits more dynamically (at compile-time) between the
page_zone() and sparsemem operations. However, on 32-bit architectures, the
number of bits is quite limited, and may require growing the size of the
page->flags type in certain conditions. Several things might force this to
occur: a decrease in the SECTION_SIZE (if you want to hotplug smaller areas of
memory), an increase in the physical address space, or an increase in the
number of used page->flags.
One thing to note is that, once sparsemem is present, the NUMA node
information no longer needs to be stored in the page->flags. It might provide
speed increases on certain platforms and will be stored there if there is
room. But, if out of room, an alternate (theoretically slower) mechanism is
used.
This patch introduces CONFIG_FLATMEM. It is used in almost all cases where
there used to be an #ifndef DISCONTIG, because SPARSEMEM and DISCONTIGMEM
often have to compile out the same areas of code.
Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Martin Bligh <mbligh@aracnet.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Signed-off-by: Bob Picco <bob.picco@hp.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-23 00:07:54 -07:00
|
|
|
config SPARSEMEM
|
|
|
|
def_bool y
|
|
|
|
depends on SPARSEMEM_MANUAL
|
|
|
|
|
2005-06-23 00:07:49 -07:00
|
|
|
config FLATMEM
|
|
|
|
def_bool y
|
[PATCH] sparsemem memory model
Sparsemem abstracts the use of discontiguous mem_maps[]. This kind of
mem_map[] is needed by discontiguous memory machines (like in the old
CONFIG_DISCONTIGMEM case) as well as memory hotplug systems. Sparsemem
replaces DISCONTIGMEM when enabled, and it is hoped that it can eventually
become a complete replacement.
A significant advantage over DISCONTIGMEM is that it's completely separated
from CONFIG_NUMA. When producing this patch, it became apparent in that NUMA
and DISCONTIG are often confused.
Another advantage is that sparse doesn't require each NUMA node's ranges to be
contiguous. It can handle overlapping ranges between nodes with no problems,
where DISCONTIGMEM currently throws away that memory.
Sparsemem uses an array to provide different pfn_to_page() translations for
each SECTION_SIZE area of physical memory. This is what allows the mem_map[]
to be chopped up.
In order to do quick pfn_to_page() operations, the section number of the page
is encoded in page->flags. Part of the sparsemem infrastructure enables
sharing of these bits more dynamically (at compile-time) between the
page_zone() and sparsemem operations. However, on 32-bit architectures, the
number of bits is quite limited, and may require growing the size of the
page->flags type in certain conditions. Several things might force this to
occur: a decrease in the SECTION_SIZE (if you want to hotplug smaller areas of
memory), an increase in the physical address space, or an increase in the
number of used page->flags.
One thing to note is that, once sparsemem is present, the NUMA node
information no longer needs to be stored in the page->flags. It might provide
speed increases on certain platforms and will be stored there if there is
room. But, if out of room, an alternate (theoretically slower) mechanism is
used.
This patch introduces CONFIG_FLATMEM. It is used in almost all cases where
there used to be an #ifndef DISCONTIG, because SPARSEMEM and DISCONTIGMEM
often have to compile out the same areas of code.
Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Martin Bligh <mbligh@aracnet.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Signed-off-by: Bob Picco <bob.picco@hp.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-23 00:07:54 -07:00
|
|
|
depends on (!DISCONTIGMEM && !SPARSEMEM) || FLATMEM_MANUAL
|
|
|
|
|
|
|
|
config FLAT_NODE_MEM_MAP
|
|
|
|
def_bool y
|
|
|
|
depends on !SPARSEMEM
|
2005-06-23 00:07:49 -07:00
|
|
|
|
x86: lockless get_user_pages_fast()
Implement get_user_pages_fast without locking in the fastpath on x86.
Do an optimistic lockless pagetable walk, without taking mmap_sem or any
page table locks or even mmap_sem. Page table existence is guaranteed by
turning interrupts off (combined with the fact that we're always looking
up the current mm, means we can do the lockless page table walk within the
constraints of the TLB shootdown design). Basically we can do this
lockless pagetable walk in a similar manner to the way the CPU's pagetable
walker does not have to take any locks to find present ptes.
This patch (combined with the subsequent ones to convert direct IO to use
it) was found to give about 10% performance improvement on a 2 socket 8
core Intel Xeon system running an OLTP workload on DB2 v9.5
"To test the effects of the patch, an OLTP workload was run on an IBM
x3850 M2 server with 2 processors (quad-core Intel Xeon processors at
2.93 GHz) using IBM DB2 v9.5 running Linux 2.6.24rc7 kernel. Comparing
runs with and without the patch resulted in an overall performance
benefit of ~9.8%. Correspondingly, oprofiles showed that samples from
__up_read and __down_read routines that is seen during thread contention
for system resources was reduced from 2.8% down to .05%. Monitoring the
/proc/vmstat output from the patched run showed that the counter for
fast_gup contained a very high number while the fast_gup_slow value was
zero."
(fast_gup is the old name for get_user_pages_fast, fast_gup_slow is a
counter we had for the number of times the slowpath was invoked).
The main reason for the improvement is that DB2 has multiple threads each
issuing direct-IO. Direct-IO uses get_user_pages, and thus the threads
contend the mmap_sem cacheline, and can also contend on page table locks.
I would anticipate larger performance gains on larger systems, however I
think DB2 uses an adaptive mix of threads and processes, so it could be
that thread contention remains pretty constant as machine size increases.
In which case, we stuck with "only" a 10% gain.
The downside of using get_user_pages_fast is that if there is not a pte
with the correct permissions for the access, we end up falling back to
get_user_pages and so the get_user_pages_fast is a bit of extra work.
However this should not be the common case in most performance critical
code.
[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: build fix]
[akpm@linux-foundation.org: Kconfig fix]
[akpm@linux-foundation.org: Makefile fix/cleanup]
[akpm@linux-foundation.org: warning fix]
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Dave Kleikamp <shaggy@austin.ibm.com>
Cc: Andy Whitcroft <apw@shadowen.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Dave Kleikamp <shaggy@austin.ibm.com>
Cc: Badari Pulavarty <pbadari@us.ibm.com>
Cc: Zach Brown <zach.brown@oracle.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Reviewed-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25 19:45:24 -07:00
|
|
|
config HAVE_GET_USER_PAGES_FAST
|
|
|
|
bool
|
|
|
|
|
2005-06-23 00:07:47 -07:00
|
|
|
#
|
|
|
|
# Both the NUMA code and DISCONTIGMEM use arrays of pg_data_t's
|
|
|
|
# to represent different areas of memory. This variable allows
|
|
|
|
# those dependencies to exist individually.
|
|
|
|
#
|
|
|
|
config NEED_MULTIPLE_NODES
|
|
|
|
def_bool y
|
|
|
|
depends on DISCONTIGMEM || NUMA
|
2005-06-23 00:07:53 -07:00
|
|
|
|
|
|
|
config HAVE_MEMORY_PRESENT
|
|
|
|
def_bool y
|
[PATCH] sparsemem memory model
Sparsemem abstracts the use of discontiguous mem_maps[]. This kind of
mem_map[] is needed by discontiguous memory machines (like in the old
CONFIG_DISCONTIGMEM case) as well as memory hotplug systems. Sparsemem
replaces DISCONTIGMEM when enabled, and it is hoped that it can eventually
become a complete replacement.
A significant advantage over DISCONTIGMEM is that it's completely separated
from CONFIG_NUMA. When producing this patch, it became apparent in that NUMA
and DISCONTIG are often confused.
Another advantage is that sparse doesn't require each NUMA node's ranges to be
contiguous. It can handle overlapping ranges between nodes with no problems,
where DISCONTIGMEM currently throws away that memory.
Sparsemem uses an array to provide different pfn_to_page() translations for
each SECTION_SIZE area of physical memory. This is what allows the mem_map[]
to be chopped up.
In order to do quick pfn_to_page() operations, the section number of the page
is encoded in page->flags. Part of the sparsemem infrastructure enables
sharing of these bits more dynamically (at compile-time) between the
page_zone() and sparsemem operations. However, on 32-bit architectures, the
number of bits is quite limited, and may require growing the size of the
page->flags type in certain conditions. Several things might force this to
occur: a decrease in the SECTION_SIZE (if you want to hotplug smaller areas of
memory), an increase in the physical address space, or an increase in the
number of used page->flags.
One thing to note is that, once sparsemem is present, the NUMA node
information no longer needs to be stored in the page->flags. It might provide
speed increases on certain platforms and will be stored there if there is
room. But, if out of room, an alternate (theoretically slower) mechanism is
used.
This patch introduces CONFIG_FLATMEM. It is used in almost all cases where
there used to be an #ifndef DISCONTIG, because SPARSEMEM and DISCONTIGMEM
often have to compile out the same areas of code.
Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Martin Bligh <mbligh@aracnet.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Signed-off-by: Bob Picco <bob.picco@hp.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-23 00:07:54 -07:00
|
|
|
depends on ARCH_HAVE_MEMORY_PRESENT || SPARSEMEM
|
2005-09-03 15:54:26 -07:00
|
|
|
|
2005-09-03 15:54:28 -07:00
|
|
|
#
|
|
|
|
# SPARSEMEM_EXTREME (which is the default) does some bootmem
|
2006-10-03 22:53:09 +02:00
|
|
|
# allocations when memory_present() is called. If this cannot
|
2005-09-03 15:54:28 -07:00
|
|
|
# be done on your architecture, select this option. However,
|
|
|
|
# statically allocating the mem_section[] array can potentially
|
|
|
|
# consume vast quantities of .bss, so be careful.
|
|
|
|
#
|
|
|
|
# This option will also potentially produce smaller runtime code
|
|
|
|
# with gcc 3.4 and later.
|
|
|
|
#
|
|
|
|
config SPARSEMEM_STATIC
|
|
|
|
def_bool n
|
|
|
|
|
2005-09-03 15:54:26 -07:00
|
|
|
#
|
2006-10-03 22:34:14 +02:00
|
|
|
# Architecture platforms which require a two level mem_section in SPARSEMEM
|
2005-09-03 15:54:26 -07:00
|
|
|
# must select this option. This is usually for architecture platforms with
|
|
|
|
# an extremely sparse physical address space.
|
|
|
|
#
|
2005-09-03 15:54:28 -07:00
|
|
|
config SPARSEMEM_EXTREME
|
|
|
|
def_bool y
|
|
|
|
depends on SPARSEMEM && !SPARSEMEM_STATIC
|
[PATCH] mm: split page table lock
Christoph Lameter demonstrated very poor scalability on the SGI 512-way, with
a many-threaded application which concurrently initializes different parts of
a large anonymous area.
This patch corrects that, by using a separate spinlock per page table page, to
guard the page table entries in that page, instead of using the mm's single
page_table_lock. (But even then, page_table_lock is still used to guard page
table allocation, and anon_vma allocation.)
In this implementation, the spinlock is tucked inside the struct page of the
page table page: with a BUILD_BUG_ON in case it overflows - which it would in
the case of 32-bit PA-RISC with spinlock debugging enabled.
Splitting the lock is not quite for free: another cacheline access. Ideally,
I suppose we would use split ptlock only for multi-threaded processes on
multi-cpu machines; but deciding that dynamically would have its own costs.
So for now enable it by config, at some number of cpus - since the Kconfig
language doesn't support inequalities, let preprocessor compare that with
NR_CPUS. But I don't think it's worth being user-configurable: for good
testing of both split and unsplit configs, split now at 4 cpus, and perhaps
change that to 8 later.
There is a benefit even for singly threaded processes: kswapd can be attacking
one part of the mm while another part is busy faulting.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-29 18:16:40 -07:00
|
|
|
|
2007-10-16 01:24:14 -07:00
|
|
|
config SPARSEMEM_VMEMMAP_ENABLE
|
|
|
|
def_bool n
|
|
|
|
|
|
|
|
config SPARSEMEM_VMEMMAP
|
2007-12-17 16:19:53 -08:00
|
|
|
bool "Sparse Memory virtual memmap"
|
|
|
|
depends on SPARSEMEM && SPARSEMEM_VMEMMAP_ENABLE
|
|
|
|
default y
|
|
|
|
help
|
|
|
|
SPARSEMEM_VMEMMAP uses a virtually mapped memmap to optimise
|
|
|
|
pfn_to_page and page_to_pfn operations. This is the most
|
|
|
|
efficient option when sufficient kernel resources are available.
|
2007-10-16 01:24:14 -07:00
|
|
|
|
2005-10-29 18:16:54 -07:00
|
|
|
# eventually, we can have this option just 'select SPARSEMEM'
|
|
|
|
config MEMORY_HOTPLUG
|
|
|
|
bool "Allow for memory hot-add"
|
2006-09-30 23:27:05 -07:00
|
|
|
depends on SPARSEMEM || X86_64_ACPI_NUMA
|
2007-07-29 23:24:36 +02:00
|
|
|
depends on HOTPLUG && !HIBERNATION && ARCH_ENABLE_MEMORY_HOTPLUG
|
2008-07-14 09:59:18 +02:00
|
|
|
depends on (IA64 || X86 || PPC64 || SUPERH || S390)
|
2005-10-29 18:16:54 -07:00
|
|
|
|
|
|
|
comment "Memory hotplug is currently incompatible with Software Suspend"
|
2007-07-29 23:24:36 +02:00
|
|
|
depends on SPARSEMEM && HOTPLUG && HIBERNATION
|
2005-10-29 18:16:54 -07:00
|
|
|
|
2006-09-30 23:27:05 -07:00
|
|
|
config MEMORY_HOTPLUG_SPARSE
|
|
|
|
def_bool y
|
|
|
|
depends on SPARSEMEM && MEMORY_HOTPLUG
|
|
|
|
|
2007-10-16 01:26:12 -07:00
|
|
|
config MEMORY_HOTREMOVE
|
|
|
|
bool "Allow for memory hot remove"
|
|
|
|
depends on MEMORY_HOTPLUG && ARCH_ENABLE_MEMORY_HOTREMOVE
|
|
|
|
depends on MIGRATION
|
|
|
|
|
2008-04-28 02:12:55 -07:00
|
|
|
#
|
|
|
|
# If we have space for more page flags then we can enable additional
|
|
|
|
# optimizations and functionality.
|
|
|
|
#
|
|
|
|
# Regular Sparsemem takes page flag bits for the sectionid if it does not
|
|
|
|
# use a virtual memmap. Disable extended page flags for 32 bit platforms
|
|
|
|
# that require the use of a sectionid in the page flags.
|
|
|
|
#
|
|
|
|
config PAGEFLAGS_EXTENDED
|
|
|
|
def_bool y
|
|
|
|
depends on 64BIT || SPARSEMEM_VMEMMAP || !NUMA || !SPARSEMEM
|
|
|
|
|
[PATCH] mm: split page table lock
Christoph Lameter demonstrated very poor scalability on the SGI 512-way, with
a many-threaded application which concurrently initializes different parts of
a large anonymous area.
This patch corrects that, by using a separate spinlock per page table page, to
guard the page table entries in that page, instead of using the mm's single
page_table_lock. (But even then, page_table_lock is still used to guard page
table allocation, and anon_vma allocation.)
In this implementation, the spinlock is tucked inside the struct page of the
page table page: with a BUILD_BUG_ON in case it overflows - which it would in
the case of 32-bit PA-RISC with spinlock debugging enabled.
Splitting the lock is not quite for free: another cacheline access. Ideally,
I suppose we would use split ptlock only for multi-threaded processes on
multi-cpu machines; but deciding that dynamically would have its own costs.
So for now enable it by config, at some number of cpus - since the Kconfig
language doesn't support inequalities, let preprocessor compare that with
NR_CPUS. But I don't think it's worth being user-configurable: for good
testing of both split and unsplit configs, split now at 4 cpus, and perhaps
change that to 8 later.
There is a benefit even for singly threaded processes: kswapd can be attacking
one part of the mm while another part is busy faulting.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-29 18:16:40 -07:00
|
|
|
# Heavily threaded applications may benefit from splitting the mm-wide
|
|
|
|
# page_table_lock, so that faults on different parts of the user address
|
|
|
|
# space can be handled with less contention: split it at this NR_CPUS.
|
|
|
|
# Default to 4 for wider testing, though 8 might be more appropriate.
|
|
|
|
# ARM's adjust_pte (unused if VIPT) depends on mm-wide page_table_lock.
|
2005-11-23 13:37:37 -08:00
|
|
|
# PA-RISC 7xxx's spinlock_t would enlarge struct page from 32 to 44 bytes.
|
[PATCH] mm: split page table lock
Christoph Lameter demonstrated very poor scalability on the SGI 512-way, with
a many-threaded application which concurrently initializes different parts of
a large anonymous area.
This patch corrects that, by using a separate spinlock per page table page, to
guard the page table entries in that page, instead of using the mm's single
page_table_lock. (But even then, page_table_lock is still used to guard page
table allocation, and anon_vma allocation.)
In this implementation, the spinlock is tucked inside the struct page of the
page table page: with a BUILD_BUG_ON in case it overflows - which it would in
the case of 32-bit PA-RISC with spinlock debugging enabled.
Splitting the lock is not quite for free: another cacheline access. Ideally,
I suppose we would use split ptlock only for multi-threaded processes on
multi-cpu machines; but deciding that dynamically would have its own costs.
So for now enable it by config, at some number of cpus - since the Kconfig
language doesn't support inequalities, let preprocessor compare that with
NR_CPUS. But I don't think it's worth being user-configurable: for good
testing of both split and unsplit configs, split now at 4 cpus, and perhaps
change that to 8 later.
There is a benefit even for singly threaded processes: kswapd can be attacking
one part of the mm while another part is busy faulting.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-29 18:16:40 -07:00
|
|
|
#
|
|
|
|
config SPLIT_PTLOCK_CPUS
|
|
|
|
int
|
|
|
|
default "4096" if ARM && !CPU_CACHE_VIPT
|
2005-11-23 13:37:37 -08:00
|
|
|
default "4096" if PARISC && !PA20
|
[PATCH] mm: split page table lock
Christoph Lameter demonstrated very poor scalability on the SGI 512-way, with
a many-threaded application which concurrently initializes different parts of
a large anonymous area.
This patch corrects that, by using a separate spinlock per page table page, to
guard the page table entries in that page, instead of using the mm's single
page_table_lock. (But even then, page_table_lock is still used to guard page
table allocation, and anon_vma allocation.)
In this implementation, the spinlock is tucked inside the struct page of the
page table page: with a BUILD_BUG_ON in case it overflows - which it would in
the case of 32-bit PA-RISC with spinlock debugging enabled.
Splitting the lock is not quite for free: another cacheline access. Ideally,
I suppose we would use split ptlock only for multi-threaded processes on
multi-cpu machines; but deciding that dynamically would have its own costs.
So for now enable it by config, at some number of cpus - since the Kconfig
language doesn't support inequalities, let preprocessor compare that with
NR_CPUS. But I don't think it's worth being user-configurable: for good
testing of both split and unsplit configs, split now at 4 cpus, and perhaps
change that to 8 later.
There is a benefit even for singly threaded processes: kswapd can be attacking
one part of the mm while another part is busy faulting.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-29 18:16:40 -07:00
|
|
|
default "4"
|
2006-01-08 01:00:49 -08:00
|
|
|
|
|
|
|
#
|
|
|
|
# support for page migration
|
|
|
|
#
|
|
|
|
config MIGRATION
|
2006-03-22 00:09:12 -08:00
|
|
|
bool "Page migration"
|
2006-06-23 02:03:37 -07:00
|
|
|
def_bool y
|
2008-07-23 21:28:22 -07:00
|
|
|
depends on NUMA || ARCH_ENABLE_MEMORY_HOTREMOVE
|
2006-03-22 00:09:12 -08:00
|
|
|
help
|
|
|
|
Allows the migration of the physical location of pages of processes
|
|
|
|
while the virtual addresses are not changed. This is useful for
|
|
|
|
example on NUMA systems to put pages nearer to the processors accessing
|
|
|
|
the page.
|
2006-06-12 17:11:31 -07:00
|
|
|
|
|
|
|
config RESOURCES_64BIT
|
|
|
|
bool "64 bit Memory and IO resources (EXPERIMENTAL)" if (!64BIT && EXPERIMENTAL)
|
|
|
|
default 64BIT
|
|
|
|
help
|
|
|
|
This option allows memory and IO resources to be 64 bit.
|
2007-02-10 01:43:10 -08:00
|
|
|
|
|
|
|
config ZONE_DMA_FLAG
|
|
|
|
int
|
|
|
|
default "0" if !ZONE_DMA
|
|
|
|
default "1"
|
|
|
|
|
2007-07-17 04:03:37 -07:00
|
|
|
config BOUNCE
|
|
|
|
def_bool y
|
|
|
|
depends on BLOCK && MMU && (ZONE_DMA || HIGHMEM)
|
|
|
|
|
2007-05-06 14:49:50 -07:00
|
|
|
config NR_QUICK
|
|
|
|
int
|
|
|
|
depends on QUICKLIST
|
2008-01-14 23:35:32 +01:00
|
|
|
default "2" if SUPERH || AVR32
|
2007-05-06 14:49:50 -07:00
|
|
|
default "1"
|
2007-07-15 23:40:05 -07:00
|
|
|
|
|
|
|
config VIRT_TO_BUS
|
|
|
|
def_bool y
|
|
|
|
depends on !ARCH_NO_VIRT_TO_BUS
|