Commit Graph

335 Commits

Author SHA1 Message Date
Tony Luck 581249966f Pull delete-sigdelayed into release branch 2006-03-21 08:17:38 -08:00
Tony Luck 536ea4e419 Pull bsp-removal into release branch 2006-03-21 08:16:21 -08:00
David Woodhouse ee436dc46a [PATCH] Fix IA64 success/failure indication in syscall auditing.
Original 2.6.9 patch and explanation from somewhere within HP via
bugzilla...

ia64 stores a success/failure code in r10, and the return value (normal
return, or *positive* errno) in r8. The patch also sets the exit code to
negative errno if it's a failure result for consistency with other
architectures.

Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2006-03-20 14:08:54 -05:00
Christoph Lameter d8117ce5a6 [IA64] Fix race in the accessed/dirty bit handlers
A pte may be zapped by the swapper, exiting process, unmapping or page
migration while the accessed or dirty bit handers are about to run. In that
case the accessed bit or dirty is set on an zeroed pte which leads the VM to
conclude that this is a swap pte. This may lead to

- Messages from the vm like

swap_free: Bad swap file entry 4000000000000000

- Processes being aborted

swap_dup: Bad swap file entry 4000000000000000
VM: killing process ....

Page migration is particular suitable for the creation of this race since
it needs to remove and restore page table entries.

The fix here is to check for the present bit and simply not update
the pte if the page is not present anymore. If the page is not present
then the fault handler should run next which will take care of the problem
by bringing the page back and then mark the page dirty or move it onto the
active list.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2006-03-08 16:07:55 -08:00
Russ Anderson e1c48554ae [IA64] mca recovery return value when no bus check
When there is no bus check, the return code should be failure, not success.

Signed-off-by: Russ Anderson (rja@sgi.com)
Signed-off-by: Tony Luck <tony.luck@intel.com>
2006-03-07 15:40:06 -08:00
Bjorn Helgaas 6c5e62159c [IA64] don't report !sn2 or !summit hardware as an error
This stuff is all in the generic ia64 kernel, and the new initcall error
reporting complains about them.

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2006-03-07 15:26:49 -08:00
Russ Anderson ea0e92a613 [IA64] Increase severity of MCA recovery messages
The MCA recovery messages are currently KERN_DEBUG,
so they don't show up in /var/log/messages (by default).
Increase the severity to KERN_ERR, for the initial
message (and also add the physical address to this
message). Leave the successful isolation message as
KERN_DEBUG, but increase the severity when isolation
fails to KERN_CRIT.

[Russ' patch made these all KERN_CRIT]

Signed-off-by: Russ Anderson (rja@sgi.com)
Signed-off-by: Tony Luck <tony.luck@intel.com>
2006-03-07 15:23:25 -08:00
Jes Sorensen d2b176ed87 [IA64] sysctl option to silence unaligned trap warnings
Allow sysadmin to disable all warnings about userland apps
making unaligned accesses by using:
 # echo 1 > /proc/sys/kernel/ignore-unaligned-usertrap
Rather than having to use prctl on a process by process basis.

Default behaivour leaves the warnings enabled.

Signed-off-by: Jes Sorensen <jes@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2006-02-28 09:42:23 -08:00
Ken Chen c8c1635faa [IA64] cleanup in fsys.S
beautify coding style for zeroing end of fsyscall_table entries.
Remove misleading __NR_syscall_last and add more comments.
Drop (now unneeded) "guard against failure to increase NR_syscalls"

Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2006-02-28 08:53:32 -08:00
Tony Luck e963701a76 [IA64] die_if_kernel() can return
arch/ia64/kernel/unaligned.c erroneously marked die_if_kernel()
with a "noreturn" attribute ... which is silly (it returns whenever
the argument regs say that the fault happened in user mode, as one
might expect given the "if_kernel" part of its name!).  Thanks to
Alan and Gareth for pointing this out.

Signed-off-by: Tony Luck <tony.luck@intel.com>
2006-02-27 16:18:58 -08:00
Zhang, Yanmin 5d1a88af82 [IA64] Delete a redundant instruction in unaligned_access
unaligned_access does fetch cr.ipsr, then calls
dispatch_unaligned_handler, but dispatch_unaligned_handler fetches
cr.ipsr again, so delete the first one.

Signed-off-by: Zhang Yanmin <yanmin.zhang@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2006-02-27 15:12:42 -08:00
Ashok Raj 8f8b1138fc [IA64] Count disabled cpus as potential hot-pluggable CPUs
Minor updates to earlier patch.
- Added to documentation to add ia64 as well.
- Minor clarification on how to use disabled cpus
- used plain max instead of max_t per Andew Morton.

Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2006-02-16 14:10:50 -08:00
Jack Steiner 6f6d75825d [IA64] Missing check for TIF_WORK if trace/audit enabled
It appears that if auditing is enabled, the kernel fails to
check for pending signals before returning to user mode.

Signed-off-by: Jack Steiner <steiner@sgi.com>
Acked-by: Ken Chen <kenneth.w.chen@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2006-02-16 10:20:08 -08:00
Tony Luck 72166c35f0 Pull fix-cpu-possible-map into release branch 2006-02-15 15:17:57 -08:00
Horms b05de01ae1 [IA64] support panic_on_oops sysctl
Trivial port of this feature from i386
As it stands, panic_on_oops but does nothing on ia64

Signed-Off-By: Horms <horms@verge.net.au>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2006-02-15 15:16:50 -08:00
hawkes@sgi.com defbb2c929 [IA64] ia64: simplify and fix udelay()
The original ia64 udelay() was simple, but flawed for platforms without
synchronized ITCs:  a preemption and migration to another CPU during the
while-loop likely resulted in too-early termination or very, very
lengthy looping.

The first fix (now in 2.6.15) broke the delay loop into smaller,
non-preemptible chunks, reenabling preemption between the chunks.  This
fix is flawed in that the total udelay is computed to be the sum of just
the non-premptible while-loop pieces, i.e., not counting the time spent
in the interim preemptible periods.  If an interrupt or a migration
occurs during one of these interim periods, then that time is invisible
and only serves to lengthen the effective udelay().

This new fix backs out the current flawed fix and returns to a simple
udelay(), fully preemptible and interruptible.  It implements two simple
alternative udelay() routines:  one a default generic version that uses
ia64_get_itc(), and the other an sn-specific version that uses that
platform's RTC.

Signed-off-by: John Hawkes <hawkes@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2006-02-15 13:37:04 -08:00
Andreas Schwab 50d8e59038 [IA64] Remove duplicate EXPORT_SYMBOLs
Remove symbol exports from ia64_ksyms.c that are already exported in
lib/string.c.

Signed-off-by: Andreas Schwab <schwab@suse.de>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2006-02-15 13:23:32 -08:00
Ashok Raj a6b14fa6fd [IA64] Count disabled cpus as potential hot-pluggable CPUs
Have a facility to account for potentially hot-pluggable CPUs. ACPI doesnt
give a determinstic method to find hot-pluggable CPUs. Hence we use 2 methods
to assist.

- BIOS can mark potentially hot-pluggable CPUs as disabled in the MADT tables.
- User can specify the number of hot-pluggable CPUs via parameter
  additional_cpus=X

The option is enabled only if ACPI_CONFIG_HOTPLUG_CPU=y which enables the
physical hotplug option. Without which user can still use logical onlining
and offlining of CPUs by enabling CONFIG_HOTPLUG_CPU=y

Adds more bits to cpu_possible_map for potentially hot-pluggable cpus.

Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2006-02-14 15:37:58 -08:00
Ashok Raj 69aa234b91 [IA64] Dont set NR_CPUS for cpu_possible_map when CPU hotplug is enabled.
Do not set cpu_possible_map for NR_CPUS when ACPI_CONFIG_HOTPLUG_CPU is set.

Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2006-02-14 15:35:10 -08:00
Tony Luck 65b78722ce Pull new-syscalls into release branch 2006-02-09 14:43:58 -08:00
Hidetoshi Seto a947464617 [IA64] mca_drv: Add minstate validation
MCA driver can cause panic if kernel gets a state info with no minstate.
This patch adds minstate validation before handling it.

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2006-02-09 14:42:55 -08:00
Janak Desai 9621a4ef8a [IA64] unshare system call registration for ia64
Registers system call for the ia64 architecture.

Reserves space for ppoll and pselect, and adds unshare at system
call number 1296.

Signed-off-by: Janak Desai <janak@us.ibm.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2006-02-08 15:43:38 -08:00
Keith Owens 2730c9295a [IA64] MCA: remove obsolete ifdef
No platform in the community tree uses PLATFORM_MCA_HANDLERS, remove
the references.

Signed-off-by: Keith Owens <kaos@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2006-02-08 12:02:07 -08:00
Keith Owens e9ac054daa [IA64] MCA: update MCA comm field for user space tasks
Update the comm field on the MCA handler for user tasks as well as for
verified kernel tasks.  This helps to identify the task that was
running when the MCA occurred.

Signed-off-by: Keith Owens <kaos@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2006-02-08 12:01:41 -08:00
Keith Owens 9336b0836b [IA64] MCA: print messages in MCA handler
Print a message identifying the monarch MCA handler.  Print a summary
of the status of the slave MCA cpus.

Signed-off-by: Keith Owens <kaos@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2006-02-08 11:59:23 -08:00
Tony Luck d6e56a2a08 [IA64] Fix CONFIG_PRINTK_TIME
There were two problems with enabling the PRINTK_TIME config
option:
1) The first calls to printk() occur before per-cpu data virtual
address is pinned into the TLB, so sched_clock() can fault.
2) sched_clock() is based on ar.itc, which may not be synchronized
across cpus.

Ken Chen started this patch, Tony Luck tinkered with it, and Jes
Sorensen perfected it.

Signed-off-by: Tony Luck <tony.luck@intel.com>
2006-02-07 15:25:57 -08:00
Zou Nan hai 9d78f43d1f [IA64] Fix wrong use of memparse in efi.c
The check of (end != cp) after memparse in efi.c looks wrong to me.
The result is that we can't use mem= and max_addr= kernel parameter at
the same time.

The following patch removed the check just like other arches do.

Signed-off-by: Zou Nan hai <nanhai.zou@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2006-02-07 14:13:09 -08:00
Zou Nan hai ecdd5dabd3 [IA64] Fix a possible buffer overflow in efi.c
Make sure to save space for the trailing '\0'.

Signed-off-by: Zou Nan hai <nanhai.zou@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2006-02-07 10:59:37 -08:00
Linus Torvalds c03296a868 Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6 2006-02-06 15:46:39 -08:00
Chen, Kenneth W 9ed2ad8648 [IA64] add syscall entry for *at()
Wire up the ia64 syscalls for *at() functions.

Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2006-02-06 10:42:46 -08:00
Zhang, Yanmin 69dcc99199 [PATCH] Export cpu topology in sysfs
The patch implements cpu topology exportation by sysfs.

Items (attributes) are similar to /proc/cpuinfo.

1) /sys/devices/system/cpu/cpuX/topology/physical_package_id:
	represent the physical package id of  cpu X;
2) /sys/devices/system/cpu/cpuX/topology/core_id:
	represent the cpu core id to cpu X;
3) /sys/devices/system/cpu/cpuX/topology/thread_siblings:
	represent the thread siblings to cpu X in the same core;
4) /sys/devices/system/cpu/cpuX/topology/core_siblings:
	represent the thread siblings to cpu X in the same physical package;

To implement it in an architecture-neutral way, a new source file,
driver/base/topology.c, is to export the 5 attributes.

If one architecture wants to support this feature, it just needs to
implement 4 defines, typically in file include/asm-XXX/topology.h.
The 4 defines are:
#define topology_physical_package_id(cpu)
#define topology_core_id(cpu)
#define topology_thread_siblings(cpu)
#define topology_core_siblings(cpu)

The type of **_id is int.
The type of siblings is cpumask_t.

To be consistent on all architectures, the 4 attributes should have
deafult values if their values are unavailable. Below is the rule.

1) physical_package_id: If cpu has no physical package id, -1 is the
default value.

2) core_id: If cpu doesn't support multi-core, its core id is 0.

3) thread_siblings: Just include itself, if the cpu doesn't support
HT/multi-thread.

4) core_siblings: Just include itself, if the cpu doesn't support
multi-core and HT/Multi-thread.

So be careful when declaring the 4 defines in include/asm-XXX/topology.h.

If an attribute isn't defined on an architecture, it won't be exported.

Thank Nathan, Greg, Andi, Paul and Venki.

The patch provides defines for i386/x86_64/ia64.

Signed-off-by: Zhang, Yanmin <yanmin.zhang@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-03 08:32:09 -08:00
Bjorn Helgaas a58786917c [IA64] avoid broken SAL_CACHE_FLUSH implementations
If SAL_CACHE_FLUSH drops interrupts, complain about it and fall back to
using PAL_CACHE_FLUSH instead.

This is to work around a defect in HP rx5670 firmware: when an interrupt
occurs during SAL_CACHE_FLUSH, SAL drops the interrupt but leaves it marked
"in-service", which leaves the interrupt (and others of equal or lower
priority) masked.

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2006-02-02 13:25:54 -08:00
Linus Torvalds 59ed2f59e4 Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6 2006-02-01 22:06:15 -08:00
Keith Owens b0a06623dc [IA64] Delete MCA/INIT sigdelayed code
The only user of the MCA/INIT sigdelayed code (SGI's I/O probing) has
moved from the kernel into SAL.  Delete the MCA/INIT sigdelayed code.

Signed-off-by: Keith Owens <kaos@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2006-01-26 13:23:27 -08:00
Len Brown 9fdb62af92 [ACPI] merge 3549 4320 4485 4588 4980 5483 5651 acpica asus fops pnpacpi branches into release
Signed-off-by: Len Brown <len.brown@intel.com>
2006-01-24 17:52:48 -05:00
Jack Steiner 79c83bd15a [IA64] Scaling fix for simultaneous unaligned accesses
Eliminate a hot shared cacheline that occurs if multiple cpus are
taking unaligned exceptions.

Signed-off-by: Jack Steiner <steiner@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2006-01-24 14:39:50 -08:00
Keith Owens 2a792058c3 [IA64] Set the correct default OS status in the MCA handler
sos->os_status is set to a default value of IA64_MCA_COLD_BOOT for an
MCA, but then is incorrectly overwritten with IA64_MCA_SAME_CONTEXT (0).
This makes SAL think that all MCAs have been recovered.

Signed-off-by: Keith Owens <kaos@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2006-01-24 11:50:07 -08:00
Ashok Raj b88e926584 [IA64] Fix UP build with BSP removal support.
Causes undefined force_cpei_retarget defined in arch/ia64/kernel/smpboot.c
Push the unneeded code inside #ifdef CONFIG_HOTPLUG_CPU.

Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2006-01-19 16:18:47 -08:00
John Hawkes 386d1d50c8 [IA64] eliminate softlockup warning
Fix an unnecessary softlockup watchdog warning in the ia64
uncached_build_memmap() that occurs occasionally at 256p and always at
512p.  The problem occurs at boot time.

Signed-off-by: John Hawkes <hawkes@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2006-01-19 11:18:25 -08:00
Jes Sorensen 60f1c4443c [IA64] sem2mutex: arch/ia64/kernel/perfmon.c
Migrate perfmon from using an old semaphore to a completion handler.

Signed-off-by: Jes Sorensen <jes@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2006-01-19 11:17:56 -08:00
Stephane Eranian 9179cb6578 [IA64] Perfmon for Montecito
Add Montecito PMU description table for perfmon2

Signed-off-by: Stephane Eranian <eranian@hpl.hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2006-01-16 10:31:44 -08:00
Zhang Yanmin d3ef1f5aaf [IA64] prevent accidental modification of args in jprobe handler
When jprobe is hit, the function parameters of the original function
should be saved before jprobe handler is executed, and restored it after
jprobe handler is executed, because jprobe handler might change the
register values due to tail call optimization by the gcc.

Signed-off-by: Zhang Yanmin <yanmin.zhang@intel.com>
Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2006-01-13 14:45:21 -08:00
Keith Owens e026cca0f2 [IA64] Add hotplug cpu to salinfo.c, replace semaphore with mutex
Add hotplug cpu support to salinfo.c.

The cpu_event field is a cpumask so use the cpu_* macros consistently,
replacing the existing mixture of cpu_* and *_bit macros.

Instead of counting the number of outstanding events in a semaphore and
trying to track that count over user space context, interrupt context,
non-maskable interrupt context and cpu hotplug, replace the semaphore
with a test for "any bits set" combined with a mutex.

Modify the locking to make the test for "work to do" an atomic
operation.

Signed-off-by: Keith Owens <kaos@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2006-01-13 14:22:35 -08:00
Jason Uhlenkott 15029285dc [IA64] Handle debug traps in fsys mode
We need to handle debug traps in fsys mode non-fatally.  They can
happen now that we have fsyscalls which contain probe instructions.

Signed-off-by: Jason Uhlenkott <jasonuhl@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2006-01-13 14:16:08 -08:00
Francois Wellenrieter 8a4b7b6f18 [IA64] Fix conversion of pal_min_state physical address
On return from INIT handler we must convert the address of the
minstate area from a kernel virtual uncached address (0xC...)
to physical uncached (0x8...).  A typo (or thinko?) in the code
converted to physical cached.

Signed-off-by: Tony Luck <tony.luck@intel.com>
2006-01-13 14:01:01 -08:00
Tony Luck 7ae69d2aa4 [IA64] Add stub entry to fsys.S for sys_migrate_pages
When this new syscall was added to ia64 in commit

  39743889aa

fsys.S was forgotten.  Add a ".data8 0" there to keep
it in step.  [Reported by Stephane Eranian]

Signed-off-by: Tony Luck <tony.luck@intel.com>
2006-01-13 10:03:58 -08:00
Al Viro 6450578f32 [PATCH] ia64: task_pt_regs()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-12 09:08:58 -08:00
Al Viro ab03591db1 [PATCH] ia64: task_thread_info()
on ia64 thread_info is at the constant offset from task_struct and stack
is embedded into the same beast.  Set __HAVE_THREAD_FUNCTIONS, made
task_thread_info() just add a constant.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-12 09:08:58 -08:00
akpm@osdl.org 198e2f1811 [PATCH] scheduler cache-hot-autodetect
)

From: Ingo Molnar <mingo@elte.hu>

This is the latest version of the scheduler cache-hot-auto-tune patch.

The first problem was that detection time scaled with O(N^2), which is
unacceptable on larger SMP and NUMA systems. To solve this:

- I've added a 'domain distance' function, which is used to cache
  measurement results. Each distance is only measured once. This means
  that e.g. on NUMA distances of 0, 1 and 2 might be measured, on HT
  distances 0 and 1, and on SMP distance 0 is measured. The code walks
  the domain tree to determine the distance, so it automatically follows
  whatever hierarchy an architecture sets up. This cuts down on the boot
  time significantly and removes the O(N^2) limit. The only assumption
  is that migration costs can be expressed as a function of domain
  distance - this covers the overwhelming majority of existing systems,
  and is a good guess even for more assymetric systems.

  [ People hacking systems that have assymetries that break this
    assumption (e.g. different CPU speeds) should experiment a bit with
    the cpu_distance() function. Adding a ->migration_distance factor to
    the domain structure would be one possible solution - but lets first
    see the problem systems, if they exist at all. Lets not overdesign. ]

Another problem was that only a single cache-size was used for measuring
the cost of migration, and most architectures didnt set that variable
up. Furthermore, a single cache-size does not fit NUMA hierarchies with
L3 caches and does not fit HT setups, where different CPUs will often
have different 'effective cache sizes'. To solve this problem:

- Instead of relying on a single cache-size provided by the platform and
  sticking to it, the code now auto-detects the 'effective migration
  cost' between two measured CPUs, via iterating through a wide range of
  cachesizes. The code searches for the maximum migration cost, which
  occurs when the working set of the test-workload falls just below the
  'effective cache size'. I.e. real-life optimized search is done for
  the maximum migration cost, between two real CPUs.

  This, amongst other things, has the positive effect hat if e.g. two
  CPUs share a L2/L3 cache, a different (and accurate) migration cost
  will be found than between two CPUs on the same system that dont share
  any caches.

(The reliable measurement of migration costs is tricky - see the source
for details.)

Furthermore i've added various boot-time options to override/tune
migration behavior.

Firstly, there's a blanket override for autodetection:

	migration_cost=1000,2000,3000

will override the depth 0/1/2 values with 1msec/2msec/3msec values.

Secondly, there's a global factor that can be used to increase (or
decrease) the autodetected values:

	migration_factor=120

will increase the autodetected values by 20%. This option is useful to
tune things in a workload-dependent way - e.g. if a workload is
cache-insensitive then CPU utilization can be maximized by specifying
migration_factor=0.

I've tested the autodetection code quite extensively on x86, on 3
P3/Xeon/2MB, and the autodetected values look pretty good:

Dual Celeron (128K L2 cache):

 ---------------------
 migration cost matrix (max_cache_size: 131072, cpu: 467 MHz):
 ---------------------
           [00]    [01]
 [00]:     -     1.7(1)
 [01]:   1.7(1)    -
 ---------------------
 cacheflush times [2]: 0.0 (0) 1.7 (1784008)
 ---------------------

Here the slow memory subsystem dominates system performance, and even
though caches are small, the migration cost is 1.7 msecs.

Dual HT P4 (512K L2 cache):

 ---------------------
 migration cost matrix (max_cache_size: 524288, cpu: 2379 MHz):
 ---------------------
           [00]    [01]    [02]    [03]
 [00]:     -     0.4(1)  0.0(0)  0.4(1)
 [01]:   0.4(1)    -     0.4(1)  0.0(0)
 [02]:   0.0(0)  0.4(1)    -     0.4(1)
 [03]:   0.4(1)  0.0(0)  0.4(1)    -
 ---------------------
 cacheflush times [2]: 0.0 (33900) 0.4 (448514)
 ---------------------

Here it can be seen that there is no migration cost between two HT
siblings (CPU#0/2 and CPU#1/3 are separate physical CPUs). A fast memory
system makes inter-physical-CPU migration pretty cheap: 0.4 msecs.

8-way P3/Xeon [2MB L2 cache]:

 ---------------------
 migration cost matrix (max_cache_size: 2097152, cpu: 700 MHz):
 ---------------------
           [00]    [01]    [02]    [03]    [04]    [05]    [06]    [07]
 [00]:     -    19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1)
 [01]:  19.2(1)    -    19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1)
 [02]:  19.2(1) 19.2(1)    -    19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1)
 [03]:  19.2(1) 19.2(1) 19.2(1)    -    19.2(1) 19.2(1) 19.2(1) 19.2(1)
 [04]:  19.2(1) 19.2(1) 19.2(1) 19.2(1)    -    19.2(1) 19.2(1) 19.2(1)
 [05]:  19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1)    -    19.2(1) 19.2(1)
 [06]:  19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1)    -    19.2(1)
 [07]:  19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1)    -
 ---------------------
 cacheflush times [2]: 0.0 (0) 19.2 (19281756)
 ---------------------

This one has huge caches and a relatively slow memory subsystem - so the
migration cost is 19 msecs.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Cc: <wilder@us.ibm.com>
Signed-off-by: John Hawkes <hawkes@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-12 09:08:50 -08:00
Ingo Molnar 4dc7a0bbeb [PATCH] sched: add cacheflush() asm
Add per-arch sched_cacheflush() which is a write-back cacheflush used by
the migration-cost calibration code at bootup time.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-12 09:08:49 -08:00
Randy Dunlap a941564458 [PATCH] capable/capability.h (arch/)
arch: Use <linux/capability.h> where capable() is used.

Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-11 18:42:14 -08:00
Keshavamurthy Anil S eb3a72921c [PATCH] kprobes: fix race in recovery of reentrant probe
There is a window where a probe gets removed right after the probe is hit
on some different cpu.  In this case probe handlers can't find a matching
probe instance related to break address.  In this case we need to read the
original instruction at break address to see if that is not a break/int3
instruction and recover safely.

Previous code had a bug where we were not checking for the above race in
case of reentrant probes and the below patch fixes this race.

Tested on IA64, Powerpc, x86_64.

Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-11 18:42:12 -08:00
Anil S Keshavamurthy e597c2984c [PATCH] kprobes: arch_remove_kprobe
Currently arch_remove_kprobes() is only implemented/required for x86_64 and
powerpc.  All other architecture like IA64, i386 and sparc64 implementes a
dummy function which is being called from arch independent kprobes.c file.

This patch removes the dummy functions and replaces it with
#define arch_remove_kprobe(p, s)	do { } while(0)

Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-10 08:01:40 -08:00
Bjorn Helgaas 80851ef2a5 [PATCH] /dev/mem: validate mmap requests
Add a hook so architectures can validate /dev/mem mmap requests.

This is analogous to validation we already perform in the read/write
paths.

The identity mapping scheme used on ia64 requires that each 16MB or
64MB granule be accessed with exactly one attribute (write-back or
uncacheable).  This avoids "attribute aliasing", which can cause a
machine check.

Sample problem scenario:
  - Machine supports VGA, so it has uncacheable (UC) MMIO at 640K-768K
  - efi_memmap_init() discards any write-back (WB) memory in the first granule
  - Application (e.g., "hwinfo") mmaps /dev/mem, offset 0
  - hwinfo receives UC mapping (the default, since memmap says "no WB here")
  - Machine check abort (on chipsets that don't support UC access to WB
    memory, e.g., sx1000)

In the scenario above, the only choices are
  - Use WB for hwinfo mmap.  Can't do this because it causes attribute
    aliasing with the UC mapping for the VGA MMIO space.
  - Use UC for hwinfo mmap.  Can't do this because the chipset may not
    support UC for that region.
  - Disallow the hwinfo mmap with -EINVAL.  That's what this patch does.

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08 20:14:02 -08:00
Andrew Morton a136564702 [PATCH] remove gcc-2 checks
Remove various things which were checking for gcc-1.x and gcc-2.x compilers.

From: Adrian Bunk <bunk@stusta.de>

    Some documentation updates and removes some code paths for gcc < 3.2.

Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08 20:14:02 -08:00
Christoph Hellwig 6b9c7ed848 [PATCH] use ptrace_get_task_struct in various places
The ptrace_get_task_struct() helper that I added as part of the ptrace
consolidation is useful in variety of places that currently opencode it.
Switch them to the common helpers.

Add a ptrace_traceme() helper that needs to be explicitly called, and simplify
the ptrace_get_task_struct() interface.  We don't need the request argument
now, and we return the task_struct directly, using ERR_PTR() for error
returns.  It's a bit more code in the callers, but we have two sane routines
that do one thing well now.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08 20:13:51 -08:00
Christoph Lameter 39743889aa [PATCH] Swap Migration V5: sys_migrate_pages interface
sys_migrate_pages implementation using swap based page migration

This is the original API proposed by Ray Bryant in his posts during the first
half of 2005 on linux-mm@kvack.org and linux-kernel@vger.kernel.org.

The intent of sys_migrate is to migrate memory of a process.  A process may
have migrated to another node.  Memory was allocated optimally for the prior
context.  sys_migrate_pages allows to shift the memory to the new node.

sys_migrate_pages is also useful if the processes available memory nodes have
changed through cpuset operations to manually move the processes memory.  Paul
Jackson is working on an automated mechanism that will allow an automatic
migration if the cpuset of a process is changed.  However, a user may decide
to manually control the migration.

This implementation is put into the policy layer since it uses concepts and
functions that are also needed for mbind and friends.  The patch also provides
a do_migrate_pages function that may be useful for cpusets to automatically
move memory.  sys_migrate_pages does not modify policies in contrast to Ray's
implementation.

The current code here is based on the swap based page migration capability and
thus is not able to preserve the physical layout relative to it containing
nodeset (which may be a cpuset).  When direct page migration becomes available
then the implementation needs to be changed to do a isomorphic move of pages
between different nodesets.  The current implementation simply evicts all
pages in source nodeset that are not in the target nodeset.

Patch supports ia64, i386 and x86_64.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08 20:12:42 -08:00
Len Brown ed03f430cd Pull pnpacpi into acpica branch 2006-01-07 03:50:18 -05:00
Tony Luck 38c0b2c2aa [IA64] Fix compile warnings in setup.c
arch/ia64/kernel/setup.c: In function `show_cpuinfo':
arch/ia64/kernel/setup.c:576: warning: long unsigned int format, different type arg (arg 12)
arch/ia64/kernel/setup.c:576: warning: long unsigned int format, different type arg (arg 13)

Introduced by 95235ca2c2

Signed-off-by: Tony Luck <tony.luck@intel.com>
2006-01-05 13:30:52 -08:00
Ashok Raj ff741906ad [IA64] support for cpu0 removal
here is the BSP removal support for IA64. Its pretty much the same thing that
was released a while back, but has your feedback incorporated.

- Removed CONFIG_BSP_REMOVE_WORKAROUND and associated cmdline param
- Fixed compile issue with sn2/zx1 due to a undefined fix_b0_for_bsp
- some formatting nits (whitespace etc)

This has been tested on tiger and long back by alex on hp systems as well.

Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2006-01-05 10:24:20 -08:00
Linus Torvalds 0356dbb7fe Merge master.kernel.org:/pub/scm/linux/kernel/git/davej/cpufreq 2006-01-04 16:21:26 -08:00
Len Brown cb654695f6 [ACPI] acpi_register_gsi() fix needed for ACPICA 20051021
Use the #define for ACPI_LEVEL_SENSITIVE instead of assuming
non-zero, because ACPICA 20051021 changes its value to zero.

Also, use uniform variable names:
edge_level -> triggering
active_high_low -> polarity

Signed-off-by: Len Brown <len.brown@intel.com>
2005-12-28 02:50:44 -05:00
Christoph Lameter dc86e88c2b [IA64] Add __read_mostly support for IA64
sparc64, i386 and x86_64 have support for a special data section dedicated
to rarely updated data that is frequently read. The section was created to
avoid false sharing of those rarely read data with frequently written kernel
data.

This patch creates such a data section for ia64 and will group rarely written
data into this section.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2005-12-16 10:52:46 -08:00
Jes Sorensen 3bd7f01713 [IA64] uncached ref count leak
Use raw_smp_processor_id() instead of get_cpu() as we don't need the
extra features of get_cpu().

Signed-off-by: Jes Sorensen <jes@trained-monkey.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2005-12-16 10:29:52 -08:00
John Hawkes f5899b5d4f [IA64] disable preemption in udelay()
The udelay() inline for ia64 uses the ITC.  If CONFIG_PREEMPT is enabled
and the platform has unsynchronized ITCs and the calling task migrates
to another CPU while doing the udelay loop, then the effective delay may
be too short or very, very long.

This patch disables preemption around 100 usec chunks of the overall
desired udelay time.  This minimizes preemption-holdoffs.

udelay() is now too big to be inline, move it out of line and export it.

Signed-off-by: John Hawkes <hawkes@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2005-12-16 10:00:24 -08:00
Robin Holt 27af4cfd11 [IA64] fix for SET_PERSONALITY when CONFIG_IA32_SUPPORT is not set.
Missed this when fixing the SET_PERSONALITY change.

Signed-off-by: Robin Holt <holt@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2005-12-14 08:52:57 -08:00
Linus Torvalds 0e67050666 Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6 2005-12-12 16:48:29 -08:00
Keshavamurthy Anil S bf8d5c52c3 [PATCH] kprobes: increment kprobe missed count for multiprobes
When multiple probes are registered at the same address and if due to some
recursion (probe getting triggered within a probe handler), we skip calling
pre_handlers and just increment nmissed field.

The below patch make sure it walks the list for multiple probes case.
Without the below patch we get incorrect results of nmissed count for
multiple probe case.

Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-12-12 08:57:45 -08:00
Bob Moore 50eca3eb89 [ACPI] ACPICA 20050930
Completed a major overhaul of the Resource Manager code -
specifically, optimizations in the area of the AML/internal
resource conversion code. The code has been optimized to
simplify and eliminate duplicated code, CPU stack use has
been decreased by optimizing function parameters and local
variables, and naming conventions across the manager have
been standardized for clarity and ease of maintenance (this
includes function, parameter, variable, and struct/typedef
names.)

All Resource Manager dispatch and information tables have
been moved to a single location for clarity and ease of
maintenance. One new file was created, named "rsinfo.c".

The ACPI return macros (return_ACPI_STATUS, etc.) have
been modified to guarantee that the argument is
not evaluated twice, making them less prone to macro
side-effects. However, since there exists the possibility
of additional stack use if a particular compiler cannot
optimize them (such as in the debug generation case),
the original macros are optionally available.  Note that
some invocations of the return_VALUE macro may now cause
size mismatch warnings; the return_UINT8 and return_UINT32
macros are provided to eliminate these. (From Randy Dunlap)

Implemented a new mechanism to enable debug tracing for
individual control methods. A new external interface,
acpi_debug_trace(), is provided to enable this mechanism. The
intent is to allow the host OS to easily enable and disable
tracing for problematic control methods. This interface
can be easily exposed to a user or debugger interface if
desired. See the file psxface.c for details.

acpi_ut_callocate() will now return a valid pointer if a
length of zero is specified - a length of one is used
and a warning is issued. This matches the behavior of
acpi_ut_allocate().

Signed-off-by: Bob Moore <robert.moore@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2005-12-10 00:20:25 -05:00
Venkatesh Pallipadi 95235ca2c2 [CPUFREQ] CPU frequency display in /proc/cpuinfo
What is the value shown in "cpu MHz" of /proc/cpuinfo when CPUs are capable of
changing frequency?

Today the answer is: It depends.
On i386:
SMP kernel - It is always the boot frequency
UP kernel - Scales with the frequency change and shows that was last set.

On x86_64:
There is one single variable cpu_khz that gets written by all the CPUs. So,
the frequency set by last CPU will be seen on /proc/cpuinfo of all the
CPUs in the system. What you see also depends on whether you have constant_tsc
capable CPU or not.

On ia64:
It is always boot time frequency of a particular CPU that gets displayed.

The patch below changes this to:
Show the last known frequency of the particular CPU, when cpufreq is present. If
cpu doesnot support changing of frequency through cpufreq, then boot frequency
will be shown. The patch affects i386, x86_64 and ia64 architectures.

Signed-off-by: Venkatesh Pallipadi<venkatesh.pallipadi@intel.com>
Signed-off-by: Dave Jones <davej@redhat.com>
2005-12-06 19:35:11 -08:00
Len Brown 3d5271f988 Pull release into acpica branch 2005-12-06 17:31:30 -05:00
Robin Holt bd1d6e2451 [IA64] Change SET_PERSONALITY to comply with comment in binfmt_elf.c.
We have a customer application which trips a bug.  The problem arises
when a driver attempts to call do_munmap on an area which is mapped, but
because current->thread.task_size has been set to 0xC0000000, the call
to do_munmap fails thinking it is an unmap beyond the user's address
space.

The comment in fs/binfmt_elf.c in load_elf_library() before the call
to SET_PERSONALITY() indicates that task_size must not be changed for
the running application until flush_thread, but is for ia64 executing
ia32 binaries.

This patch moves the setting of task_size from SET_PERSONALITY() to
flush_thread() as indicated.  The customer application no longer is able
to trip the bug.

Signed-off-by: Robin Holt <holt@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2005-12-06 09:12:34 -08:00
Venkatesh Pallipadi c82e6abfb3 [ACPI] IA64 ZX1 buildfix for _PDC patch
http://bugzilla.kernel.org/show_bug.cgi?id=5483

ZX1 config doesn't include cpufreq, so move move acpi-processor.c
up out of ia64/cpufreq directory.

no functional changes

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2005-12-05 17:24:45 -05:00
Keith Owens 05f70395c6 [IA64] Allow salinfo_decode to detect signals on read
Return -EINTR instead of -ERESTARTSYS when signals are delivered during
a blocked read of /proc/sal/*/event.  This allows salinfo_decode to
detect signals when it is blocked on a read of those files.

Signed-off-by: Keith Owens <kaos@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2005-12-05 11:49:17 -08:00
Venkatesh Pallipadi 05131ecc99 [ACPI] Avoid BIOS inflicted crashes by evaluating _PDC only once
Linux invokes the AML _PDC method (Processor Driver Capabilities)
to tell the BIOS what features it can handle.  While the ACPI
spec says nothing about the OS invoking _PDC multiple times,
doing so with changing bits seems to hopelessly confuse the BIOS
on multiple platforms up to and including crashing the system.

Factor out the _PDC invocation so Linux invokes it only once.

http://bugzilla.kernel.org/show_bug.cgi?id=5483

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2005-12-01 01:30:35 -05:00
MAEDA Naoaki c780f96490 [ACPI] ia64 build fix
arch/ia64/kernel/acpi-ext.c: In function `acpi_vendor_resource_match':
arch/ia64/kernel/acpi-ext.c:38: error: structure has no member named `id'

Signed-off-by: MAEDA Naoaki <maeda.naoaki@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Len Brown <len.brown@intel.com>
2005-11-30 18:02:33 -05:00
Keshavamurthy Anil S 5a94bcfd2a [IA64] Remove getting break_num by decoding instruction
break.b always sets cr.iim to 0 and the current code tries to
get the break_num by decoding instruction. However, their
seems to be a race condition while reading the regs->cr_iip,
as on other cpu the break.b at regs->cr_iip might have been
replaced with the original instruction as a result of
unregister_kprobe() and hence decoding instruction to
obtain break_num will result in wrong value in this case.

Also includes changes to kprobes.c which now has to handle
break number zero.

Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2005-11-29 09:24:39 -08:00
Dean Roe b77dae5293 [IA64] - Make pfn_valid more precise for SGI Altix systems
A single SGI Altix system can be divided into multiple partitions,
each running their own instance of the Linux kernel.  pfn_valid()
is currently not optimal for any but the first partition, since it
does not compare the pfn with min_low_pfn before calling the more
costly ia64_pfn_valid().

Signed-off-by: Dean Roe <roe@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2005-11-29 09:24:10 -08:00
Jim Keniston 8bf1101bd5 [PATCH] kprobes: Fix return probes on sys_execve
Fix a bug in kprobes that can cause an Oops or even a crash when a return
probe is installed on one of the following functions: sys_execve,
do_execve, load_*_binary, flush_old_exec, or flush_thread.  The fix is to
remove the call to kprobe_flush_task() in flush_thread().  This fix has
been tested on all architectures for which the return-probes feature has
been implemented (i386, x86_64, ppc64, ia64).  Please apply.

BACKGROUND

Up to now, we have called kprobe_flush_task() under two situations: when a
task exits, and when it execs.  Flushing kretprobe_instances on exit is
correct because (a) do_exit() doesn't return, and (b) one or more
return-probed functions may be active when a task calls do_exit().  Neither
is the case for sys_execve() and its callees.

Initially, the mistaken call to kprobe_flush_task() on exec was harmless
because we put the "real" return address of each active probed function
back in the stack, just to be safe, when we recycled its
kretprobe_instance.  When support for ppc64 and ia64 was added, this safety
measure couldn't be employed, and was eventually dropped even for i386 and
x86_64.  sys_execve() and its callees were informally blacklisted for
return probes until this fix was developed.

Acked-by: Prasanna S Panchamukhi <prasanna@in.ibm.com>
Signed-off-by: Jim Keniston <jkenisto@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-23 16:08:39 -08:00
Chen, Kenneth W e8aabc4716 [IA64] polish comments for tlb fault handler in ivt.S
Polish the comments specifically in vhpt_miss and nested_dtlb_miss
handlers.  I think it's better to explicitly name each page table
level with its name instead of numerically name them.  i.e., use
pgd, pud, pmd, and pte instead of referring as L1, L2, L3 etc.
Along the line, remove some magic number in the comments like:
"PTA + (((IFA(61,63) << 7) | IFA(33,39))*8)".  No code change at
all, pure comment update.  Feel free to shoot anything you have,
darts or tomahawk cruise missile.  I will duck behind a bunker ;-)

Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Acked-by: Robin Holt <holt@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2005-11-17 09:48:15 -08:00
Chen, Kenneth W fedb25fae7 [IA64] 4 level page table bug fix in vhpt_miss
From source code inspection, I think there is a bug with 4 level
page table with vhpt_miss handler.  In the code path of rechecking
page table entry against previously read value after tlb insertion,
*pte value in register r18 was overwritten with value newly read
from pud pointer, render the check of new *pte against previous
*pte completely wrong.  Though the bug is none fatal and the penalty
is to purge the entry and retry.  For functional correctness, it
should be fixed.  The fix is to use a different register so new
*pud don't trash *pte.  (btw, the comments in the cmp statement is
wrong as well, which I will address in the next patch).

Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2005-11-17 09:47:18 -08:00
Chen, Kenneth W 1e185b97b4 [PATCH] ia64: cpu_idle performance bug fix
Our performance validation on 2.6.15-rc1 caught a disastrous performance
regression on ia64 with netperf (-98%) and volanomark (-58%) compares to
previous kernel version 2.6.14-git7.  See the following chart (result
group 1 & 2).

  http://kernel-perf.sourceforge.net/results.machine_id=26.html

We have root caused it to commit 64c7c8f885

This changeset broke the ia64 task resched notification.  In
sched.c:resched_task(), a reschedule IPI is conditioned upon
TIF_POLLING_NRFLAG.  However, the above changeset unconditionally set
the polling thread flag for idle tasks regardless whether pal_halt_light
is in use or not.  As a result, resched IPI is not sent from
resched_task().  And since the default behavior on ia64 is to use
pal_halt_light, we end up delaying the rescheduling task until next
timer tick, and thus cause the performance regression.

This fixes the performance bug.  I'm glad our performance suite is
turning up bad performance bug like this in time.

Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-15 15:50:51 -08:00
Robin Holt 837cd0bdf5 [IA64] 4-level page tables
This patch introduces 4-level page tables to ia64.  I have run
some benchmarks and found nothing interesting.  Performance has
consistently fallen within the noise range.

It also introduces a config option (setting the default to 3
levels).  The config option prevents having 4 level page
tables with 64k base page size.

Signed-off-by: Robin Holt <holt@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2005-11-11 09:37:29 -08:00
Panagiotis Issaris baf47fb660 [IA64] Replace kcalloc(1, with kzalloc.
Conversion from kcalloc(1, to kzalloc.

Signed-off-by: Panagiotis Issaris <takis@issaris.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2005-11-10 11:28:20 -08:00
Tony Luck 7669a22592 Pull context-bitmap into release branch 2005-11-10 10:39:49 -08:00
Tony Luck cb8a55e4cd Pull extend-notify-die into release branch 2005-11-10 10:39:09 -08:00
Tony Luck cf1d469ec1 Pull mca-check-psp into release branch 2005-11-10 10:38:05 -08:00
Tony Luck 64de57ffd3 Pull align-sig-frame into release branch 2005-11-10 10:37:35 -08:00
Linus Torvalds 7df446e7e0 Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6 2005-11-09 08:33:27 -08:00
Nick Piggin 64c7c8f885 [PATCH] sched: resched and cpu_idle rework
Make some changes to the NEED_RESCHED and POLLING_NRFLAG to reduce
confusion, and make their semantics rigid.  Improves efficiency of
resched_task and some cpu_idle routines.

* In resched_task:
- TIF_NEED_RESCHED is only cleared with the task's runqueue lock held,
  and as we hold it during resched_task, then there is no need for an
  atomic test and set there. The only other time this should be set is
  when the task's quantum expires, in the timer interrupt - this is
  protected against because the rq lock is irq-safe.

- If TIF_NEED_RESCHED is set, then we don't need to do anything. It
  won't get unset until the task get's schedule()d off.

- If we are running on the same CPU as the task we resched, then set
  TIF_NEED_RESCHED and no further action is required.

- If we are running on another CPU, and TIF_POLLING_NRFLAG is *not* set
  after TIF_NEED_RESCHED has been set, then we need to send an IPI.

Using these rules, we are able to remove the test and set operation in
resched_task, and make clear the previously vague semantics of
POLLING_NRFLAG.

* In idle routines:
- Enter cpu_idle with preempt disabled. When the need_resched() condition
  becomes true, explicitly call schedule(). This makes things a bit clearer
  (IMO), but haven't updated all architectures yet.

- Many do a test and clear of TIF_NEED_RESCHED for some reason. According
  to the resched_task rules, this isn't needed (and actually breaks the
  assumption that TIF_NEED_RESCHED is only cleared with the runqueue lock
  held). So remove that. Generally one less locked memory op when switching
  to the idle thread.

- Many idle routines clear TIF_POLLING_NRFLAG, and only set it in the inner
  most polling idle loops. The above resched_task semantics allow it to be
  set until before the last time need_resched() is checked before going into
  a halt requiring interrupt wakeup.

  Many idle routines simply never enter such a halt, and so POLLING_NRFLAG
  can be always left set, completely eliminating resched IPIs when rescheduling
  the idle task.

  POLLING_NRFLAG width can be increased, to reduce the chance of resched IPIs.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Con Kolivas <kernel@kolivas.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-09 07:56:33 -08:00
Nick Piggin 5bfb5d690f [PATCH] sched: disable preempt in idle tasks
Run idle threads with preempt disabled.

Also corrected a bugs in arm26's cpu_idle (make it actually call schedule()).
How did it ever work before?

Might fix the CPU hotplugging hang which Nigel Cunningham noted.

We think the bug hits if the idle thread is preempted after checking
need_resched() and before going to sleep, then the CPU offlined.

After calling stop_machine_run, the CPU eventually returns from preemption and
into the idle thread and goes to sleep.  The CPU will continue executing
previous idle and have no chance to call play_dead.

By disabling preemption until we are ready to explicitly schedule, this bug is
fixed and the idle threads generally become more robust.

From: alexs <ashepard@u.washington.edu>

  PPC build fix

From: Yoichi Yuasa <yuasa@hh.iij4u.or.jp>

  MIPS build fix

Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Yoichi Yuasa <yuasa@hh.iij4u.or.jp>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-09 07:56:33 -08:00
Russ Anderson cbb9214434 [IA64] MCA recovery: Bump reference count on bad pages
When a page has a memory uncorrectable ECC error, the recovery
code wants to prevent the page from being reused.  This change
bumps the reference count to prevent the page from getting back
on the free list.

Signed-off-by: Russ Anderson (rja@sgi.com)
Signed-off-by: Tony Luck <tony.luck@intel.com>
2005-11-08 10:04:16 -08:00
Russ Anderson 56f87b8217 [IA64] MCA recovery: pfn_valid() needs a pfn
paddr needs to be shifted by PAGE_SHIFT to be valid
input for pfn_valid().

Signed-off-by: Russ Anderson <rja@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2005-11-08 10:03:05 -08:00
Russ Anderson a14f25a076 [IA64] MCA recovery based on PSP bits
The determination of whether an MCA is recoverable or not must
be based on the bits set in the PSP (Processor State Parameter).
The specific bits are shown in the Intel IA-64 Architecture Software
Developer's Manual, Vol 2, Table 11-6 Software Recovery Bits in
Processor State Parameter.  Those bits should be consistent
across the entire IA-64 family of processors.

Signed-off-by: Russ Anderson <rja@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2005-11-08 10:00:56 -08:00
David Mosberger-Tang cf20d1eafb [IA64] align signal-frame even when not using alternate signal-stack
At the moment, attempting to invoke a signal-handler on the normal
stack is guaranteed to fail if the stack-pointer happens not to be
16-byte aligned.  This is because the signal-trampoline will attempt
to store fp-regs with stf.spill instructions, which will trap for
misaligned addresses.  This isn't terribly useful behavior.  It's
better to just always align the signal frame to the next lower 16-byte
boundary.

Signed-off-by: David Mosberger-Tang <David.Mosberger@acm.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2005-11-08 09:58:06 -08:00
Keith Owens 9138d581b0 [IA64] Extend notify_die() hooks for IA64
notify_die() added for MCA_{MONARCH,SLAVE,RENDEZVOUS}_{ENTER,PROCESS,LEAVE} and
INIT_{MONARCH,SLAVE}_{ENTER,PROCESS,LEAVE}.  We need multiple
notification points for these events because they can take many seconds
to run which has nasty effects on the behaviour of the rest of the
system.

DIE_SS replaced by a generic DIE_FAULT which checks the vector number,
to allow interception of faults other than SS.

DIE_MACHINE_{HALT,RESTART} added to allow last minute close down
processing, especially when the halt/restart routines are called from
error handlers.

DIE_OOPS added.

The check for kprobe's break numbers has been moved from traps.c to
kprobes.c, allowing DIE_BREAK to be used for any additional break
numbers, i.e. it is no longer kprobes specific.

Hooks for kernel debuggers and kernel dumpers added, ENTER and LEAVE.
Both of these disable the system for long periods which impact on
watchdogs and heartbeat systems in general.  More patches to come that
use these events to reset watchdogs and heartbeats.

unregister_die_notifier() added and both routines exported.  Requested
by Dean Nelson.

Lock removed from {un,}register_die_notifier.  notifier_chain_register()
already takes a lock.  Also the generic notifier chain locking is being
reworked to distinguish between callbacks that can block and those that
cannot, the lock in {un,}register_die_notifier would interfere with
that change.  http://marc.theaimsgroup.com/?l=linux-kernel&m=113018709002036&w=2

Leading white space removed from arch/ia64/kernel/kprobes.c.

Typo in mca.c in original version of this patch found & fixed by Dean
Nelson.

Signed-off-by: Keith Owens <kaos@sgi.com>
Acked-by: Dean Nelson <dcn@sgi.com>
Acked-by: Anil Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2005-11-07 11:27:13 -08:00
Jesper Juhl b2325fe1b7 [PATCH] kfree cleanup: arch
This is the arch/ part of the big kfree cleanup patch.

Remove pointless checks for NULL prior to calling kfree() in arch/.

Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
Acked-by: Grant Grundler <grundler@parisc-linux.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-07 07:54:06 -08:00
Ananth N Mavinakayanahalli d217d5450f [PATCH] Kprobes: preempt_disable/enable() simplification
Reorganize the preempt_disable/enable calls to eliminate the extra preempt
depth.  Changes based on Paul McKenney's review suggestions for the kprobes
RCU changeset.

Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-07 07:53:46 -08:00
Ananth N Mavinakayanahalli 991a51d83a [PATCH] Kprobes: Use RCU for (un)register synchronization - arch changes
Changes to the arch kprobes infrastructure to take advantage of the locking
changes introduced by usage of RCU for synchronization.  All handlers are now
run without any locks held, so they have to be re-entrant or provide their own
synchronization.

Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-07 07:53:46 -08:00
Ananth N Mavinakayanahalli 8a5c4dc5e5 [PATCH] Kprobes: Track kprobe on a per_cpu basis - ia64 changes
IA64 changes to track kprobe execution on a per-cpu basis.  We now track the
kprobe state machine independently on each cpu using an arch specific kprobe
control block.

Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-07 07:53:45 -08:00