Commit Graph

5262 Commits

Author SHA1 Message Date
Trond Myklebust ec06c096ed NFS: Cleanup of NFS read code
Same callback hierarchy inversion as for the NFS write calls. This patch is
not strictly speaking needed by the O_DIRECT code, but avoids confusing
differences between the asynchronous read and write code.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20 13:44:27 -05:00
Trond Myklebust 788e7a89a0 NFS: Cleanup of NFS write code in preparation for asynchronous o_direct
This patch inverts the callback hierarchy for NFS write calls.

Instead of having the NFSv2/v3/v4-specific code set up the RPC callback
ops, we allow the original caller to do so. This allows for more
flexibility w.r.t. how to set up and tear down the nfs_write_data
structure while still allowing the NFSv3/v4 code to perform error
handling.

The greater flexibility is needed by the asynchronous O_DIRECT code, which
wants to be able to hold on to the original nfs_write_data structures after
the WRITE RPC call has completed in order to be able to replay them if the
COMMIT call determines that the server has rebooted.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20 13:44:27 -05:00
J. Bruce Fields 7117bf3dfb lockd: Remove FL_LOCKD flag
Currently lockd identifies its own locks using the FL_LOCKD flag.  This
doesn't scale well to multiple lock managers--if we did this in nfsv4 too,
for example, we'd be left with only one free flag bit.

Instead, we just check whether the file manager ops (fl_lmops) set on this
lock are our own.

The only use for this is in nlm_traverse_locks, which uses it to find locks
that need cleaning up when freeing a host or a file.

In the long run it might be nice to do reference counting instead of
traversing all the locks like this....

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20 13:44:26 -05:00
Andy Adamson 8dc7c3115b locks,lockd: fix race in nlmsvc_testlock
posix_test_lock() returns a pointer to a struct file_lock which is unprotected
and can be removed while in use by the caller.  Move the conflicting lock from
the return to a parameter, and copy the conflicting lock.

In most cases the caller ends up putting the copy of the conflicting lock on
the stack.  On i386, sizeof(struct file_lock) appears to be about 100 bytes.
We're assuming that's reasonable.

Signed-off-by: Andy Adamson <andros@citi.umich.edu>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20 13:44:26 -05:00
Andy Adamson 2e0af86f61 locks: remove unused posix_block_lock
posix_lock_file() is used to add a blocked lock to Lockd's block, so
posix_block_lock() is no longer needed.

Signed-off-by: Andy Adamson <andros@citi.umich.edu>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20 13:44:26 -05:00
Chuck Lever dead28da8e SUNRPC: eliminate rpc_call()
Clean-up: replace rpc_call() helper with direct call to rpc_call_sync.

This makes NFSv2 and NFSv3 synchronous calls more computationally
efficient, and reduces stack consumption in functions that used to
invoke rpc_call more than once.

Test plan:
Compile kernel with CONFIG_NFS enabled.  Connectathon on NFS version 2,
version 3, and version 4 mount points.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20 13:44:23 -05:00
Chuck Lever cc0175c1dc SUNRPC: display human-readable procedure name in rpc_iostats output
Add fields to the rpc_procinfo struct that allow the display of a
human-readable name for each procedure in the rpc_iostats output.

Also fix it so that the NFSv4 stats are broken up correctly by
sub-procedure number.  NFSv4 uses only two real RPC procedures:
NULL, and COMPOUND.

Test plan:
Mount with NFSv2, NFSv3, and NFSv4, and do "cat /proc/self/mountstats".

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20 13:44:22 -05:00
Chuck Lever 11c556b3d8 SUNRPC: provide a mechanism for collecting stats in the RPC client
Add a simple mechanism for collecting stats in the RPC client.  Stats are
tabulated during xprt_release.  Note that per_cpu shenanigans are not
required here because the RPC client already serializes on the transport
write lock.

Test plan:
Compile kernel with CONFIG_NFS enabled.  Basic performance regression
testing with high-speed networking and high performance server.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20 13:44:22 -05:00
Chuck Lever ef759a2e54 SUNRPC: introduce per-task RPC iostats
Account for various things that occur while an RPC task is executed.
Separate timers for RPC round trip and RPC execution time show how
long RPC requests wait in queue before being sent.  Eventually these
will be accumulated at xprt_release time in one place where they can
be viewed from userland.

Test plan:
Compile kernel with CONFIG_NFS enabled.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20 13:44:17 -05:00
Chuck Lever 262ca07de4 SUNRPC: add a handful of per-xprt counters
Monitor generic transport events.  Add a transport switch callout to
format transport counters for export to user-land.

Test plan:
Compile kernel with CONFIG_NFS enabled.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20 13:44:16 -05:00
Chuck Lever e19b63dafd SUNRPC: track length of RPC wait queues
RPC wait queue length will eventually be exported to userland via the RPC
iostats interface.

Test plan:
Compile kernel with CONFIG_NFS enabled.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20 13:44:15 -05:00
Chuck Lever 67ec9f46b8 NFS: report how long an NFS file system has been mounted
Add a field in nfs_server to record a timestamp when a mount succeeds.
Report the number of seconds the file system has been mounted via
nfs_show_stats().

Test plan:
Mount an NFS file system, watch the mountstats reports and compare with
clock time.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20 13:44:15 -05:00
Chuck Lever d9ef5a8c26 NFS: introduce mechanism for tracking NFS client metrics
Add a per-superblock performance counter facility to the NFS client.  This
facility mimics the counters available for block devices and for
networking.  Expose these new counters via the new /proc/self/mountstats
interface.

Thanks to Andrew Morton and Trond Myklebust for their review and comments.

Test plan:
fsx and iozone on UP and SMP systems, with and without pre-emption.  Watch
for memory overwrite bugs, and performance loss (significantly more CPU
required per op).

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20 13:44:13 -05:00
Chuck Lever 7a480e250c NFS: show retransmit settings when displaying mount options
Sometimes it's important to know the exact RPC retransmit settings the
kernel is using for an NFS mount point.  Add this facility to the NFS
client's show_options method.

Test plan:
Set various retransmit settings via the mount command, and check that the
settings are reflected in /proc/mounts.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20 13:44:12 -05:00
Chuck Lever b4629fe2f0 VFS: New /proc file /proc/self/mountstats
Create a new file under /proc/self, called mountstats, where mounted file
systems can export information (configuration options, performance counters,
and so on).  Use a mechanism similar to /proc/mounts and s_ops->show_options.

This mechanism does not violate namespace security, and is safe to use while
other processes are unmounting file systems.

Thanks to Mike Waychison for his review and comments.

Test-plan:
Test concurrent mount/unmount operations while cat'ing /proc/self/mountstats.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20 13:44:12 -05:00
Goldwyn Rodrigues 24bd68f46b NFS: Code comments update in NFS
read_cache_mtime is no longer used in nfs_inode. This patch removes
references of read_cache_mtime in the code comments.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20 13:44:11 -05:00
Trond Myklebust 24c5d9d7ea SUNRPC: Run rpci->queue_timeout on the rpciod workqueue instead of generic
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20 13:44:08 -05:00
Trond Myklebust 7bab377fcb lockd: Don't expose the process pid to the NLM server
Instead we use the nlm_lockowner->pid.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20 13:44:06 -05:00
Trond Myklebust b92dccf65b NFS: Fix a busy inodes issue...
The nfs_open_context may live longer than the file descriptor that spawned
it, so it needs to carry a reference to the vfsmount. If not, then
generic_shutdown_super() may end up being called before reads and writes
have been flushed out.

Make a couple of functions static while we're at it...

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20 13:44:03 -05:00
Linus Torvalds 4657190936 Merge branch 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus
* 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus:
  [MIPS] SB1: Check for -mno-sched-prolog if building corelis debug kernel.
  [MIPS] Sibyte: Fix race in sb1250_gettimeoffset().
  [MIPS] Sibyte: Fix interrupt timer off by one bug.
  [MIPS] Sibyte: Fix M_SCD_TIMER_INIT and M_SCD_TIMER_CNT wrong field width.
  [MIPS] Protect more of timer_interrupt() by xtime_lock.
  [MIPS] Work around bad code generation for <asm/io.h>.
  [MIPS] Simple patch to power off DBAU1200
  [MIPS] Fix DBAu1550 software power off.
  [MIPS] local_r4k_flush_cache_page fix
  [MIPS] SB1: Fix interrupt disable hazard.
  [MIPS] Get rid of the IP22-specific code in arclib.
  Update MAINTAINERS entry for MIPS.
2006-03-19 21:12:00 -08:00
Michael Chan 4a29cc2e50 [TG3]: 40-bit DMA workaround part 2
The 40-bit DMA workaround recently implemented for 5714, 5715, and
5780 needs to be expanded because there may be other tg3 devices
behind the EPB Express to PCIX bridge in the 5780 class device.

For example, some 4-port card or mother board designs have 5704 behind
the 5714.

All devices behind the EPB require the 40-bit DMA workaround.

Thanks to Chris Elmquist again for reporting the problem and testing
the patch.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-19 13:21:12 -08:00
Ralf Baechle DL5RB c7c694d196 [AX.25]: Fix potencial memory hole.
If the AX.25 dialect chosen by the sysadmin is set to DAMA master / 3
(or DAMA slave / 2, if CONFIG_AX25_DAMA_SLAVE=n) ax25_kick() will fall
through the switch statement without calling ax25_send_iframe() or any
other function that would eventually free skbn thus leaking the packet.

Fix by restricting the sysctl inferface to allow only actually supported
AX.25 dialects.

The system administration mistake needed for this to happen is rather
unlikely, so this is an uncritical hole.

Coverity #651.

Signed-off-by: Ralf Baechle DL5RB <ralf@linux-mips.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-19 13:20:06 -08:00
Ralf Baechle a904f74785 [MIPS] Sibyte: Fix race in sb1250_gettimeoffset().
From Dave Johnson <djohnson+linuxmips@sw.starentnetworks.com>:
    
sb1250_gettimeoffset() simply reads the current cpu 0 timer remaining
value, however once this counter reaches 0 and the interrupt is raised,
it immediately resets and begins to count down again.
    
If sb1250_gettimeoffset() is called on cpu 1 via do_gettimeofday() after
the timer has reset but prior to cpu 0 processing the interrupt and
taking write_seqlock() in timer_interrupt() it will return a full value
(or close to it) causing time to jump backwards 1ms. Once cpu 0 handles
the interrupt and timer_interrupt() gets far enough along it will jump
forward 1ms.
    
Fix this problem by implementing mips_hpt_*() on sb1250 using a spare
timer unrelated to the existing periodic interrupt timers. It runs at
1Mhz with a full 23bit counter.  This eliminated the custom
do_gettimeoffset() for sb1250 and allowed use of the generic
fixed_rate_gettimeoffset() using mips_hpt_*() and timerhi/timerlo.
    
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2006-03-18 16:59:30 +00:00
Ralf Baechle a77f124294 [MIPS] Sibyte: Fix M_SCD_TIMER_INIT and M_SCD_TIMER_CNT wrong field width.
From Dave Johnson <djohnson+linuxmips@sw.starentnetworks.com>:
    
Field width should be 23 bits not 20 bits.
    
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2006-03-18 16:59:29 +00:00
Ralf Baechle 966f4406d9 [MIPS] Work around bad code generation for <asm/io.h>.
If a call to set_io_port_base() was being followed by usage of
mips_io_port_base in the same function gcc was possibly using the old
value due to some clever abuse of const.  Adding a barrier will keep
the optimization and result in correct code with latest gcc.
    
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2006-03-18 16:59:28 +00:00
Atsushi Nemoto de62893bc0 [MIPS] local_r4k_flush_cache_page fix
If dcache_size != icache_size or dcache_size != scache_size, or
set-associative cache, icache/scache does not flushed properly.  Make
blast_?cache_page_indexed() masks its index value correctly.  Also,
use physical address for physically indexed pcache/scache.
    
Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2006-03-18 16:59:27 +00:00
Ralf Baechle a3c4946db4 [MIPS] SB1: Fix interrupt disable hazard.
The SB1 core has a three cycle interrupt disable hazard but we were
wrongly treating it as fully interlocked.
    
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2006-03-18 16:59:26 +00:00
Alexey Kuznetsov 265a92856b [NET]: Fix race condition in sk_wait_event().
It is broken, the condition is checked out of socket lock. It is
wonderful the bug survived for so long time.

[ This fixes bugzilla #6233:
  race condition in tcp_sendmsg when connection became established ]

Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-17 16:05:43 -08:00
Linus Torvalds 485ff09990 Merge git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc-merge
* git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc-merge:
  powerpc: update defconfigs
  [PATCH] powerpc: properly configure DDR/P5IOC children devs
  [PATCH] powerpc: remove duplicate EXPORT_SYMBOLS
  [PATCH] powerpc: RTC memory corruption
  [PATCH] powerpc: enable NAP only on cpus who support it to avoid memory corruption
  [PATCH] powerpc: Clarify wording for CRASH_DUMP Kconfig option
  [PATCH] powerpc/64: enable CONFIG_BLK_DEV_SL82C105
  [PATCH] powerpc: correct cacheflush loop in zImage
  powerpc: Fix problem with time going backwards
  powerpc: Disallow lparcfg being a module
2006-03-16 09:13:34 -08:00
John Rose 92eb4602eb [PATCH] powerpc: properly configure DDR/P5IOC children devs
The dynamic add path for PCI Host Bridges can fail to configure children
adapters under P5IOC controllers.  It fails to properly fixup bus/device
resources, and it fails to properly enable EEH.  Both of these steps
need to occur before any children devices are enabled in
pci_bus_add_devices().

Signed-off-by: John Rose <johnrose@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-03-16 16:55:07 +11:00
Ben Dooks dabaeff06c [ARM] 3364/1: [cleanup] warning fix - definitions for enable_hlt and disable_hlt
Patch from Ben Dooks

The enable_hlt and disable_hlt should be declared in
include/asm/setup.h. This fixes sparse errors from
arch/arm/kernel/process.c

Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2006-03-15 23:17:26 +00:00
Linus Torvalds 7cafae5238 Merge master.kernel.org:/home/rmk/linux-2.6-arm
* master.kernel.org:/home/rmk/linux-2.6-arm:
  [ARM] iwmmxt thread state alignment
  [ARM] 3350/1: Enable 1-wire on ARM
  [ARM] 3356/1: Workaround for the ARM1136 I-cache invalidation problem
  [ARM] 3355/1: NSLU2: remove propmt depends
  [ARM] 3354/1: NAS100d: fix power led handling
  [ARM] Fix muldi3.S
2006-03-12 14:56:02 -08:00
Russell King cdaabbd74b [ARM] iwmmxt thread state alignment
This patch removes the reliance of iwmmxt on hand coded alignments.
Since thread_info is always 8K aligned, specifying that fpstate is
8-byte aligned achieves the same effect without needing to resort
to hand coded alignments.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2006-03-12 22:36:06 +00:00
Christoph Hellwig 7cd9013be6 [PATCH] remove __put_task_struct_cb export again
The patch '[PATCH] RCU signal handling' [1] added an export for
__put_task_struct_cb, a put_task_struct helper newly introduced in that
patch.  But the put_task_struct couldn't be used modular previously as
__put_task_struct wasn't exported.  There are not callers of it in modular
code, and it shouldn't be exported because we don't want drivers to hold
references to task_structs.

This patch removes the export and folds __put_task_struct into
__put_task_struct_cb as there's no other caller.

[1] http://www2.kernel.org/git/gitweb.cgi?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=e56d090310d7625ecb43a1eeebd479f04affb48b

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Paul E. McKenney <paulmck@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-11 09:19:34 -08:00
Kirill Korotaev 0adb25d2e7 [PATCH] ext3: ext3_symlink should use GFP_NOFS allocations inside
This patch fixes illegal __GFP_FS allocation inside ext3 transaction in
ext3_symlink().  Such allocation may re-enter ext3 code from
try_to_free_pages.  But JBD/ext3 code keeps a pointer to current journal
handle in task_struct and, hence, is not reentrable.

This bug led to "Assertion failure in journal_dirty_metadata()" messages.

http://bugzilla.openvz.org/show_bug.cgi?id=115

Signed-off-by: Andrey Savochkin <saw@saw.sw.com.sg>
Signed-off-by: Kirill Korotaev <dev@openvz.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-11 09:19:34 -08:00
Christoph Lameter 8fce4d8e3b [PATCH] slab: Node rotor for freeing alien caches and remote per cpu pages.
The cache reaper currently tries to free all alien caches and all remote
per cpu pages in each pass of cache_reap.  For a machines with large number
of nodes (such as Altix) this may lead to sporadic delays of around ~10ms.
Interrupts are disabled while reclaiming creating unacceptable delays.

This patch changes that behavior by adding a per cpu reap_node variable.
Instead of attempting to free all caches, we free only one alien cache and
the per cpu pages from one remote node.  That reduces the time spend in
cache_reap.  However, doing so will lengthen the time it takes to
completely drain all remote per cpu pagesets and all alien caches.  The
time needed will grow with the number of nodes in the system.  All caches
are drained when they overflow their respective capacity.  So the drawback
here is only that a bit of memory may be wasted for awhile longer.

Details:

1. Rename drain_remote_pages to drain_node_pages to allow the specification
   of the node to drain of pcp pages.

2. Add additional functions init_reap_node, next_reap_node for NUMA
   that manage a per cpu reap_node counter.

3. Add a reap_alien function that reaps only from the current reap_node.

For us this seems to be a critical issue.  Holdoffs of an average of ~7ms
cause some HPC benchmarks to slow down significantly.  F.e.  NAS parallel
slows down dramatically.  NAS parallel has a 12-16 seconds runtime w/o rotor
compared to 5.8 secs with the rotor patches.  It gets down to 5.05 secs with
the additional interrupt holdoff reductions.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-09 19:47:38 -08:00
Roman Zippel 7b61fcda8a [PATCH] m68k: fix cmpxchg compile errors if CONFIG_RMW_INSNS=n
We require that all archs implement atomic_cmpxchg(), for the generic
version of atomic_add_unless().

Signed-off-by: Roman Zippel <zippel@linux-m68k.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-09 19:47:38 -08:00
Atsushi Nemoto 0ef675d491 [PATCH] mtd: 64 bit fixes
Fix some bugs in mtd/jffs2 on 64bit platform.

The MEMGETBADBLOCK/MEMSETBADBLOCK ioctl are not listed in compat_ioctl.h.

And some variables in jffs2 are declared as uint32_t but used to hold
size_t values.

Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Cc: Thomas Gleixner <tglx@linutronix.de>
Acked-by: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-09 19:47:37 -08:00
Ralf Baechle fd2a4f1183 [MIPS] Undefine scr_writew and scr_readw in <asm/vga.h>.
This is gluing the build of cirrusfb but really the mess that would need
cleaning and fixing is <video/vga.h> and <linux/vt_buffer.h> ...
    
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2006-03-09 18:05:10 +00:00
Linus Torvalds 0d514f040a Merge git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc-merge
* git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc-merge:
  powerpc: Fix various syscall/signal/swapcontext bugs
  [PATCH] powerpc: incorrect rmo_top handling in prom_init
  [PATCH] powerpc: Fix incorrect pud_ERROR() message
  [PATCH] powerpc: Expose SMT and L1 icache snoop userland features
  [PATCH] powerpc: Fix windfarm_pm112 not starting all control loops
  [PATCH] powerpc: Fix old g5 issues with windfarm
  powerpc32: Fix timebase synchronization on 32-bit powermacs
  powerpc: Turn off verbose debug output in powermac platform functions
  powerpc: Fix might-sleep warning in program check exception handler
2006-03-08 18:11:00 -08:00
Andi Kleen f9262c12c0 [PATCH] i386: port ATI timer fix from x86_64 to i386 II
ATI chipsets tend to generate double timer interrupts for the local APIC
timer when both the 8254 and the IO-APIC timer pins are enabled.  This is
because they route it to both and the result is anded together and the CPU
ends up processing it twice.

This patch changes check_timer to disable the 8254 routing for interrupt 0.

I think it would be safe on all chipsets actually (i tested it on a couple
and it worked everywhere) and Windows seems to do it in a similar way, but
to be conservative this patch only enables this mode on ATI (and adds
options to enable/disable too)

Ported over from a similar x86-64 change.

I reused the ACPI earlyquirk infrastructure for the ATI bridge check, but
tweaked it a bit to work even without ACPI.

Inspired by a patch from Chuck Ebbert, but redone.

Cc: Chuck Ebbert <76306.1226@compuserve.com>
Cc: "Brown, Len" <len.brown@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-08 18:10:31 -08:00
Michael Matz 2ec5e3a867 [PATCH] fix kexec asm
While testing kexec and kdump we hit problems where the new kernel would
freeze or instantly reboot.  The easiest way to trigger it was to kexec a
kernel compiled for CONFIG_M586 on an athlon cpu.  Compiling for CONFIG_MK7
instead would work fine.

The patch fixes a few problems with the kexec inline asm.

Signed-off-by: Chris Mason <mason@suse.com>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-08 14:15:04 -08:00
Dipankar Sarma 529bf6be5c [PATCH] fix file counting
I have benchmarked this on an x86_64 NUMA system and see no significant
performance difference on kernbench.  Tested on both x86_64 and powerpc.

The way we do file struct accounting is not very suitable for batched
freeing.  For scalability reasons, file accounting was
constructor/destructor based.  This meant that nr_files was decremented
only when the object was removed from the slab cache.  This is susceptible
to slab fragmentation.  With RCU based file structure, consequent batched
freeing and a test program like Serge's, we just speed this up and end up
with a very fragmented slab -

llm22:~ # cat /proc/sys/fs/file-nr
587730  0       758844

At the same time, I see only a 2000+ objects in filp cache.  The following
patch I fixes this problem.

This patch changes the file counting by removing the filp_count_lock.
Instead we use a separate percpu counter, nr_files, for now and all
accesses to it are through get_nr_files() api.  In the sysctl handler for
nr_files, we populate files_stat.nr_files before returning to user.

Counting files as an when they are created and destroyed (as opposed to
inside slab) allows us to correctly count open files with RCU.

Signed-off-by: Dipankar Sarma <dipankar@in.ibm.com>
Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-08 14:14:01 -08:00
Dipankar Sarma 21a1ea9eb4 [PATCH] rcu batch tuning
This patch adds new tunables for RCU queue and finished batches.  There are
two types of controls - number of completed RCU updates invoked in a batch
(blimit) and monitoring for high rate of incoming RCUs on a cpu (qhimark,
qlowmark).

By default, the per-cpu batch limit is set to a small value.  If the input
RCU rate exceeds the high watermark, we do two things - force quiescent
state on all cpus and set the batch limit of the CPU to INTMAX.  Setting
batch limit to INTMAX forces all finished RCUs to be processed in one shot.
 If we have more than INTMAX RCUs queued up, then we have bigger problems
anyway.  Once the incoming queued RCUs fall below the low watermark, the
batch limit is set to the default.

Signed-off-by: Dipankar Sarma <dipankar@in.ibm.com>
Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-08 14:14:01 -08:00
Andrew Morton e2bab3d924 [PATCH] percpu_counter_sum()
Implement percpu_counter_sum().  This is a more accurate but slower version of
percpu_counter_read_positive().

We need this for Alex's speedup-ext3_statfs patch and for the nr_file
accounting fix.  Otherwise these things would be too inaccurate on large CPU
counts.

Cc: Ravikiran G Thirumalai <kiran@scalex86.org>
Cc: Alex Tomas <alex@clusterfs.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-08 14:14:01 -08:00
Atsushi Nemoto 707ced0d71 [PATCH] __get_unaligned() gcc-4 fix
If the 'ptr' is a const, this code cause "assignment of read-only variable"
error on gcc 4.x.

Use __u64 instead of __typeof__(*(ptr)) for temporary variable to get
rid of errors on gcc 4.x.

Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Cc: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-08 14:14:00 -08:00
Mark Fasheh 1c6cc5fd32 [PATCH] powerpc: restore eeh_add_device_late() prototype stub
We fixed this:

arch/powerpc/platforms/pseries/eeh.c: In function `eeh_add_device_tree_late':
arch/powerpc/platforms/pseries/eeh.c:901: warning: implicit declaration of function `eeh_add_device_late'
arch/powerpc/platforms/pseries/eeh.c: At top level:
arch/powerpc/platforms/pseries/eeh.c:918: error: conflicting types for 'eeh_add_device_late'
arch/powerpc/platforms/pseries/eeh.c:901: error: previous implicit declaration of 'eeh_add_device_late' was here
make[2]: *** [arch/powerpc/platforms/pseries/eeh.o] Error 1

But we forgot the !CONFIG_EEH stub.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-08 14:14:00 -08:00
Linus Torvalds a19cbd4bf2 Mark the pipe file operations static
They aren't used (nor even really usable) outside of pipe.c anyway

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-08 14:03:09 -08:00
Paul Mackerras 1bd79336a4 powerpc: Fix various syscall/signal/swapcontext bugs
A careful reading of the recent changes to the system call entry/exit
paths revealed several problems, plus some things that could be
simplified and improved:

* 32-bit wasn't testing the _TIF_NOERROR bit in the syscall fast exit
  path, so it was only doing anything with it once it saw some other
  bit being set.  In other words, the noerror behaviour would apply to
  the next system call where we had to reschedule or deliver a signal,
  which is not necessarily the current system call.

* 32-bit wasn't doing the call to ptrace_notify in the syscall exit
  path when the _TIF_SINGLESTEP bit was set.

* _TIF_RESTOREALL was in both _TIF_USER_WORK_MASK and
  _TIF_PERSYSCALL_MASK, which is odd since _TIF_RESTOREALL is only set
  by system calls.  I took it out of _TIF_USER_WORK_MASK.

* On 64-bit, _TIF_RESTOREALL wasn't causing the non-volatile registers
  to be restored (unless perhaps a signal was delivered or the syscall
  was traced or single-stepped).  Thus the non-volatile registers
  weren't restored on exit from a signal handler.  We probably got
  away with it mostly because signal handlers written in C wouldn't
  alter the non-volatile registers.

* On 32-bit I simplified the code and made it more like 64-bit by
  making the syscall exit path jump to ret_from_except to handle
  preemption and signal delivery.

* 32-bit was calling do_signal unnecessarily when _TIF_RESTOREALL was
  set - but I think because of that 32-bit was actually restoring the
  non-volatile registers on exit from a signal handler.

* I changed the order of enabling interrupts and saving the
  non-volatile registers before calling do_syscall_trace_leave; now we
  enable interrupts first.

Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-03-08 13:24:22 +11:00
Catalin Marinas 6a0e243069 [ARM] 3352/1: DSB required for the completion of a TLB maintenance operation
Patch from Catalin Marinas

Chapter B2.7.3 in the latest ARM ARM (with v6 information) states that
the completion of a TLB maintenance operation is only guaranteed by
the execution of a DSB (Data Syncronization Barrier, formerly Data
Write Barrier or Drain Write Buffer).

Note that a DSB is only needed in the flush_tlb_kernel_* functions
since the completion is guaranteed by a mode change (i.e. switching
back to user mode) for the flush_tlb_user_* functions.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2006-03-07 14:42:27 +00:00