Commit Graph

7207 Commits

Author SHA1 Message Date
Linus Torvalds 41f81e88e0 Merge branch 'for-linus' of git://oss.sgi.com:8090/xfs/xfs-2.6
* 'for-linus' of git://oss.sgi.com:8090/xfs/xfs-2.6:
  [XFS] Fix xfs_ichgtime()s broken usage of I_SYNC
  [XFS] Make xfsbufd threads freezable
  [XFS] revert to double-buffering readdir
  [XFS] Fix broken inode cluster setup.
  [XFS] Clear XBF_READ_AHEAD flag on I/O completion.
  [XFS] Fixed a few bugs in xfs_buf_associate_memory()
  [XFS] 971064 Various fixups for xfs_bulkstat().
  [XFS] Fix dbflush panic in xfs_qm_sync.
2007-12-10 10:18:27 -08:00
David Chinner cf10e82bdc [XFS] Fix xfs_ichgtime()s broken usage of I_SYNC
The recent I_LOCK->I_SYNC changes mistakenly changed xfs_ichgtime to look
at I_SYNC instead of I_LOCK. This was incorrect and prevents newly created
inodes from moving to the dirty list. Change this to the correct check
which is for I_NEW, not I_LOCK or I_SYNC so that behaviour is correct.

SGI-PV: 974225
SGI-Modid: xfs-linux-melb:xfs-kern:30204a

Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2007-12-10 13:47:56 +11:00
Rafael J. Wysocki 978c7b2ff4 [XFS] Make xfsbufd threads freezable
Fix breakage caused by commit 8314418629
that did not introduce the necessary call to set_freezable() in
xfs/linux-2.6/xfs_buf.c .

SGI-PV: 974224
SGI-Modid: xfs-linux-melb:xfs-kern:30203a

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2007-12-10 13:47:36 +11:00
Christoph Hellwig e89bc612d6 [XFS] revert to double-buffering readdir
The current readdir implementation deadlocks on a btree buffers locks
because nfsd calls back into ->lookup from the filldir callback. The only
short-term fix for this is to revert to the old inefficient
double-buffering scheme.

SGI-PV: 973377
SGI-Modid: xfs-linux-melb:xfs-kern:30201a

Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2007-12-10 13:47:15 +11:00
David Chinner a7430847fc [XFS] Fix broken inode cluster setup.
The radix tree based inode caches did away with the inode cluster hashes,
replacing them with a bunch of masking and gang lookups on the radix tree.

This masking got broken when moving the code to per-ag radix trees and
indexing by agino # rather than straight inode number. The result is
clustered inode writeback does not cluster and things can go extremely
slowly when there are lots of inodes to write.

Fix it up by comparing the agino # of the inode we just looked up to the
index of the cluster we are looking for.

Tested-by: Torsten Kaiser <just.for.lkml@googlemail.com>

SGI-PV: 972915
SGI-Modid: xfs-linux-melb:xfs-kern:30033a

Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2007-12-10 13:46:59 +11:00
Lachlan McIlroy 77be55a5a1 [XFS] Clear XBF_READ_AHEAD flag on I/O completion.
SGI-PV: 972554
SGI-Modid: xfs-linux-melb:xfs-kern:30128a

Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
2007-12-10 13:46:45 +11:00
Lachlan McIlroy d1afb678ce [XFS] Fixed a few bugs in xfs_buf_associate_memory()
- calculation of 'page_count' was incorrect as it did not
  consider the offset of 'mem' into the first page. The
  logic to bump 'page_count' didn't work if 'len' was <=
  PAGE_CACHE_SIZE (ie offset = 3k, len = 2k).
- setting b_buffer_length to 'len' is incorrect if 'offset'
  is > 0. Set it to the total length of the buffer.
- I suspect that passing a non-aligned address into
  mem_to_page() for the first page may have been causing
  issues - don't know but just tidy up that code anyway.

SGI-PV: 971596
SGI-Modid: xfs-linux-melb:xfs-kern:30143a

Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
2007-12-10 13:46:20 +11:00
Lachlan McIlroy cd57e594ad [XFS] 971064 Various fixups for xfs_bulkstat().
- sanity check for NULL user buffer in xfs_ioc_bulkstat[_compat]()
- remove the special case for XFS_IOC_FSBULKSTAT with count == 1. This
  special case causes bulkstat to fail because the special case uses
  xfs_bulkstat_single() instead of xfs_bulkstat() and the two functions
  have different semantics.  xfs_bulkstat() will return the next inode
  after the one supplied while skipping internal inodes (ie quota inodes).
  xfs_bulkstate_single() will only lookup the inode supplied and return
  an error if it is an internal inode.
- in xfs_bulkstat(), need to initialise 'lastino' to the inode supplied
  so in cases were we return without examining any inodes the scan wont
  restart back at zero.
- sanity check for valid *ubcountp values. Cannot sanity check for valid
  ubuffer here because some users of xfs_bulkstat() don't supply a buffer.
- checks against 'ubleft' (the space left in the user's buffer) should be
  against 'statstruct_size' which is the supplied minimum object size.
  The mixture of checks against statstruct_size and 0 was one of the
  reasons we were skipping inodes.
- if the formatter function returns BULKSTAT_RV_NOTHING and an error and
  the error is not ENOENT or EINVAL then we need to abort the scan. ENOENT
  is for inodes that are no longer valid and we just skip them. EINVAL is
  returned if we try to lookup an internal inode so we skip them too. For
  a DMF scan if the inode and DMF attribute cannot fit into the space left
  in the user's buffer it would return ERANGE. We didn't handle this error
  and skipped the inode. We would continue to skip inodes until one fitted
  into the user's buffer or we completed the scan.
- put back the recalculation of agino (that got removed with the last fix)
  at the end of the while loop. This is because the code at the start of
  the loop expects agino to be the last inode examined if it is non-zero.
- if we found some inodes but then encountered an error, return success
  this time and the error next time. If the formatter aborted with ENOMEM
  we will now return this error but only if we couldn't read any inodes.
  Previously if we encountered ENOMEM without reading any inodes we
  returned a zero count and no error which falsely indicated the scan was
  complete.

SGI-PV: 973431
SGI-Modid: xfs-linux-melb:xfs-kern:30089a

Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: David Chinner <dgc@sgi.com>
2007-12-10 13:44:11 +11:00
Donald Douwsma d757762bf2 [XFS] Fix dbflush panic in xfs_qm_sync.
The recent behaviour layer removal dropped the check for quotas that have
been requested at mount time but have subsequently been turned off. This
results in a panic when accessing m_quotainfo which has been freed.

This patch adds the check originally made by xfs_qm_syncall() to
xfs_qm_sync().

SGI-PV: 969769
SGI-Modid: xfs-linux-melb:xfs-kern:29908a

Signed-off-by: Donald Douwsma <donaldd@sgi.com>
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2007-12-10 13:40:10 +11:00
Len Brown f7a5274d7d Pull suspend-2.6.24 into release branch 2007-12-06 16:26:52 -05:00
Al Viro 97bd7919e2 remove nonsense force-casts from ocfs2
endianness annotations in networking code had been in place for quite a
while; in particular, sin_port and s_addr are annotated as big-endian.

Code in ocfs2 had __force casts added apparently to shut the sparse
warnings up; of course, these days they only serve to *produce* warnings
for no reason whatsoever...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-12-05 09:25:20 -08:00
Al Viro 7e46aa5c8c regression: bfs endianness bug
BFS_FILEBLOCKS() expects struct bfs_inode * (on-disk data, with little-
endian fields), not struct bfs_inode_info * (in-core stuff, with host-
endian ones).

It's a macro and fields with the right names are present in
bfs_inode_info, so it compiles, but on big-endian host it gives bogus
results.

Introduced in commit f433dc5634 ("Fixes to
the BFS filesystem driver").

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-12-05 09:25:20 -08:00
Al Viro 9b5e6857b3 regression: cifs endianness bug
access_flags_to_mode() gets on-the-wire data (little-endian) and treats
it as host-endian.

Introduced in commit e01b640013 ("[CIFS]
enable get mode from ACL when cifsacl mount option specified")

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-12-05 09:25:19 -08:00
Alexey Dobriyan 5a622f2d0f proc: fix proc_dir_entry refcounting
Creating PDEs with refcount 0 and "deleted" flag has problems (see below).
Switch to usual scheme:
* PDE is created with refcount 1
* every de_get does +1
* every de_put() and remove_proc_entry() do -1
* once refcount reaches 0, PDE is freed.

This elegantly fixes at least two following races (both observed) without
introducing new locks, without abusing old locks, without spreading
lock_kernel():

1) PDE leak

remove_proc_entry			de_put
-----------------			------
			[refcnt = 1]
if (atomic_read(&de->count) == 0)
					if (atomic_dec_and_test(&de->count))
						if (de->deleted)
							/* also not taken! */
							free_proc_entry(de);
else
	de->deleted = 1;
		[refcount=0, deleted=1]

2) use after free

remove_proc_entry			de_put
-----------------			------
			[refcnt = 1]

					if (atomic_dec_and_test(&de->count))
if (atomic_read(&de->count) == 0)
	free_proc_entry(de);
						/* boom! */
						if (de->deleted)
							free_proc_entry(de);

BUG: unable to handle kernel paging request at virtual address 6b6b6b6b
printing eip: c10acdda *pdpt = 00000000338f8001 *pde = 0000000000000000
Oops: 0000 [#1] PREEMPT SMP
Modules linked in: af_packet ipv6 cpufreq_ondemand loop serio_raw psmouse k8temp hwmon sr_mod cdrom
Pid: 23161, comm: cat Not tainted (2.6.24-rc2-8c0863403f109a43d7000b4646da4818220d501f #4)
EIP: 0060:[<c10acdda>] EFLAGS: 00210097 CPU: 1
EIP is at strnlen+0x6/0x18
EAX: 6b6b6b6b EBX: 6b6b6b6b ECX: 6b6b6b6b EDX: fffffffe
ESI: c128fa3b EDI: f380bf34 EBP: ffffffff ESP: f380be44
 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process cat (pid: 23161, ti=f380b000 task=f38f2570 task.ti=f380b000)
Stack: c10ac4f0 00000278 c12ce000 f43cd2a8 00000163 00000000 7da86067 00000400
       c128fa20 00896b18 f38325a8 c128fe20 ffffffff 00000000 c11f291e 00000400
       f75be300 c128fa20 f769c9a0 c10ac779 f380bf34 f7bfee70 c1018e6b f380bf34
Call Trace:
 [<c10ac4f0>] vsnprintf+0x2ad/0x49b
 [<c10ac779>] vscnprintf+0x14/0x1f
 [<c1018e6b>] vprintk+0xc5/0x2f9
 [<c10379f1>] handle_fasteoi_irq+0x0/0xab
 [<c1004f44>] do_IRQ+0x9f/0xb7
 [<c117db3b>] preempt_schedule_irq+0x3f/0x5b
 [<c100264e>] need_resched+0x1f/0x21
 [<c10190ba>] printk+0x1b/0x1f
 [<c107c8ad>] de_put+0x3d/0x50
 [<c107c8f8>] proc_delete_inode+0x38/0x41
 [<c107c8c0>] proc_delete_inode+0x0/0x41
 [<c1066298>] generic_delete_inode+0x5e/0xc6
 [<c1065aa9>] iput+0x60/0x62
 [<c1063c8e>] d_kill+0x2d/0x46
 [<c1063fa9>] dput+0xdc/0xe4
 [<c10571a1>] __fput+0xb0/0xcd
 [<c1054e49>] filp_close+0x48/0x4f
 [<c1055ee9>] sys_close+0x67/0xa5
 [<c10026b6>] sysenter_past_esp+0x5f/0x85
=======================
Code: c9 74 0c f2 ae 74 05 bf 01 00 00 00 4f 89 fa 5f 89 d0 c3 85 c9 57 89 c7 89 d0 74 05 f2 ae 75 01 4f 89 f8 5f c3 89 c1 89 c8 eb 06 <80> 38 00 74 07 40 4a 83 fa ff 75 f4 29 c8 c3 90 90 90 57 83 c9
EIP: [<c10acdda>] strnlen+0x6/0x18 SS:ESP 0068:f380be44

Also, remove broken usage of ->deleted from reiserfs: if sget() succeeds,
module is already pinned and remove_proc_entry() can't happen => nobody
can mark PDE deleted.

Dummy proc root in netns code is not marked with refcount 1. AFAICS, we
never get it, it's just for proper /proc/net removal. I double checked
CLONE_NETNS continues to work.

Patch survives many hours of modprobe/rmmod/cat loops without new bugs
which can be attributed to refcounting.

Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-12-05 09:21:20 -08:00
Jan Kara d4beaf4ab5 jbd: Fix assertion failure in fs/jbd/checkpoint.c
Before we start committing a transaction, we call
__journal_clean_checkpoint_list() to cleanup transaction's written-back
buffers.

If this call happens to remove all of them (and there were already some
buffers), __journal_remove_checkpoint() will decide to free the transaction
because it isn't (yet) a committing transaction and soon we fail some
assertion - the transaction really isn't ready to be freed :).

We change the check in __journal_remove_checkpoint() to free only a
transaction in T_FINISHED state.  The locking there is subtle though (as
everywhere in JBD ;().  We use j_list_lock to protect the check and a
subsequent call to __journal_drop_transaction() and do the same in the end
of journal_commit_transaction() which is the only place where a transaction
can get to T_FINISHED state.

Probably I'm too paranoid here and such locking is not really necessary -
checkpoint lists are processed only from log_do_checkpoint() where a
transaction must be already committed to be processed or from
__journal_clean_checkpoint_list() where kjournald itself calls it and thus
transaction cannot change state either.  Better be safe if something
changes in future...

Signed-off-by: Jan Kara <jack@suse.cz>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-12-05 09:21:20 -08:00
Evgeniy Dushistov 0c664f9742 ufs: fix nexstep dir block size
This patch fixes regression, introduced since 2.6.16.  NextStep variant of
UFS as OpenStep uses directory block size equals to 1024.  Without this
change, ufs_check_page fails in many cases.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru>
Cc: Dave Bailey <dsbailey@pacbell.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-12-05 09:21:18 -08:00
Jeff Moyer e00ba3dae0 aio: only account I/O wait time in read_events if there are active requests
On 2.6.24, top started showing 100% iowait on one CPU when a UML instance was
running (but completely idle).  The UML code sits in io_getevents waiting for
an event to be submitted and completed.

Fix this by checking ctx->reqs_active before scheduling to determine whether
or not we are waiting for I/O.

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Cc: Zach Brown <zach.brown@oracle.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-12-05 09:21:18 -08:00
Rafael J. Wysocki e136e769d4 Freezer: Fix JFFS2 garbage collector freezing issue (rev. 2)
Fix breakage caused by commit d5d8c5976d
"freezer: do not send signals to kernel threads" in
jffs2_garbage_collect_thread() that assumed it would be sent signals
by the freezer.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Pete MacKay <armlinux@architechnical.net>
Signed-off-by: Len Brown <len.brown@intel.com>
2007-12-04 01:35:41 -05:00
Linus Torvalds 8002cedc1a Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/net-2.6: (27 commits)
  [INET]: Fix inet_diag dead-lock regression
  [NETNS]: Fix /proc/net breakage
  [TEXTSEARCH]: Do not allow zero length patterns in the textsearch infrastructure
  [NETFILTER]: fix forgotten module release in xt_CONNMARK and xt_CONNSECMARK
  [NETFILTER]: xt_TCPMSS: remove network triggerable WARN_ON
  [DECNET]: dn_nl_deladdr() almost always returns no error
  [IPV6]: Restore IPv6 when MTU is big enough
  [RXRPC]: Add missing select on CRYPTO
  mac80211: rate limit wep decrypt failed messages
  rfkill: fix double-mutex-locking
  mac80211: drop unencrypted frames if encryption is expected
  mac80211: Fix behavior of ieee80211_open and ieee80211_close
  ieee80211: fix unaligned access in ieee80211_copy_snap
  mac80211: free ifsta->extra_ie and clear IEEE80211_STA_PRIVACY_INVOKED
  SCTP: Fix build issues with SCTP AUTH.
  SCTP: Fix chunk acceptance when no authenticated chunks were listed.
  SCTP: Fix the supported extensions paramter
  SCTP: Fix SCTP-AUTH to correctly add HMACS paramter.
  SCTP: Fix the number of HB transmissions.
  [TCP] illinois: Incorrect beta usage
  ...
2007-12-03 08:15:36 -08:00
Eric W. Biederman 2b1e300a9d [NETNS]: Fix /proc/net breakage
Well I clearly goofed when I added the initial network namespace support
for /proc/net.  Currently things work but there are odd details visible to
user space, even when we have a single network namespace.

Since we do not cache proc_dir_entry dentries at the moment we can just
modify ->lookup to return a different directory inode depending on the
network namespace of the process looking at /proc/net, replacing the
current technique of using a magic and fragile follow_link method.

To accomplish that this patch:
- introduces a shadow_proc method to allow different dentries to
  be returned from proc_lookup.
- Removes the old /proc/net follow_link magic
- Fixes a weakness in our not caching of proc generic dentries.

As shadow_proc uses a task struct to decided which dentry to return we can
go back later and fix the proc generic caching without modifying any code
that uses the shadow_proc method.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Pavel Emelyanov <xemul@openvz.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2007-12-02 00:33:17 +11:00
Heiko Carstens 81257def2a tty: add the new termios2 ioctls to the compatible list.
Make them depend on TCGETS2.  If that one is implemented the rest should be
there as well.

Acked-by: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-11-29 09:24:55 -08:00
Miklos Szeredi 08b633070a fuse: fix attribute caching after rename
Invalidate attributes on rename, since some filesystems may update
st_ctime.  Reported by Szabolcs Szakacsits

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-11-29 09:24:54 -08:00
John Muir fbee36b92a fuse: fix uninitialized field in fuse_inode
I found problems accessing (executing) previously existing files, until
I did chmod on them (or setattr).

If the fi->attr_version is not initialized, then it could be
larger than fc->attr_version until a setattr is executed, and as a
result the inode attributes would never be set.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-11-29 09:24:54 -08:00
Miklos Szeredi d0186b25e6 fuse: fix FUSE_FILE_OPS sending
FUSE_FILE_OPS is meant to signal that the kernel will send the open file to to
the userspace filesystem for operations on open files, so that sillyrenaming
unlinked files becomes unnecessary.

However this needs VFS changes, which won't make it into 2.6.24.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-11-29 09:24:54 -08:00
Miklos Szeredi a6643094e7 fuse: pass open flags to read and write
Some open flags (O_APPEND, O_DIRECT) can be changed with fcntl(F_SETFL, ...)
after open, but fuse currently only sends the flags to userspace in open.

To make it possible to correcly handle changing flags, send the
current value to userspace in each read and write.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-11-29 09:24:54 -08:00
Miklos Szeredi 7dca9fd39f fuse: cleanup: add fuse_get_attr_version()
Extract repeated code into helper function, as suggested by Akpm.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-11-29 09:24:54 -08:00
Miklos Szeredi bcb4be809d fuse: fix reading past EOF
Currently reading a fuse file will stop at cached i_size and return
EOF, even though the file might have grown since the attributes were
last updated.

So detect if trying to read past EOF, and refresh the attributes
before continuing with the read.

Thanks to mpb for the report.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-11-29 09:24:54 -08:00
Tobias Poschwatta 6454d1f903 fix up ext2_fs.h for userspace after reservations backport
In commit a686cd898bd999fd026a51e90fb0a3410d258ddb:

 "Val's cross-port of the ext3 reservations code into ext2."

include/linux/ext2_fs.h got a new function whose return value is only
defined if __KERNEL__ is defined. Putting #ifdef __KERNEL__ around the
function seems to help, patch below.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-11-29 09:24:53 -08:00
Eric W. Biederman 19fd4bb2a0 proc: remove races from proc_id_readdir()
Oleg noticed that the call of task_pid_nr_ns() in proc_pid_readdir
is racy with respect to tasks exiting.

After a bit of examination it also appears that the call itself
is completely unnecessary.

So to fix the problem this patch modifies next_tgid() to return
both a tgid and the task struct in question.

A structure is introduced to return these values because it is
slightly cleaner and easier to optimize, and the resulting code
is a little shorter.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-11-29 09:24:52 -08:00
Alexey Dobriyan c2319540cd proc: fix NULL ->i_fop oops
proc_kill_inodes() can clear ->i_fop in the middle of vfs_readdir resulting in
NULL dereference during "file->f_op->readdir(file, buf, filler)".

The solution is to remove proc_kill_inodes() completely:

a) we don't have tricky modules implementing their tricky readdir hooks which
   could keeping this revoke from hell.

b) In a situation when module is gone but PDE still alive, standard
   readdir will return only "." and "..", because pde->next was cleared by
   remove_proc_entry().

c) the race proc_kill_inode() destined to prevent is not completely
   fixed, just race window made smaller, because vfs_readdir() is run
   without sb_lock held and without file_list_lock held.  Effectively,
   ->i_fop is cleared at random moment, which can't fix properly anything.

BUG: unable to handle kernel NULL pointer dereference at virtual address 00000018
printing eip: c1061205 *pdpt = 0000000005b22001 *pde = 0000000000000000
Oops: 0000 [#1] PREEMPT SMP
Modules linked in: foo af_packet ipv6 cpufreq_ondemand loop serio_raw sr_mod k8temp cdrom hwmon amd_rng
Pid: 2033, comm: find Not tainted (2.6.24-rc1-b1d08ac064268d0ae2281e98bf5e82627e0f0c56 #2)
EIP: 0060:[<c1061205>] EFLAGS: 00010246 CPU: 0
EIP is at vfs_readdir+0x47/0x74
EAX: c6b6a780 EBX: 00000000 ECX: c1061040 EDX: c5decf94
ESI: c6b6a780 EDI: fffffffe EBP: c9797c54 ESP: c5decf78
 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process find (pid: 2033, ti=c5dec000 task=c64bba90 task.ti=c5dec000)
Stack: c5decf94 c1061040 fffffff7 0805ffbc 00000000 c6b6a780 c1061295 0805ffbc
       00000000 00000400 00000000 00000004 0805ffbc 4588eff4 c5dec000 c10026ba
       00000004 0805ffbc 00000400 0805ffbc 4588eff4 bfdc6c70 000000dc 0000007b
Call Trace:
 [<c1061040>] filldir64+0x0/0xc5
 [<c1061295>] sys_getdents64+0x63/0xa5
 [<c10026ba>] sysenter_past_esp+0x5f/0x85
 =======================
Code: 49 83 78 18 00 74 43 8d 6b 74 bf fe ff ff ff 89 e8 e8 b8 c0 12 00 f6 83 2c 01 00 00 10 75 22 8b 5e 10 8b 4c 24 04 89 f0 8b 14 24 <ff> 53 18 f6 46 1a 04 89 c7 75 0b 8b 56 0c 8b 46 08 e8 c8 66 00
EIP: [<c1061205>] vfs_readdir+0x47/0x74 SS:ESP 0068:c5decf78

hch: "Nice, getting rid of this is a very good step formwards.
      Unfortunately we have another copy of this junk in
      security/selinux/selinuxfs.c:sel_remove_entries() which would need the
      same treatment."

Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Acked-by: Christoph Hellwig <hch@infradead.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: James Morris <jmorris@namei.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-11-29 09:24:52 -08:00
Linus Torvalds cae2f9c46d Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-2.6:
  sysfs: fix off-by-one error in fill_read_buffer()
  kobject: two typo fixes
  UIO: add UIO documentation target to DocBook Makefile
  UIO: fix up the UIO documentation
  create /sys/.../power when CONFIG_PM is set
  allow LEGACY_PTYS to be set to 0
2007-11-28 15:59:50 -08:00
Miao Xie 8118a859dc sysfs: fix off-by-one error in fill_read_buffer()
I found that there is a off-by-one problem in the following code.

Version:	2.6.24-rc2
File:		fs/sysfs/file.c:118-122
Function:	fill_read_buffer
--------------------------------------------------------------------
	count = ops->show(kobj, attr_sd->s_attr.attr, buffer->page);

	sysfs_put_active_two(attr_sd);

	BUG_ON(count > (ssize_t)PAGE_SIZE);
--------------------------------------------------------------------

Because according to the specification of the sysfs and the implement of
the show methods, the show methods return the number of bytes which would
be generated for the given input, excluding the trailing null.So if the
return value of the show methods equals PAGE_SIZE - 1, the buffer is full
in fact.  And if the return value equals PAGE_SIZE, the resulting string
was already truncated,or buffer overflow occurred.

This patch fixes an off-by-one error in fill_read_buffer.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Tejun Heo <teheo@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-11-28 13:53:53 -08:00
Ingo Molnar c46f739dd3 vfs: coredumping fix
fix: http://bugzilla.kernel.org/show_bug.cgi?id=3043

only allow coredumping to the same uid that the coredumping
task runs under.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Alan Cox <alan@redhat.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Acked-by: Al Viro <viro@ftp.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-11-28 10:58:01 -08:00
Mark Fasheh b1967d0edd ocfs2: reverse inline-data truncate args
ocfs2_truncate() and ocfs2_remove_inode_range() had reversed their "set
i_size" arguments to ocfs2_truncate_inline(). Fix things so that truncate
sets i_size, and punching a hole ignores it.

This exposed a problem where punching a hole in an inline-data file wasn't
updating the page cache, so fix that too.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-11-27 16:47:03 -08:00
Mark Fasheh 0d8a4e0cd6 ocfs2: Fix comparison in ocfs2_size_fits_inline_data()
This was causing us to prematurely push out inline data by one byte.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-11-27 16:47:03 -08:00
Mark Fasheh bccb9dad89 ocfs2: Remove bug statement in ocfs2_dentry_iput()
The existing bug statement didn't take into account unhashed dentries which
might not have a cluster lock on them. This could happen if a node exporting
the file system via NFS is rebooted, re-exported to nfs clients and then
unmounted. It's fine in this case to not have a dentry cluster lock.

Just remove the bug statement and replace it with an error print, which
does the proper checks. Though we want to know if something has happened
which might have prevented a cluster lock from being created, it's
definitely not necessary to panic the machine for this.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-11-27 16:47:02 -08:00
Jan Kara 5a58c3ef22 [PATCH] ocfs2: Remove expensive bitmap scanning
Enable expensive bitmap scanning only if DEBUG option is enabled.
The bitmap scanning quite loads the CPU and on my machine the write
throughput of dd if=/dev/zero of=/ocfs2/file bs=1M count=500 conv=sync
improves from 37 MB/s to 45.4 MB/s in local mode...

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-11-27 16:47:02 -08:00
Mark Fasheh a46043e08f ocfs2: log valid inode # on bad inode
If the inode block isn't valid then we don't want to print the value from
that, instead print the block number which was passed in (which should
always be correct). Also, turn this into a debug print for now - folks who
hit an actual problem always have other logs indicating what the source is.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-11-27 16:47:02 -08:00
Mark Fasheh ef9f86ceb6 ocfs2: Filter -ENOSPC in mlog_errno()
It's almost never worth printing in that situation and we keep forgetting to
manually filter it out.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-11-27 16:47:01 -08:00
Joe Perches 2759236f84 [PATCH] fs/ocfs2: Add missing "space"
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-11-27 16:47:01 -08:00
Mark Fasheh e001e796e4 ocfs2: Reset journal parameters after s_mount_opt update
Right now we're just setting them from the existing parameters, not the
new ones that a remount specified.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-11-27 16:47:01 -08:00
Linus Torvalds 423eaf8f00 Merge git://git.linux-nfs.org/pub/linux/nfs-2.6
* git://git.linux-nfs.org/pub/linux/nfs-2.6:
  NFS: Clean up new multi-segment direct I/O changes
  NFS: Ensure we return zero if applications attempt to write zero bytes
  NFS: Support multiple segment iovecs in the NFS direct I/O path
  NFS: Introduce iovec I/O helpers to fs/nfs/direct.c
  SUNRPC: Add missing "space" to net/sunrpc/auth_gss.c
  SUNRPC: make sunrpc/xprtsock.c:xs_setup_{udp,tcp}() static
  NFS: fs/nfs/dir.c should #include "internal.h"
  NFS: make nfs_wb_page_priority() static
  NFS: mount failure causes bad page state
  SUNRPC: remove NFS/RDMA client's binary sysctls
  kernel BUG at fs/nfs/namespace.c:108! - can be triggered by bad server
  sunrpc: rpc_pipe_poll may miss available data in some cases
  sunrpc: return error if unsupported enctype or cksumtype is encountered
  sunrpc: gss_pipe_downcall(), don't assume all errors are transient
  NFS: Fix the ustat() regression
2007-11-26 19:42:59 -08:00
Linus Torvalds 0685ab4fb8 Merge git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched
* git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched:
  sched: bump version of kernel/sched_debug.c
  sched: fix minimum granularity tunings
  sched: fix RLIMIT_CPU comment
  sched: fix kernel/acct.c comment
  sched: fix prev_stime calculation
  sched: don't forget to unlock uids_mutex on error paths
2007-11-26 19:42:08 -08:00
Chuck Lever 02fe494619 NFS: Clean up new multi-segment direct I/O changes
Simplify calling sequence of nfs_direct_{read,write}_schedule(), and
rename them to reflect their new role.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-11-26 16:32:40 -05:00
Chuck Lever b9148c6b80 NFS: Ensure we return zero if applications attempt to write zero bytes
A zero byte count direct write request should be a successful no-op, not an
error.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-11-26 16:32:38 -05:00
Chuck Lever c216fd708e NFS: Support multiple segment iovecs in the NFS direct I/O path
Allow applications to perform asynchronous scatter-gather direct I/O
to NFS files.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-11-26 16:32:36 -05:00
Chuck Lever 19f737879c NFS: Introduce iovec I/O helpers to fs/nfs/direct.c
Add helpers that iterate over multi-segment iovecs.  These will
be used to support multi-segment scatter/gather direct I/O in a
later patch.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-11-26 16:32:35 -05:00
Adrian Bunk 4c30d56edc NFS: fs/nfs/dir.c should #include "internal.h"
Every file should include the headers containing the prototypes for its global
functions (in this case nfs_access_cache_shrinker()).

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-11-26 16:24:49 -05:00
Adrian Bunk 5334eb13d4 NFS: make nfs_wb_page_priority() static
nfs_wb_page_priority() can now become static.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-11-26 16:24:48 -05:00
Russell King f16c960332 NFS: mount failure causes bad page state
While testing a kernel based upon ecd744eec3
(with wrong boot arguments), I got the following bad page state entry while
NFS was trying to mount it's rootfs:

IP-Config: Complete:
      device=eth0, addr=192.168.1.101, mask=255.255.255.0, gw=255.255.255.255,
     host=192.168.1.101, domain=, nis-domain=(none),
     bootserver=192.168.1.100, rootserver=192.168.1.100, rootpath=
Looking up port of RPC 100003/2 on 192.168.1.100
rpcbind: server 192.168.1.100 not responding, timed out
Root-NFS: Unable to get nfsd port number from server, using default
Looking up port of RPC 100005/1 on 192.168.1.100
rpcbind: server 192.168.1.100 not responding, timed out
Root-NFS: Unable to get mountd port number from server, using default
mount: server 192.168.1.100 not responding, timed out
Root-NFS: Server returned error -5 while mounting /nfs/rootfs/
VFS: Unable to mount root fs via NFS, trying floppy.
Bad page state in process 'swapper'
page:c02b1260 flags:0x00000400 mapping:00000000 mapcount:0 count:0
Trying to fix it up, but a reboot is needed
Backtrace:
[<c0023e34>] (dump_stack+0x0/0x14) from [<c0062570>] (bad_page+0x70/0xac)
[<c0062500>] (bad_page+0x0/0xac) from [<c0064914>] (free_hot_cold_page+0x80/0x178)
[<c0064894>] (free_hot_cold_page+0x0/0x178) from [<c0064a74>] (free_hot_page+0x14/0x18)
[<c0064a60>] (free_hot_page+0x0/0x18) from [<c0067078>] (put_page+0xf8/0x154)
[<c0066f80>] (put_page+0x0/0x154) from [<c007dbc8>] (kfree+0xc8/0xd0)
[<c007db00>] (kfree+0x0/0xd0) from [<c00cbb54>] (nfs_get_sb+0x230/0x710)
[<c00cb924>] (nfs_get_sb+0x0/0x710) from [<c0084334>] (vfs_kern_mount+0x58/0xac)[<c00842dc>] (vfs_kern_mount+0x0/0xac) from [<c00843c0>] (do_kern_mount+0x38/0xf4)
[<c0084388>] (do_kern_mount+0x0/0xf4) from [<c0099c7c>] (do_mount+0x1e8/0x614)
...

This seems to be caused by use of an uninitialised structure due to NULL
options being passed to nfs_validate_mount_data().  Ensure that the
parsed mount data is always initialised.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
     (Trond: added fix for the same bug in nfs4_validate_mount_data()).
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-11-26 16:24:22 -05:00