Commit Graph

5200 Commits

Author SHA1 Message Date
Mark Fasheh
abf8b15694 ocfs2: abstract out allocation locking
Right now, file allocation for ocfs2 is done within ocfs2_extend_file(),
which is either called from ->setattr() (for an i_size change), or at the
top of ocfs2_file_aio_write().

Inodes on file systems with sparse file support will want to do their
allocation during the actual write call.

In either case the cluster locking decisions are the same. We abstract out
that code into a new function, ocfs2_lock_allocators() which will be used by
a later patch to enable writing to sparse files.

This also provides a nice cleanup of ocfs2_extend_allocation().

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-04-26 15:01:58 -07:00
Mark Fasheh
3a0782d09c ocfs2: teach extend/truncate about sparse files
For ocfs2_truncate_file(), we eliminate the "simple" truncate case which no
longer exists since i_size is not tied to i_clusters. In
ocfs2_extend_file(), we skip the allocation / page zeroing code for file
systems which understand sparse files.

The core truncate code is changed to do a bottom up tree traversal. This
gets abstracted out into it's own function. To make things more readable,
most of the special case handling for in-inode extents from
ocfs2_do_truncate() is also removed.

Though write support for sparse files comes in a later patch, we at least
update ocfs2_prepare_inode_for_write() to skip allocation for sparse files.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-04-26 15:01:56 -07:00
Mark Fasheh
363041a5f7 ocfs2: temporarily remove extent map caching
The code in extent_map.c is not prepared to deal with a subtree being
rotated between lookups. This can happen when filling holes in sparse files.
Instead of a lengthy patch to update the code (which would likely lose the
benefit of caching subtree roots), we remove most of the algorithms and
implement a simple path based lookup. A less ambitious extent caching scheme
will be added in a later patch.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-04-26 15:01:31 -07:00
Mark Fasheh
dcd0538ff4 ocfs2: sparse b-tree support
Introduce tree rotations into the b-tree code. This will allow ocfs2 to
support sparse files. Much of the added code is designed to be generic (in
the ocfs2 sense) so that it can later be re-used to implement large
extended attributes.

This patch only adds the rotation code and does minimal updates to callers
of the extent api.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-04-26 14:44:03 -07:00
Mark Fasheh
6f16bf655c ocfs2: small cleanup of ocfs2_request_delete()
There are two checks in there (one for inode newness, one for other mounted
nodes) which are unnecessary, so remove them. The DLM will allow the trylock
in either case without any messaging overhead.

Removing these makes ocfs2_request_delete() a one liner function, so just
move the trylock out one level into ocfs2_query_inode_wipe().

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-04-26 14:40:55 -07:00
Tiger Yang
68e2b740c4 ocfs2: remove unused code
Remove node messaging code that becomes unused with the delete inode vote
removal.

[Removed even more cruft which I spotted during review --Mark]

Signed-off-by: Tiger Yang <tiger.yang@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-04-26 14:40:16 -07:00
Tiger Yang
500086300e ocfs2: Remove delete inode vote
Ocfs2 currently does cluster-wide node messaging to check the open state of
an inode during delete. This patch removes that mechanism in favor of an
inode cluster lock which is taken at shared read when an inode is first read
and dropped in clear_inode(). This allows a deleting node to test the
liveness of an inode by attempting to take an exclusive lock.

Signed-off-by: Tiger Yang <tiger.yang@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-04-26 14:39:48 -07:00
Mark Fasheh
a9f5f70739 ocfs2: filter more error prints
We don't want to print anything at all in ocfs2_lookup() when getting an
error from ocfs2_iget() - it could be something as innocuous as a signal
being detected in the dlm.

ocfs2_permission() should filter on -ENOENT which ocfs2_meta_lock() can
return if the inode was deleted on another node.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-04-26 13:39:08 -07:00
Sunil Mushran
bebe6f120b ocfs2: Replace panic() with emergency_restart() when fencing
We have noticed panic() hanging leading us to a situation in which
the node, while otherwise dead, is still disk heartbeating. This
leads to a hung cluster as the other nodes are waiting for this
node to stop disk heartbeating. This situation is only resolved
by power resetting the box.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-04-26 13:39:02 -07:00
Sunil Mushran
5d262cc7dd ocfs2: Silence compiler warnings
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-04-26 13:38:55 -07:00
Mark Fasheh
be9e986b82 ocfs2: Local mounts should skip inode updates
We don't want the extent map and uptodate cache destruction in
ocfs2_meta_lock_update() on a local mount, so skip that.

This fixes several bugs with uptodate being cleared on buffers and extent
maps being corrupted.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-04-26 13:35:21 -07:00
Sunil Mushran
0d01af6e5d ocfs2_dlm: Call cond_resched_lock() once per hash bucket scan
In dlm_migrate_all_locks(), we currently call cond_resched_lock() after
processing each lockres in a hash bucket. Move it outside the loop so as to
call it only after the entire hash bucket has been processed.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-04-26 13:33:11 -07:00
Srinivas Eeda
756a1501dd ocfs2_dlm: fix race in dlm_remaster_locks
There is a possibility that dlm_remaster_locks could overwride node->state
with DLM_RECO_NODE_DATA_REQUESTED after dlm_reco_data_done_handler sets the
node->state to DLM_RECO_NODE_DATA_DONE. This could lead to recovery getting
stuck and requires a cluster reboot. Synchronize with dlm_reco_state_lock
spinlock.

Signed-off-by: Srinivas Eeda <srinivas.eeda@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-04-26 13:33:02 -07:00
David Woodhouse
ef2e58ea6b Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6 2007-04-26 09:31:28 +01:00
Andrew Morton
f6449f4ece [JFFS2] Fix compr_rubin.c build after include file elimination.
It seems to be silly season lately.

(Oops, test builds are more useful if the file in question is actually
configured on. dwmw2).

Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2007-04-26 07:27:04 +01:00
Patrick McHardy
af65bdfce9 [NETLINK]: Switch cb_lock spinlock to mutex and allow to override it
Switch cb_lock to mutex and allow netlink kernel users to override it
with a subsystem specific mutex for consistent locking in dump callbacks.
All netlink_dump_start users have been audited not to rely on any
side-effects of the previously used spinlock.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:29:03 -07:00
Arnaldo Carvalho de Melo
b529ccf279 [NETLINK]: Introduce nlmsg_hdr() helper
For the common "(struct nlmsghdr *)skb->data" sequence, so that we reduce the
number of direct accesses to skb->data and for consistency with all the other
cast skb member helpers.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:26:34 -07:00
Eric Dumazet
ae40eb1ef3 [NET]: Introduce SIOCGSTAMPNS ioctl to get timestamps with nanosec resolution
Now network timestamps use ktime_t infrastructure, we can add a new
ioctl() SIOCGSTAMPNS command to get timestamps in 'struct timespec'.
User programs can thus access to nanosecond resolution.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
CC: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:24:04 -07:00
David Woodhouse
61c4b23770 [JFFS2] Handle inodes with only a single metadata node with non-zero isize
This should never happen unless there's corruption on the medium and the
actual data nodes go missing. But the failure mode (an oops when we assume
the fragtree isn't empty and go looking for its last node) isn't useful.

Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2007-04-25 17:04:23 +01:00
David Woodhouse
c00c310eac [JFFS2] Tidy up licensing/copyright boilerplate.
In particular, remove the bit in the LICENCE file about contacting
Red Hat for alternative arrangements. Their errant IS department broke
that arrangement a long time ago -- the policy of collecting copyright
assignments from contributors came to an end when the plug was pulled on
the servers hosting the project, without notice or reason.

We do still dual-license it for use with eCos, with the GPL+exception
licence approved by the FSF as being GPL-compatible. It's just that nobody
has the right to license it differently.

Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2007-04-25 14:16:47 +01:00
Joakim Tjernlund
0dec4c8bc6 [JFFS2] Better fix for all-zero node headers
No need to check for all-zero header since the header cannot
be zero due to other checks.

Replace the all-zero header check in readinode.c with a
check for the magic word.

Signed-off-by: Joakim Tjernlund <Joakim.Tjernlund@transmode.se>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2007-04-25 04:13:06 +01:00
David Woodhouse
df8e96f391 [JFFS2] Improve read_inode memory usage, v2.
We originally used to read every node and allocate a jffs2_tmp_dnode_info
structure for each, before processing them in (reverse) version order
and discarding the ones which are obsoleted by later nodes.

With huge logfiles, this behaviour caused memory problems. For example, a
file involved in OLPC trac #1292 has 1822391 nodes, and would cause the XO
machine to run out of memory during the first stage of read_inode().

Instead of just inserting nodes into a tree in version order as we find
them, we now put them into a tree in order of their offset within the
file, which allows us to immediately discard nodes which are completely
obsoleted.

We don't use a full tree with 'fragments' pointing to the real data
structure, as we do in the normal fragtree. We sort only on the start
address, and add an 'overlapped' flag to the tmp_dnode_info to indicate
that the node in question is (partially) overlapped by another.

When the scan is complete, we start at the end of the file, adding each
node to a real fragtree as before. Where the node is non-overlapped, we
just add it (it doesn't matter that it's not the latest version; there is
no overlap). When the node at the end of the tree _is_ overlapped, we sort
it and all its overlapping nodes into version order and then add them to
the fragtree in that order.

This 'early discard' reduces the peak allocation of tmp_dnode_info
structures from 1.8M to a mere 62872 (3.5%) in the degenerate case
referenced above.

This version of the patch also correctly rememembers the highest node
version# seen for an inode when it's scanned.

Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2007-04-25 03:23:42 +01:00
Jeff Mahoney
9b7f375505 reiserfs: fix xattr root locking/refcount bug
The listxattr() and getxattr() operations are only protected by a read
lock.  As a result, if either of these operations run in parallel, a race
condition exists where the xattr_root will end up being cached twice, which
results in the leaking of a reference and a BUG() on umount.

This patch refactors get_xa_root(), __get_xa_root(), and create_xa_root(),
into one get_xa_root() function that takes the appropriate locking around
the entire critical section.

Reported, diagnosed and tested by Andrea Righi <a.righi@cineca.it>

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Cc: Andrea Righi <a.righi@cineca.it>
Cc: "Vladimir V. Saveliev" <vs@namesys.com>
Cc: Edward Shishkin <edward@namesys.com>
Cc: Alex Zarochentsev <zam@namesys.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-04-24 08:23:09 -07:00
Latchesar Ionkov
c959df9f01 v9fs: don't use primary fid when removing file
v9fs_insert uses v9fs_fid_lookup (which also locks the fid) to get the
primary fid associated with the dentry and destroys the v9fs_fid struct
after removing the file.  If another process called v9fs_fid_lookup on the
same dentry, it may wait undefinitely for the fid's lock (as the struct is
freed).

This patch changes v9fs_remove to use a cloned fid, so the primary fid is
not locked and freed.

Signed-off-by: Latchesar Ionkov <lucho@ionkov.net>
Cc: Eric Van Hensbergen <ericvh@hera.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-04-24 08:23:08 -07:00
David Woodhouse
44b998e1eb [JFFS2] Improve failure mode if inode checking leaves unchecked space.
We should never find the unchecked size is non-zero after we've finished
checking all inodes. If it happens, used to BUG(), leaving the alloc_sem
held and deadlocking. Instead, just return -ENOSPC after complaining. The
GC thread will die, but read-only operation should be able to continue and
the file system should be unmountable.

Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2007-04-23 12:11:46 +01:00
David Woodhouse
566865a2a4 [JFFS2] Fix cross-endian build.
When compiling a LE-capable JFFS2 on PowerPC, wbuf.c fails to compile:

fs/jffs2/wbuf.c:973: error: braced-group within expression allowed only inside a function
fs/jffs2/wbuf.c:973: error: initializer element is not constant
fs/jffs2/wbuf.c:973: error: (near initialization for ‘oob_cleanmarker.magic’)
fs/jffs2/wbuf.c:974: error: braced-group within expression allowed only inside a function
fs/jffs2/wbuf.c:974: error: initializer element is not constant
fs/jffs2/wbuf.c:974: error: (near initialization for ‘oob_cleanmarker.nodetype’)
fs/jffs2/wbuf.c:975: error: braced-group within expression allowed only inside a function
fs/jffs2/wbuf.c:976: error: initializer element is not constant
fs/jffs2/wbuf.c:976: error: (near initialization for ‘oob_cleanmarker.totlen’)

Provide constant_cpu_to_je{16,32} functions, and use them for initialising the
offending structure.

Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2007-04-23 12:07:17 +01:00
Trond Myklebust
2b82f190c8 NFS: Fix race in nfs_set_page_dirty
Protect nfs_set_page_dirty() against races with nfs_inode_add_request.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-04-20 22:56:30 -07:00
Trond Myklebust
612c9384fd NFS: Fix the 'desynchronized value of nfs_i.ncommit' error
Redirtying a request that is already marked for commit will screw up the
accounting for NR_UNSTABLE_NFS as well as nfs_i.ncommit.
Ensure that all requests on the commit queue are labelled with the
PG_NEED_COMMIT flag, and avoid moving them onto the dirty list inside
nfs_page_mark_flush().

Also inline nfs_mark_request_dirty() into nfs_page_mark_flush() for
atomicity reasons. Avoid dropping the spinlock until we're done marking the
request in the radix tree and have added it to the ->dirty list.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-04-20 22:56:29 -07:00
Trond Myklebust
6d677e3504 NFS: Don't clear PG_writeback until after we've processed unstable writes
Ensure that we don't release the PG_writeback lock until after the page has
either been redirtied, or queued on the nfs_inode 'commit' list.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-04-20 22:56:29 -07:00
Trond Myklebust
8e821cad12 NFS: clean up the unstable write code
Get rid of the inlined #ifdefs.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-04-20 22:56:29 -07:00
Joakim Tjernlund
a491486a20 [JFFS2] Obsolete dirent nodes immediately on unlink, where possible.
Signed-off-by: Joakim Tjernlund <Joakim.Tjernlund@transmode.se>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2007-04-20 23:09:28 -04:00
Evgeniy Dushistov
07a0cfec30 ufs proper handling of zero link case
This patch should fix or partly fix this bug:
http://bugzilla.kernel.org/show_bug.cgi?id=8276

The problem is:

- if we see "zero link case" during reading inode operation, we call
  ufs_error(which remount fs readonly), but not "mark" inode as bad (1)

- in readonly case we do not fill some data structures, which are used in
  read and write case (2)

- VFS call ufs_delete_inode if link count is zero (3)

so (1)->(3)->(2) cause oops, this patch should fix such scenario

Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru>
Cc: Jim Paris <jim@jtan.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-04-17 16:36:27 -07:00
Alan Cox
c4bbafda70 exec.c: fix coredump to pipe problem and obscure "security hole"
The patch checks for "|" in the pattern not the output and doesn't nail a
pid on to a piped name (as it is a program name not a file)

Also fixes a very very obscure security corner case.  If you happen to have
decided on a core pattern that starts with the program name then the user
can run a program called "|myevilhack" as it stands.  I doubt anyone does
this.

Signed-off-by: Alan Cox <alan@redhat.com>
Confirmed-by: Christopher S. Aker <caker@theshore.net>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-04-17 16:36:26 -07:00
Joakim Tjernlund
c2aecda79c [JFFS2] Speed up mount for directly-mapped NOR flash
Remove excessive scanning of empty flash after a clean
marker for users of the point/unpoint method. cfi_cmdset_0001
uses point/unpoint by default iff flash mapping is linear.
The speedup is several orders of magnitude if FS is less than
half full.

Signed-off-by: Joakim Tjernlund <Joakim.Tjernlund@transmode.se>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2007-04-17 14:07:34 -04:00
Artem Bityutskiy
10731f8300 [JFFS2] fix buffer sise calculations in jffs2_get_inode_nodes()
In read inode we have an optimization which prevents one
min. I/O unit (e.g. NAND page) to be read more then once.

Namely, at the beginning we do not know which node type we read,
so we read so we assume we read the directory entry, because it
has the smallest node header. When we read it, we read up to the
next min. I/O unit, just because if later we'll need to read more,
we already have this data.

If it turns out to be that the node is not directory entry, and
we need more data, and we did not read it because it sits in the
next min. I/O unit, we read the whole next (or several next)
min. I/O unit(s). And if it happens to be that we read a data node,
and we've read part of its data, we calculate partial CRC.
So if later we need to check data CRC, we'll only read the rest
of the data from further min. I/O units and continue CRC checking.

This code was a bit messy and buggy. The bug was that it assumed
relatively large min. I/O unit, so that the largest node header
could overlap only one min. I/O unit boundary.

This parch clean-ups the code a bit and fixes this bug.
The patch was not tested on flash with small min. I/O unit, like
NOR-ECC, nut it was tested on NAND with 512 bytes NAND page, so
it at least does not break NAND. It was also tested with mtdram
so it should not break NOR.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2007-04-17 14:05:48 -04:00
Adrian Hunter
7f762ab24c [JFFS2] Disable summary after wbuf recovery
After a write error, any data in the write buffer must
be relocated.  This is handled by the jffs2_wbuf_recover
function.  This function does not fix up the erase block
summary information that is collected for writing at the
end of the block, which results in an incorrect summary
(or BUG if the summary was found to be empty).

As the summary is not essential (it is an optimisation),
it may be disabled for the current erase block when this
situation arises.  This patch does that.

Signed-off-by: Adrian Hunter <ext-adrian.hunter@nokia.com>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2007-04-17 13:56:44 -04:00
Adrian Hunter
99c2594f0e [JFFS2] Prevent list corruption when handling write errors
If a write error occurs, the affected block is placed on the
bad_used_list.  In the case that the write error occured
when writing summary data the block was also being placed on
the dirty_list, which caused list corruption and ultimately
a soft lockup in jffs2_mark_node_obsolete. This fixes that.

Signed-off-by: Adrian Hunter <ext-adrian.hunter@nokia.com>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2007-04-17 13:56:23 -04:00
Artem Bityutskiy
b0afbbec49 [JFFS2] fix deadlock on error path
When the MTD driver returns write failure, the following deadlock
occurs:

We are in __jffs2_flush_wbuf(), we hold &c->wbuf_sem. Write failure.
jffs2_wbuf_recover()->jffs2_reserve_space_gc()->jffs2_do_reserve_space()
->jffs2_erase_pending_blocks()->jffs2_flash_read()

and it tries to lock &c->wbuf_sem again. Deadlock.

Reported-by: Adrian Hunter <ext-adrian.hunter@nokia.com>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2007-04-17 13:53:51 -04:00
Thomas Gleixner
53043002ef [JFFS2] check node crc before doing anything else
Check the node CRC on scan before doing anything else with the node.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2007-04-17 18:26:18 +01:00
Trond Myklebust
eb4cac10d9 NFS: Fix a list corruption problem
We must remove the request from whatever list it is currently on before we
can add it to the dirty list.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-04-15 16:48:11 -07:00
Trond Myklebust
5a6d41b32a NFS: Ensure PG_writeback is cleared when writeback fails
If the writebacks are cancelled via nfs_cancel_dirty_list, or due to the
memory allocation failing in nfs_flush_one/nfs_flush_multi, then we must
ensure that the PG_writeback flag is cleared.

Also ensure that we actually own the PG_writeback flag whenever we
schedule a new writeback by making nfs_set_page_writeback() return the
value of test_set_page_writeback().
The PG_writeback page flag ends up replacing the functionality of the
PG_FLUSHING nfs_page flag, so we rip that out too.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-04-14 21:46:48 -07:00
Trond Myklebust
60fa3f769f NFS: Fix two bugs in the O_DIRECT write code
Do not flag an error if the COMMIT call fails and we decide to resend the
writes. Let the resend flag the error if it fails.

If a write has failed, then nfs_direct_write_result should not attempt to
send a commit. It should just exit asap and return the error to the user.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-04-14 21:46:48 -07:00
Trond Myklebust
e1552e1998 NFS: Fix an Oops in nfs_setattr()
It looks like nfs_setattr() and nfs_rename() also need to test whether the
target is a regular file before calling nfs_wb_all()...

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-04-14 21:46:47 -07:00
Jeff Mahoney
c3724b129b [PATCH] autofs4: fix race in unhashed dentry code
Commit f50b6f8691 introduced a race in
autofs4 between autofs_lookup_unhashed() and autofs_dentry_release().

autofs_dentry_release() ends up clearing the ->dentry and ->inode members
of autofs_info before removing it from the rehash list.  The list is
protected by the rehash lock in both functions, but since
autofs_dentry_release() starts tearing the autofs_info struct down before
removing it from the list, autofs_lookup_unhashed() can get a autofs_info
with a NULL dentry.

This patch moves the clearing of ->dentry and ->inode after the removal
from the rehash list.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Acked-by: Ian Kent <raven@themaw.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-04-12 15:31:42 -07:00
Vladimir Saveliev
6d205f1205 [PATCH] reiserfs: fix key decrementing
This patch fixes a bug in function decrementing a key of stat data item.

Offset of reiserfs keys are compared as signed values.  To set key offset
to maximal possible value maximal signed value has to be used.

This bug is responsible for severe reiserfs filesystem corruption which
shows itself as warning vs-13060.  reiserfsck fixes this corruption by
filesystem tree rebuilding.

Signed-off-by: Vladimir Saveliev <vs@namesys.com>
Cc: <reiserfs-dev@namesys.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-04-12 15:31:42 -07:00
Stephen Rothwell
1a38147ed0 [POWERPC] Make struct property's value a void *
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-04-13 03:55:18 +10:00
Timo Savola
a5bfffac64 [PATCH] fuse: validate rootmode mount option
If rootmode isn't valid, we hit the BUG() in fuse_init_inode.  Now
EINVAL is returned.

Signed-off-by: Timo Savola <tsavola@movial.fi>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-04-08 19:47:55 -07:00
Andrew Morton
2363cc0264 [PATCH] remove protection of LANANA-reserved majors
Revert all this.  It can cause device-mapper to receive a different major from
earlier kernels and it turns out that the Amanda backup program (via GNU tar,
apparently) checks major numbers on files when performing incremental backups.

Which is a bit broken of Amanda (or tar), but this feature isn't important
enough to justify the churn.

Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-04-04 21:12:47 -07:00
Robert P. J. Day
8dc64fca75 [JFFS2] Delete everything related to obsolete JFFS2_PROC option
Delete everything related to the apparently non-existent kernel config
option JFFS2_PROC.

Signed-off-by: Robert P. J. Day <rpjday@mindspring.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2007-04-02 14:11:25 -04:00
Andrew Morton
7479d2b90b [PATCH] revert "retries in ext4_prepare_write() violate ordering requirements"
Revert b46be05004.  Same reasoning as for ext3.

Cc: Kirill Korotaev <dev@openvz.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Ken Chen <kenneth.w.chen@intel.com>
Cc: Andrey Savochkin <saw@sw.ru>
Cc: <linux-ext4@vger.kernel.org>
Cc: Dmitriy Monakhov <dmonakhov@openvz.org>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-04-02 10:06:08 -07:00
Andrew Morton
1aa9b4b9bc [PATCH] revert "retries in ext3_prepare_write() violate ordering requirements"
Revert e92a4d595b.

Dmitry points out

"When we block_prepare_write() failed while ext3_prepare_write() we jump to
 "failure" label and call ext3_prepare_failure() witch search last mapped bh
 and invoke commit_write untill it.  This is wrong!!  because some bh from
 begining to the last mapped bh may be not uptodate.  As a result we commit to
 disk not uptodate page content witch contains garbage from previous usage."

and

"Unexpected file size increasing."

   Call trace the same as it was in first issue but result is different.
   For example we have file with i_size is zero.  we want write two blocks ,
   but fs has only one free block.

   ->ext3_prepare_write(...from == 0, to == 2048)
     retry:
     ->block_prepare_write() == -ENOSPC# we failed but allocated one block here.
     ->ext3_prepare_failure()
       ->commit_write( from == 0, to == 1024) # after this i_size becomes 1024 :)
     if (ret == -ENOSPC && ext3_should_retry_alloc(inode->i_sb, &retries))
        goto retry;

   Finally when all retries will be spended ext3_prepare_failure return
   -ENOSPC, but i_size was increased and later block trimm procedures can't
   help here.

We don't appear to have the horsepower to fix these issues, so let's put
things back the way they were for now.

Cc: Kirill Korotaev <dev@openvz.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Ken Chen <kenneth.w.chen@intel.com>
Cc: Andrey Savochkin <saw@sw.ru>
Cc: <linux-ext4@vger.kernel.org>
Cc: Dmitriy Monakhov <dmonakhov@openvz.org>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-04-02 10:06:08 -07:00
Brian Pomerantz
0322170260 [PATCH] fix page leak during core dump
When the dump cannot occur most likely because of a full file system and
the page to be written is the zero page, the call to page_cache_release()
is missed.

Signed-off-by: Brian Pomerantz <bapper@mvista.com>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: David Howells <dhowells@redhat.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-04-02 10:06:08 -07:00
Andrew Morton
05565b65a5 [PATCH] proc: fix linkage with CONFIG_SYSCTL=y, CONFIG_PROC_SYSCTL=n
We're using #ifdef CONFIG_SYSCTL, but we should be using CONFIG_PROC_SYSCTL,
so we get

 fs/built-in.o: In function `proc_root_init':
 /usr/src/linux/fs/proc/root.c:83: undefined reference to `proc_sys_init'

Fix that up and remove an ifdef-in-C.

Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Helge Hafting <helgehaf@aitel.hist.no>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-04-02 10:06:08 -07:00
Linus Torvalds
22c8c65d24 Merge branch 'for-linus' of git://git.kernel.dk/data/git/linux-2.6-block
* 'for-linus' of git://git.kernel.dk/data/git/linux-2.6-block:
  [PATCH] splice: partial write fix
2007-03-29 08:23:52 -07:00
Paolo 'Blaisorblade' Giarrusso
75e8defbe4 [PATCH] uml: hostfs variable renaming
* rename name to host_root_path
* rename data to req_root.

Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Jeff Dike <jdike@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-03-29 08:22:25 -07:00
Jeff Dike
622e696938 [PATCH] uml: fix compilation problems
Fix a few miscellaneous compilation problems -
	an assignment with mismatched types in ldt.c
	a missing include in mconsole.h which needs a definition of uml_pt_regs
	I missed removing an include of user_util.h in hostfs

Signed-off-by: Jeff Dike <jdike@linux.intel.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-03-29 08:22:25 -07:00
Dmitriy Monakhov
d9993c37ef [PATCH] splice: partial write fix
Currently if partial write has happened while ->commit_write() then page
wasn't marked as accessed and rebalanced.

Signed-off-by: Monakhov Dmitriy <dmonakhov@openvz.org>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-03-29 14:26:42 +02:00
Linus Torvalds
e5c465f5d9 Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2:
  ocfs2_dlm: Check for migrateable lockres in dlm_empty_lockres()
  ocfs2_dlm: Fix lockres ref counting bug
2007-03-28 14:02:03 -07:00
Jeff Garzik
a9c87a10db Merge branch 'upstream-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6 into upstream-fixes 2007-03-28 02:21:18 -04:00
Zach Brown
28defbea64 [PATCH] aio: remove bare user-triggerable error printk
The user can generate console output if they cause do_mmap() to fail
during sys_io_setup().  This was seen in a regression test that does
exactly that by spinning calling mmap() until it gets -ENOMEM before
calling io_setup().

We don't need this printk at all, just remove it.

Signed-off-by: Zach Brown <zach.brown@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-03-27 17:53:25 -07:00
Jean Tourrilhes
ed4bb10631 [PATCH] wext: Add missing ioctls to 64<->32 conversion
Johannes Berg and Michael Buesch noticed that the WPA ioctls
were missing from the 64<->32 bit conversion. This means that when
using a 32 bits userspace on a 64 bit kernel, those ioctls fail.

Signed-off-by: Jean Tourrilhes <jt@hpl.hp.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2007-03-27 14:10:17 -04:00
Linus Torvalds
e0ab0bb6d2 Merge branch 'for-linus' of git://git.kernel.dk/data/git/linux-2.6-block
* 'for-linus' of git://git.kernel.dk/data/git/linux-2.6-block:
  Export __splice_from_pipe()
  2/2 splice: dont readpage
  1/2 splice: dont steal
  make elv_register() output atomic
  block: blk_max_pfn is somtimes wrong
2007-03-27 09:05:49 -07:00
Mika Kukkonen
5c46010af2 [PATCH] Fix kernel build with EMBEDDED & PROC_FS & !PROC_SYSCTL
Without attached patch against current -git I get following with
!PROC_SYSCTL (with EMBEDDED and PROC_FS set):

    CC      init/version.o
    LD      init/built-in.o
    LD      vmlinux
  fs/built-in.o: In function `do_proc_sys_lookup':
  proc_sysctl.c:(.text+0x26583): undefined reference to `sysctl_head_next'
  fs/built-in.o: In function `proc_sys_revalidate':
  proc_sysctl.c:(.text+0x265bb): undefined reference to `sysctl_head_finish'
  fs/built-in.o: In function `proc_sys_readdir':
  proc_sysctl.c:(.text+0x26720): undefined reference to `sysctl_head_next'
  proc_sysctl.c:(.text+0x267d8): undefined reference to `sysctl_head_finish'
  proc_sysctl.c:(.text+0x268e7): undefined reference to `sysctl_head_next'
  proc_sysctl.c:(.text+0x26910): undefined reference to `sysctl_head_finish'
  fs/built-in.o: In function `proc_sys_write':
  proc_sysctl.c:(.text+0x2695d): undefined reference to `sysctl_perm'
  proc_sysctl.c:(.text+0x2699c): undefined reference to `sysctl_head_finish'
  fs/built-in.o: In function `proc_sys_read':
  proc_sysctl.c:(.text+0x269e9): undefined reference to `sysctl_perm'
  proc_sysctl.c:(.text+0x26a25): undefined reference to `sysctl_head_finish'
  fs/built-in.o: In function `proc_sys_permission':
  proc_sysctl.c:(.text+0x26ad1): undefined reference to `sysctl_perm'
  proc_sysctl.c:(.text+0x26adb): undefined reference to `sysctl_head_finish'
  fs/built-in.o: In function `proc_sys_lookup':
  proc_sysctl.c:(.text+0x26b39): undefined reference to `sysctl_head_finish'
  make: *** [vmlinux] Virhe 1

All those functions are in fs/proc/proc_sysctl.c, which has no CONFIG_
#define's in it, so the patch makes the compilation of that file to depend
on CONFIG_PROC_SYSCTL (the simplest choice).

Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-03-27 09:05:16 -07:00
J. Bruce Fields
79f6523a16 [PATCH] knfsd: nfsd4: remove superfluous cancel_delayed_work() call
This cancel_delayed_work call is called from a function that is only called
from a piece of code that immediate follows a cancel and destruction of the
workqueue, so it's clearly a mistake.

Cc: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-03-27 09:05:14 -07:00
Bruce Fields
21315edd48 [PATCH] knfsd: nfsd4: demote "clientid in use" printk to a dprintk
The reused clientid here is a more of a problem for the client than the
server, and the client can report the problem itself if it's serious.

Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-03-27 09:05:14 -07:00
Bruce Fields
54c0440949 [PATCH] knfsd: nfsd4: fix inheritance flags on v4 ace derived from posix default ace
A regression introduced in the last set of acl patches removed the
INHERIT_ONLY flag from aces derived from the posix acl.  Fix.

Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-03-27 09:05:14 -07:00
NeilBrown
598b9a5637 [PATCH] knfsd: allow nfsd READDIR to return 64bit cookies
->readdir passes lofft_t offsets (used as nfs cookies) to
nfs3svc_encode_entry{,_plus}, but when they pass it on to encode_entry it
becomes an 'off_t', which isn't good.

So filesystems that returned 64bit offsets would lose.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-03-27 09:05:14 -07:00
Mark Fasheh
40bee44eae Export __splice_from_pipe()
Ocfs2 wants to implement it's own splice write actor so that it can better
manage cluster / page locks. This lets us re-use the rest of splice write
while only providing our own code where it's actually important.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-03-27 08:55:47 +02:00
Nick Piggin
08c7259163 2/2 splice: dont readpage
Splice does not need to readpage to bring the page uptodate before writing
to it, because prepare_write will take care of that for us.

Splice is also wrong to SetPageUptodate before the page is actually uptodate.
This results in the old uninitialised memory leak. This gets fixed as a
matter of course when removing the readpage logic.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-03-27 08:55:39 +02:00
Nick Piggin
485ddb4b97 1/2 splice: dont steal
Stealing pages with splice is problematic because we cannot just insert
an uptodate page into the pagecache and hope the filesystem can take care
of it later.

We also cannot just ClearPageUptodate, then hope prepare_write does not
write anything into the page, because I don't think prepare_write gives
that guarantee.

Remove support for SPLICE_F_MOVE for now. If we really want to bring it
back, we might be able to do so with a the new filesystem buffered write
aops APIs I'm working on. If we really don't want to bring it back, then
we should decide that sooner rather than later, and remove the flag and
all the stealing infrastructure before anybody starts using it.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-03-27 08:55:08 +02:00
Sunil Mushran
2f5bf1f2d0 ocfs2_dlm: Check for migrateable lockres in dlm_empty_lockres()
In dlm_migrate_lockres(), we check upfront whether the lockres is a
candidate for migration. This patch encapsulates that code in a separate
function so that dlm_empty_lockres() can also use it during umount. This
patch addresses the umount process spinning problem.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-03-26 16:51:15 -07:00
Sunil Mushran
78062cb2e5 ocfs2_dlm: Fix lockres ref counting bug
During umount, the umount thread migrates the lockres' and the dlm_thread
frees the empty lockres'. Due to a race, the reference counting on the
lockres goes awry leading to extra puts.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-03-26 16:50:52 -07:00
Adrian Bunk
d32b687e2e 9p: make struct v9fs_cached_file_operations static
This patch makes te needlessly global struct v9fs_cached_file_operations
static.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2007-03-26 18:14:44 +00:00
Andrew Morton
105fd108a6 [PATCH] "ext[34]: EA block reference count racing fix" performance fix
A little mistake in 8a2bfdcbfa is making all
transactions synchronous, which reduces ext3 performance to comical levels.

Cc: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-03-23 11:01:22 -07:00
David Howells
aa289b4723 [PATCH] FDPIC: fix the /proc/pid/stat representation of executable boundaries
Fix the /proc/pid/stat representation of executable boundaries.  It should
show the bounds of the executable, but instead shows the bounds of the
loader.

Before the patch is applied, the bug can be seen by examining, say, inetd:

	# ps | grep inetd
	  610         root          0   S   /usr/sbin/inetd -i
	# cat /proc/610/maps
	c0bb0000-c0bba788 r-xs 00000000 00:0b 14582157  /lib/ld-uClibc-0.9.28.so
	c3180000-c31dede4 r-xs 00000000 00:0b 14582179  /lib/libuClibc-0.9.28.so
	c328c000-c328ea00 rw-p 00008000 00:0b 14582157  /lib/ld-uClibc-0.9.28.so
	c3290000-c329b6c0 rw-p 00000000 00:00 0
	c32a0000-c32c0000 rwxp 00000000 00:00 0
	c32d4000-c32d8000 rw-p 00000000 00:00 0
	c3394000-c3398000 rw-p 00000000 00:00 0
	c3458000-c345f464 r-xs 00000000 00:0b 16384612  /usr/sbin/inetd
	c3470000-c34748f8 rw-p 00004000 00:0b 16384612  /usr/sbin/inetd
	c34cc000-c34d0000 rw-p 00000000 00:00 0
	c34d4000-c34d8000 rw-p 00000000 00:00 0
	c34d8000-c34dc000 rw-p 00000000 00:00 0
	# cat /proc/610/stat
	610 (inetd) S 1 610 610 0 -1 256 0 0 0 0 0 8 0 0 19 0 1 0 94392000718
	950272 0 4294967295 3233480704 3233523592 3274440352 3274439976
 	3273467584 0 0 4096 90115 3221712796 0 0 17 0 0 0 0

The code boundaries are 3233480704 to 3233523592, which are:

	(gdb) p/x 3233480704
	$1 = 0xc0bb0000
	(gdb) p/x 3233523592
	$2 = 0xc0bba788

Which corresponds to this line in the maps file:

	c0bb0000-c0bba788 r-xs 00000000 00:0b 14582157  /lib/ld-uClibc-0.9.28.so

Which is wrong.  After the patch is applied, the maps file is pretty much
identical (there's some minor shuffling of the location of some of the
anonymous VMAs), but the stat file is now:

	# cat /proc/610/stat
	610 (inetd) S 1 610 610 0 -1 256 0 0 0 0 0 7 0 0 18 0 1 0 94392000722
	950272 0 4294967295 3276111872 3276141668 3274440352 3274439976
	3273467584 0 0 4096 90115 3221712796 0 0 17 0 0 0 0

The code boundaries are then 3276111872 to 3276141668, which are:

	(gdb) p/x 3276111872
	$1 = 0xc3458000
	(gdb) p/x 3276141668
	$2 = 0xc345f464

And these correspond to this line in the maps file instead:

	c3458000-c345f464 r-xs 00000000 00:0b 16384612  /usr/sbin/inetd

Which is now correct.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-03-23 11:01:21 -07:00
Linus Torvalds
12998096cc Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6:
  [CIFS] Allow reset of file to ATTR_NORMAL when archive bit not set
  [CIFS] Do not negotiate new POSIX_PATH_OPERATIONS_CAP yet
  [CIFS] reset mode when client notices that ATTR_READONLY is no longer set
2007-03-22 19:47:09 -07:00
Rafael J. Wysocki
b43376927a [PATCH] Make XFS workqueues nonfreezable
Since freezable workqueues are broken in 2.6.21-rc
(cf. http://marc.theaimsgroup.com/?l=linux-kernel&m=116855740612755,
http://marc.theaimsgroup.com/?l=linux-kernel&m=117261312523921&w=2)
it's better to change the only user of them, which is XFS, to use "normal"
nonfreezable workqueues.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: David Chinner <dgc@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-03-22 19:39:06 -07:00
Steve French
066fcb06d3 [CIFS] Allow reset of file to ATTR_NORMAL when archive bit not set
When a file had a dos attribute of 0x1 (readonly - but dos attribute
of archive was not set) - doing chmod 0777 or equivalent would
try to set a dos attribute of 0 (which some servers ignore)
rather than ATTR_NORMAL (0x20) which most servers accept.
Does not affect servers which support the CIFS Unix Extensions.

Acked-by: Prasad Potluri <pvp@us.ibm.com>
Acked-by: Shirish Pargaonkar <shirishp@us.ibm.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2007-03-23 00:45:08 +00:00
James Bottomley
d1cabd6326 [PATCH] fix process crash caused by randomisation and 64k pages
This bug was seen on ppc64, but it could have occurred on any
architecture with a page size of 64k or above.  The problem is that in
fs/binfmt_elf.c:randomize_stack_top() randomizes the stack to within
0x7ff pages.  On 4k page machines, this is 8MB; on 64k page boxes, this
is 128MB.

The problem is that the new binary layout (selected in
arch_pick_mmap_layout) places the mapping segment 128MB or the stack
rlimit away from the top of the process memory, whichever is larger.  If
you chose an rlimit of less than 128MB (most defaults are in the 8Mb
range) then you can end up having your entire stack randomized away.

The fix is to make randomize_stack_top() only steal at most 8MB, which this
patch does.  However, I have to point out that even with this, your stack
rlimit might not be exactly what you get if it's > 128MB, because you're
still losing the random offset of up to 8MB.

The true fix should be to leave an explicit gap for the randomization plus
a buffer when determining mmap_base, but that would involve fixing all the
architectures.

Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-03-16 19:25:06 -07:00
Johannes Berg
e5d480ff17 [PATCH] change misleading EFI partition support description
Remove the misleading "Presently only useful on the IA-64 platform" text
from the EFI partition Kconfig.

EFI partitions are also used by Apple on their Intel-based machines and
thus you need EFI partition support if you (for example) want to attach
such a machine in target disk mode.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Acked-by: Matt Domsch <Matt_Domsch@dell.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-03-16 19:25:06 -07:00
Trond Myklebust
634707388b [PATCH] nfs: nfs_getattr() can't call nfs_sync_mapping_range() for non-regular files
Looks like we need a check in nfs_getattr() for a regular file. It makes
no sense to call nfs_sync_mapping_range() on anything else. I think that
should fix your problem: it will stop the NFS client from interfering
with dirty pages on that inode's mapping.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Acked-by: Olof Johansson <olof@lixom.net>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-03-16 19:25:06 -07:00
Peter Zijlstra
89a09141df [PATCH] nfs: fix congestion control
The current NFS client congestion logic is severly broken, it marks the
backing device congested during each nfs_writepages() call but doesn't
mirror this in nfs_writepage() which makes for deadlocks.  Also it
implements its own waitqueue.

Replace this by a more regular congestion implementation that puts a cap on
the number of active writeback pages and uses the bdi congestion waitqueue.

Also always use an interruptible wait since it makes sense to be able to
SIGKILL the process even for mounts without 'intr'.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Christoph Lameter <clameter@engr.sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-03-16 19:25:05 -07:00
suzuki
b74a2f0913 [PATCH] fix rescan_partitions to return errors properly
The only error code which comes from the partition checkers is -1, when
they finds an EIO.  As per the discussion, ENOMEM values were ignored,
as they might scare the users.

So, with the current code, we end up returning -1 and not EIO for the
ioctl() calls.  Which doesn't give any clue to the user of what went
wrong.

Signed-off-by: Suzuki K P <suzuki@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-03-16 19:25:05 -07:00
Vasily Averin
1174cf7301 [PATCH] smbfs: double free memory corruption
smbfs allocates rq_trans2buffer to handle server's multi transaction2 response
messages.  As struct smb_request may be reused, rq_trans2buffer is freed
before each new request.  However if last servers's response is not multi but
single trans2 message then new rq_trans2buffer is not allocated but last
smb_rput still tries to free it again.

To prevent this issue rq_trans2buffer pointer should be set to NULL after
kfree.

Signed-off-by: Vasily Averin <vvs@sw.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-03-16 19:25:05 -07:00
Michael Halcrow
b228b8e5bf [PATCH] eCryptfs: fix possible NULL ptr deref in ecryptfs_d_release()
ecryptfs_d_release() first dereferences a pointer (via
ecryptfs_dentry_to_lower()) and then afterwards checks to see if the
pointer it just dereferenced is NULL (via ecryptfs_dentry_to_private()).

This patch moves all of the work done on the dereferenced pointer inside a
block governed by the condition that the pointer is non-NULL.

Signed-off-by: Michael Halcrow <mhalcrow@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-03-16 19:25:05 -07:00
Evgeniy Dushistov
0465fc0a1c [PATCH] ufs2: tindirect truncate fix
During modification of code to support UFS2 writing, the case with
"three indirect" blocks in truncate path was missed, this patch fixes
this situation.

Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru>
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-03-16 19:25:03 -07:00
Evgeniy Dushistov
4b25a37e20 [PATCH] ufs: zeroize the rest of block in truncate
This patch fix behaviour in such test scenario:

  lseek(fd, BIG_OFFSET)
  write(fd, buf, sizeof(buf))
  truncate(BIG_OFFSET)
  truncate(BIG_OFFSET + sizeof(buf))
  read(fd, buf...)

Because of if file big enough(BIG_OFFSET) we start allocate space by block,
ordinary block size > page size, so we should zeroize the rest of block in
truncate(except last framgnet, about which VFS should care), to not get
garbage, when we extend file.

Also patch corrects conversion from pointer to block to physical block number,
this helps in case of not common used UFS types.

And add to debug output inode number.

Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-03-16 19:25:03 -07:00
Evgeniy Dushistov
5431bf97ce [PATCH] ufs: prepare write + change blocks on the fly
This fixes "change blocks numbers on the fly" in case when "prepare
write page" is in the call chain, in this case some buffers may be not
uptodate and not mapped, we should care to map them and load from disk.

This patch was tested with:
 - ufs regressions simple tests
 - fsx-linux
 - ltp(20060306)
 - untar and build kernel

Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-03-16 19:25:03 -07:00
Evgeniy Dushistov
2189850f42 [PATCH] ufs2: more correct work with time
This patch corrects work with time in UFS2 case.

1) According to UFS2 disk layout modification/access and so on "time"
   should be hold in two variables one 64bit for seconds and another 32bit for
   nanoseconds,

   at now for some unknown reason we suppose that "inode time" holds in
   three variables 32bit for seconds, 32bit for milliseconds and 32bit for
   nanoseconds.

2) We set amount of nanoseconds in "VFS inode" to 0 during read, instead of
   getting values from "on disk inode"(this should close
   http://bugzilla.kernel.org/show_bug.cgi?id=7991).

Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru>
Cc: Bjoern Jacke <bjoern@j3e.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-03-16 19:25:03 -07:00
Steve French
38e2aff670 [CIFS] Do not negotiate new POSIX_PATH_OPERATIONS_CAP yet
Samba server now expects that clients which send the new
POSIX_PATH_OPERATIONS_CAP send all opens with this new
SMB - and expects that clients that could send the new
posix open/create but don't as indicating that they really
want Windows semantics on that handle (which allows Samba
to support clients which want to support both types of
behaviors on different handles on the same mount)

We will put this capability back in the SetFSInfo
negotiation with servers like Samba when the
new POSIXCreate (create/open/mkdir) code is finished.

Signed-off-by: Steve French <sfrench@us.ibm.com>
2007-03-16 05:12:53 +00:00
Alan Stern
e7b0d26a86 [PATCH] sysfs: reinstate exclusion between method calls and attribute unregistration
This patch (as869) reinstates the mutual exclusion between sysfs
attribute method calls and attribute unregistration.  The
previously-reported deadlocks have been fixed, and this exclusion is
by far the simplest way to avoid races during driver unbinding.

The check for orphaned read-buffers has been moved down slightly, so
that the remainder of a partially-read buffer will still be available
to userspace even after the attribute has been unregistered.

Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
Cc: Oliver Neukum <oneukum@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-03-15 15:29:26 -07:00
Alan Stern
d9a9cdfb07 [PATCH] sysfs and driver core: add callback helper, used by SCSI and S390
This patch (as868) adds a helper routine for device drivers that need
to set up a callback to perform some action in a different process's
context.  This is intended for use by attribute methods that want to
unregister themselves or their parent device.  Attribute method calls
are mutually exclusive with unregistration, so such actions cannot be
taken directly.

Two attribute methods are converted to use the new helper routine: one
for SCSI device deletion and one for System/390 ccwgroup devices.

Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
Cc: Oliver Neukum <oneukum@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-03-15 15:29:26 -07:00
Linus Torvalds
4e337adae4 Merge branch 'upstream-linus' of master.kernel.org:/pub/scm/linux/kernel/git/mfasheh/ocfs2
* 'upstream-linus' of master.kernel.org:/pub/scm/linux/kernel/git/mfasheh/ocfs2:
  ocfs2_dlm: Add missing locks in dlm_empty_lockres
  ocfs2_dlm: Missing get/put lockres in dlm_run_purge_lockres
  configfs: add missing mutex_unlock()
  ocfs2: add some missing address space callbacks
  ocfs2: Concurrent access of o2hb_region->hr_task was not locked
  ocfs2: Proper cleanup in case of error in ocfs2_register_hb_callbacks()
2007-03-14 15:29:08 -07:00
Al Viro
82c05a13c9 [PATCH] cifs endianness annotations
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-03-14 15:27:50 -07:00
Al Viro
a033f35a22 [PATCH] include of asm/pgtable.h in nfsfh is bogus
not needed and actually breaks build on frv, while we are at it

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-03-14 15:27:49 -07:00
Al Viro
04ff97086b [PATCH] sanitize security_getprocattr() API
have it return the buffer it had allocated

Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Acked-by: James Morris <jmorris@namei.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-03-14 15:27:48 -07:00
Sunil Mushran
b36c3f8498 ocfs2_dlm: Add missing locks in dlm_empty_lockres
__dlm_lockres_unused() expects the caller to take the lockres spinlock.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-03-14 14:37:35 -07:00
Sunil Mushran
3fca0894a4 ocfs2_dlm: Missing get/put lockres in dlm_run_purge_lockres
In some circumstances, this was causing us to reference freed memory.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-03-14 14:37:33 -07:00
Joel Becker
afdf04ea09 configfs: add missing mutex_unlock()
d_alloc() failure in configfs_register_subsystem() would fail to unlock
the mutex taken above.  Reorganize the exit path to ensure the unlock
happens.

Reported-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-03-14 14:37:21 -07:00
Joel Becker
03f981cf2e ocfs2: add some missing address space callbacks
Under load, OCFS2 would crash in invalidate_inode_pages2_range() because
invalidate_complete_page2() was unable to invalidate a page.  It would
appear that JBD is holding on to the page.  ext3 has a specific
->releasepage() handler to cover this case.

Steal ext3's ->releasepage(), ->invalidatepage(), and ->migratepage(), as
they appear completely appropriate for OCFS2.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-03-14 14:37:16 -07:00