Right now, file allocation for ocfs2 is done within ocfs2_extend_file(),
which is either called from ->setattr() (for an i_size change), or at the
top of ocfs2_file_aio_write().
Inodes on file systems with sparse file support will want to do their
allocation during the actual write call.
In either case the cluster locking decisions are the same. We abstract out
that code into a new function, ocfs2_lock_allocators() which will be used by
a later patch to enable writing to sparse files.
This also provides a nice cleanup of ocfs2_extend_allocation().
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
For ocfs2_truncate_file(), we eliminate the "simple" truncate case which no
longer exists since i_size is not tied to i_clusters. In
ocfs2_extend_file(), we skip the allocation / page zeroing code for file
systems which understand sparse files.
The core truncate code is changed to do a bottom up tree traversal. This
gets abstracted out into its own function. To make things more readable,
most of the special case handling for in-inode extents from
ocfs2_do_truncate() is also removed.
Though write support for sparse files comes in a later patch, we at least
update ocfs2_prepare_inode_for_write() to skip allocation for sparse files.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
The code in extent_map.c is not prepared to deal with a subtree being
rotated between lookups. This can happen when filling holes in sparse files.
Instead of a lengthy patch to update the code (which would likely lose the
benefit of caching subtree roots), we remove most of the algorithms and
implement a simple path based lookup. A less ambitious extent caching scheme
will be added in a later patch.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Introduce tree rotations into the b-tree code. This will allow ocfs2 to
support sparse files. Much of the added code is designed to be generic (in
the ocfs2 sense) so that it can later be re-used to implement large
extended attributes.
This patch only adds the rotation code and does minimal updates to callers
of the extent api.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
There are two checks in there (one for inode newness, one for other mounted
nodes) which are unnecessary, so remove them. The DLM will allow the trylock
in either case without any messaging overhead.
Removing these makes ocfs2_request_delete() a one liner function, so just
move the trylock out one level into ocfs2_query_inode_wipe().
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Remove node messaging code that becomes unused with the delete inode vote
removal.
[Removed even more cruft which I spotted during review --Mark]
Signed-off-by: Tiger Yang <tiger.yang@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Ocfs2 currently does cluster-wide node messaging to check the open state of
an inode during delete. This patch removes that mechanism in favor of an
inode cluster lock which is taken at shared read when an inode is first read
and dropped in clear_inode(). This allows a deleting node to test the
liveness of an inode by attempting to take an exclusive lock.
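
The protocol can be illustrated with a toy, single-process model. This is
only a sketch: the struct, the function names and the node bitmask are
hypothetical stand-ins, not the ocfs2 or DLM API. It assumes the deleting
node may itself hold the lock in shared mode, so only other holders block
the exclusive trylock.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_MAX_CLUSTER_NODES 32	/* arbitrary limit for the sketch */

/* Hypothetical stand-in for a per-inode "open" cluster lock. */
struct open_lock {
	unsigned int shared;	/* bit n set: node n holds the lock shared */
	bool exclusive;		/* set once a deleter owns the lock */
};

/* Taken when a node first reads the inode. */
static void open_lock_shared(struct open_lock *l, unsigned int node)
{
	assert(node < DEMO_MAX_CLUSTER_NODES);
	l->shared |= 1u << node;
}

/* Dropped from the clear_inode() equivalent. */
static void open_unlock_shared(struct open_lock *l, unsigned int node)
{
	assert(node < DEMO_MAX_CLUSTER_NODES);
	l->shared &= ~(1u << node);
}

/* Liveness test: succeeds only if no other node still holds the inode. */
static bool open_trylock_exclusive(struct open_lock *l, unsigned int node)
{
	assert(node < DEMO_MAX_CLUSTER_NODES);
	if (l->exclusive || (l->shared & ~(1u << node)))
		return false;
	l->exclusive = true;
	return true;
}
```

A failed trylock tells the deleting node that the inode is still in use
elsewhere, without any explicit node-to-node vote.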
Signed-off-by: Tiger Yang <tiger.yang@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
We don't want to print anything at all in ocfs2_lookup() when getting an
error from ocfs2_iget() - it could be something as innocuous as a signal
being detected in the dlm.
ocfs2_permission() should filter on -ENOENT which ocfs2_meta_lock() can
return if the inode was deleted on another node.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
We have noticed panic() hanging, leaving us in a situation in which
the node, while otherwise dead, is still disk heartbeating. This
leads to a hung cluster as the other nodes are waiting for this
node to stop disk heartbeating. This situation is only resolved
by power resetting the box.
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
We don't want the extent map and uptodate cache destruction in
ocfs2_meta_lock_update() on a local mount, so skip that.
This fixes several bugs with uptodate being cleared on buffers and extent
maps being corrupted.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
In dlm_migrate_all_locks(), we currently call cond_resched_lock() after
processing each lockres in a hash bucket. Move it outside the loop so as to
call it only after the entire hash bucket has been processed.
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
There is a possibility that dlm_remaster_locks could overwrite node->state
with DLM_RECO_NODE_DATA_REQUESTED after dlm_reco_data_done_handler sets the
node->state to DLM_RECO_NODE_DATA_DONE. This could lead to recovery getting
stuck and requires a cluster reboot. Synchronize with dlm_reco_state_lock
spinlock.
Signed-off-by: Srinivas Eeda <srinivas.eeda@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
It seems to be silly season lately.
(Oops, test builds are more useful if the file in question is actually
configured on. dwmw2).
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Switch cb_lock to mutex and allow netlink kernel users to override it
with a subsystem specific mutex for consistent locking in dump callbacks.
All netlink_dump_start users have been audited not to rely on any
side-effects of the previously used spinlock.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
For the common "(struct nlmsghdr *)skb->data" sequence, so that we reduce the
number of direct accesses to skb->data and for consistency with all the other
cast skb member helpers.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now that network timestamps use the ktime_t infrastructure, we can add a new
ioctl() SIOCGSTAMPNS command to get timestamps in 'struct timespec'.
User programs can thus access timestamps with nanosecond resolution.
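
From user space the new command might be used roughly as follows. This is
a minimal sketch for Linux; the helper name last_rx_timestamp() is
hypothetical, and the socket is assumed to have already received a packet.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/ioctl.h>
#include <time.h>
#include <linux/sockios.h>	/* SIOCGSTAMPNS */

/*
 * Ask the kernel for the timestamp of the last packet passed to the user
 * on sock_fd. Returns 0 on success, -1 with errno set on failure.
 */
static int last_rx_timestamp(int sock_fd, struct timespec *ts)
{
	assert(ts != NULL);
	return ioctl(sock_fd, SIOCGSTAMPNS, ts) == 0 ? 0 : -1;
}
```

On success, ts->tv_nsec carries the sub-second part in nanoseconds, where
SIOCGSTAMPNS's older sibling SIOCGSTAMP only offers microseconds in a
'struct timeval'.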
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
CC: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This should never happen unless there's corruption on the medium and the
actual data nodes go missing. But the failure mode (an oops when we assume
the fragtree isn't empty and go looking for its last node) isn't useful.
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
In particular, remove the bit in the LICENCE file about contacting
Red Hat for alternative arrangements. Their errant IS department broke
that arrangement a long time ago -- the policy of collecting copyright
assignments from contributors came to an end when the plug was pulled on
the servers hosting the project, without notice or reason.
We do still dual-license it for use with eCos, with the GPL+exception
licence approved by the FSF as being GPL-compatible. It's just that nobody
has the right to license it differently.
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
No need to check for all-zero header since the header cannot
be zero due to other checks.
Replace the all-zero header check in readinode.c with a
check for the magic word.
Signed-off-by: Joakim Tjernlund <Joakim.Tjernlund@transmode.se>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
We originally used to read every node and allocate a jffs2_tmp_dnode_info
structure for each, before processing them in (reverse) version order
and discarding the ones which are obsoleted by later nodes.
With huge logfiles, this behaviour caused memory problems. For example, a
file involved in OLPC trac #1292 has 1822391 nodes, and would cause the XO
machine to run out of memory during the first stage of read_inode().
Instead of just inserting nodes into a tree in version order as we find
them, we now put them into a tree in order of their offset within the
file, which allows us to immediately discard nodes which are completely
obsoleted.
We don't use a full tree with 'fragments' pointing to the real data
structure, as we do in the normal fragtree. We sort only on the start
address, and add an 'overlapped' flag to the tmp_dnode_info to indicate
that the node in question is (partially) overlapped by another.
When the scan is complete, we start at the end of the file, adding each
node to a real fragtree as before. Where the node is non-overlapped, we
just add it (it doesn't matter that it's not the latest version; there is
no overlap). When the node at the end of the tree _is_ overlapped, we sort
it and all its overlapping nodes into version order and then add them to
the fragtree in that order.
This 'early discard' reduces the peak allocation of tmp_dnode_info
structures from 1.8M to a mere 62872 (3.5%) in the degenerate case
referenced above.
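
The early discard can be sketched with a sorted array in place of the
tree. This is an illustration under simplifying assumptions, not the JFFS2
code: the struct layout, the names and the fixed capacity are hypothetical,
and the 'overlapped' flag is set on both nodes of a partially overlapping
pair.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_TMP_NODES 64	/* arbitrary capacity for the sketch */

struct tmp_node {
	uint32_t start, end;	/* byte range [start, end) within the file */
	uint32_t version;	/* higher version wins */
	int overlapped;		/* partially overlaps another kept node */
};

struct node_set {
	struct tmp_node n[MAX_TMP_NODES];	/* kept sorted by start */
	size_t count;
};

static int covers(const struct tmp_node *a, const struct tmp_node *b)
{
	return a->start <= b->start && a->end >= b->end;
}

static int overlaps(const struct tmp_node *a, const struct tmp_node *b)
{
	return a->start < b->end && b->start < a->end;
}

/* Returns 1 if the node was kept, 0 if it was discarded as obsolete. */
static int add_node(struct node_set *s, struct tmp_node nn)
{
	size_t i, out = 0, pos;

	nn.overlapped = 0;
	/* A newer node covering the whole range makes this one obsolete. */
	for (i = 0; i < s->count; i++)
		if (s->n[i].version > nn.version && covers(&s->n[i], &nn))
			return 0;

	/* Drop older nodes fully covered by nn; flag partial overlaps. */
	for (i = 0; i < s->count; i++) {
		if (s->n[i].version < nn.version && covers(&nn, &s->n[i]))
			continue;
		if (overlaps(&nn, &s->n[i]))
			s->n[i].overlapped = nn.overlapped = 1;
		s->n[out++] = s->n[i];
	}
	s->count = out;
	assert(s->count < MAX_TMP_NODES);

	/* Insert in start-offset order. */
	for (pos = 0; pos < s->count && s->n[pos].start <= nn.start; pos++)
		;
	memmove(&s->n[pos + 1], &s->n[pos], (s->count - pos) * sizeof(nn));
	s->n[pos] = nn;
	s->count++;
	return 1;
}
```

Only nodes that survive this filter stay allocated, which is where the
reduction in peak memory comes from.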
This version of the patch also correctly remembers the highest node
version# seen for an inode when it's scanned.
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
The listxattr() and getxattr() operations are only protected by a read
lock. As a result, if either of these operations run in parallel, a race
condition exists where the xattr_root will end up being cached twice, which
results in the leaking of a reference and a BUG() on umount.
This patch refactors get_xa_root(), __get_xa_root(), and create_xa_root(),
into one get_xa_root() function that takes the appropriate locking around
the entire critical section.
Reported, diagnosed and tested by Andrea Righi <a.righi@cineca.it>
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Cc: Andrea Righi <a.righi@cineca.it>
Cc: "Vladimir V. Saveliev" <vs@namesys.com>
Cc: Edward Shishkin <edward@namesys.com>
Cc: Alex Zarochentsev <zam@namesys.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
v9fs_remove uses v9fs_fid_lookup (which also locks the fid) to get the
primary fid associated with the dentry and destroys the v9fs_fid struct
after removing the file. If another process called v9fs_fid_lookup on the
same dentry, it may wait indefinitely for the fid's lock (as the struct is
freed).
This patch changes v9fs_remove to use a cloned fid, so the primary fid is
not locked and freed.
Signed-off-by: Latchesar Ionkov <lucho@ionkov.net>
Cc: Eric Van Hensbergen <ericvh@hera.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We should never find that the unchecked size is non-zero after we've finished
checking all inodes. If it happened, we used to BUG(), leaving the alloc_sem
held and deadlocking. Instead, just return -ENOSPC after complaining. The
GC thread will die, but read-only operation should be able to continue and
the file system should be unmountable.
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
When compiling a LE-capable JFFS2 on PowerPC, wbuf.c fails to compile:
fs/jffs2/wbuf.c:973: error: braced-group within expression allowed only inside a function
fs/jffs2/wbuf.c:973: error: initializer element is not constant
fs/jffs2/wbuf.c:973: error: (near initialization for ‘oob_cleanmarker.magic’)
fs/jffs2/wbuf.c:974: error: braced-group within expression allowed only inside a function
fs/jffs2/wbuf.c:974: error: initializer element is not constant
fs/jffs2/wbuf.c:974: error: (near initialization for ‘oob_cleanmarker.nodetype’)
fs/jffs2/wbuf.c:975: error: braced-group within expression allowed only inside a function
fs/jffs2/wbuf.c:976: error: initializer element is not constant
fs/jffs2/wbuf.c:976: error: (near initialization for ‘oob_cleanmarker.totlen’)
Provide constant_cpu_to_je{16,32} functions, and use them for initialising the
offending structure.
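
The underlying idea is that a byte swap written as a plain expression is a
valid constant initializer at file scope, while one built on a statement
expression is not. A minimal sketch, with hypothetical macro and struct
names and arbitrary example values (it relies on the __BYTE_ORDER__ macros
predefined by GCC and Clang):

```c
#include <assert.h>
#include <stdint.h>

/* Plain-expression swaps: usable in static initializers. */
#define CONST_SWAB16(x) ((uint16_t)((((uint16_t)(x) & 0x00ffU) << 8) | \
				    (((uint16_t)(x) & 0xff00U) >> 8)))
#define CONST_SWAB32(x) ((uint32_t)((((uint32_t)(x) & 0x000000ffU) << 24) | \
				    (((uint32_t)(x) & 0x0000ff00U) << 8) | \
				    (((uint32_t)(x) & 0x00ff0000U) >> 8) | \
				    (((uint32_t)(x) & 0xff000000U) >> 24)))

/* Swap only when the CPU is big-endian and the medium is little-endian. */
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
#define CONST_CPU_TO_LE16(x) CONST_SWAB16(x)
#define CONST_CPU_TO_LE32(x) CONST_SWAB32(x)
#else
#define CONST_CPU_TO_LE16(x) ((uint16_t)(x))
#define CONST_CPU_TO_LE32(x) ((uint32_t)(x))
#endif

/* Hypothetical on-flash header, stored little-endian. */
struct demo_marker {
	uint16_t magic;
	uint16_t nodetype;
	uint32_t totlen;
};

/* Compiles at file scope because every initializer is a constant. */
static const struct demo_marker demo_marker = {
	.magic = CONST_CPU_TO_LE16(0x1234),
	.nodetype = CONST_CPU_TO_LE16(0x5678),
	.totlen = CONST_CPU_TO_LE32(8),
};
```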
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Redirtying a request that is already marked for commit will screw up the
accounting for NR_UNSTABLE_NFS as well as nfs_i.ncommit.
Ensure that all requests on the commit queue are labelled with the
PG_NEED_COMMIT flag, and avoid moving them onto the dirty list inside
nfs_page_mark_flush().
Also inline nfs_mark_request_dirty() into nfs_page_mark_flush() for
atomicity reasons. Avoid dropping the spinlock until we're done marking the
request in the radix tree and have added it to the ->dirty list.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ensure that we don't release the PG_writeback lock until after the page has
either been redirtied, or queued on the nfs_inode 'commit' list.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Get rid of the inlined #ifdefs.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch should fix or partly fix this bug:
http://bugzilla.kernel.org/show_bug.cgi?id=8276
The problem is:
- if we see the "zero link case" while reading an inode, we call
ufs_error() (which remounts the fs read-only), but do not mark the inode as bad (1)
- in the read-only case we do not fill in some data structures which are used in
the read-write case (2)
- the VFS calls ufs_delete_inode() if the link count is zero (3)
so (1)->(3)->(2) causes an oops; this patch should fix that scenario.
Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru>
Cc: Jim Paris <jim@jtan.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The patch checks for "|" in the pattern, not the output, and doesn't nail a
pid onto a piped name (as it is a program name, not a file).
Also fixes a very very obscure security corner case. If you happen to have
decided on a core pattern that starts with the program name then the user
can run a program called "|myevilhack" as it stands. I doubt anyone does
this.
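
The difference between testing the pattern and testing the expanded name
can be shown in a small user-space sketch. The helper names are
hypothetical and only the "%e" specifier is expanded; this is not the
kernel's format_corename().

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Expand only "%e" (executable name); other specifiers are out of scope. */
static void expand_pattern(const char *pattern, const char *exe,
			   char *out, size_t outlen)
{
	size_t n = 0;

	assert(outlen > 0);
	while (*pattern && n + 1 < outlen) {
		if (pattern[0] == '%' && pattern[1] == 'e') {
			int w = snprintf(out + n, outlen - n, "%s", exe);

			/* Clamp in case the name had to be truncated. */
			n += (size_t)w < outlen - n ? (size_t)w : outlen - n - 1;
			pattern += 2;
		} else {
			out[n++] = *pattern++;
		}
	}
	out[n] = '\0';
}

/* Decide from the administrator's pattern, never from the expanded name. */
static int is_pipe_target(const char *pattern)
{
	return pattern[0] == '|';
}
```

With the pattern "%e.core" and a program called "|myevilhack", the expanded
name starts with "|", yet is_pipe_target() on the pattern still says no.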
Signed-off-by: Alan Cox <alan@redhat.com>
Confirmed-by: Christopher S. Aker <caker@theshore.net>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Remove excessive scanning of empty flash after a clean
marker for users of the point/unpoint method. cfi_cmdset_0001
uses point/unpoint by default iff flash mapping is linear.
The speedup is several orders of magnitude if FS is less than
half full.
Signed-off-by: Joakim Tjernlund <Joakim.Tjernlund@transmode.se>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
In read inode we have an optimization which prevents one
min. I/O unit (e.g. NAND page) from being read more than once.
Namely, at the beginning we do not know which node type we are reading,
so we assume it is a directory entry, because it
has the smallest node header. When we read it, we read up to the
next min. I/O unit boundary, so that if we later need more data,
we already have it.
If it turns out that the node is not a directory entry, and
we need more data which we did not read because it sits in the
next min. I/O unit, we read the whole next (or several next)
min. I/O unit(s). And if it turns out that we are reading a data node,
and we've read part of its data, we calculate a partial CRC.
So if we later need to check the data CRC, we only read the rest
of the data from further min. I/O units and continue the CRC check.
This code was a bit messy and buggy. The bug was that it assumed a
relatively large min. I/O unit, so that the largest node header
could overlap only one min. I/O unit boundary.
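
The arithmetic behind that assumption can be sketched as follows. The
helper names are hypothetical and the header length used in the note below
is only an illustrative figure.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes from ofs up to the end of the min. I/O unit that contains it. */
static uint32_t bytes_to_unit_end(uint32_t ofs, uint32_t unit)
{
	assert(unit > 0);
	return unit - (ofs % unit);
}

/* Number of unit boundaries crossed by hdr_len bytes starting at ofs. */
static uint32_t boundaries_crossed(uint32_t ofs, uint32_t hdr_len,
				   uint32_t unit)
{
	assert(unit > 0 && hdr_len > 0);
	return (ofs + hdr_len - 1) / unit - ofs / unit;
}
```

A 68-byte header at offset 500 crosses one boundary with a 512-byte unit,
but the same header at offset 10 crosses four boundaries with a 16-byte
unit, which is the case the old code did not expect.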
This patch cleans up the code a bit and fixes this bug.
The patch was not tested on flash with a small min. I/O unit, like
NOR-ECC, but it was tested on NAND with a 512-byte NAND page, so
it at least does not break NAND. It was also tested with mtdram
so it should not break NOR.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
After a write error, any data in the write buffer must
be relocated. This is handled by the jffs2_wbuf_recover
function. This function does not fix up the erase block
summary information that is collected for writing at the
end of the block, which results in an incorrect summary
(or BUG if the summary was found to be empty).
As the summary is not essential (it is an optimisation),
it may be disabled for the current erase block when this
situation arises. This patch does that.
Signed-off-by: Adrian Hunter <ext-adrian.hunter@nokia.com>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
If a write error occurs, the affected block is placed on the
bad_used_list. In the case that the write error occurred
when writing summary data the block was also being placed on
the dirty_list, which caused list corruption and ultimately
a soft lockup in jffs2_mark_node_obsolete. This fixes that.
Signed-off-by: Adrian Hunter <ext-adrian.hunter@nokia.com>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
When the MTD driver returns write failure, the following deadlock
occurs:
We are in __jffs2_flush_wbuf(), we hold &c->wbuf_sem. Write failure.
jffs2_wbuf_recover()->jffs2_reserve_space_gc()->jffs2_do_reserve_space()
->jffs2_erase_pending_blocks()->jffs2_flash_read()
and it tries to lock &c->wbuf_sem again. Deadlock.
Reported-by: Adrian Hunter <ext-adrian.hunter@nokia.com>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Check the node CRC on scan before doing anything else with the node.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
We must remove the request from whatever list it is currently on before we
can add it to the dirty list.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If the writebacks are cancelled via nfs_cancel_dirty_list, or due to the
memory allocation failing in nfs_flush_one/nfs_flush_multi, then we must
ensure that the PG_writeback flag is cleared.
Also ensure that we actually own the PG_writeback flag whenever we
schedule a new writeback by making nfs_set_page_writeback() return the
value of test_set_page_writeback().
The PG_writeback page flag ends up replacing the functionality of the
PG_FLUSHING nfs_page flag, so we rip that out too.
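
The ownership rule can be modelled in user space with a C11 atomic flag.
This is a sketch with hypothetical names, not the NFS or page cache code,
and unlike test_set_page_writeback() the helper returns true when the
caller did set the flag.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a page with a writeback bit. */
struct demo_page {
	atomic_flag writeback;
};

/* Returns true only for the caller that actually set the flag. */
static bool demo_set_writeback(struct demo_page *p)
{
	assert(p != NULL);
	return !atomic_flag_test_and_set(&p->writeback);
}

/* Must be called on every exit path, including cancellation and ENOMEM. */
static void demo_end_writeback(struct demo_page *p)
{
	assert(p != NULL);
	atomic_flag_clear(&p->writeback);
}
```

A caller that gets false back does not own the writeback and must not
schedule a new one for that page.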
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Do not flag an error if the COMMIT call fails and we decide to resend the
writes. Let the resend flag the error if it fails.
If a write has failed, then nfs_direct_write_result should not attempt to
send a commit. It should just exit asap and return the error to the user.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
It looks like nfs_setattr() and nfs_rename() also need to test whether the
target is a regular file before calling nfs_wb_all()...
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit f50b6f8691 introduced a race in
autofs4 between autofs_lookup_unhashed() and autofs_dentry_release().
autofs_dentry_release() ends up clearing the ->dentry and ->inode members
of autofs_info before removing it from the rehash list. The list is
protected by the rehash lock in both functions, but since
autofs_dentry_release() starts tearing the autofs_info struct down before
removing it from the list, autofs_lookup_unhashed() can get an autofs_info
with a NULL dentry.
This patch moves the clearing of ->dentry and ->inode after the removal
from the rehash list.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Acked-by: Ian Kent <raven@themaw.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch fixes a bug in the function which decrements the key of a stat
data item. Offsets of reiserfs keys are compared as signed values. To set a
key offset to the maximal possible value, the maximal signed value has to be
used.
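
Why an all-ones offset is the wrong "maximum" under signed comparison can
be seen in a few lines. This is a sketch; the 64-bit width and the names
are chosen for illustration only and do not reflect the reiserfs key
layout.

```c
#include <assert.h>
#include <stdint.h>

/* All bits set, read as a two's complement signed value, is -1. */
#define ALL_ONES_AS_SIGNED ((int64_t)-1)
/* The largest value a signed comparison can actually see. */
#define MAX_SIGNED_OFFSET INT64_MAX

/* Three-way comparison that treats offsets as signed. */
static int cmp_offset(int64_t a, int64_t b)
{
	return (a > b) - (a < b);
}
```

Here cmp_offset(ALL_ONES_AS_SIGNED, 4096) is negative, so a key built that
way sorts before ordinary offsets instead of after them, while
cmp_offset(MAX_SIGNED_OFFSET, 4096) is positive as intended.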
This bug is responsible for severe reiserfs filesystem corruption which
shows itself as warning vs-13060. reiserfsck fixes this corruption by
filesystem tree rebuilding.
Signed-off-by: Vladimir Saveliev <vs@namesys.com>
Cc: <reiserfs-dev@namesys.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If rootmode isn't valid, we hit the BUG() in fuse_init_inode. Now
EINVAL is returned.
Signed-off-by: Timo Savola <tsavola@movial.fi>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Revert all this. It can cause device-mapper to receive a different major from
earlier kernels and it turns out that the Amanda backup program (via GNU tar,
apparently) checks major numbers on files when performing incremental backups.
Which is a bit broken of Amanda (or tar), but this feature isn't important
enough to justify the churn.
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Delete everything related to the apparently non-existent kernel config
option JFFS2_PROC.
Signed-off-by: Robert P. J. Day <rpjday@mindspring.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Revert e92a4d595b.
Dmitry points out
"When we block_prepare_write() failed while ext3_prepare_write() we jump to
"failure" label and call ext3_prepare_failure() witch search last mapped bh
and invoke commit_write untill it. This is wrong!! because some bh from
begining to the last mapped bh may be not uptodate. As a result we commit to
disk not uptodate page content witch contains garbage from previous usage."
and
"Unexpected file size increasing."
The call trace is the same as in the first issue, but the result is different.
For example, we have a file whose i_size is zero. We want to write two blocks,
but the fs has only one free block.
->ext3_prepare_write(...from == 0, to == 2048)
retry:
->block_prepare_write() == -ENOSPC# we failed but allocated one block here.
->ext3_prepare_failure()
->commit_write( from == 0, to == 1024) # after this i_size becomes 1024 :)
if (ret == -ENOSPC && ext3_should_retry_alloc(inode->i_sb, &retries))
goto retry;
Finally, when all retries have been spent, ext3_prepare_failure() returns
-ENOSPC, but i_size was increased and later block trimming procedures can't
help here.
We don't appear to have the horsepower to fix these issues, so let's put
things back the way they were for now.
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Ken Chen <kenneth.w.chen@intel.com>
Cc: Andrey Savochkin <saw@sw.ru>
Cc: <linux-ext4@vger.kernel.org>
Cc: Dmitriy Monakhov <dmonakhov@openvz.org>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When the dump cannot occur, most likely because of a full file system, and
the page to be written is the zero page, the call to page_cache_release()
is missed.
Signed-off-by: Brian Pomerantz <bapper@mvista.com>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: David Howells <dhowells@redhat.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We're using #ifdef CONFIG_SYSCTL, but we should be using CONFIG_PROC_SYSCTL,
so we get
fs/built-in.o: In function `proc_root_init':
/usr/src/linux/fs/proc/root.c:83: undefined reference to `proc_sys_init'
Fix that up and remove an ifdef-in-C.
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Helge Hafting <helgehaf@aitel.hist.no>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* rename name to host_root_path
* rename data to req_root.
Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Jeff Dike <jdike@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix a few miscellaneous compilation problems:
- an assignment with mismatched types in ldt.c
- a missing include in mconsole.h, which needs a definition of uml_pt_regs
- an include of user_util.h in hostfs which I missed removing
Signed-off-by: Jeff Dike <jdike@linux.intel.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently, if a partial write happens during ->commit_write(), the page
is not marked as accessed and rebalanced.
Signed-off-by: Monakhov Dmitriy <dmonakhov@openvz.org>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
The user can generate console output if they cause do_mmap() to fail
during sys_io_setup(). This was seen in a regression test that does
exactly that by spinning calling mmap() until it gets -ENOMEM before
calling io_setup().
We don't need this printk at all, just remove it.
Signed-off-by: Zach Brown <zach.brown@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Johannes Berg and Michael Buesch noticed that the WPA ioctls
were missing from the 64<->32 bit conversion. This means that when
using a 32-bit userspace on a 64-bit kernel, those ioctls fail.
Signed-off-by: Jean Tourrilhes <jt@hpl.hp.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
* 'for-linus' of git://git.kernel.dk/data/git/linux-2.6-block:
Export __splice_from_pipe()
2/2 splice: dont readpage
1/2 splice: dont steal
make elv_register() output atomic
block: blk_max_pfn is somtimes wrong
Without the attached patch against current -git I get the following with
!PROC_SYSCTL (with EMBEDDED and PROC_FS set):
CC init/version.o
LD init/built-in.o
LD vmlinux
fs/built-in.o: In function `do_proc_sys_lookup':
proc_sysctl.c:(.text+0x26583): undefined reference to `sysctl_head_next'
fs/built-in.o: In function `proc_sys_revalidate':
proc_sysctl.c:(.text+0x265bb): undefined reference to `sysctl_head_finish'
fs/built-in.o: In function `proc_sys_readdir':
proc_sysctl.c:(.text+0x26720): undefined reference to `sysctl_head_next'
proc_sysctl.c:(.text+0x267d8): undefined reference to `sysctl_head_finish'
proc_sysctl.c:(.text+0x268e7): undefined reference to `sysctl_head_next'
proc_sysctl.c:(.text+0x26910): undefined reference to `sysctl_head_finish'
fs/built-in.o: In function `proc_sys_write':
proc_sysctl.c:(.text+0x2695d): undefined reference to `sysctl_perm'
proc_sysctl.c:(.text+0x2699c): undefined reference to `sysctl_head_finish'
fs/built-in.o: In function `proc_sys_read':
proc_sysctl.c:(.text+0x269e9): undefined reference to `sysctl_perm'
proc_sysctl.c:(.text+0x26a25): undefined reference to `sysctl_head_finish'
fs/built-in.o: In function `proc_sys_permission':
proc_sysctl.c:(.text+0x26ad1): undefined reference to `sysctl_perm'
proc_sysctl.c:(.text+0x26adb): undefined reference to `sysctl_head_finish'
fs/built-in.o: In function `proc_sys_lookup':
proc_sysctl.c:(.text+0x26b39): undefined reference to `sysctl_head_finish'
make: *** [vmlinux] Virhe 1
All those functions are in fs/proc/proc_sysctl.c, which has no CONFIG_
#define's in it, so the patch makes compilation of that file depend
on CONFIG_PROC_SYSCTL (the simplest choice).
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This cancel_delayed_work call is made from a function that is only called
from a piece of code that immediately follows a cancel and destruction of the
workqueue, so it's clearly a mistake.
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The reused clientid here is more of a problem for the client than the
server, and the client can report the problem itself if it's serious.
Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
A regression introduced in the last set of acl patches removed the
INHERIT_ONLY flag from aces derived from the posix acl. Fix.
Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
->readdir passes loff_t offsets (used as nfs cookies) to
nfs3svc_encode_entry{,_plus}, but when they pass it on to encode_entry it
becomes an 'off_t', which isn't good.
So filesystems that returned 64bit offsets would lose.
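
The loss can be reproduced with fixed-width stand-ins. This is a sketch:
the typedefs and function names are hypothetical, and unsigned types are
used so that the truncation is well defined in C, whereas the real off_t
is signed and its width depends on the architecture.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t cookie64_t;	/* stand-in for a 64-bit directory cookie */
typedef uint32_t narrow_t;	/* stand-in for a 32-bit intermediate type */

/* Passing the cookie through the narrow type drops the high 32 bits. */
static cookie64_t pass_through_narrow(cookie64_t cookie)
{
	narrow_t tmp = (narrow_t)cookie;

	return tmp;
}

/* Keeping the full width end to end preserves the cookie. */
static cookie64_t pass_through_wide(cookie64_t cookie)
{
	return cookie;
}
```

A cookie of 0x100000002 comes back as 2 from the narrow path, so a client
resuming a directory read with it would restart at the wrong position.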
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ocfs2 wants to implement its own splice write actor so that it can better
manage cluster / page locks. This lets us re-use the rest of splice write
while only providing our own code where it's actually important.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Splice does not need to readpage to bring the page uptodate before writing
to it, because prepare_write will take care of that for us.
Splice is also wrong to SetPageUptodate before the page is actually uptodate.
This results in the old uninitialised memory leak. This gets fixed as a
matter of course when removing the readpage logic.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Stealing pages with splice is problematic because we cannot just insert
an uptodate page into the pagecache and hope the filesystem can take care
of it later.
We also cannot just ClearPageUptodate, then hope prepare_write does not
write anything into the page, because I don't think prepare_write gives
that guarantee.
Remove support for SPLICE_F_MOVE for now. If we really want to bring it
back, we might be able to do so with the new filesystem buffered write
aops APIs I'm working on. If we really don't want to bring it back, then
we should decide that sooner rather than later, and remove the flag and
all the stealing infrastructure before anybody starts using it.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
In dlm_migrate_lockres(), we check upfront whether the lockres is a
candidate for migration. This patch encapsulates that code in a separate
function so that dlm_empty_lockres() can also use it during umount. This
patch addresses the umount process spinning problem.
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
During umount, the umount thread migrates the lockres' and the dlm_thread
frees the empty lockres'. Due to a race, the reference counting on the
lockres goes awry leading to extra puts.
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
This patch makes the needlessly global struct v9fs_cached_file_operations
static.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
A little mistake in 8a2bfdcbfa is making all
transactions synchronous, which reduces ext3 performance to comical levels.
Cc: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix the /proc/pid/stat representation of executable boundaries. It should
show the bounds of the executable, but instead shows the bounds of the
loader.
Before the patch is applied, the bug can be seen by examining, say, inetd:
# ps | grep inetd
610 root 0 S /usr/sbin/inetd -i
# cat /proc/610/maps
c0bb0000-c0bba788 r-xs 00000000 00:0b 14582157 /lib/ld-uClibc-0.9.28.so
c3180000-c31dede4 r-xs 00000000 00:0b 14582179 /lib/libuClibc-0.9.28.so
c328c000-c328ea00 rw-p 00008000 00:0b 14582157 /lib/ld-uClibc-0.9.28.so
c3290000-c329b6c0 rw-p 00000000 00:00 0
c32a0000-c32c0000 rwxp 00000000 00:00 0
c32d4000-c32d8000 rw-p 00000000 00:00 0
c3394000-c3398000 rw-p 00000000 00:00 0
c3458000-c345f464 r-xs 00000000 00:0b 16384612 /usr/sbin/inetd
c3470000-c34748f8 rw-p 00004000 00:0b 16384612 /usr/sbin/inetd
c34cc000-c34d0000 rw-p 00000000 00:00 0
c34d4000-c34d8000 rw-p 00000000 00:00 0
c34d8000-c34dc000 rw-p 00000000 00:00 0
# cat /proc/610/stat
610 (inetd) S 1 610 610 0 -1 256 0 0 0 0 0 8 0 0 19 0 1 0 94392000718
950272 0 4294967295 3233480704 3233523592 3274440352 3274439976
3273467584 0 0 4096 90115 3221712796 0 0 17 0 0 0 0
The code boundaries are 3233480704 to 3233523592, which are:
(gdb) p/x 3233480704
$1 = 0xc0bb0000
(gdb) p/x 3233523592
$2 = 0xc0bba788
Which corresponds to this line in the maps file:
c0bb0000-c0bba788 r-xs 00000000 00:0b 14582157 /lib/ld-uClibc-0.9.28.so
Which is wrong. After the patch is applied, the maps file is pretty much
identical (there's some minor shuffling of the location of some of the
anonymous VMAs), but the stat file is now:
# cat /proc/610/stat
610 (inetd) S 1 610 610 0 -1 256 0 0 0 0 0 7 0 0 18 0 1 0 94392000722
950272 0 4294967295 3276111872 3276141668 3274440352 3274439976
3273467584 0 0 4096 90115 3221712796 0 0 17 0 0 0 0
The code boundaries are then 3276111872 to 3276141668, which are:
(gdb) p/x 3276111872
$1 = 0xc3458000
(gdb) p/x 3276141668
$2 = 0xc345f464
And these correspond to this line in the maps file instead:
c3458000-c345f464 r-xs 00000000 00:0b 16384612 /usr/sbin/inetd
Which is now correct.
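The manual gdb comparison above can be scripted. The following is a minimal
userspace sketch, not part of the patch: code_bounds_match() is a
hypothetical helper that takes the two code boundary values from
/proc/<pid>/stat and checks them against the address range at the start of
one /proc/<pid>/maps line.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: return 1 if the start/end code values taken from
 * /proc/<pid>/stat equal the address range at the start of a
 * /proc/<pid>/maps line, 0 if they differ, and -1 if the line cannot be
 * parsed.
 */
int code_bounds_match(unsigned long long start_code,
                      unsigned long long end_code,
                      const char *maps_line)
{
    unsigned long long lo, hi;

    assert(maps_line != NULL);

    /* A maps line starts with "<start>-<end>" in hexadecimal. */
    if (sscanf(maps_line, "%llx-%llx", &lo, &hi) != 2)
        return -1;

    return lo == start_code && hi == end_code;
}
```

With the values shown above, the post-patch pair 3276111872/3276141668
matches the /usr/sbin/inetd line, while the pre-patch pair
3233480704/3233523592 matches the ld-uClibc line instead.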
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6:
[CIFS] Allow reset of file to ATTR_NORMAL when archive bit not set
[CIFS] Do not negotiate new POSIX_PATH_OPERATIONS_CAP yet
[CIFS] reset mode when client notices that ATTR_READONLY is no longer set
Since freezable workqueues are broken in 2.6.21-rc
(cf. http://marc.theaimsgroup.com/?l=linux-kernel&m=116855740612755,
http://marc.theaimsgroup.com/?l=linux-kernel&m=117261312523921&w=2)
it's better to change the only user of them, which is XFS, to use "normal"
nonfreezable workqueues.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: David Chinner <dgc@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When a file had a dos attribute of 0x1 (readonly - but dos attribute
of archive was not set) - doing chmod 0777 or equivalent would
try to set a dos attribute of 0 (which some servers ignore)
rather than ATTR_NORMAL (0x80) which most servers accept.
Does not affect servers which support the CIFS Unix Extensions.
Acked-by: Prasad Potluri <pvp@us.ibm.com>
Acked-by: Shirish Pargaonkar <shirishp@us.ibm.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
This bug was seen on ppc64, but it could have occurred on any
architecture with a page size of 64k or above. The problem is that
randomize_stack_top() in fs/binfmt_elf.c randomizes the stack to within
0x7ff pages. On 4k page machines, this is 8MB; on 64k page boxes, this
is 128MB.
The problem is that the new binary layout (selected in
arch_pick_mmap_layout) places the mapping segment 128MB or the stack
rlimit away from the top of the process memory, whichever is larger. If
you choose an rlimit of less than 128MB (most defaults are in the 8MB
range) then you can end up having your entire stack randomized away.
The fix is to make randomize_stack_top() only steal at most 8MB, which this
patch does. However, I have to point out that even with this, your stack
rlimit might not be exactly what you get if it's > 128MB, because you're
still losing the random offset of up to 8MB.
The true fix should be to leave an explicit gap for the randomization plus
a buffer when determining mmap_base, but that would involve fixing all the
architectures.
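The arithmetic can be checked with a small userspace sketch. Both function
names are hypothetical, and the scaled variant is only one possible way to
keep the offset near 8MB (shrinking the page mask as the page size grows);
it is not claimed to be the exact change made here. It assumes a page shift
of at least 12.

```c
#include <assert.h>

/* Largest stack offset in bytes when the mask is a fixed 0x7ff pages. */
unsigned long long max_stack_rnd_fixed(unsigned int page_shift)
{
    return 0x7ffULL << page_shift;
}

/*
 * Largest stack offset in bytes when the page mask is scaled down for
 * larger pages, so the byte span stays at or just below 8MB.
 * Assumption: page_shift >= 12 (4k pages or larger).
 */
unsigned long long max_stack_rnd_scaled(unsigned int page_shift)
{
    assert(page_shift >= 12);
    return (0x7ffULL >> (page_shift - 12)) << page_shift;
}
```

For 4k pages (shift 12) both give 0x7ff000 bytes, just under 8MB. For 64k
pages (shift 16) the fixed mask gives 0x7ff0000 bytes, just under 128MB,
while the scaled mask gives 0x7f0000 bytes, again just under 8MB.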
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Remove the misleading "Presently only useful on the IA-64 platform" text
from the EFI partition Kconfig.
EFI partitions are also used by Apple on their Intel-based machines and
thus you need EFI partition support if you (for example) want to attach
such a machine in target disk mode.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Acked-by: Matt Domsch <Matt_Domsch@dell.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Looks like we need a check in nfs_getattr() for a regular file. It makes
no sense to call nfs_sync_mapping_range() on anything else. I think that
should fix your problem: it will stop the NFS client from interfering
with dirty pages on that inode's mapping.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Acked-by: Olof Johansson <olof@lixom.net>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The current NFS client congestion logic is severely broken: it marks the
backing device congested during each nfs_writepages() call but doesn't
mirror this in nfs_writepage(), which makes for deadlocks. It also
implements its own waitqueue.
Replace this with a more regular congestion implementation that puts a cap on
the number of active writeback pages and uses the bdi congestion waitqueue.
Also always use an interruptible wait since it makes sense to be able to
SIGKILL the process even for mounts without 'intr'.
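As an illustration of capping active writeback pages, here is a minimal
userspace sketch. The struct, the function names and the use of separate
on/off thresholds are all assumptions made for the example; the waitqueue
side is left out entirely.

```c
#include <assert.h>

/* Hypothetical limiter: counts writeback pages currently in flight. */
struct wb_limiter {
    long in_flight;   /* pages under writeback right now */
    long on_thresh;   /* mark congested above this many pages */
    long off_thresh;  /* clear congestion below this many pages */
    int congested;
};

/* Call when a page starts writeback, from any write path. */
void wb_page_start(struct wb_limiter *l)
{
    if (++l->in_flight > l->on_thresh)
        l->congested = 1;
}

/* Call when writeback of a page completes. */
void wb_page_end(struct wb_limiter *l)
{
    assert(l->in_flight > 0);
    if (--l->in_flight < l->off_thresh)
        l->congested = 0;
}
```

Because the state depends only on the page count, every path that starts
writeback sees the same congestion decision.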
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Christoph Lameter <clameter@engr.sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The only error code which comes from the partition checkers is -1, when
they find an EIO. As per the discussion, ENOMEM values were ignored,
as they might scare the users.
So, with the current code, we end up returning -1 and not EIO for the
ioctl() calls, which doesn't give the user any clue about what went
wrong.
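The intended mapping can be sketched as follows. checker_result_to_errno()
is a hypothetical name used only for this example; it assumes, as stated
above, that a negative checker result always means an I/O error.

```c
#include <errno.h>

/*
 * Hypothetical helper: translate a partition checker result into a value
 * suitable for returning from the ioctl() path. Any negative result is
 * reported as -EIO; non-negative results pass through unchanged.
 */
int checker_result_to_errno(int res)
{
    return res < 0 ? -EIO : res;
}
```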
Signed-off-by: Suzuki K P <suzuki@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
smbfs allocates rq_trans2buffer to handle the server's multi-message
transaction2 responses. As struct smb_request may be reused, rq_trans2buffer
is freed before each new request. However, if the last server response is a
single trans2 message rather than a multi-message one, no new
rq_trans2buffer is allocated, but the final smb_rput still tries to free
it again.
To prevent this, the rq_trans2buffer pointer should be set to NULL after
kfree.
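A userspace sketch of the pattern, with free() standing in for kfree() and
a hypothetical struct in place of struct smb_request:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a reusable request structure. */
struct request {
    void *trans2buffer;
};

/*
 * Release the buffer and clear the pointer. A second call is then
 * harmless, because free(NULL) is a no-op (as is kfree(NULL)).
 */
void request_drop_trans2buffer(struct request *req)
{
    assert(req != NULL);
    free(req->trans2buffer);
    req->trans2buffer = NULL;
}
```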
Signed-off-by: Vasily Averin <vvs@sw.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
ecryptfs_d_release() first dereferences a pointer (via
ecryptfs_dentry_to_lower()) and then afterwards checks to see if the
pointer it just dereferenced is NULL (via ecryptfs_dentry_to_private()).
This patch moves all of the work done on the dereferenced pointer inside a
block governed by the condition that the pointer is non-NULL.
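The shape of the fix, as a userspace sketch. All types and names below are
hypothetical and only mirror the structure described above: the private
pointer is tested first, and everything that dereferences it sits inside
that block.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the lower dentry and the private data. */
struct lower_obj { int refs; };
struct dentry_private { struct lower_obj *lower; };
struct dentry_like { struct dentry_private *priv; };

void dentry_like_release(struct dentry_like *d)
{
    assert(d != NULL);
    if (d->priv) {
        /* Only touch the lower object once priv is known non-NULL. */
        if (d->priv->lower)
            d->priv->lower->refs--;
        free(d->priv);
        d->priv = NULL;
    }
}
```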
Signed-off-by: Michael Halcrow <mhalcrow@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
During modification of the code to support UFS2 writing, the case with
"three indirect" blocks in the truncate path was missed. This patch fixes
that.
Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru>
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch fixes the behaviour in the following test scenario:
lseek(fd, BIG_OFFSET)
write(fd, buf, sizeof(buf))
truncate(BIG_OFFSET)
truncate(BIG_OFFSET + sizeof(buf))
read(fd, buf...)
Once a file is big enough (BIG_OFFSET), we start allocating space by whole
blocks, and the block size is ordinarily larger than the page size. So in
truncate we should zero the rest of the block (except the last fragment,
which the VFS should take care of), so that we do not get garbage when the
file is extended.
The patch also corrects the conversion from a pointer to a block to a
physical block number, which helps with the less commonly used UFS types.
It also adds the inode number to the debug output.
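The test scenario above can be written as a small userspace check. This is
a sketch under assumptions: the function name and the 512-byte buffer are
illustrative, ftruncate() is used on the open descriptor, and pread() at
BIG_OFFSET replaces the plain read() so that the region just truncated and
re-extended is the one examined.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Write a non-zero pattern at big_offset, truncate it away, extend the
 * file again and read the region back. Returns 1 if it reads as zeros,
 * 0 if stale data is visible, -1 on any I/O error.
 */
int truncate_extend_reads_zeros(int fd, off_t big_offset)
{
    char buf[512];
    size_t i;

    assert(fd >= 0);
    memset(buf, 0xAA, sizeof(buf));

    if (lseek(fd, big_offset, SEEK_SET) == (off_t)-1)
        return -1;
    if (write(fd, buf, sizeof(buf)) != (ssize_t)sizeof(buf))
        return -1;
    if (ftruncate(fd, big_offset) != 0)
        return -1;
    if (ftruncate(fd, big_offset + (off_t)sizeof(buf)) != 0)
        return -1;
    if (pread(fd, buf, sizeof(buf), big_offset) != (ssize_t)sizeof(buf))
        return -1;

    for (i = 0; i < sizeof(buf); i++)
        if (buf[i] != 0)
            return 0;   /* garbage from the old block leaked through */
    return 1;
}
```

To exercise the case described here, the descriptor should refer to a file
on the UFS mount under test, with big_offset large enough that allocation
happens by whole blocks.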
Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This fixes "change block numbers on the fly" for the case where "prepare
write page" is in the call chain. In this case some buffers may be neither
uptodate nor mapped, so we should take care to map them and load them from
disk.
This patch was tested with:
- ufs regressions simple tests
- fsx-linux
- ltp(20060306)
- untar and build kernel
Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch corrects the handling of time in the UFS2 case.
1) According to the UFS2 disk layout, each modification/access/etc. "time"
should be held in two variables: one 64bit for seconds and another 32bit for
nanoseconds.
At present, for some unknown reason, we assume that the "inode time" is held
in three variables: 32bit for seconds, 32bit for milliseconds and 32bit for
nanoseconds.
2) We set the number of nanoseconds in the "VFS inode" to 0 during read,
instead of getting the value from the "on disk inode" (this should close
http://bugzilla.kernel.org/show_bug.cgi?id=7991).
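The corrected layout from point 1 and the read path from point 2 can be
sketched in userspace as follows. The struct and field names are
hypothetical, and byte-order conversion of the on-disk values is
deliberately left out.

```c
#include <stdint.h>
#include <time.h>

/* Hypothetical on-disk timestamp: 64bit seconds plus 32bit nanoseconds. */
struct disk_time {
    int64_t sec;
    int32_t nsec;
};

/*
 * Build the in-memory timestamp from the on-disk one, carrying the
 * nanoseconds across instead of forcing them to 0.
 */
struct timespec disk_time_to_timespec(struct disk_time t)
{
    struct timespec ts;

    ts.tv_sec = (time_t)t.sec;
    ts.tv_nsec = (long)t.nsec;
    return ts;
}
```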
Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru>
Cc: Bjoern Jacke <bjoern@j3e.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The Samba server now expects that clients which send the new
POSIX_PATH_OPERATIONS_CAP send all opens with this new
SMB. It treats clients that could send the new
posix open/create but don't as indicating that they really
want Windows semantics on that handle (which allows Samba
to support clients which want to support both types of
behaviors on different handles on the same mount).
We will put this capability back in the SetFSInfo
negotiation with servers like Samba when the
new POSIXCreate (create/open/mkdir) code is finished.
Signed-off-by: Steve French <sfrench@us.ibm.com>
This patch (as869) reinstates the mutual exclusion between sysfs
attribute method calls and attribute unregistration. The
previously-reported deadlocks have been fixed, and this exclusion is
by far the simplest way to avoid races during driver unbinding.
The check for orphaned read-buffers has been moved down slightly, so
that the remainder of a partially-read buffer will still be available
to userspace even after the attribute has been unregistered.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
Cc: Oliver Neukum <oneukum@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch (as868) adds a helper routine for device drivers that need
to set up a callback to perform some action in a different process's
context. This is intended for use by attribute methods that want to
unregister themselves or their parent device. Attribute method calls
are mutually exclusive with unregistration, so such actions cannot be
taken directly.
Two attribute methods are converted to use the new helper routine: one
for SCSI device deletion and one for System/390 ccwgroup devices.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
Cc: Oliver Neukum <oneukum@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'upstream-linus' of master.kernel.org:/pub/scm/linux/kernel/git/mfasheh/ocfs2:
ocfs2_dlm: Add missing locks in dlm_empty_lockres
ocfs2_dlm: Missing get/put lockres in dlm_run_purge_lockres
configfs: add missing mutex_unlock()
ocfs2: add some missing address space callbacks
ocfs2: Concurrent access of o2hb_region->hr_task was not locked
ocfs2: Proper cleanup in case of error in ocfs2_register_hb_callbacks()
not needed and actually breaks build on frv, while we are at it
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
have it return the buffer it had allocated
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Acked-by: James Morris <jmorris@namei.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
__dlm_lockres_unused() expects the caller to take the lockres spinlock.
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
In some circumstances, this was causing us to reference freed memory.
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
d_alloc() failure in configfs_register_subsystem() would fail to unlock
the mutex taken above. Reorganize the exit path to ensure the unlock
happens.
Reported-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Under load, OCFS2 would crash in invalidate_inode_pages2_range() because
invalidate_complete_page2() was unable to invalidate a page. It would
appear that JBD is holding on to the page. ext3 has a specific
->releasepage() handler to cover this case.
Steal ext3's ->releasepage(), ->invalidatepage(), and ->migratepage(), as
they appear completely appropriate for OCFS2.
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>