Commit Graph

387 Commits

Author SHA1 Message Date
Mark Fasheh 363041a5f7 ocfs2: temporarily remove extent map caching
The code in extent_map.c is not prepared to deal with a subtree being
rotated between lookups. This can happen when filling holes in sparse files.
Instead of a lengthy patch to update the code (which would likely lose the
benefit of caching subtree roots), we remove most of the algorithms and
implement a simple path based lookup. A less ambitious extent caching scheme
will be added in a later patch.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-04-26 15:01:31 -07:00
Mark Fasheh dcd0538ff4 ocfs2: sparse b-tree support
Introduce tree rotations into the b-tree code. This will allow ocfs2 to
support sparse files. Much of the added code is designed to be generic (in
the ocfs2 sense) so that it can later be re-used to implement large
extended attributes.

This patch only adds the rotation code and does minimal updates to callers
of the extent api.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-04-26 14:44:03 -07:00
Mark Fasheh 6f16bf655c ocfs2: small cleanup of ocfs2_request_delete()
There are two checks in there (one for inode newness, one for other mounted
nodes) which are unnecessary, so remove them. The DLM will allow the trylock
in either case without any messaging overhead.

Removing these makes ocfs2_request_delete() a one liner function, so just
move the trylock out one level into ocfs2_query_inode_wipe().

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-04-26 14:40:55 -07:00
Tiger Yang 68e2b740c4 ocfs2: remove unused code
Remove node messaging code that becomes unused with the delete inode vote
removal.

[Removed even more cruft which I spotted during review --Mark]

Signed-off-by: Tiger Yang <tiger.yang@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-04-26 14:40:16 -07:00
Tiger Yang 500086300e ocfs2: Remove delete inode vote
Ocfs2 currently does cluster-wide node messaging to check the open state of
an inode during delete. This patch removes that mechanism in favor of an
inode cluster lock which is taken at shared read when an inode is first read
and dropped in clear_inode(). This allows a deleting node to test the
liveness of an inode by attempting to take an exclusive lock.

Signed-off-by: Tiger Yang <tiger.yang@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-04-26 14:39:48 -07:00
Mark Fasheh a9f5f70739 ocfs2: filter more error prints
We don't want to print anything at all in ocfs2_lookup() when getting an
error from ocfs2_iget() - it could be something as innocuous as a signal
being detected in the dlm.

ocfs2_permission() should filter on -ENOENT which ocfs2_meta_lock() can
return if the inode was deleted on another node.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-04-26 13:39:08 -07:00
Sunil Mushran bebe6f120b ocfs2: Replace panic() with emergency_restart() when fencing
We have noticed panic() hanging leading us to a situation in which
the node, while otherwise dead, is still disk heartbeating. This
leads to a hung cluster as the other nodes are waiting for this
node to stop disk heartbeating. This situation is only resolved
by power resetting the box.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-04-26 13:39:02 -07:00
Sunil Mushran 5d262cc7dd ocfs2: Silence compiler warnings
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-04-26 13:38:55 -07:00
Mark Fasheh be9e986b82 ocfs2: Local mounts should skip inode updates
We don't want the extent map and uptodate cache destruction in
ocfs2_meta_lock_update() on a local mount, so skip that.

This fixes several bugs with uptodate being cleared on buffers and extent
maps being corrupted.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-04-26 13:35:21 -07:00
Sunil Mushran 0d01af6e5d ocfs2_dlm: Call cond_resched_lock() once per hash bucket scan
In dlm_migrate_all_locks(), we currently call cond_resched_lock() after
processing each lockres in a hash bucket. Move it outside the loop so as to
call it only after the entire hash bucket has been processed.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-04-26 13:33:11 -07:00
Srinivas Eeda 756a1501dd ocfs2_dlm: fix race in dlm_remaster_locks
There is a possibility that dlm_remaster_locks could overwride node->state
with DLM_RECO_NODE_DATA_REQUESTED after dlm_reco_data_done_handler sets the
node->state to DLM_RECO_NODE_DATA_DONE. This could lead to recovery getting
stuck and requires a cluster reboot. Synchronize with dlm_reco_state_lock
spinlock.

Signed-off-by: Srinivas Eeda <srinivas.eeda@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-04-26 13:33:02 -07:00
Sunil Mushran 2f5bf1f2d0 ocfs2_dlm: Check for migrateable lockres in dlm_empty_lockres()
In dlm_migrate_lockres(), we check upfront whether the lockres is a
candidate for migration. This patch encapsulates that code in a separate
function so that dlm_empty_lockres() can also use it during umount. This
patch addresses the umount process spinning problem.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-03-26 16:51:15 -07:00
Sunil Mushran 78062cb2e5 ocfs2_dlm: Fix lockres ref counting bug
During umount, the umount thread migrates the lockres' and the dlm_thread
frees the empty lockres'. Due to a race, the reference counting on the
lockres goes awry leading to extra puts.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-03-26 16:50:52 -07:00
Sunil Mushran b36c3f8498 ocfs2_dlm: Add missing locks in dlm_empty_lockres
__dlm_lockres_unused() expects the caller to take the lockres spinlock.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-03-14 14:37:35 -07:00
Sunil Mushran 3fca0894a4 ocfs2_dlm: Missing get/put lockres in dlm_run_purge_lockres
In some circumstances, this was causing us to reference freed memory.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-03-14 14:37:33 -07:00
Joel Becker 03f981cf2e ocfs2: add some missing address space callbacks
Under load, OCFS2 would crash in invalidate_inode_pages2_range() because
invalidate_complete_page2() was unable to invalidate a page.  It would
appear that JBD is holding on to the page.  ext3 has a specific
->releasepage() handler to cover this case.

Steal ext3's ->releasepage(), ->invalidatepage(), and ->migratepage(), as
they appear completely appropriate for OCFS2.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-03-14 14:37:16 -07:00
Joel Becker e6c352dbc0 ocfs2: Concurrent access of o2hb_region->hr_task was not locked
This means that a build-up and a teardown could race which would result in a
double-kthread_stop().

Protect the setting and clearing of hr_task with o2hb_live_lock, as it's not
a common thing and not performance critical.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-03-14 14:37:12 -07:00
Joel Becker c24f72cc7c ocfs2: Proper cleanup in case of error in ocfs2_register_hb_callbacks()
If ocfs2_register_hb_callbacks() succeeds on its first callback but fails
its second, it doesn't release the first on the way out. Fix that.

While we're at it, o2hb_unregister_callback() never returns anything but
0, so let's make it void.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-03-14 14:37:09 -07:00
Uwe Kleine-König 1b3c3714cb Fix typos concerning hierarchy
heirarchical, hierachical -> hierarchical
        heirarchy, hierachy -> hierarchy

Signed-off-by: Uwe Kleine-König <zeisberg@informatik.uni-freiburg.de>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2007-02-17 19:23:03 +01:00
Eric W. Biederman 0b4d414714 [PATCH] sysctl: remove insert_at_head from register_sysctl
The semantic effect of insert_at_head is that it would allow new registered
sysctl entries to override existing sysctl entries of the same name.  Which is
pain for caching and the proc interface never implemented.

I have done an audit and discovered that none of the current users of
register_sysctl care as (excpet for directories) they do not register
duplicate sysctl entries.

So this patch simply removes the support for overriding existing entries in
the sys_sysctl interface since no one uses it or cares and it makes future
enhancments harder.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: David Howells <dhowells@redhat.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Andi Kleen <ak@muc.de>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Corey Minyard <minyard@acm.org>
Cc: Neil Brown <neilb@suse.de>
Cc: "John W. Linville" <linville@tuxdriver.com>
Cc: James Bottomley <James.Bottomley@steeleye.com>
Cc: Jan Kara <jack@ucw.cz>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Mark Fasheh <mark.fasheh@oracle.com>
Cc: David Chinner <dgc@sgi.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Patrick McHardy <kaber@trash.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-14 08:09:59 -08:00
Eric W. Biederman 0e03036c97 [PATCH] sysctl: register the ocfs2 sysctl numbers
ocfs2 was did not have the binary number it uses under CTL_FS registered in
sysctl.h.  Register it to avoid future conflicts, and change the name of the
definition to be in line with the rest of the sysctl numbers.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Mark Fasheh <mark.fasheh@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-14 08:09:58 -08:00
Josef 'Jeff' Sipek ee9b6d61a2 [PATCH] Mark struct super_operations const
This patch is inspired by Arjan's "Patch series to mark struct
file_operations and struct inode_operations const".

Compile tested with gcc & sparse.

Signed-off-by: Josef 'Jeff' Sipek <jsipek@cs.sunysb.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-12 09:48:47 -08:00
Arjan van de Ven 92e1d5be91 [PATCH] mark struct inode_operations const 2
Many struct inode_operations in the kernel can be "const".  Marking them const
moves these to the .rodata section, which avoids false sharing with potential
dirty data.  In addition it'll catch accidental writes at compile time to
these shared resources.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-12 09:48:46 -08:00
Arjan van de Ven 00977a59b9 [PATCH] mark struct file_operations const 6
Many struct file_operations in the kernel can be "const".  Marking them const
moves these to the .rodata section, which avoids false sharing with potential
dirty data.  In addition it'll catch accidental writes at compile time to
these shared resources.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-12 09:48:45 -08:00
Philipp Reisner b559292e06 [PATCH] ocfs2 heartbeat: clean up bio submission code
As was already pointed out Mathieu Avila on Thu, 07 Sep 2006 03:15:25 -0700
that OCFS2 is expecting bio_add_page() to add pages to BIOs in an easily
predictable manner.

That is not true, especially for devices with own merge_bvec_fn().

Therefore OCFS2's heartbeat code is very likely to fail on such devices.

Move the bio_put() call into the bio's bi_end_io() function. This makes the
whole idea of trying to predict the behaviour of bio_add_page() unnecessary.
Removed compute_max_sectors() and o2hb_compute_request_limits().

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-02-07 12:15:58 -08:00
Zhen Wei 925037bcba ocfs2: introduce sc->sc_send_lock to protect outbound outbound messages
When there is a lot of multithreaded I/O usage, two threads can collide
while sending out a message to the other nodes. This is due to the lack of
locking between threads while sending out the messages.

When a connected TCP send(), sendto(), or sendmsg() arrives in the Linux
kernel, it eventually comes through tcp_sendmsg(). tcp_sendmsg() protects
itself by acquiring a lock at invocation by calling lock_sock().
tcp_sendmsg() then loops over the buffers in the iovec, allocating
associated sk_buff's and cache pages for use in the actual send. As it does
so, it pushes the data out to tcp for actual transmission. However, if one
of those allocation fails (because a large number of large sends is being
processed, for example), it must wait for memory to become available. It
does so by jumping to wait_for_sndbuf or wait_for_memory, both of which
eventually cause a call to sk_stream_wait_memory(). sk_stream_wait_memory()
contains a code path that calls sk_wait_event(). Finally, sk_wait_event()
contains the call to release_sock().

The following patch adds a lock to the socket container in order to
properly serialize outbound requests.

From: Zhen Wei <zwei@novell.com>
Acked-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-02-07 12:15:11 -08:00
Sunil Mushran 0dd82141b2 ocfs2_dlm: Add timeout to dlm join domain
Currently the ocfs2 dlm has no timeout during dlm join domain. While this is
not a problem in normal operation, this does become an issue if, say, the
other node is refusing to let the node join the domain because of a stuck
recovery. This patch adds a 90 sec timeout.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-02-07 12:10:39 -08:00
Sunil Mushran e4968476a9 ocfs2_dlm: Silence some messages during join domain
These messages can easily be activated using the mlog infrastructure
and don't need to be enabled by default.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-02-07 12:10:14 -08:00
Srinivas Eeda 1faf289454 ocfs2_dlm: disallow a domain join if node maps mismatch
There is a small window where a joining node may not see the node(s) that
just died but are still part of the domain. To fix this, we must disallow
join requests if the joining node has a different node map.

A new field node_map is added to dlm_query_join_request to send the current
nodes nodemap along with join request. On the receiving end the nodes that
are part of the cluster verifies if this new node sees all the nodes that
are still part of the cluster. They disallow the join if the maps mismatch.

Signed-off-by: Srinivas Eeda <srinivas.eeda@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-02-07 12:09:14 -08:00
Sunil Mushran f3f854648d ocfs2_dlm: Ensure correct ordering of set/clear refmap bit on lockres
Eventhough the set refmap bit message is sent before the clear refmap
message, currently there is no guarentee that the set message will be
handled before the clear. This patch prevents the clear refmap to be
processed while the node is sending assert master messages to other
nodes. (The set refmap message is sent as a response to the assert
master request).

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-02-07 12:08:13 -08:00
Sunil Mushran ab81afd30b ocfs2: Binds listener to the configured ip address
This patch binds the o2net listener to the configured ip address
instead of INADDR_ANY for security. Fixes oss.oracle.com bugzilla#814.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-02-07 12:07:49 -08:00
Kurt Hackel 3b8118cffa ocfs2_dlm: Calling post handler function in assert master handler
This patch prevents the dlm from sending the clear refmap message
before the set refmap. We use the newly created post function handler
routine to accomplish the task.

Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-02-07 12:07:24 -08:00
Kurt Hackel d74c9803a9 ocfs2: Added post handler callable function in o2net message handler
Currently o2net allows one handler function per message type. This
patch adds the ability to call another function to be called after
the handler has returned the message to the other node.

Handlers are now given the option of returning a context (in the form of a
void **) which will be passed back into the post message handler function.

Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-02-07 12:06:56 -08:00
Kurt Hackel 74aa25856c ocfs2_dlm: Cookies in locks not being printed correctly in error messages
The dlm encodes the node number and a sequence number in the lock cookie.
It also stores the cookie in the lockres in the big endian format to avoid
swapping 8 bytes on each lock request. The bug here was that it was assuming
the cookie to be in the cpu format when decoding it for printing the error
message. This patch swaps the bytes before the print.

Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-02-07 12:06:24 -08:00
Kurt Hackel 90aaaf1c23 ocfs2_dlm: Silence a failed convert
When the lockres is in migrate or recovery state, all convert requests
are denied with the appropriate error status that is handled on the
requester node. This patch silences the erroneous error message printed
on the master node.

Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-02-07 12:05:48 -08:00
Kurt Hackel a6fa36402a ocfs2_dlm: wake up sleepers on the lockres waitqueue
The dlm was not waking up threads waiting on the lockres wait queue,
waiting for the lockres to be no longer be in the DLM_LOCK_RES_IN_PROGRESS
and the DLM_LOCK_RES_MIGRATING states.

Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-02-07 12:05:19 -08:00
Kurt Hackel 28b72d9c92 ocfs2_dlm: Dlm dispatch was stopping too early
dlm_dispatch_work was not processing the queued up tasks at
the first sign of the node leaving the domain leading to not
only incompleted tasks but also a mismatch in the dlm refcnt.

Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-02-07 12:04:27 -08:00
Kurt Hackel 50635f15b3 ocfs2_dlm: Drop inflight refmap even if no locks found on the lockres
Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-02-07 12:03:42 -08:00
Kurt Hackel 1cd04dbe33 ocfs2_dlm: Flush dlm workqueue before starting to migrate
This is to prevent the condition in which a previously queued
up assert master asserts after we start the migration. Now
migration ensures the workqueue is flushed before proceeding
with migrating the lock to another node. This condition is
typically encountered during parallel umounts.

Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-02-07 12:03:02 -08:00
Kurt Hackel e17e75ecb8 ocfs2_dlm: Fix migrate lockres handler queue scanning
The migrate lockres handler was only searching for its lock on
migrated lockres on the expected queue. This could be problematic
as the new master could have also issued a convert request
during the migration and thus moved the lock to the convert queue.
We now search for the lock on all three queues.

Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Sunil Mushran <Sunil.Mushran@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-02-07 12:02:40 -08:00
Kurt Hackel 71ac106243 ocfs2_dlm: Make dlmunlock() wait for migration to complete
dlmunlock() was not waiting for migration to complete before releasing locks
on locally mastered locks.

Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Sunil Mushran <Sunil.Mushran@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-02-07 12:01:49 -08:00
Kurt Hackel ddc09c8dda ocfs2_dlm: Fixes race between migrate and dirty
dlmthread was removing lockres' from the dirty list
and resetting the dirty flag before shuffling the list.
This patch retains the dirty state flag until the lists
are shuffled.

Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Sunil Mushran <Sunil.Mushran@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-02-07 12:00:57 -08:00
Adrian Bunk faf0ec9f13 [PATCH] fs/ocfs2/dlm/: make functions static
This patch makes some needlessly global functions static.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-02-07 11:53:31 -08:00
Kurt Hackel ba2bf21851 ocfs2_dlm: fix cluster-wide refcounting of lock resources
This was previously broken and migration of some locks had to be temporarily
disabled. We use a new (and backward-incompatible) set of network messages
to account for all references to a lock resources held across the cluster.
once these are all freed, the master node may then free the lock resource
memory once its local references are dropped.

Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-02-07 11:53:07 -08:00
Mark Fasheh e051fda4fd ocfs2: ocfs2_link() journal credits update
Commit 592282cf2e fixed some missing directory
c/mtime updates in part by introducing a dinode update in ocfs2_add_entry().
Unfortunately, ocfs2_link() (which didn't update the directory inode before)
is now missing a single journal credit. Fix this by doubling the number of
inode updates expected during hard link creation.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-02-01 12:03:19 -08:00
Mark Fasheh a8a75a20e9 [PATCH] ocfs2: fix thinko in ocfs2_backup_super_blkno()
Fix a bug which was introduced when I synced up ocfs2_fs.h with ocfs2-tools.
We can't do u64/u32 in kernel.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-01-26 14:53:27 -08:00
Mark Fasheh 50af94b14c ocfs2: Add backup superblock info to ocfs2_fs.h
This synchronizes us with recent ocfs2-tools changes.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-01-21 16:20:10 -08:00
Mark Fasheh 6a1bd4a578 ocfs2: cleanup ocfs2_iget() errors
Get rid of some error prints in the ocfs2_iget() path from
ocfs2_get_dentry(). NFSD can easily cause us to read stale inodes.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-01-21 16:19:12 -08:00
Mark Fasheh 592282cf2e ocfs2: Directory c/mtime update fixes
ocfs2 wasn't updating c/mtime on directories during dirent
creation/deletion. Fix ocfs2_unlink(), ocfs2_rename() and
__ocfs2_add_entry() by adding the proper code to update the struct inode and
push the change out to disk.

This helps rename/unlink on nfs exported file systems in particular as those
clients compare directory time values to avoid a full re-reading a directory
which hasn't changed.

ocfs2_rename() loses some superfluous error handling as a result of this
patch.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-01-21 16:18:49 -08:00
Mark Fasheh 72bce5078d ocfs2: Don't print errors when following symlinks
We shouldn't print errors returned from vfs_follow_link(). This was causing
spurious errors to show up in the logs.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-01-21 16:18:14 -08:00
Zhen Wei 92efc15241 ocfs2: export heartbeat thread pid via configfs
The patch allows the ocfs2 heartbeat thread to prioritize I/O which may
help cut down on spurious fencing. Most of this will be in the tools -
we can have a pid configfs attribute and let userspace (ocfs2_hb_ctl)
calls the ioprio_set syscall after starting heartbeat, but only cfq
scheduler supports I/O priorities now.

Signed-off-by: Zhen Wei <zwei@novell.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-12-28 16:40:32 -08:00
Mark Fasheh 7f4a2a97e3 ocfs2: always unmap in ocfs2_data_convert_worker()
Mmap-heavy clustered workloads were sometimes finding stale data on mmap
reads. The solution is to call unmap_mapping_range() on any down convert of
a data lock.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-12-28 16:38:59 -08:00
Mark Fasheh 6c2aad0567 ocfs2: ignore NULL vfsmnt in ocfs2_should_update_atime()
This can come from NFSD.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-12-28 16:38:32 -08:00
Mark Fasheh 564f8a3228 ocfs2: Allow direct I/O read past end of file
ocfs2_direct_IO_get_blocks() was incorrectly returning -EIO for a direct I/O
read whose start block was past the end of the file allocation tree. Fix
things so that we return a hole instead. do_direct_IO() will then notice
that the range start is past eof and return a short read.

While there, remove the unused vbo_max variable.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-12-28 16:38:08 -08:00
Mark Fasheh 0333394bff ocfs2: don't print error in ocfs2_permission()
Errors from generic_permission() can happen in valid cases and shouldn't be
reported.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-12-28 16:37:20 -08:00
Robert P. J. Day cd86128088 [PATCH] Fix numerous kcalloc() calls, convert to kzalloc()
All kcalloc() calls of the form "kcalloc(1,...)" are converted to the
equivalent kzalloc() calls, and a few kcalloc() calls with the incorrect
ordering of the first two arguments are fixed.

Signed-off-by: Robert P. J. Day <rpjday@mindspring.com>
Cc: Jeff Garzik <jeff@garzik.org>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: Adam Belay <ambx1@neo.rr.com>
Cc: James Bottomley <James.Bottomley@steeleye.com>
Cc: Greg KH <greg@kroah.com>
Cc: Mark Fasheh <mark.fasheh@oracle.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-13 09:05:52 -08:00
Mark Fasheh 7e913c5360 [PATCH] ocfs2: relative atime support
Update ocfs2_should_update_atime() to understand the MNT_RELATIME flag and
to test against mtime / ctime accordingly.

[akpm@osdl.org: cleanups]
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Cc: Valerie Henson <val_henson@linux.intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-13 09:05:50 -08:00
Linus Torvalds 741441ab78 Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2:
  [patch 3/3] OCFS2 Configurable timeouts - Protocol changes
  [patch 2/3] OCFS2 Configurable timeouts
  [patch 1/3] OCFS2 - Expose struct o2nm_cluster
  ocfs2: Synchronize feature incompat flags in ocfs2_fs.h
  ocfs2: update mount option documentation
  ocfs2: local mounts
2006-12-12 10:21:01 -08:00
Andrew Beekhof 828ae6afbe [patch 3/3] OCFS2 Configurable timeouts - Protocol changes
Modify the OCFS2 handshake to ensure essential timeouts are configured
identically on all nodes.

Only allow changes when there are no connected peers

Improves the logic in o2net_advance_rx() which broke now that
sizeof(struct o2net_handshake) is greater than sizeof(struct o2net_msg)

Included is the field for userspace-heartbeat timeout to avoid the need for
further protocol changes.

Uses a global spinlock to ensure the decisions to update configfs entries
are made on the correct value.  The region covered by the spinlock when
incrementing the counter is much larger as this is the more critical case.

Small cleanup contributed by Adrian Bunk <bunk@stusta.de>

Signed-off-by: Andrew Beekhof <abeekhof@suse.de>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-12-11 14:26:44 -08:00
Josef Sipek d28c91740a [PATCH] struct path: convert ocfs2
Signed-off-by: Josef Sipek <jsipek@fsl.cs.sunysb.edu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-08 08:28:48 -08:00
Jeff Mahoney b5dd80304d [patch 2/3] OCFS2 Configurable timeouts
Allow configuration of OCFS2 timeouts from userspace via configfs

Signed-off-by: Andrew Beekhof <abeekhof@suse.de>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-12-07 18:13:20 -08:00
Andrew Beekhof 296b75ed6a [patch 1/3] OCFS2 - Expose struct o2nm_cluster
Subsequent patches (namely userspace heartbeat and configurable timeouts)
require access to the o2nm_cluster struct.  This patch does the necessary
shuffling.

Signed-off-by: Andrew Beekhof <abeekhof@suse.de>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-12-07 18:13:01 -08:00
Mark Fasheh 8903901dbf ocfs2: Synchronize feature incompat flags in ocfs2_fs.h
These got a little bit out of date with ocfs2-tools, make things consistent
again. We reserve a flag for sparse allocation code as that's pretty close
to testable at this point.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-12-07 18:05:37 -08:00
Sunil Mushran c271c5c22b ocfs2: local mounts
This allows users to format an ocfs2 file system with a special flag,
OCFS2_FEATURE_INCOMPAT_LOCAL_MOUNT. When the file system sees this flag, it
will not use any cluster services, nor will it require a cluster
configuration, thus acting like a 'local' file system.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-12-07 17:37:53 -08:00
Alexey Dobriyan 4a6e617a4b [PATCH] fs/*: trivial vsnprintf() conversion
It would very lame to get buffer overflow via one of the following.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Dave Kleikamp <shaggy@austin.ibm.com>
Cc: Mark Fasheh <mark.fasheh@oracle.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:35 -08:00
Christoph Lameter e18b890bb0 [PATCH] slab: remove kmem_cache_t
Replace all uses of kmem_cache_t with struct kmem_cache.

The patch was generated using the following script:

	#!/bin/sh
	#
	# Replace one string by another in all the kernel sources.
	#

	set -e

	for file in `find * -name "*.c" -o -name "*.h"|xargs grep -l $1`; do
		quilt add $file
		sed -e "1,\$s/$1/$2/g" $file >/tmp/$$
		mv /tmp/$$ $file
		quilt refresh
	done

The script was run like this

	sh replace kmem_cache_t "struct kmem_cache"

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:25 -08:00
Christoph Lameter e6b4f8da3a [PATCH] slab: remove SLAB_NOFS
SLAB_NOFS is an alias of GFP_NOFS.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:23 -08:00
David Howells 9db7372445 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6
Conflicts:

	drivers/ata/libata-scsi.c
	include/linux/libata.h

Futher merge of Linus's head and compilation fixups.

Signed-Off-By: David Howells <dhowells@redhat.com>
2006-12-05 17:01:28 +00:00
Tiger Yang d38eb8db6a ocfs2: implement i_op->permission
Implement .permission() in ocfs2_file_iops, ocfs2_special_file_iops and
ocfs2_dir_iops.

This helps us avoid some multi-node races with mode change and vfs
operations.

Signed-off-by: Tiger Yang <tiger.yang@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-12-01 18:29:14 -08:00
Tiger Yang 25899deef4 ocfs2: update file system paths to set atime
Conditionally update atime in ocfs2_file_aio_read(), ocfs2_readdir() and
ocfs2_mmap().

Signed-off-by: Tiger Yang <tiger.yang@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-12-01 18:28:58 -08:00
Tiger Yang 7f1a37e31f ocfs2: core atime update functions
This patch adds the core routines for updating atime in ocfs2.

Signed-off-by: Tiger Yang <tiger.yang@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-12-01 18:28:51 -08:00
Tiger Yang 8659ac25b4 ocfs2: Add splice support
Add splice read/write support in ocfs2.

ocfs2_file_splice_read/write are very similar to ocfs2_file_aio_read/write.

Signed-off-by: Tiger Yang <tiger.yang@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-12-01 18:28:46 -08:00
Mark Fasheh e88d0c9a41 ocfs2: Remove ocfs2_write_should_remove_suid()
Use should_remove_suid() instead.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-12-01 18:28:43 -08:00
Mark Fasheh 1fabe1481f ocfs2: Remove struct ocfs2_journal_handle in favor of handle_t
This is mostly a search and replace as ocfs2_journal_handle is now no more
than a container for a handle_t pointer.

ocfs2_commit_trans() becomes very straight forward, and we remove some out
of date comments / code.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-12-01 18:28:28 -08:00
Mark Fasheh 65eff9ccf8 ocfs2: remove handle argument to ocfs2_start_trans()
All callers either pass in NULL directly, or a local variable that is
already set to NULL.

The internals of ocfs2_start_trans() get a nice cleanup as a result.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-12-01 18:28:23 -08:00
Mark Fasheh dae85832ff ocfs2: remove ocfs2_journal_handle journal field
It is no longer used.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-12-01 18:28:13 -08:00
Mark Fasheh 02dc1af44e ocfs2: pass ocfs2_super * into ocfs2_commit_trans()
This sets us up to remove handle->journal.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-12-01 18:28:08 -08:00
Mark Fasheh 4bcec1847a ocfs2: remove unused handle argument from ocfs2_meta_lock_full()
Now that this is unused and all callers pass NULL, we can safely remove it.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-12-01 18:28:05 -08:00
Mark Fasheh a301a27d71 ocfs2: make ocfs2_alloc_handle() static
This is no longer used outside of journal.c

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-12-01 18:28:00 -08:00
Mark Fasheh daf29e9cda ocfs2: remove unused ocfs2_handle_add_lock()
This gets us rid of a slab we no longer need, as well as removing the
majority of what's left on ocfs2_journal_handle.

ocfs2_commit_unstarted_handle() has no more real work to do, so remove that
function too.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-12-01 18:27:58 -08:00
Mark Fasheh 02928a71ae ocfs2: remove unused ocfs2_handle_add_inode()
We can also delete the unused infrastructure which was once in place to
support this functionality. ocfs2_inode_private loses ip_handle and
ip_handle_list. ocfs2_journal_handle loses handle_list.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-12-01 18:27:55 -08:00
Mark Fasheh 85b9e783cb ocfs2: Don't allocate handle early in ocfs2_rename()
It isn't used until ocfs2_start_trans() anyway.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-12-01 18:27:53 -08:00
Mark Fasheh da5cbf2f9d ocfs2: don't use handle for locking in allocation functions
Instead we record our state on the allocation context structure which all
callers already know about and lifetime correctly. This means the
reservation functions don't need a handle passed in any more, and we can
also take it off the alloc context.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-12-01 18:27:49 -08:00
Mark Fasheh 8d5596c687 ocfs2: don't pass handle to ocfs2_meta_lock in ocfs2_rename()
Take and drop the locks directly.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-12-01 18:27:24 -08:00
Mark Fasheh 6d8fc40e63 ocfs2: don't pass handle to ocfs2_meta_lock in ocfs2_symlink()
Take and drop the locks directly.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-12-01 18:27:22 -08:00
Mark Fasheh 30a4f5e86b ocfs2: don't pass handle to ocfs2_meta_lock in ocfs2_unlink()
Take and drop the locks directly.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-12-01 18:27:19 -08:00
Mark Fasheh 5098c27bb8 ocfs2: don't pass handle to ocfs2_meta_lock() in orphan dir code
Take and drop the locks directly.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-12-01 18:27:16 -08:00
Mark Fasheh 123a964340 ocfs2: don't pass handle to ocfs2_meta_lock() in ocfs2_link()
Take and drop the locks directly.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-12-01 18:27:14 -08:00
Mark Fasheh e3a8213859 ocfs2: don't pass handle to ocfs2_meta_lock() in ocfs2_mknod()
Take and drop the locks directly.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-12-01 18:27:12 -08:00
Mark Fasheh e08dc8b980 ocfs2: don't pass handle to ocfs2_meta_lock() in __ocfs2_flush_truncate_log()
Take and drop the locks directly.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-12-01 18:27:10 -08:00
Mark Fasheh 8898a5a58f ocfs2: don't pass handle to ocfs2_meta_lock() in localalloc.c
Take and drop the locks directly.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-12-01 18:27:08 -08:00
Mark Fasheh c161f89be7 ocfs2: remove ocfs2_journal_handle flags field
Callers can set h_sync directly on the handle_t, whether a transaction has
been started or not can be determined via the existence of the handle_t on
the struct ocfs2_journal_handle.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-12-01 18:27:06 -08:00
Mark Fasheh 1fc581467e ocfs2: have ocfs2_extend_trans() take handle_t
No reason to use our wrapper struct in this function, so take the handle_t
directly.

Also fixes a bug where we were incorrectly setting the handle to NULL in
case of a failure from journal_restart()

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-12-01 18:27:04 -08:00
Mark Fasheh 01ddf1e186 ocfs2: remove unused ocfs2_journal_handle field
max_buffs was just being set and not actually used.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-12-01 18:27:00 -08:00
Mark Fasheh f5a923d1ba ocfs2: fix format warnings in dlm_alloc_pagevec()
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-12-01 18:26:56 -08:00
Adrian Bunk da66116eef [2.6 patch] make ocfs2_create_new_lock() static
This patch makes the needlessly global ocfs2_create_new_lock() static.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-12-01 18:26:50 -08:00
David Howells c4028958b6 WorkStruct: make allyesconfig
Fix up for make allyesconfig.

Signed-Off-By: David Howells <dhowells@redhat.com>
2006-11-22 14:57:56 +00:00
Mark Fasheh e2057c5a63 ocfs2: cond_resched() in ocfs2_zero_extend()
The loop within ocfs2_zero_extend() can execute for a long time, causing
spurious soft lockup warnings.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-10-20 15:27:48 -07:00
Mark Fasheh 0effef776f ocfs2: fix page zeroing during simple extends
The page zeroing code was missing the region between old i_size and new
i_size for those extends that didn't actually require a change in space
allocation.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-10-20 15:27:26 -07:00
Sunil Mushran 711a40fcaa ocfs2: remove spurious d_count check in ocfs2_rename()
This was causing some folks to incorrectly get -EBUSY during rename.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-10-20 15:26:35 -07:00
Akinobu Mita 79cd22d3ac ocfs2: delete redundant memcmp()
This patch deletes redundant memcmp() while looking up in rb tree.

Signed-off-by: Akinbou Mita <akinobu.mita@gmail.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-10-20 15:26:06 -07:00
Alexey Dobriyan 2ecd05ae68 [PATCH] fs/*: use BUILD_BUG_ON
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Mark Fasheh <mark.fasheh@oracle.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-11 11:14:23 -07:00
Mark Fasheh 17ff785691 [PATCH] r/o bind mounts: clean up OCFS2 nlink handling
OCFS2 does some operations on i_nlink, then reverts them if some of its
operations fail to complete.  This does not fit in well with the
drop_nlink() logic where we expect i_nlink to stay at zero once it gets
there.

So, delay all of the nlink operations until we're sure that the operations
have completed.  Also, introduce a small helper to check whether an inode
has proper "unlinkable" i_nlink counts no matter whether it is a directory
or regular inode.

This patch is broken out from the others because it does contain some
logical changes.

Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-01 00:39:30 -07:00
Dave Hansen d8c76e6f45 [PATCH] r/o bind mount prepwork: inc_nlink() helper
This is mostly included for parity with dec_nlink(), where we will have some
more hooks.  This one should stay pretty darn straightforward for now.

Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-01 00:39:30 -07:00
Dave Hansen 9a53c3a783 [PATCH] r/o bind mounts: unlink: monitor i_nlink
When a filesystem decrements i_nlink to zero, it means that a write must be
performed in order to drop the inode from the filesystem.

We're shortly going to have keep filesystems from being remounted r/o between
the time that this i_nlink decrement and that write occurs.

So, add a little helper function to do the decrements.  We'll tie into it in a
bit to note when i_nlink hits zero.

Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-01 00:39:30 -07:00
Badari Pulavarty 027445c372 [PATCH] Vectorize aio_read/aio_write fileop methods
This patch vectorizes aio_read() and aio_write() methods to prepare for
collapsing all aio & vectored operations into one interface - which is
aio_read()/aio_write().

Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Michael Holzheu <HOLZHEU@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-01 00:39:28 -07:00
Theodore Ts'o ba52de123d [PATCH] inode-diet: Eliminate i_blksize from the inode structure
This eliminates the i_blksize field from struct inode.  Filesystems that want
to provide a per-inode st_blksize can do so by providing their own getattr
routine instead of using the generic_fillattr() function.

Note that some filesystems were providing pretty much random (and incorrect)
values for i_blksize.

[bunk@stusta.de: cleanup]
[akpm@osdl.org: generic_fillattr() fix]
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-27 08:26:18 -07:00
Theodore Ts'o 8e18e2941c [PATCH] inode_diet: Replace inode.u.generic_ip with inode.i_private
The following patches reduce the size of the VFS inode structure by 28 bytes
on a UP x86.  (It would be more on an x86_64 system).  This is a 10% reduction
in the inode size on a UP kernel that is configured in a production mode
(i.e., with no spinlock or other debugging functions enabled; if you want to
save memory taken up by in-core inodes, the first thing you should do is
disable the debugging options; they are responsible for a huge amount of bloat
in the VFS inode structure).

This patch:

The filesystem or device-specific pointer in the inode is inside a union,
which is pretty pointless given that all 30+ users of this field have been
using the void pointer.  Get rid of the union and rename it to i_private, with
a comment to explain who is allowed to use the void pointer.  This is just a
cleanup, but it allows us to reuse the union 'u' for something something where
the union will actually be used.

[judith@osdl.org: powerpc build fix]
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Judith Lebzelter <judith@osdl.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-27 08:26:17 -07:00
Alexey Dobriyan 1a1d92c10d [PATCH] Really ignore kmem_cache_destroy return value
* Rougly half of callers already do it by not checking return value
* Code in drivers/acpi/osl.c does the following to be sure:

	(void)kmem_cache_destroy(cache);

* Those who check it printk something, however, slab_error already printed
  the name of failed cache.
* XFS BUGs on failed kmem_cache_destroy which is not the decision
  low-level filesystem driver should make. Converted to ignore.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-27 08:26:10 -07:00
Mark Fasheh 0d5dc6c2dd ocfs2: Teach ocfs2_drop_lock() to use ->set_lvb() callback
With this, we don't need to pass an additional struct with function pointer.

Now that the callbacks are fully used, comment the remaining API.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-09-24 13:50:48 -07:00
Mark Fasheh b5e500e23e ocfs2: Remove ->unblock lockres operation
Have ocfs2_process_blocked_lock() call ocfs2_generic_unblock_lock(), which
gets to be ocfs2_unblock_lock() now that it's the only possible unblock
function.

Remove the ->unblock() callback from the structure, and all lock type
specific unblock functions.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-09-24 13:50:48 -07:00
Mark Fasheh cc567d89b3 ocfs2: move downconvert worker to lockres ops
This way lock types don't have to manually pass it to
ocfs2_generic_unblock_lock().

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-09-24 13:50:48 -07:00
Mark Fasheh 08280f11de ocfs2: Remove unused dlmglue functions
The meta data unblocking code no longer needs ocfs2_do_unblock_meta() or
ocfs2_can_downconvert_meta_lock(), so remove them.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-09-24 13:50:48 -07:00
Mark Fasheh 810d5aeba1 ocfs2: Have the metadata lock use generic dlmglue functions
Fill in the ->check_downconvert and ->set_lvb callbacks with meta data
specific operations and switch ocfs2_unblock_meta() to call
ocfs2_generic_unblock_lock()

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-09-24 13:50:47 -07:00
Mark Fasheh 5ef0d4ea08 ocfs2: Add ->set_lvb callback in dlmglue
This allows a lock type to set the value block before downconvert.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-09-24 13:50:47 -07:00
Mark Fasheh 16d5b9567a ocfs2: Add ->check_downconvert callback in dlmglue
This will allow lock types to force a requeue of a lock downconvert.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-09-24 13:50:47 -07:00
Mark Fasheh f7fbfdd1fc ocfs2: Check for refreshing locks in generic unblock function
Tidy up the exit path a bit too.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-09-24 13:50:47 -07:00
Mark Fasheh b80fc012e0 ocfs2: don't unconditionally pass LVB flags
Allow a lock type to specifiy whether it makes use of the LVB. The only type
which does this right now is the meta data lock. This should save us some
space on network messages since they won't have to needlessly transmit value
blocks.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-09-24 13:50:47 -07:00
Mark Fasheh aa2623ad80 ocfs2: combine inode and generic blocking AST functions
There is extremely little difference between the two now. We can remove the
callback from ocfs2_lock_res_ops as well.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-09-24 13:50:46 -07:00
Mark Fasheh 54a7e7552e ocfs2: Add ->get_osb() dlmglue locking operation
Will be used to find the ocfs2_super structure from a given lockres.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-09-24 13:50:46 -07:00
Mark Fasheh 2a45f2d13e ocfs2: remove ->unlock_ast() callback from ocfs2_lock_res_ops
This was always defined to the same function in all locks, so clean things
up by removing and passing ocfs2_unlock_ast() directly to the DLM.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-09-24 13:50:46 -07:00
Mark Fasheh e92d57df27 ocfs2: combine inode and generic AST functions
There is extremely little difference between the two now. We can remove the
callback from ocfs2_lock_res_ops as well.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-09-24 13:50:46 -07:00
Mark Fasheh f625c9793b ocfs2: Clean up lock resource refresh flags
Use of the refresh mechanism is lock-type wide, so move knowledge of that to
the ocfs2_lock_res_ops structure.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-09-24 13:50:46 -07:00
Mark Fasheh 24c19ef404 ocfs2: Remove i_generation from inode lock names
OCFS2 puts inode meta data in the "lock value block" provided by the DLM.
Typically, i_generation is encoded in the lock name so that a deleted inode
on and a new one in the same block don't share the same lvb.

Unfortunately, that scheme means that the read in ocfs2_read_locked_inode()
is potentially thrown away as soon as the meta data lock is taken - we
cannot encode the lock name without first knowing i_generation, which
requires a disk read.

This patch encodes i_generation in the inode meta data lvb, and removes the
value from the inode meta data lock name. This way, the read can be covered
by a lock, and at the same time we can distinguish between an up to date and
a stale LVB.

This will help cold-cache stat(2) performance in particular.

Since this patch changes the protocol version, we take the opportunity to do
a minor re-organization of two of the LVB fields.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-09-24 13:50:46 -07:00
Mark Fasheh f9e2d82e63 ocfs2: Encode i_generation in the meta data lvb
When i_generation is removed from the lockname, this will help us determine
whether a meta data lvb has information that is in sync with the local
struct inode.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-09-24 13:50:45 -07:00
Mark Fasheh 4d3b83f736 ocfs2: Free up some space in the lvb
lvb_version doesn't need to be a whole 32 bits. Make it an 8 bit field to
free up some space. This should be backwards compatible until we use one of
the fields, in which case we'd bump the lvb version anyway.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-09-24 13:50:45 -07:00
Mark Fasheh 0027dd5bc2 ocfs2: Remove special casing for inode creation in ocfs2_dentry_attach_lock()
We can't use LKM_LOCAL for new dentry locks because an unlink and subsequent
re-create of a name/inode pair may result in the lock still being mastered
somewhere in the cluster.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-09-24 13:50:45 -07:00
Mark Fasheh 1ba9da2ffa ocfs2: manually d_move() during ocfs2_rename()
Make use of FS_RENAME_DOES_D_MOVE to avoid a race condition that can occur
during ->rename() if we d_move() outside of the parent directory cluster
locks, and another node discovers the new name (created during the rename)
and unlinks it. d_move() will unconditionally rehash a dentry - which will
leave stale data in the system.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-09-24 13:50:45 -07:00
Mark Fasheh 1390334b4c ocfs2: Remove the dentry vote
This is unused now.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-09-24 13:50:43 -07:00
Mark Fasheh 379dfe9d0d ocfs2: Hook rest of the file system into dentry locking API
Actually replace the vote calls with the new dentry operations. Make any
necessary adjustments to get the scheme to work.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-09-24 13:50:43 -07:00
Mark Fasheh 80c05846f6 ocfs2: Add dentry tracking API
Replace the dentry vote mechanism with a cluster lock which covers a set
of dentries. This allows us to force d_delete() only on nodes which actually
care about an unlink.

Every node that does a ->lookup() gets a read only lock on the dentry, until
an unlink during which the unlinking node, will request an exclusive lock,
forcing the other nodes who care about that dentry to d_delete() it. The
effect is that we retain a very lightweight ->d_revalidate(), and at the
same time get to make large improvements to the average case performance of
the ocfs2 unlink and rename operations.

This patch adds the higher level API and the dentry manipulation code.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-09-24 13:50:43 -07:00
Mark Fasheh d680efe9d8 ocfs2: Add new cluster lock type
Replace the dentry vote mechanism with a cluster lock which covers a set
of dentries. This allows us to force d_delete() only on nodes which actually
care about an unlink.

Every node that does a ->lookup() gets a read only lock on the dentry, until
an unlink during which the unlinking node, will request an exclusive lock,
forcing the other nodes who care about that dentry to d_delete() it. The
effect is that we retain a very lightweight ->d_revalidate(), and at the
same time get to make large improvements to the average case performance of
the ocfs2 unlink and rename operations.

This patch adds the cluster lock type which OCFS2 can attach to
dentries.  A small number of fs/ocfs2/dcache.c functions are stubbed
out so that this change can compile.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-09-24 13:50:42 -07:00
Mark Fasheh f0681062b8 ocfs2: Update dlmglue for new dlmlock() API
File system lock names are very regular right now, so we really only need to
pass an extra parameter to dlmlock().

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-09-24 13:50:42 -07:00
Mark Fasheh ea5b3a187e ocfs2: Update dlmfs for new dlmlock() API
We just need to add a namelen field to the user_lock_res structure, and
update a few debug prints. Instead of updating all debug prints, I took the
opportunity to remove a few that are likely unnecessary these days.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-09-24 13:50:42 -07:00
Mark Fasheh 3384f3df5e ocfs2: Allow binary names in the DLM
The OCFS2 DLM uses strlen() to determine lock name length, which excludes
the possibility of putting binary values in the name string. Fix this by
requiring that string length be passed in as a parameter.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-09-24 13:50:42 -07:00
Mark Fasheh e2c73698af ocfs2: Silence dlm error print
An AST can be delivered via the network after a lock has been removed, so no
need to print an error when we see that.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-09-24 13:50:41 -07:00
Mark Fasheh eb35746ca5 ocfs2: Remove overzealous BUG_ON()
The truncate code was never supposed to BUG() on an allocator it doesn't
know about, but rather to ignore it. Right now, this does nothing, but when
we change our allocation paths to use all suballocator files, this will
allow current versions of the fs module to work fine.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-09-20 16:00:54 -07:00
Mark Fasheh f12033d206 ocfs2: Don't print on unknown remote blocking call
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-09-20 16:00:36 -07:00
Mark Fasheh aa9588741d ocfs2: implement directory read-ahead
Uptodate.c now knows about read-ahead buffers. Use some more aggressive
logic in ocfs2_readdir().

The two functions which currently use directory read-ahead are
ocfs2_find_entry() and ocfs2_readdir().

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-09-20 15:53:40 -07:00
Mark Fasheh e0b4096d34 ocfs2: properly update i_mtime on buffered write
We weren't always updating i_mtime on writes, so fix ocfs2_commit_write() to
handle this.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Acked-by: Zach Brown <zach.brown@oracle.com>
2006-09-20 15:53:05 -07:00
Tiger Yang 0f62de2c9c ocfs2: Fix directory link count checks in ocfs2_link()
Remove the redundant "i_nlink >= OCFS2_LINK_MAX" check and adds an unlinked
directory check in ocfs2_link().

Signed-off-by: Tiger Yang <tiger.yang@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-09-20 15:52:27 -07:00
Mark Fasheh a663e30513 ocfs2: move nlink check in ocfs2_mknod()
The dir nlink check in ocfs2_mknod() was being done outside of the cluster
lock, which means we could have been checking against a stale version of the
inode. Fix this by doing the check after the cluster lock instead.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-09-20 15:52:08 -07:00
Mathieu Avila 471e3f5728 ocfs2: Fix heartbeat sector calculation
This fixes things for devices which set max_sectors to 8.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-09-20 15:50:53 -07:00
Adrian Bunk 2d5625181f [PATCH] fs/ocfs2/ioctl.c should #include "ioctl.h"
Every file should #include the headers containing the prototypes for its
global functions.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-09-20 15:49:33 -07:00
Herbert Poetzl ca4d147e62 ocfs2: add ext2 attributes
Support immutable, and other attributes.

Some renaming and other minor fixes done by myself.

Signed-off-by: Herbert Poetzl <herbert@13thfloor.at>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-09-20 15:48:39 -07:00
Mark Fasheh 883d4cae4a ocfs2: allocation hints
Record the most recently used allocation group on the allocation context, so
that subsequent allocations can attempt to optimize for contiguousness.
Local alloc especially should benefit from this as the current chain search
tends to let it spew across the disk.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-08-07 11:07:01 -07:00
Mark Fasheh 7bf72edee6 ocfs2: better group descriptor consistency checks
Try to catch corrupted group descriptors with some stronger checks placed in
a couple of strategic locations. Detect a failed resizefs and refuse to
allocate past what bitmap i_clusters allows.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-08-07 11:04:38 -07:00
Mark Fasheh 101ebf256d ocfs2: limit cluster bitmap information saved at mount
We were storing cluster count on the ocfs2_super structure, but never
actually using it so remove that. Also, we don't want to populate the
uptodate cache with the unlocked block read - it is technically safe as is,
but we should change it for correctness.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-08-07 11:04:07 -07:00
Adrian Bunk 9acd72f424 [PATCH] fs/ocfs2/dlm/dlmmaster.c: unexport dlm_migrate_lockres
This patch removes the unused EXPORT_SYMBOL_GPL(dlm_migrate_lockres).

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-08-07 10:55:50 -07:00
Kurt Hackel 34e3d18037 ocfs2: fix check for locally granted state during dlmunlock()
If a process requests a lock cancel but the lock has been remotely granted
already then there is no need to send the cancel message.

Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-08-07 10:55:22 -07:00