Currently we do write coalescing in a very inefficient manner: one pass in
generic_writepages() in order to lock the pages for writing, then one pass
in nfs_flush_mapping() and/or nfs_sync_mapping_wait() in order to gather
the locked pages for coalescing into RPC requests of size "wsize".
In fact, it turns out there is actually a deadlock possible here since we
only start I/O on the second pass. If the user signals the process while
we're in nfs_sync_mapping_wait(), for instance, then we may exit before
starting I/O on all the requests that have been queued up.
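
As a rough userspace illustration of coalescing in a single pass (this is
not the NFS client code; struct io_req, coalesce_pages() and max_pages are
hypothetical names, with max_pages standing in for wsize divided by the page
size), contiguous page indices can be grouped into capped requests while
they are being scanned:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical request descriptor: a run of contiguous pages. */
struct io_req {
	unsigned long first_page;	/* index of the first page in the run */
	size_t npages;			/* number of contiguous pages */
};

/*
 * Scan a sorted array of dirty page indices once and group them into
 * requests. A new request is started when the next page is not contiguous
 * with the current one or when the current one already holds max_pages.
 * Returns the number of requests written to out[] (at most max_reqs).
 */
size_t coalesce_pages(const unsigned long *pages, size_t n,
		      size_t max_pages, struct io_req *out, size_t max_reqs)
{
	size_t nreq = 0;

	assert(max_pages > 0);
	for (size_t i = 0; i < n; i++) {
		struct io_req *cur = nreq ? &out[nreq - 1] : NULL;

		if (cur && cur->npages < max_pages &&
		    cur->first_page + cur->npages == pages[i]) {
			cur->npages++;		/* extend the current request */
			continue;
		}
		if (nreq == max_reqs)
			break;			/* no room for another request */
		out[nreq].first_page = pages[i];
		out[nreq].npages = 1;
		nreq++;
	}
	return nreq;
}
```

Because each request is complete as soon as the scan moves past it, I/O
could be started on it immediately rather than in a later pass.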
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Do the coalescing of read requests into block sized requests at start of
I/O as we scan through the pages instead of going through a second pass.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
It is redundant, and will interfere with the call to
balance_dirty_pages_ratelimited_nr in generic_file_write().
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The nfs statfs function returns a success code on error, and fills the
output buffer with invalid values. The attached patch makes it return a
correct error code instead.
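
A minimal sketch of the intended behaviour, assuming a hypothetical
query_server() helper and struct fs_stats (neither is taken from the NFS
code): the output buffer is only filled on success, and a failure is handed
back as a negative errno.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical result structure, not the kernel's struct kstatfs. */
struct fs_stats {
	unsigned long long blocks;
	unsigned long long bfree;
};

/* Hypothetical stand-in for the server query; fails when 'fail' is set. */
static int query_server(int fail, struct fs_stats *res)
{
	if (fail)
		return -EIO;
	res->blocks = 1000;
	res->bfree = 250;
	return 0;
}

/*
 * Fill *buf only when the query succeeds; otherwise leave it untouched
 * and hand the negative errno back to the caller.
 */
int demo_statfs(int fail, struct fs_stats *buf)
{
	struct fs_stats res;
	int error;

	assert(buf != NULL);
	error = query_server(fail, &res);
	if (error < 0)
		return error;	/* propagate the error, do not report success */
	*buf = res;
	return 0;
}
```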
Signed-off-by: Amnon Aaronsohn <amnonaar@gmail.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
(Modified patch to reinstate the dprintk())
We're getting lockdep warnings due to a post-2.6.21-rc7 bugfix.
The xattr_sem can never be taken in the manner described. Internal inodes
are protected by I_PRIVATE. Add the appropriate annotation.
Cc: <stable@kernel.org>
Cc: "Antonino A. Daplas" <adaplas@pol.net>
Cc: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'for-linus' of git://git.kernel.dk/data/git/linux-2.6-block:
[PATCH] elevator: elv_list_lock does not need irq disabling
[BLOCK] Don't pin lots of memory in mempools
cfq-iosched: speedup cic rb lookup
ll_rw_blk: add io_context private pointer
cfq-iosched: get rid of cfqq hash
cfq-iosched: tighten queue request overlap condition
cfq-iosched: improve sync vs async workloads
cfq-iosched: never allow an async queue idling
cfq-iosched: get rid of ->dispatch_slice
cfq-iosched: don't pass unused preemption variable around
cfq-iosched: get rid of ->cur_rr and ->cfq_list
cfq-iosched: slice offset should take ioprio into account
[PATCH] cfq-iosched: style cleanups and comments
cfq-iosched: sort IDLE queues into the rbtree
cfq-iosched: sort RT queues into the rbtree
[PATCH] cfq-iosched: speed up rbtree handling
cfq-iosched: rework the whole round-robin list concept
cfq-iosched: minor updates
cfq-iosched: development update
cfq-iosched: improve preemption for cooperating tasks
Currently we scale the mempool sizes depending on memory installed
in the machine, except for the bio pool itself which sits at a fixed
256 entry pre-allocation.
There's really no point in "optimizing" this OOM path, we just need
enough preallocated to make progress. A single unit is enough, let's
scale it down to 2 just to be on the safe side.
This patch saves ~150kb of pinned kernel memory on a 32-bit box.
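
To illustrate why a small reserve is sufficient, here is a simplified
userspace sketch of a pool with a fixed pre-allocation. It is not the kernel
mempool implementation; struct tiny_pool, its functions and the 'oom' flag
are hypothetical. The reserve is only touched when the normal allocator
fails, and every free refills it first.

```c
#include <assert.h>
#include <stdlib.h>

#define POOL_MAX 8

struct tiny_pool {
	void *reserve[POOL_MAX];	/* preallocated elements */
	int curr_nr;			/* elements currently in the reserve */
	int min_nr;			/* elements to keep preallocated */
	size_t elem_size;
};

/* Preallocate min_nr elements; returns 0 on success, -1 on failure. */
int tiny_pool_init(struct tiny_pool *p, int min_nr, size_t elem_size)
{
	assert(min_nr > 0 && min_nr <= POOL_MAX);
	p->curr_nr = 0;
	p->min_nr = min_nr;
	p->elem_size = elem_size;
	while (p->curr_nr < min_nr) {
		void *e = malloc(elem_size);

		if (!e) {
			while (p->curr_nr > 0)
				free(p->reserve[--p->curr_nr]);
			return -1;
		}
		p->reserve[p->curr_nr++] = e;
	}
	return 0;
}

/* 'oom' simulates allocator failure and forces use of the reserve. */
void *tiny_pool_alloc(struct tiny_pool *p, int oom)
{
	void *e = oom ? NULL : malloc(p->elem_size);

	if (e)
		return e;
	if (p->curr_nr > 0)
		return p->reserve[--p->curr_nr];
	return NULL;	/* a real pool would wait for a free instead */
}

/* Refill the reserve before returning memory to the allocator. */
void tiny_pool_free(struct tiny_pool *p, void *e)
{
	if (p->curr_nr < p->min_nr)
		p->reserve[p->curr_nr++] = e;
	else
		free(e);
}
```

With min_nr set to 2, two allocations can still succeed under simulated
memory pressure, and progress resumes as soon as one element is freed.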
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* git://git.infradead.org/mtd-2.6: (46 commits)
[MTD] [MAPS] drivers/mtd/maps/ck804xrom.c: convert pci_module_init()
[MTD] [NAND] CM-x270 MTD driver
[MTD] [NAND] Wrong calculation of page number in nand_block_bad()
[MTD] [MAPS] fix plat-ram printk format
[JFFS2] Fix compr_rubin.c build after include file elimination.
[JFFS2] Handle inodes with only a single metadata node with non-zero isize
[JFFS2] Tidy up licensing/copyright boilerplate.
[MTD] [OneNAND] Exit loop only when column start with 0
[MTD] [OneNAND] Fix access the past of the real oobfree array
[MTD] [OneNAND] Update Samsung OneNAND official URL
[JFFS2] Better fix for all-zero node headers
[JFFS2] Improve read_inode memory usage, v2.
[JFFS2] Improve failure mode if inode checking leaves unchecked space.
[JFFS2] Fix cross-endian build.
[MTD] Finish conversion mtd_blkdevs to use the kthread API
[JFFS2] Obsolete dirent nodes immediately on unlink, where possible.
Use menuconfig objects: MTD
[MTD] mtd_blkdevs: Convert to use the kthread API
[MTD] Fix fwh_lock locking
[JFFS2] Speed up mount for directly-mapped NOR flash
...
Fixes for various arch compilation problems:
(*) Missing module exports.
(*) Variable name collision when rxkad and af_rxrpc both built in
(rxrpc_debug).
(*) Large constant representation problem (AFS_UUID_TO_UNIX_TIME).
(*) Configuration dependencies.
(*) printk() format warnings.
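
As an illustration of the large constant item, one common cause of such
problems is a 64-bit constant written without a width suffix. The sketch
below is an assumption about the kind of fix involved, not the actual AFS
change; DEMO_UUID_TO_UNIX_TIME and unix_seconds_to_uuid_time() are
hypothetical names. The value is the well-known count of 100ns intervals
between the UUID epoch (1582-10-15) and the Unix epoch (1970-01-01).

```c
#include <assert.h>
#include <stdint.h>

/* The ULL suffix makes the constant unsigned long long on every arch. */
#define DEMO_UUID_TO_UNIX_TIME 0x01b21dd213814000ULL

/* Convert seconds since the Unix epoch to 100ns units since the UUID epoch. */
uint64_t unix_seconds_to_uuid_time(uint64_t secs)
{
	/* Guard against overflow of the 64-bit result. */
	assert(secs <= (UINT64_MAX - DEMO_UUID_TO_UNIX_TIME) / 10000000ULL);
	return secs * 10000000ULL + DEMO_UUID_TO_UNIX_TIME;
}
```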
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix the wakeup transitions after a VLocation record update completes
one way or another. This builds on Dave Miller's partial fix.
Also move wakeups outside the spinlocked sections.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* master.kernel.org:/pub/scm/linux/kernel/git/gregkh/driver-2.6: (46 commits)
dev_dbg: check dev_dbg() arguments
drivers/base/attribute_container.c: use mutex instead of binary semaphore
mod_sysfs_setup() doesn't return errno when kobject_add_dir() failure occurs
s2ram: add arch irq disable/enable hooks
define platform wakeup hook, use in pci_enable_wake()
security: prevent permission checking of file removal via sysfs_remove_group()
device_schedule_callback() needs a module reference
s390: cio: Delay uevents for subchannels
sysfs: bin.c printk fix
Driver core: use mutex instead of semaphore in DMA pool handler
driver core: bus_add_driver should return an error if no bus
debugfs: Add debugfs_create_u64()
the overdue removal of the mount/umount uevents
kobject: Comment and warning fixes to kobject.c
Driver core: warn when userspace writes to the uevent file in a non-supported way
Driver core: make uevent-environment available in uevent-file
kobject core: remove rwsem from struct subsystem
qeth: Remove usage of subsys.rwsem
PHY: remove rwsem use from phy core
IEEE1394: remove rwsem use from ieee1394 core
...
Prevent permission checking from being performed when the kernel wants to
unconditionally remove a sysfs group, by introducing a kernel-only variant
of lookup_one_len(), lookup_one_len_kern().
Additionally, as sysfs_remove_group() does not check the return value of
the lookup before using it, a BUG_ON has been added to pinpoint the cause
of any problems potentially caused by this (and as a form of annotation).
Signed-off-by: James Morris <jmorris@namei.org>
Cc: Nagendra Singh Tomar <nagendra_tomar@adaptec.com>
Cc: Tejun Heo <htejun@gmail.com>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: Eric Paris <eparis@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch (as896b) fixes an oversight in the design of
device_schedule_callback(). It is necessary to acquire a reference to the
module owning the callback routine, to prevent the module from being
unloaded before the callback can run.
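
The idea can be sketched in userspace as follows. This is not the driver
core code: struct demo_module, struct deferred_call and the demo_*()
functions are hypothetical stand-ins for the module refcounting and the
deferred callback machinery.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a module with a reference count. */
struct demo_module {
	int refcount;
	int unloading;
};

/* Take a reference; fails once unloading has begun. */
int demo_module_get(struct demo_module *m)
{
	if (m->unloading)
		return 0;
	m->refcount++;
	return 1;
}

void demo_module_put(struct demo_module *m)
{
	assert(m->refcount > 0);
	m->refcount--;
}

/* The module may only go away when nobody holds a reference. */
int demo_module_can_unload(const struct demo_module *m)
{
	return m->refcount == 0;
}

struct deferred_call {
	void (*func)(void *data);
	void *data;
	struct demo_module *owner;
};

/* Queue a callback and pin its owner until the callback has run. */
int demo_schedule_callback(struct deferred_call *dc, void (*func)(void *),
			   void *data, struct demo_module *owner)
{
	if (!demo_module_get(owner))
		return -1;
	dc->func = func;
	dc->data = data;
	dc->owner = owner;
	return 0;
}

/* Run the queued callback, then drop the reference taken above. */
void demo_run_callback(struct deferred_call *dc)
{
	dc->func(dc->data);
	demo_module_put(dc->owner);
}

/* Example callback: bump the counter passed as data. */
void demo_count_cb(void *data)
{
	++*(int *)data;
}
```

Between scheduling and running the callback, demo_module_can_unload()
reports false, which is the window the reference is meant to protect.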
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Cc: Satyam Sharma <satyam.sharma@gmail.com>
Cc: Neil Brown <neilb@suse.de>
Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
fs/sysfs/bin.c: In function 'read':
fs/sysfs/bin.c:77: warning: format '%zd' expects type 'signed size_t', but argument 4 has type 'int'
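
For reference, this class of warning goes away once the conversion specifier
matches the argument type. A small userspace sketch (format_counts() is a
hypothetical helper, and this is not the fs/sysfs/bin.c code): an int is
printed with '%d' and an ssize_t with '%zd'.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>

/* Format an int and an ssize_t with the specifier each type expects. */
int format_counts(char *buf, size_t len, int as_int, ssize_t as_ssize)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "%d %zd", as_int, as_ssize);
}
```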
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
I went to use this the other day, only to find it didn't exist.
It's a straight copy of the debugfs u32 code, then s/u32/u64/. A quick
test shows it seems to be working.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch contains the overdue removal of the mount/umount uevents.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* 'for-linus' of git://git.infradead.org/ubi-2.6:
UBI: remove unused variable
UBI: add me to MAINTAINERS
JFFS2: add UBI support
UBI: Unsorted Block Images
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2: (27 commits)
ocfs2: Cache extent records
ocfs2: Remember rw lock level during direct io
ocfs2: Fix up i_blocks calculation to know about holes
ocfs2: Fix extent lookup to return true size of holes
ocfs2: Read from an unwritten extent returns zeros
ocfs2: make room for unwritten extents flag
ocfs2: Use own splice write actor
ocfs2: Use do_sync_mapping_range() in ocfs2_zero_tail_for_truncate()
[PATCH] Turn do_sync_file_range() into do_sync_mapping_range()
ocfs2: zero tail of sparse files on truncate
ocfs2: Teach ocfs2_get_block() about holes
ocfs2: remove ocfs2_prepare_write() and ocfs2_commit_write()
ocfs2: teach ocfs2_file_aio_write() about sparse files
ocfs2: Turn off shared writeable mmap for local files systems with holes.
ocfs2: abstract out allocation locking
ocfs2: teach extend/truncate about sparse files
ocfs2: temporarily remove extent map caching
ocfs2: sparse b-tree support
ocfs2: small cleanup of ocfs2_request_delete()
ocfs2: remove unused code
...
This patch makes JFFS2 able to work with UBI volumes via the emulated MTD
devices which are directly mapped to these volumes.
Signed-off-by: Artem Bityutskiy <dedekind@infradead.org>
cmpxchg() is not available on every processor so can't
be used in generic code.
Replace with spinlock protection on the ->state changes,
wakeups, and wait loops.
Add what appears to be a missing wakeup on transition
to AFS_VL_VALID state in afs_vlocation_updater().
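
A userspace sketch of the replacement pattern, using a pthread mutex in
place of a kernel spinlock (struct vl_record, enum vl_state and
vl_transition() are hypothetical names, not the AFS definitions): the
compare and the store happen under the lock, which gives the same
"change only if still in the expected state" semantics as cmpxchg().

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

enum vl_state { VL_NEW, VL_UPDATING, VL_VALID };

struct vl_record {
	pthread_mutex_t lock;	/* protects 'state' */
	enum vl_state state;
};

/*
 * Move the record from 'from' to 'to' only if it is still in 'from'.
 * Returns 1 if the transition happened, 0 otherwise.
 */
int vl_transition(struct vl_record *r, enum vl_state from, enum vl_state to)
{
	int ok;

	assert(r != NULL);
	pthread_mutex_lock(&r->lock);
	ok = (r->state == from);
	if (ok)
		r->state = to;
	pthread_mutex_unlock(&r->lock);
	return ok;
}
```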
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for the create, link, symlink, unlink, mkdir, rmdir and
rename VFS operations to the in-kernel AFS filesystem.
Also:
(1) Fix dentry and inode revalidation. d_revalidate should only look at
state of the dentry. Revalidation of the contents of an inode pointed to
by a dentry is now separate.
(2) Fix afs_lookup() to hash negative dentries as well as positive ones.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Implement the CB.InitCallBackState3 operation for the fileserver to
call. This reduces the amount of network traffic because if this op
is aborted, the fileserver will then attempt a CB.InitCallBackState
operation.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for the CB.GetCapabilities operation with which the fileserver can
ask the client for the following information:
(1) The list of network interfaces it has available as IPv4 address + netmask
plus the MTUs.
(2) The client's UUID.
(3) The extended capabilities of the client, for which the only current one
is unified error mapping (abort code interpretation).
To support this, the patch adds the following routines to AFS:
(1) A function to iterate through all the network interfaces using RTNETLINK
to extract IPv4 addresses and MTUs.
(2) A function to iterate through all the network interfaces using RTNETLINK
to pull out the MAC address of the lowest index interface to use in UUID
construction.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add security support to the AFS filesystem. Kerberos IV tickets are added as
RxRPC keys to the session keyring with the klog program. open() and
other VFS operations then find this ticket with request_key() and either use
it immediately (eg: mkdir, unlink) or attach it to a file descriptor (open).
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Handle multiple mounts of an AFS superblock correctly, checking to see
whether the superblock is already initialised after calling sget()
rather than just unconditionally stamping all over it.
Also delete the "silent" parameter to afs_fill_super() as it's not
used and can, in any case, be obtained from sb->s_flags.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Delete the old RxRPC code as it's now no longer used.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make the in-kernel AFS filesystem use AF_RXRPC instead of the old RxRPC code.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Clean up the AFS sources.
Also remove references to AFS keys. RxRPC keys are used instead.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The extent map code was ripped out earlier because of an inability to deal
with holes. This patch adds back a simpler caching scheme requiring far less
code.
Our old extent map caching was designed back when metadata block caching in
Ocfs2 didn't work very well, resulting in many disk reads. These days our
metadata caching is much better, resulting in no unnecessary disk reads. As
a result, extent caching doesn't have to be as fancy, nor does it have to
cache as many extents. Keeping the last 3 extents seen should be sufficient
to give us a small performance boost on some streaming workloads.
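
A rough sketch of what a three-slot extent cache could look like. The names
and the round-robin replacement policy are assumptions made for
illustration and are not taken from the ocfs2 code.

```c
#include <assert.h>
#include <stdint.h>

#define EXTENT_CACHE_SLOTS 3

/* One cached mapping: 'clusters' clusters at virtual 'cpos' map to 'phys'. */
struct cached_extent {
	uint32_t cpos;
	uint32_t clusters;
	uint64_t phys;
};

struct extent_cache {
	struct cached_extent slot[EXTENT_CACHE_SLOTS];
	int used;	/* number of valid slots */
	int next;	/* slot to overwrite next (round-robin, an assumption) */
};

/* Remember an extent, overwriting the oldest slot once the cache is full. */
void extent_cache_insert(struct extent_cache *c, uint32_t cpos,
			 uint32_t clusters, uint64_t phys)
{
	assert(clusters > 0);
	c->slot[c->next] = (struct cached_extent){ cpos, clusters, phys };
	c->next = (c->next + 1) % EXTENT_CACHE_SLOTS;
	if (c->used < EXTENT_CACHE_SLOTS)
		c->used++;
}

/* Returns 1 and sets *phys if v_cluster falls inside a cached extent. */
int extent_cache_lookup(const struct extent_cache *c, uint32_t v_cluster,
			uint64_t *phys)
{
	for (int i = 0; i < c->used; i++) {
		const struct cached_extent *e = &c->slot[i];

		if (v_cluster >= e->cpos && v_cluster - e->cpos < e->clusters) {
			*phys = e->phys + (v_cluster - e->cpos);
			return 1;
		}
	}
	return 0;
}
```

A streaming reader keeps hitting the most recently inserted extent, which
is where a cache this small would be expected to pay off.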
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Cluster locking might have been redone because a direct write won't
complete, so this needs to be reflected in the iocb.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Older file systems which didn't support holes did a dumb calculation of
i_blocks based on i_size. This is no longer accurate, so fix things up to
take actual allocation into account.
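
The difference between the two calculations can be shown with a short
sketch (function names are hypothetical; i_blocks is assumed to count
512-byte units, as it conventionally does):

```c
#include <assert.h>
#include <stdint.h>

/* Old style: derive the block count from the file size alone. */
uint64_t blocks_from_size(uint64_t i_size)
{
	return (i_size + 511) >> 9;
}

/* New style: derive it from the clusters actually allocated to the file. */
uint64_t blocks_from_clusters(uint32_t clusters, unsigned int cluster_bits)
{
	assert(cluster_bits >= 9 && cluster_bits < 32);
	return ((uint64_t)clusters << cluster_bits) >> 9;
}
```

For a 1 MiB sparse file with a single allocated 4 KiB cluster, the first
function reports 2048 blocks while the second reports 8.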
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Initially, we had wired things to return a size of '1' for holes. Cook up a
small amount of code to find the next extent and calculate the number of
clusters between the virtual offset and the next allocated extent.
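
The calculation can be sketched over a flat, sorted extent array. The real
code walks the on-disk b-tree; struct extent and hole_length() here are
hypothetical simplifications.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* An allocated run: 'clusters' clusters starting at virtual 'cpos'. */
struct extent {
	uint32_t cpos;
	uint32_t clusters;
};

/*
 * ext[] is sorted by cpos and non-overlapping. Returns the number of
 * clusters from v_cluster up to the next allocated extent, 0 if v_cluster
 * itself is allocated, or UINT32_MAX if nothing is allocated after it.
 */
uint32_t hole_length(const struct extent *ext, size_t n, uint32_t v_cluster)
{
	assert(ext != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (v_cluster < ext[i].cpos)
			return ext[i].cpos - v_cluster;	/* next extent found */
		if (v_cluster - ext[i].cpos < ext[i].clusters)
			return 0;			/* inside an extent */
	}
	return UINT32_MAX;
}
```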
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Return an optional extent flags field from our lookup functions and wire up
callers to treat unwritten regions as holes for the purpose of returning
zeros to the user.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Due to the size of our group bitmaps, we'll never have a leaf node extent
record with more than 16 bits worth of clusters. Split e_clusters up so that
leaf nodes can get a flags field where we can mark unwritten extents.
Interior nodes whose length references all the child nodes beneath it can't
split their e_clusters field, so we use a union to preserve sizing there.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
We need to fill holes during a splice write. Provide our own splice write
actor which can call ocfs2_file_buffered_write() with a splice-specific
callback.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Do this instead of filemap_fdatawrite() - this way we sync only the
range between i_size and the cluster boundary.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
do_sync_file_range() accepts a file * from which it takes an address_space to
sync. Abstract out the bulk of the function into do_sync_mapping_range()
which takes the address_space directly. This way callers who want to sync an
address_space directly can take advantage of the functionality provided.
do_sync_file_range() is preserved as a small wrapper around
do_sync_mapping_range().
Ocfs2 in particular would like to use this to initiate a sync of a specific
inode range during truncate, where a file * may not be available.
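
The shape of the refactoring, sketched with hypothetical userspace types
(struct demo_mapping and struct demo_file are stand-ins, and the "sync" only
records the requested range): the worker takes the mapping, and the
file-based entry point becomes a thin wrapper.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for an address_space. */
struct demo_mapping {
	long synced_start;
	long synced_end;
	int sync_calls;
};

/* Hypothetical stand-in for a struct file. */
struct demo_file {
	struct demo_mapping *f_mapping;
};

/* Worker: operates on the mapping directly, no file needed. */
int demo_sync_mapping_range(struct demo_mapping *m, long offset, long endbyte)
{
	assert(m != NULL);
	if (endbyte < offset)
		return -EINVAL;
	m->synced_start = offset;
	m->synced_end = endbyte;
	m->sync_calls++;
	return 0;
}

/* Wrapper: preserved for callers that only have a file. */
int demo_sync_file_range(struct demo_file *f, long offset, long endbyte)
{
	return demo_sync_mapping_range(f->f_mapping, offset, endbyte);
}
```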
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Since we don't zero on extend anymore, truncate needs to be fixed up to zero
the part of a file between i_size and the end of its cluster. Otherwise a
subsequent extend could expose bad data.
This introduces a new helper, which can be used in ocfs2_write().
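
The range in question can be computed as below. This is only a sketch of
the arithmetic with a hypothetical tail_zero_range() function; the actual
zeroing goes through the page cache.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Compute the byte range [*start, *end) running from i_size to the end of
 * the cluster that contains it. The range is empty when i_size is already
 * cluster aligned.
 */
void tail_zero_range(uint64_t i_size, unsigned int cluster_bits,
		     uint64_t *start, uint64_t *end)
{
	uint64_t csize;

	assert(cluster_bits < 32);
	csize = 1ULL << cluster_bits;
	*start = i_size;
	*end = (i_size + csize - 1) & ~(csize - 1);	/* round up */
}
```

With 4 KiB clusters, truncating to 5000 bytes yields the range 5000..8192,
while truncating to exactly 8192 bytes yields an empty range.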
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
ocfs2_get_block() didn't understand sparse files, fix that. Also remove some
code that isn't really useful anymore. We can fix up
ocfs2_direct_IO_get_blocks() at the same time.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Unfortunately, ocfs2 can no longer make use of generic_file_aio_write_nolock()
because allocating writes will require zeroing of pages adjacent to the I/O
for cluster sizes greater than page size.
Implement a custom file write here, which can order page locks for zeroing.
This also has the advantage that cluster locks can easily be ordered outside
of the page locks.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>