There's no reason to call rcu_barrier() on every
deactivate_locked_super(). We only need to make sure that all delayed rcu
free inodes are flushed before we destroy related cache.
Removing rcu_barrier() from deactivate_locked_super() affects some fast
paths. E.g. on my machine exit_group() of a last process in IPC
namespace takes 0.07538s. rcu_barrier() takes 0.05188s of that time.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
This patch fixes regression introduced by
"8bdc81c jffs2: get rid of jffs2_sync_super". We submit a delayed work in order
to make sure the write-buffer is synchronized at some point. But we do not
flush it when we unmount, which causes an oops when we unmount the file-system
and then the delayed work is executed.
This patch fixes the issue by adding a "cancel_delayed_work_sync()" infocation
in the '->sync_fs()' handler. This will make sure the delayed work is canceled
on sync, unmount and re-mount. And because VFS always callse 'sync_fs()' before
unmounting or remounting, this fixes the issue.
Reported-by: Ludovic Desroches <ludovic.desroches@atmel.com>
Cc: stable@vger.kernel.org [3.5+]
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Tested-by: Ludovic Desroches <ludovic.desroches@atmel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Currently JFFS2 file-system maps the VFS "superblock" abstraction to the
write-buffer. Namely, it uses VFS services to synchronize the write-buffer
periodically.
The whole "superblock write-out" VFS infrastructure is served by the
'sync_supers()' kernel thread, which wakes up every 5 (by default) seconds and
writes out all dirty superblock using the '->write_super()' call-back. But the
problem with this thread is that it wastes power by waking up the system every
5 seconds no matter what. So we want to kill it completely and thus, we need to
make file-systems to stop using the '->write_super' VFS service, and then
remove it together with the kernel thread.
This patch switches the JFFS2 write-buffer management from
'->write_super()'/'->s_dirt' to a delayed work. Instead of setting the 's_dirt'
flag we just schedule a delayed work for synchronizing the write-buffer.
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
We do not need to call 'jffs2_write_super()' on sync. This function
causes a GC pass to make sure the current contents is pushed out with
the data which we already have on the media.
But this is not needed on unmount and only slows sync down unnecessarily.
It is enough to just sync the write-buffer.
This call was added by one of the generic VFS rework patch-sets,
see d579ed00aa.
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
We do not need to call 'jffs2_write_super()' on unmount. This function
causes a GC pass to make sure the current contents is pushed out with
the data which we already have on the media.
But this is not needed on unmount and only slows unmount down unnecessarily.
It is enough to just sync the write-buffer.
This call was added by one of the generic VFS rework patch-sets,
see 8c85e12512.
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
We do not need 'lock_super()'/'unlock_super()' in JFFS2 - kill them.
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Add a new rp_size= parameter which creates a "reserved pool" of disk
space which can only be used by root. Other users are not permitted
to write to disk when the available space is less than the pool size.
Based on original code by Artem Bityutskiy in
https://dev.laptop.org/ticket/5317
[dwmw2: use capable(CAP_SYS_RESOURCE) not uid/gid check, fix debug prints]
Signed-off-by: Daniel Drake <dsd@laptop.org>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@linux.intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Use pr_fmt to prefix KBUILD_MODNAME to appropriate logging messages.
Remove now unnecessary internal prefixes from formats.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Use the more current logging style.
Coalesce formats, align arguments.
Convert uses of embedded function names to %s, __func__.
A couple of long line checkpatch errors I don't care about exist.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
D1 and D2 macros are mostly uses to emit debugging messages.
Convert the logging uses of D1 & D2 to jffs2_dbg(level, fmt, ...)
to be a bit more consistent style with the rest of the kernel.
All jffs2_dbg output is now at KERN_DEBUG where some of
the previous uses were emitted at various KERN_<LEVEL>s.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iEYEABECAAYFAk8Mq/MACgkQdwG7hYl686PeFACfZCgbdDWD9A/JL+i1RMfExVu6
Pi0An3Hmc3PTCp0yQ21KtcKhpF9CAMEu
=NfuL
-----END PGP SIGNATURE-----
Merge tag 'for-linus-3.3' of git://git.infradead.org/mtd-2.6
MTD pull for 3.3
* tag 'for-linus-3.3' of git://git.infradead.org/mtd-2.6: (113 commits)
mtd: Fix dependency for MTD_DOC200x
mtd: do not use mtd->block_markbad directly
logfs: do not use 'mtd->block_isbad' directly
mtd: introduce mtd_can_have_bb helper
mtd: do not use mtd->suspend and mtd->resume directly
mtd: do not use mtd->lock, unlock and is_locked directly
mtd: do not use mtd->sync directly
mtd: harmonize mtd_writev usage
mtd: do not use mtd->lock_user_prot_reg directly
mtd: mtd->write_user_prot_reg directly
mtd: do not use mtd->read_*_prot_reg directly
mtd: do not use mtd->get_*_prot_info directly
mtd: do not use mtd->read_oob directly
mtd: mtdoops: do not use mtd->panic_write directly
romfs: do not use mtd->get_unmapped_area directly
mtd: do not use mtd->get_unmapped_area directly
mtd: do use mtd->point directly
mtd: introduce mtd_has_oob helper
mtd: mtdcore: export symbols cleanup
mtd: clean-up the default_mtd_writev function
...
Fix up trivial edit/remove conflict in drivers/staging/spectra/lld_mtd.c
This patch teaches 'mtd_sync()' to do nothing when the MTD driver does
not have the '->sync()' method, which allows us to remove all direct
'mtd->sync' accesses.
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Seeing that just about every destructor got that INIT_LIST_HEAD() copied into
it, there is no point whatsoever keeping this INIT_LIST_HEAD in inode_init_once();
the cost of taking it into inode_init_always() will be negligible for pipes
and sockets and negative for everything else. Not to mention the removal of
boilerplate code from ->destroy_inode() instances...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
..to allow forcing of either compression scheme. This will override
compiled-in defaults. jffs2_compress is reworked a bit, as the lzo/zlib
override shares lots of code w/ the PRIORITY mode.
v2: update show_options accordingly.
Signed-off-by: Andres Salomon <dilinger@queued.net>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@intel.com>
Currently jffs2 has compile-time constants (and .config options)
controlling whether or not the various compression/decompression
drivers are built in and enabled. This is fine for embedded
systems, but it clashes with distribution kernels. Distro kernels
tend to turn on everything; this causes OpenFirmware to fall
over, as it understands ZLIB-compressed inodes. Booting a kernel
that has LZO compression enabled, writing to the boot partition,
and then rebooting causes OFW to fail to read the kernel from
the filesystem. This is because LZO compression has priority
when writing new data to jffs2, if LZO is enabled.
This patch adds mount option parsing, and a single supported
option ("compr=none"). This adds the flexibility of being
able to specify which compressor overrides on a per-superblock
basis. For now, we can simply disable compression;
additional flexibility coming soon.
v2: kill some printks, and implement show_options as suggested
by Artem Bityutskiy.
Signed-off-by: Andres Salomon <dilinger@queued.net>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@intel.com>
RCU free the struct inode. This will allow:
- Subsequent store-free path walking patch. The inode must be consulted for
permissions when walking, so an RCU inode reference is a must.
- sb_inode_list_lock to be moved inside i_lock because sb list walkers who want
to take i_lock no longer need to take sb_inode_list_lock to walk the list in
the first place. This will simplify and optimize locking.
- Could remove some nested trylock loops in dcache code
- Could potentially simplify things a bit in VM land. Do not need to take the
page lock to follow page->mapping.
The downsides of this is the performance cost of using RCU. In a simple
creat/unlink microbenchmark, performance drops by about 10% due to inability to
reuse cache-hot slab objects. As iterations increase and RCU freeing starts
kicking over, this increases to about 20%.
In cases where inode lifetimes are longer (ie. many inodes may be allocated
during the average life span of a single inode), a lot of this cache reuse is
not applicable, so the regression caused by this patch is smaller.
The cache-hot regression could largely be avoided by using SLAB_DESTROY_BY_RCU,
however this adds some complexity to list walking and store-free path walking,
so I prefer to implement this at a later date, if it is shown to be a win in
real situations. I haven't found a regression in any non-micro benchmark so I
doubt it will be a problem.
Signed-off-by: Nick Piggin <npiggin@kernel.dk>
The BKL is only used in put_super, fill_super and remount_fs that are all
three protected by the superblocks s_umount rw_semaphore. Therefore it is
safe to remove the BKL entirely.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: David Woodhouse <dwmw2@infradead.org>
This patch is a preparation necessary to remove the BKL from do_new_mount().
It explicitly adds calls to lock_kernel()/unlock_kernel() around
get_sb/fill_super operations for filesystems that still uses the BKL.
I've read through all the code formerly covered by the BKL inside
do_kern_mount() and have satisfied myself that it doesn't need the BKL
any more.
do_kern_mount() is already called without the BKL when mounting the rootfs
and in nfsctl. do_kern_mount() calls vfs_kern_mount(), which is called
from various places without BKL: simple_pin_fs(), nfs_do_clone_mount()
through nfs_follow_mountpoint(), afs_mntpt_do_automount() through
afs_mntpt_follow_link(). Both later functions are actually the filesystems
follow_link inode operation. vfs_kern_mount() is calling the specified
get_sb function and lets the filesystem do its job by calling the given
fill_super function.
Therefore I think it is safe to push down the BKL from the VFS to the
low-level filesystems get_sb/fill_super operation.
[arnd: do not add the BKL to those file systems that already
don't use it elsewhere]
Signed-off-by: Jan Blunck <jblunck@infradead.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Matthew Wilcox <matthew@wil.cx>
Cc: Christoph Hellwig <hch@infradead.org>
This is the culmination of this sequence of patches. By moving the block
erasing from jffs2_write_super() into the GC code, we avoid huge
latencies on unmount where it waits for _all_ pending blocks to be
erased, and we allow better control for time-critical tasks by stopping
the GC thread.
Signed-off-by: Joakim Tjernlund <joakim.tjernlund@transmode.se>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
We're about to call this from a bunch of places which already hold
c->erase_completion_lock, so add an assertion and change its existing
callers to do the same.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
* Remove smp_lock.h from files which don't need it (including some headers!)
* Add smp_lock.h to files which do need it
* Make smp_lock.h include conditional in hardirq.h
It's needed only for one kernel_locked() usage which is under CONFIG_PREEMPT
This will make hardirq.h inclusion cheaper for every PREEMPT=n config
(which includes allmodconfig/allyesconfig, BTW)
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The call to ->write_super from __sync_filesystem will go away, so make
sure jffs2 performs the same actions from inside ->sync_fs.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Push down lock_super into ->write_super instances and remove it from the
caller.
Following filesystem don't need ->s_lock in ->write_super and are skipped:
* bfs, nilfs2 - no other uses of s_lock and have internal locks in
->write_super
* ext2 - uses BKL in ext2_write_super and has internal calls without s_lock
* reiserfs - no other uses of s_lock as has reiserfs_write_lock (BKL) in
->write_super
* xfs - no other uses of s_lock and uses internal lock (buffer lock on
superblock buffer) to serialize ->write_super. Also xfs_fs_write_super
is superflous and will go away in the next merge window
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
jffs2_write_super is only called from super.c and doesn't use any
functionality from fs.c. So move it over to super.c and make it
static there.
[should go in through the vfs tree as it is a requirement for the
next patch]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Move BKL into ->put_super from the only caller. A couple of
filesystems had trivial enough ->put_super (only kfree and NULLing of
s_fs_info + stuff in there) to not get any locking: coda, cramfs, efs,
hugetlbfs, omfs, qnx4, shmem, all others got the full treatment. Most
of them probably don't need it, but I'd rather sort that out individually.
Preferably after all the other BKL pushdowns in that area.
[AV: original used to move lock_super() down as well; these changes are
removed since we don't do lock_super() at all in generic_shutdown_super()
now]
[AV: fuse, btrfs and xfs are known to need no damn BKL, exempt]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
We just did a full fs writeout using sync_filesystem before, and if
that's not enough for the filesystem it can perform it's own writeout
in ->put_super, which many filesystems already do.
Move a call to foofs_write_super into every foofs_put_super for now to
guarantee identical behaviour until it's cleaned up by the individual
filesystem maintainers.
Exceptions:
- affs already has identical copy & pasted code at the beginning of
affs_put_super so no need to do it twice.
- xfs does the right thing without it and I have changes pending for
the xfs tree touching this are so I don't really need conflicts
here..
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Now that the readdir/lookup deadlock issues have been dealt with, we can
export JFFS2 file systems again.
(For now, you have to specify fsid manually; we should add a method to
the export_ops to handle that too.)
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Kmem cache passed to constructor is only needed for constructors that are
themselves multiplexeres. Nobody uses this "feature", nor does anybody uses
passed kmem cache in non-trivial way, so pass only pointer to object.
Non-trivial places are:
arch/powerpc/mm/init_64.c
arch/powerpc/mm/hugetlbpage.c
This is flag day, yes.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
Acked-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Jon Tollefson <kniht@linux.vnet.ibm.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Matt Mackall <mpm@selenic.com>
[akpm@linux-foundation.org: fix arch/powerpc/mm/hugetlbpage.c]
[akpm@linux-foundation.org: fix mm/slab.c]
[akpm@linux-foundation.org: fix ubifs]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ditch a couple of pointless casts from void *, and use the normal
variable name 'f' for jffs2_inode_info pointers -- especially since
it actually shows up in lockdep reports.
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Stop the JFFS2 filesystem from using iget() and read_inode(). Replace
jffs2_read_inode() with jffs2_iget(), and call that instead of iget().
jffs2_iget() then uses iget_locked() directly and returns a proper error code
instead of an inode in the event of an error.
jffs2_do_fill_super() returns any error incurred when getting the root inode
instead of EINVAL.
Signed-off-by: David Howells <dhowells@redhat.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Slab constructors currently have a flags parameter that is never used. And
the order of the arguments is opposite to other slab functions. The object
pointer is placed before the kmem_cache pointer.
Convert
ctor(void *object, struct kmem_cache *s, unsigned long flags)
to
ctor(struct kmem_cache *s, void *object)
throughout the kernel
[akpm@linux-foundation.org: coupla fixes]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Slab destructors were no longer supported after Christoph's
c59def9f22 change. They've been
BUGs for both slab and slub, and slob never supported them
either.
This rips out support for the dtor pointer from kmem_cache_create()
completely and fixes up every single callsite in the kernel (there were
about 224, not including the slab allocator definitions themselves,
or the documentation references).
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
* git://git.infradead.org/mtd-2.6:
[JFFS2] Fix obsoletion of metadata nodes in jffs2_add_tn_to_tree()
[MTD] Fix error checking after get_mtd_device() in get_sb_mtd functions
[JFFS2] Fix buffer length calculations in jffs2_get_inode_nodes()
[JFFS2] Fix potential memory leak of dead xattrs on unmount.
[JFFS2] Fix BUG() caused by failing to discard xattrs on deleted files.
[MTD] generalise the handling of MTD-specific superblocks
[MTD] [MAPS] don't force uclinux mtd map to be root dev
SLAB_CTOR_CONSTRUCTOR is always specified. No point in checking it.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Steven French <sfrench@us.ibm.com>
Cc: Michael Halcrow <mhalcrow@us.ibm.com>
Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Roman Zippel <zippel@linux-m68k.org>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Dave Kleikamp <shaggy@austin.ibm.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Anton Altaparmakov <aia21@cantab.net>
Cc: Mark Fasheh <mark.fasheh@oracle.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jan Kara <jack@ucw.cz>
Cc: David Chinner <dgc@sgi.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Generalise the handling of MTD-specific superblocks so that JFFS2 and ROMFS
can both share it.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
I have never seen a use of SLAB_DEBUG_INITIAL. It is only supported by
SLAB.
I think its purpose was to have a callback after an object has been freed
to verify that the state is the constructor state again? The callback is
performed before each freeing of an object.
I would think that it is much easier to check the object state manually
before the free. That also places the check near the code object
manipulation of the object.
Also the SLAB_DEBUG_INITIAL callback is only performed if the kernel was
compiled with SLAB debugging on. If there would be code in a constructor
handling SLAB_DEBUG_INITIAL then it would have to be conditional on
SLAB_DEBUG otherwise it would just be dead code. But there is no such code
in the kernel. I think SLUB_DEBUG_INITIAL is too problematic to make real
use of, difficult to understand and there are easier ways to accomplish the
same effect (i.e. add debug code before kfree).
There is a related flag SLAB_CTOR_VERIFY that is frequently checked to be
clear in fs inode caches. Remove the pointless checks (they would even be
pointless without removeal of SLAB_DEBUG_INITIAL) from the fs constructors.
This is the last slab flag that SLUB did not support. Remove the check for
unimplemented flags from SLUB.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In particular, remove the bit in the LICENCE file about contacting
Red Hat for alternative arrangements. Their errant IS department broke
that arrangement a long time ago -- the policy of collecting copyright
assignments from contributors came to an end when the plug was pulled on
the servers hosting the project, without notice or reason.
We do still dual-license it for use with eCos, with the GPL+exception
licence approved by the FSF as being GPL-compatible. It's just that nobody
has the right to license it differently.
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
This patch is inspired by Arjan's "Patch series to mark struct
file_operations and struct inode_operations const".
Compile tested with gcc & sparse.
Signed-off-by: Josef 'Jeff' Sipek <jsipek@cs.sunysb.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Replace all uses of kmem_cache_t with struct kmem_cache.
The patch was generated using the following script:
#!/bin/sh
#
# Replace one string by another in all the kernel sources.
#
set -e
for file in `find * -name "*.c" -o -name "*.h"|xargs grep -l $1`; do
quilt add $file
sed -e "1,\$s/$1/$2/g" $file >/tmp/$$
mv /tmp/$$ $file
quilt refresh
done
The script was run like this
sh replace kmem_cache_t "struct kmem_cache"
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
SLAB_KERNEL is an alias of GFP_KERNEL.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
get_mtd_device() returns NULL in case of any failure. Teach it to return an
error code instead. Fix all users as well.
Signed-off-by: Artem Bityutskiy <dedekind@infradead.org>