Commit Graph

27516 Commits

Author SHA1 Message Date
J. Bruce Fields 34b232bb37 nfsd4: merge 3 setclientid cases to 2
Boy, is this simpler.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-05-31 20:30:02 -04:00
J. Bruce Fields 8f9307119d nfsd4: pull out common code from setclientid cases
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-05-31 20:30:01 -04:00
J. Bruce Fields ad72aae5ad nfsd4: merge last two setclientid cases
The code here is mostly the same.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-05-31 20:30:00 -04:00
J. Bruce Fields 63db46328a nfsd4: setclientid/confirm comment cleanup
Be a little more concise.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-05-31 20:29:59 -04:00
J. Bruce Fields e98479b8d6 nfsd4: setclientid remove unnecessary terms from a logical expression
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-05-31 20:29:58 -04:00
J. Bruce Fields d5497fc693 nfsd4: move rq_flavor into svc_cred
Move the rq_flavor into struct svc_cred, and use it in setclientid and
exchange_id comparisons as well.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-05-31 20:29:58 -04:00
J. Bruce Fields 8fbba96e5b nfsd4: stricter cred comparison for setclientid/exchange_id
The typical setclientid or exchange_id will probably be performed with a
credential that maps to either root or nobody, so comparing just uid's
is unlikely to be useful.  So, use everything else we can get our hands
on.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-05-31 20:29:57 -04:00
J. Bruce Fields 03a4e1f6dd nfsd4: move principal name into svc_cred
Instead of keeping the principal name associated with a request in a
structure that's private to auth_gss and using an accessor function,
move it to svc_cred.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-05-31 20:29:55 -04:00
J. Bruce Fields 631fc9ea05 nfsd4: allow removing clients not holding state
RFC 5661 actually says we should allow an exchange_id to remove a
matching client, even if the exchange_id comes from a different
principal, *if* the victim client lacks any state.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-05-31 20:29:55 -04:00
J. Bruce Fields 136e658d62 nfsd4: rearrange exchange_id logic to simplify
Minor cleanup: it's simpler to have separate code paths for the update
and non-update cases.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-05-31 20:29:54 -04:00
J. Bruce Fields 2dbb269dfe nfsd4: exchange_id cleanup: comments
Make these comments a bit more concise and uniform.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-05-31 20:29:53 -04:00
J. Bruce Fields 83e08fd46c nfsd4: exchange_id cleanup: local shorthands for repeated tests
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-05-31 20:29:52 -04:00
J. Bruce Fields 1a308118c2 nfsd4: allow an EXCHANGE_ID to kill a 4.0 client
Following rfc 5661 section 2.4.1, we can permit a 4.1 client to remove
an established 4.0 client's state.

(But we don't allow updates.)

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-05-31 20:29:52 -04:00
J. Bruce Fields ea236d0704 nfsd4: exchange_id: check creds before killing confirmed client
We mustn't allow a client to destroy another client with established
state unless it has the right credential.

And some minor cleanup.

(Note: our comparison of credentials is actually pretty bogus currently;
that will need to be fixed in another patch.)

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-05-31 20:29:51 -04:00
J. Bruce Fields 2786cc3a05 nfsd4: exchange_id error cleanup
There's no point to the dprintk here as the main proc_compound loop
already does this.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-05-31 20:29:50 -04:00
J. Bruce Fields 11ae681052 nfsd4: exchange_id has a pointless copy
We just verified above that these two verifiers are already the same.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-05-31 20:29:50 -04:00
Weston Andros Adamson 5fb35a3a9b nfsd: return 0 on reads of fault injection files
debugfs read operations were returning the contents of an uninitialized u64.

Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-05-31 20:29:48 -04:00
Jeff Layton ce0fc43c5a nfsd: wrap all accesses to st_deny_bmap
Handle the st_deny_bmap in a similar fashion to the st_access_bmap. Add
accessor functions and use those instead of bare bitops.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-05-31 20:29:48 -04:00
Jeff Layton 82c5ff1b14 nfsd: wrap accesses to st_access_bmap
Currently, we do this for the most part with "bare" bitops, but
eventually we'll need to expand the share mode code to handle access
and deny modes on other nodes.

In order to facilitate that code in the future, move to some generic
accessor functions. For now, these are mostly static inlines, but
eventually we'll want to move these to "real" functions that are
able to handle multi-node configurations or have a way to "swap in"
new operations to be done in lieu of or in conjunction with these
atomic bitops.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-05-31 20:29:47 -04:00
Jeff Layton 3a3286147f nfsd: make test_share a bool return
All of the callers treat the return that way already.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-05-31 20:29:46 -04:00
Jeff Layton 5ae037e599 nfsd: consolidate set_access and set_deny
These functions are identical. Also, rename them to bmap_to_share_mode
to better reflect what they do, and have them just return the result
instead of passing in a pointer to the storage location.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-05-31 20:29:46 -04:00
Chuck Lever f07ea10dc8 NFSD: SETCLIENTID_CONFIRM returns NFS4ERR_CLID_INUSE too often
According to RFC 3530bis, the only items SETCLIENTID_CONFIRM processing
should be concerned with is the clientid, clientid verifier, and
principal.  The client's IP address is not supposed to be interesting.

And, NFS4ERR_CLID_INUSE is meant only for principal mismatches.

I triggered this logic with a prototype UCS client -- one that
uses the same nfs_client_id4 string for all servers.  The client
mounted our server via its IPv4, then via its IPv6 address.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-05-31 20:29:45 -04:00
Stanislav Kinsbursky 8dbf28e495 LockD: add debug message to start and stop functions
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-05-31 20:29:44 -04:00
Stanislav Kinsbursky 3d1221dfa9 LockD: service start function introduced
This is just a code move, which from my POV makes the code look better.
I.e. now on start we have 3 different stages:
1) Service creation.
2) Service per-net data allocation.
3) Service start.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-05-31 20:29:44 -04:00
Stanislav Kinsbursky 7d13ec761a LockD: move global usage counter manipulation from error path
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-05-31 20:29:43 -04:00
Stanislav Kinsbursky 2445223909 LockD: service creation function introduced
This function creates service if it doesn't exist, or increases usage
counter if it does, and returns a pointer to it.  The usage counter will
be droppepd by svc_destroy() later in lockd_up().

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-05-31 20:29:42 -04:00
Stanislav Kinsbursky dbf9b5d74c LockD: use existing per-net data function on service creation
This patch also replaces svc_rpcb_setup() with svc_bind().

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-05-31 20:29:42 -04:00
Stanislav Kinsbursky 4db77695bf LockD: pass service to per-net up and down functions
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-05-31 20:29:41 -04:00
Stanislav Kinsbursky 786185b5f8 SUNRPC: move per-net operations from svc_destroy()
The idea is to separate service destruction and per-net operations,
because these are two different things and the mix looks ugly.

Notes:

1) For NFS server this patch looks ugly (sorry for that). But these
place will be rewritten soon during NFSd containerization.

2) LockD per-net counter increase int lockd_up() was moved prior to
make_socks() to make lockd_down_net() call safe in case of error.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-05-31 20:29:40 -04:00
Stanislav Kinsbursky 9793f7c889 SUNRPC: new svc_bind() routine introduced
This new routine is responsible for service registration in a specified
network context.

The idea is to separate service creation from per-net operations.

Note also: since registering service with svc_bind() can fail, the
service will be destroyed and during destruction it will try to
unregister itself from rpcbind. In this case unregistration has to be
skipped.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-05-31 20:29:39 -04:00
Weston Andros Adamson e7a0444aef nfsd: add IPv6 addr escaping to fs_location hosts
The fs_location->hosts list is split on colons, but this doesn't work when
IPv6 addresses are used (they contain colons).
This patch adds the function nfsd4_encode_components_esc() to
allow the caller to specify escape characters when splitting on 'sep'.
In order to fix referrals, this patch must be used with the mountd patch
that similarly fixes IPv6 [] escaping.

Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-05-31 20:29:38 -04:00
J. Bruce Fields 45eaa1c1a1 nfsd4: fix change attribute endianness
Though actually this doesn't matter much, as NFSv4.0 clients are
required to treat the change attribute as opaque.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-05-31 20:29:38 -04:00
J. Bruce Fields d1829b3824 nfsd4: fix free_stateid return endianness
Cc: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-05-31 20:29:37 -04:00
J. Bruce Fields 57b7b43b40 nfsd4: int/__be32 fixes
In each of these cases there's a simple unambiguous correct choice, and
no actual bug.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-05-31 20:29:37 -04:00
J. Bruce Fields bc1b542be9 nfsd4: preserve __user annotation on cld downcall msg
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-05-31 20:29:36 -04:00
J. Bruce Fields 2355c59644 nfsd4: fix missing "static"
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-05-31 20:29:35 -04:00
J. Bruce Fields bfa4b36525 nfsd: state.c should include current_stateid.h
OK, admittedly I'm mainly just trying to shut sparse up.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-05-31 20:29:35 -04:00
Chris Mason 1e20932a23 Merge branch 'for-chris' of git://git.jan-o-sch.net/btrfs-unstable into for-linus
Conflicts:
	fs/btrfs/ulist.h

Signed-off-by: Chris Mason <chris.mason@oracle.com>
2012-05-31 16:49:53 -04:00
Trond Myklebust 1d59d61f60 NFS: Ensure that setattr and getattr wait for O_DIRECT write completion
Use the same mechanism as the block devices are using, but move the
helper functions from fs/direct-io.c into fs/inode.c to remove the
dependency on CONFIG_BLOCK.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-31 11:41:36 -07:00
Jan Schmidt c31931088f Btrfs: fix tree mod log rewinded level and rewinding of moved keys
When we rewind REMOVE_WHILE_FREEING operations, there's code that allocates
a fresh buffer instead of cloning the old one. Setting that buffer's level
correctly was missing in this case.

When rewinding a MOVE_KEYS operation, btrfs_node_key_ptr_offset(slot) was
missing for memmove_extent_buffer()'s arguments.

Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2012-05-31 19:56:19 +02:00
Jan Schmidt f395694c2c Btrfs: fix tree mod log del_ptr
Logging for del_ptr when we're not deleting the last pointer was wrong. This
fixes both, duplicate log entries and log sequence.

Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2012-05-31 19:56:19 +02:00
Jan Schmidt e9b7fd4d8b Btrfs: add tree_mod_dont_log helper
Replace duplicate code by small inline helper function.

Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2012-05-31 19:56:18 +02:00
Jan Schmidt 926dd8a640 Btrfs: add missing spin_lock for insertion into tree mod log
tree_mod_alloc calls __get_tree_mod_seq and must acquire a spinlock before
doing so.

Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2012-05-31 19:56:18 +02:00
Jan Schmidt 3301958b7c Btrfs: add inodes before dropping the extent lock in find_all_leafs
We must build up the inode list with the extent lock held after following
indirect refs.

This also requires an extension to ulists, which allows to modify the stored
aux value in case a key already exists in the list.

Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2012-05-31 19:53:08 +02:00
Al Viro e5467859f7 split ->file_mmap() into ->mmap_addr()/->mmap_file()
... i.e. file-dependent and address-dependent checks.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-05-31 13:11:54 -04:00
Theodore Ts'o f3fc0210c0 ext4: add missing save_error_info() to ext4_error()
The ext4_error() function is missing a call to save_error_info().
Since this is the function which marks the file system as containing
an error, this oversight (which was introduced in 2.6.36) is quite
significant, and should be backported to older stable kernels with
high urgency.

Reported-by: Ken Sumrall <ksumrall@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: ksumrall@google.com
Cc: stable@kernel.org
2012-05-30 23:00:16 -04:00
Theodore Ts'o 2c0544b235 ext4: add debugging trigger for ext4_error()
Make it easy to test whether or not the error handling subsystem in
ext4 is working correctly.  This allows us to simulate an ext4_error()
by echoing a string to /sys/fs/ext4/<dev>/trigger_fs_error.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: ksumrall@google.com
2012-05-30 22:56:46 -04:00
Al Viro 7696e0c37f binfmt_flat: use vm_munmap, we are missing ->mmap_sem there
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-05-30 21:04:56 -04:00
Al Viro 5a5e4c2eca binfmt_elf: switch elf_map() to vm_mmap/vm_munmap
No reason to hold ->mmap_sem over the sequence

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-05-30 21:04:55 -04:00
Al Viro 63d37a84ab vfs: umount_tree() might be called on subtree that had never made it
__mnt_make_shortterm() in there undoes the effect of __mnt_make_longterm()
we'd done back when we set ->mnt_ns non-NULL; it should not be done to
vfsmounts that had never gone through commit_tree() and friends.  Kudos to
lczerner for catching that one...

Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-05-30 21:04:55 -04:00
Will Deacon 46ce341b2f pipe: return -ENOIOCTLCMD instead of -EINVAL on unknown ioctl command
As described in commit 07d106d0a ("vfs: fix up ENOIOCTLCMD error
handling"), drivers should return -ENOIOCTLCMD if they receive an ioctl
command which they don't understand. Doing so will result in -ENOTTY
being returned to userspace, which matches the behaviour of the compat
layer if it fails to translate an ioctl command.

This patch fixes the pipe ioctl to return -ENOIOCTLCMD instead of
-EINVAL when passed an unknown ioctl command.

Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-05-30 21:04:55 -04:00
J. Bruce Fields 3f50fff4da vfs: remove unused __d_splice_alias argument
Nobody sets want_disconn any more.

Reported-by: Peng Tao <bergwolf@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-05-30 21:04:54 -04:00
J. Bruce Fields 7732a557b1 vfs: stop d_splice_alias creating directory aliases
A directory should never have more than one dentry pointing to it.

But d_splice_alias() will add one if it finds a directory with an
already-existing non-DISCONNECTED dentry.

I can't find an obvious reproducer, but I also can't see what prevents
d_splice_alias() from encountering such a case.

It therefore seems safest to allow d_splice_alias to use any dentry it
finds.

(Prior to the removal of dentry_unhash() from vfs_rmdir(), around v3.0,
this could cause an nfsd deadlock like this:

	- Somebody attempts to remove a non-empty directory.
	- The dentry_unhash() in vfs_rmdir() unhashes the dentry
	  pointing to the non-empty directory.
	- ->rmdir() then fails with -ENOTEMPTY
	- Before the vfs_rmdir() caller reaches dput(), an nfsd process
	  in rename looks up the directory by filehandle; at the end of
	  that lookup, this dentry is found by d_alloc_anon(), and a
	  reference is taken on it, preventing dput() from removing it.
	- A regular lookup of the directory calls d_splice_alias(),
	  finds only an unhashed (not a DISCONNECTED) dentry, and
	  insteads adds a new one, so the directory now has two
	  dentries.
	- The nfsd process in rename, which was previously looking up
	  the source directory of the rename, now looks up the target
	  directory (which is the same), and gets the dentry newly
	  created by the previous lookup.
	- The rename, seeing two different dentries, assumes this is a
	  cross-directory rename and attempts to take the i_mutex on the
	  directory twice.

That reproducer no longer exists, but I don't think there was anything
fundamentally incorrect about the vfs_rmdir() behavior there, so I think
the real fault was here in d_splice_alias().)

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-05-30 21:04:54 -04:00
Dan Carpenter fd657170c0 fsnotify: remove unused parameter from send_to_group()
We don't use "mnt" anymore in send_to_group() after 1968f5eed5 ("fanotify:
use both marks when possible") was applied.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-05-30 21:04:53 -04:00
Dmitry Kasatkin 799243a389 vfs: increment iversion when a file is truncated
When a file is truncated with truncate()/ftruncate() and then closed,
iversion is not updated.  This patch uses ATTR_SIZE flag as an indication
to increment iversion.

Mimi said:

On fput(), i_version is used to detect and flag files that have changed
and need to be re-measured in the IMA measurement policy.  When a file
is truncated with truncate()/ftruncate() and then closed, i_version is
not updated.  As a result, although the file has changed, it will not be
re-measured and added to the IMA measurement list on subsequent access.

Signed-off-by: Dmitry Kasatkin <dmitry.kasatkin@intel.com>
Acked-by: Mimi Zohar <zohar@us.ibm.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-05-30 21:04:53 -04:00
Shai Fultheim a0a9b04337 fs: Move bh_cachep to the __read_mostly section
bh_cachep is only written to once on initialization, so move it to the
__read_mostly section.

Signed-off-by: Shai Fultheim <shai@scalemp.com>
Signed-off-by: Vlad Zolotarov <vlad@scalemp.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-05-30 21:04:52 -04:00
Cong Wang 3ed37648e1 fs: move file_remove_suid() to fs/inode.c
file_remove_suid() is a generic function operates on struct file,
it almost has no relations with file mapping, so move it to fs/inode.c.

Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-05-30 21:04:52 -04:00
Artem Bityutskiy 8bdc81c506 jffs2: get rid of jffs2_sync_super
Currently JFFS2 file-system maps the VFS "superblock" abstraction to the
write-buffer. Namely, it uses VFS services to synchronize the write-buffer
periodically.

The whole "superblock write-out" VFS infrastructure is served by the
'sync_supers()' kernel thread, which wakes up every 5 (by default) seconds and
writes out all dirty superblock using the '->write_super()' call-back. But the
problem with this thread is that it wastes power by waking up the system every
5 seconds no matter what. So we want to kill it completely and thus, we need to
make file-systems to stop using the '->write_super' VFS service, and then
remove it together with the kernel thread.

This patch switches the JFFS2 write-buffer management from
'->write_super()'/'->s_dirt' to a delayed work. Instead of setting the 's_dirt'
flag we just schedule a delayed work for synchronizing the write-buffer.

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-05-30 21:04:52 -04:00
Artem Bityutskiy 06688905cc jffs2: remove unnecessary GC pass on sync
We do not need to call 'jffs2_write_super()' on sync. This function
causes a GC pass to make sure the current contents is pushed out with
the data which we already have on the media.

But this is not needed on unmount and only slows sync down unnecessarily.
It is enough to just sync the write-buffer.

This call was added by one of the generic VFS rework patch-sets,
see d579ed00aa.

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-05-30 21:04:51 -04:00
Artem Bityutskiy d0490eea14 jffs2: remove unnecessary GC pass on umount
We do not need to call 'jffs2_write_super()' on unmount. This function
causes a GC pass to make sure the current contents is pushed out with
the data which we already have on the media.

But this is not needed on unmount and only slows unmount down unnecessarily.
It is enough to just sync the write-buffer.

This call was added by one of the generic VFS rework patch-sets,
see 8c85e12512.

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-05-30 21:04:51 -04:00
Artem Bityutskiy 3a0c0e26b6 jffs2: remove lock_super
We do not need 'lock_super()'/'unlock_super()' in JFFS2 - kill them.

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-05-30 21:04:51 -04:00
Linus Torvalds af56e0aa35 Merge git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
Pull ceph updates from Sage Weil:
 "There are some updates and cleanups to the CRUSH placement code, a bug
  fix with incremental maps, several cleanups and fixes from Josh Durgin
  in the RBD block device code, a series of cleanups and bug fixes from
  Alex Elder in the messenger code, and some miscellaneous bounds
  checking and gfp cleanups/fixes."

Fix up trivial conflicts in net/ceph/{messenger.c,osdmap.c} due to the
networking people preferring "unsigned int" over just "unsigned".

* git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (45 commits)
  libceph: fix pg_temp updates
  libceph: avoid unregistering osd request when not registered
  ceph: add auth buf in prepare_write_connect()
  ceph: rename prepare_connect_authorizer()
  ceph: return pointer from prepare_connect_authorizer()
  ceph: use info returned by get_authorizer
  ceph: have get_authorizer methods return pointers
  ceph: ensure auth ops are defined before use
  ceph: messenger: reduce args to create_authorizer
  ceph: define ceph_auth_handshake type
  ceph: messenger: check return from get_authorizer
  ceph: messenger: rework prepare_connect_authorizer()
  ceph: messenger: check prepare_write_connect() result
  ceph: don't set WRITE_PENDING too early
  ceph: drop msgr argument from prepare_write_connect()
  ceph: messenger: send banner in process_connect()
  ceph: messenger: reset connection kvec caller
  libceph: don't reset kvec in prepare_write_banner()
  ceph: ignore preferred_osd field
  ceph: fully initialize new layout
  ...
2012-05-30 11:17:19 -07:00
Jan Schmidt 95a06077f7 Btrfs: use delayed ref sequence numbers for all fs-tree updates
The sequence number for delayed refs is needed to postpone certain delayed
refs for a very short period while walking backrefs. Before the tree
modification log, we thought we'd only have to hold back those references
that don't have a counter operation.

While now we've the tree mod log, we're rewinding fs tree blocks to a
defined consistent state. We cannot know in advance for which tree block
we'll be doing rewind operations later. Therefore, we must postpone all the
delayed refs for fs-tree blocks, even those having a counter operation.

Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2012-05-30 18:18:21 +02:00
Chris Mason cfc442b696 Merge branch 'for-chris' of git://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-next into HEAD 2012-05-30 11:55:38 -04:00
Linus Torvalds 0d167518e0 Merge branch 'for-3.5/core' of git://git.kernel.dk/linux-block
Merge block/IO core bits from Jens Axboe:
 "This is a bit bigger on the core side than usual, but that is purely
  because we decided to hold off on parts of Tejun's submission on 3.4
  to give it a bit more time to simmer.  As a consequence, it's seen a
  long cycle in for-next.

  It contains:

   - Bug fix from Dan, wrong locking type.
   - Relax splice gifting restriction from Eric.
   - A ton of updates from Tejun, primarily for blkcg.  This improves
     the code a lot, making the API nicer and cleaner, and also includes
     fixes for how we handle and tie policies and re-activate on
     switches.  The changes also include generic bug fixes.
   - A simple fix from Vivek, along with a fix for doing proper delayed
     allocation of the blkcg stats."

Fix up annoying conflict just due to different merge resolution in
Documentation/feature-removal-schedule.txt

* 'for-3.5/core' of git://git.kernel.dk/linux-block: (92 commits)
  blkcg: tg_stats_alloc_lock is an irq lock
  vmsplice: relax alignement requirements for SPLICE_F_GIFT
  blkcg: use radix tree to index blkgs from blkcg
  blkcg: fix blkcg->css ref leak in __blkg_lookup_create()
  block: fix elvpriv allocation failure handling
  block: collapse blk_alloc_request() into get_request()
  blkcg: collapse blkcg_policy_ops into blkcg_policy
  blkcg: embed struct blkg_policy_data in policy specific data
  blkcg: mass rename of blkcg API
  blkcg: style cleanups for blk-cgroup.h
  blkcg: remove blkio_group->path[]
  blkcg: blkg_rwstat_read() was missing inline
  blkcg: shoot down blkgs if all policies are deactivated
  blkcg: drop stuff unused after per-queue policy activation update
  blkcg: implement per-queue policy activation
  blkcg: add request_queue->root_blkg
  blkcg: make request_queue bypassing on allocation
  blkcg: make sure blkg_lookup() returns %NULL if @q is bypassing
  blkcg: make blkg_conf_prep() take @pol and return with queue lock held
  blkcg: remove static policy ID enums
  ...
2012-05-30 08:52:42 -07:00
Stefan Behrens 48235a68a3 Btrfs: fix false positive in check-integrity on unmount
During unmount, it could happen that the integrity checker printed a
warning message "attempt to free ... on umount which is not yet iodone"
which turned out to be a false positive.

Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
2012-05-30 10:23:44 -04:00
Stefan Behrens 86ff7ffce0 Btrfs: fix runtime warning in check-integrity check data mode
If a file_extent_item was located at the very end of a leaf and there was
not enough space to hold a full item, but there was enough space to hold
one of type BTRFS_FILE_EXTENT_INLINE or PREALLOC, and it was only such a
short item, a warning was printed anyway. This check is now fixed.

Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
2012-05-30 10:23:43 -04:00
Stefan Behrens 3d136a1131 Btrfs: set ioprio of scrub readahead to idle
Reduce ioprio class of scrub readahead threads to idle priority.
This setting is fixed. This priority has shown the best performance
during all measurements.

Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
2012-05-30 10:23:43 -04:00
Josef Bacik 5bdbeb2187 Btrfs: fix return code in drop_objectid_items
So dpkg fsync()'s the file and the directory containing the file whenever it
writes to a file which is really slow in btrfs.  This is partly because
fsync()'ing a directory _always_ committed the transaction instead of just
going to the tree log.  This is because drop_objectid_items() would return 1
since it does a btrfs_search_slot() which returns 1.  In tree-log jargon
this means that we have to commit the transaction to be safe.  So just check
if ret is greater than 0 and set it to 0 if it does.  With this patch we now
use the tree-log instead of committing the entire transaction, which is
twice as fast on my box.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
2012-05-30 10:23:42 -04:00
Josef Bacik 22ee6985de Btrfs: check to see if the inode is in the log before fsyncing
We have this check down in the actual logging code, but this is after we
start a transaction and all that good stuff.  So move the helper
inode_in_log() out so we can call it in fsync() and avoid starting a
transaction altogether and just exit if we've already fsync()'ed this file
recently.  You would notice this issue if you fsync()'ed a file over and
over again until the transaction committed.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
2012-05-30 10:23:42 -04:00
Tsutomu Itoh 018642a1f1 Btrfs: return value of btrfs_read_buffer is checked correctly
btrfs_read_buffer() has the possibility of returning the error.
Therefore, I add the code in which the return value of btrfs_read_buffer()
is checked.

Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com>
2012-05-30 10:23:41 -04:00
Stefan Behrens 733f4fbbc1 Btrfs: read device stats on mount, write modified ones during commit
The device statistics are written into the device tree with each
transaction commit. Only modified statistics are written.
When a filesystem is mounted, the device statistics for each involved
device are read from the device tree and used to initialize the
counters.

Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
2012-05-30 10:23:41 -04:00
Stefan Behrens c11d2c236c Btrfs: add ioctl to get and reset the device stats
An ioctl interface is added to get the device statistic counters.
A second ioctl is added to atomically get and reset these counters.

Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
2012-05-30 10:23:40 -04:00
Stefan Behrens 442a4f6308 Btrfs: add device counters for detected IO and checksum errors
The goal is to detect when drives start to get an increased error rate,
when drives should be replaced soon. Therefore statistic counters are
added that count IO errors (read, write and flush). Additionally, the
software detected errors like checksum errors and corrupted blocks are
counted.

Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
2012-05-30 10:23:39 -04:00
Asias He d07eb91170 btrfs: Drop unused function btrfs_abort_devices()
1) This function is not used anywhere.

2) Using the blk_abort_queue() to abort the queue seems not correct.
blk_abort_queue() is used for timeout handling (block/blk-timeout.c).

Cc: Chris Mason <chris.mason@oracle.com>
Cc: linux-btrfs@vger.kernel.org
Cc: Jens Axboe <axboe@kernel.dk>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Asias He <asias@redhat.com>
2012-05-30 10:23:39 -04:00
Miao Xie 762f226326 Btrfs: fix the same inode id problem when doing auto defragment
Two files in the different subvolumes may have the same inode id, so
The rb-tree which is used to manage the defragment object must take it
into account. This patch fix this problem.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
2012-05-30 10:23:38 -04:00
Josef Bacik 2adcac1a73 Btrfs: fall back to non-inline if we don't have enough space
If cow_file_range_inline fails with ENOSPC we abort the transaction which
isn't very nice.  This really shouldn't be happening anyways but there's no
sense in making it a horrible error when we can easily just go allocate
normal data space for this stuff.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
2012-05-30 10:23:38 -04:00
Josef Bacik 8a35d95ff4 Btrfs: fix how we deal with the orphan block rsv
Ceph was hitting this race where we would remove an inode from the per-root
orphan list before we would release the space we had reserved for the inode.
We actually don't need a list or anything, we just need to make sure the
root doesn't try to free up the orphan reserve until after the inodes have
released their reservations.  So use an atomic counter instead of a list on
the root and only decrement the counter after we've released our
reservation.  I've tested this as well as several others and we no longer
see the warnings that you would see while running ceph.  Thanks,
Btrfs: fix how we deal with the orphan block rsv

Ceph was hitting this race where we would remove an inode from the per-root
orphan list before we would release the space we had reserved for the inode.
We actually don't need a list or anything, we just need to make sure the
root doesn't try to free up the orphan reserve until after the inodes have
released their reservations.  So use an atomic counter instead of a list on
the root and only decrement the counter after we've released our
reservation.  I've tested this as well as several others and we no longer
see the warnings that you would see while running ceph.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
2012-05-30 10:23:37 -04:00
Josef Bacik 72ac3c0d79 Btrfs: convert the inode bit field to use the actual bit operations
Miao pointed this out while I was working on an orphan problem that messing
with a bitfield where different ranges are protected by different locks
doesn't work out right.  Turns out we've been doing this forever where we
have different parts of the bit field protected by either no lock at all or
different locks which could cause all sorts of weird problems including the
issue I was hitting.  So instead make a runtime_flags thing that we use the
normal bit operations on that are all atomic so we can keep having our
no/different locking for the different flags and then make force_compress
it's own thing so it can be treated normally.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
2012-05-30 10:23:36 -04:00
Josef Bacik cd023e7b17 Btrfs: merge contigous regions when loading free space cache
When we write out the free space cache we will write out everything that is
in our in memory tree, and then we will just walk the pinned extents tree
and write anything we see there.  The problem with this is that during
normal operations the pinned extents will be merged back into the free space
tree normally, and then we can allocate space from the merged areas and
commit them to the tree log.  If we crash and replay the tree log we will
crash again because the tree log will try to free up space from what looks
like 2 seperate but contiguous entries, since one entry is from the original
free space cache and the other was a pinned extent that was merged back.  To
fix this we just need to walk the free space tree after we load it and merge
contiguous entries back together.  This will keep the tree log stuff from
breaking and it will make the allocator behave more nicely.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
2012-05-30 10:23:36 -04:00
Liu Bo 9ba1f6e44e Btrfs: do not do balance in readonly mode
In normal cases, we would not be allowed to do balance in RO mode.
However, when we're using a seeding device and adding another device to sprout,
things will change:

$ mkfs.btrfs /dev/sdb7
$ btrfstune -S 1 /dev/sdb7
$ mount /dev/sdb7 /mnt/btrfs -o ro
$ btrfs fi bal /mnt/btrfs   -----------------------> fail.
$ btrfs dev add /dev/sdb8 /mnt/btrfs
$ btrfs fi bal /mnt/btrfs   -----------------------> works!

It should not be designed as an exception, and we'd better add another check for
mnt flags.

Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Reviewed-by: Josef Bacik <josef@redhat.com>
2012-05-30 10:23:35 -04:00
Liu Bo d1ac6e41d5 Btrfs: use fastpath in extent state ops as much as possible
Fully utilize our extent state's new helper functions to use
fastpath as much as possible.

Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Reviewed-by: Josef Bacik <josef@redhat.com>
2012-05-30 10:23:34 -04:00
Liu Bo f8c5d0b443 Btrfs: fix wrong error returned by adding a device
Reproduce:
$ mkfs.btrfs /dev/sdb7
$ mount /dev/sdb7 /mnt/btrfs -o ro
$ btrfs dev add /dev/sdb8 /mnt/btrfs
ERROR: error adding the device '/dev/sdb8' - Invalid argument

Since we mount with readonly options, and /dev/sdb7 is not a seeding one,
a readonly notification is preferred.

Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Reviewed-by: Josef Bacik <josef@redhat.com>
2012-05-30 10:23:34 -04:00
Josef Bacik 5fd0204355 Btrfs: finish ordered extents in their own thread
We noticed that the ordered extent completion doesn't really rely on having
a page and that it could be done independantly of ending the writeback on a
page.  This patch makes us not do the threaded endio stuff for normal
buffered writes and direct writes so we can end page writeback as soon as
possible (in irq context) and only start threads to do the ordered work when
it is actually done.  Compression needs to be reworked some to take
advantage of this as well, but atm it has to do a find_get_page in its endio
handler so it must be done in its own thread.  This makes direct writes
quite a bit faster.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
2012-05-30 10:23:33 -04:00
Josef Bacik 4e89915220 Btrfs: do not check delalloc when updating disk_i_size
We are checking delalloc to see if it is ok to update the i_size.  There are
2 cases it stops us from updating

1) If there is delalloc between our current disk_i_size and this ordered
extent

2) If there is delalloc between our current ordered extent and the next
ordered extent

These tests are racy however since we can set delalloc for these ranges at
any time.  Also for the first case if we notice there is delalloc between
disk_i_size and our ordered extent we will not update disk_i_size and assume
that when that delalloc bit gets written out it will update everything
properly.  However if we crash before that we will have file extents outside
of our i_size, which is not good, so this test is dangerous as well as racy.
Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
2012-05-30 10:23:33 -04:00
Jim Meyering f60d16a892 Btrfs: avoid buffer overrun in mount option handling
There is an off-by-one error: allocating room for a maximal result
string but without room for a trailing NUL.  That, can lead to
returning a transformed string that is not NUL-terminated, and
then to a caller reading beyond end of the malloc'd buffer.

Rewrite to s/kzalloc/kmalloc/, remove unwarranted use of strncpy
(the result is guaranteed to fit), remove dead strlen at end, and
change a few variable names and comments.

Reviewed-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Jim Meyering <meyering@redhat.com>
2012-05-30 10:23:32 -04:00
Jim Meyering a27202fbe9 Btrfs: NUL-terminate path buffer in DEV_INFO ioctl result
A device with name of length BTRFS_DEVICE_PATH_NAME_MAX or longer
would not be NUL-terminated in the DEV_INFO ioctl result buffer.

Signed-off-by: Jim Meyering <meyering@redhat.com>
2012-05-30 10:23:31 -04:00
Jim Meyering f07c9a79f0 Btrfs: avoid buffer overrun in btrfs_printk
The buffer read-overrun would be triggered by a printk format
starting with <N>, where N is a single digit.  NUL-terminate
after strncpy.  Use memcpy, not strncpy, since we know the
string we're copying fits in the destination buffer and
contains no NUL byte.

Signed-off-by: Jim Meyering <meyering@redhat.com>
2012-05-30 10:23:31 -04:00
Daniel J Blueman 2eec6c8102 Fix minor type issues
Address some minor type issues identified by sparse checker.

Signed-off-by: Daniel J Blueman <daniel@quora.org>
2012-05-30 10:23:30 -04:00
Sergei Trofimovich 0d2450abfa btrfs: allow changing 'thread_pool' size at remount time
Changing 'mount -oremount,thread_pool=2 /' didn't make any effect:

maximum amount of worker threads is specified in 2 places:
- in 'strict btrfs_fs_info::thread_pool_size'
- in each worker struct: 'struct btrfs_workers::max_workers'

'mount -oremount' updated only 'btrfs_fs_info::thread_pool_size'.

Fix it by pushing new maximum value to all created worker structures
as well.

Cc: Josef Bacik <josef@redhat.com>
Cc: Chris Mason <chris.mason@oracle.com>
Reviewed-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>
2012-05-30 10:23:30 -04:00
Josef Bacik 0885ef5b56 Btrfs: do not do filemap_write_and_wait_range in fsync
We already do the btrfs_wait_ordered_range which will do this for us, so
just remove this call so we don't call it twice.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
2012-05-30 10:23:29 -04:00
Josef Bacik 551ebb2d34 Btrfs: remove useless waiting and extra filemap work
In btrfs_wait_ordered_range we have been calling filemap_fdata_write() twice
because compression does strange things and then waiting.  Then we look up
ordered extents and if we find any we will always schedule_timeout(); once
and then loop back around and do it all again.  We will even check to see if
there is delalloc pages on this range and loop again.  So this patch gets
rid of the multipe fdata_write() calls and just does
filemap_write_and_wait().  In the case of compression we will still find the
ordered extents and start those individually if we need to so that is ok,
but in the normal buffered case we avoid all this weird overhead.

Then in the case of the schedule_timeout(1), we don't need it.  All callers
either 1) don't care, they just want to make sure what they just wrote maeks
it to disk or 2) are doing the lock()->lookup ordered->unlock->flush thing
in which case it will lock and check for ordered extents _anyway_ so get
back to them as quickly as possible.  The delaloc check is simply not
needed, this only catches the case where we write to the file again since
doing the filemap_write_and_wait() and if the caller truly cares about that
it will take care of everything itself.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
2012-05-30 10:23:28 -04:00
Josef Bacik d7dbe9e7f6 Btrfs: fix compile warnings in extent_io.c
These warnings are bogus since we will always have at least one page in an
eb, but to make the compiler happy just set ret = 0 in these two cases.
Thanks,
Btrfs: fix compile warnings in extent_io.c

These warnings are bogus since we will always have at least one page in an
eb, but to make the compiler happy just set ret = 0 in these two cases.
Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
2012-05-30 10:23:28 -04:00
Josef Bacik 30f8fe3e47 Btrfs: cache no acl on new inodes
When running compilebench I noticed we were spending some time looking up
acls on new inodes, which shouldn't be happening since there were no acls.
This is because when we init acls on the inode after creating them we don't
cache the fact there are no acls if there aren't any.  Doing this adds a
little bit of a bump to my compilebench runs.  Thanks,
Btrfs: cache no acl on new inodes

Signed-off-by: Josef Bacik <josef@redhat.com>
2012-05-30 10:23:27 -04:00
Josef Bacik 0c4d2d95d0 Btrfs: use i_version instead of our own sequence
We've been keeping around the inode sequence number in hopes that somebody
would use it, but nobody uses it and people actually use i_version which
serves the same purpose, so use i_version where we used the incore inode's
sequence number and that way the sequence is updated properly across the
board, and not just in file write.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
2012-05-30 10:23:27 -04:00
Jan Schmidt 20b297d620 Btrfs: tree mod log sanity checks in join_transaction
When a fresh transaction begins, the tree mod log must be clean. Users of
the tree modification log must ensure they never span across transaction
boundaries.

We reset the sequence to 0 in this safe situation to make absolutely sure
overflow can't happen.

Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2012-05-30 15:17:36 +02:00
Jan Schmidt 19ae4e8133 Btrfs: fs_info variable for join_transaction
Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2012-05-30 15:17:35 +02:00
Jan Schmidt 8445f61cad Btrfs: use the tree modification log for backref resolving
This enables backref resolving on life trees while they are changing. This
is a prerequisite for quota groups and just nice to have for everything
else.

Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2012-05-30 15:17:34 +02:00
Jan Schmidt 5d9e75c41d Btrfs: add btrfs_search_old_slot
The tree modification log together with the current state of the tree gives
a consistent, old version of the tree. btrfs_search_old_slot is used to
search through this old version and return old (dummy!) extent buffers.
Naturally, this function cannot do any tree modifications.

Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2012-05-30 15:17:33 +02:00
Jan Schmidt f3ea38da3e Btrfs: add del_ptr and insert_ptr modifications to the tree mod log
Record all relevant modifications to block pointers in the tree mod log so
that we can rewind them later on for backref walking.

Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2012-05-30 15:17:32 +02:00
Jan Schmidt f230475e62 Btrfs: put all block modifications into the tree mod log
When running functions that can make changes to the internal trees
(e.g. btrfs_search_slot), we check if somebody may be interested in the
block we're currently modifying. If so, we record our modification to be
able to rewind it later on.

Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2012-05-30 15:17:29 +02:00
Jan Schmidt bd989ba359 Btrfs: add tree modification log functions
The tree mod log will log modifications made fs-tree nodes. Most
modifications are done by autobalance of the tree. Such changes are recorded
as long as a block entry exists. When released, the log is cleaned.

With the tree modification log, it's possible to reconstruct a consistent
old state of the tree. This is required to do backref walking on a busy
file system.

Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2012-05-30 15:17:01 +02:00
Al Viro 1676765238 get rid of idiotic misplaced __kernel_mode_t in ncfps kernel-private data structure
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-05-29 23:28:42 -04:00
Andi Kleen 962830df36 brlocks/lglocks: API cleanups
lglocks and brlocks are currently generated with some complicated macros
in lglock.h.  But there's no reason to not just use common utility
functions and put all the data into a common data structure.

In preparation, this patch changes the API to look more like normal
function calls with pointers, not magic macros.

The patch is rather large because I move over all users in one go to keep
it bisectable.  This impacts the VFS somewhat in terms of lines changed.
But no actual behaviour change.

[akpm@linux-foundation.org: checkpatch fixes]
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-05-29 23:28:41 -04:00
Andi Kleen eea62f831b brlocks/lglocks: turn into functions
lglocks and brlocks are currently generated with some complicated macros
in lglock.h.  But there's no reason to not just use common utility
functions and put all the data into a common data structure.

Since there are at least two users it makes sense to share this code in a
library.  This is also easier maintainable than a macro forest.

This will also make it later possible to dynamically allocate lglocks and
also use them in modules (this would both still need some additional, but
now straightforward, code)

[akpm@linux-foundation.org: checkpatch fixes]
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-05-29 23:28:41 -04:00
Al Viro ea022dfb3c ocfs: simplify symlink handling
seeing that "fast" symlinks still get allocation + copy, we might as
well simply switch them to pagecache-based variant of ->follow_link();
just need an appropriate ->readpage() for them...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-05-29 23:28:40 -04:00
Al Viro 408bd629ba get rid of pointless allocations and copying in ecryptfs_follow_link()
switch to generic_readlink(), while we are at it

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-05-29 23:28:40 -04:00
Al Viro 28fe3c1963 hpfs: assorted endianness annotations
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-05-29 23:28:39 -04:00
Al Viro 77ee26e44c hpfs: annotate ea
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-05-29 23:28:39 -04:00
Al Viro 46287aa652 hpfs: annotate struct hpfs_dirent
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-05-29 23:28:39 -04:00
Al Viro 6ce2bbba52 hpfs: annotate struct anode
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-05-29 23:28:38 -04:00
Al Viro 2b9f1cc29b hpfs: annotate struct fnode
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-05-29 23:28:38 -04:00
Al Viro ddc19e6e04 hpfs: annotate btree nodes, get rid of bitfields mess
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-05-29 23:28:38 -04:00
Al Viro 39413c6046 hpfs: annotate struct dnode
little-endians...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-05-29 23:28:37 -04:00
Al Viro 52576da354 hpfs: bitmaps are little-endian
annotate properly...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-05-29 23:28:37 -04:00
Al Viro c4c995430a hpfs: get rid of bitfields in struct fnode
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-05-29 23:28:37 -04:00
Al Viro 4085e155b1 hpfs: get rid of bitfields endianness wanking in extended_attribute
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-05-29 23:28:36 -04:00
Randy Dunlap 185553b224 fs: fix inode.c kernel-doc warnings
Fix kernel-doc warnings in fs/inode.c:

Warning(fs/inode.c:1493): No description found for parameter 'path'
Warning(fs/inode.c:1493): Excess function parameter 'mnt' description in 'touch_atime'
Warning(fs/inode.c:1493): Excess function parameter 'dentry' description in 'touch_atime'

Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-05-29 23:28:36 -04:00
Al Viro de5e2b3628 hpfs: endianness bugs
a couple of le32 and le16 used with wrong le..._to_cpu(), plus
idiotic use of le32_to_cpu() on 1-bit bitfield

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-05-29 23:28:36 -04:00
Al Viro 528c032764 btrfs: trivial endianness annotations
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-05-29 23:28:35 -04:00
Al Viro 1db5df98fa ocfs2: kill endianness abuses in blockcheck.c
ocfs2_block_check is for little-endian contents; if we just want to
its fields converted to host-endian in a couple of functions, just
put those values into local u32 and u16...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-05-29 23:28:35 -04:00
Al Viro f6a5690324 ocfs2: deal with __user misannotations
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-05-29 23:28:35 -04:00
Al Viro 8515841086 ocfs2: trivial endianness misannotations
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-05-29 23:28:34 -04:00
Al Viro 66f8f50920 affs: bury unused macros
... unused since 2.4.4.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-05-29 23:28:34 -04:00
Al Viro af569596a9 kill v9fs_dentry_from_dir_inode()
In *all* callers we have a dentry of child of that directory.
Just use ->d_parent of that one, for fsck sake...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-05-29 23:28:34 -04:00
Sage Weil c862868bb4 ceph: move encode_fh to new API
Use parent_inode has a flag for whether nfsd wants a connectable fh, but
generate one opportunistically so that we can take advantage of the
additional info in there.

Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-05-29 23:28:33 -04:00
Al Viro b0b0382bb4 ->encode_fh() API change
pass inode + parent's inode or NULL instead of dentry + bool saying
whether we want the parent or not.

NOTE: that needs ceph fix folded in.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-05-29 23:28:33 -04:00
Al Viro 6d42e7e9f6 ubifs: use generic_fillattr()
don't open-code it...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-05-29 23:28:32 -04:00
Al Viro 77ba78776e xfs: switch to proper __bitwise type for KM_... flags
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-05-29 23:28:32 -04:00
Al Viro c217a2a004 switch utimes() to fget_light/fput_light
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-05-29 23:28:32 -04:00
Al Viro 0aa2ee5f0a switch statfs to fget_light/fput_light
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-05-29 23:28:31 -04:00
Al Viro bdc689594b switch flock to fget_light/fput_light
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-05-29 23:28:31 -04:00
Al Viro 20ba5d736f switch signalfd4() to fget_light/fput_light
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-05-29 23:28:30 -04:00
Al Viro 545ec2c794 switch fcntl to fget_raw_light/fput_light
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-05-29 23:28:30 -04:00
Al Viro 7449af1e8b switch xattr syscalls to fget_light/fput_light
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-05-29 23:28:30 -04:00
Al Viro 863ced7fe7 switch readdir/getdents to fget_light/fput_light
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-05-29 23:28:29 -04:00
Al Viro c2bd6c11cd switch do_fsync() to fget_light()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-05-29 23:28:29 -04:00
Linus Torvalds 7d36014b97 Merge branch 'akpm' (Andrew's patch-bomb)
Merge patches through Andrew Morton:
 "180 patches - err 181 - listed below:

   - most of MM.  I held back the (large) "memcg: add hugetlb extension"
     series because a bunfight has recently broken out.

   - leds.  After this, Bryan Wu will be handling drivers/leds/

   - backlight

   - lib/

   - rtc"

* emailed from Andrew Morton <akpm@linux-foundation.org>: (181 patches)
  drivers/rtc/rtc-s3c.c: fix compiler warning
  drivers/rtc/rtc-tegra.c: clean up probe/remove routines
  drivers/rtc/rtc-pl031.c: remove RTC timer interrupt handling
  drivers/rtc/rtc-lpc32xx.c: add device tree support
  drivers/rtc/rtc-m41t93.c: don't let get_time() reset M41T93_FLAG_OF
  rtc: ds1307: add trickle charger support
  rtc: ds1307: remove superfluous initialization
  rtc: rename CONFIG_RTC_MXC to CONFIG_RTC_DRV_MXC
  drivers/rtc/Kconfig: place RTC_DRV_IMXDI and RTC_MXC under "on-CPU RTC drivers"
  drivers/rtc/rtc-pcf8563.c: add RTC_VL_READ/RTC_VL_CLR ioctl feature
  rtc: add ioctl to get/clear battery low voltage status
  drivers/rtc/rtc-ep93xx.c: convert to use module_platform_driver()
  rtc/spear: add Device Tree probing capability
  lib/vsprintf.c: "%#o",0 becomes '0' instead of '00'
  radix-tree: fix preload vector size
  spinlock_debug: print kallsyms name for lock
  vsprintf: fix %ps on non symbols when using kallsyms
  lib/bitmap.c: fix documentation for scnprintf() functions
  lib/string_helpers.c: make arrays static
  lib/test-kstrtox.c: mark const init data with __initconst instead of __initdata
  ...
2012-05-29 18:05:31 -07:00
David Rientjes a7f638f999 mm, oom: normalize oom scores to oom_score_adj scale only for userspace
The oom_score_adj scale ranges from -1000 to 1000 and represents the
proportion of memory available to the process at allocation time.  This
means an oom_score_adj value of 300, for example, will bias a process as
though it was using an extra 30.0% of available memory and a value of
-350 will discount 35.0% of available memory from its usage.

The oom killer badness heuristic also uses this scale to report the oom
score for each eligible process in determining the "best" process to
kill.  Thus, it can only differentiate each process's memory usage by
0.1% of system RAM.

On large systems, this can end up being a large amount of memory: 256MB
on 256GB systems, for example.

This can be fixed by having the badness heuristic to use the actual
memory usage in scoring threads and then normalizing it to the
oom_score_adj scale for userspace.  This results in better comparison
between eligible threads for kill and no change from the userspace
perspective.

Suggested-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Tested-by: Dave Jones <davej@redhat.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-29 16:22:24 -07:00
Hugh Dickins 17cf28afea mm/fs: remove truncate_range
Remove vmtruncate_range(), and remove the truncate_range method from
struct inode_operations: only tmpfs ever supported it, and tmpfs has now
converted over to using the fallocate method of file_operations.

Update Documentation accordingly, adding (setlease and) fallocate lines.
And while we're in mm.h, remove duplicate declarations of shmem_lock() and
shmem_file_setup(): everyone is now using the ones in shmem_fs.h.

Based-on-patch-by: Cong Wang <amwang@redhat.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Cong Wang <amwang@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-29 16:22:23 -07:00
Sasha Levin 08fa29d916 mm: fix NULL ptr deref when walking hugepages
A missing validation of the value returned by find_vma() could cause a
NULL ptr dereference when walking the pagetable.

This is triggerable from usermode by a simple user by trying to read a
page info out of /proc/pid/pagemap which doesn't exist.

Introduced by commit 025c5b2451 ("thp: optimize away unnecessary page
table locking").

Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: <stable@vger.kernel.org>		[3.4.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-29 16:22:18 -07:00
Linus Torvalds 442a9ffabb Merge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6
Pull CIFS updates from Steve French.

* 'for-next' of git://git.samba.org/sfrench/cifs-2.6: (29 commits)
  cifs: fix oops while traversing open file list (try #4)
  cifs: Fix comment as d_alloc_root() is replaced by d_make_root()
  CIFS: Introduce SMB2 mounts as vers=2.1
  CIFS: Introduce SMB2 Kconfig option
  CIFS: Move add/set_credits and get_credits_field to ops structure
  CIFS: Move protocol specific demultiplex thread calls to ops struct
  CIFS: Move protocol specific part from cifs_readv_receive to ops struct
  CIFS: Move header_size/max_header_size to ops structure
  CIFS: Move protocol specific part from SendReceive2 to ops struct
  cifs: Include backup intent search flags during searches {try #2)
  CIFS: Separate protocol specific part from setlk
  CIFS: Separate protocol specific part from getlk
  CIFS: Separate protocol specific lock type handling
  CIFS: Convert lock type to 32 bit variable
  CIFS: Move locks to cifsFileInfo structure
  cifs: convert send_nt_cancel into a version specific op
  cifs: add a smb_version_operations/values structures and a smb_version enum
  cifs: remove the vers= and version= synonyms for ver=
  cifs: add warning about change in default cache semantics in 3.7
  cifs: display cache= option in /proc/mounts
  ...
2012-05-29 12:42:10 -07:00
Linus Torvalds 53f2c4a8fd NFS client updates for Linux 3.5
New features include:
 - Rewrite the O_DIRECT code so that it can share the same coalescing and
   pNFS functionality as the page cache code.
 - Allow the server to provide hints as to when we should use pNFS, and
   when it is more efficient to read and write through the metadata
   server.
 - NFS cache consistency updates:
   - Use the ctime to emulate a change attribute for NFSv2/v3 so that
     all NFS versions can share the same cache management code.
   - New cache management code will only look at the change attribute
     and size attribute when deciding whether or not our cached data
     is still valid or not.
   - Don't request NFSv4 post-op attributes on writes in cases such as
     O_DIRECT, where we don't care about data cache consistency, or
     when we have a write delegation, and know that our cache is
     still consistent.
   - Don't request NFSv4 post-op attributes on operations such as
     COMMIT, where there are no expected metadata updates.
   - Don't request NFSv4 directory post-op attributes in cases where
     the operations themselves already return change attribute updates:
     i.e.  operations such as OPEN, CREATE, REMOVE, LINK and RENAME.
 - Speed up 'ls' and friends by using READDIR rather than READDIRPLUS
   if we detect no attempts to lookup filenames.
 - Improve the code sharing between NFSv2/v3 and v4 mounts
 - NFSv4.1 state management efficiency improvements
 - More patches in preparation for NFSv4/v4.1 migration functionality.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJPw/MNAAoJEGcL54qWCgDyxU8P/2kKqhAlhoLEArBqo9FT3/OK
 YrNs5uO/erTgnCG8L0XQvTKjHB9F7TAeFXqTmBZuPlb1afRpHHt2vzPqzIvUCeOC
 ZXm8vzZf4nxWZgEFoTDdUBvqQi9lLdIzCRhSaVCKcRnNwiuaKDd/iwykbWGcHqmv
 jtR4lzXPllJdKCUL3yb3juVrpq6Vvn254ID2pqdnYcEtIJIHgaRZpwdp4Iz9+8b5
 Moishiw2rgCBJIhf+VCYd8B2oYfMgSDPxG1o3etkwY46qo+4s+CIls9Vu/6YzGXK
 3+NdLatRDqKhQpLm0/R+dI3rntnTZ8x6LgWnTGxUsiqb6pAaHZPK284rf2eh/s7M
 Q4G4203r0uw539kIt6eKOGqC9c8kZAPCHlQSPCaImZyCJsz+6OMShNlGB5bZpFPr
 tbdxaxudrhCF7UVKXicJCWgv2nIHtek6fNwey1jqFoYgZP5ipiBKymvXQC5WAMBw
 7RHJor/JEC+UJkVg/7Mkpg0UNw3E36CTYLeRJKlNCS6YO9NJQseCDxhhMNAy/ab7
 RGO8DVMkUsOUH20S+a19LyeFQtveWFIE0DiDqRn0KnNGhGwHrv2t4xFukjlrf4Sw
 8FQUBRdtFxfmspfA1IdoTY49XZQda5eagvTy1MyaWEh+jPSJ4G5j3sSjFiaKAJqw
 79iQKFGkxPOSHx2yCdAF
 =suVW
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-3.5-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client updates from Trond Myklebust:
 "New features include:
   - Rewrite the O_DIRECT code so that it can share the same coalescing
     and pNFS functionality as the page cache code.
   - Allow the server to provide hints as to when we should use pNFS,
     and when it is more efficient to read and write through the
     metadata server.
   - NFS cache consistency updates:
     * Use the ctime to emulate a change attribute for NFSv2/v3 so that
       all NFS versions can share the same cache management code.
     * New cache management code will only look at the change attribute
       and size attribute when deciding whether or not our cached data
       is still valid or not.
     * Don't request NFSv4 post-op attributes on writes in cases such as
       O_DIRECT, where we don't care about data cache consistency, or
       when we have a write delegation, and know that our cache is still
       consistent.
     * Don't request NFSv4 post-op attributes on operations such as
       COMMIT, where there are no expected metadata updates.
     * Don't request NFSv4 directory post-op attributes in cases where
       the operations themselves already return change attribute
       updates: i.e. operations such as OPEN, CREATE, REMOVE, LINK and
       RENAME.
   - Speed up 'ls' and friends by using READDIR rather than READDIRPLUS
     if we detect no attempts to lookup filenames.
   - Improve the code sharing between NFSv2/v3 and v4 mounts
   - NFSv4.1 state management efficiency improvements
   - More patches in preparation for NFSv4/v4.1 migration functionality."

Fix trivial conflict in fs/nfs/nfs4proc.c that was due to the dcache
qstr name initialization changes (that made the length/hash a 64-bit
union)

* tag 'nfs-for-3.5-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (146 commits)
  NFSv4: Add debugging printks to state manager
  NFSv4: Map NFS4ERR_SHARE_DENIED into an EACCES error instead of EIO
  NFSv4: update_changeattr does not need to set NFS_INO_REVAL_PAGECACHE
  NFSv4.1: nfs4_reset_session should use nfs4_handle_reclaim_lease_error
  NFSv4.1: Handle other occurrences of NFS4ERR_CONN_NOT_BOUND_TO_SESSION
  NFSv4.1: Handle NFS4ERR_CONN_NOT_BOUND_TO_SESSION in the state manager
  NFSv4.1: Handle errors in nfs4_bind_conn_to_session
  NFSv4.1: nfs4_bind_conn_to_session should drain the session
  NFSv4.1: Don't clobber the seqid if exchange_id returns a confirmed clientid
  NFSv4.1: Add DESTROY_CLIENTID
  NFSv4.1: Ensure we use the correct credentials for bind_conn_to_session
  NFSv4.1: Ensure we use the correct credentials for session create/destroy
  NFSv4.1: Move NFSPROC4_CLNT_BIND_CONN_TO_SESSION to the end of the operations
  NFSv4.1: Handle NFS4ERR_SEQ_MISORDERED when confirming the lease
  NFSv4: When purging the lease, we must clear NFS4CLNT_LEASE_CONFIRM
  NFSv4: Clean up the error handling for nfs4_reclaim_lease
  NFSv4.1: Exchange ID must use GFP_NOFS allocation mode
  nfs41: Use BIND_CONN_TO_SESSION for CB_PATH_DOWN*
  nfs4.1: add BIND_CONN_TO_SESSION operation
  NFSv4.1 test the mdsthreshold hint parameters
  ...
2012-05-29 10:43:51 -07:00
Tao Ma 6f2e9f0e7d ext4: protect group inode free counting with group lock
Now when we set the group inode free count, we don't have a proper
group lock so that multiple threads may decrease the inode free
count at the same time. And e2fsck will complain something like:

Free inodes count wrong for group #1 (1, counted=0).
Fix? no

Free inodes count wrong for group #2 (3, counted=0).
Fix? no

Directories count wrong for group #2 (780, counted=779).
Fix? no

Free inodes count wrong for group #3 (2272, counted=2273).
Fix? no

So this patch try to protect it with the ext4_lock_group.

btw, it is found by xfstests test case 269 and the volume is
mkfsed with the parameter
"-O ^resize_inode,^uninit_bg,extent,meta_bg,flex_bg,ext_attr"
and I have run it 100 times and the error in e2fsck doesn't
show up again.

Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-05-28 18:20:59 -04:00
Zheng Liu 8563000d3b ext4: use consistent ssize_t type in ext4_file_write()
The generic_file_aio_write() function returns ssize_t, and
ext4_file_write() returns a ssize_t, so use a ssize_t to collect the
return value from generic_file_aio_write().  It shouldn't matter since
the VFS read/write paths shouldn't allow a read greater than MAX_INT,
but there was previously a bug in the AIO code paths, and it's best if
we use a consistent type so that the return value from
generic_file_aio_write() can't get truncated.

Reported-by: Jouni Siren <jouni.siren@iki.fi>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-05-28 18:06:51 -04:00
Zheng Liu 4a3c3a5120 ext4: fix format flag in ext4_ext_binsearch_idx()
fix ext_debug format flag in ext4_ext_binsearch_idx().

Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-05-28 17:55:16 -04:00
Zheng Liu 400db9d301 ext4: cleanup in ext4_discard_allocated_blocks()
remove 'len' variable in ext4_discard_allocated_blocks() because it is
useless.

Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-05-28 17:53:53 -04:00
Theodore Ts'o 2cde417de0 ext4: return ENOMEM when mounts fail due to lack of memory
This is a port of the ext3 commit: 4569cd1b0d

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-05-28 17:49:54 -04:00
Theodore Ts'o 2716b80284 ext4: remove redundundant "(char *) bh->b_data" casts
The b_data field of the buffer_head is already a char *, so there's no
point casting it to a char *.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-05-28 17:47:52 -04:00
Trond Myklebust cc0a984368 NFSv4: Add debugging printks to state manager
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-28 17:21:57 -04:00
Trond Myklebust fb13bfa7e1 NFSv4: Map NFS4ERR_SHARE_DENIED into an EACCES error instead of EIO
If a file OPEN is denied due to a share lock, the resulting
NFS4ERR_SHARE_DENIED is currently mapped to the default EIO.
This patch adds a more appropriate mapping, and brings Linux
into line with what Solaris 10 does.

See https://bugzilla.kernel.org/show_bug.cgi?id=43286

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org
2012-05-28 17:21:48 -04:00
Andreas Dilger 7e936b7372 ext4: disallow hard-linked directory in ext4_lookup
A hard-linked directory to its parent can cause the VFS to deadlock,
and is a sign of a corrupted file system.  So detect this case in
ext4_lookup(), before the rmdir() lockup scenario can take place.

Signed-off-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@kernel.org
2012-05-28 17:02:25 -04:00
Linus Torvalds a01ee165a1 Merge branch 'for-linus' of git://git.open-osd.org/linux-open-osd
Pull exofs updates from Boaz Harrosh:
 "Just a couple of patches.  The first is a BUG fix destined for stable
  which missed the 3.4-rc7 Kernel.  The second is just a fixture
  addition so exofs is able to be better exported as a cluster file
  system via pNFS."

* 'for-linus' of git://git.open-osd.org/linux-open-osd:
  exofs: Add SYSFS info for autologin/pNFS export
  exofs: Fix CRASH on very early IO errors.
2012-05-28 13:10:41 -07:00
Haogang Chen 967ac8af44 ext4: fix potential integer overflow in alloc_flex_gd()
In alloc_flex_gd(), when flexbg_size is large, kmalloc size would
overflow and flex_gd->groups would point to a buffer smaller than
expected, causing OOB accesses when it is used.

Note that in ext4_resize_fs(), flexbg_size is calculated using
sbi->s_log_groups_per_flex, which is read from the disk and only bounded
to [1, 31]. The patch returns NULL for too large flexbg_size.

Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Haogang Chen <haogangchen@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@kernel.org
2012-05-28 14:21:55 -04:00
Akira Fujita 9d99012ff2 ext4: remove needs_recovery in ext4_mb_init()
needs_recovery in ext4_mb_init() is not used, remove it.

Signed-off-by: Akira Fujita <a-fujita@rs.jp.ne.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-05-28 14:19:25 -04:00
Eric Sandeen 7e84b62164 ext4: force ro mount if ext4_setup_super() fails
If ext4_setup_super() fails i.e. due to a too-high revision,
the error is logged in dmesg but the fs is not mounted RO as
indicated.

Tested by:

# mkfs.ext4 -r 4 /dev/sdb6
# mount /dev/sdb6 /mnt/test
# dmesg | grep "too high"
[164919.759248] EXT4-fs (sdb6): revision level too high, forcing read-only mode
# grep sdb6 /proc/mounts
/dev/sdb6 /mnt/test2 ext4 rw,seclabel,relatime,data=ordered 0 0

Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@kernel.org
2012-05-28 14:17:25 -04:00
Dan Carpenter bb3d132a24 ext4: fix potential NULL dereference in ext4_free_inodes_counts()
The ext4_get_group_desc() function returns NULL on error, and
ext4_free_inodes_count() function dereferences it without checking.
There is a check on the next line, but it's too late.

Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@kernel.org
2012-05-28 14:16:57 -04:00
Linus Torvalds 90324cc1b1 avoid iput() from flusher thread
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJPw2J/AAoJECvKgwp+S8Ja5jkP/3uMxkhf8XQpXCI3O1QVfaQr
 uZFfM8sINqIPDVm1dtFjFj7f8Bw9mhE2KAnnJ1rKT8tQwqq9yAse1QPlhCG1ZqoP
 +AnMDDXHtx7WmQZXhBvS9b+unpZ7Jr6r6pO5XrmTL2kRL3YJPUhZ2+xbTT5belTB
 KoAu4WqORZRxfXoC76S7U8K+D4NcAGhAOxCClsIjmY+oocCiCag4FZOyzYIFViqc
 ghUN/+rLQ3fqGGv2yO7Ylx1gUM7sxIwkZQ/h962jFAtxz9czImr2NmRoMliOaOkS
 tvcnIf+E3u0n/zIjzFvzhxKgHJPP8PkcPMk60d3jKmFngBkqFTzNUeVTP8md7HrV
 4DlXisWr+z7YVyWUCFaNcJLmjiWSwQ8DV/clRLobeBf9EJKan5F1PjFgl6PLJM5F
 Qr1+LHMNaetdulBwMRTyveZTzYqw9RmDnD9dWMo4mX/kTpvtC4jTPVV7hkRD+Qlv
 5vTRR+VXL3Q50yClLf0AQMSKTnH2gBuepM/b+7cShLGfsMln8DtUjmbigv+niL63
 BibcCIbIlP2uWGnl37VhsC34AT+RKt3lggrBOpn/7XJMq/wKR7IRP/7V9TfYgaUN
 NBa+wtnLDa1pZEn/X7izdcQP62PzDtmB+ObvYT0Yb40A4+2ud3qF/lB53c1A1ewF
 /9c4zxxekjHZnn2oooEa
 =oLXf
 -----END PGP SIGNATURE-----

Merge tag 'writeback' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux

Pull writeback tree from Wu Fengguang:
 "Mainly from Jan Kara to avoid iput() in the flusher threads."

* tag 'writeback' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux:
  writeback: Avoid iput() from flusher thread
  vfs: Rename end_writeback() to clear_inode()
  vfs: Move waiting for inode writeback from end_writeback() to evict_inode()
  writeback: Refactor writeback_single_inode()
  writeback: Remove wb->list_lock from writeback_single_inode()
  writeback: Separate inode requeueing after writeback
  writeback: Move I_DIRTY_PAGES handling
  writeback: Move requeueing when I_SYNC set to writeback_sb_inodes()
  writeback: Move clearing of I_SYNC into inode_sync_complete()
  writeback: initialize global_dirty_limit
  fs: remove 8 bytes of padding from struct writeback_control on 64 bit builds
  mm: page-writeback.c: local functions should not be exposed globally
2012-05-28 09:54:45 -07:00
Trond Myklebust 359d7d1c97 NFSv4: update_changeattr does not need to set NFS_INO_REVAL_PAGECACHE
We're already invalidating the data cache, and setting the new change
attribute. Since directories don't care about the i_size field, there
is no need to be forcing any extra revalidation of the page cache.

We do keep the NFS_INO_INVALID_ATTR flag, in order to force an
attribute cache revalidation on stat() calls since we do not
update the mtime and ctime fields.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-28 10:05:47 -04:00
Trond Myklebust f2c1b5100d NFSv4.1: nfs4_reset_session should use nfs4_handle_reclaim_lease_error
The results from a call to nfs4_proc_create_session() should always
be fed into nfs4_handle_reclaim_lease_error, so that we can
handle errors such as NFS4ERR_SEQ_MISORDERED correctly.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-27 14:50:44 -04:00
Trond Myklebust 9f594791dd NFSv4.1: Handle other occurrences of NFS4ERR_CONN_NOT_BOUND_TO_SESSION
Let nfs4_schedule_session_recovery() handle the details of choosing
between resetting the session, and other session related recovery.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-27 14:33:07 -04:00
Trond Myklebust 7c5d725684 NFSv4.1: Handle NFS4ERR_CONN_NOT_BOUND_TO_SESSION in the state manager
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-27 14:33:07 -04:00
Trond Myklebust bf674c8228 NFSv4.1: Handle errors in nfs4_bind_conn_to_session
Ensure that we handle NFS4ERR_DELAY errors separately, and then
let nfs4_recovery_handle_error() handle all other cases.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-27 14:32:06 -04:00
Trond Myklebust 43ac544cb3 NFSv4.1: nfs4_bind_conn_to_session should drain the session
In order to avoid races with other RPC calls that end up setting the
NFS4CLNT_BIND_CONN_TO_SESSION flag.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-27 14:00:08 -04:00
Darrick J. Wong e93376c20b ext4/jbd2: add metadata checksumming to the list of supported features
Activate the metadata checksumming feature by adding it to ext4 and
jbd2's lists of supported features.

Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-05-27 08:12:42 -04:00
Darrick J. Wong c390087591 jbd2: checksum data blocks that are stored in the journal
Calculate and verify checksums of each data block being stored in the journal.

Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-05-27 08:12:12 -04:00
Darrick J. Wong 1f56c5890e jbd2: checksum commit blocks
Calculate and verify the checksum of commit blocks.  In checksum v2,
deprecate most of the checksum v1 commit block checksum fields, since
each block has its own checksum.

Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-05-27 08:10:25 -04:00
Darrick J. Wong 3caa487f53 jbd2: checksum descriptor blocks
Calculate and verify a checksum of each descriptor block.

Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-05-27 08:10:22 -04:00
Darrick J. Wong 42a7106de6 jbd2: checksum revocation blocks
Compute and verify revoke blocks inside the journal.

Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-05-27 08:08:24 -04:00
Darrick J. Wong 4fd5ea43bc jbd2: checksum journal superblock
Calculate and verify a checksum covering the journal superblock.

Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-05-27 08:08:22 -04:00
Darrick J. Wong 01b5adcebb jbd2: Grab a reference to the crc32c driver if necessary
Obtain a reference to the crc32c driver if needed for the v2 checksum.

Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-05-27 07:50:56 -04:00
Darrick J. Wong 25ed6e8a54 jbd2: enable journal clients to enable v2 checksumming
Add in the necessary code so that journal clients can enable the new
journal checksumming features.

Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-05-27 07:48:56 -04:00
Linus Torvalds 36126f8f2e word-at-a-time: make the interfaces truly generic
This changes the interfaces in <asm/word-at-a-time.h> to be a bit more
complicated, but a lot more generic.

In particular, it allows us to really do the operations efficiently on
both little-endian and big-endian machines, pretty much regardless of
machine details.  For example, if you can rely on a fast population
count instruction on your architecture, this will allow you to make your
optimized <asm/word-at-a-time.h> file with that.

NOTE! The "generic" version in include/asm-generic/word-at-a-time.h is
not truly generic, it actually only works on big-endian.  Why? Because
on little-endian the generic algorithms are wasteful, since you can
inevitably do better. The x86 implementation is an example of that.

(The only truly non-generic part of the asm-generic implementation is
the "find_zero()" function, and you could make a little-endian version
of it.  And if the Kbuild infrastructure allowed us to pick a particular
header file, that would be lovely)

The <asm/word-at-a-time.h> functions are as follows:

 - WORD_AT_A_TIME_CONSTANTS: specific constants that the algorithm
   uses.

 - has_zero(): take a word, and determine if it has a zero byte in it.
   It gets the word, the pointer to the constant pool, and a pointer to
   an intermediate "data" field it can set.

   This is the "quick-and-dirty" zero tester: it's what is run inside
   the hot loops.

 - "prep_zero_mask()": take the word, the data that has_zero() produced,
   and the constant pool, and generate an *exact* mask of which byte had
   the first zero.  This is run directly *outside* the loop, and allows
   the "has_zero()" function to answer the "is there a zero byte"
   question without necessarily getting exactly *which* byte is the
   first one to contain a zero.

   If you do multiple byte lookups concurrently (eg "hash_name()", which
   looks for both NUL and '/' bytes), after you've done the prep_zero_mask()
   phase, the result of those can be or'ed together to get the "either
   or" case.

 - The result from "prep_zero_mask()" can then be fed into "find_zero()"
   (to find the byte offset of the first byte that was zero) or into
   "zero_bytemask()" (to find the bytemask of the bytes preceding the
   zero byte).

   The existence of zero_bytemask() is optional, and is not necessary
   for the normal string routines.  But dentry name hashing needs it, so
   if you enable DENTRY_WORD_AT_A_TIME you need to expose it.

This changes the generic strncpy_from_user() function and the dentry
hashing functions to use these modified word-at-a-time interfaces.  This
gets us back to the optimized state of the x86 strncpy that we lost in
the previous commit when moving over to the generic version.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-26 11:33:40 -07:00
Trond Myklebust 32b0131069 NFSv4.1: Don't clobber the seqid if exchange_id returns a confirmed clientid
If the EXCHGID4_FLAG_CONFIRMED_R flag is set, the client is in theory
supposed to already know the correct value of the seqid, in which case
RFC5661 states that it should ignore the value returned.

Also ensure that if the sanity check in nfs4_check_cl_exchange_flags
fails, then we must not change the nfs_client fields.

Finally, clean up the code: we don't need to retest the value of
'status' unless it can change.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-26 14:17:31 -04:00
Trond Myklebust 6624553910 NFSv4.1: Add DESTROY_CLIENTID
Ensure that we destroy our lease on last unmount

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-26 14:17:30 -04:00
Jan Schmidt f29021b29a Btrfs: add tree mod log to fs_info
Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2012-05-26 12:17:54 +02:00
Jan Schmidt 815a51c74a Btrfs: dummy extent buffers for tree mod log
The tree modification log needs two ways to create dummy extent buffers,
once by allocating a fresh one (to rebuild an old root) and once by
cloning an existing one (to make private rewind modifications) to it.

Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2012-05-26 12:17:54 +02:00
Jan Schmidt 64947ec0d1 Btrfs: move struct seq_list to ctree.h
Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2012-05-26 12:17:53 +02:00
Jan Schmidt 5581a51a59 Btrfs: don't set for_cow parameter for tree block functions
Three callers of btrfs_free_tree_block or btrfs_alloc_tree_block passed
parameter for_cow = 1. In fact, these two functions should never mark
their tree modification operations as for_cow, because they can change
the number of blocks referenced by a tree.

Hence, we remove the extra for_cow parameter from these functions and
make them pass a zero down.

Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2012-05-26 12:17:53 +02:00
Jan Schmidt 976b1908d9 Btrfs: look into the extent during find_all_leafs
Before this patch we called find_all_leafs for a data extent, then called
find_all_roots and then looked into the extent to grab the information
we were seeking. This was done without holding the leaves locked to avoid
deadlocks. However, this can obviouly race with concurrent tree
modifications.

Instead, we now look into the extent while we're holding the lock during
find_all_leafs and store this information together with the leaf list.

Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2012-05-26 12:17:52 +02:00
Jan Schmidt d5c88b735f Btrfs: bugfix: ignore the wrong key for indirect tree block backrefs
The key we store with a tree block backref is only a hint. It is set when
the ref is created and can remain correct for a long time. As the tree is
rebalanced, however, eventually the key no longer points to the correct
destination.

With this patch, we change find_parent_nodes to no longer add keys unless it
knows for sure they're correct (e.g. because they're for an extent data
backref). Then when we later encounter a backref ref with no parent and no
key set, we grab the block and take the first key from the block itself.

Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2012-05-26 12:17:51 +02:00
Jan Schmidt dadcaf78b5 Btrfs: bugfix in btrfs_find_parent_nodes
That one has been around since the addition of backref.c. Due to the way we
calculate our slot numbers, after adding inline refs we're missing one keyed
ref unless it's located at the beginning of a new leaf.

Reported-by: Alexander Block <ablock84@googlemail.com>
Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2012-05-26 12:17:51 +02:00
Jan Schmidt cd1b413c5c Btrfs: ulist realloc bugfix
ulist_next gets the pointer to the previously returned element to find the
next element from there. However, when we call ulist_add while iteration
with ulist_next is in progress (ulist explicitly supports this), we can
realloc the ulist internal memory, which makes the pointer to the previous
element useless.

Instead, we now use an iterator parameter that's independent from the
internal pointers.

Reported-by: Alexander Block <ablock84@googlemail.com>
Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2012-05-26 12:17:49 +02:00
Trond Myklebust 2cf047c994 NFSv4.1: Ensure we use the correct credentials for bind_conn_to_session
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Weston Andros Adamson <dros@netapp.com>
2012-05-25 18:02:10 -04:00
Trond Myklebust 848f5bda54 NFSv4.1: Ensure we use the correct credentials for session create/destroy
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-25 18:02:09 -04:00
Trond Myklebust ad24ecfbcd NFSv4.1: Move NFSPROC4_CLNT_BIND_CONN_TO_SESSION to the end of the operations
For backward compatibility with nfs-utils.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Weston Andros Adamson <dros@netapp.com>
2012-05-25 18:02:09 -04:00
Trond Myklebust 89a217360e NFSv4.1: Handle NFS4ERR_SEQ_MISORDERED when confirming the lease
Apparently the patch "NFS: Always use the same SETCLIENTID boot verifier"
is tickling a Linux nfs server bug, and causing a regression: the server
can get into a situation where it keeps replying NFS4ERR_SEQ_MISORDERED
to our CREATE_SESSION request even when we are sending the correct
sequence ID.

Fix this by purging the lease and then retrying.

Reported-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-25 16:17:42 -04:00
Trond Myklebust be0bfed002 NFSv4: When purging the lease, we must clear NFS4CLNT_LEASE_CONFIRM
Otherwise we can end up not sending a new exchange-id/setclientid

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-25 16:17:13 -04:00
Trond Myklebust 2a6ee6aa2f NFSv4: Clean up the error handling for nfs4_reclaim_lease
Try to consolidate the error handling for nfs4_reclaim_lease into
a single function instead of doing a bit here, and a bit there...

Also ensure that NFS4CLNT_PURGE_STATE handles errors correctly.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-25 16:17:13 -04:00
Linus Torvalds ece78b7df7 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull ext2, ext3 and quota fixes from Jan Kara:
 "Interesting bits are:
   - removal of a special i_mutex locking subclass (I_MUTEX_QUOTA) since
     quota code does not need i_mutex anymore in any unusual way.
   - backport (from ext4) of a fix of a checkpointing bug (missing cache
     flush) that could lead to fs corruption on power failure

  The rest are just random small fixes & cleanups."

* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  ext2: trivial fix to comment for ext2_free_blocks
  ext2: remove the redundant comment for ext2_export_ops
  ext3: return 32/64-bit dir name hash according to usage type
  quota: Get rid of nested I_MUTEX_QUOTA locking subclass
  quota: Use precomputed value of sb_dqopt in dquot_quota_sync
  ext2: Remove i_mutex use from ext2_quota_write()
  reiserfs: Remove i_mutex use from reiserfs_quota_write()
  ext4: Remove i_mutex use from ext4_quota_write()
  ext3: Remove i_mutex use from ext3_quota_write()
  quota: Fix double lock in add_dquot_ref() with CONFIG_QUOTA_DEBUG
  jbd: Write journal superblock with WRITE_FUA after checkpointing
  jbd: protect all log tail updates with j_checkpoint_mutex
  jbd: Split updating of journal superblock and marking journal empty
  ext2: do not register write_super within VFS
  ext2: Remove s_dirt handling
  ext2: write superblock only once on unmount
  ext3: update documentation with barrier=1 default
  ext3: remove max_debt in find_group_orlov()
  jbd: Refine commit writeout logic
2012-05-25 08:14:59 -07:00
Linus Torvalds b5f4035adf Features:
* Extend the APIC ops implementation and add IRQ_WORKER vector support so that 'perf' can work properly.
  * Fix self-ballooning code, and balloon logic when booting as initial domain.
  * Move array printing code to generic debugfs
  * Support XenBus domains.
  * Lazily free grants when a domain is dead/non-existent.
  * In M2P code use batching calls
 Bug-fixes:
  * Fix NULL dereference in allocation failure path (hvc_xen)
  * Fix unbinding of IRQ_WORKER vector during vCPU hot-unplug
  * Fix HVM guest resume - we would leak an PIRQ value instead of reusing the existing one.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQEcBAABAgAGBQJPu9MpAAoJEFjIrFwIi8fJaNQH/RylThiO+O+LBpPrO8VRUw+2
 /Io98T7ZK2ggoUeaJx0C8irM0JMFAkxGMcfX3w9fwNt/BTec4s++4JhbN1jYN0da
 6a0PqINo+M8y73So6CBfuJDCunaRLGKVG/ibIO3Y3WAff51/H+DMvO7uYYDAE0aA
 mikyOxnaty0DiG5i4JGDHGmCzDASfK/jgGccZ03m6522mDx5ZIbTzZWONLfz8dqT
 rbxnn9vrNLgEYWuzyLMwW0GymToUtt01xBQvwJLAbhn8lr1WBRBLpxXA+5iYNQrn
 Ri25G7keYJhG4uwZfaHnR+4HTrmhlGzK1Z96dkqpGUaeIcdyWmPMp22VtBBiwG8=
 =uyRr
 -----END PGP SIGNATURE-----

Merge tag 'stable/for-linus-3.5-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen

Pull Xen updates from Konrad Rzeszutek Wilk:
 "Features:
   * Extend the APIC ops implementation and add IRQ_WORKER vector
     support so that 'perf' can work properly.
   * Fix self-ballooning code, and balloon logic when booting as initial
     domain.
   * Move array printing code to generic debugfs
   * Support XenBus domains.
   * Lazily free grants when a domain is dead/non-existent.
   * In M2P code use batching calls
  Bug-fixes:
   * Fix NULL dereference in allocation failure path (hvc_xen)
   * Fix unbinding of IRQ_WORKER vector during vCPU hot-unplug
   * Fix HVM guest resume - we would leak an PIRQ value instead of
     reusing the existing one."

Fix up add-add onflicts in arch/x86/xen/enlighten.c due to addition of
apic ipi interface next to the new apic_id functions.

* tag 'stable/for-linus-3.5-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
  xen: do not map the same GSI twice in PVHVM guests.
  hvc_xen: NULL dereference on allocation failure
  xen: Add selfballoning memory reservation tunable.
  xenbus: Add support for xenbus backend in stub domain
  xen/smp: unbind irqworkX when unplugging vCPUs.
  xen: enter/exit lazy_mmu_mode around m2p_override calls
  xen/acpi/sleep: Enable ACPI sleep via the __acpi_os_prepare_sleep
  xen: implement IRQ_WORK_VECTOR handler
  xen: implement apic ipi interface
  xen/setup: update VA mapping when releasing memory during setup
  xen/setup: Combine the two hypercall functions - since they are quite similar.
  xen/setup: Populate freed MFNs from non-RAM E820 entries and gaps to E820 RAM
  xen/setup: Only print "Freeing XXX-YYY pfn range: Z pages freed" if Z > 0
  xen/gnttab: add deferred freeing logic
  debugfs: Add support to print u32 array in debugfs
  xen/p2m: An early bootup variant of set_phys_to_machine
  xen/p2m: Collapse early_alloc_p2m_middle redundant checks.
  xen/p2m: Allow alloc_p2m_middle to call reserve_brk depending on argument
  xen/p2m: Move code around to allow for better re-usage.
2012-05-24 16:02:08 -07:00
Linus Torvalds ce004178be Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc
Pull sparc changes from David S. Miller:
 "This has the generic strncpy_from_user() implementation architectures
  can now use, which we've been developing on linux-arch over the past
  few days.

  For good measure I ran both a 32-bit and a 64-bit glibc testsuite run,
  and the latter of which pointed out an adjustment I needed to make to
  sparc's user_addr_max() definition.  Linus, you were right, STACK_TOP
  was not the right thing to use, even on sparc itself :-)

  From Sam Ravnborg, we have a conversion of sparc32 over to the common
  alloc_thread_info_node(), since the aspect which originally blocked
  our doing so (sun4c) has been removed."

Fix up trivial arch/sparc/Kconfig and lib/Makefile conflicts.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
  sparc: Fix user_addr_max() definition.
  lib: Sparc's strncpy_from_user is generic enough, move under lib/
  kernel: Move REPEAT_BYTE definition into linux/kernel.h
  sparc: Increase portability of strncpy_from_user() implementation.
  sparc: Optimize strncpy_from_user() zero byte search.
  sparc: Add full proper error handling to strncpy_from_user().
  sparc32: use the common implementation of alloc_thread_info_node()
2012-05-24 15:10:28 -07:00
Linus Torvalds 9978306e31 Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs
Pull XFS update from Ben Myers:

 - Removal of xfsbufd
 - Background CIL flushes have been moved to a workqueue.
 - Fix to xfs_check_page_type applicable to filesystems where
   blocksize < page size
 - Fix for stale data exposure when extsize hints are used.
 - A series of xfs_buf cache cleanups.
 - Fix for XFS_IOC_ALLOCSP
 - Cleanups for includes and removal of xfs_lrw.[ch].
 - Moved all busy extent handling to it's own file so that it is easier
   to merge with userspace.
 - Fix for log mount failure.
 - Fix to enable inode reclaim during quotacheck at mount time.
 - Fix for delalloc quota accounting.
 - Fix for memory reclaim deadlock on agi buffer.
 - Fixes for failed writes and to clean up stale delalloc blocks.
 - Fix to use GFP_NOFS in blkdev_issue_flush
 - SEEK_DATA/SEEK_HOLE support

* 'for-linus' of git://oss.sgi.com/xfs/xfs: (57 commits)
  xfs: add trace points for log forces
  xfs: fix memory reclaim deadlock on agi buffer
  xfs: fix delalloc quota accounting on failure
  xfs: protect xfs_sync_worker with s_umount semaphore
  xfs: introduce SEEK_DATA/SEEK_HOLE support
  xfs: make xfs_extent_busy_trim not static
  xfs: make XBF_MAPPED the default behaviour
  xfs: flush outstanding buffers on log mount failure
  xfs: Properly exclude IO type flags from buffer flags
  xfs: clean up xfs_bit.h includes
  xfs: move xfs_do_force_shutdown() and kill xfs_rw.c
  xfs: move xfs_get_extsz_hint() and kill xfs_rw.h
  xfs: move xfs_fsb_to_db to xfs_bmap.h
  xfs: clean up busy extent naming
  xfs: move busy extent handling to it's own file
  xfs: move xfsagino_t to xfs_types.h
  xfs: use iolock on XFS_IOC_ALLOCSP calls
  xfs: kill XBF_DONTBLOCK
  xfs: kill xfs_read_buf()
  xfs: kill XBF_LOCK
  ...
2012-05-24 14:14:46 -07:00
Trond Myklebust bbafffd293 NFSv4.1: Exchange ID must use GFP_NOFS allocation mode
Exchange ID can be called in a lease reclaim situation, so it
will deadlock if it then tries to write out dirty NFS pages.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-24 16:31:39 -04:00
Weston Andros Adamson a9e64442f1 nfs41: Use BIND_CONN_TO_SESSION for CB_PATH_DOWN*
The state manager can handle SEQ4_STATUS_CB_PATH_DOWN* flags with a
BIND_CONN_TO_SESSION instead of destroying the session and creating a new one.

Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-24 16:26:21 -04:00
Weston Andros Adamson 7c44f1ae4a nfs4.1: add BIND_CONN_TO_SESSION operation
This patch adds the BIND_CONN_TO_SESSION operation which is needed for
upcoming SP4_MACH_CRED work and useful for recovering from broken connections
without destroying the session.

Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-24 16:22:19 -04:00
Andy Adamson d23d61c8d3 NFSv4.1 test the mdsthreshold hint parameters
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-24 16:15:49 -04:00
Andy Adamson 2701d086db NFSv4.1 add nfs_inode book keeping for mdsthreshold
Keep track of the number of bytes read or written via buffered, direct, and
mem-mapped i/o for use by mdsthreshold size_io hints.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-24 16:15:48 -04:00
Andy Adamson 82be417aa3 NFSv4.1 cache mdsthreshold values on OPEN
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-24 16:15:48 -04:00
Andy Adamson 88034c3d88 NFSv4.1 mdsthreshold attribute xdr
We only support one layout type per file system, so one threshold_item4 per
mdsthreshold4.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-24 16:15:47 -04:00
David S. Miller 446969084d kernel: Move REPEAT_BYTE definition into linux/kernel.h
And make sure that everything using it explicitly includes
that header file.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-24 13:10:05 -07:00
Linus Torvalds 28f3d71761 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull more networking updates from David Miller:
 "Ok, everything from here on out will be bug fixes."

1) One final sync of wireless and bluetooth stuff from John Linville.
   These changes have all been in his tree for more than a week, and
   therefore have had the necessary -next exposure.  John was just away
   on a trip and didn't have a change to send the pull request until a
   day or two ago.

2) Put back some defines in user exposed header file areas that were
   removed during the tokenring purge.  From Stephen Hemminger and Paul
   Gortmaker.

3) A bug fix for UDP hash table allocation got lost in the pile due to
   one of those "you got it..  no I've got it.." situations.  :-)

   From Tim Bird.

4) SKB coalescing in TCP needs to have stricter checks, otherwise we'll
   try to coalesce overlapping frags and crash.  Fix from Eric Dumazet.

5) RCU routing table lookups can race with free_fib_info(), causing
   crashes when we deref the device pointers in the route.  Fix by
   releasing the net device in the RCU callback.  From Yanmin Zhang.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (293 commits)
  tcp: take care of overlaps in tcp_try_coalesce()
  ipv4: fix the rcu race between free_fib_info and ip_route_output_slow
  mm: add a low limit to alloc_large_system_hash
  ipx: restore token ring define to include/linux/ipx.h
  if: restore token ring ARP type to header
  xen: do not disable netfront in dom0
  phy/micrel: Fix ID of KSZ9021
  mISDN: Add X-Tensions USB ISDN TA XC-525
  gianfar:don't add FCB length to hard_header_len
  Bluetooth: Report proper error number in disconnection
  Bluetooth: Create flags for bt_sk()
  Bluetooth: report the right security level in getsockopt
  Bluetooth: Lock the L2CAP channel when sending
  Bluetooth: Restore locking semantics when looking up L2CAP channels
  Bluetooth: Fix a redundant and problematic incoming MTU check
  Bluetooth: Add support for Foxconn/Hon Hai AR5BBU22 0489:E03C
  Bluetooth: Fix EIR data generation for mgmt_device_found
  Bluetooth: Fix Inquiry with RSSI event mask
  Bluetooth: improve readability of l2cap_seq_list code
  Bluetooth: Fix skb length calculation
  ...
2012-05-24 11:54:29 -07:00
Tim Bird 31fe62b958 mm: add a low limit to alloc_large_system_hash
UDP stack needs a minimum hash size value for proper operation and also
uses alloc_large_system_hash() for proper NUMA distribution of its hash
tables and automatic sizing depending on available system memory.

On some low memory situations, udp_table_init() must ignore the
alloc_large_system_hash() result and reallocs a bigger memory area.

As we cannot easily free old hash table, we leak it and kmemleak can
issue a warning.

This patch adds a low limit parameter to alloc_large_system_hash() to
solve this problem.

We then specify UDP_HTABLE_SIZE_MIN for UDP/UDPLite hash table
allocation.

Reported-by: Mark Asselstine <mark.asselstine@windriver.com>
Reported-by: Tim Bird <tim.bird@am.sony.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-24 00:28:21 -04:00
Linus Torvalds 644473e9c6 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull user namespace enhancements from Eric Biederman:
 "This is a course correction for the user namespace, so that we can
  reach an inexpensive, maintainable, and reasonably complete
  implementation.

  Highlights:
   - Config guards make it impossible to enable the user namespace and
     code that has not been converted to be user namespace safe.

   - Use of the new kuid_t type ensures the if you somehow get past the
     config guards the kernel will encounter type errors if you enable
     user namespaces and attempt to compile in code whose permission
     checks have not been updated to be user namespace safe.

   - All uids from child user namespaces are mapped into the initial
     user namespace before they are processed.  Removing the need to add
     an additional check to see if the user namespace of the compared
     uids remains the same.

   - With the user namespaces compiled out the performance is as good or
     better than it is today.

   - For most operations absolutely nothing changes performance or
     operationally with the user namespace enabled.

   - The worst case performance I could come up with was timing 1
     billion cache cold stat operations with the user namespace code
     enabled.  This went from 156s to 164s on my laptop (or 156ns to
     164ns per stat operation).

   - (uid_t)-1 and (gid_t)-1 are reserved as an internal error value.
     Most uid/gid setting system calls treat these value specially
     anyway so attempting to use -1 as a uid would likely cause
     entertaining failures in userspace.

   - If setuid is called with a uid that can not be mapped setuid fails.
     I have looked at sendmail, login, ssh and every other program I
     could think of that would call setuid and they all check for and
     handle the case where setuid fails.

   - If stat or a similar system call is called from a context in which
     we can not map a uid we lie and return overflowuid.  The LFS
     experience suggests not lying and returning an error code might be
     better, but the historical precedent with uids is different and I
     can not think of anything that would break by lying about a uid we
     can't map.

   - Capabilities are localized to the current user namespace making it
     safe to give the initial user in a user namespace all capabilities.

  My git tree covers all of the modifications needed to convert the core
  kernel and enough changes to make a system bootable to runlevel 1."

Fix up trivial conflicts due to nearby independent changes in fs/stat.c

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (46 commits)
  userns:  Silence silly gcc warning.
  cred: use correct cred accessor with regards to rcu read lock
  userns: Convert the move_pages, and migrate_pages permission checks to use uid_eq
  userns: Convert cgroup permission checks to use uid_eq
  userns: Convert tmpfs to use kuid and kgid where appropriate
  userns: Convert sysfs to use kgid/kuid where appropriate
  userns: Convert sysctl permission checks to use kuid and kgids.
  userns: Convert proc to use kuid/kgid where appropriate
  userns: Convert ext4 to user kuid/kgid where appropriate
  userns: Convert ext3 to use kuid/kgid where appropriate
  userns: Convert ext2 to use kuid/kgid where appropriate.
  userns: Convert devpts to use kuid/kgid where appropriate
  userns: Convert binary formats to use kuid/kgid where appropriate
  userns: Add negative depends on entries to avoid building code that is userns unsafe
  userns: signal remove unnecessary map_cred_ns
  userns: Teach inode_capable to understand inodes whose uids map to other namespaces.
  userns: Fail exec for suid and sgid binaries with ids outside our user namespace.
  userns: Convert stat to return values mapped from kuids and kgids
  userns: Convert user specfied uids and gids in chown into kuids and kgid
  userns: Use uid_eq gid_eq helpers when comparing kuids and kgids in the vfs
  ...
2012-05-23 17:42:39 -07:00
Linus Torvalds 468f4d1a85 Power management updates for 3.5
* Implementation of opportunistic suspend (autosleep) and user space interface
   for manipulating wakeup sources.
 
 * Hibernate updates from Bojan Smojver and Minho Ban.
 
 * Updates of the runtime PM core and generic PM domains framework related to
   PM QoS.
 
 * Assorted fixes.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.18 (GNU/Linux)
 
 iQIcBAABAgAGBQJPu+jwAAoJEKhOf7ml8uNsOw0P/0w1FqXD64a1laE43JIlBe9w
 yHEcLHc9MXN+8lS0XQ6jFiL/VC3U5Sj7Ro+DFKcL2MWX//dfDcZcwA9ep/qh4tHV
 tJ987IijdWqJV14pde3xQafhp/9i12rArLxns7S5fzkdfVk0iDjhZZaZy4afFJYM
 SuCsDhCwWefZh89+oLikByiFPnhW+f2ZC9YQeokBM/XvZLtxmOiVfL6duloT/Cr+
 58jkrJ8xz/5kmmN4bXM4Wlpf9ZIYFXbvtbKrq3GZOXc+LpNKlWQyFgg/pIuxBewC
 uSgsNXXV0LFDi5JfER/8l9MMLtJwwc4VHzpLvMnRv+GtwO2/FKIIr9Fcv000IL2N
 0/Ppr52M7XpRruM/k+YroUQ4F1oBX6HB4e3rwqC+XG6n5bwn/Jc7kdy7aUojqNLG
 Nlr5f0vBjLTSF66Jnel71Bn+gbA1ogER7E+esSTMpyX+RgGJAUVt5oX9IjbXl3PI
 bk8xW1csSRxBI2NkFOd9EM3vMzdGc5uu+iOoy7iBvcAK0AEfo2Ml9YuSVFQeqAu0
 A96MUW155A+GKMC7I/LK8pTgMvYDedWhVW9uyXpMRjwdFC5/ywZU1aM00tL9HMpG
 pzHOFJgsYrf/6VCV8BwqgudRYd0K5EPSGeITCg973os/XzJIOCfJuy+Pn5V/F0ew
 lTbi8ipQD0Hh8A/Xt0QB
 =Q2vo
 -----END PGP SIGNATURE-----

Merge tag 'pm-for-3.5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management updates from Rafael Wysocki:

 - Implementation of opportunistic suspend (autosleep) and user space
   interface for manipulating wakeup sources.

 - Hibernate updates from Bojan Smojver and Minho Ban.

 - Updates of the runtime PM core and generic PM domains framework
   related to PM QoS.

 - Assorted fixes.

* tag 'pm-for-3.5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (25 commits)
  epoll: Fix user space breakage related to EPOLLWAKEUP
  PM / Domains: Make it possible to add devices to inactive domains
  PM / Hibernate: Use get_gendisk to verify partition if resume_file is integer format
  PM / Domains: Fix computation of maximum domain off time
  PM / Domains: Fix link checking when add subdomain
  PM / Sleep: User space wakeup sources garbage collector Kconfig option
  PM / Sleep: Make the limit of user space wakeup sources configurable
  PM / Documentation: suspend-and-cpuhotplug.txt: Fix typo
  PM / Domains: Cache device stop and domain power off governor results, v3
  PM / Domains: Make device removal more straightforward
  PM / Sleep: Fix a mistake in a conditional in autosleep_store()
  epoll: Add a flag, EPOLLWAKEUP, to prevent suspend while epoll events are ready
  PM / QoS: Create device constraints objects on notifier registration
  PM / Runtime: Remove device fields related to suspend time, v2
  PM / Domains: Rework default domain power off governor function, v2
  PM / Domains: Rework default device stop governor function, v2
  PM / Sleep: Add user space interface for manipulating wakeup sources, v3
  PM / Sleep: Add "prevent autosleep time" statistics to wakeup sources
  PM / Sleep: Implement opportunistic sleep, v2
  PM / Sleep: Add wakeup_source_activate and wakeup_source_deactivate tracepoints
  ...
2012-05-23 14:07:06 -07:00
Trond Myklebust 54ac471c83 NFS: Add memory barriers to the nfs_client->cl_cons_state initialisation
Ensure that a process that uses the nfs_client->cl_cons_state test
for whether the initialisation process is finished does not read
stale data.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-23 15:24:59 -04:00
Trond Myklebust 4697bd5e94 NFSv4: Fix a race in the net namespace mount notification
Since the struct nfs_client gets added to the global nfs_client_list
before it is initialised, it is possible that rpc_pipefs_event can
end up trying to create idmapper entries on such a thing.

The solution is to have the mount notification wait for the
initialisation of each nfs_client to complete, and then to
skip any entries for which the it failed.

Reported-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Acked-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
2012-05-23 15:21:13 -04:00
Trond Myklebust 7b38c3682c NFSv4.1: Fix session initialisation races
Session initialisation is not complete until the lease manager
has run. We need to ensure that both nfs4_init_session and
nfs4_init_ds_session do so, and that they check for any resulting
errors in clp->cl_cons_state.

Only after this is done, can nfs4_ds_connect check the contents
of clp->cl_exchange_flags.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Andy Adamson <andros@netapp.com>
2012-05-23 15:20:57 -04:00
Linus Torvalds ec0d7f18ab Merge branch 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull fpu state cleanups from Ingo Molnar:
 "This tree streamlines further aspects of FPU handling by eliminating
  the prepare_to_copy() complication and moving that logic to
  arch_dup_task_struct().

  It also fixes the FPU dumps in threaded core dumps, removes and old
  (and now invalid) assumption plus micro-optimizes the exit path by
  avoiding an FPU save for dead tasks."

Fixed up trivial add-add conflict in arch/sh/kernel/process.c that came
in because we now do the FPU handling in arch_dup_task_struct() rather
than the legacy (and now gone) prepare_to_copy().

* 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, fpu: drop the fpu state during thread exit
  x86, xsave: remove thread_has_fpu() bug check in __sanitize_i387_state()
  coredump: ensure the fpu state is flushed for proper multi-threaded core dump
  fork: move the real prepare_to_copy() users to arch_dup_task_struct()
2012-05-23 10:59:07 -07:00
Shirish Pargaonkar 2c0c2a08be cifs: fix oops while traversing open file list (try #4)
While traversing the linked list of open file handles, if the identfied
file handle is invalid, a reopen is attempted and if it fails, we
resume traversing where we stopped and cifs can oops while accessing
invalid next element, for list might have changed.

So mark the invalid file handle and attempt reopen if no
valid file handle is found in rest of the list.
If reopen fails, move the invalid file handle to the end of the list
and start traversing the list again from the begining.
Repeat this four times before giving up and returning an error if
file reopen keeps failing.

Cc: <stable@vger.kernel.org>
Signed-off-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2012-05-23 12:33:18 +04:00
Sedat Dilek ea4b574028 cifs: Fix comment as d_alloc_root() is replaced by d_make_root()
For more details see <file: Documentation/filesystems/porting>.

Signed-off-by: Sedat Dilek <sedat.dilek@gmail.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2012-05-23 12:33:16 +04:00
Steve French 1080ef758f CIFS: Introduce SMB2 mounts as vers=2.1
As with Linux nfs client, which uses "nfsvers=" or "vers=" to
indicate which protocol to use for mount, specifying

"vers=2.1"

will force an SMB2 mount. When vers is not specified CIFS is used

"vers=1"

We can eventually autonegotiate down from SMB2 to CIFS
when SMB2 is stable enough to make it the default, but this
is for the future. At that time we could also implement a
"maxprotocol" mount option as smbclient and Samba have today,
but that would be premature until SMB2 is stable.

Intially the SMB2 Kconfig option will depend on "BROKEN"
until the merge is complete, and then be "EXPERIMENTAL"
When it is no longer experimental we can consider changing
the default protocol to attempt first.

Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Acked-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2012-05-23 12:33:15 +04:00
Steve French 675f36fb1d CIFS: Introduce SMB2 Kconfig option
SMB2 is the followon to the CIFS (and SMB) protocols
and the default for Windows since Windows Vista, and also
now implemented by various non-Windows servers. SMB2
is more secure, has various performance advantages, including
larger i/o sizes, flow control, better caching model and more.
SMB2 also resolves some scalability limits in the CIFS
protocol and adds many new features while being much
simpler (only a few dozen commands instead of hundreds)
and since the protocol is clearer it is also more consistently
implemented across servers and thus easier to optimize.

After much discussion with Jeff Layton, Jeremy Allison
and others at Connectathon, we decided to move the SMB2
code from a distinct .ko and fstype into distinct
C files that optionally build in cifs.ko. As a result
the Kconfig gets simpler.

To avoid destabilizing CIFS, the SMB2 code is going
to be moved into its own experimental CONFIG_CIFS_SMB2 ifdef
as it is merged and rereviewed. The changes to stable
CIFS (builds with the SMB2 ifdef off) are expected to be
fairly small.

Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2012-05-23 12:33:14 +04:00
Pavel Shilovsky 452757897a CIFS: Move add/set_credits and get_credits_field to ops structure
Acked-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2012-05-23 12:33:12 +04:00
Pavel Shilovsky 8aa26f3ed8 CIFS: Move protocol specific demultiplex thread calls to ops struct
Acked-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com>
Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2012-05-23 12:33:11 +04:00
Pavel Shilovsky eb37871118 CIFS: Move protocol specific part from cifs_readv_receive to ops struct
Acked-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2012-05-23 12:33:09 +04:00
Pavel Shilovsky 1887f60103 CIFS: Move header_size/max_header_size to ops structure
Acked-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2012-05-23 12:33:08 +04:00
Pavel Shilovsky 082d0642c6 CIFS: Move protocol specific part from SendReceive2 to ops struct
Acked-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2012-05-23 12:32:57 +04:00
Darrick J. Wong 8f888ef846 jbd2: change disk layout for metadata checksumming
Define flags and allocate space in on-disk journal structures to support
checksumming of journal metadata.

Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-05-22 22:43:41 -04:00
Linus Torvalds 6101167727 dlm for 3.5
This set includes some minor fixes and improvements.
 The one large patch addresses the special "nodir" mode,
 which has been a long neglected proof of concept, but
 with these fixes seems to be quite usable.  It allows
 the resource master to be assigned statically instead of
 dynamically, which can improve performance if there is
 little locality and most resources are shared.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJPu/MlAAoJEDgbc8f8gGmq860P/0o+tYG2pAUz87WnKg92cGwm
 ajaI78ydY6qOjndcEjbgdX6uWqVQ7f/OKo3drzVH8KFQ67eiaXC4wv2xTL3aymbX
 2Ua55oiVsW+k9d9yK5Dzfa4qAlR5QPV1WEAnoVkiEDNoiGCGecjmVebhK1/Sb5Lu
 1gaIJ3C+3L1ngfAzpfeB+7LwuVB36UlIyBrvPOj6yWiSDgpPaVbTrEU0NaDDDDIi
 oo7tTiqivCZf/GH+ZcIjPE/LBen/lVqXSDU2YShiac/ErRfpRk9rnDFIUeN2nYPd
 JwPjzutFWM+N6HIA2RCBXKo7FkK2rvYXw84/RVMvA4goEH/Qu8yDtBww20BmvFYY
 3guU1udka0/NR7/ap98Btdqsvqco6R2X/rpzx8y1eD1jzUvb6El6yg3PM1Qvd8zQ
 72aVzcdgAI4qtEAVziy5X4omNeQ6a55sUYXlCcvkiwZJQdPzkDuzntC28q3bgJva
 QD0ugX7ltBpHuZZZb2tbBN9hfMqyo7gneaY2OoGVCTb1U9ibb5JgfZOswTC2gQsE
 17vykdL5owQ8bbBj2tkRQiJ8dZoxn23hV+sZrvLm3TR8xF4oJtDqUdRs9K7iX8It
 YxTTCL1LmxHRFG/0Cy2l7VhoqkIKsoVFdavW7pivFNkzp/yQNHk4r2iJWhR9YArV
 qaE2HqIxJsev/B/lBPyo
 =mHOh
 -----END PGP SIGNATURE-----

Merge tag 'dlm-3.5' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm

Pull dlm updates from David Teigland:
 "This set includes some minor fixes and improvements.  The one large
  patch addresses the special "nodir" mode, which has been a long
  neglected proof of concept, but with these fixes seems to be quite
  usable.  It allows the resource master to be assigned statically
  instead of dynamically, which can improve performance if there is
  little locality and most resources are shared."

* tag 'dlm-3.5' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm:
  dlm: NULL dereference on failure in kmem_cache_create()
  gfs2: fix recovery during unmount
  dlm: fixes for nodir mode
  dlm: improve error and debug messages
  dlm: avoid unnecessary search in search_rsb
  dlm: limit rcom debug messages
  dlm: fix waiter recovery
  dlm: prevent connections during shutdown
2012-05-22 19:31:38 -07:00
Linus Torvalds 6133308ad1 UBIFS:
* Always support xattrs    (remove the Kconfig option)
    * Always support debugging (remove the Kconfig option)
    * A fix for a memory leak on error path
    * A number of clean-ups
 UBI:
    * Always support debugging (remove the Kconfig option)
    * Remove "data type" hint support
    * Huge amount of renames to prepare for the fastmap wor
    * A lot of clean-ups
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJPuzTxAAoJECmIfjd9wqK0D/AP/iNOnYWnYZmmO18jDM48kKt/
 Jp7VTAE0l7DBUDxtiIthq4c7YxIE1o1bN9gMmvzZibvwIrZcoAOnQpeL96s1Bc9J
 t0aGm8ONvrtuyFeyxPC0aplWgqWQ49qDLGV/lIVJ+BSGmXMeU4giUIXqbsjyCPR4
 YVJJw6rLTC10EhuAUs99keJxxuN5ZMrCB8y47fD+bkalVxgqNh9JNkKabyjevt5C
 AERVWnP20hnEcwnbQWMHueGWiaqFeesTytNOy6heRi0uL3bNy5nrol7AFXKqnDc9
 OpSkApH6SCO3C8X/bIep2bL9kKiW1LpClxgDIF6p7lj2t2ToPn6PZJbP60zSHQPb
 0bgy1SzHccF3ihIMgCdOXYZ5EomBgKZyDyU6Ec+gAttE00ZbIigNmjFmukwMhO89
 I0bGvjQdKFAFSzo+ffm8xNfYjmmNfB+edLkPaVttjMWAbQ4V831ZPDT07Q11W4TQ
 2p2NDKTps3etbtkemZ/Cm1jeEWI3KuogrFhyDhpcgXc7pxlJbvMg+tt22FusoQ8T
 VPGGT+WhmXfF0ZG/gurI69k8opj4BUhm4EfGL6pGEoUMe1nGp2pSUNv5Kwby1wau
 1wElJt2qO9xdjJ4QlLc+Ux1vm8rCS1iQst9plUX1BZt2bKja7tZaW7uu4hGKqe5u
 UwrosuYcmS1Ei1Rs7Sqz
 =+6Qi
 -----END PGP SIGNATURE-----

Merge tag 'upstream-3.5-rc1' of git://git.infradead.org/linux-ubifs

Pull UBI and UBIFS updates from Artem Bityutskiy:

UBIFS:
   * Always support xattrs    (remove the Kconfig option)
   * Always support debugging (remove the Kconfig option)
   * A fix for a memory leak on error path
   * A number of clean-ups
UBI:
   * Always support debugging (remove the Kconfig option)
   * Remove "data type" hint support
   * Huge amount of renames to prepare for the fastmap wor
   * A lot of clean-ups

* tag 'upstream-3.5-rc1' of git://git.infradead.org/linux-ubifs: (54 commits)
  UBI: modify ubi_wl_flush function to clear work queue for a lnum
  UBI: introduce UBI_ALL constant
  UBI: add lnum and vol_id to struct ubi_work
  UBI: add volume id struct ubi_ainf_peb
  UBI: add in hex the value for UBI_INTERNAL_VOL_START to comment
  UBI: rename scan.c to attach.c
  UBI: remove scan.h
  UBI: rename UBI_SCAN_UNKNOWN_EC
  UBI: move and rename attach_by_scanning
  UBI: rename _init_scan functions
  UBI: amend comments after all the renamings
  UBI: rename ubi_scan_leb_slab
  UBI: rename ubi_scan_move_to_list
  UBI: rename ubi_scan_destroy_ai
  UBI: rename ubi_scan_get_free_peb
  UBI: rename ubi_scan_rm_volume
  UBI: rename ubi_scan_find_av
  UBI: rename ubi_scan_add_used
  UBI: remove unused function
  UBI: make ubi_scan_erase_peb static and rename
  ...
2012-05-22 19:30:27 -07:00
Linus Torvalds e8650a0823 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
Pull trivial updates from Jiri Kosina:
 "As usual, it's mostly typo fixes, redundant code elimination and some
  documentation updates."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (57 commits)
  edac, mips: don't change code that has been removed in edac/mips tree
  xtensa: Change mail addresses of Hannes Weiner and Oskar Schirmer
  lib: Change mail address of Oskar Schirmer
  net: Change mail address of Oskar Schirmer
  arm/m68k: Change mail address of Sebastian Hess
  i2c: Change mail address of Oskar Schirmer
  net: Fix tcp_build_and_update_options comment in struct tcp_sock
  atomic64_32.h: fix parameter naming mismatch
  Kconfig: replace "--- help ---" with "---help---"
  c2port: fix bogus Kconfig "default no"
  edac: Fix spelling errors.
  qla1280: Remove redundant NULL check before release_firmware() call
  remoteproc: remove redundant NULL check before release_firmware()
  qla2xxx: Remove redundant NULL check before release_firmware() call.
  aic94xx: Get rid of redundant NULL check before release_firmware() call
  tehuti: delete redundant NULL check before release_firmware()
  qlogic: get rid of a redundant test for NULL before call to release_firmware()
  bna: remove redundant NULL test before release_firmware()
  tg3: remove redundant NULL test before release_firmware() call
  typhoon: get rid of redundant conditional before all to release_firmware()
  ...
2012-05-22 19:22:50 -07:00
Linus Torvalds fb09bafda6 Staging tree pull request for 3.5-rc1
Here is the big staging tree pull request for the 3.5-rc1 merge window.
 
 Loads of changes here, and we just narrowly added more lines than we
 added:
  622 files changed, 28356 insertions(+), 26059 deletions(-)
 
 But, good news is that there is a number of subsystems that moved out of
 the staging tree, to their respective "real" portions of the kernel.
 
 Code that moved out was:
 	- iio core code
 	- mei driver
 	- vme core and bridge drivers
 
 There was one broken network driver that moved into staging as a step
 before it is removed from the tree (pc300), and there was a few new
 drivers added to the tree:
 	- new iio drivers
 	- gdm72xx wimax USB driver
 	- ipack subsystem and 2 drivers
 
 All of the movements around have acks from the various subsystem
 maintainers, and all of this has been in the linux-next tree for a
 while.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.18 (GNU/Linux)
 
 iEYEABECAAYFAk+7q8MACgkQMUfUDdst+ymjogCguo8fANFVlPWeZGeoBTL+aQfQ
 yTkAoLE0codmh+2SvhulYgyU1Wh6ZDK2
 =nJ2F
 -----END PGP SIGNATURE-----

Merge tag 'staging-3.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging

Pull staging tree changes from Greg Kroah-Hartman:
 "Here is the big staging tree pull request for the 3.5-rc1 merge
  window.

  Loads of changes here, and we just narrowly added more lines than we
  added:
   622 files changed, 28356 insertions(+), 26059 deletions(-)

  But, good news is that there is a number of subsystems that moved out
  of the staging tree, to their respective "real" portions of the
  kernel.

  Code that moved out was:
	- iio core code
	- mei driver
	- vme core and bridge drivers

  There was one broken network driver that moved into staging as a step
  before it is removed from the tree (pc300), and there was a few new
  drivers added to the tree:
	- new iio drivers
	- gdm72xx wimax USB driver
	- ipack subsystem and 2 drivers

  All of the movements around have acks from the various subsystem
  maintainers, and all of this has been in the linux-next tree for a
  while.

  Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>"

Fixed up various trivial conflicts, along with a non-trivial one found
in -next and pointed out by Olof Johanssen: a clean - but incorrect -
merge of the arch/arm/boot/dts/at91sam9g20.dtsi file.  Fix up manually
as per Stephen Rothwell.

* tag 'staging-3.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: (536 commits)
  Staging: bcm: Remove two unused variables from Adapter.h
  Staging: bcm: Removes the volatile type definition from Adapter.h
  Staging: bcm: Rename all "INT" to "int" in Adapter.h
  Staging: bcm: Fix warning: __packed vs. __attribute__((packed)) in Adapter.h
  Staging: bcm: Correctly format all comments in Adapter.h
  Staging: bcm: Fix all whitespace issues in Adapter.h
  Staging: bcm: Properly format braces in Adapter.h
  Staging: ipack/bridges/tpci200: remove unneeded casts
  Staging: ipack/bridges/tpci200: remove TPCI200_SHORTNAME constant
  Staging: ipack: remove board_name and bus_name fields from struct ipack_device
  Staging: ipack: improve the register of a bus and a device in the bus.
  staging: comedi: cleanup all the comedi_driver 'detach' functions
  staging: comedi: remove all 'default N' in Kconfig
  staging: line6/config.h: Delete unused header
  staging: gdm72xx depends on NET
  staging: gdm72xx: Set up parent link in sysfs for gdm72xx devices
  staging: drm/omap: initial dmabuf/prime import support
  staging: drm/omap: dmabuf/prime mmap support
  pstore/ram: Add ECC support
  pstore/ram: Switch to persistent_ram routines
  ...
2012-05-22 16:34:21 -07:00
Linus Torvalds 5d4e2d08e7 Driver core pull for 3.5-rc1
Here's the driver core, and other driver subsystems, pull request for
 the 3.5-rc1 merge window.
 
 Outside of a few minor driver core changes, we ended up with the
 following different subsystem and core changes as well, due to
 interdependancies on the driver core:
  - hyperv driver updates
  - drivers/memory being created and some drivers moved into it
  - extcon driver subsystem created out of the old Android staging switch
    driver code
  - dynamic debug updates
  - printk rework, and /dev/kmsg changes
 
 All of this has been tested in the linux-next releases for a few weeks
 with no reported problems.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.18 (GNU/Linux)
 
 iEYEABECAAYFAk+7q28ACgkQMUfUDdst+ykXmwCfcPASzC+/bDkuqdWsqzxlWZ7+
 VOQAnAriySv397St36J6Hz5bMQZwB1Yq
 =SQc+
 -----END PGP SIGNATURE-----

Merge tag 'driver-core-3.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core updates from Greg Kroah-Hartman:
 "Here's the driver core, and other driver subsystems, pull request for
  the 3.5-rc1 merge window.

  Outside of a few minor driver core changes, we ended up with the
  following different subsystem and core changes as well, due to
  interdependancies on the driver core:
   - hyperv driver updates
   - drivers/memory being created and some drivers moved into it
   - extcon driver subsystem created out of the old Android staging
     switch driver code
   - dynamic debug updates
   - printk rework, and /dev/kmsg changes

  All of this has been tested in the linux-next releases for a few weeks
  with no reported problems.

  Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>"

Fix up conflicts in drivers/extcon/extcon-max8997.c where git noticed
that a patch to the deleted drivers/misc/max8997-muic.c driver needs to
be applied to this one.

* tag 'driver-core-3.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (90 commits)
  uio_pdrv_genirq: get irq through platform resource if not set otherwise
  memory: tegra{20,30}-mc: Remove empty *_remove()
  printk() - isolate KERN_CONT users from ordinary complete lines
  sysfs: get rid of some lockdep false positives
  Drivers: hv: util: Properly handle version negotiations.
  Drivers: hv: Get rid of an unnecessary check in vmbus_prep_negotiate_resp()
  memory: tegra{20,30}-mc: Use dev_err_ratelimited()
  driver core: Add dev_*_ratelimited() family
  Driver Core: don't oops with unregistered driver in driver_find_device()
  printk() - restore prefix/timestamp printing for multi-newline strings
  printk: add stub for prepend_timestamp()
  ARM: tegra30: Make MC optional in Kconfig
  ARM: tegra20: Make MC optional in Kconfig
  ARM: tegra30: MC: Remove unnecessary BUG*()
  ARM: tegra20: MC: Remove unnecessary BUG*()
  printk: correctly align __log_buf
  ARM: tegra30: Add Tegra Memory Controller(MC) driver
  ARM: tegra20: Add Tegra Memory Controller(MC) driver
  printk() - restore timestamp printing at console output
  printk() - do not merge continuation lines of different threads
  ...
2012-05-22 16:02:13 -07:00
Chuck Lever acdeb69d9c NFS: EXCHANGE_ID should save the server major and minor ID
Save the server major and minor ID results from EXCHANGE_ID, as they
are needed for detecting server trunking.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-22 16:45:48 -04:00
Chuck Lever 4bf590e08f NFS: Add nfs_client behavior flags
"noresvport" and "discrtry" can be passed to nfs_create_rpc_client()
by setting flags in the passed-in nfs_client.  This change makes it
easy to add new flags.

Note that these settings are now "sticky" over the lifetime of a
struct nfs_client, and may even be copied when an nfs_client is
cloned.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-22 16:45:47 -04:00
Chuck Lever 8cab4c390b NFS: Refactor nfs_get_client(): initialize nfs_client
Clean up: Continue to rationalize the locking in nfs_get_client() by
moving the logic that handles the case where a matching server IP
address is not found.

When we support server trunking detection, client initialization may
return a different nfs_client struct than was passed to it.  Change
the synopsis of the init_client methods to return an nfs_client.

The client initialization logic in nfs_get_client() is not much more
than a wrapper around ->init_client.  It's simpler to keep the little
bits of error handling in the version-specific init_client methods.

No behavior change is expected.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-22 16:45:47 -04:00
Chuck Lever f411703adc NFS: Refactor nfs_get_client(): add nfs_found_client()
Clean up: Code that takes and releases nfs_client_lock remains in
nfs_get_client().  Logic that handles a pre-existing nfs_client is
moved to a separate function.

No behavior change is expected.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-22 16:45:46 -04:00
Chuck Lever f092075dd3 NFS: Always use the same SETCLIENTID boot verifier
Currently our NFS client assigns a unique SETCLIENTID boot verifier
for each server IP address it knows about.  It's set to CURRENT_TIME
when the struct nfs_client for that server IP is created.

During the SETCLIENTID operation, our client also presents an
nfs_client_id4 string to servers, as an identifier on which the server
can hang all of this client's NFSv4 state.  Our client's
nfs_client_id4 string is unique for each server IP address.

An NFSv4 server is obligated to wipe all NFSv4 state associated with
an nfs_client_id4 string when the client presents the same
nfs_client_id4 string along with a changed SETCLIENTID boot verifier.

When our client unmounts the last of a server's shares, it destroys
that server's struct nfs_client.  The next time the client mounts that
NFS server, it creates a fresh struct nfs_client with a fresh boot
verifier.  On seeing the fresh verifer, the server wipes any previous
NFSv4 state associated with that nfs_client_id4.

However, NFSv4.1 clients are supposed to present the same
nfs_client_id4 string to all servers.  And, to support Transparent
State Migration, the same nfs_client_id4 string should be presented
to all NFSv4.0 servers so they recognize that migrated state for this
client belongs with state a server may already have for this client.
(This is known as the Uniform Client String model).

If the nfs_client_id4 string is the same but the boot verifier changes
for each server IP address, SETCLIENTID and EXCHANGE_ID operations
from such a client could unintentionally result in a server wiping a
client's previously obtained lease.

Thus, if our NFS client is going to use a fixed nfs_client_id4 string,
either for NFSv4.0 or NFSv4.1 mounts, our NFS client should use a
boot verifier that does not change depending on server IP address.
Replace our current per-nfs_client boot verifier with a per-nfs_net
boot verifier.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-22 16:45:46 -04:00
Chuck Lever 2c820d9a97 NFS: Force server to drop NFSv4 state
nfs4_reset_all_state() refreshes the boot verifier a server sees to
trigger that server to wipe this client's state.  This function is
invoked when an NFSv4.1 server reports that it has revoked some or
all of a client's NFSv4 state.

To facilitate server trunking discovery, we will eventually want to
move the cl_boot_time field to a more global structure.  The Uniform
Client String model (and specifically, server trunking detection)
requires that all servers see the same boot verifier until the client
actually does reboot, and not a fresh verifier every time the client
unmounts and remounts the server.

Without the cl_boot_time field, however, nfs4_reset_all_state() will
have to find some other way to force the server to purge the client's
NFSv4 state.

Because these verifiers are opaque (ie, the server doesn't know or
care that they happen to be timestamps), we can force the server
to wipe NFSv4 state by updating the boot verifier as we do now, then
immediately afterwards establish a fresh client ID using the old boot
verifier again.

Hopefully there are no extra paranoid server implementations that keep
track of the client's boot verifiers and prevent clients from reusing
a previous one.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-22 16:45:45 -04:00
Chuck Lever ce1c8fc12d NFS: Remove nfs_unique_id
Clean up:  this structure is unused.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-22 16:45:45 -04:00
Chuck Lever 177313f149 NFS: Clean up return code checking in nfs4_proc_exchange_id()
Clean up: update to use matching types in "if" expressions.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-22 16:45:44 -04:00
Chuck Lever 73ea666c2b NFS: Use proper naming conventions for the nfs_client.net field
Clean up:  When naming fields and data types, follow established
conventions to facilitate accurate grep/cscope searches.

Introduced by commit e50a7a1a "NFS: make NFS client allocated per
network namespace context," Tue Jan 10, 2012.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-22 16:45:44 -04:00
Chuck Lever 591555465e NFS: Use proper naming conventions for nfs_client.impl_id field
Clean up:  When naming fields and data types, follow established
conventions to facilitate accurate grep/cscope searches.

Additionally, for consistency, move the impl_id field into the NFSv4-
specific part of the nfs_client, and free that memory in the logic
that shuts down NFSv4 nfs_clients.

Introduced by commit 7d2ed9ac "NFSv4: parse and display server
implementation ids," Fri Feb 17, 2012.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-22 16:45:43 -04:00
Chuck Lever 79d4e1f0d8 NFS: Use proper naming conventions for NFSv4.1 server scope fields
Clean up:  When naming fields and data types, follow established
conventions to facilitate accurate grep/cscope searches.

Additionally, for consistency, move the scope field into the NFSv4-
specific part of the nfs_client, and free that memory in the logic
that shuts down NFSv4 nfs_clients.

Introduced by commit 99fe60d0 "nfs41: exchange_id operation", April
1 2009.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-22 16:45:43 -04:00
Chuck Lever e3c0fb7ef5 NFS: Add NFSDBG_STATE
fs/nfs/nfs4state.c does not yet have any dprintk() call sites, and I'm
about to introduce some.  We will need a new flag for enabling them.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-22 16:45:42 -04:00
Chuck Lever c3607282b4 NFS: Don't swap bytes in nfs4_construct_boot_verifier()
The SETCLIENTID boot verifier is opaque to NFSv4 servers, thus there
is no requirement for byte swapping before the client puts the
verifier on the wire.

This treatment is similar to other timestamp-based verifiers.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-22 16:45:42 -04:00
Bryan Schumaker 497826af60 NFS: Fix compiler warnings
The "struct inode *inode" was only used in a dprintk, so compiling with
CONFIG_SUNRPC_DEBUG off triggers a warning.  To get around this, I
remove the "struct inode *inode" variable and instead change the
dprintk()s to use hdr->inode instead.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-22 16:43:04 -04:00
Andy Adamson bd4aeffb5b NFSv4.1 skip rpc_call_done only on disconnected DS slot_table_waitq tasks
We reset all I/O on a disconnected data server through the pgio layer indicated
by the NFS_IOHDR_REDO flag.

Differentiate between on-the-wire tasks returning with an error which must
call rpc_call_done and tasks woken from the data server slot_table_waitq
waiting for a session slot with a status of zero which call rpc_exit in
rpc_prepare and need to skip rpc_call_done.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-22 16:42:42 -04:00
Andy Adamson 996074cb8c NFSv4.1 Just use nfs_put_client in filelayout release
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-22 16:42:37 -04:00
Andy Adamson d42e78737c NFSv4.1 fix null state reference in filelayout_async_handle_error
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-22 16:42:28 -04:00
Trond Myklebust 53b8ee3464 NFSv4.1: Fix a bad reference count issue in the pNFS commit code
filelayout_scan_commit_lists needs to bump the reference count on
the struct nfs_page just like nfs_scan_commit_list().

Reported-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-22 16:36:27 -04:00
Rafael J. Wysocki 8714c8d74d Merge branch 'pm-sleep'
* pm-sleep:
  epoll: Fix user space breakage related to EPOLLWAKEUP
2012-05-22 20:57:19 +02:00
Rafael J. Wysocki a8159414d7 epoll: Fix user space breakage related to EPOLLWAKEUP
Commit 4d7e30d (epoll: Add a flag, EPOLLWAKEUP, to prevent
suspend while epoll events are ready) caused some applications to
malfunction, because they set the bit corresponding to the new
EPOLLWAKEUP flag in their eventpoll flags and they don't have the
new CAP_EPOLLWAKEUP capability.

To prevent that from happening, change epoll_ctl() to clear
EPOLLWAKEUP in epds.events if the caller doesn't have the
CAP_EPOLLWAKEUP capability instead of failing and returning an
error code, which allows the affected applications to function
normally.

Reported-and-tested-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2012-05-22 20:57:06 +02:00
Linus Torvalds cb60e3e65c Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull security subsystem updates from James Morris:
 "New notable features:
   - The seccomp work from Will Drewry
   - PR_{GET,SET}_NO_NEW_PRIVS from Andy Lutomirski
   - Longer security labels for Smack from Casey Schaufler
   - Additional ptrace restriction modes for Yama by Kees Cook"

Fix up trivial context conflicts in arch/x86/Kconfig and include/linux/filter.h

* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (65 commits)
  apparmor: fix long path failure due to disconnected path
  apparmor: fix profile lookup for unconfined
  ima: fix filename hint to reflect script interpreter name
  KEYS: Don't check for NULL key pointer in key_validate()
  Smack: allow for significantly longer Smack labels v4
  gfp flags for security_inode_alloc()?
  Smack: recursive tramsmute
  Yama: replace capable() with ns_capable()
  TOMOYO: Accept manager programs which do not start with / .
  KEYS: Add invalidation support
  KEYS: Do LRU discard in full keyrings
  KEYS: Permit in-place link replacement in keyring list
  KEYS: Perform RCU synchronisation on keys prior to key destruction
  KEYS: Announce key type (un)registration
  KEYS: Reorganise keys Makefile
  KEYS: Move the key config into security/keys/Kconfig
  KEYS: Use the compat keyctl() syscall wrapper on Sparc64 for Sparc32 compat
  Yama: remove an unused variable
  samples/seccomp: fix dependencies on arch macros
  Yama: add additional ptrace scopes
  ...
2012-05-21 20:27:36 -07:00
Linus Torvalds 62c8d92278 Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-nmw
Pull GFS2 changes from Steven Whitehouse.

* git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-nmw: (24 commits)
  GFS2: Fix quota adjustment return code
  GFS2: Add rgrp information to block_alloc trace point
  GFS2: Eliminate unused "new" parameter to gfs2_meta_indirect_buffer
  GFS2: Update glock doc to add new stats info
  GFS2: Update main gfs2 doc
  GFS2: Remove redundant metadata block type check
  GFS2: Fix sgid propagation when using ACLs
  GFS2: eliminate log elements and simplify
  GFS2: Eliminate vestigial sd_log_le_rg
  GFS2: Eliminate needless parameter from function gfs2_setbit
  GFS2: Log code fixes
  GFS2: Remove unused argument from gfs2_internal_read
  GFS2: Remove bd_list_tr
  GFS2: Remove duplicate log code
  GFS2: Clean up log write code path
  GFS2: Use variable rather than qa to determine if unstuff necessary
  GFS2: Change variable blk to biblk
  GFS2: Fix function parameter comments in rgrp.c
  GFS2: Eliminate offset parameter to gfs2_setbit
  GFS2: Use slab for block reservation memory
  ...
2012-05-21 19:21:20 -07:00
Linus Torvalds 2e321806b6 Revert "vfs: remove unnecessary d_unhashed() check from __d_lookup_rcu"
This reverts commit 8c01a529b8.

It turns out the d_unhashed() check isn't unnecessary after all: while
it's true that unhashing will increment the sequence numbers, that does
not necessarily invalidate the RCU lookup, because it might have seen
the dentry pointer (before it got unhashed), but by the time it loaded
the sequence number, it could have seen the *new* sequence number (after
it got unhashed).

End result: we might look up an unhashed dentry that is about to be
freed, with the sequence number never indicating anything bad about it.
So checking that the dentry is still hashed (*after* reading the sequence
number) is indeed the proper fix, and was never unnecessary.

Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-21 18:48:10 -07:00
James Morris ff2bb047c4 Merge branch 'master' of git://git.infradead.org/users/eparis/selinux into next
Per pull request, for 3.5.
2012-05-22 11:21:06 +10:00
Linus Torvalds 6326c71fd2 vfs: be even more careful about dentry RCU name lookups
Miklos Szeredi points out that we need to also worry about memory
odering when doing the dentry name comparison asynchronously with RCU.

In particular, doing a rename can do a memcpy() of one dentry name over
another, and we want to make sure that any unlocked reader will always
see the proper terminating NUL character, so that it won't ever run off
the allocation.

Rather than having to be extra careful with the name copy or at lookup
time for each character, this resolves the issue by making sure that all
names that are inlined in the dentry always have a NUL character at the
end of the name allocation.  If we do that at dentry allocation time, we
know that no future name copy will ever change that final NUL to
anything else, so there are no memory ordering issues.

So even if a concurrent rename ends up overwriting the NUL character
that terminates the original name, we always know that there is one
final NUL at the end, and there is no worry about the lockless RCU
lookup traversing the name too far.

The out-of-line allocations are never copied over, so we can just make
sure that we write the name (with terminating NULL) and do a write
barrier before we expose the name to anything else by setting it in the
dentry.

Reported-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Nick Piggin <npiggin@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-21 16:14:04 -07:00
Linus Torvalds a70b52ec1a vfs: make AIO use the proper rw_verify_area() area helpers
We had for some reason overlooked the AIO interface, and it didn't use
the proper rw_verify_area() helper function that checks (for example)
mandatory locking on the file, and that the size of the access doesn't
cause us to overflow the provided offset limits etc.

Instead, AIO did just the security_file_permission() thing (that
rw_verify_area() also does) directly.

This fixes it to do all the proper helper functions, which not only
means that now mandatory file locking works with AIO too, we can
actually remove lines of code.

Reported-by: Manish Honap <manish_honap_vit@yahoo.co.in>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-21 16:06:20 -07:00