Commit Graph

34380 Commits

Author SHA1 Message Date
Linus Torvalds d3bad75a6d Driver core / sysfs patches for 3.14-rc1
Here's the big driver core and sysfs patch set for 3.14-rc1.
 
 There's a lot of work here moving sysfs logic out into a "kernfs" to
 allow other subsystems to also have a virtual filesystem with the same
 attributes of sysfs (handle device disconnect, dynamic creation /
 removal as needed / unneeded, etc.).  This is primarily being done for
 the cgroups filesystem, but the goal is to also move debugfs to it when
 it is ready, solving all of the known issues in that filesystem as well.
 The code isn't completed yet, but all should be stable now (there is a
 big section that was reverted due to problems found when testing.)
 
 There's also some other smaller fixes, and a driver core addition that
 allows for a "collection" of objects, that the DRM people will be using
 soon (it's in this tree to make merges after -rc1 easier.)
 
 All of this has been in linux-next with no reported issues.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Merge tag 'driver-core-3.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core / sysfs patches from Greg KH:
 "Here's the big driver core and sysfs patch set for 3.14-rc1.

  There's a lot of work here moving sysfs logic out into a "kernfs" to
  allow other subsystems to also have a virtual filesystem with the same
  attributes of sysfs (handle device disconnect, dynamic creation /
  removal as needed / unneeded, etc)

  This is primarily being done for the cgroups filesystem, but the goal
  is to also move debugfs to it when it is ready, solving all of the
  known issues in that filesystem as well.  The code isn't completed
  yet, but all should be stable now (there is a big section that was
  reverted due to problems found when testing)

  There's also some other smaller fixes, and a driver core addition that
  allows for a "collection" of objects, that the DRM people will be
  using soon (it's in this tree to make merges after -rc1 easier)

  All of this has been in linux-next with no reported issues"

* tag 'driver-core-3.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (113 commits)
  kernfs: associate a new kernfs_node with its parent on creation
  kernfs: add struct dentry declaration in kernfs.h
  kernfs: fix get_active failure handling in kernfs_seq_*()
  Revert "kernfs: fix get_active failure handling in kernfs_seq_*()"
  Revert "kernfs: replace kernfs_node->u.completion with kernfs_root->deactivate_waitq"
  Revert "kernfs: remove KERNFS_ACTIVE_REF and add kernfs_lockdep()"
  Revert "kernfs: remove KERNFS_REMOVED"
  Revert "kernfs: restructure removal path to fix possible premature return"
  Revert "kernfs: invoke kernfs_unmap_bin_file() directly from __kernfs_remove()"
  Revert "kernfs: remove kernfs_addrm_cxt"
  Revert "kernfs: make kernfs_get_active() block if the node is deactivated but not removed"
  Revert "kernfs: implement kernfs_{de|re}activate[_self]()"
  Revert "kernfs, sysfs, driver-core: implement kernfs_remove_self() and its wrappers"
  Revert "pci: use device_remove_file_self() instead of device_schedule_callback()"
  Revert "scsi: use device_remove_file_self() instead of device_schedule_callback()"
  Revert "s390: use device_remove_file_self() instead of device_schedule_callback()"
  Revert "sysfs, driver-core: remove unused {sysfs|device}_schedule_callback_owner()"
  Revert "kernfs: remove unnecessary NULL check in __kernfs_remove()"
  kernfs: remove unnecessary NULL check in __kernfs_remove()
  drivers/base: provide an infrastructure for componentised subsystems
  ...
2014-01-20 15:49:44 -08:00
Linus Torvalds 48ba620aab Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull namespace fixes from Eric Biederman:
 "This is a set of 3 regression fixes.

  This fixes /proc/mounts when using "ip netns add <netns>" to display
  the actual mount point.

  This fixes a regression in clone that broke lxc-attach.

  This fixes a regression in the permission checks for mounting /proc
  that made proc unmountable if binfmt_misc was in use.  Oops.

  My apologies for sending this pull request so late.  Al Viro gave
  interesting review comments about the d_path fix that I wanted to
  address in detail before I sent this pull request.  Unfortunately a
  bad round of colds kept me from addressing that in detail until today.
  The executive summary of the review was:

  Al: Is patching d_path really sufficient?
      The prepend_path, d_path, d_absolute_path, and __d_path family of
      functions is a real mess.

  Me: Yes, patching d_path is really sufficient.  Yes, the code is a mess.
      No, it is not appropriate to rewrite all of d_path for a regression
      that has existed for entirely too long already, when a two line
      change will do"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
  vfs: Fix a regression in mounting proc
  fork:  Allow CLONE_PARENT after setns(CLONE_NEWPID)
  vfs: In d_path don't call d_dname on a mount point
2014-01-17 17:29:36 -08:00
Tejun Heo db4aad209b kernfs: associate a new kernfs_node with its parent on creation
Once created, a kernfs_node is always destroyed by kernfs_put().
Since ba7443bc65 ("sysfs, kernfs: implement
kernfs_create/destroy_root()"), kernfs_put() depends on kernfs_root()
to locate the ino_ida.  kernfs_root() in turn depends on
kernfs_node->parent being set for !dir nodes.  This means that
kernfs_put() of a !dir node requires its ->parent to be initialized.

This leads to oops when a newly created !dir node is destroyed without
going through kernfs_add_one() or after failing kernfs_add_one()
before ->parent is set.  kernfs_root() invoked from kernfs_put() will
try to dereference NULL parent.

Fix it by moving parent association to kernfs_new_node() from
kernfs_add_one().  kernfs_new_node() now takes @parent instead of
@root and determines the root from the parent and also sets the new
node's parent properly.  @parent parameter is removed from
kernfs_add_one().  As there's no parent when creating the root node,
__kernfs_new_node() which takes @root as before and doesn't set the
parent is used in that case.

This ensures that a kernfs_node in any stage in its life has its
parent associated and thus can be put.
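
The invariant can be illustrated with a small userspace model.  This is
only a sketch, not kernel code: every model_* name is hypothetical, and
a plain counter stands in for the ino_ida bookkeeping.  The point it
shows is that a node whose parent is recorded at creation can always be
put, while a parentless non-directory node has no way to find its root.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Userspace model only; every model_* name is hypothetical. */
struct model_root {
	int live_nodes;			/* stands in for the ino_ida bookkeeping */
};

struct model_node {
	struct model_node *parent;	/* NULL only for the root directory node */
	struct model_root *root;	/* valid only on directory nodes */
	int is_dir;
};

/* A non-directory node can reach its root only through ->parent. */
static struct model_root *model_root_of(const struct model_node *n)
{
	if (n->parent)
		n = n->parent;
	return n->is_dir ? n->root : NULL;
}

/* Root node: there is no parent, so the root is passed in directly. */
static struct model_node *model_new_root_node(struct model_root *root)
{
	struct model_node *n = calloc(1, sizeof(*n));

	if (!n)
		return NULL;
	n->is_dir = 1;
	n->root = root;
	root->live_nodes++;
	return n;
}

/* Any other node: the parent is recorded at creation, before any add step. */
static struct model_node *model_new_node(struct model_node *parent, int is_dir)
{
	struct model_node *n;

	assert(parent && parent->is_dir);
	n = calloc(1, sizeof(*n));
	if (!n)
		return NULL;
	n->parent = parent;
	n->is_dir = is_dir;
	n->root = is_dir ? model_root_of(parent) : NULL;
	model_root_of(parent)->live_nodes++;
	return n;
}

/* Returns 0 on success, -1 where the real code would dereference NULL. */
static int model_put(struct model_node *n)
{
	struct model_root *root = model_root_of(n);

	if (!root)
		return -1;
	root->live_nodes--;
	free(n);
	return 0;
}
```

In this model, a file node created with model_new_node() can be handed
to model_put() immediately, without any intermediate add step, whereas a
zero-initialized file node with no parent makes model_put() report
failure.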

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-01-17 11:50:07 -08:00
Linus Torvalds 70b23ce347 fix data corruption on NFS writeback

Merge tag 'writeback-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux

Pull writeback fix from Wu Fengguang:
 "Fix data corruption on NFS writeback.

  It has been in linux-next for one month"

* tag 'writeback-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux:
  writeback: Fix data corruption on NFS
2014-01-16 08:23:34 +07:00
Andreas Rohner 70f2fe3a26 nilfs2: fix segctor bug that causes file system corruption
There is a bug in the function nilfs_segctor_collect which results in
active data being written to a segment that is marked as clean.  It is
possible that this segment is selected for a later segment
construction, whereby the old data is overwritten.

The problem shows itself with the following kernel log message:

  nilfs_sufile_do_cancel_free: segment 6533 must be clean

Usually a few hours later the file system gets corrupted:

  NILFS: bad btree node (blocknr=8748107): level = 0, flags = 0x0, nchildren = 0
  NILFS error (device sdc1): nilfs_bmap_last_key: broken bmap (inode number=114660)

The issue can be reproduced with a file system that is nearly full and
with the cleaner running while some IO-intensive task is running,
although it is quite hard to reproduce.

This is what happens:

 1. The cleaner starts the segment construction
 2. nilfs_segctor_collect is called
 3. sc_stage is on NILFS_ST_SUFILE and segments are freed
 4. sc_stage is on NILFS_ST_DAT and the current segment is full
 5. nilfs_segctor_extend_segments is called, which
    allocates a new segment
 6. The new segment is one of the segments freed in step 3
 7. nilfs_sufile_cancel_freev is called and produces an error message
 8. Loop around and the collection starts again
 9. sc_stage is on NILFS_ST_SUFILE and segments are freed
    including the newly allocated segment, which will contain active
    data and can be allocated at a later time
10. A few hours later another segment construction allocates the
    segment and causes file system corruption

This can be prevented by simply reordering the statements.  If
nilfs_sufile_cancel_freev is called before nilfs_segctor_extend_segments,
the freed segments are marked as dirty and cannot be allocated any more.
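
The effect of the reordering can be sketched with a simplified
userspace model.  The two-state segment table and all model_* names are
assumptions made for illustration; nilfs2 itself tracks considerably
more state.  The model only shows why cancelling the free before
allocating keeps a just-freed segment from being handed out again.

```c
#include <assert.h>

/* Simplified userspace model; the model_* names and the two-state
 * segment table are assumptions, not nilfs2 data structures. */
enum model_seg_state { MODEL_SEG_CLEAN, MODEL_SEG_DIRTY };

#define MODEL_NSEGS 4

struct model_fs {
	enum model_seg_state seg[MODEL_NSEGS];
};

/* Step 3: freeing marks the listed segments clean, hence allocatable. */
static void model_free_segs(struct model_fs *fs, const int *list, int n)
{
	int i;

	for (i = 0; i < n; i++) {
		assert(list[i] >= 0 && list[i] < MODEL_NSEGS);
		fs->seg[list[i]] = MODEL_SEG_CLEAN;
	}
}

/* Undo a free: mark clean segments dirty again and count the ones that
 * were no longer clean (the "must be clean" case from the log). */
static int model_cancel_free(struct model_fs *fs, const int *list, int n)
{
	int i, errors = 0;

	for (i = 0; i < n; i++) {
		if (fs->seg[list[i]] != MODEL_SEG_CLEAN)
			errors++;
		else
			fs->seg[list[i]] = MODEL_SEG_DIRTY;
	}
	return errors;
}

/* Allocate the first clean segment, or return -1 if there is none. */
static int model_alloc_seg(struct model_fs *fs)
{
	int i;

	for (i = 0; i < MODEL_NSEGS; i++) {
		if (fs->seg[i] == MODEL_SEG_CLEAN) {
			fs->seg[i] = MODEL_SEG_DIRTY;
			return i;
		}
	}
	return -1;
}

/* Restart the collection using either statement order. */
static int model_restart(struct model_fs *fs, const int *freed, int n,
			 int cancel_first, int *errors)
{
	int seg;

	if (cancel_first) {
		*errors = model_cancel_free(fs, freed, n);
		seg = model_alloc_seg(fs);
	} else {
		seg = model_alloc_seg(fs);
		*errors = model_cancel_free(fs, freed, n);
	}
	return seg;
}
```

With segment 1 freed and segment 3 clean, the old order allocates
segment 1 and then reports one cancel error; the new order reports no
error and allocates segment 3 instead.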

Signed-off-by: Andreas Rohner <andreas.rohner@gmx.net>
Reviewed-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Tested-by: Andreas Rohner <andreas.rohner@gmx.net>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-15 14:19:42 +07:00
Tejun Heo bb305947bd kernfs: fix get_active failure handling in kernfs_seq_*()
When kernfs_seq_start() fails to obtain an active reference, it
returns ERR_PTR(-ENODEV).  kernfs_seq_stop() is then invoked with the
error pointer value; however, it still proceeds to invoke
kernfs_put_active() on the node, leading to an unbalanced put.

If kernfs_seq_stop() is called even after active ref failure, it
should skip invocation of @ops->seq_stop() and put_active.
Unfortunately, this is a bit complicated because active ref failure
isn't the only thing which may fail with ERR_PTR(-ENODEV).
@ops->seq_start/next() may also fail with the error value and
kernfs_seq_stop() doesn't have a way to tell apart those failures.

Work around it by factoring out the active part of kernfs_seq_stop()
into kernfs_seq_stop_active() and invoking it directly if
@ops->seq_start/next() fail with ERR_PTR(-ENODEV) and updating
kernfs_seq_stop() to skip kernfs_seq_stop_active() on
ERR_PTR(-ENODEV).  This is a bit nasty but ensures that the active put
is skipped iff get_active failed in kernfs_seq_start().
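
The balancing rule can be sketched in userspace.  This is a model under
stated assumptions, not the kernel implementation: model_err_ptr()
imitates the kernel's ERR_PTR() encoding, and the model_* structures
and functions are hypothetical stand-ins for the kernfs seq_file hooks.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace imitation of encoding an errno value in a pointer. */
static void *model_err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

struct model_file {
	int node_active;	/* 0 makes model_get_active() fail */
	int active_refs;	/* must return to 0 after start + stop */
	void *ops_start_result;	/* what the hypothetical ops->seq_start returns */
};

static int model_get_active(struct model_file *f)
{
	if (!f->node_active)
		return 0;
	f->active_refs++;
	return 1;
}

/* The "active part" of stop: drop the reference taken in start. */
static void model_stop_active(struct model_file *f)
{
	assert(f->active_refs > 0);	/* would catch an unbalanced put */
	f->active_refs--;
}

static void *model_seq_start(struct model_file *f)
{
	void *v;

	if (!model_get_active(f))
		return model_err_ptr(-ENODEV);	/* no reference is held */

	v = f->ops_start_result;
	/* Same error value, but a reference is held: drop it right here
	 * because stop cannot tell the two failures apart. */
	if (v == model_err_ptr(-ENODEV))
		model_stop_active(f);
	return v;
}

static void model_seq_stop(struct model_file *f, void *v)
{
	if (v != model_err_ptr(-ENODEV))
		model_stop_active(f);
}
```

In all three cases (get_active fails, the ops callback fails with
-ENODEV, or everything succeeds) the reference count in the model is
back to zero once model_seq_stop() returns.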

tj: This was originally committed as d92d2e6bd7 but got reverted by
    683bb2761f along with other kernfs self removal patches.
    However, this one is an independent fix and shouldn't have been
    reverted together.  Reinstate the change.  Sorry about the mess.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-01-14 08:49:22 -08:00
Greg Kroah-Hartman 683bb2761f Revert "kernfs: fix get_active failure handling in kernfs_seq_*()"
This reverts commit d92d2e6bd7.

Tejun writes:
        I'm sorry but can you please revert the whole series?
        get_active() waiting while a node is deactivated has potential
        to lead to deadlock and that deactivate/reactivate interface is
        something fundamentally flawed and that cgroup will have to work
        with the remove_self() like everybody else.  IOW, I think the
        first posting was correct.

Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-01-13 14:49:01 -08:00
Greg Kroah-Hartman 87da149343 Revert "kernfs: replace kernfs_node->u.completion with kernfs_root->deactivate_waitq"
This reverts commit ea1c472dfe.

Tejun writes:
        I'm sorry but can you please revert the whole series?
        get_active() waiting while a node is deactivated has potential
        to lead to deadlock and that deactivate/reactivate interface is
        something fundamentally flawed and that cgroup will have to work
        with the remove_self() like everybody else.  IOW, I think the
        first posting was correct.

Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-01-13 14:43:11 -08:00
Greg Kroah-Hartman 0890147fe0 Revert "kernfs: remove KERNFS_ACTIVE_REF and add kernfs_lockdep()"
This reverts commit a69d001cfc.

Tejun writes:
        I'm sorry but can you please revert the whole series?
        get_active() waiting while a node is deactivated has potential
        to lead to deadlock and that deactivate/reactivate interface is
        something fundamentally flawed and that cgroup will have to work
        with the remove_self() like everybody else.  IOW, I think the
        first posting was correct.

Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-01-13 14:39:52 -08:00
Greg Kroah-Hartman 798c75a0d4 Revert "kernfs: remove KERNFS_REMOVED"
This reverts commit ae34372eb8.

Tejun writes:
        I'm sorry but can you please revert the whole series?
        get_active() waiting while a node is deactivated has potential
        to lead to deadlock and that deactivate/reactivate interface is
        something fundamentally flawed and that cgroup will have to work
        with the remove_self() like everybody else.  IOW, I think the
        first posting was correct.

Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-01-13 14:36:03 -08:00
Greg Kroah-Hartman 4f4b1b6471 Revert "kernfs: restructure removal path to fix possible premature return"
This reverts commit 45a140e587.

Tejun writes:
        I'm sorry but can you please revert the whole series?
        get_active() waiting while a node is deactivated has potential
        to lead to deadlock and that deactivate/reactivate interface is
        something fundamentally flawed and that cgroup will have to work
        with the remove_self() like everybody else.  IOW, I think the
        first posting was correct.

Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-01-13 14:30:47 -08:00
Greg Kroah-Hartman 55f6e30d0a Revert "kernfs: invoke kernfs_unmap_bin_file() directly from __kernfs_remove()"
This reverts commit f601f9a2bf.

Tejun writes:
        I'm sorry but can you please revert the whole series?
        get_active() waiting while a node is deactivated has potential
        to lead to deadlock and that deactivate/reactivate interface is
        something fundamentally flawed and that cgroup will have to work
        with the remove_self() like everybody else.  IOW, I think the
        first posting was correct.

Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-01-13 14:27:16 -08:00
Greg Kroah-Hartman 7653fe9d6c Revert "kernfs: remove kernfs_addrm_cxt"
This reverts commit 99177a3411.

Tejun writes:
        I'm sorry but can you please revert the whole series?
        get_active() waiting while a node is deactivated has potential
        to lead to deadlock and that deactivate/reactivate interface is
        something fundamentally flawed and that cgroup will have to work
        with the remove_self() like everybody else.  IOW, I think the
        first posting was correct.

Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-01-13 14:20:56 -08:00
Greg Kroah-Hartman f4b3e631b3 Revert "kernfs: make kernfs_get_active() block if the node is deactivated but not removed"
This reverts commit 895a068a52.

Tejun writes:
        I'm sorry but can you please revert the whole series?
        get_active() waiting while a node is deactivated has potential
        to lead to deadlock and that deactivate/reactivate interface is
        something fundamentally flawed and that cgroup will have to work
        with the remove_self() like everybody else.  IOW, I think the
        first posting was correct.

Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-01-13 14:13:39 -08:00
Greg Kroah-Hartman 9b0925a6ff Revert "kernfs: implement kernfs_{de|re}activate[_self]()"
This reverts commit 9f010c2ad5.

Tejun writes:
        I'm sorry but can you please revert the whole series?
        get_active() waiting while a node is deactivated has potential
        to lead to deadlock and that deactivate/reactivate interface is
        something fundamentally flawed and that cgroup will have to work
        with the remove_self() like everybody else.  IOW, I think the
        first posting was correct.

Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-01-13 14:09:38 -08:00
Greg Kroah-Hartman a9f138b0e5 Revert "kernfs, sysfs, driver-core: implement kernfs_remove_self() and its wrappers"
This reverts commit 1ae06819c7.

Tejun writes:
        I'm sorry but can you please revert the whole series?
        get_active() waiting while a node is deactivated has potential
        to lead to deadlock and that deactivate/reactivate interface is
        something fundamentally flawed and that cgroup will have to work
        with the remove_self() like everybody else.  IOW, I think the
        first posting was correct.

Cc: Tejun Heo <tj@kernel.org>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-01-13 14:05:13 -08:00
Greg Kroah-Hartman a30f82b7eb Revert "sysfs, driver-core: remove unused {sysfs|device}_schedule_callback_owner()"
This reverts commit d1ba277e79.

Tejun writes:
        I'm sorry but can you please revert the whole series?
        get_active() waiting while a node is deactivated has potential
        to lead to deadlock and that deactivate/reactivate interface is
        something fundamentally flawed and that cgroup will have to work
        with the remove_self() like everybody else.  IOW, I think the
        first posting was correct.

Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-01-13 13:51:36 -08:00
Greg Kroah-Hartman ce9b499c9f Revert "kernfs: remove unnecessary NULL check in __kernfs_remove()"
This reverts commit 88533f990c.

Tejun writes:
        I'm sorry but can you please revert the whole series?
        get_active() waiting while a node is deactivated has potential
        to lead to deadlock and that deactivate/reactivate interface is
        something fundamentally flawed and that cgroup will have to work
        with the remove_self() like everybody else.  IOW, I think the
        first posting was correct.

Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-01-13 13:50:31 -08:00
Tejun Heo 88533f990c kernfs: remove unnecessary NULL check in __kernfs_remove()
895a068a52 ("kernfs: make kernfs_get_active() block if the node is
deactivated but not removed") added "struct kernfs_root *root =
kernfs_root(kn);" at the head of the function; however, the parameter
@kn is checked for NULL later, implying that the function may be called with
NULL.  This means that we may end up invoking kernfs_root() with NULL
which will oops.  None of the existing users invokes removal with NULL
@kn, so this bug doesn't actually trigger.

We can relocate kernfs_root() invocation after NULL check; however,
allowing NULL param tends to cause more confusion than actually
helping anything.  As there's no existing user, let's remove the
spurious NULL check.

This bug was detected by smatch.
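
For readers unfamiliar with this class of warning, the pattern and the
two remedies discussed above look roughly like the following.  This is
a generic userspace sketch; model_target and both functions are
hypothetical and do not reproduce __kernfs_remove().

```c
#include <assert.h>
#include <stddef.h>

struct model_target {
	int root_id;
};

/*
 * Shape of the flagged code, kept as a comment because calling it with
 * NULL would crash:
 *
 *	int id = t->root_id;	dereference happens first
 *	if (!t)			check arrives too late to help
 *		return -1;
 */

/* Remedy 1: keep accepting NULL, but dereference only after the check. */
static int model_remove_nullable(const struct model_target *t)
{
	if (!t)
		return -1;
	return t->root_id;
}

/* Remedy 2, the one the message chooses: callers must pass a valid node. */
static int model_remove_strict(const struct model_target *t)
{
	assert(t != NULL);	/* documents the contract in this model */
	return t->root_id;
}
```

Remedy 2 is the simpler contract when, as stated above, no existing
caller passes NULL.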

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-01-11 15:40:19 -08:00
Tejun Heo d1ba277e79 sysfs, driver-core: remove unused {sysfs|device}_schedule_callback_owner()
All device_schedule_callback_owner() users are converted to use
device_remove_file_self().  Remove now unused
{sysfs|device}_schedule_callback_owner().

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-01-10 16:03:19 -08:00
Linus Torvalds e2bc44706f xfs: bugfixes for 3.13-rc8
- fix off-by-one in xfs_attr3_rmt_verify
 - fix missing destroy_work_on_stack() in xfs_bmapi_allocate

Merge tag 'xfs-for-linus-v3.13-rc8' of git://oss.sgi.com/xfs/xfs

Pull xfs bugfixes from Ben Myers:
 "Here we have a bugfix for an off-by-one in the remote attribute
  verifier that results in a forced shutdown which you can hit with v5
  superblock by creating a 64k xattr, and a fix for a missing
  destroy_work_on_stack() in the allocation worker.

  It's a bit late, but they are both fairly straightforward"

* tag 'xfs-for-linus-v3.13-rc8' of git://oss.sgi.com/xfs/xfs:
  xfs: Calling destroy_work_on_stack() to pair with INIT_WORK_ONSTACK()
  xfs: fix off-by-one error in xfs_attr3_rmt_verify
2014-01-11 06:33:03 +07:00
Tejun Heo 1ae06819c7 kernfs, sysfs, driver-core: implement kernfs_remove_self() and its wrappers
Sometimes it's necessary to implement a node which wants to delete
nodes including itself.  This isn't straightforward because of kernfs
active reference.  While a file operation is in progress, an active
reference is held and kernfs_remove() waits for all such references to
drain before completing.  For a self-deleting node, this is a deadlock
as kernfs_remove() ends up waiting for an active reference that it is
itself sitting on top of.

This currently is worked around in the sysfs layer using
sysfs_schedule_callback() which makes such removals asynchronous.
While it works, it's rather cumbersome and inherently breaks
synchronicity of the operation - the file operation which triggered
the operation may complete before the removal is finished (or even
started) and the removal may fail asynchronously.  If a removal
operation is immediately followed by another operation which expects
the specific name to be available (e.g. removal followed by rename
onto the same name), there's no way to make the latter operation
reliable.

The thing is there's no inherent reason for this to be asynchronous.
All that's necessary to make this synchronous is a dedicated operation
which drops its own active ref and deactivates self.  This patch
implements kernfs_remove_self() and its wrappers in sysfs and driver
core.  kernfs_remove_self() is to be called from one of the file
operations, drops the active ref and deactivates using
__kernfs_deactivate_self(), removes the self node, and restores active
ref to the dead node using __kernfs_reactivate_self() so that the ref
is balanced afterwards.  __kernfs_remove() is updated so that it takes
an early exit if the target node is already fully removed so that the
active ref restored by kernfs_remove_self() after removal doesn't
confuse the deactivation path.

This makes implementing self-deleting nodes very easy.  The normal
removal path doesn't even need to be changed to use
kernfs_remove_self() for the self-deleting node.  The method can
invoke kernfs_remove_self() on itself before proceeding with the normal
removal path.  kernfs_remove() invoked on the node by the normal
deletion path will simply be ignored.

This will replace sysfs_schedule_callback().  A subtle feature of
sysfs_schedule_callback() is that it collapses multiple invocations -
even if multiple removals are triggered, the removal callback is run
only once.  An equivalent effect can be achieved by testing the return
value of kernfs_remove_self() - only the one which gets %true return
value should proceed with actual deletion.  All other instances of
kernfs_remove_self() will wait till the enclosing kernfs operation
which invoked the winning instance of kernfs_remove_self() finishes
and then return %false.  This trivially makes all users of
kernfs_remove_self() automatically show correct synchronous behavior
even when there are multiple concurrent operations - all "echo 1 >
delete" instances will finish only after the whole operation is
completed by one of the instances.
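
The "only the winner proceeds" convention can be sketched with C11
atomics.  This is a userspace model with hypothetical model_* names.
It covers only the return-value test; the waiting of the losing
callers described above is deliberately left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct model_self_node {
	atomic_bool claimed;	/* set by the first remover */
	int deletions;		/* how many times teardown actually ran */
};

static void model_self_node_init(struct model_self_node *n)
{
	atomic_init(&n->claimed, false);
	n->deletions = 0;
}

/* True only for the first caller; every later caller gets false. */
static bool model_remove_self(struct model_self_node *n)
{
	return !atomic_exchange(&n->claimed, true);
}

/* A hypothetical "delete" file handler: only the winner tears down. */
static void model_delete_store(struct model_self_node *n)
{
	if (model_remove_self(n)) {
		n->deletions++;
		assert(n->deletions == 1);
	}
}
```

Calling model_delete_store() any number of times on the same node runs
the teardown exactly once, which mirrors the collapsing behavior
attributed to sysfs_schedule_callback() above.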

v2: For !CONFIG_SYSFS, dummy version kernfs_remove_self() was missing
    and sysfs_remove_file_self() had incorrect return type.  Fix it.
    Reported by kbuild test bot.

v3: Updated to use __kernfs_{de|re}activate_self().

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-01-10 14:01:05 -08:00
Tejun Heo 9f010c2ad5 kernfs: implement kernfs_{de|re}activate[_self]()
This patch implements four functions to manipulate deactivation state
- deactivate, reactivate and the _self suffixed pair.  A new field
kernfs_node->deact_depth is added so that concurrent and nested
deactivations are handled properly.  kernfs_node->hash is moved so
that it's paired with the new field so that it doesn't increase the
size of kernfs_node.
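
One way to picture nested deactivation is a depth counter, sketched
below in userspace.  Treating deact_depth as a plain counter is an
assumption made for illustration, as are all model_* names; the message
states only that the field makes concurrent and nested deactivations
work.

```c
#include <assert.h>
#include <stdbool.h>

struct model_deact_node {
	unsigned int deact_depth;	/* 0 means the node is active */
};

static void model_deactivate_nested(struct model_deact_node *n)
{
	n->deact_depth++;
}

static void model_reactivate_nested(struct model_deact_node *n)
{
	assert(n->deact_depth > 0);	/* every reactivate needs a prior deactivate */
	n->deact_depth--;
}

static bool model_is_active(const struct model_deact_node *n)
{
	return n->deact_depth == 0;
}
```

With a counter, an inner deactivate/reactivate pair cannot reactivate
the node early: it becomes active again only when the outermost
deactivation is undone.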

A kernfs user's lock would normally nest inside active ref but during
removal the user may want to perform kernfs_remove() while holding the
said lock, which would introduce a reverse locking dependency.  These
functions can be used to break such a reverse dependency by allowing
the deactivation step to be performed separately, outside the user's
critical section.

This will also be used to implement kernfs_remove_self().

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-01-10 13:51:21 -08:00
Tejun Heo 895a068a52 kernfs: make kernfs_get_active() block if the node is deactivated but not removed
Currently, kernfs_get_active() fails if the target node is
deactivated.  This is fine as a node always gets removed after
deactivation; however, we're gonna add reactivation so the assumption
won't hold.  It'd be incorrect for kernfs_get_active() to fail for a
node which was deactivated only temporarily.

This patch makes kernfs_get_active() block if the node is deactivated
but not removed.  If the node gets reactivated (not yet implemented),
it will be retried and succeed.  If the node gets removed, it will be
woken up and fail.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-01-10 13:48:08 -08:00
Tejun Heo 99177a3411 kernfs: remove kernfs_addrm_cxt
kernfs_addrm_cxt and the accompanying kernfs_addrm_start/finish() were
added because there were operations which should be performed outside
kernfs_mutex after adding and removing kernfs_nodes.  The necessary
operations were recorded in kernfs_addrm_cxt and performed by
kernfs_addrm_finish(); however, after the recent changes which
relocated deactivation and unmapping so that they're performed
directly during removal, the only operation kernfs_addrm_finish()
performs is kernfs_put(), which can be moved inside the removal path
too.

This patch moves the kernfs_put() of the base ref to __kernfs_remove()
and removes kernfs_addrm_cxt and kernfs_addrm_start/finish().

* kernfs_add_one() is updated to grab and release the parent's active
  ref and kernfs_mutex itself.  kernfs_get/put_active() and
  kernfs_addrm_start/finish() invocations around it are removed from
  all users.

* __kernfs_remove() puts an unlinked node directly instead of chaining
  it to kernfs_addrm_cxt.  Its callers are updated to grab and release
  kernfs_mutex instead of calling kernfs_addrm_start/finish() around
  it.

v2: Updated to fit the v2 restructuring of removal path.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-01-10 13:48:08 -08:00
Tejun Heo f601f9a2bf kernfs: invoke kernfs_unmap_bin_file() directly from __kernfs_remove()
kernfs_unmap_bin_file() is supposed to unmap all memory mappings of
the target file before kernfs_remove() finishes; however, it currently
is being called from kernfs_addrm_finish() and has the same race
problem as the original implementation of deactivation when there are
multiple removers - only the remover which snatches the node to its
addrm_cxt->removed list is guaranteed to wait for its completion
before returning.

It can be fixed by moving kernfs_unmap_bin_file() invocation from
kernfs_addrm_finish() to __kernfs_remove().  The function may be
called multiple times but that shouldn't do any harm.

We end up dropping kernfs_mutex in the removal loop and the node may
be removed in between by someone else.  kernfs_unlink_sibling() is
updated to test whether the node has already been removed and return
accordingly.  __kernfs_remove() in turn performs post-unlinking
cleanup only if it actually unlinked the node.

KERNFS_HAS_MMAP test is moved out of the unmap function into
__kernfs_remove() so that we don't unlock kernfs_mutex unnecessarily.
While at it, drop the now meaningless "bin" qualifier from the
function name.

v2: Rewritten to fit the v2 restructuring of removal path.  HAS_MMAP
    test relocated.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-01-10 13:48:08 -08:00
Tejun Heo 45a140e587 kernfs: restructure removal path to fix possible premature return
The recursive nature of kernfs_remove() means that, even if
kernfs_remove() is not allowed to be called multiple times on the same
node, there may be race conditions between removal of parent and its
descendants.  While we can claim that kernfs_remove() shouldn't be
called on one of the descendants while the removal of an ancestor is
in progress, such a rule is unnecessarily restrictive and very difficult
to enforce.  It's better to simply allow invoking kernfs_remove() as
the caller sees fit as long as the caller ensures that the node is
accessible.

The current behavior in such situations is broken.  Whoever enters
the removal path first takes the node off the hierarchy and then
deactivates it.  Subsequent removers either return as soon as they
notice that they are not the first one or can't even find the target
node as it has already been removed from the hierarchy.  In both
cases, the subsequent removers may finish prematurely while the nodes
which should be removed and drained are still being processed by the
first one.

This patch restructures the removal path so that multiple removers,
whether through recursion or direct invocation, always follow the
following rules.

* When there are multiple concurrent removers, only one puts the base
  ref.

* Regardless of which one puts the base ref, all removers are blocked
  until the target node is fully deactivated and removed.

To achieve the above, removal path now first deactivates the subtree,
drains it and then unlinks one-by-one.  __kernfs_deactivate() is
called directly from __kernfs_remove() and drops and regrabs
kernfs_mutex for each descendant to drain active refs.  As this means
that multiple removers can enter __kernfs_deactivate() for the same
node, the function is updated so that it can handle multiple
deactivators of the same node - only one actually deactivates but all
wait till drain completion.

The restructured removal path guarantees that a removed node gets
unlinked only after the node is deactivated and drained.  Combined
with proper multiple deactivator handling, this guarantees that any
invocation of kernfs_remove() returns only after the node itself and
all its descendants are deactivated, drained and removed.

v2: Draining moved into a separate loop (used to be in the same
    loop as unlink) and done from __kernfs_deactivate().  This is to
    allow exposing deactivation as a separate interface later.

    Root node removal was broken in v1 patch.  Fixed.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-01-10 13:48:08 -08:00
Tejun Heo ae34372eb8 kernfs: remove KERNFS_REMOVED
KERNFS_REMOVED is used to mark half-initialized and dying nodes so
that they don't show up in lookups and so that adding new nodes under
them or renaming them is denied; however, its role overlaps those of
deactivation and removal from rbtree.

It's necessary to deny addition of new children while removal is in
progress; however, this role considerably intersects with deactivation
- KERNFS_REMOVED prevents new children while deactivation prevents new
file operations.  There's no reason to keep them separate, which only
makes things more complex than necessary.

KERNFS_REMOVED is also used to decide whether a node is still visible
to vfs layer, which is rather redundant as equivalent determination
can be made by testing whether the node is on its parent's children
rbtree or not.

This patch removes KERNFS_REMOVED.

* Instead of KERNFS_REMOVED, each node now starts its life
  deactivated.  This means that we now use both atomic_add() and
  atomic_sub() on KN_DEACTIVATED_BIAS, which is INT_MIN.  The compiler
  generates an overflow warning when negating INT_MIN as the negation
  can't be represented as a positive number.  Nothing is actually
  broken but let's bump BIAS by one to avoid the warnings for archs
  which negate the subtrahend.

* KERNFS_REMOVED tests in add and rename paths are replaced with
  kernfs_get/put_active() of the target nodes.  Due to the way the add
  path is structured now, active ref handling is done in the callers
  of kernfs_add_one().  This will be consolidated up later.

* kernfs_remove_one() is updated to deactivate instead of setting
  KERNFS_REMOVED.  This removes deactivation from kernfs_deactivate(),
  which is now renamed to kernfs_drain().

* kernfs_dop_revalidate() now tests RB_EMPTY_NODE(&kn->rb) instead of
  KERNFS_REMOVED and KERNFS_REMOVED test in kernfs_dir_pos() is
  dropped.  A node which is removed from the children rbtree is not
  included in the iteration in the first place.  This means that a
  node may be visible through vfs a bit longer - it's now also visible
  after deactivation until the actual removal.  This slightly enlarged
  window doesn't make any difference to userland.

* Sanity check on KERNFS_REMOVED in kernfs_put() is replaced with
  checks on the active ref.

* Some comment style updates in the affected area.
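
The bias handling described in the first bullet can be illustrated with
a small user-space model.  This is only a sketch under stated
assumptions: struct node and the node_*() helpers are hypothetical
stand-ins rather than the kernfs code, and the bias value of
INT_MIN + 1 is inferred from "bump BIAS by one".

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Assumed value: INT_MIN bumped by one, so its negation is INT_MAX. */
#define KN_DEACTIVATED_BIAS (INT_MIN + 1)

/* With plain INT_MIN the negation would not be representable. */
static_assert(-KN_DEACTIVATED_BIAS == INT_MAX, "bias must be negatable");

/* Hypothetical stand-in for a node; only the active counter is modeled. */
struct node {
	atomic_int active;
};

/* A node starts its life deactivated: the counter begins at the bias. */
void node_init(struct node *n)
{
	atomic_init(&n->active, KN_DEACTIVATED_BIAS);
}

/* Activation removes the bias (the atomic_sub() side). */
void node_activate(struct node *n)
{
	atomic_fetch_sub(&n->active, KN_DEACTIVATED_BIAS);
}

/* Deactivation adds the bias back (the atomic_add() side). */
void node_deactivate(struct node *n)
{
	atomic_fetch_add(&n->active, KN_DEACTIVATED_BIAS);
}

/* Take an active ref unless the counter is negative (deactivated). */
bool node_get_active(struct node *n)
{
	int v = atomic_load(&n->active);

	while (v >= 0) {
		if (atomic_compare_exchange_weak(&n->active, &v, v + 1))
			return true;
	}
	return false;
}

/* Drop an active ref; a put without a matching get trips the assert. */
void node_put_active(struct node *n)
{
	int prev = atomic_fetch_sub(&n->active, 1);

	assert(prev != 0 && prev != KN_DEACTIVATED_BIAS);
}
```

In this model a freshly initialized node refuses node_get_active()
until node_activate() has run, and refs taken before node_deactivate()
can still be dropped afterwards, which is what draining waits for.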

v2: Reordered before removal path restructuring.  kernfs_active()
    dropped and kernfs_get/put_active() used instead.  RB_EMPTY_NODE()
    used in the lookup paths.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-01-10 13:44:25 -08:00
Tejun Heo a69d001cfc kernfs: remove KERNFS_ACTIVE_REF and add kernfs_lockdep()
There currently are two mechanisms gating active ref lockdep
annotations - KERNFS_LOCKDEP flag and KERNFS_ACTIVE_REF type mask.
The former disables lockdep annotations in kernfs_get/put_active()
while the latter disables all of kernfs_deactivate().

While KERNFS_ACTIVE_REF also behaves as an optimization to skip the
deactivation step for non-file nodes, the benefit is marginal and it
needlessly diverges code paths.  Let's drop KERNFS_ACTIVE_REF and use
KERNFS_LOCKDEP in kernfs_deactivate() too.

While at it, add a test helper kernfs_lockdep() to test KERNFS_LOCKDEP
flag so that it's more convenient and the related code can be compiled
out when not enabled.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-01-10 13:44:25 -08:00
Tejun Heo ea1c472dfe kernfs: replace kernfs_node->u.completion with kernfs_root->deactivate_waitq
kernfs_node->u.completion is used to notify deactivation completion
from kernfs_put_active() to kernfs_deactivate().  We now allow
multiple racing removals of the same node and the current removal
scheme is no longer correct - kernfs_remove() invocation may return
before the node is properly deactivated if it races against another
removal.  The removal path will be restructured to address the issue.

To help such restructuring, which requires supporting multiple waiters,
this patch replaces kernfs_node->u.completion with
kernfs_root->deactivate_waitq.  This makes deactivation event
notifications share a per-root waitqueue_head; however, the wait path
is quite cold and this will also allow shaving one pointer off
kernfs_node.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-01-10 13:44:25 -08:00
Tejun Heo d92d2e6bd7 kernfs: fix get_active failure handling in kernfs_seq_*()
When kernfs_seq_start() fails to obtain an active reference, it
returns ERR_PTR(-ENODEV).  kernfs_seq_stop() is then invoked with the
error pointer value; however, it still proceeds to invoke
kernfs_put_active() on the node leading to unbalanced put.

If kernfs_seq_stop() is called even after active ref failure, it
should skip invocation of @ops->seq_stop() and put_active.
Unfortunately, this is a bit complicated because active ref failure
isn't the only thing which may fail with ERR_PTR(-ENODEV).
@ops->seq_start/next() may also fail with the error value and
kernfs_seq_stop() doesn't have a way to tell apart those failures.

Work around it by factoring out the active part of kernfs_seq_stop()
into kernfs_seq_stop_active() and invoking it directly if
@ops->seq_start/next() fail with ERR_PTR(-ENODEV) and updating
kernfs_seq_stop() to skip kernfs_seq_stop_active() on
ERR_PTR(-ENODEV).  This is a bit nasty but ensures that the active put
is skipped iff get_active failed in kernfs_seq_start().
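
The resulting get/put pairing can be sketched in user space as follows.
This is a simplified model, not the kernfs code: struct seq_state, its
fields and the err_ptr() helper are hypothetical stand-ins (err_ptr()
mimics the kernel's ERR_PTR()), and the ops callbacks are reduced to a
flag that says whether they fail.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* User-space stand-in for the kernel's ERR_PTR() helper. */
void *err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

/* Hypothetical model of one open file's seq state. */
struct seq_state {
	int active_refs;	/* number of active refs currently held */
	bool get_active_fails;	/* simulate get_active failure */
	bool ops_start_fails;	/* simulate ops->seq_start() returning -ENODEV */
};

/* The "active" part of stop: drop the active ref. */
void seq_stop_active(struct seq_state *s)
{
	assert(s->active_refs > 0);	/* an unbalanced put would trip here */
	s->active_refs--;
}

void *seq_start(struct seq_state *s)
{
	if (s->get_active_fails)
		return err_ptr(-ENODEV);	/* no active ref was taken */
	s->active_refs++;

	if (s->ops_start_fails) {
		/*
		 * seq_stop() can't tell this -ENODEV from the one above,
		 * so the active part is run right here instead.
		 */
		seq_stop_active(s);
		return err_ptr(-ENODEV);
	}
	return s;
}

void seq_stop(struct seq_state *s, void *v)
{
	/* Skip the put whenever start reported ERR_PTR(-ENODEV). */
	if (v != err_ptr(-ENODEV))
		seq_stop_active(s);
}
```

In all three cases (success, get_active failure, ops failure) the
active ref count in this model is back to zero after seq_stop().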

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-01-10 13:44:25 -08:00
Chuansheng Liu 1f4a63bf01 xfs: Calling destroy_work_on_stack() to pair with INIT_WORK_ONSTACK()
When CONFIG_DEBUG_OBJECTS_WORK is defined, destroy_work_on_stack(),
which frees the debug object, must be called to pair with
INIT_WORK_ONSTACK().

Signed-off-by: Liu, Chuansheng <chuansheng.liu@intel.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>

(cherry picked from commit 6f96b3063c)
2014-01-10 12:39:38 -06:00
Jie Liu bba719b500 xfs: fix off-by-one error in xfs_attr3_rmt_verify
With the CRC check enabled, trying to set an attribute value whose
size is exactly XATTR_SIZE_MAX causes the v3 remote attr write
verification to fail, which yields a back trace like the one below:

<snip>
XFS (sda7): Internal error xfs_attr3_rmt_write_verify at line 191 of file fs/xfs/xfs_attr_remote.c
<snip>
Call Trace:
[<ffffffff816f0042>] dump_stack+0x45/0x56
[<ffffffffa0d99c8b>] xfs_error_report+0x3b/0x40 [xfs]
[<ffffffffa0d96edd>] ? _xfs_buf_ioapply+0x6d/0x390 [xfs]
[<ffffffffa0d99ce5>] xfs_corruption_error+0x55/0x80 [xfs]
[<ffffffffa0dbef6b>] xfs_attr3_rmt_write_verify+0x14b/0x1a0 [xfs]
[<ffffffffa0d96edd>] ? _xfs_buf_ioapply+0x6d/0x390 [xfs]
[<ffffffffa0d97315>] ? xfs_bdstrat_cb+0x55/0xb0 [xfs]
[<ffffffffa0d96edd>] _xfs_buf_ioapply+0x6d/0x390 [xfs]
[<ffffffff81184cda>] ? vm_map_ram+0x31a/0x460
[<ffffffff81097230>] ? wake_up_state+0x20/0x20
[<ffffffffa0d97315>] ? xfs_bdstrat_cb+0x55/0xb0 [xfs]
[<ffffffffa0d9726b>] xfs_buf_iorequest+0x6b/0xc0 [xfs]
[<ffffffffa0d97315>] xfs_bdstrat_cb+0x55/0xb0 [xfs]
[<ffffffffa0d97906>] xfs_bwrite+0x46/0x80 [xfs]
[<ffffffffa0dbfa94>] xfs_attr_rmtval_set+0x334/0x490 [xfs]
[<ffffffffa0db84aa>] xfs_attr_leaf_addname+0x24a/0x410 [xfs]
[<ffffffffa0db8893>] xfs_attr_set_int+0x223/0x470 [xfs]
[<ffffffffa0db8b76>] xfs_attr_set+0x96/0xb0 [xfs]
[<ffffffffa0db13b2>] xfs_xattr_set+0x42/0x70 [xfs]
[<ffffffff811df9b2>] generic_setxattr+0x62/0x80
[<ffffffff811e0213>] __vfs_setxattr_noperm+0x63/0x1b0
[<ffffffff81307afe>] ? evm_inode_setxattr+0xe/0x10
[<ffffffff811e0415>] vfs_setxattr+0xb5/0xc0
[<ffffffff811e054e>] setxattr+0x12e/0x1c0
[<ffffffff811c6e82>] ? final_putname+0x22/0x50
[<ffffffff811c708b>] ? putname+0x2b/0x40
[<ffffffff811cc4bf>] ? user_path_at_empty+0x5f/0x90
[<ffffffff811bdfd9>] ? __sb_start_write+0x49/0xe0
[<ffffffff81168589>] ? vm_mmap_pgoff+0x99/0xc0
[<ffffffff811e07df>] SyS_setxattr+0x8f/0xe0
[<ffffffff81700c2d>] system_call_fastpath+0x1a/0x1f

Tests:
    setfattr -n user.longxattr -v `perl -e 'print "A"x65536'` testfile

This patch fixes it by rejecting the remote EA only if its size is
greater than XATTR_SIZE_MAX, rather than greater than or equal to it,
because a value size equal to the limit is valid as per the VFS
setxattr interface.
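
The boundary condition can be shown with a minimal sketch.  The two
helper names below are hypothetical and only model the size comparison,
not the rest of the verifier; XATTR_SIZE_MAX is 65536, matching the
value length used in the test command above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#ifndef XATTR_SIZE_MAX
#define XATTR_SIZE_MAX 65536	/* largest xattr value size accepted by VFS */
#endif

static_assert(XATTR_SIZE_MAX == 65536, "sketch assumes the usual limit");

/* Old check: rejects a value of exactly XATTR_SIZE_MAX bytes. */
bool rmt_size_ok_old(size_t len)
{
	return !(len >= XATTR_SIZE_MAX);
}

/* Fixed check: only sizes strictly above the limit are rejected. */
bool rmt_size_ok_fixed(size_t len)
{
	return !(len > XATTR_SIZE_MAX);
}
```

With a 65536-byte value the old comparison fails verification while the
fixed one accepts it; 65537 bytes is rejected by both.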

Signed-off-by: Jie Liu <jeff.liu@oracle.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>

(cherry picked from commit 85dd0707f0)
2014-01-10 12:38:41 -06:00
Linus Torvalds ef350bb7c5 Fix a regression introduced in v3.13-rc6

Merge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 bugfix from Ted Ts'o:
 "Fix a regression introduced in v3.13-rc6"

* tag 'ext4_for_linus_stable' of http://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4: fix bigalloc regression
2014-01-07 08:22:42 +08:00
Eric Whitney d0abafac8c ext4: fix bigalloc regression
Commit f5a44db5d2 introduced a regression on filesystems created with
the bigalloc feature (cluster size > blocksize).  It causes xfstests
generic/006 and /013 to fail with an unexpected JBD2 failure and
transaction abort that leaves the test file system in a read only state.
Other xfstests run on bigalloc file systems are likely to fail as well.

The cause is the accidental use of a cluster mask where a cluster
offset was needed in ext4_ext_map_blocks().
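
The difference between the two quantities can be sketched as follows.
The helper names are hypothetical and this is not the ext4 code; it
only assumes that the cluster ratio (blocks per cluster) is a power of
two, as it is for bigalloc.

```c
#include <assert.h>
#include <stdint.h>

/* True if ratio is a non-zero power of two. */
static int ratio_valid(uint32_t ratio)
{
	return ratio != 0 && (ratio & (ratio - 1)) == 0;
}

/* Cluster mask: clear the low bits, giving the cluster's first block. */
uint32_t lblk_cluster_mask(uint32_t lblk, uint32_t ratio)
{
	assert(ratio_valid(ratio));
	return lblk & ~(ratio - 1);
}

/* Cluster offset: keep only the low bits, the position in the cluster. */
uint32_t lblk_cluster_offset(uint32_t lblk, uint32_t ratio)
{
	assert(ratio_valid(ratio));
	return lblk & (ratio - 1);
}
```

For logical block 35 with 16 blocks per cluster the mask yields 32 and
the offset yields 3, so using one where the other is needed points the
mapping at the wrong block.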

Signed-off-by: Eric Whitney <enwlinux@gmail.com>
2014-01-06 14:00:23 -05:00
Linus Torvalds 06f055f394 Merge branch 'akpm' (incoming from Andrew)
Merge patches from Andrew Morton:
 "Ten fixes"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  epoll: do not take the nested ep->mtx on EPOLL_CTL_DEL
  sh: add EXPORT_SYMBOL(min_low_pfn) and EXPORT_SYMBOL(max_low_pfn) to sh_ksyms_32.c
  drivers/dma/ioat/dma.c: check DMA mapping error in ioat_dma_self_test()
  mm/memory-failure.c: transfer page count from head page to tail page after split thp
  MAINTAINERS: set up proper record for Xilinx Zynq
  mm: remove bogus warning in copy_huge_pmd()
  memcg: fix memcg_size() calculation
  mm: fix use-after-free in sys_remap_file_pages
  mm: munlock: fix deadlock in __munlock_pagevec()
  mm: munlock: fix a bug where THP tail page is encountered
2014-01-02 14:40:38 -08:00
Jason Baron 4ff36ee94d epoll: do not take the nested ep->mtx on EPOLL_CTL_DEL
The EPOLL_CTL_DEL path of epoll contains a classic, ab-ba deadlock.
That is, epoll_ctl(a, EPOLL_CTL_DEL, b, x), will deadlock with
epoll_ctl(b, EPOLL_CTL_DEL, a, x).  The deadlock was introduced with
commit 67347fe4e6 ("epoll: do not take global 'epmutex' for simple
topologies").

The acquisition of the ep->mtx for the destination 'ep' was added such
that a concurrent EPOLL_CTL_ADD operation would see the correct state of
the ep (specifically, the check for '!list_empty(&f.file->f_ep_links)').

However, by simply not acquiring the lock, we do not serialize behind
the ep->mtx from the add path, and thus may perform a full path check
that would not have been necessary had we waited a little longer.
However, this is a transient state, and performing the full loop
checking in this case is not harmful.

The important point is that we wouldn't miss doing the full loop
checking when required, since EPOLL_CTL_ADD always locks any 'ep's that
it's operating upon.  The reason we don't need to do lock ordering in the
add path is that we are already holding the global 'epmutex'
whenever we do the double lock.  Further, the original posting of this
patch, which was tested for the intended performance gains, did not
perform this additional locking.

Signed-off-by: Jason Baron <jbaron@akamai.com>
Cc: Nathan Zimmer <nzimmer@sgi.com>
Cc: Eric Wong <normalperson@yhbt.net>
Cc: Nelson Elhage <nelhage@nelhage.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Davide Libenzi <davidel@xmailserver.org>
Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-02 14:40:30 -08:00
Linus Torvalds 152b734a9e Here is a set of small fixes for GFS2. There is a fix to drop
s_umount which is copied in from the core vfs, two patches
 relate to a hard to hit "use after free" and memory leak.
 Two patches related to using DIO and buffered I/O on the same
 file to ensure correct operation in relation to glock state
 changes. The final patch adds an RCU read lock to ensure
 correct locking on an error path.

Merge tag 'gfs2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-fixes

Pull GFS2 fixes from Steven Whitehouse:
 "Here is a set of small fixes for GFS2.  There is a fix to drop
  s_umount which is copied in from the core vfs, two patches relate to a
  hard to hit "use after free" and memory leak.  Two patches related to
  using DIO and buffered I/O on the same file to ensure correct
  operation in relation to glock state changes.  The final patch adds an
  RCU read lock to ensure correct locking on an error path"

* tag 'gfs2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-fixes:
  GFS2: Fix unsafe dereference in dump_holder()
  GFS2: Wait for async DIO in glock state changes
  GFS2: Fix incorrect invalidation for DIO/buffered I/O
  GFS2: Fix slab memory leak in gfs2_bufdata
  GFS2: Fix use-after-free race when calling gfs2_remove_from_ail
  GFS2: don't hold s_umount over blkdev_put
2014-01-02 12:45:47 -08:00
Tetsuo Handa 0b3a2c9968 GFS2: Fix unsafe dereference in dump_holder()
GLOCK_BUG_ON() might call this function without RCU read lock. Make sure that
RCU read lock is held when using task_struct returned from pid_task().

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2014-01-02 12:18:04 +00:00
Shirish Pargaonkar f1e3268126 cifs: set FILE_CREATED
Set FILE_CREATED on O_CREAT|O_EXCL.

cifs code didn't change during commit 116cc02253

Kernel bugzilla 66251

Signed-off-by: Shirish Pargaonkar <spargaonkar@suse.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
CC: Stable <stable@kernel.org>
Signed-off-by: Steve French <smfrench@gmail.com>
2013-12-27 15:14:45 -06:00
Sachin Prabhu 750b8de6c4 cifs: We do not drop reference to tlink in CIFSCheckMFSymlink()
When we obtain tcon from cifs_sb, we use cifs_sb_tlink() to first obtain
tlink which also grabs a reference to it. We do not drop this reference
to tlink once we are done with the call.

The patch fixes this issue by instead passing tcon as a parameter and
avoids having to obtain a reference to the tlink. A lookup for the tcon
is already made in the calling functions and this way we avoid having to
re-run the lookup. This is also consistent with the argument list for
other similar calls for M-F symlinks.

We should also return an ENOSYS when we do not find a protocol specific
function to look up the MF Symlink data.

Signed-off-by: Sachin Prabhu <sprabhu@redhat.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
CC: Stable <stable@kernel.org>
Signed-off-by: Steve French <smfrench@gmail.com>
2013-12-27 15:14:44 -06:00
Steve French ebcc943c11 Add missing end of line termination to some cifs messages
Signed-off-by: Steve French <smfrench@gmail.com>
Signed-off-by: Gregor Beck <gbeck@sernet.de>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
2013-12-27 15:14:44 -06:00
Linus Torvalds f41bfc9423 A collection of bug fixes destined for stable and some printk cleanups
and a patch so that instead of BUG'ing we use the ext4_error()
 framework to mark the file system as corrupted.

Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 fixes from Ted Ts'o:
 "A collection of bug fixes destined for stable and some printk cleanups
  and a patch so that instead of BUG'ing we use the ext4_error()
  framework to mark the file system as corrupted"

* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4: add explicit casts when masking cluster sizes
  ext4: fix deadlock when writing in ENOSPC conditions
  jbd2: rename obsoleted msg JBD->JBD2
  jbd2: revise KERN_EMERG error messages
  jbd2: don't BUG but return ENOSPC if a handle runs out of space
  ext4: Do not reserve clusters when fs doesn't support extents
  ext4: fix del_timer() misuse for ->s_err_report
  ext4: check for overlapping extents in ext4_valid_extent_entries()
  ext4: fix use-after-free in ext4_mb_new_blocks
  ext4: call ext4_error_inode() if jbd2_journal_dirty_metadata() fails
2013-12-26 09:26:12 -08:00
Greg Kroah-Hartman 5bd2010fbe Merge 3.13-rc5 into staging-next
We want these fixes here to handle some merge issues.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-12-24 09:43:21 -08:00
Linus Torvalds dc0a6b4fee Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull ext2 fix from Jan Kara:
 "One simple fix of oops in ext2 which was recently hit by Christoph"

* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  ext2: Fix oops in ext2_get_block() called from ext2_quota_write()
2013-12-23 17:24:38 -08:00
Linus Torvalds a8472b4bb1 Merge git://git.kvack.org/~bcrl/aio-next
Pull AIO leak fixes from Ben LaHaise:
 "I've put these two patches plus Linus's change through a round of
  tests, and it passes millions of iterations of the aio numa
  migratepage test, as well as a number of repetitions of a few simple
  read and write tests.

  The first patch fixes the memory leak Kent introduced, while the
  second patch makes aio_migratepage() much more paranoid and robust"

* git://git.kvack.org/~bcrl/aio-next:
  aio/migratepages: make aio migrate pages sane
  aio: fix kioctx leak introduced by "aio: Fix a trinity splat"
2013-12-22 11:03:49 -08:00
Linus Torvalds 3dc9acb676 aio: clean up and fix aio_setup_ring page mapping
Since commit 36bc08cc01 ("fs/aio: Add support to aio ring pages
migration") the aio ring setup code has used a special per-ring backing
inode for the page allocations, rather than just using random anonymous
pages.

However, rather than remembering the pages as it allocated them, it
would allocate the pages, insert them into the file mapping (dirty, so
that they couldn't be free'd), and then forget about them.  And then to
look them up again, it would mmap the mapping, and then use
"get_user_pages()" to get back an array of the pages we just created.

Now, not only is that incredibly inefficient, it also leaked all the
pages if the mmap failed (which could happen due to excessive number of
mappings, for example).

So clean it all up, making it much more straightforward.  Also remove
some left-overs of the previous (broken) mm_populate() usage that was
removed in commit d6c355c7da ("aio: fix race in ring buffer page
lookup introduced by page migration support") but left the pointless and
now misleading MAP_POPULATE flag around.

Tested-and-acked-by: Benjamin LaHaise <bcrl@kvack.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-12-22 11:03:08 -08:00
Benjamin LaHaise 8e321fefb0 aio/migratepages: make aio migrate pages sane
The arbitrary restriction on page counts offered by the core
migrate_page_move_mapping() code results in rather suspicious looking
fiddling with page reference counts in the aio_migratepage() operation.
To fix this, make migrate_page_move_mapping() take an extra_count parameter
that allows aio to tell the code about its own reference count on the page
being migrated.
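
The effect of the extra_count parameter can be modeled roughly as
below.  This is a simplified sketch, not the mm code: struct page_model
and both functions are hypothetical, and the way the expected count is
built (one ref for the caller, plus the extras, plus the page cache and
private refs when the page has a mapping) is an assumption about how
the check is structured.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical, heavily simplified view of a page. */
struct page_model {
	int refcount;		/* current reference count */
	int has_mapping;	/* non-zero if the page sits in a mapping */
	int has_private;	/* non-zero if private data holds a ref */
};

/* Reference count the migration code expects to see on the old page. */
int expected_refs(const struct page_model *p, int extra_count)
{
	int expected = 1 + extra_count;	/* caller's ref plus declared extras */

	assert(extra_count >= 0);
	if (p->has_mapping)
		expected += 1 + p->has_private;
	return expected;
}

/* Refuse to migrate if someone holds a reference we can't account for. */
int move_mapping_check(const struct page_model *p, int extra_count)
{
	return p->refcount == expected_refs(p, extra_count) ? 0 : -EAGAIN;
}
```

In this model a mapped ring page on which aio holds one additional
reference passes the check only when that reference is declared through
extra_count; otherwise the check returns -EAGAIN.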

While cleaning up aio_migratepage(), make it validate that the old page
being passed in is actually what aio_migratepage() expects to prevent
misbehaviour in the case of races.

Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
2013-12-21 17:56:08 -05:00
Benjamin LaHaise 1881686f84 aio: fix kioctx leak introduced by "aio: Fix a trinity splat"
e34ecee2ae reworked the percpu reference
counting to correct a bug trinity found.  Unfortunately, the change led
to kioctxes being leaked because there was no final reference count to
put.  Add that reference count back in to fix things.

Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
Cc: stable@vger.kernel.org
2013-12-21 15:57:09 -05:00
Linus Torvalds a6ddeee32d xfs: bugfixes for 3.13-rc5
- fix memory leak in xfs_dir2_node_removename
 - fix quota assertion in xfs_setattr_size
 - fix quota assertions in xfs_qm_vop_create_dqattach
 - fix for hang when disabling group and project quotas before
   disabling user quotas
 - fix Dave Chinner's email address in MAINTAINERS
 - fix for file allocation alignment
 - fix for assertion in xfs_buf_stale by removing xfsbdstrat
 - fix for alignment with swalloc mount option
 - fix for "retry forever" semantics on IO errors

Merge tag 'xfs-for-linus-v3.13-rc5' of git://oss.sgi.com/xfs/xfs

Pull xfs bugfixes from Ben Myers:
 "This contains fixes for some asserts
   related to project quotas, a memory leak, a hang when disabling group or
   project quotas before disabling user quotas, Dave's email address, several
   fixes for the alignment of file allocation to stripe unit/width geometry, a
   fix for an assertion with xfs_zero_remaining_bytes, and the behavior of
   metadata writeback in the face of IO errors.

   Details:
   - fix memory leak in xfs_dir2_node_removename
   - fix quota assertion in xfs_setattr_size
   - fix quota assertions in xfs_qm_vop_create_dqattach
   - fix for hang when disabling group and project quotas before
     disabling user quotas
   - fix Dave Chinner's email address in MAINTAINERS
   - fix for file allocation alignment
   - fix for assertion in xfs_buf_stale by removing xfsbdstrat
   - fix for alignment with swalloc mount option
   - fix for "retry forever" semantics on IO errors"

* tag 'xfs-for-linus-v3.13-rc5' of git://oss.sgi.com/xfs/xfs:
  xfs: abort metadata writeback on permanent errors
  xfs: swalloc doesn't align allocations properly
  xfs: remove xfsbdstrat error
  xfs: align initial file allocations correctly
  MAINTAINERS: fix incorrect mail address of XFS maintainer
  xfs: fix infinite loop by detaching the group/project hints from user dquot
  xfs: fix assertion failure at xfs_setattr_nonsize
  xfs: fix false assertion at xfs_qm_vop_create_dqattach
  xfs: fix memory leak in xfs_dir2_node_removename
2013-12-20 15:48:45 -08:00