Commit Graph

26810 Commits

Author SHA1 Message Date
Eric W. Biederman 1523299d58 userns: Convert ext3 to use kuid/kgid where appropriate
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-05-15 14:59:27 -07:00
Eric W. Biederman b8a9f9e183 userns: Convert ext2 to use kuid/kgid where appropriate.
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-05-15 14:59:26 -07:00
Eric W. Biederman f04c6ce2cf userns: Convert devpts to use kuid/kgid where appropriate
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-05-15 14:59:26 -07:00
Eric W. Biederman ebc887b278 userns: Convert binary formats to use kuid/kgid where appropriate
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-05-15 14:59:25 -07:00
Eric W. Biederman 9e4a36ece6 userns: Fail exec for suid and sgid binaries with ids outside our user namespace.
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-05-15 14:59:23 -07:00
Eric W. Biederman a7c1938e22 userns: Convert stat to return values mapped from kuids and kgids
- Store uids and gids with kuid_t and kgid_t in struct kstat
- Convert uid and gids to userspace usable values with
  from_kuid and from_kgid

Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-05-15 14:08:35 -07:00
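
Note on a7c1938e22: the conversion described above can be pictured with a small userspace model. This is only a sketch, not kernel code. The names from_kuid_sketch, fill_stat and UNMAPPED are made up for illustration, and the extent layout is assumed to follow the three-column form of /proc/<pid>/uid_map.

```python
# Userspace model of translating a kernel-internal uid into the value
# that a user namespace sees. Each map entry is assumed to be
# (first id inside the namespace, first id outside, number of ids).
# from_kuid_sketch, fill_stat and UNMAPPED are illustrative names only.

UNMAPPED = -1  # stands in for the "(uid_t)-1" result for unmapped ids


def from_kuid_sketch(id_map, kid):
    """Translate a kernel id into the namespace view, or UNMAPPED."""
    for inside_first, outside_first, count in id_map:
        if outside_first <= kid < outside_first + count:
            return inside_first + (kid - outside_first)
    return UNMAPPED


def fill_stat(uid_map, gid_map, kuid, kgid):
    """Build the uid/gid pair a stat-like call would hand to userspace."""
    return {
        "st_uid": from_kuid_sketch(uid_map, kuid),
        "st_gid": from_kuid_sketch(gid_map, kgid),
    }
```

With a single extent (0, 100000, 65536), kernel uid 101000 would be reported as 1000 inside the namespace, and an id outside the extent would come back as UNMAPPED.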
Dan Carpenter 75af271ed5 dlm: NULL dereference on failure in kmem_cache_create()
We aren't allowed to pass NULL pointers to kmem_cache_destroy() so if
both allocations fail, it leads to a NULL dereference.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2012-05-15 10:39:28 -05:00
Mark Salter fce2447627 C6X: add support to build with BINFMT_ELF_FDPIC
C6x userspace supports a shared library mechanism called DSBT for systems with
no MMU. DSBT is similar to FDPIC in allowing shared text segments and private
copies of data segments without an MMU. Both methods access data using a base
register and offset. With FDPIC, the caller of an external function sets up the
base register for the callee. With DSBT, the called function sets up its own
base register. Other details differ but both userspaces need the same thing
from the kernel loader: a map of where each ELF segment was loaded. The FDPIC
loader already provides this, so DSBT just uses it.

This patch enables BINFMT_ELF_FDPIC by default for C6X and provides the
necessary architecture hooks for the generic loader.

Signed-off-by: Mark Salter <msalter@redhat.com>
2012-05-15 09:17:34 -04:00
Alan Stern 356c05d58a sysfs: get rid of some lockdep false positives
This patch (as1554) fixes a lockdep false-positive report.  The
problem arises because lockdep is unable to deal with the
tree-structured locks created by the device core and sysfs.

This particular problem involves a sysfs attribute method that
unregisters itself, not from the device it was called for, but from a
descendant device.  Lockdep doesn't understand the distinction and
reports a possible deadlock, even though the operation is safe.

This is the sort of thing that would normally be handled by using a
nested lock annotation; unfortunately it's not feasible to do that
here.  There's no sensible way to tell sysfs when attribute removal
occurs in the context of a parent attribute method.

As a workaround, the patch adds a new flag to struct attribute
telling sysfs not to inform lockdep when it acquires a readlock on a
sysfs_dirent instance for the attribute.  The readlock is still
acquired, but lockdep doesn't know about it and hence does not
complain about impossible deadlock scenarios.

Also added are macros for static initialization of attribute
structures with the ignore_lockdep flag set.  The three offending
attributes in the USB subsystem are converted to use the new macros.

Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Acked-by: Tejun Heo <tj@kernel.org>
CC: Eric W. Biederman <ebiederm@xmission.com>
CC: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-05-14 12:19:56 -07:00
Linus Torvalds 9ff00d58a9 Three fixes for 3.4:
- Fix a lock ordering deadlock in JFFS2
  - Fix an oops in the dataflash driver, triggered by a dummy call to test
    whether it has OTP functionality.
  - Fix request_mem_region() failure on amsdelta NAND driver.

Merge tag 'for-linus-3.4-20120513' of git://git.infradead.org/linux-mtd

Pull three MTD fixes from David Woodhouse:
 - Fix a lock ordering deadlock in JFFS2
 - Fix an oops in the dataflash driver, triggered by a dummy call to test
   whether it has OTP functionality.
 - Fix request_mem_region() failure on amsdelta NAND driver.

* tag 'for-linus-3.4-20120513' of git://git.infradead.org/linux-mtd:
  mtd: ams-delta: fix request_mem_region() failure
  jffs2: Fix lock acquisition order bug in gc path
  mtd: fix oops in dataflash driver
2012-05-13 11:33:09 -07:00
Rafael J. Wysocki 351520a9eb Merge branch 'pm-sleep'
* pm-sleep:
  PM / Sleep: User space wakeup sources garbage collector Kconfig option
  PM / Sleep: Make the limit of user space wakeup sources configurable
  PM / Documentation: suspend-and-cpuhotplug.txt: Fix typo
  PM / Sleep: Fix a mistake in a conditional in autosleep_store()
  epoll: Add a flag, EPOLLWAKEUP, to prevent suspend while epoll events are ready
  PM / Sleep: Add user space interface for manipulating wakeup sources, v3
  PM / Sleep: Add "prevent autosleep time" statistics to wakeup sources
  PM / Sleep: Implement opportunistic sleep, v2
  PM / Sleep: Add wakeup_source_activate and wakeup_source_deactivate tracepoints
  PM / Sleep: Change wakeup source statistics to follow Android
  PM / Sleep: Use wait queue to signal "no wakeup events in progress"
  PM / Sleep: Look for wakeup events in later stages of device suspend
  PM / Hibernate: Hibernate/thaw fixes/improvements
2012-05-11 21:15:09 +02:00
Bernd Schubert f908ee9463 bio allocation failure due to bio_get_nr_vecs()
The value returned by bio_get_nr_vecs() is passed down via bio_alloc() to
bvec_alloc_bs(), which fails the bio allocation if
nr_iovecs > BIO_MAX_PAGES. The caller then sees an unexpected bio
allocation failure.
Limiting the value to queue_max_segments() is not sufficient, as
max_segments might also be very large.

bvec_alloc_bs(gfp_mask, nr_iovecs, ) => NULL when nr_iovecs  > BIO_MAX_PAGES
bio_alloc_bioset(gfp_mask, nr_iovecs, ...)
bio_alloc(GFP_NOIO, nvecs)
xfs_alloc_ioend_bio()

Signed-off-by: Bernd Schubert <bernd.schubert@itwm.fraunhofer.de>
Cc: stable@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-05-11 16:45:12 +02:00
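
Note on f908ee9463: the failure mode above can be sketched in a few lines. The message does not spell out the fix, so clamping the vector count to BIO_MAX_PAGES is an assumption here, as is the value 256. All function names below are illustrative stand-ins, not the kernel functions.

```python
BIO_MAX_PAGES = 256  # assumed cap for this sketch


def bvec_alloc_sketch(nr_iovecs):
    """Return a vector list, or None when the request exceeds the cap."""
    if nr_iovecs > BIO_MAX_PAGES:
        return None
    return [None] * nr_iovecs


def nr_vecs_unclamped(max_segments):
    # Model of the problem: a very large queue limit is passed straight through.
    return max_segments


def nr_vecs_clamped(max_segments):
    # Assumed fix: never ask for more vectors than the allocator can serve.
    return min(max_segments, BIO_MAX_PAGES)
```

With max_segments of 1024, the unclamped count makes the allocation return None, while the clamped count yields a 256-entry vector.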
Jeff Moyer 080399aaaf block: don't mark buffers beyond end of disk as mapped
Hi,

We have a bug report open where a squashfs image mounted on ppc64 would
exhibit errors due to trying to read beyond the end of the disk.  It can
easily be reproduced by doing the following:

[root@ibm-p750e-02-lp3 ~]# ls -l install.img
-rw-r--r-- 1 root root 142032896 Apr 30 16:46 install.img
[root@ibm-p750e-02-lp3 ~]# mount -o loop ./install.img /mnt/test
[root@ibm-p750e-02-lp3 ~]# dd if=/dev/loop0 of=/dev/null
dd: reading `/dev/loop0': Input/output error
277376+0 records in
277376+0 records out
142016512 bytes (142 MB) copied, 0.9465 s, 150 MB/s

In dmesg, you'll find the following:

squashfs: version 4.0 (2009/01/31) Phillip Lougher
[   43.106012] attempt to access beyond end of device
[   43.106029] loop0: rw=0, want=277410, limit=277408
[   43.106039] Buffer I/O error on device loop0, logical block 138704
[   43.106053] attempt to access beyond end of device
[   43.106057] loop0: rw=0, want=277412, limit=277408
[   43.106061] Buffer I/O error on device loop0, logical block 138705
[   43.106066] attempt to access beyond end of device
[   43.106070] loop0: rw=0, want=277414, limit=277408
[   43.106073] Buffer I/O error on device loop0, logical block 138706
[   43.106078] attempt to access beyond end of device
[   43.106081] loop0: rw=0, want=277416, limit=277408
[   43.106085] Buffer I/O error on device loop0, logical block 138707
[   43.106089] attempt to access beyond end of device
[   43.106093] loop0: rw=0, want=277418, limit=277408
[   43.106096] Buffer I/O error on device loop0, logical block 138708
[   43.106101] attempt to access beyond end of device
[   43.106104] loop0: rw=0, want=277420, limit=277408
[   43.106108] Buffer I/O error on device loop0, logical block 138709
[   43.106112] attempt to access beyond end of device
[   43.106116] loop0: rw=0, want=277422, limit=277408
[   43.106120] Buffer I/O error on device loop0, logical block 138710
[   43.106124] attempt to access beyond end of device
[   43.106128] loop0: rw=0, want=277424, limit=277408
[   43.106131] Buffer I/O error on device loop0, logical block 138711
[   43.106135] attempt to access beyond end of device
[   43.106139] loop0: rw=0, want=277426, limit=277408
[   43.106143] Buffer I/O error on device loop0, logical block 138712
[   43.106147] attempt to access beyond end of device
[   43.106151] loop0: rw=0, want=277428, limit=277408
[   43.106154] Buffer I/O error on device loop0, logical block 138713
[   43.106158] attempt to access beyond end of device
[   43.106162] loop0: rw=0, want=277430, limit=277408
[   43.106166] attempt to access beyond end of device
[   43.106169] loop0: rw=0, want=277432, limit=277408
...
[   43.106307] attempt to access beyond end of device
[   43.106311] loop0: rw=0, want=277470, limit=2774

Squashfs manages to read in the end block(s) of the disk during the
mount operation.  Then, when dd reads the block device, it leads to
block_read_full_page being called with buffers that are beyond end of
disk, but are marked as mapped.  Thus, it would end up submitting read
I/O against them, resulting in the errors mentioned above.  I fixed the
problem by modifying init_page_buffers to only set the buffer mapped if
it fell inside of i_size.

Cheers,
Jeff

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Acked-by: Nick Piggin <npiggin@kernel.dk>

--

Changes from v1->v2: re-used max_block, as suggested by Nick Piggin.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-05-11 16:42:14 +02:00
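
Note on 080399aaaf: the numbers in the report line up with the described fix, which a short model makes visible. The image size comes from the ls output. The 1024-byte block size is inferred from the log (sector limit 277408 against logical block 138704), and the 64 KiB page size is an assumption about the ppc64 machine. init_page_buffers_sketch is an illustrative stand-in, not the kernel function.

```python
DEVICE_BYTES = 142032896   # size of install.img in the ls output
BLOCK_SIZE = 1024          # inferred: sector limit 277408 <-> block 138704
PAGE_SIZE = 64 * 1024      # assumed page size for this sketch


def end_block(device_bytes=DEVICE_BYTES, block_size=BLOCK_SIZE):
    """First block number that lies beyond the end of the device."""
    return device_bytes // block_size


def init_page_buffers_sketch(page_index, device_bytes=DEVICE_BYTES,
                             block_size=BLOCK_SIZE, page_size=PAGE_SIZE):
    """Return (block number, mapped?) for every buffer of one page.

    A buffer is only marked mapped when its block is inside the device.
    """
    per_page = page_size // block_size
    limit = end_block(device_bytes, block_size)
    first = page_index * per_page
    return [(block, block < limit) for block in range(first, first + per_page)]
```

Under these assumptions the last page (index 2167) holds 16 mapped buffers, and the first unmapped one is block 138704, the first block named in the dmesg output.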
Bob Peterson 41db1ab9be GFS2: Add rgrp information to block_alloc trace point
This is a second attempt at a patch that adds rgrp information to the
block allocation trace point for GFS2. As suggested, the patch was
modified to list the rgrp information _after_ the fields that exist today.

Again, the reason for this patch is to allow us to trace and debug
problems with the block reservations patch, which is still in the works.
We can debug problems with reservations if we can see what block allocations
result from the block reservations. It may also be handy in figuring out
if there are problems in rgrp free space accounting. In other words,
we can use it to track the rgrp and its free space alongside the allocations
that are taking place.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-05-11 10:31:34 +01:00
Bob Peterson f2f9c81244 GFS2: Eliminate unused "new" parameter to gfs2_meta_indirect_buffer
It turns out that the "new" parameter to function gfs2_meta_indirect_buffer
was always being passed in as zero. Therefore, this patch eliminates it
and simplifies the function.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-05-11 10:19:23 +01:00
Linus Torvalds 26fe575028 vfs: make it possible to access the dentry hash/len as one 64-bit entry
This allows comparing hash and len in one operation on 64-bit
architectures.  Right now only __d_lookup_rcu() takes advantage of this,
since that is the case we care most about.

The use of anonymous struct/unions hides the alternate 64-bit approach
from most users, the exception being a few cases where we initialize a
'struct qstr' with a static initializer.  This makes the problematic
cases use a new QSTR_INIT() helper function for that (but initializing
just the name pointer with a "{ .name = xyzzy }" initializer remains
valid, as does just copying another qstr structure).

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-10 19:54:35 -07:00
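
Note on 26fe575028: the overlay of hash and length onto one 64-bit value can be modelled in userspace with ctypes. This is a sketch of the idea only. QstrKey, make_key and same_hash_and_len are hypothetical names, and the real 'struct qstr' also carries the name pointer, which is left out here.

```python
import ctypes


class _HashAndLen(ctypes.Structure):
    _fields_ = [("hash", ctypes.c_uint32), ("len", ctypes.c_uint32)]


class QstrKey(ctypes.Union):
    # The anonymous member lets callers write key.hash and key.len directly,
    # while hash_len overlays the same 8 bytes as one 64-bit value.
    _anonymous_ = ("_parts",)
    _fields_ = [("_parts", _HashAndLen), ("hash_len", ctypes.c_uint64)]


def make_key(hash_value, length):
    """Build a key by setting the two 32-bit halves."""
    key = QstrKey()
    key.hash = hash_value
    key.len = length
    return key


def same_hash_and_len(a, b):
    # One 64-bit comparison replaces two separate 32-bit comparisons.
    return a.hash_len == b.hash_len
```

Two keys compare equal through hash_len only when both the hash and the length match, regardless of byte order, because both halves occupy the same 8 bytes.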
Linus Torvalds ee983e8967 vfs: move dentry name length comparison from dentry_cmp() into callers
All callers do want to check the dentry length, but some of them can
check the length and the hash together, so doing it in dentry_cmp() can
be counter-productive.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-10 19:54:35 -07:00
Linus Torvalds 94753db5ed vfs: do the careful dentry name access for all dentry_cmp cases
Commit 12f8ad4b05 ("vfs: clean up __d_lookup_rcu() and dentry_cmp()
interfaces") did the careful ACCESS_ONCE() of the dentry name only for
the word-at-a-time case, even though the issue is generic.

Admittedly I don't really see gcc ever reloading the value in the middle
of the loop, so the ACCESS_ONCE() protects us from a fairly theoretical
issue. But better safe than sorry.

Also, this consolidates the common parts of the word-at-a-time and
bytewise logic, which includes checking the length.  We'll be changing
that later.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-10 19:54:09 -07:00
Linus Torvalds 8c01a529b8 vfs: remove unnecessary d_unhashed() check from __d_lookup_rcu
The check for d_unhashed() is not strictly incorrect, but at the same
time it is also not sensible.  The actual dentry removal from the dentry
hash chains is totally asynchronous to the __d_lookup_rcu() logic, and
we depend on __d_drop() updating the sequence number to invalidate any
lookup of an unhashed dentry.

So checking d_unhashed() is not incorrect, but it's not useful either:
the code has to work correctly even without it. So just remove it.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-10 19:52:35 -07:00
Linus Torvalds 7c283324da Merge branch 'akpm' (Andrew's patch-bomb)
Merge misc fixes from Andrew Morton.

* emailed from Andrew Morton <akpm@linux-foundation.org>: (8 patches)
  MAINTAINERS: add maintainer for LED subsystem
  mm: nobootmem: fix sign extend problem in __free_pages_memory()
  drivers/leds: correct __devexit annotations
  memcg: free spare array to avoid memory leak
  namespaces, pid_ns: fix leakage on fork() failure
  hugetlb: prevent BUG_ON in hugetlb_fault() -> hugetlb_cow()
  mm: fix division by 0 in percpu_pagelist_fraction()
  proc/pid/pagemap: correctly report non-present ptes and holes between vmas
2012-05-10 15:17:24 -07:00
Konstantin Khlebnikov 16fbdce62d proc/pid/pagemap: correctly report non-present ptes and holes between vmas
Reset the current pagemap-entry if the current pte isn't present, or if
current vma is over.  Otherwise pagemap reports last entry again and
again.

Non-present pte reporting was broken in commit 092b50bacd ("pagemap:
introduce data structure for pagemap entry")

Reporting for holes was broken in commit 5aaabe831e ("pagemap: avoid
splitting thp when reading /proc/pid/pagemap")

Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Reported-by: Pavel Emelyanov <xemul@parallels.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-10 15:06:44 -07:00
Dan Carpenter 48a5730e5b cifs: fix revalidation test in cifs_llseek()
This test is always true so it means we revalidate the length every
time, which generates more network traffic.  When it is SEEK_SET or
SEEK_CUR, then we don't need to revalidate.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2012-05-09 15:16:22 -05:00
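
Note on 48a5730e5b: a test that is "always true" typically comes from joining two inequality checks with "or". The exact expression in cifs_llseek() is not quoted in the message, so the buggy form below is an assumption, and both function names are illustrative.

```python
import os


def needs_revalidate_buggy(whence):
    # No value can equal SEEK_SET and SEEK_CUR at once, so this is always True.
    return whence != os.SEEK_SET or whence != os.SEEK_CUR


def needs_revalidate_fixed(whence):
    # Revalidate the length only when the seek is relative to something
    # other than the start of the file or the current position.
    return whence != os.SEEK_SET and whence != os.SEEK_CUR
```

With the corrected test, SEEK_SET and SEEK_CUR skip the revalidation and only seeks such as SEEK_END trigger it.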
Bob Peterson 6de1e2f34a GFS2: Remove redundant metadata block type check
This patch removes a redundant metadata block check. See description below.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-05-08 16:18:55 +01:00
David S. Miller 0d6c4a2e46 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/intel/e1000e/param.c
	drivers/net/wireless/iwlwifi/iwl-agn-rx.c
	drivers/net/wireless/iwlwifi/iwl-trans-pcie-rx.c
	drivers/net/wireless/iwlwifi/iwl-trans.h

Resolved the iwlwifi conflict with mainline using 3-way diff posted
by John Linville and Stephen Rothwell.  In 'net' we added a bug
fix to make iwlwifi report a more accurate skb->truesize but this
conflicted with RX path changes that happened meanwhile in net-next.

In e1000e a conflict arose in the validation code for settings of
adapter->itr.  'net-next' had more sophisticated logic so that
logic was used.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-07 23:35:40 -04:00
Josh Cartwright 226bb7df3d jffs2: Fix lock acquisition order bug in gc path
The locking policy is such that the erase_completion_lock spinlock is
nested within the alloc_sem mutex.  This fixes a case in which the
acquisition order was erroneously reversed.  This issue was caught by
the following lockdep splat:

   =======================================================
   [ INFO: possible circular locking dependency detected ]
   3.0.5 #1
   -------------------------------------------------------
   jffs2_gcd_mtd6/299 is trying to acquire lock:
    (&c->alloc_sem){+.+.+.}, at: [<c01f7714>] jffs2_garbage_collect_pass+0x314/0x890

   but task is already holding lock:
    (&(&c->erase_completion_lock)->rlock){+.+...}, at: [<c01f7708>] jffs2_garbage_collect_pass+0x308/0x890

   which lock already depends on the new lock.

   the existing dependency chain (in reverse order) is:

   -> #1 (&(&c->erase_completion_lock)->rlock){+.+...}:
          [<c008bec4>] validate_chain+0xe6c/0x10bc
          [<c008c660>] __lock_acquire+0x54c/0xba4
          [<c008d240>] lock_acquire+0xa4/0x114
          [<c046780c>] _raw_spin_lock+0x3c/0x4c
          [<c01f744c>] jffs2_garbage_collect_pass+0x4c/0x890
          [<c01f937c>] jffs2_garbage_collect_thread+0x1b4/0x1cc
          [<c0071a68>] kthread+0x98/0xa0
          [<c000f264>] kernel_thread_exit+0x0/0x8

   -> #0 (&c->alloc_sem){+.+.+.}:
          [<c008ad2c>] print_circular_bug+0x70/0x2c4
          [<c008c08c>] validate_chain+0x1034/0x10bc
          [<c008c660>] __lock_acquire+0x54c/0xba4
          [<c008d240>] lock_acquire+0xa4/0x114
          [<c0466628>] mutex_lock_nested+0x74/0x33c
          [<c01f7714>] jffs2_garbage_collect_pass+0x314/0x890
          [<c01f937c>] jffs2_garbage_collect_thread+0x1b4/0x1cc
          [<c0071a68>] kthread+0x98/0xa0
          [<c000f264>] kernel_thread_exit+0x0/0x8

   other info that might help us debug this:

    Possible unsafe locking scenario:

          CPU0                    CPU1
          ----                    ----
     lock(&(&c->erase_completion_lock)->rlock);
                                  lock(&c->alloc_sem);
                                  lock(&(&c->erase_completion_lock)->rlock);
     lock(&c->alloc_sem);

    *** DEADLOCK ***

   1 lock held by jffs2_gcd_mtd6/299:
    #0:  (&(&c->erase_completion_lock)->rlock){+.+...}, at: [<c01f7708>] jffs2_garbage_collect_pass+0x308/0x890

   stack backtrace:
   [<c00155dc>] (unwind_backtrace+0x0/0x100) from [<c0463dc0>] (dump_stack+0x20/0x24)
   [<c0463dc0>] (dump_stack+0x20/0x24) from [<c008ae84>] (print_circular_bug+0x1c8/0x2c4)
   [<c008ae84>] (print_circular_bug+0x1c8/0x2c4) from [<c008c08c>] (validate_chain+0x1034/0x10bc)
   [<c008c08c>] (validate_chain+0x1034/0x10bc) from [<c008c660>] (__lock_acquire+0x54c/0xba4)
   [<c008c660>] (__lock_acquire+0x54c/0xba4) from [<c008d240>] (lock_acquire+0xa4/0x114)
   [<c008d240>] (lock_acquire+0xa4/0x114) from [<c0466628>] (mutex_lock_nested+0x74/0x33c)
   [<c0466628>] (mutex_lock_nested+0x74/0x33c) from [<c01f7714>] (jffs2_garbage_collect_pass+0x314/0x890)
   [<c01f7714>] (jffs2_garbage_collect_pass+0x314/0x890) from [<c01f937c>] (jffs2_garbage_collect_thread+0x1b4/0x1cc)
   [<c01f937c>] (jffs2_garbage_collect_thread+0x1b4/0x1cc) from [<c0071a68>] (kthread+0x98/0xa0)
   [<c0071a68>] (kthread+0x98/0xa0) from [<c000f264>] (kernel_thread_exit+0x0/0x8)

This was introduced in '81cfc9f jffs2: Fix serious write stall due to erase'.

Cc: stable@kernel.org [2.6.37+]
Signed-off-by: Josh Cartwright <joshc@linux.com>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2012-05-07 20:30:14 +01:00
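
Note on 226bb7df3d: the kind of check behind the splat above can be approximated by a toy order tracker. This is far simpler than lockdep and is only meant to show why taking alloc_sem while holding erase_completion_lock is flagged once the opposite order has been seen. LockOrderChecker is a hypothetical helper.

```python
class LockOrderChecker:
    """Remember which lock was held when another was taken."""

    def __init__(self):
        self._order = set()  # (outer, inner) pairs seen so far
        self._held = []      # locks currently held, oldest first

    def acquire(self, name):
        """Record the acquisition; return False if it inverts a known order."""
        ok = all((name, held) not in self._order for held in self._held)
        for held in self._held:
            self._order.add((held, name))
        self._held.append(name)
        return ok

    def release(self, name):
        self._held.remove(name)
```

Taking alloc_sem and then erase_completion_lock establishes the order. A later attempt to take alloc_sem while holding erase_completion_lock makes acquire() return False, which corresponds to the inversion the patch removes.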
Linus Torvalds 8529f613b6 vfs: don't force a big memset of stat data just to clear padding fields
Admittedly this is something that the compiler should be able to just do
for us, but gcc just isn't that smart.  And trying to use a structure
initializer (which would get us the right semantics) ends up resulting
in gcc allocating stack space for _two_ 'struct stat', and then copying
one into the other.

So do it by hand - just have a per-architecture macro that initializes
the padding fields.  And if the architecture doesn't provide one, fall
back to the old behavior of just doing the whole memset() first.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-06 18:02:40 -07:00
Linus Torvalds a52dd971f9 vfs: de-crapify "cp_new_stat()" function
It's an unreadable mess of 32-bit vs 64-bit #ifdef's that mostly follow
a rather simple pattern.

Make a helper #define to handle that pattern, in the process making the
code both shorter and more readable.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-06 17:47:30 -07:00
Linus Torvalds 271fd5d728 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs fixes from Chris Mason:
 "The big ones here are a memory leak we introduced in rc1, and a
  scheduling while atomic if the transid on disk doesn't match the
  transid we expected.  This happens for corrupt blocks, or out of date
  disks.

  It also fixes up the ioctl definition for our ioctl to resolve logical
  inode numbers.  The __u32 was a merging error and doesn't match what
  we ship in the progs."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  Btrfs: avoid sleeping in verify_parent_transid while atomic
  Btrfs: fix crash in scrub repair code when device is missing
  btrfs: Fix mismatching struct members in ioctl.h
  Btrfs: fix page leak when allocing extent buffers
  Btrfs: Add properly locking around add_root_to_dirty_list
2012-05-06 10:20:07 -07:00
Chris Mason b9fab919b7 Btrfs: avoid sleeping in verify_parent_transid while atomic
verify_parent_transid needs to lock the extent range to make
sure no IO is underway, and so it can safely clear the
uptodate bits if our checks fail.

But, a few callers are using it with spinlocks held.  Most
of the time, the generation numbers are going to match, and
we don't want to switch to a blocking lock just for the error
case.  This adds an atomic flag to verify_parent_transid,
and changes it to return EAGAIN if it needs to block to
properly verify things.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
2012-05-06 07:23:47 -04:00
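
Note on b9fab919b7: the control flow added by the atomic flag can be sketched as follows. The dictionary standing in for the extent buffer, the function name, and the -EIO result on the blocking path are assumptions for illustration. Only the EAGAIN return for atomic callers comes from the message.

```python
import errno


def verify_parent_transid_sketch(buf, parent_transid, atomic):
    """Check a buffer's generation against the expected transid.

    buf is a dict with "generation" and "uptodate" keys (a stand-in for
    an extent buffer).
    """
    if buf["generation"] == parent_transid:
        return 0  # common case: nothing to lock, nothing to clear
    if atomic:
        # The caller holds a spinlock, so blocking here is not allowed.
        return -errno.EAGAIN
    # Blocking path: the real code would lock the extent range first.
    buf["uptodate"] = False
    return -errno.EIO
```

A caller that receives -EAGAIN would be expected to drop its spinlock and retry with atomic set to False.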
Arve Hjønnevåg 4d7e30d989 epoll: Add a flag, EPOLLWAKEUP, to prevent suspend while epoll events are ready
When an epoll_event, that has the EPOLLWAKEUP flag set, is ready, a
wakeup_source will be active to prevent suspend. This can be used to
handle wakeup events from a driver that supports poll, e.g. input, if
that driver wakes up the waitqueue passed to epoll before allowing
suspend.

Signed-off-by: Arve Hjønnevåg <arve@android.com>
Reviewed-by: NeilBrown <neilb@suse.de>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2012-05-05 21:50:41 +02:00
Linus Torvalds 12f8ad4b05 vfs: clean up __d_lookup_rcu() and dentry_cmp() interfaces
The calling conventions for __d_lookup_rcu() and dentry_cmp() are
annoying in different ways, and there is actually one single underlying
reason for both of the annoyances.

The fundamental reason is that we do the returned dentry sequence number
check inside __d_lookup_rcu() instead of doing it in the caller.  This
results in two annoyances:

 - __d_lookup_rcu() now not only needs to return the dentry and the
   sequence number that goes along with the lookup, it also needs to
   return the inode pointer that was validated by that sequence number
   check.

 - and because we did the sequence number check early (to validate the
   name pointer and length) we also couldn't just pass the dentry itself
   to dentry_cmp(), we had to pass the counted string that contained the
   name.

So that sequence number decision caused two separate ugly calling
conventions.

Both of these problems would be solved if we just did the sequence
number check in the caller instead.  There's only one caller, and that
caller already has to do the sequence number check for the parent
anyway, so just do that.

That allows us to stop returning the dentry->d_inode in that in-out
argument (pointer-to-pointer-to-inode), so we can make the inode
argument just a regular input inode pointer.  The caller can just load
the inode from dentry->d_inode, and then do the sequence number check
after that to make sure that it's synchronized with the name we looked
up.

And it allows us to just pass in the dentry to dentry_cmp(), which is
what all the callers really wanted.  Sure, dentry_cmp() has to be a bit
careful about the dentry (which is not stable during RCU lookup), but
that's actually very simple.

And now that dentry_cmp() can clearly see that the first string argument
is a dentry, we can use the direct word access for that, instead of the
careful unaligned zero-padding.  The dentry name is always properly
aligned, since it is a single path component that is either embedded
into the dentry itself, or was allocated with kmalloc() (see __d_alloc).

Finally, this also uninlines the nasty slow-case for dentry comparisons:
that one *does* need to do a sequence number check, since it will call
in to the low-level filesystems, and we want to give those a stable
inode pointer and path component length/start arguments.  Doing an extra
sequence check for that slow case is not a problem, though.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-04 18:21:14 -07:00
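
Note on 12f8ad4b05: moving the sequence number check into the caller can be illustrated with a single-threaded model of a sequence counter. This is a sketch under heavy simplification: no RCU, no memory barriers, one dentry. SeqCount, Dentry, lookup_rcu_sketch and walk_component_sketch are hypothetical names.

```python
class SeqCount:
    """Counter that is odd while a writer is in progress."""

    def __init__(self):
        self.sequence = 0

    def write_begin(self):
        self.sequence += 1

    def write_end(self):
        self.sequence += 1

    def read_begin(self):
        # Clearing the low bit makes a read that starts mid-write fail later.
        return self.sequence & ~1

    def read_retry(self, start):
        return self.sequence != start


class Dentry:
    def __init__(self, name, inode):
        self.seq = SeqCount()
        self.name = name
        self.inode = inode

    def rename(self, new_name):
        self.seq.write_begin()
        self.name = new_name
        self.seq.write_end()


def lookup_rcu_sketch(dentry, name):
    """Return (dentry or None, seq); validation is left to the caller."""
    seq = dentry.seq.read_begin()
    if dentry.name != name:
        return None, seq
    return dentry, seq


def walk_component_sketch(dentry, name):
    """Caller side: load the inode, then validate it with the sequence."""
    found, seq = lookup_rcu_sketch(dentry, name)
    if found is None:
        return None
    inode = found.inode
    if found.seq.read_retry(seq):
        return None  # a real caller would fall back to a slower path
    return inode
```

Because the caller performs the check after loading the inode, the lookup no longer needs an in-out inode argument. It only hands back the dentry and the sequence number it sampled.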
Greg Kroah-Hartman 6f24f89287 hfsplus: Fix potential buffer overflows
Commit ec81aecb29 ("hfs: fix a potential buffer overflow") fixed a few
potential buffer overflows in the hfs filesystem.  But as Timo Warns
pointed out, these changes also need to be made on the hfsplus
filesystem as well.

Reported-by: Timo Warns <warns@pre-sense.de>
Acked-by: WANG Cong <amwang@redhat.com>
Cc: Alexey Khoroshilov <khoroshilov@ispras.ru>
Cc: Miklos Szeredi <mszeredi@suse.cz>
Cc: Sage Weil <sage@newdream.net>
Cc: Eugene Teo <eteo@redhat.com>
Cc: Roman Zippel <zippel@linux-m68k.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Dave Anderson <anderson@redhat.com>
Cc: stable <stable@vger.kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-04 17:11:24 -07:00
Linus Torvalds c6de1687f5 Merge git://git.samba.org/sfrench/cifs-2.6
Pull CIFS fixes from Steve French.

* git://git.samba.org/sfrench/cifs-2.6:
  fs/cifs: fix parsing of dfs referrals
  cifs: make sure we ignore the credentials= and cred= options
  [CIFS] Update cifs version to 1.78
  cifs - check S_AUTOMOUNT in revalidate
  cifs: add missing initialization of server->req_lock
  cifs: don't cap ra_pages at the same level as default_backing_dev_info
  CIFS: Fix indentation in cifs_show_options
2012-05-04 15:34:21 -07:00
Stefan Behrens ea9947b439 Btrfs: fix crash in scrub repair code when device is missing
Fix that when scrub tries to repair an I/O or checksum error and one of
the devices containing the mirror is missing, it crashes in bio_add_page
because the bdev is a NULL pointer for missing devices.

Reported-by: Marco L. Crociani <marco.crociani@gmail.com>
Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2012-05-04 15:16:07 -04:00
Alexander Block d04b1debc9 btrfs: Fix mismatching struct members in ioctl.h
Fix the size members of btrfs_ioctl_ino_path_args and
btrfs_ioctl_logical_ino_args. The user space btrfs-progs utilities used
__u64 and the kernel headers used __u32 before.

Signed-off-by: Alexander Block <ablock84@googlemail.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2012-05-04 15:16:06 -04:00
Josef Bacik 17de39ac17 Btrfs: fix page leak when allocing extent buffers
If we happen to alloc an extent buffer and then alloc a page and notice that
page is already attached to an extent buffer, we will only unlock it and
free our existing eb.  Any pages currently attached to that eb will be
properly freed, but we don't do the page_cache_release() on the page where
we noticed the other extent buffer which can cause us to leak pages and I
hope cause the weird issues we've been seeing in this area.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2012-05-04 15:16:06 -04:00
Chris Mason e5846fc665 Btrfs: Add properly locking around add_root_to_dirty_list
add_root_to_dirty_list happens once at the very beginning of the
transaction, but it is still racy.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
2012-05-04 15:14:11 -04:00
Steven Whitehouse f9425ad4e5 GFS2: Fix sgid propagation when using ACLs
This cleans up the mode setting code when creating inodes. The
SGID bit was being reset by setattr_copy() when the user creating a
subdirectory was not in the owning group. When ACLs are in use this
SGID bit should have been propagated if the ACL allows creation of
a subdirectory. GFS2's behaviour now matches that of the other ACL
supporting filesystems in this regard.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-05-04 14:33:06 +01:00
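
Note on f9425ad4e5: the interaction described above can be modelled with plain mode bits. This sketch assumes the usual rule that a new subdirectory inherits SGID from an SGID parent, and that the generic attribute copy strips SGID for callers outside the owning group. The capability check is left out, and both function names are illustrative.

```python
import stat


def inherit_sgid(parent_mode, new_mode):
    """A new directory under an SGID directory gets the SGID bit."""
    if parent_mode & stat.S_ISGID and stat.S_ISDIR(new_mode):
        new_mode |= stat.S_ISGID
    return new_mode


def setattr_copy_mode_sketch(mode, in_owning_group):
    """Model of the attribute copy that clears SGID for outsiders."""
    if not in_owning_group:
        mode &= ~stat.S_ISGID
    return mode
```

Running the inherited mode through setattr_copy_mode_sketch() with in_owning_group set to False loses the bit again, which is the behaviour the patch avoids when an ACL permits the creation.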
Stefan Metzmacher d8f2799b10 fs/cifs: fix parsing of dfs referrals
The problem was that the first referral was parsed more than once
and so the caller tried the same referrals multiple times.

The problem was introduced partly by commit
066ce68994,
where 'ref += le16_to_cpu(ref->Size);' got lost,
but that was also wrong...

Cc: <stable@vger.kernel.org>
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Tested-by: Björn Jacke <bj@sernet.de>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2012-05-03 22:47:39 -05:00
James Morris 898bfc1d46 Linux 3.4-rc5

Merge tag 'v3.4-rc5' into next

Linux 3.4-rc5

Merge to pull in prerequisite change for Smack:
86812bb0de

Requested by Casey.
2012-05-04 12:46:40 +10:00
Linus Torvalds e419b4cc58 vfs: make word-at-a-time accesses handle a non-existing page
It turns out that there are more cases than CONFIG_DEBUG_PAGEALLOC that
can have holes in the kernel address space: it seems to happen easily
with Xen, and it looks like the AMD gart64 code will also punch holes
dynamically.

Actually hitting that case is still very unlikely, so just do the
access, and take an exception and fix it up for the very unlikely case
of it being a page-crosser with no next page.

And hey, this abstraction might even help other architectures that have
other issues with unaligned word accesses than the possible missing next
page.  IOW, this could do the byte order magic too.

Peter Anvin fixed a thinko in the shifting for the exception case.

Reported-and-tested-by: Jana Saout <jana@saout.de>
Cc:  Peter Anvin <hpa@zytor.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-03 14:01:40 -07:00
Jeff Layton a557b97616 cifs: make sure we ignore the credentials= and cred= options
Older mount.cifs programs passed this on to the kernel after parsing
the file. Make sure the kernel ignores that option.

Should fix:

    https://bugzilla.kernel.org/show_bug.cgi?id=43195

Cc: Sachin Prabhu <sprabhu@redhat.com>
Reported-by: Ronald <ronald645@gmail.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2012-05-03 13:50:01 -05:00
Steve French f966424e99 [CIFS] Update cifs version to 1.78
Signed-off-by: Steve French <sfrench@us.ibm.com>
2012-05-03 13:50:01 -05:00
Ian Kent 936ad90944 cifs - check S_AUTOMOUNT in revalidate
When revalidating a dentry, if the inode wasn't known to be a dfs
entry when the dentry was instantiated, such as when created via
->readdir(), the DCACHE_NEED_AUTOMOUNT flag needs to be set on the
dentry in ->d_revalidate().

The false return from cifs_d_revalidate(), due to the inode now
being marked with the S_AUTOMOUNT flag, might not invalidate the
dentry if there is a concurrent unlazy path walk. This is because
the dentry reference count will be at least 2 in this case causing
d_invalidate() to return EBUSY. So the assumption that the dentry
will be discarded then correctly instantiated via ->lookup() might
not hold.

Signed-off-by: Ian Kent <raven@themaw.net>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Cc: Steve French <smfrench@gmail.com>
Cc: linux-cifs@vger.kernel.org
Signed-off-by: Steve French <sfrench@us.ibm.com>
2012-05-03 13:49:47 -05:00
Subodh Nijsure 1bdcc63112 UBIFS: remove xattr Kconfig option
Remove CONFIG_UBIFS_FS_XATTR configuration option and associated
UBIFS_FS_XATTR ifdefs.

Testing:
       Tested using integck while using nandsim on x86 & MX28 based
       platform with Micron MT29F2G08ABAEAH4 nand.

Signed-off-by: Subodh Nijsure <snijsure@grid-net.com>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
2012-05-03 14:11:11 +03:00
Dan Carpenter 273946a5c5 UBIFS: remove double initialization in change_category()
"heap" is initialized twice.  I removed the first one, because it makes
Smatch complain that we use "new_cat" as an offset before checking it.

This doesn't change how the code works, it's just a cleanup.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
2012-05-03 14:11:11 +03:00
Eric W. Biederman 52137abe18 userns: Convert user specified uids and gids in chown into kuids and kgids
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-05-03 03:29:34 -07:00
Eric W. Biederman 8e96e3b7b8 userns: Use uid_eq gid_eq helpers when comparing kuids and kgids in the vfs
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-05-03 03:29:34 -07:00
Eric W. Biederman 92361636e0 userns: Store uid and gid types in vfs structures with kuid_t and kgid_t types
The conversion of all of the users is not done yet, as there are too many to change
in one go and leave the code reviewable. For now I change just the header and
a few trivial users and rely on CONFIG_UIDGID_STRICT_TYPE_CHECKS not being set
to ensure that the code will still compile during the transition.

Helper functions i_uid_read, i_uid_write, i_gid_read, i_gid_write are added
so that in most cases filesystems can avoid the complexities of multiple user
namespaces and can concentrate on moving their raw numeric values into and
out of the vfs data structures.

Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-05-03 03:29:32 -07:00
Eric W. Biederman 18815a1808 userns: Convert capabilities related permission checks
- Use uid_eq when comparing kuids
  Use gid_eq when comparing kgids
- Use make_kuid(user_ns, 0) to talk about the user_namespace root uid

Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-05-03 03:28:40 -07:00
Eric W. Biederman 078de5f706 userns: Store uid and gid values in struct cred with kuid_t and kgid_t types
cred.h and a few trivial users of struct cred are changed.  The rest of the users
of struct cred are left for other patches as there are too many changes to make
in one go and leave the change reviewable.  If user namespaces and
CONFIG_UIDGID_STRICT_TYPE_CHECKS are both disabled, the code will continue to
compile and behave correctly.

Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-05-03 03:28:38 -07:00
Eric W. Biederman ae2975bc34 userns: Convert group_info values from gid_t to kgid_t.
As a first step to converting struct cred to be all kuid_t and kgid_t
values convert the group values stored in group_info to always be
kgid_t values.   Unless user namespaces are used this change should
have no effect.

Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-05-03 03:27:21 -07:00
Sasikantha babu b4eafca113 sysfs: Removed dup_name entirely in sysfs_rename
Since no one is using "dup_name", remove it completely from sysfs_rename.

Signed-off-by: Sasikantha babu <sasikanth.v19@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-05-02 14:55:09 -07:00
David Teigland 1a058f5288 gfs2: fix recovery during unmount
Journal recovery from lock_dlm should not be ignored
if there is an unmount in progress.  Ignoring it will
cause the recovery to get stuck.  The recovery
process will correctly handle an in-progress unmount.

Signed-off-by: David Teigland <teigland@redhat.com>
2012-05-02 14:19:12 -05:00
David Teigland 4875647a08 dlm: fixes for nodir mode
The "nodir" mode (statically assign master nodes instead
of using the resource directory) has always been highly
experimental, and never seriously used.  This commit
fixes a number of problems, making nodir much more usable.

- Major change to recovery: recover all locks and restart
  all in-progress operations after recovery.  In some
  cases it's not possible to know which in-progress locks
  to recover, so recover all.  (Most require recovery
  in nodir mode anyway since rehashing changes most
  master nodes.)

- Change the way nodir mode is enabled, from a command
  line mount arg passed through gfs2, into a sysfs
  file managed by dlm_controld, consistent with the
  other config settings.

- Allow recovering MSTCPY locks on an rsb that has not
  yet been turned into a master copy.

- Ignore RCOM_LOCK and RCOM_LOCK_REPLY recovery messages
  from a previous, aborted recovery cycle.  Base this
  on the local recovery status not being in the state
  where any nodes should be sending LOCK messages for the
  current recovery cycle.

- Hold rsb lock around dlm_purge_mstcpy_locks() because it
  may run concurrently with dlm_recover_master_copy().

- Maintain highbast on process-copy lkb's (in addition to
  the master as is usual), because the lkb can switch
  back and forth between being a master and being a
  process copy as the master node changes in recovery.

- When recovering MSTCPY locks, flag rsb's that have
  non-empty convert or waiting queues for granting
  at the end of recovery.  (Rename flag from LOCKS_PURGED
  to RECOVER_GRANT and similar for the recovery function,
  because it's not only resources with purged locks
  that need a grant attempt.)

- Replace a couple of unnecessary assertion panics with
  error messages.

Signed-off-by: David Teigland <teigland@redhat.com>
2012-05-02 14:15:27 -05:00
Linus Torvalds 529acf5898 NFS client bugfixes for Linux 3.4
Highlights include:
 - Fixes for the NFSv4 security negotiation
 - Use the correct hostname when mounting from a private namespace
 - NFS net namespace bugfixes for the pipefs filesystem
 - NFSv4 GETACL bugfixes
 - IPv6 bugfix for NFSv4 referrals

Merge tag 'nfs-for-3.4-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client bugfixes from Trond Myklebust:
 - Fixes for the NFSv4 security negotiation
 - Use the correct hostname when mounting from a private namespace
 - NFS net namespace bugfixes for the pipefs filesystem
 - NFSv4 GETACL bugfixes
 - IPv6 bugfix for NFSv4 referrals

* tag 'nfs-for-3.4-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  NFSv4.1: Use the correct hostname in the client identifier string
  SUNRPC: RPC client must use the current utsname hostname string
  NFS: get module in idmap PipeFS notifier callback
  NFS: Remove unused function nfs_lookup_with_sec()
  NFS: Honor the authflavor set in the clone mount data
  NFS: Fix following referral mount points with different security
  NFS: Do secinfo as part of lookup
  NFS: Handle exceptions coming out of nfs4_proc_fs_locations()
  NFS: Fix SECINFO_NO_NAME
  SUNRPC: traverse clients tree on PipeFS event
  SUNRPC: set per-net PipeFS superblock before notification
  SUNRPC: skip clients with program without PipeFS entries
  SUNRPC: skip dead but not buried clients on PipeFS events
  Avoid beyond bounds copy while caching ACL
  Avoid reading past buffer when calling GETACL
  fix page number calculation bug for block layout decode buffer
  NFSv4.1 fix page number calculation bug for filelayout decode buffers
  pnfs-obj: Remove unused variable from objlayout_get_deviceinfo()
  nfs4: fix referrals on mounts that use IPv6 addrs
2012-05-02 08:17:57 -07:00
Bob Peterson c0752aa7e4 GFS2: eliminate log elements and simplify
This patch eliminates the gfs2_log_element data structure and
rolls its two components into the gfs2_bufdata. This makes the code
easier to understand and makes it easier to migrate to a rbtree
to keep the list sorted.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-05-02 09:14:36 +01:00
Jeff Layton 58fa015f61 cifs: add missing initialization of server->req_lock
Cc: Pavel Shilovsky <piastryyy@gmail.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2012-05-01 22:29:51 -05:00
Jeff Layton 8f71465c19 cifs: don't cap ra_pages at the same level as default_backing_dev_info
While testing, I've found that even when we are able to negotiate a
much larger rsize with the server, on-the-wire reads often end up being
capped at 128k because of ra_pages being capped at that level.

Lifting this restriction gave almost a twofold increase in sequential
read performance on my craptacular KVM test rig with a 1M rsize.

I think this is safe since the actual ra_pages that the VM requests
is run through max_sane_readahead() prior to submitting the I/O. Under
memory pressure we should end up with large readahead requests being
suppressed anyway.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2012-05-01 22:27:54 -05:00
Sachin Prabhu 156d17905e CIFS: Fix indentation in cifs_show_options
Trivial patch which fixes a misplaced tab in cifs_show_options().

Signed-off-by: Sachin Prabhu <sprabhu@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2012-05-01 22:19:43 -05:00
Randy Dunlap 8a7dc4b04b nfsd: fix nfs4recover.c printk format warning
Fix printk format warnings -- both items are size_t,
so use %zu to print them.

fs/nfsd/nfs4recover.c:580:3: warning: format '%lu' expects type 'long unsigned int', but argument 3 has type 'size_t'
fs/nfsd/nfs4recover.c:580:3: warning: format '%lu' expects type 'long unsigned int', but argument 4 has type 'unsigned int'
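
The same rule applies to userspace printf(): a minimal sketch, assuming a
hypothetical format_sizes() helper and message text that are not taken from
nfs4recover.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* %zu matches size_t on every ABI, whereas %lu is only correct where
 * size_t happens to be unsigned long (not the case on most 32-bit targets). */
static int format_sizes(char *buf, size_t buflen, size_t got, size_t expected)
{
    assert(buf != NULL && buflen > 0);
    return snprintf(buf, buflen, "size mismatch: %zu != %zu", got, expected);
}
```

Calling format_sizes(buf, sizeof(buf), 12, 16) builds cleanly with -Wformat on
both 32-bit and 64-bit targets.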

Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: linux-nfs@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-04-30 12:28:48 -07:00
Trond Myklebust 3617e5031b NFSv4.1: Use the correct hostname in the client identifier string
We need to use the hostname of the process that created the nfs_client.
That hostname is now stored in the rpc_client->cl_nodename.

Also remove the utsname()->domainname component. There is no reason
to include the NIS/YP domainname in a client identifier string.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-04-30 12:04:58 -04:00
Bob Peterson 1c47f09592 GFS2: Eliminate vestigial sd_log_le_rg
This patch eliminates gfs2 superblock variable sd_log_le_rg which
is no longer used.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-04-30 10:41:04 +01:00
Linus Torvalds 64f371bc31 autofs: make the autofsv5 packet file descriptor use a packetized pipe
The autofs packet size has had a very unfortunate size problem on x86:
because the alignment of 'u64' differs in 32-bit and 64-bit modes, and
because the packet data was not 8-byte aligned, the size of the autofsv5
packet structure differed between 32-bit and 64-bit modes despite
looking otherwise identical (300 vs 304 bytes respectively).

We first fixed that up by making the 64-bit compat mode know about this
problem in commit a32744d4ab ("autofs: work around unhappy compat
problem on x86-64"), and that made a 32-bit 'systemd' work happily on a
64-bit kernel because everything then worked the same way as on a 32-bit
kernel.

But it turned out that 'automount' had actually known and worked around
this problem in user space, so fixing the kernel to do the proper 32-bit
compatibility handling actually *broke* 32-bit automount on a 64-bit
kernel, because it knew that the packet sizes were wrong and expected
those incorrect sizes.

As a result, we ended up reverting that compatibility mode fix, and
thus breaking systemd again, in commit fcbf94b9de.

With both automount and systemd doing a single read() system call, and
verifying that they get *exactly* the size they expect but using
different sizes, it seemed that fixing one of them would inevitably
break the other.  At one point, a patch I seriously considered applying
from Michael Tokarev did a "strcmp()" to see if it was automount that
was doing the operation.  Ugly, ugly.

However, a prettier solution exists now thanks to the packetized pipe
mode.  By marking the communication pipe as being packetized (by simply
setting the O_DIRECT flag), we can always just write the bigger packet
size, and if user-space does a smaller read, it will just get that
partial end result and the extra alignment padding will simply be thrown
away.

This makes both automount and systemd happy, since they now get the size
they asked for, and the kernel side of autofs simply no longer needs to
care - it could pad out the packet arbitrarily.

Of course, if there is some *other* user of autofs (please, please,
please tell me it ain't so - and we haven't heard of any) that tries to
read the packets with multiple reads, that other user will now be
broken - the whole point of the packetized mode is that one system call
gets exactly one packet, and you cannot read a packet in pieces.
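
The 300 vs 304 byte difference described at the top of this message can be
reproduced in userspace.  The sketch below is an illustration only: the field
list is an assumption modeled on the autofs v5 packet, and the two typedefs use
the GCC/Clang aligned attribute to force the 4-byte and 8-byte alignment of a
64-bit integer on any build host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 64-bit integer with the 4-byte alignment used in 32-bit x86 mode. */
typedef uint64_t __attribute__((aligned(4))) u64_align4;
/* 64-bit integer with the 8-byte alignment used in 64-bit x86 mode. */
typedef uint64_t __attribute__((aligned(8))) u64_align8;

#define TOY_NAME_LEN 256        /* assumed: NAME_MAX + 1 */

struct toy_packet_32 {
    int32_t    proto_version;
    int32_t    type;
    uint32_t   wait_queue_token;
    uint32_t   dev;
    u64_align4 ino;             /* offset 16 in both layouts */
    uint32_t   uid, gid, pid, tgid;
    uint32_t   len;
    char       name[TOY_NAME_LEN];
};

struct toy_packet_64 {
    int32_t    proto_version;
    int32_t    type;
    uint32_t   wait_queue_token;
    uint32_t   dev;
    u64_align8 ino;
    uint32_t   uid, gid, pid, tgid;
    uint32_t   len;
    char       name[TOY_NAME_LEN];
};

/* The members occupy 300 bytes; 8-byte struct alignment pads that to 304. */
static_assert(sizeof(struct toy_packet_32) == 300, "4-byte aligned layout");
static_assert(sizeof(struct toy_packet_64) == 304, "8-byte aligned layout");
```

Every member sits at the same offset in both structs; only the tail padding
differs, which is why a reader that asks for 300 bytes loses nothing when the
packetized pipe discards the remainder.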

Tested-by: Michael Tokarev <mjt@tls.msk.ru>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: David Miller <davem@davemloft.net>
Cc: Ian Kent <raven@themaw.net>
Cc: Thomas Meyer <thomas@m3y3r.de>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-04-29 13:30:08 -07:00
Linus Torvalds 9883035ae7 pipes: add a "packetized pipe" mode for writing
The actual internal pipe implementation is already really about
individual packets (called "pipe buffers"), and this simply exposes that
as a special packetized mode.

When we are in the packetized mode (marked by O_DIRECT as suggested by
Alan Cox), a write() on a pipe will not merge the new data with previous
writes, so each write will get a pipe buffer of its own.  The pipe
buffer is then marked with the PIPE_BUF_FLAG_PACKET flag, which in turn
will tell the reader side to break the read at that boundary (and throw
away any partial packet contents that do not fit in the read buffer).

End result: as long as you do writes less than PIPE_BUF in size (so that
the pipe doesn't have to split them up), you can now treat the pipe as a
packet interface, where each read() system call will read one packet at
a time.  You can just use a sufficiently big read buffer (PIPE_BUF is
sufficient, since bigger than that doesn't guarantee atomicity anyway),
and the return value of the read() will naturally give you the size of
the packet.

NOTE! We do not support zero-sized packets, and zero-sized reads and
writes to a pipe continue to be no-ops.  Also note that big packets will
currently be split at write time, but that the size at which that
happens is not really specified (except that it's bigger than PIPE_BUF).
Currently that limit is the system page size, but we might want to
explicitly support bigger packets some day.

The main user for this is going to be the autofs packet interface,
allowing us to stop having to care so deeply about exact packet sizes
(which have had bugs with 32/64-bit compatibility modes).  But user
space can create packetized pipes with "pipe2(fd, O_DIRECT)", which will
fail with an EINVAL on kernels that do not support this interface.
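
A minimal userspace sketch of the behaviour described above, assuming a Linux
system with glibc.  packet_pipe_demo() and its payloads are hypothetical; only
pipe2(), O_DIRECT, write(), read() and close() are real interfaces.

```c
#define _GNU_SOURCE             /* for pipe2() and O_DIRECT in glibc */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Returns 0 if each read() returned exactly one packet, 1 if the kernel
 * rejected O_DIRECT pipes with EINVAL, and -1 on any other failure.
 */
static int packet_pipe_demo(void)
{
    int fd[2];
    char buf[64];               /* larger than either packet */
    ssize_t first, second;

    if (pipe2(fd, O_DIRECT) != 0)
        return errno == EINVAL ? 1 : -1;

    /* Two separate writes become two separate packets. */
    if (write(fd[1], "hello", 5) != 5 || write(fd[1], "packets", 7) != 7) {
        close(fd[0]);
        close(fd[1]);
        return -1;
    }

    /* On an ordinary pipe the first read() would return all 12 bytes. */
    first = read(fd[0], buf, sizeof(buf));
    second = read(fd[0], buf, sizeof(buf));

    close(fd[0]);
    close(fd[1]);
    return (first == 5 && second == 7) ? 0 : -1;
}
```

Calling packet_pipe_demo() from main() and checking for 0 confirms that the
return value of each read() is the size of one packet.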

Tested-by: Michael Tokarev <mjt@tls.msk.ru>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: David Miller <davem@davemloft.net>
Cc: Ian Kent <raven@themaw.net>
Cc: Thomas Meyer <thomas@m3y3r.de>
Cc: stable@kernel.org  # needed for systemd/autofs interaction fix
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-04-29 13:12:42 -07:00
Linus Torvalds e994defb7b VFS: make vfs_fstat() use f[get|put]_light()
Use the *_light() versions that properly avoid doing the file user count
updates when they are unnecessary.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-04-28 14:55:17 -07:00
Linus Torvalds 3f9f0aa687 VFS: clean up and simplify getname_flags()
This removes a number of silly games around strncpy_from_user() in
do_getname(), and removes that helper function entirely.  We instead
make getname_flags() just use strncpy_from_user() properly directly.

Removing the wrapper function simplifies things noticeably, mostly
because we no longer play the unnecessary games with segments (x86
strncpy_from_user() no longer needs the hack), but also because the
empty path handling is just much more obvious.  The return value of
"strncpy_from_user()" is much more obvious than checking an odd error
return case from do_getname().

[ non-x86 architectures were notified of this change several weeks ago,
  since it is possible that they have copied the old broken x86
  strncpy_from_user. But nobody reacted, so .. See

    http://www.spinics.net/lists/linux-arch/msg17313.html

  for details ]

Cc: linux-arch@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-04-28 14:38:32 -07:00
Stanislav Kinsbursky 71dfc5fa51 NFS: get module in idmap PipeFS notifier callback
This is a bug fix.
The notifier callback is called from the SUNRPC module, so before dereferencing
the NFS module we have to make sure that it's alive.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-04-28 13:22:19 -04:00
Linus Torvalds f7b0069317 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs fixes from Chris Mason:
 "This has our collection of bug fixes.  I missed the last rc because I
  thought our patches were making NFS crash during my xfs test runs.
  Turns out it was an NFS client bug fixed by someone else while I tried
  to bisect it.

  All of these fixes are small, but some are fairly high impact.  The
  biggest are fixes for our mount -o remount handling, a deadlock due to
  GFP_KERNEL allocations in readdir, and a RAID10 error handling bug.

  This was tested against both 3.3 and Linus' master as of this morning."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (26 commits)
  Btrfs: reduce lock contention during extent insertion
  Btrfs: avoid deadlocks from GFP_KERNEL allocations during btrfs_real_readdir
  Btrfs: Fix space checking during fs resize
  Btrfs: fix block_rsv and space_info lock ordering
  Btrfs: Prevent root_list corruption
  Btrfs: fix repair code for RAID10
  Btrfs: do not start delalloc inodes during sync
  Btrfs: fix that check_int_data mount option was ignored
  Btrfs: don't count CRC or header errors twice while scrubbing
  Btrfs: fix btrfs_ioctl_dev_info() crash on missing device
  btrfs: don't return EINTR
  Btrfs: double unlock bug in error handling
  Btrfs: always store the mirror we read the eb from
  fs/btrfs/volumes.c: add missing free_fs_devices
  btrfs: fix early abort in 'remount'
  Btrfs: fix max chunk size check in chunk allocator
  Btrfs: add missing read locks in backref.c
  Btrfs: don't call free_extent_buffer twice in iterate_irefs
  Btrfs: Make free_ipath() deal gracefully with NULL pointers
  Btrfs: avoid possible use-after-free in clear_extent_bit()
  ...
2012-04-28 09:30:07 -07:00
Linus Torvalds fcbf94b9de Revert "autofs: work around unhappy compat problem on x86-64"
This reverts commit a32744d4ab.

While that commit was technically the right thing to do, and made the
x86-64 compat mode work identically to native 32-bit mode (and thus
fixing the problem with a 32-bit systemd install on a 64-bit kernel), it
turns out that the automount binaries had workarounds for this compat
problem.

Now, the workarounds are disgusting: doing a "uname()" to find out the
architecture of the kernel, and then comparing it for the 64-bit cases
and fixing up the size of the read() in automount for those.  And they
were confused: it's not actually a generic 64-bit issue at all, it's
very much tied to just x86-64, which has different alignment for an
'u64' in 64-bit mode than in 32-bit mode.

But the end result is that fixing the compat layer actually breaks the
case of a 32-bit automount on a x86-64 kernel.

There are various approaches to fix this (including just doing a
"strcmp()" on current->comm and comparing it to "automount"), but I
think that I will do the one that teaches pipes about a special "packet
mode", which will allow user space to not have to care too deeply about
the padding at the end of the autofs packet.

That change will make the compat workaround unnecessary, so let's revert
it first, and get automount working again in compat mode.  The
packetized pipes will then fix autofs for systemd.

Reported-and-requested-by: Michael Tokarev <mjt@tls.msk.ru>
Cc: Ian Kent <raven@themaw.net>
Cc: stable@kernel.org # for 3.3
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-04-28 08:29:56 -07:00
Linus Torvalds c629eaf839 Merge git://git.samba.org/sfrench/cifs-2.6
Pull CIFS fixes from Steve French.

* git://git.samba.org/sfrench/cifs-2.6:
  Use correct conversion specifiers in cifs_show_options
  CIFS: Show backupuid/gid in /proc/mounts
  cifs: fix offset handling in cifs_iovec_write
2012-04-27 20:56:54 -07:00
Chris Mason dc7fdde39e Btrfs: reduce lock contention during extent insertion
We're spending huge amounts of time on lock contention during
end_io processing because we unconditionally assume we are overwriting
an existing extent in the file for each IO.

This checks to see if we are outside i_size, and if so, it uses a
less expensive readonly search of the btree to look for existing
extents.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
2012-04-27 14:51:05 -04:00
Chris Mason fede766f28 Btrfs: avoid deadlocks from GFP_KERNEL allocations during btrfs_real_readdir
Btrfs has an optimization where it will preallocate dentries during
readdir to fill in enough information to open the inode without an extra
lookup.

But, we're calling d_alloc, which is doing GFP_KERNEL allocations, and
that leads to deadlocks because our readdir code has tree locks held.

For now, disable this optimization.  We'll fix the gfp mask in the next
merge window.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
2012-04-27 14:23:22 -04:00
Bryan Schumaker e245d4250d NFS: Remove unused function nfs_lookup_with_sec()
This fixes a compiler warning.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-04-27 14:10:03 -04:00
Bryan Schumaker 7e6eb683d2 NFS: Honor the authflavor set in the clone mount data
The authflavor is set in an nfs_clone_mount structure and passed to the
xdev_mount() functions where it was promptly ignored.  Instead, use it
to initialize an rpc_clnt for the cloned server.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-04-27 14:10:03 -04:00
Bryan Schumaker f05d147f7e NFS: Fix following referral mount points with different security
I create a new proc_lookup_mountpoint() to use when submounting an NFS
v4 share.  This function returns an rpc_clnt to use for performing an
fs_locations() call on a referral's mountpoint.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-04-27 14:10:02 -04:00
Bryan Schumaker 72de53ec4b NFS: Do secinfo as part of lookup
Whenever lookup sees wrongsec do a secinfo and retry the lookup to find
attributes of the file or directory, such as "is this a referral
mountpoint?".  This also allows me to remove handling -NFS4ERR_WRONGSEC
as part of getattr xdr decoding.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-04-27 14:10:02 -04:00
Bryan Schumaker db0a9593d5 NFS: Handle exceptions coming out of nfs4_proc_fs_locations()
We don't want to return -NFS4ERR_WRONGSEC to the VFS because it could
cause the kernel to oops.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-04-27 14:10:01 -04:00
Bryan Schumaker 31e4dda474 NFS: Fix SECINFO_NO_NAME
I was using the same decoder function for SECINFO and SECINFO_NO_NAME,
so it was returning an error when it tried to decode an OP_SECINFO_NO_NAME
header as OP_SECINFO.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-04-27 14:10:01 -04:00
Sachin Prabhu 5794d21ef4 Avoid beyond bounds copy while caching ACL
When attempting to cache ACLs returned from the server, if the bitmap
size + the ACL size is greater than a PAGE_SIZE but the ACL size itself
is smaller than a PAGE_SIZE, we can read past the buffer page boundary.

Signed-off-by: Sachin Prabhu <sprabhu@redhat.com>
Reported-by: Jian Li <jiali@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-04-27 14:09:53 -04:00
Daniel J Blueman 7654b72417 Btrfs: Fix space checking during fs resize
Fix out-of-space checking, addressing a warning and potential resource
leak when resizing the filesystem down while allocating blocks.

Signed-off-by: Daniel J Blueman <daniel@quora.org>
Reviewed-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2012-04-27 13:55:14 -04:00
Stefan Behrens 1f699d38b6 Btrfs: fix block_rsv and space_info lock ordering
may_commit_transaction() calls
        spin_lock(&space_info->lock);
        spin_lock(&delayed_rsv->lock);
and update_global_block_rsv() calls
        spin_lock(&block_rsv->lock);
        spin_lock(&sinfo->lock);

Lockdep complains about this at run time.
Everywhere except in update_global_block_rsv(), the space_info lock is
the outer lock, therefore the locking order in update_global_block_rsv()
is changed.
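
The kind of check that flags this inversion can be sketched with a toy order
tracker.  This is only an illustration: it records direct "outer, then inner"
pairs, whereas the kernel's lockdep tracks lock classes and whole dependency
chains.  All names below are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

enum { LOCK_SPACE_INFO, LOCK_BLOCK_RSV, NUM_LOCKS };

/* seen[a][b] is true once lock b has been taken while lock a was held. */
static bool seen[NUM_LOCKS][NUM_LOCKS];

/* Record "outer held, then inner taken"; return false on an inversion. */
static bool toy_note_order(int outer, int inner)
{
    assert(outer >= 0 && outer < NUM_LOCKS);
    assert(inner >= 0 && inner < NUM_LOCKS);

    if (seen[inner][outer])
        return false;           /* opposite order seen before: ABBA risk */
    seen[outer][inner] = true;
    return true;
}
```

Recording (LOCK_SPACE_INFO, LOCK_BLOCK_RSV) for may_commit_transaction() and
then (LOCK_BLOCK_RSV, LOCK_SPACE_INFO) for the old update_global_block_rsv()
makes the second call return false; with the order changed as in this patch,
both call sites record the same pair.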

Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2012-04-27 13:55:14 -04:00
Daniel J Blueman 1daf3540fa Btrfs: Prevent root_list corruption
I was seeing root_list corruption on unmount during fs resize in 3.4-rc4; add
correct locking to address this.

Signed-off-by: Daniel J Blueman <daniel@quora.org>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2012-04-27 13:55:13 -04:00
Jan Schmidt 3e74317ad7 Btrfs: fix repair code for RAID10
btrfs_map_block sets mirror_num, so that the repair code knows eventually
which device gave us the read error. For RAID10, mirror_num must be 1 or 2.
Before this fix mirror_num was incorrectly related to our stripe index.

Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2012-04-27 13:55:13 -04:00
Josef Bacik 996d282c7f Btrfs: do not start delalloc inodes during sync
btrfs_start_delalloc_inodes will just walk the list of delalloc inodes and
start writing them out, but it doesn't splice the list or anything so as
long as somebody is doing work on the box you could end up in this section
_forever_.  So just remove it; it's not needed since sync will start
writeback on all inodes anyway.  All we need to do is wait for ordered
extents and then we can commit the transaction.  In my horrible torture test
sync goes from taking 4 minutes to about 1.5 minutes.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2012-04-27 13:55:12 -04:00
Sachin Prabhu 5a00689930 Avoid reading past buffer when calling GETACL
Bug noticed in commit
bf118a342f

When calling GETACL, if the size of the bitmap array, the length
attribute and the acl returned by the server is greater than the
allocated buffer (args.acl_len), we can Oops with a General Protection
fault at _copy_from_pages() when we attempt to read past the pages
allocated.

This patch allocates an extra PAGE for the bitmap and checks to see that
the bitmap + attribute_length + ACLs don't exceed the buffer space
allocated to it.

Signed-off-by: Sachin Prabhu <sprabhu@redhat.com>
Reported-by: Jian Li <jiali@redhat.com>
[Trond: Fixed a size_t vs unsigned int printk() warning]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-04-27 13:15:07 -04:00
Bob Peterson 06344b9186 GFS2: Eliminate needless parameter from function gfs2_setbit
This patch eliminates parameter "buf1" from function gfs2_setbit.
This is possible because it was always passed in as bi->bi_bh->b_data.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-04-27 10:46:07 +01:00
Linus Torvalds 110a5c8b38 Merge branch 'akpm' (Andrew's patch-bomb)
Merge fixes from Andrew Morton:
 "13 fixes.  The acerhdf patches aren't (really) fixes.  But they've
  been stuck in my tree for up to two years, sent to Matthew multiple
  times and the developers are unhappy."

* emailed from Andrew Morton <akpm@linux-foundation.org>: (13 patches)
  mm: fix NULL ptr dereference in move_pages
  mm: fix NULL ptr dereference in migrate_pages
  revert "proc: clear_refs: do not clear reserved pages"
  drivers/rtc/rtc-ds1307.c: fix BUG shown with lock debugging enabled
  arch/arm/mach-ux500/mbox-db5500.c: world-writable sysfs fifo file
  hugetlbfs: lockdep annotate root inode properly
  acerhdf: lowered default temp fanon/fanoff values
  acerhdf: add support for new hardware
  acerhdf: add support for Aspire 1410 BIOS v1.3314
  fs/buffer.c: remove BUG() in possible but rare condition
  mm: fix up the vmscan stat in vmstat
  epoll: clear the tfile_check_list on -ELOOP
  mm/hugetlb: fix warning in alloc_huge_page/dequeue_huge_page_vma
2012-04-26 15:24:45 -07:00
David Teigland 6d40c4a708 dlm: improve error and debug messages
Change some existing error/debug messages to
collect more useful information, and add
some new error/debug messages to address
recently found problems.

Signed-off-by: David Teigland <teigland@redhat.com>
2012-04-26 15:41:46 -05:00
David Teigland 57638bf3aa dlm: avoid unnecessary search in search_rsb
If the rsb is found in the "keep" tree, but is
not the right type (i.e. not MASTER), we can
return immediately with the result.  There's
no point in going on to search the "toss" list
as if we hadn't found it.

Signed-off-by: David Teigland <teigland@redhat.com>
2012-04-26 15:37:56 -05:00
David Teigland d6e24788d2 dlm: limit rcom debug messages
Unify the checking for both types of ignored
rcom messages, and replace the two log_debug
statements with a single, rate limited debug
message.

Signed-off-by: David Teigland <teigland@redhat.com>
2012-04-26 15:37:37 -05:00
David Teigland 13ef11110f dlm: fix waiter recovery
An outstanding remote operation (an lkb on the "waiter"
list) could sometimes miss being resent during recovery.
The decision was based on the lkb_nodeid field, which
could have changed during an earlier aborted recovery,
so it no longer represents the actual remote destination.
The lkb_wait_nodeid is always the actual remote node,
so it is the best value to use.

Signed-off-by: David Teigland <teigland@redhat.com>
2012-04-26 15:36:04 -05:00
David Teigland 513ef596d4 dlm: prevent connections during shutdown
During lowcomms shutdown, a new connection could possibly
be created, and attempt to use a workqueue that's been
destroyed.  Similarly, during startup, a new connection
could attempt to use a workqueue that's not been set up
yet.  Add a global variable to indicate when new connections
are allowed.

Based on patch by: Christine Caulfield <ccaulfie@redhat.com>

Reported-by: dann frazier <dann.frazier@canonical.com>
Reviewed-by: dann frazier <dann.frazier@canonical.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2012-04-26 15:35:38 -05:00
Jim Rees 10bd295a0b fix page number calculation bug for block layout decode buffer
Signed-off-by: Jim Rees <rees@umich.edu>
Suggested-by: Andy Adamson <andros@netapp.com>
Suggested-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-04-26 12:23:23 -04:00
Andy Adamson e5265a0c58 NFSv4.1 fix page number calculation bug for filelayout decode buffers
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-04-26 12:23:23 -04:00
Sachin Bhamare 9526b2b6d6 pnfs-obj: Remove unused variable from objlayout_get_deviceinfo()
Local variable 'sb' was not being used in objlayout_get_deviceinfo().

Signed-off-by: Sachin Bhamare <sbhamare@panasas.com>
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-04-26 12:15:51 -04:00
Weston Andros Adamson 1aba156763 nfs4: fix referrals on mounts that use IPv6 addrs
All referrals (IPv4 addr, IPv6 addr, and DNS) are broken on mounts of
IPv6 addresses, because validation code uses a path that is parsed
from the dev_name ("<server>:<path>") by splitting on the first colon, and
IPv6 addrs themselves contain colons.
This patch ignores colons within IPv6 addresses that are enclosed in '[' and ']'.
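
As a rough illustration of the parsing rule described above (not the
kernel code itself), the sketch below splits a dev_name into server and
path while skipping colons inside a bracketed address.  The function name
split_dev_name and its error handling are hypothetical and chosen only
for this example.

```python
def split_dev_name(dev_name):
    """Split "<server>:<path>" into (server, path).

    Hypothetical helper: colons inside a leading "[...]" group are treated
    as part of the server address, not as the separator.
    """
    if dev_name.startswith("["):
        end = dev_name.find("]")
        if end == -1:
            raise ValueError("unterminated '[' in server address")
        # The separator must be the colon directly after the closing bracket.
        if dev_name[end + 1:end + 2] != ":":
            raise ValueError("missing ':' after bracketed address")
        return dev_name[:end + 1], dev_name[end + 2:]

    # No brackets: the first colon separates server from path.
    server, sep, path = dev_name.partition(":")
    if not sep:
        raise ValueError("missing ':' separator")
    return server, path
```

Splitting "[fe80::1]:/export" on the first colon would give a server of
"[fe80"; with the bracket rule the server is "[fe80::1]" and the path is
"/export".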

Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-04-26 12:11:29 -04:00
Eric W. Biederman 22d917d80e userns: Rework the user_namespace adding uid/gid mapping support
- Convert the old uid mapping functions into compatibility wrappers
- Add a uid/gid mapping layer from user space uid and gids to kernel
  internal uids and gids that is extent based for simplicity and speed.
  * Working with number space after mapping uids/gids into their kernel
    internal version adds only mapping complexity over what we have today,
    leaving the kernel code easy to understand and test.
- Add proc files /proc/self/uid_map /proc/self/gid_map
  These files display the mapping and allow a mapping to be added
  if a mapping does not exist.
- Allow entering the user namespace without a uid or gid mapping.
  Since we are starting with an existing user, our uids and gids
  still have global mappings, so they are still valid and useful; they just
  don't have local mappings.  The requirement for things to work is a global
  uid and gid, so it is odd but perfectly fine not to have a local uid
  and gid mapping.
  Not requiring global uid and gid mappings greatly simplifies
  the logic of setting up the uid and gid mappings by allowing
  the mappings to be set after the namespace is created which makes the
  slight weirdness worth it.
- Make the mappings in the initial user namespace to the global
  uid/gid space explicit.  Today it is an identity mapping
  but in the future we may want to twist this for debugging, similar
  to what we do with jiffies.
- Document the memory ordering requirements of setting the uid and
  gid mappings.  We only allow the mappings to be set once
  and there are no pointers involved so the requirements are
  trivial but a little atypical.
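
To make the extent-based mapping idea above concrete, here is a minimal
user-space sketch, not the kernel implementation.  It assumes each mapping
line has three numbers (first id inside the namespace, first id outside
it, and a count); that line format, the field names, and the function
names are assumptions made for this example.

```python
from collections import namedtuple

# One extent: `count` consecutive ids starting at `first` inside the
# namespace correspond to ids starting at `lower_first` outside it.
Extent = namedtuple("Extent", ["first", "lower_first", "count"])


def parse_id_map(text):
    """Parse lines of "first lower_first count" into a list of extents."""
    extents = []
    for line in text.splitlines():
        if line.strip():
            first, lower_first, count = (int(field) for field in line.split())
            extents.append(Extent(first, lower_first, count))
    return extents


def map_id_down(extents, inner_id):
    """Translate a namespace id to an outer id, or None if unmapped."""
    for ext in extents:
        if ext.first <= inner_id < ext.first + ext.count:
            return ext.lower_first + (inner_id - ext.first)
    return None


def map_id_up(extents, outer_id):
    """Translate an outer id back to a namespace id, or None if unmapped."""
    for ext in extents:
        if ext.lower_first <= outer_id < ext.lower_first + ext.count:
            return ext.first + (outer_id - ext.lower_first)
    return None
```

Because each extent is just a range plus an offset, a lookup is a short
linear scan with one addition, which is the kind of cost the "simplicity
and speed" remark above is about.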

Performance:

In this scheme for the permission checks the performance is expected to
stay the same as the actual machine instructions should remain the same.

The worst case I could think of is ls -l on a large directory where
all of the stat results need to be translated from kuids and
kgids to uids and gids.  So I benchmarked that case on my laptop
with a dual-core hyperthreaded Intel i5-2520M cpu with 3M of cpu cache.

My benchmark consisted of going to single user mode where nothing else
was running.  On an ext4 filesystem I opened 1,000,000 files, looped
through all of the files 1000 times, and called fstat on the
individual files.  This was to ensure I was benchmarking stat times
where the inodes were in the kernel's cache, but the inode values were
not in the processor's cache.  My results:

v3.4-rc1:         ~= 156ns (unmodified v3.4-rc1 with user namespace support disabled)
v3.4-rc1-userns-: ~= 155ns (v3.4-rc1 with my user namespace patches and user namespace support disabled)
v3.4-rc1-userns+: ~= 164ns (v3.4-rc1 with my user namespace patches and user namespace support enabled)

All of the configurations ran in roughly 120ns when I performed tests
that ran in the cpu cache.

So in summary the performance impact is:
1ns improvement in the worst case with user namespace support compiled out.
8ns aka 5% slowdown in the worst case with user namespace support compiled in.
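
The summary figures follow directly from the three measurements listed
above.  The short sketch below only reproduces that arithmetic; the
variable and function names are made up for this example.

```python
# Per-fstat times in nanoseconds, taken from the results above.
baseline_ns = 156     # unmodified v3.4-rc1
userns_off_ns = 155   # patches applied, user namespace support disabled
userns_on_ns = 164    # patches applied, user namespace support enabled


def delta(measured_ns, reference_ns):
    """Return (absolute difference in ns, relative difference in percent)."""
    diff = measured_ns - reference_ns
    return diff, 100.0 * diff / reference_ns


off_diff, off_pct = delta(userns_off_ns, baseline_ns)
on_diff, on_pct = delta(userns_on_ns, baseline_ns)
```

Here on_diff is 8 and on_pct is just over 5, matching the 8ns and 5%
figures, while off_diff is -1, the 1ns improvement.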

Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-04-26 02:01:39 -07:00
Linus Torvalds 2300fd67b4 NFS client bugfixes for Linux 3.4
Highlights include:
 - Fix NFSv4 infinite loops on open(O_TRUNC)
 - Fix an Oops and an infinite loop in the NFSv4 flock code
 - Don't register the PipeFS filesystem until it has been set up
 - Fix an Oops in nfs_try_to_update_request
 - Don't reuse NFSv4 open owners: fixes a bad sequence id storm.

Merge tag 'nfs-for-3.4-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client bugfixes from Trond Myklebust:
 - Fix NFSv4 infinite loops on open(O_TRUNC)
 - Fix an Oops and an infinite loop in the NFSv4 flock code
 - Don't register the PipeFS filesystem until it has been set up
 - Fix an Oops in nfs_try_to_update_request
 - Don't reuse NFSv4 open owners: fixes a bad sequence id storm.

* tag 'nfs-for-3.4-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  NFSv4: Keep dropped state owners on the LRU list for a while
  NFSv4: Ensure that we don't drop a state owner more than once
  NFSv4: Ensure we do not reuse open owner names
  nfs: Enclose hostname in brackets when needed in nfs_do_root_mount
  NFS: put open context on error in nfs_flush_multi
  NFS: put open context on error in nfs_pagein_multi
  NFSv4: Fix open(O_TRUNC) and ftruncate() error handling
  NFSv4: Ensure that we check lock exclusive/shared type against open modes
  NFSv4: Ensure that the LOCK code sets exception->inode
  NFS: check for req==NULL in nfs_try_to_update_request cleanup
  SUNRPC: register PipeFS file system after pernet sybsystem
2012-04-25 21:38:44 -07:00
Will Deacon 63f61a6f46 revert "proc: clear_refs: do not clear reserved pages"
Revert commit 85e72aa538 ("proc: clear_refs: do not clear reserved
pages"), which was a quick fix suitable for -stable until ARM had been
moved over to the gate_vma mechanism:

https://lkml.org/lkml/2012/1/14/55

With commit f9d4861f ("ARM: 7294/1: vectors: use gate_vma for vectors user
mapping"), ARM does now use the gate_vma, so the PageReserved check can be
removed from the proc code.

Signed-off-by: Will Deacon <will.deacon@arm.com>
Cc: Nicolas Pitre <nico@linaro.org>
Acked-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-04-25 21:26:34 -07:00