Commit Graph

35195 Commits

Author SHA1 Message Date
Al Viro b37199e626 rcuwalk: recheck mount_lock after mountpoint crossing attempts
We can get false negative from __lookup_mnt() if an unrelated vfsmount
gets moved.  In that case legitimize_mnt() is guaranteed to fail,
and we will fall back to non-RCU walk... unless we end up running
into a hard error on a filesystem object we wouldn't have reached
if not for that false negative.  IOW, delaying that check until
the end of pathname resolution is wrong - we should recheck right
after we attempt to cross the mountpoint.  We don't need to recheck
unless we see d_mountpoint() being true - in that case even if
we have just raced with mount/umount, we can simply go on as if
we'd come at the moment when the sucker wasn't a mountpoint; if we
run into a hard error as the result, it was a legitimate outcome.
__lookup_mnt() returning NULL is different in that respect, since
it might've happened due to operation on completely unrelated
mountpoint.

Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-03-23 00:32:55 -04:00
Al Viro e825196d48 make prepend_name() work correctly when called with negative *buflen
In all callchains leading to prepend_name(), the value left in *buflen
is eventually discarded unused if prepend_name() has returned a negative.
So we are free to do what prepend() does, and subtract from *buflen
*before* checking for underflow (which turns into checking the sign
of subtraction result, of course).

Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-03-23 00:28:40 -04:00
Eric Biggers 99aea68134 vfs: Don't let __fdget_pos() get FMODE_PATH files
Commit bd2a31d522 ("get rid of fget_light()") introduced the
__fdget_pos() function, which returns the resulting file pointer and
fdput flags combined in an 'unsigned long'.  However, it also changed the
behavior to return files with FMODE_PATH set, which shouldn't happen
because read(), write(), lseek(), etc. aren't allowed on such files.
This commit restores the old behavior.

This regression actually had no effect on read() and write() since
FMODE_READ and FMODE_WRITE are not set on file descriptors opened with
O_PATH, but it did cause lseek() on a file descriptor opened with O_PATH
to fail with ESPIPE rather than EBADF.

Signed-off-by: Eric Biggers <ebiggers3@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-03-23 00:03:12 -04:00
Eric Biggers d7a15f8d07 vfs: atomic f_pos access in llseek()
Commit 9c225f2655 ("vfs: atomic f_pos accesses as per POSIX") changed
several system calls to use fdget_pos() instead of fdget(), but missed
sys_llseek().  Fix it.

Signed-off-by: Eric Biggers <ebiggers3@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-03-23 00:03:12 -04:00
Linus Torvalds 33807f4f0d Merge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6
Pull CIFS fixes from Steve French:
 "A fix for the problem which Al spotted in cifs_writev and a followup
  (noticed when fixing CVE-2014-0069) patch to ensure that cifs never
  sends more than the smb frame length over the socket (as we saw with
  that cifs_iovec_write problem that Jeff fixed last month)"

* 'for-next' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: mask off top byte in get_rfc1002_length()
  cifs: sanity check length of data to send before sending
  CIFS: Fix wrong pos argument of cifs_find_lock_conflict
2014-03-11 11:53:42 -07:00
Linus Torvalds 8712a00514 Merge branch 'akpm' (patches from Andrew Morton)
Merge misc fixes from Andrew Morton:
 "Nine fixes"

* emailed patches from Andrew Morton akpm@linux-foundation.org>:
  cris: convert ffs from an object-like macro to a function-like macro
  hfsplus: add HFSX subfolder count support
  tools/testing/selftests/ipc/msgque.c: handle msgget failure return correctly
  MAINTAINERS: blackfin: add git repository
  revert "kallsyms: fix absolute addresses for kASLR"
  mm/Kconfig: fix URL for zsmalloc benchmark
  fs/proc/base.c: fix GPF in /proc/$PID/map_files
  mm/compaction: break out of loop on !PageBuddy in isolate_freepages_block
  mm: fix GFP_THISNODE callers and clarify
2014-03-10 17:26:36 -07:00
Sergei Antonov d7d673a591 hfsplus: add HFSX subfolder count support
Adds support for HFSX 'HasFolderCount' flag and a corresponding
'folderCount' field in folder records.  (For reference see
HFS_FOLDERCOUNT and kHFSHasFolderCountBit/kHFSHasFolderCountMask in
Apple's source code.)

Ignoring subfolder count leads to fs errors found by Mac:

  ...
  Checking catalog hierarchy.
  HasFolderCount flag needs to be set (id = 105)
  (It should be 0x10 instead of 0)
  Incorrect folder count in a directory (id = 2)
  (It should be 7 instead of 6)
  ...

Steps to reproduce:
 Format with "newfs_hfs -s /dev/diskXXX".
 Mount in Linux.
 Create a new directory in root.
 Unmount.
 Run "fsck_hfs /dev/diskXXX".

The patch handles directory creation, deletion, and rename.

Signed-off-by: Sergei Antonov <saproj@gmail.com>
Reviewed-by: Vyacheslav Dubeyko <slava@dubeyko.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-03-10 17:26:21 -07:00
Artem Fetishev 70335abb26 fs/proc/base.c: fix GPF in /proc/$PID/map_files
The expected logic of proc_map_files_get_link() is either to return 0
and initialize 'path' or return an error and leave 'path' uninitialized.

By the time dname_to_vma_addr() returns 0 the corresponding vma may have
already be gone.  In this case the path is not initialized but the
return value is still 0.  This results in 'general protection fault'
inside d_path().

Steps to reproduce:

  CONFIG_CHECKPOINT_RESTORE=y

    fd = open(...);
    while (1) {
        mmap(fd, ...);
        munmap(fd, ...);
    }

  ls -la /proc/$PID/map_files

Addresses https://bugzilla.kernel.org/show_bug.cgi?id=68991

Signed-off-by: Artem Fetishev <artem_fetishev@epam.com>
Signed-off-by: Aleksandr Terekhov <aleksandr_terekhov@epam.com>
Reported-by: <wiebittewas@gmail.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-03-10 17:26:20 -07:00
Linus Torvalds e6a4b6f5ea Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs fixes from Al Viro.

Clean up file table accesses (get rid of fget_light() in favor of the
fdget() interface), add proper file position locking.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  get rid of fget_light()
  sockfd_lookup_light(): switch to fdget^W^Waway from fget_light
  vfs: atomic f_pos accesses as per POSIX
  ocfs2 syncs the wrong range...
2014-03-10 12:57:26 -07:00
Al Viro bd2a31d522 get rid of fget_light()
instead of returning the flags by reference, we can just have the
low-level primitive return those in lower bits of unsigned long,
with struct file * derived from the rest.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-03-10 11:44:42 -04:00
Linus Torvalds 9c225f2655 vfs: atomic f_pos accesses as per POSIX
Our write() system call has always been atomic in the sense that you get
the expected thread-safe contiguous write, but we haven't actually
guaranteed that concurrent writes are serialized wrt f_pos accesses, so
threads (or processes) that share a file descriptor and use "write()"
concurrently would quite likely overwrite each others data.

This violates POSIX.1-2008/SUSv4 Section XSI 2.9.7 that says:

 "2.9.7 Thread Interactions with Regular File Operations

  All of the following functions shall be atomic with respect to each
  other in the effects specified in POSIX.1-2008 when they operate on
  regular files or symbolic links: [...]"

and one of the effects is the file position update.

This unprotected file position behavior is not new behavior, and nobody
has ever cared.  Until now.  Yongzhi Pan reported unexpected behavior to
Michael Kerrisk that was due to this.

This resolves the issue with a f_pos-specific lock that is taken by
read/write/lseek on file descriptors that may be shared across threads
or processes.

Reported-by: Yongzhi Pan <panyongzhi@gmail.com>
Reported-by: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-03-10 11:44:41 -04:00
Al Viro 1b56e98990 ocfs2 syncs the wrong range...
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-03-10 11:43:32 -04:00
Linus Torvalds fe9ea91cde NFS client bugfixes for Linux 3.14
Highlights include:
 
 - Fix another nfs4_sequence corruptor in RELEASE_LOCKOWNER
 - Fix an Oopsable delegation callback race
 - Fix another bad stateid infinite loop
 - Fail the data server I/O is the stateid represents a lost lock
 - Fix an Oopsable sunrpc trace event
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJTHJSVAAoJEGcL54qWCgDyVRkP/2t43gjMF6P+Yc7VUW2e5uTv
 rHhPGFLuDVs9oS3WUYegzvThZMs//ovTaYgUSDNpOYztEB6P8bDRm41q/VgUIixY
 zWFoEplDgAZZE7gP2EJuXJv3bEdhJqXuCG2KUysqMsaIGlahrlQdHmqGTz6Y931o
 WROyMWVvnL4IoEtQHVR7DwyqkvSmifPJ8MZZv3Liy82wuw1fCsh8uy8mkYYSbdvN
 OK4JmHqdJ+CbAZ0WmE4Xe3Itqy/aIMBL9Jyrq4Zl1QX0p7ez3Xpy4XwmtlZXn2KP
 bKMfK2vP9RggagIpjUL+dhCqxlsyjlF6EzTnQRe7jXqlJ/vJ9pQF8X294jwRysfp
 80jDqsTSND4JQiZuBISID23N1nL0TzrP2tWqipR9zx5JJMRVzYZWTzEq4w2uAHgg
 aW2vTdRNRLZWydlfFNQ8FiuEPIFoQaJFmOCQisec2LtfffLZZBz7JPofjNH9CgU8
 mcbPhv75m2imXDOylydiVoD4x/myCGheYw2hpqhb1ZeuQxdN9lnwa0JzjPiP1h38
 XIYwzM7TE8WayrdkMDCeIem1dz/VexknfKmXmFXlMfn3GRKxowCSrggxKG92k0eP
 L35cJj91a9AoxMz/ej0erv0iI1flLeoYP9aJzIRtZf+SB1BZkKhmWlFRQKqnlIOA
 BzjYui4mUoEQEa5Sk7Th
 =JfQx
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-3.14-5' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client bugfixes from Trond Myklebust:
 "Highlights include:

   - Fix another nfs4_sequence corruptor in RELEASE_LOCKOWNER
   - Fix an Oopsable delegation callback race
   - Fix another bad stateid infinite loop
   - Fail the data server I/O is the stateid represents a lost lock
   - Fix an Oopsable sunrpc trace event"

* tag 'nfs-for-3.14-5' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  SUNRPC: Fix oops when trace sunrpc_task events in nfs client
  NFSv4: Fail the truncate() if the lock/open stateid is invalid
  NFSv4.1 Fail data server I/O if stateid represents a lost lock
  NFSv4: Fix the return value of nfs4_select_rw_stateid
  NFSv4: nfs4_stateid_is_current should return 'true' for an invalid stateid
  NFS: Fix a delegation callback race
  NFSv4: Fix another nfs4_sequence corruptor
2014-03-09 19:17:39 -07:00
Linus Torvalds 2a75184d52 Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
 "Small collection of fixes for 3.14-rc. It contains:

   - Three minor update to blk-mq from Christoph.

   - Reduce number of unaligned (< 4kb) in-flight writes on mtip32xx to
     two.  From Micron.

   - Make the blk-mq CPU notify spinlock raw, since it can't be a
     sleeper spinlock on RT.  From Mike Galbraith.

   - Drop now bogus BUG_ON() for bio iteration with blk integrity.  From
     Nic Bellinger.

   - Properly propagate the SYNC flag on requests. From Shaohua"

* 'for-linus' of git://git.kernel.dk/linux-block:
  blk-mq: add REQ_SYNC early
  rt,blk,mq: Make blk_mq_cpu_notify_lock a raw spinlock
  bio-integrity: Drop bio_integrity_verify BUG_ON in post bip->bip_iter world
  blk-mq: support partial I/O completions
  blk-mq: merge blk_mq_insert_request and blk_mq_run_request
  blk-mq: remove blk_mq_alloc_rq
  mtip32xx: Reduce the number of unaligned writes to 2
2014-03-07 09:59:44 -08:00
Trond Myklebust 0418dae105 NFSv4: Fail the truncate() if the lock/open stateid is invalid
If the open stateid could not be recovered, or the file locks were lost,
then we should fail the truncate() operation altogether.

Reported-by: Andy Adamson <andros@netapp.com>
Link: http://lkml.kernel.org/r/1393954269-3974-1-git-send-email-andros@netapp.com
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-03-05 11:55:25 -05:00
Andy Adamson 869a9d375d NFSv4.1 Fail data server I/O if stateid represents a lost lock
Signed-off-by: Andy Adamson <andros@netapp.com>
Link: http://lkml.kernel.org/r/1393954269-3974-1-git-send-email-andros@netapp.com
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-03-05 11:55:24 -05:00
Trond Myklebust 927864cd92 NFSv4: Fix the return value of nfs4_select_rw_stateid
In commit 5521abfdcf (NFSv4: Resend the READ/WRITE RPC call
if a stateid change causes an error), we overloaded the return value of
nfs4_select_rw_stateid() to cause it to return -EWOULDBLOCK if an RPC
call is outstanding that would cause the NFSv4 lock or open stateid
to change.
That is all redundant when we actually copy the stateid used in the
read/write RPC call that failed, and check that against the current
stateid. It is doubly so, when we consider that in the NFSv4.1 case,
we also set the stateid's seqid to the special value '0', which means
'match the current valid stateid'.

Reported-by: Andy Adamson <andros@netapp.com>
Link: http://lkml.kernel.org/r/1393954269-3974-1-git-send-email-andros@netapp.com
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-03-05 11:55:24 -05:00
Trond Myklebust e1253be0ec NFSv4: nfs4_stateid_is_current should return 'true' for an invalid stateid
When nfs4_set_rw_stateid() can fails by returning EIO to indicate that
the stateid is completely invalid, then it makes no sense to have it
trigger a retry of the READ or WRITE operation. Instead, we should just
have it fall through and attempt a recovery.

This fixes an infinite loop in which the client keeps replaying the same
bad stateid back to the server.

Reported-by: Andy Adamson <andros@netapp.com>
Link: http://lkml.kernel.org/r/1393954269-3974-1-git-send-email-andros@netapp.com
Cc: stable@vger.kernel.org # 3.10+
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-03-05 11:55:06 -05:00
Vyacheslav Dubeyko bd2c003532 hfsplus: fix remount issue
Current implementation of HFS+ driver has small issue with remount
option.  Namely, for example, you are unable to remount from RO mode
into RW mode by means of command "mount -o remount,rw /dev/loop0
/mnt/hfsplus".  Trying to execute sequence of commands results in an
error message:

  mount /dev/loop0 /mnt/hfsplus
  mount -o remount,ro /dev/loop0 /mnt/hfsplus
  mount -o remount,rw /dev/loop0 /mnt/hfsplus

  mount: you must specify the filesystem type

  mount -t hfsplus -o remount,rw /dev/loop0 /mnt/hfsplus

  mount: /mnt/hfsplus not mounted or bad option

The reason of such issue is failure of mount syscall:

  mount("/dev/loop0", "/mnt/hfsplus", 0x2282a60, MS_MGC_VAL|MS_REMOUNT, NULL) = -1 EINVAL (Invalid argument)

Namely, hfsplus_parse_options_remount() method receives empty "input"
argument and return false in such case.  As a result, hfsplus_remount()
returns -EINVAL error code.

This patch fixes the issue by means of return true for the case of empty
"input" argument in hfsplus_parse_options_remount() method.

Signed-off-by: Vyacheslav Dubeyko <slava@dubeyko.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-03-04 07:55:49 -08:00
Jan Kara 15c34a7606 ocfs2: fix quota file corruption
Global quota files are accessed from different nodes.  Thus we cannot
cache offset of quota structure in the quota file after we drop our node
reference count to it because after that moment quota structure may be
freed and reallocated elsewhere by a different node resulting in
corruption of quota file.

Fix the problem by clearing dq_off when we are releasing dquot structure.
We also remove the DB_READ_B handling because it is useless -
DQ_ACTIVE_B is set iff DQ_READ_B is set.

Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Goldwyn Rodrigues <rgoldwyn@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-03-04 07:55:48 -08:00
David Rientjes 668f9abbd4 mm: close PageTail race
Commit bf6bddf192 ("mm: introduce compaction and migration for
ballooned pages") introduces page_count(page) into memory compaction
which dereferences page->first_page if PageTail(page).

This results in a very rare NULL pointer dereference on the
aforementioned page_count(page).  Indeed, anything that does
compound_head(), including page_count() is susceptible to racing with
prep_compound_page() and seeing a NULL or dangling page->first_page
pointer.

This patch uses Andrea's implementation of compound_trans_head() that
deals with such a race and makes it the default compound_head()
implementation.  This includes a read memory barrier that ensures that
if PageTail(head) is true that we return a head page that is neither
NULL nor dangling.  The patch then adds a store memory barrier to
prep_compound_page() to ensure page->first_page is set.

This is the safest way to ensure we see the head page that we are
expecting, PageTail(page) is already in the unlikely() path and the
memory barriers are unfortunately required.

Hugetlbfs is the exception, we don't enforce a store memory barrier
during init since no race is possible.

Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Holger Kiehl <Holger.Kiehl@dwd.de>
Cc: Christoph Lameter <cl@linux.com>
Cc: Rafael Aquini <aquini@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-03-04 07:55:47 -08:00
Trond Myklebust 755a48a7a4 NFS: Fix a delegation callback race
The clean-up in commit 36281caa83 ended up removing a NULL pointer check
that is needed in order to prevent an Oops in
nfs_async_inode_return_delegation().

Reported-by: "Yan, Zheng" <zheng.z.yan@intel.com>
Link: http://lkml.kernel.org/r/5313E9F6.2020405@intel.com
Fixes: 36281caa83 (NFSv4: Further clean-ups of delegation stateid validation)
Cc: stable@vger.kernel.org # 3.4+
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-03-02 22:03:12 -05:00
Linus Torvalds 3751c97036 Driver core fix for 3.14-rc5
Here is a single sysfs fix for 3.14-rc5.  It fixes a reported problem
 with the namespace code in sysfs.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iEYEABECAAYFAlMTq7sACgkQMUfUDdst+yml/wCgkUWPlSGv3UA5AJ1yDBnFqgxB
 RcAAn1CM1x6k3ULHG6Hz7SGkFg9dqpjz
 =aK2B
 -----END PGP SIGNATURE-----

Merge tag 'driver-core-3.14-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull sysfs fix from Greg KH:
 "Here is a single sysfs fix for 3.14-rc5.  It fixes a reported problem
  with the namespace code in sysfs"

* tag 'driver-core-3.14-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
  sysfs: fix namespace refcnt leak
2014-03-02 15:13:41 -08:00
Trond Myklebust b7e63a1079 NFSv4: Fix another nfs4_sequence corruptor
nfs4_release_lockowner needs to set the rpc_message reply to point to
the nfs4_sequence_res in order to avoid another Oopsable situation
in nfs41_assign_slot.

Fixes: fbd4bfd1d9 (NFS: Add nfs4_sequence calls for RELEASE_LOCKOWNER)
Cc: stable@vger.kernel.org # 3.12+
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-03-01 13:51:53 -06:00
Jeff Layton dca1c8d17a cifs: mask off top byte in get_rfc1002_length()
The rfc1002 length actually includes a type byte, which we aren't
masking off. In most cases, it's not a problem since the
RFC1002_SESSION_MESSAGE type is 0, but when doing a RFC1002 session
establishment, the type is non-zero and that throws off the returned
length.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Tested-by: Sachin Prabhu <sprabhu@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
2014-02-28 14:01:14 -06:00
Linus Torvalds 8d7531825c Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull filesystem fixes from Jan Kara:
 "Notification, writeback, udf, quota fixes

  The notification patches are (with one exception) a fallout of my
  fsnotify rework which went into -rc1 (I've extented LTP to cover these
  cornercases to avoid similar breakage in future).

  The UDF patch is a nasty data corruption Al has recently reported,
  the revert of the writeback patch is due to possibility of violating
  sync(2) guarantees, and a quota bug can lead to corruption of quota
  files in ocfs2"

* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  fsnotify: Allocate overflow events with proper type
  fanotify: Handle overflow in case of permission events
  fsnotify: Fix detection whether overflow event is queued
  Revert "writeback: do not sync data dirtied after sync start"
  quota: Fix race between dqput() and dquot_scan_active()
  udf: Fix data corruption on file type conversion
  inotify: Fix reporting of cookies for inotify events
2014-02-27 10:37:22 -08:00
Li Zefan fed95bab8d sysfs: fix namespace refcnt leak
As mount() and kill_sb() is not a one-to-one match, we shoudn't get
ns refcnt unconditionally in sysfs_mount(), and instead we should
get the refcnt only when kernfs_mount() allocated a new superblock.

v2:
- Changed the name of the new argument, suggested by Tejun.
- Made the argument optional, suggested by Tejun.

v3:
- Make the new argument as second-to-last arg, suggested by Tejun.

Signed-off-by: Li Zefan <lizefan@huawei.com>
Acked-by: Tejun Heo <tj@kernel.org>
 ---
 fs/kernfs/mount.c      | 8 +++++++-
 fs/sysfs/mount.c       | 5 +++--
 include/linux/kernfs.h | 9 +++++----
 3 files changed, 15 insertions(+), 7 deletions(-)
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-25 07:37:52 -08:00
Jan Kara ff57cd5863 fsnotify: Allocate overflow events with proper type
Commit 7053aee26a "fsnotify: do not share events between notification
groups" used overflow event statically allocated in a group with the
size of the generic notification event. This causes problems because
some code looks at type specific parts of event structure and gets
confused by a random data it sees there and causes crashes.

Fix the problem by allocating overflow event with type corresponding to
the group type so code cannot get confused.

Signed-off-by: Jan Kara <jack@suse.cz>
2014-02-25 11:18:06 +01:00
Jan Kara 482ef06c5e fanotify: Handle overflow in case of permission events
If the event queue overflows when we are handling permission event, we
will never get response from userspace. So we must avoid waiting for it.
Change fsnotify_add_notify_event() to return whether overflow has
happened so that we can detect it in fanotify_handle_event() and act
accordingly.

Signed-off-by: Jan Kara <jack@suse.cz>
2014-02-25 11:17:58 +01:00
Jan Kara 2513190a92 fsnotify: Fix detection whether overflow event is queued
Currently we didn't initialize event's list head when we removed it from
the event list. Thus a detection whether overflow event is already
queued wasn't working. Fix it by always initializing the list head when
deleting event from a list.

Signed-off-by: Jan Kara <jack@suse.cz>
2014-02-25 11:17:52 +01:00
Jeff Layton a26054d184 cifs: sanity check length of data to send before sending
We had a bug discovered recently where an upper layer function
(cifs_iovec_write) could pass down a smb_rqst with an invalid amount of
data in it. The length of the SMB frame would be correct, but the rqst
struct would cause smb_send_rqst to send nearly 4GB of data.

This should never be the case. Add some sanity checking to the beginning
of smb_send_rqst that ensures that the amount of data we're going to
send agrees with the length in the RFC1002 header. If it doesn't, WARN()
and return -EIO to the upper layers.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Acked-by: Sachin Prabhu <sprabhu@redhat.com>
Reviewed-by: Pavel Shilovsky <piastry@etersoft.ru>
Signed-off-by: Steve French <smfrench@gmail.com>
2014-02-23 20:55:07 -06:00
Pavel Shilovsky 6b1168e161 CIFS: Fix wrong pos argument of cifs_find_lock_conflict
and use generic_file_aio_write rather than __generic_file_aio_write
in cifs_writev.

Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru>
Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Steve French <smfrench@gmail.com>
2014-02-23 20:54:50 -06:00
Linus Torvalds 645ceee885 Merge branch 'xfs-fixes-for-3.14-rc4' of git://oss.sgi.com/xfs/xfs
Pull xfs fixes from Dave Chinner:
 "This is the first pull request I've had to do for you, so I'm still
  sorting things out.  The reason I'm sending this and not Ben should be
  obvious from the first commit below - SGI has stepped down from the
  XFS maintainership role.  As such, I'd like to take another
  opportunity to thank them for their many years of effort maintaining
  XFS and supporting the XFS community that they developed from the
  ground up.

  So I haven't had time to work things like signed tags into my
  workflows yet, so this is just a repo branch I'm asking you to pull
  from.  And yes, I named the branch -rc4 because I wanted the fixes in
  rc4, not because the branch was for merging into -rc3.  Probably not
  right, either.

  Anyway, I should have everything sorted out by the time the next merge
  window comes around.  If there's anything that you don't like in the
  pull req, feel free to flame me unmercifully.

  The changes are fixes for recent regressions and important thinkos in
  verification code:

        - a log vector buffer alignment issue on ia32
        - timestamps on truncate got mangled
        - primary superblock CRC validation fixes and error message
          sanitisation"

* 'xfs-fixes-for-3.14-rc4' of git://oss.sgi.com/xfs/xfs:
  xfs: limit superblock corruption errors to actual corruption
  xfs: skip verification on initial "guess" superblock read
  MAINTAINERS: SGI no longer maintaining XFS
  xfs: xfs_sb_read_verify() doesn't flag bad crcs on primary sb
  xfs: ensure correct log item buffer alignment
  xfs: ensure correct timestamp updates from truncate
2014-02-22 08:26:01 -08:00
Jan Kara 0dc83bd30b Revert "writeback: do not sync data dirtied after sync start"
This reverts commit c4a391b53a. Dave
Chinner <david@fromorbit.com> has reported the commit may cause some
inodes to be left out from sync(2). This is because we can call
redirty_tail() for some inode (which sets i_dirtied_when to current time)
after sync(2) has started or similarly requeue_inode() can set
i_dirtied_when to current time if writeback had to skip some pages. The
real problem is in the functions clobbering i_dirtied_when but fixing
that isn't trivial so revert is a safer choice for now.

CC: stable@vger.kernel.org # >= 3.13
Signed-off-by: Jan Kara <jack@suse.cz>
2014-02-22 02:02:28 +01:00
Nicholas Bellinger eec70897d8 bio-integrity: Drop bio_integrity_verify BUG_ON in post bip->bip_iter world
Given that bip->bip_iter.bi_size is decremented after bio_advance() ->
bio_integrity_advance() is called, the BUG_ON() in bio_integrity_verify()
ends up tripping in v3.14-rc1 code with the advent of immutable biovecs
in:

commit d57a5f7c66
Author: Kent Overstreet <kmo@daterainc.com>
Date:   Sat Nov 23 17:20:16 2013 -0800

    bio-integrity: Convert to bvec_iter

Given that there is no easy way to ascertain the original bi_size
value, go ahead and drop this BUG_ON().

Reported-by: Sagi Grimberg <sagig@dev.mellanox.co.il>
Reported-by: Akinobu Mita <akinobu.mita@gmail.com>
Acked-by: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Kent Overstreet <kmo@daterainc.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Jens Axboe <axboe@fb.com>
2014-02-21 15:56:36 -08:00
Jan Kara 1362f4ea20 quota: Fix race between dqput() and dquot_scan_active()
Currently last dqput() can race with dquot_scan_active() causing it to
call callback for an already deactivated dquot. The race is as follows:

CPU1					CPU2
  dqput()
    spin_lock(&dq_list_lock);
    if (atomic_read(&dquot->dq_count) > 1) {
     - not taken
    if (test_bit(DQ_ACTIVE_B, &dquot->dq_flags)) {
      spin_unlock(&dq_list_lock);
      ->release_dquot(dquot);
        if (atomic_read(&dquot->dq_count) > 1)
         - not taken
					  dquot_scan_active()
					    spin_lock(&dq_list_lock);
					    if (!test_bit(DQ_ACTIVE_B, &dquot->dq_flags))
					     - not taken
					    atomic_inc(&dquot->dq_count);
					    spin_unlock(&dq_list_lock);
        - proceeds to release dquot
					    ret = fn(dquot, priv);
					     - called for inactive dquot

Fix the problem by making sure possible ->release_dquot() is finished by
the time we call the callback and new calls to it will notice reference
dquot_scan_active() has taken and bail out.

CC: stable@vger.kernel.org # >= 2.6.29
Signed-off-by: Jan Kara <jack@suse.cz>
2014-02-20 21:57:04 +01:00
Jan Kara 09ebb17ab4 udf: Fix data corruption on file type conversion
UDF has two types of files - files with data stored in inode (ICB in
UDF terminology) and files with data stored in external data blocks. We
convert file from in-inode format to external format in
udf_file_aio_write() when we find out data won't fit into inode any
longer. However the following race between two O_APPEND writes can happen:

CPU1					CPU2
udf_file_aio_write()			udf_file_aio_write()
  down_write(&iinfo->i_data_sem);
  checks that i_size + count1 fits within inode
    => no need to convert
  up_write(&iinfo->i_data_sem);
					  down_write(&iinfo->i_data_sem);
					  checks that i_size + count2 fits
					    within inode => no need to convert
					  up_write(&iinfo->i_data_sem);
  generic_file_aio_write()
    - extends file by count1 bytes
					  generic_file_aio_write()
					    - extends file by count2 bytes

Clearly if count1 + count2 doesn't fit into the inode, we overwrite
kernel buffers beyond inode, possibly corrupting the filesystem as well.

Fix the problem by acquiring i_mutex before checking whether write fits
into the inode and using __generic_file_aio_write() afterwards which
puts check and write into one critical section.

Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Jan Kara <jack@suse.cz>
2014-02-20 21:56:00 +01:00
Linus Torvalds 6a4d07f85b Merge branch 'for-3.14-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup fixes from Tejun Heo:
 "Quite a few fixes this time.

  Three locking fixes, all marked for -stable.  A couple error path
  fixes and some misc fixes.  Hugh found a bug in memcg offlining
  sequence and we thought we could fix that from cgroup core side but
  that turned out to be insufficient and got reverted.  A different fix
  has been applied to -mm"

* 'for-3.14-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cgroup: update cgroup_enable_task_cg_lists() to grab siglock
  Revert "cgroup: use an ordered workqueue for cgroup destruction"
  cgroup: protect modifications to cgroup_idr with cgroup_mutex
  cgroup: fix locking in cgroup_cfts_commit()
  cgroup: fix error return from cgroup_create()
  cgroup: fix error return value in cgroup_mount()
  cgroup: use an ordered workqueue for cgroup destruction
  nfs: include xattr.h from fs/nfs/nfs3proc.c
  cpuset: update MAINTAINERS entry
  arm, pm, vmpressure: add missing slab.h includes
2014-02-20 12:01:09 -08:00
Linus Torvalds e95003c3f9 NFS client bugfixes for Linux 3.14
Highlights include stable fixes for the following bugs:
 
 - General performance regression due to NFS_INO_INVALID_LABEL being set
   when the server doesn't support labeled NFS
 - Hang in the RPC code due to a socket out-of-buffer race
 - Infinite loop when trying to establish the NFSv4 lease
 - Use-after-free bug in the RPCSEC gss code.
 - nfs4_select_rw_stateid is returning with a non-zero error value on success
 
 Other bug fixes:
 
 - Potential memory scribble in the RPC bi-directional RPC code
 - Pipe version reference leak
 - Use the correct net namespace in the new NFSv4 migration code
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJTBMBlAAoJEGcL54qWCgDyOakP+gKDh0VhKw8GziJFfY+6kHHI
 wej86M/coNnRPUv8n7s3N5TXMoV36qisYpxbIG/WQbOn6MfR3qjto6WoP7+vsrEq
 iNomtLivgEsWJePydTuIyAR/TK0du/zqP4zoPEDgdLDenucEVvkCzGIkqzg8Mddc
 duknEhIq918BkXIe3hRBWuxl+pRjwZur+TY0h/OR11oodqTYHxrE37f5PvREWOmB
 08hhjOFBFYlyEnCjD3I1SFmcXQxkKzvACavvbhTyF6u/37oL/QC1/DZKL5mSdOJ6
 novO8sv9gIpn/RhsEMOdaeYMYM5QTvkYIJQyLpKAYyaLZ42EMbRkczwNE7C0ZWyi
 F9MizDMNTie+DvsSHZPYwABTDOOQOuWPa9PO3Lyo8UtWxfmTOjYr2Rre1wyG0/+0
 ywb3JiKQCtVDPnmSHhqxFfVp9XvS7D2/vz0udgKjrCLKyDC2OMMGLVa/acnrtZmz
 s94QpPiqhjnRqIuKo251HtbK3AVaLBNQxBCPszieYwPm9aZ04P7mjsGg1WuhhB2v
 +eSa9UkicGJwKWJWtIBr54qIAOlEXu2bXY+vio5UfbDb+5qfBHe0TmNrz5QNJ53A
 x0eUBocth9VpW1cv/Rf+o30tJyZGy6Jtv8kn2hfXkJXNL3N1Gn25rD3tpb+5TtdY
 b7gtHy13/oe8yi0tK91f
 =tll2
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-3.14-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client bugfixes from Trond Myklebust:
 "Highlights include stable fixes for the following bugs:

   - General performance regression due to NFS_INO_INVALID_LABEL being
     set when the server doesn't support labeled NFS
   - Hang in the RPC code due to a socket out-of-buffer race
   - Infinite loop when trying to establish the NFSv4 lease
   - Use-after-free bug in the RPCSEC gss code.
   - nfs4_select_rw_stateid is returning with a non-zero error value on
     success

  Other bug fixes:

  - Potential memory scribble in the RPC bi-directional RPC code
  - Pipe version reference leak
  - Use the correct net namespace in the new NFSv4 migration code"

* tag 'nfs-for-3.14-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  NFS fix error return in nfs4_select_rw_stateid
  NFSv4: Use the correct net namespace in nfs4_update_server
  SUNRPC: Fix a pipe_version reference leak
  SUNRPC: Ensure that gss_auth isn't freed before its upcall messages
  SUNRPC: Fix potential memory scribble in xprt_free_bc_request()
  SUNRPC: Fix races in xs_nospace()
  SUNRPC: Don't create a gss auth cache unless rpc.gssd is running
  NFS: Do not set NFS_INO_INVALID_LABEL unless server supports labeled NFS
2014-02-19 12:13:02 -08:00
Andy Adamson 146d70caaa NFS fix error return in nfs4_select_rw_stateid
Do not return an error when nfs4_copy_delegation_stateid succeeds.

Signed-off-by: Andy Adamson <andros@netapp.com>
Link: http://lkml.kernel.org/r/1392737765-41942-1-git-send-email-andros@netapp.com
Fixes: ef1820f9be (NFSv4: Don't try to recover NFSv4 locks when...)
Cc: NeilBrown <neilb@suse.de>
Cc: stable@vger.kernel.org # 3.12+
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-02-19 09:31:56 -05:00
Eric Sandeen 5ef11eb070 xfs: limit superblock corruption errors to actual corruption
Today, if

xfs_sb_read_verify
  xfs_sb_verify
    xfs_mount_validate_sb

detects superblock corruption, it'll be extremely noisy, dumping
2 stacks, 2 hexdumps, etc.

This is because we call XFS_CORRUPTION_ERROR in xfs_mount_validate_sb
as well as in xfs_sb_read_verify.

Also, *any* errors in xfs_mount_validate_sb which are not corruption
per se; things like too-big-blocksize, bad version, bad magic, v1 dirs,
rw-incompat etc - things which do not return EFSCORRUPTED - will
still do the whole XFS_CORRUPTION_ERROR spew when xfs_sb_read_verify
sees any error at all.  And it suggests to the user that they
should run xfs_repair, even if the root cause of the mount failure
is a simple incompatibility.

I'll submit that the probably-not-corrupted errors don't warrant
this much noise, so this patch removes the warning for anything
other than EFSCORRUPTED returns, and replaces the lower-level
XFS_CORRUPTION_ERROR with an xfs_notice().

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-02-19 15:39:35 +11:00
Eric Sandeen daba5427da xfs: skip verification on initial "guess" superblock read
When xfs_readsb() does the very first read of the superblock,
it makes a guess at the length of the buffer, based on the
sector size of the underlying storage.  This may or may
not match the filesystem sector size in sb_sectsize, so
we can't i.e. do a CRC check on it; it might be too short.

In fact, mounting a filesystem with sb_sectsize larger
than the device sector size will cause a mount failure
if CRCs are enabled, because we are checksumming a length
which exceeds the buffer passed to it.

So always read twice; the first time we read with NULL
buffer ops to skip verification; then set the proper
read length, hook up the proper verifier, and give it
another go.

Once we are sure that we've got the right buffer length,
we can also use bp->b_length in the xfs_sb_read_verify,
rather than the less-trusted on-disk sectorsize for
secondary superblocks.  Before this we ran the risk of
passing junk to the crc32c routines, which didn't always
handle extreme values.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-02-19 15:39:16 +11:00
Eric Sandeen 7a01e707a3 xfs: xfs_sb_read_verify() doesn't flag bad crcs on primary sb
My earlier commit 10e6e65 deserves a layer or two of brown paper
bags.  The logic in that commit means that a CRC failure on the
primary superblock will *never* result in an error return.

Hopefully this fixes it, so that we always return the error
if it's a primary superblock, otherwise only if the filesystem
has CRCs enabled.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2014-02-19 15:33:05 +11:00
Linus Torvalds 341bbdc512 Another ACL regression. This one more subtle.
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABAgAGBQJTA+AkAAoJEDaohF61QIxkIQwP/jkvIFJFuLCAn62ogIW/lnFe
 nyfjqoz5UpWHEIofXzMalt0ugYby5VLHWI17FPFmAmrpTrpyHFjVt1qt8GhJ8yM3
 mj3inorur9+/COozpEfqacS9bqiuH5DB35ufgA4EQTT9uwnI4AKypwQ3PogrBAxw
 9TE4AedPbqAbYPAyNEZhuNnLCCf6kRIlb0lK6HWPQ7769YsSokmoHxa+Rke1NDyx
 b2oABa4PHQTx0H53ppZKQok77Rg1dALeOfak6AawOeHijzRz05IEdV5ZH8MEMPTD
 Yb9R6cDBMxGg6YKUYgQrE1BYQ9azqsotFFmqE0gYB376ag/R6M3NmM/Jx6bD2OkW
 jmS+pI18EdJ97cRnylmasGYxI1G/3N9RhoTK7g4H5Cvmzs84Khw3cp7cN4LqUMzA
 7+3rh+Gd49gvR0YY3/gjlyTVZihvS7JDiYsAJBCIiTW2UtsLPdNaT/X8K18hmZ5/
 z7awKk/GPoNxUDke4NRFv+zoI+7GjorLG9DZZ/vKeIwR0DN1DQZpNGu/YGN+nHG7
 YfIwAFNjBnyFsR1ev18dR0wSuSm0fGuvPx5CKWQaLdZit/2WxZNVc6oslZ08vUNn
 VqE+MEkd5zKlQ5a7IXo2GUOUkuSsdW9aYXlNbbG4I/CBE2Nanu296lvbRH85bYnf
 hokisNr50zX/7a41v9FD
 =iBAK
 -----END PGP SIGNATURE-----

Merge tag 'jfs-3.14-rc4' of git://github.com/kleikamp/linux-shaggy

Pull jfs fix from David Kleikamp:
 "Another ACL regression. This one more subtle"

* tag 'jfs-3.14-rc4' of git://github.com/kleikamp/linux-shaggy:
  jfs: set i_ctime when setting ACL
2014-02-18 15:49:40 -08:00
Linus Torvalds 805937cf45 Miscellaneous ext4 bug fixes for v3.14
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABCAAGBQJTA56tAAoJENNvdpvBGATwgrYQAKKWtbMT3xUHJE5Et4//0Pep
 Wn1c8icQDqz+Y1C0L6ELZbY8rQaYb2cQxuf6kJAa7KJHF4lVCa8LzW+VG6S+z+Fy
 +HSoi3eaS/IWzP+eZSE71u+MPN+tdURg5n305/YP40r7OH5lzsQT7LRKosifJwFc
 t+B7lz6XwpgPQ4ooDEMNAHwTaxcocSYlmQqRUgiMsEwqOvskm+CWswB+piPrOzaH
 kyNNIKwds6UH4hVhWEwPk7IrOS6VXe0IlgpkNZM1AOW6DWJ3SNF0qGeH5/e4pG+O
 xpv8UvVej7QJWMyLGPE8z1NBCZ7P57nFOfeiag3Xuvgl/q3WB+5Ye/UZk6DNUaAF
 h2SqS8D4rHuYB9n5izfIDtqJQZ69jbt2dDzTW9a88OVvN1UScaqh++2GeX2OJ/V8
 W6XkOE/g1atO7SyAOaH9RD3D/8+Dp3DZZxocZ4G+/SoKETF8/RboLxyeozoRima2
 FFPaqS105W7b9+QdG1avOlod+0E4wU8gw5kKKuFdAoAjFXaQ+u3R+87w5r8G5alU
 BO+Trf4CTWMLZQv40HEcgDEeF1Nggd48+3GG3oiUmch9qRAZjwyk6gxLvDo1673e
 CcAXpITLn3qPDVzWu/rba8uxRr95PFpXLQ9V8it8b9rWPogodoKg+A7v0Ro/n7bR
 MszulL/GyP1isllyO+Ku
 =XNON
 -----END PGP SIGNATURE-----

Merge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 fixes from Ted Ts'o:
 "Miscellaneous ext4 bug fixes for v3.14"

* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  jbd2: fix use after free in jbd2_journal_start_reserved()
  ext4: don't leave i_crtime.tv_sec uninitialized
  ext4: fix online resize with a non-standard blocks per group setting
  ext4: fix online resize with very large inode tables
  ext4: don't try to modify s_flags if the the file system is read-only
  ext4: fix error paths in swap_inode_boot_loader()
  ext4: fix xfstest generic/299 block validity failures
2014-02-18 10:04:09 -08:00
J. R. Okajima 1406b916f4 nfsd: fix lost nfserrno() call in nfsd_setattr()
There is a regression in
	208d0ac 2014-01-07 nfsd4: break only delegations when appropriate
which deletes an nfserrno() call in nfsd_setattr() (by accident,
probably), and NFSD becomes ignoring an error from VFS.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-02-18 12:25:57 -05:00
Jan Kara 45a22f4c11 inotify: Fix reporting of cookies for inotify events
My rework of handling of notification events (namely commit 7053aee26a
"fsnotify: do not share events between notification groups") broke
sending of cookies with inotify events. We didn't propagate the value
passed to fsnotify() properly and passed 4 uninitialized bytes to
userspace instead (so it is also an information leak). Sadly I didn't
notice this during my testing because inotify cookies aren't used very
much and LTP inotify tests ignore them.

Fix the problem by passing the cookie value properly.

Fixes: 7053aee26a
Reported-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2014-02-18 11:17:17 +01:00
Dan Carpenter 92e3b40537 jbd2: fix use after free in jbd2_journal_start_reserved()
If start_this_handle() fails then it leads to a use after free of
"handle".

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@vger.kernel.org
2014-02-17 20:33:01 -05:00
Linus Torvalds 87eeff7974 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
Pull Ceph fixes from Sage Weil:
 "We have some patches fixing up ACL support issues from Zheng and
  Guangliang and a mount option to enable/disable this support.  (These
  fixes were somewhat delayed by the Chinese holiday.)

  There is also a small fix for cached readdir handling when directories
  are fragmented"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
  ceph: fix __dcache_readdir()
  ceph: add acl, noacl options for cephfs mount
  ceph: make ceph_forget_all_cached_acls() static inline
  ceph: add missing init_acl() for mkdir() and atomic_open()
  ceph: fix ceph_set_acl()
  ceph: fix ceph_removexattr()
  ceph: remove xattr when null value is given to setxattr()
  ceph: properly handle XATTR_CREATE and XATTR_REPLACE
2014-02-17 13:51:00 -08:00
Linus Torvalds 351a7934c0 Merge branch 'for-linus' of git://git.samba.org/sfrench/cifs-2.6
Pull CIFS fixes from Steve French:
 "Three cifs fixes, the most important fixing the problem with passing
  bogus pointers with writev (CVE-2014-0069).

  Two additional cifs fixes are still in review (including the fix for
  an append problem which Al also discovered)"

* 'for-linus' of git://git.samba.org/sfrench/cifs-2.6:
  CIFS: Fix too big maxBuf size for SMB3 mounts
  cifs: ensure that uncached writes handle unmapped areas correctly
  [CIFS] Fix cifsacl mounts over smb2 to not call cifs
2014-02-17 13:50:11 -08:00
David Howells 7026f1929e FS-Cache: Handle removal of unadded object to the fscache_object_list rb tree
When FS-Cache allocates an object, the following sequence of events can
occur:

 -->fscache_alloc_object()
    -->cachefiles_alloc_object() [via cache->ops->alloc_object]
    <--[returns new object]
    -->fscache_attach_object()
    <--[failed]
    -->cachefiles_put_object() [via cache->ops->put_object]
       -->fscache_object_destroy()
          -->fscache_objlist_remove()
             -->rb_erase() to remove the object from fscache_object_list.

resulting in a crash in the rbtree code.

The problem is that the object is only added to fscache_object_list on
the success path of fscache_attach_object() where it calls
fscache_objlist_add().

So if fscache_attach_object() fails, the object won't have been added to
the objlist rbtree.  We do, however, unconditionally try to remove the
object from the tree.

Thanks to NeilBrown for finding this and suggesting this solution.

Reported-by: NeilBrown <neilb@suse.de>
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: (a customer of) NeilBrown <neilb@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-02-17 13:47:35 -08:00
Dave Jones 416e2abd92 reiserfs: fix utterly brain-damaged indentation.
This has been this way for years, and every time I stumble across it I
lose my lunch.  After coming across it for the nth time in the Coverity
results, I had to overcome the bystander effect and do something about
it.

This ignores the 79 column limit in favor of making it look like C
instead of gibberish.

The correct thing to do here would be to lose some of the indentation by
breaking this function up into several smaller ones.  I might do that at
some point if I have the stomach to look at this again.

(Also some of those overlong ternary operations would likely be more
readable as regular if's)

Signed-off-by: Dave Jones <davej@fedoraproject.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-02-17 13:46:33 -08:00
Yan, Zheng 4d5f5df673 ceph: fix __dcache_readdir()
If directory is fragmented, readdir() read its dirfrags one by one.
After reading all dirfrags, the corresponding dentries are sorted in
(frag_t, off) order in the dcache. If dentries of a directory are all
cached, __dcache_readdir() can use the cached dentries to satisfy
readdir syscall. But when checking if a given dentry is after the
position of readdir, __dcache_readdir() compares numerical value of
frag_t directly. This is wrong, it should use ceph_frag_compare().

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
2014-02-17 12:37:13 -08:00
Sage Weil 45195e42c7 ceph: add acl, noacl options for cephfs mount
Make the 'acl' option dependent on having ACL support compiled in.  Make
the 'noacl' option work even without it so that one can always ask it to
be off and not error out on mount when it is not supported.

Signed-off-by: Guangliang Zhao <lucienchao@gmail.com>
Signed-off-by: Sage Weil <sage@inktank.com>
2014-02-17 12:37:12 -08:00
Guangliang Zhao c969d9bf91 ceph: make ceph_forget_all_cached_acls() static inline
Signed-off-by: Guangliang Zhao <lucienchao@gmail.com>
Reviewed-by: Alex Elder <elder@linaro.org>
Signed-off-by: Sage Weil <sage@inktank.com>
2014-02-17 12:37:12 -08:00
Yan, Zheng b20a95a0dd ceph: add missing init_acl() for mkdir() and atomic_open()
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
2014-02-17 12:37:11 -08:00
Yan, Zheng 7a92d64760 ceph: fix ceph_set_acl()
If acl is equivalent to file mode permission bits, ceph_set_acl()
needs to remove any existing acl xattr. Use __ceph_setxattr() to
handle both setting and removing acl xattr cases, it doesn't return
-ENODATA when there is no acl xattr.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
2014-02-17 12:37:11 -08:00
Yan, Zheng 524186ace6 ceph: fix ceph_removexattr()
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
2014-02-17 12:37:10 -08:00
Yan, Zheng bcdfeb2eb4 ceph: remove xattr when null value is given to setxattr()
For the setxattr request, introduce a new flag CEPH_XATTR_REMOVE
to distinguish null value case from the zero-length value case.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
2014-02-17 12:37:09 -08:00
Yan, Zheng fbc0b970dd ceph: properly handle XATTR_CREATE and XATTR_REPLACE
return -EEXIST if XATTR_CREATE is set and xattr alread exists.
return -ENODATA if XATTR_REPLACE is set but xattr does not exist.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
2014-02-17 12:37:05 -08:00
Trond Myklebust 292f503cad NFSv4: Use the correct net namespace in nfs4_update_server
We need to use the same net namespace that was used to resolve
the hostname and sockaddr arguments.

Fixes: 32e62b7c3e (NFS: Add nfs4_update_server)
Cc: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-02-17 14:15:46 -05:00
Theodore Ts'o 19ea806037 ext4: don't leave i_crtime.tv_sec uninitialized
If the i_crtime field is not present in the inode, don't leave the
field uninitialized.

Fixes: ef7f38359 ("ext4: Add nanosecond timestamps")
Reported-by: Vegard Nossum <vegard.nossum@oracle.com>
Tested-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@vger.kernel.org
2014-02-16 19:29:32 -05:00
Linus Torvalds 3962dfbe22 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs fixes from Chris Mason:
 "We have a small collection of fixes in my for-linus branch.

  The big thing that stands out is a revert of a new ioctl.  Users
  haven't shipped yet in btrfs-progs, and Dave Sterba found a better way
  to export the information"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  Btrfs: use right clone root offset for compressed extents
  btrfs: fix null pointer deference at btrfs_sysfs_add_one+0x105
  Btrfs: unset DCACHE_DISCONNECTED when mounting default subvol
  Btrfs: fix max_inline mount option
  Btrfs: fix a lockdep warning when cleaning up aborted transaction
  Revert "btrfs: add ioctl to export size of global metadata reservation"
2014-02-16 11:05:27 -08:00
Theodore Ts'o 3d2660d0c9 ext4: fix online resize with a non-standard blocks per group setting
The set_flexbg_block_bitmap() function assumed that the number of
blocks in a blockgroup was sb->blocksize * 8, which is normally true,
but not always!  Use EXT4_BLOCKS_PER_GROUP(sb) instead, to fix block
bitmap corruption after:

mke2fs -t ext4 -g 3072 -i 4096 /dev/vdd 1G
mount -t ext4 /dev/vdd /vdd
resize2fs /dev/vdd 8G

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reported-by: Jon Bernard <jbernard@tuxion.com>
Cc: stable@vger.kernel.org
2014-02-15 22:42:25 -05:00
Theodore Ts'o b93c953534 ext4: fix online resize with very large inode tables
If a file system has a large number of inodes per block group, all of
the metadata blocks in a flex_bg may be larger than what can fit in a
single block group.  Unfortunately, ext4_alloc_group_tables() in
resize.c was never tested to see if it would handle this case
correctly, and there were a large number of bugs which caused the
following sequence to result in a BUG_ON:

kernel bug at fs/ext4/resize.c:409!
   ...
call trace:
 [<ffffffff81256768>] ext4_flex_group_add+0x1448/0x1830
 [<ffffffff81257de2>] ext4_resize_fs+0x7b2/0xe80
 [<ffffffff8123ac50>] ext4_ioctl+0xbf0/0xf00
 [<ffffffff811c111d>] do_vfs_ioctl+0x2dd/0x4b0
 [<ffffffff811b9df2>] ? final_putname+0x22/0x50
 [<ffffffff811c1371>] sys_ioctl+0x81/0xa0
 [<ffffffff81676aa9>] system_call_fastpath+0x16/0x1b
code: c8 4c 89 df e8 41 96 f8 ff 44 89 e8 49 01 c4 44 29 6d d4 0
rip  [<ffffffff81254fa1>] set_flexbg_block_bitmap+0x171/0x180


This can be reproduced with the following command sequence:

   mke2fs -t ext4 -i 4096 /dev/vdd 1G
   mount -t ext4 /dev/vdd /vdd
   resize2fs /dev/vdd 8G

To fix this, we need to make sure the right thing happens when a block
group's inode table straddles two block groups, which means the
following bugs had to be fixed:

1) Not clearing the BLOCK_UNINIT flag in the second block group in
   ext4_alloc_group_tables --- the was proximate cause of the BUG_ON.

2) Incorrectly determining how many block groups contained contiguous
   free blocks in ext4_alloc_group_tables().

3) Incorrectly setting the start of the next block range to be marked
   in use after a discontinuity in setup_new_flex_group_blocks().

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@vger.kernel.org
2014-02-15 21:33:13 -05:00
Filipe David Borba Manana 93de4ba864 Btrfs: use right clone root offset for compressed extents
For non compressed extents, iterate_extent_inodes() gives us offsets
that take into account the data offset from the file extent items, while
for compressed extents it doesn't. Therefore we have to adjust them before
placing them in a send clone instruction. Not doing this adjustment leads to
the receiving end requesting for a wrong a file range to the clone ioctl,
which results in different file content from the one in the original send
root.

Issue reproducible with the following excerpt from the test I made for
xfstests:

  _scratch_mkfs
  _scratch_mount "-o compress-force=lzo"

  $XFS_IO_PROG -f -c "truncate 118811" $SCRATCH_MNT/foo
  $XFS_IO_PROG -c "pwrite -S 0x0d -b 39987 92267 39987" $SCRATCH_MNT/foo

  $BTRFS_UTIL_PROG subvolume snapshot -r $SCRATCH_MNT $SCRATCH_MNT/mysnap1

  $XFS_IO_PROG -c "pwrite -S 0x3e -b 80000 200000 80000" $SCRATCH_MNT/foo
  $BTRFS_UTIL_PROG filesystem sync $SCRATCH_MNT
  $XFS_IO_PROG -c "pwrite -S 0xdc -b 10000 250000 10000" $SCRATCH_MNT/foo
  $XFS_IO_PROG -c "pwrite -S 0xff -b 10000 300000 10000" $SCRATCH_MNT/foo

  # will be used for incremental send to be able to issue clone operations
  $BTRFS_UTIL_PROG subvolume snapshot -r $SCRATCH_MNT $SCRATCH_MNT/clones_snap

  $BTRFS_UTIL_PROG subvolume snapshot -r $SCRATCH_MNT $SCRATCH_MNT/mysnap2

  $FSSUM_PROG -A -f -w $tmp/1.fssum $SCRATCH_MNT/mysnap1
  $FSSUM_PROG -A -f -w $tmp/2.fssum -x $SCRATCH_MNT/mysnap2/mysnap1 \
      -x $SCRATCH_MNT/mysnap2/clones_snap $SCRATCH_MNT/mysnap2
  $FSSUM_PROG -A -f -w $tmp/clones.fssum $SCRATCH_MNT/clones_snap \
      -x $SCRATCH_MNT/clones_snap/mysnap1 -x $SCRATCH_MNT/clones_snap/mysnap2

  $BTRFS_UTIL_PROG send $SCRATCH_MNT/mysnap1 -f $tmp/1.snap
  $BTRFS_UTIL_PROG send $SCRATCH_MNT/clones_snap -f $tmp/clones.snap
  $BTRFS_UTIL_PROG send -p $SCRATCH_MNT/mysnap1 \
      -c $SCRATCH_MNT/clones_snap $SCRATCH_MNT/mysnap2 -f $tmp/2.snap

  _scratch_unmount
  _scratch_mkfs
  _scratch_mount

  $BTRFS_UTIL_PROG receive $SCRATCH_MNT -f $tmp/1.snap
  $FSSUM_PROG -r $tmp/1.fssum $SCRATCH_MNT/mysnap1 2>> $seqres.full

  $BTRFS_UTIL_PROG receive $SCRATCH_MNT -f $tmp/clones.snap
  $FSSUM_PROG -r $tmp/clones.fssum $SCRATCH_MNT/clones_snap 2>> $seqres.full

  $BTRFS_UTIL_PROG receive $SCRATCH_MNT -f $tmp/2.snap
  $FSSUM_PROG -r $tmp/2.fssum $SCRATCH_MNT/mysnap2 2>> $seqres.full

Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
Signed-off-by: Chris Mason <clm@fb.com>
2014-02-15 08:04:27 -08:00
Anand Jain f085381e6d btrfs: fix null pointer deference at btrfs_sysfs_add_one+0x105
bdev is null when disk has disappeared and mounted with
the degrade option

stack trace
---------
btrfs_sysfs_add_one+0x105/0x1c0 [btrfs]
open_ctree+0x15f3/0x1fe0 [btrfs]
btrfs_mount+0x5db/0x790 [btrfs]
? alloc_pages_current+0xa4/0x160
mount_fs+0x34/0x1b0
vfs_kern_mount+0x62/0xf0
do_mount+0x22e/0xa80
? __get_free_pages+0x9/0x40
? copy_mount_options+0x31/0x170
SyS_mount+0x7e/0xc0
system_call_fastpath+0x16/0x1b
---------

reproducer:
-------
mkfs.btrfs -draid1 -mraid1 /dev/sdc /dev/sdd
(detach a disk)
devmgt detach /dev/sdc [1]
mount -o degrade /dev/sdd /btrfs
-------

[1] github.com/anajain/devmgt.git

Signed-off-by: Anand Jain <Anand.Jain@oracle.com>
Tested-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>
2014-02-15 08:03:09 -08:00
Pavel Shilovsky 2365c4eaf0 CIFS: Fix too big maxBuf size for SMB3 mounts
SMB3 servers can respond with MaxTransactSize of more than 4M
that can cause a memory allocation error returned from kmalloc
in a lock codepath. Also the client doesn't support multicredit
requests now and allows buffer sizes of 65536 bytes only. Set
MaxTransactSize to this maximum supported value.

Cc: stable@vger.kernel.org # 3.7+
Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru>
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
2014-02-14 16:50:47 -06:00
Jeff Layton 5d81de8e86 cifs: ensure that uncached writes handle unmapped areas correctly
It's possible for userland to pass down an iovec via writev() that has a
bogus user pointer in it. If that happens and we're doing an uncached
write, then we can end up getting less bytes than we expect from the
call to iov_iter_copy_from_user. This is CVE-2014-0069

cifs_iovec_write isn't set up to handle that situation however. It'll
blindly keep chugging through the page array and not filling those pages
with anything useful. Worse yet, we'll later end up with a negative
number in wdata->tailsz, which will confuse the sending routines and
cause an oops at the very least.

Fix this by having the copy phase of cifs_iovec_write stop copying data
in this situation and send the last write as a short one. At the same
time, we want to avoid sending a zero-length write to the server, so
break out of the loop and set rc to -EFAULT if that happens. This also
allows us to handle the case where no address in the iovec is valid.

[Note: Marking this for stable on v3.4+ kernels, but kernels as old as
       v2.6.38 may have a similar problem and may need similar fix]

Cc: <stable@vger.kernel.org> # v3.4+
Reviewed-by: Pavel Shilovsky <piastry@etersoft.ru>
Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
2014-02-14 16:46:15 -06:00
Josef Bacik 3a0dfa6a12 Btrfs: unset DCACHE_DISCONNECTED when mounting default subvol
A user was running into errors from an NFS export of a subvolume that had a
default subvol set.  When we mount a default subvol we will use d_obtain_alias()
to find an existing dentry for the subvolume in the case that the root subvol
has already been mounted, or a dummy one is allocated in the case that the root
subvol has not already been mounted.  This allows us to connect the dentry later
on if we wander into the path.  However if we don't ever wander into the path we
will keep DCACHE_DISCONNECTED set for a long time, which angers NFS.  It doesn't
appear to cause any problems but it is annoying nonetheless, so simply unset
DCACHE_DISCONNECTED in the get_default_root case and switch btrfs_lookup() to
use d_materialise_unique() instead which will make everything play nicely
together and reconnect stuff if we wander into the defaul subvol path from a
different way.  With this patch I'm no longer getting the NFS errors when
exporting a volume that has been mounted with a default subvol set.  Thanks,

cc: bfields@fieldses.org
cc: ebiederm@xmission.com
Signed-off-by: Josef Bacik <jbacik@fb.com>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Chris Mason <clm@fb.com>
2014-02-14 13:44:32 -08:00
Mitch Harder feb5f96589 Btrfs: fix max_inline mount option
Currently, the only mount option for max_inline that has any effect is
max_inline=0.  Any other value that is supplied to max_inline will be
adjusted to a minimum of 4k.  Since max_inline has an effective maximum
of ~3900 bytes due to page size limitations, the current behaviour
only has meaning for max_inline=0.

This patch will allow the the max_inline mount option to accept non-zero
values as indicated in the documentation.

Signed-off-by: Mitch Harder <mitch.harder@sabayonlinux.org>
Signed-off-by: Chris Mason <clm@fb.com>
2014-02-14 13:44:32 -08:00
Liu Bo a9d2d4adb6 Btrfs: fix a lockdep warning when cleaning up aborted transaction
Given now we have 2 spinlock for management of delayed refs,
CONFIG_DEBUG_SPINLOCK=y helped me find this,

[ 4723.413809] BUG: spinlock wrong CPU on CPU#1, btrfs-transacti/2258
[ 4723.414882]  lock: 0xffff880048377670, .magic: dead4ead, .owner: btrfs-transacti/2258, .owner_cpu: 2
[ 4723.417146] CPU: 1 PID: 2258 Comm: btrfs-transacti Tainted: G        W  O 3.12.0+ #4
[ 4723.421321] Call Trace:
[ 4723.421872]  [<ffffffff81680fe7>] dump_stack+0x54/0x74
[ 4723.422753]  [<ffffffff81681093>] spin_dump+0x8c/0x91
[ 4723.424979]  [<ffffffff816810b9>] spin_bug+0x21/0x26
[ 4723.425846]  [<ffffffff81323956>] do_raw_spin_unlock+0x66/0x90
[ 4723.434424]  [<ffffffff81689bf7>] _raw_spin_unlock+0x27/0x40
[ 4723.438747]  [<ffffffffa015da9e>] btrfs_cleanup_one_transaction+0x35e/0x710 [btrfs]
[ 4723.443321]  [<ffffffffa015df54>] btrfs_cleanup_transaction+0x104/0x570 [btrfs]
[ 4723.444692]  [<ffffffff810c1b5d>] ? trace_hardirqs_on_caller+0xfd/0x1c0
[ 4723.450336]  [<ffffffff810c1c2d>] ? trace_hardirqs_on+0xd/0x10
[ 4723.451332]  [<ffffffffa015e5ee>] transaction_kthread+0x22e/0x270 [btrfs]
[ 4723.452543]  [<ffffffffa015e3c0>] ? btrfs_cleanup_transaction+0x570/0x570 [btrfs]
[ 4723.457833]  [<ffffffff81079efa>] kthread+0xea/0xf0
[ 4723.458990]  [<ffffffff81079e10>] ? kthread_create_on_node+0x140/0x140
[ 4723.460133]  [<ffffffff81692aac>] ret_from_fork+0x7c/0xb0
[ 4723.460865]  [<ffffffff81079e10>] ? kthread_create_on_node+0x140/0x140
[ 4723.496521] ------------[ cut here ]------------

----------------------------------------------------------------------

The reason is that we get to call cond_resched_lock(&head_ref->lock) while
still holding @delayed_refs->lock.

So it's different with __btrfs_run_delayed_refs(), where we do drop-acquire
dance before and after actually processing delayed refs.

Here we don't drop the lock, others are not able to add new delayed refs to
head_ref, so cond_resched_lock(&head_ref->lock) is not necessary here.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Chris Mason <clm@fb.com>
2014-02-14 13:44:32 -08:00
Chris Mason 11bcac89c0 Revert "btrfs: add ioctl to export size of global metadata reservation"
This reverts commit 01e219e806.

David Sterba found a different way to provide these features without adding a new
ioctl.  We haven't released any progs with this ioctl yet, so I'm taking this out
for now until we finalize things.

Signed-off-by: Chris Mason <clm@fb.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
CC: Jeff Mahoney <jeffm@suse.com>
2014-02-14 13:42:13 -08:00
Linus Torvalds 8ba74517e4 Merge branch 'for-3.14' of git://linux-nfs.org/~bfields/linux
Pull two nfsd bugfixes from Bruce Fields.

* 'for-3.14' of git://linux-nfs.org/~bfields/linux:
  lockd: send correct lock when granting a delayed lock.
  nfsd4: fix acl buffer overrun
2014-02-14 12:48:46 -08:00
Linus Torvalds 5e57dc8110 Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull block IO fixes from Jens Axboe:
 "Second round of updates and fixes for 3.14-rc2.  Most of this stuff
  has been queued up for a while.  The notable exception is the blk-mq
  changes, which are naturally a bit more in flux still.

  The pull request contains:

   - Two bug fixes for the new immutable vecs, causing crashes with raid
     or swap.  From Kent.

   - Various blk-mq tweaks and fixes from Christoph.  A fix for
     integrity bio's from Nic.

   - A few bcache fixes from Kent and Darrick Wong.

   - xen-blk{front,back} fixes from David Vrabel, Matt Rushton, Nicolas
     Swenson, and Roger Pau Monne.

   - Fix for a vec miscount with integrity vectors from Martin.

   - Minor annotations or fixes from Masanari Iida and Rashika Kheria.

   - Tweak to null_blk to do more normal FIFO processing of requests
     from Shlomo Pongratz.

   - Elevator switching bypass fix from Tejun.

   - Softlockup in blkdev_issue_discard() fix when !CONFIG_PREEMPT from
     me"

* 'for-linus' of git://git.kernel.dk/linux-block: (31 commits)
  block: add cond_resched() to potentially long running ioctl discard loop
  xen-blkback: init persistent_purge_work work_struct
  blk-mq: pair blk_mq_start_request / blk_mq_requeue_request
  blk-mq: dont assume rq->errors is set when returning an error from ->queue_rq
  block: Fix cloning of discard/write same bios
  block: Fix type mismatch in ssize_t_blk_mq_tag_sysfs_show
  blk-mq: rework flush sequencing logic
  null_blk: use blk_complete_request and blk_mq_complete_request
  virtio_blk: use blk_mq_complete_request
  blk-mq: rework I/O completions
  fs: Add prototype declaration to appropriate header file include/linux/bio.h
  fs: Mark function as static in fs/bio-integrity.c
  block/null_blk: Fix completion processing from LIFO to FIFO
  block: Explicitly handle discard/write same segments
  block: Fix nr_vecs for inline integrity vectors
  blk-mq: Add bio_integrity setup to blk_mq_make_request
  blk-mq: initialize sg_reserved_size
  blk-mq: handle dma_drain_size
  blk-mq: divert __blk_put_request for MQ ops
  blk-mq: support at_head inserations for blk_execute_rq
  ...
2014-02-14 10:45:18 -08:00
Dave Kleikamp 844fa1b5f8 jfs: set i_ctime when setting ACL
This fixes a regression in 3.14-rc1 where xfstests generic/307 fails.

jfs sets the ctime on the inode when writing an xattr. Previously,
jfs went ahead and stored an acl that can be completely represented
in the traditional permission bits, so the ctime was always set in
the xattr code. The new code doesn't bother storing the acl in that
case, thus the ctime isn't getting set.

Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Reported-by: Michael L. Semon <mlsemon35@gmail.com>
2014-02-13 15:56:05 -06:00
NeilBrown 2ec197db1a lockd: send correct lock when granting a delayed lock.
If an NFS client attempts to get a lock (using NLM) and the lock is
not available, the server will remember the request and when the lock
becomes available it will send a GRANT request to the client to
provide the lock.

If the client already held an adjacent lock, the GRANT callback will
report the union of the existing and new locks, which can confuse the
client.

This happens because __posix_lock_file (called by vfs_lock_file)
updates the passed-in file_lock structure when adjacent or
over-lapping locks are found.

To avoid this problem we take a copy of the two fields that can
be changed (fl_start and fl_end) before the call and restore them
afterwards.
An alternate would be to allocate a 'struct file_lock', initialise it,
use locks_copy_lock() to take a copy, then locks_release_private()
after the vfs_lock_file() call.  But that is a lot more work.

Reported-by: Olaf Kirch <okir@suse.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

--
v1 had a couple of issues (large on-stack struct and didn't really work properly).
This version is much better tested.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-02-13 14:55:02 -05:00
Theodore Ts'o 2330141097 ext4: don't try to modify s_flags if the the file system is read-only
If an ext4 file system is created by some tool other than mke2fs
(perhaps by someone who has a pathalogical fear of the GPL) that
doesn't set one or the other of the EXT2_FLAGS_{UN}SIGNED_HASH flags,
and that file system is then mounted read-only, don't try to modify
the s_flags field.  Otherwise, if dm_verity is in use, the superblock
will change, causing an dm_verity failure.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@vger.kernel.org
2014-02-12 12:16:04 -05:00
Zheng Liu 30d29b119e ext4: fix error paths in swap_inode_boot_loader()
In swap_inode_boot_loader() we forgot to release ->i_mutex and resume
unlocked dio for inode and inode_bl if there is an error starting the
journal handle.  This commit fixes this issue.

Reported-by: Ahmed Tamrawi <ahmedtamrawi@gmail.com>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: Dr. Tilmann Bubeck <t.bubeck@reinform.de>
Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@vger.kernel.org  # v3.10+
2014-02-12 11:48:31 -05:00
Eric Whitney 15cc176785 ext4: fix xfstest generic/299 block validity failures
Commit a115f749c1 (ext4: remove wait for unwritten extent conversion from
ext4_truncate) exposed a bug in ext4_ext_handle_uninitialized_extents().
It can be triggered by xfstest generic/299 when run on a test file
system created without a journal.  This test continuously fallocates and
truncates files to which random dio/aio writes are simultaneously
performed by a separate process.  The test completes successfully, but
if the test filesystem is mounted with the block_validity option, a
warning message stating that a logical block has been mapped to an
illegal physical block is posted in the kernel log.

The bug occurs when an extent is being converted to the written state
by ext4_end_io_dio() and ext4_ext_handle_uninitialized_extents()
discovers a mapping for an existing uninitialized extent. Although it
sets EXT4_MAP_MAPPED in map->m_flags, it fails to set map->m_pblk to
the discovered physical block number.  Because map->m_pblk is not
otherwise initialized or set by this function or its callers, its
uninitialized value is returned to ext4_map_blocks(), where it is
stored as a bogus mapping in the extent status tree.

Since map->m_pblk can accidentally contain illegal values that are
larger than the physical size of the file system,  calls to
check_block_validity() in ext4_map_blocks() that are enabled if the
block_validity mount option is used can fail, resulting in the logged
warning message.

Signed-off-by: Eric Whitney <enwlinux@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@vger.kernel.org  # 3.11+
2014-02-12 10:42:45 -05:00
J. Bruce Fields 09bdc2d70d nfsd4: fix acl buffer overrun
4ac7249ea5 "nfsd: use get_acl and
->set_acl" forgets to set the size in the case get_acl() succeeds, so
_posix_to_nfsv4_one() can then write past the end of its allocation.
Symptoms were slab corruption warnings.

Also, some minor cleanup while we're here.  (Among other things, note
that the first few lines guarantee that pacl is non-NULL.)

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-02-11 13:48:11 -05:00
Kent Overstreet 8423ae3d7a block: Fix cloning of discard/write same bios
Immutable biovecs changed the way bio segments are treated in such a way that
bio_for_each_segment() cannot now do what we want for discard/write same bios,
since bi_size means something completely different for them.

Fortunately discard and write same bios never have more than a single biovec, so
bio_for_each_segment() is unnecessary and not terribly meaningful for them, but
we still have to special case them in a few places.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Tested-by: Richard W.M. Jones <rjones@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2014-02-11 08:40:45 -07:00
Linus Torvalds 6792dfe383 Merge branch 'akpm' (patches from Andrew Morton)
Merge misc fixes from Andrew Morton:
 "A bunch of fixes"

* emailed patches fron Andrew Morton <akpm@linux-foundation.org>:
  ocfs2: check existence of old dentry in ocfs2_link()
  ocfs2: update inode size after zeroing the hole
  ocfs2: fix issue that ocfs2_setattr() does not deal with new_i_size==i_size
  mm/memory-failure.c: move refcount only in !MF_COUNT_INCREASED
  smp.h: fix x86+cpu.c sparse warnings about arch nonboot CPU calls
  mm: fix page leak at nfs_symlink()
  slub: do not assert not having lock in removing freed partial
  gitignore: add all.config
  ocfs2: fix ocfs2_sync_file() if filesystem is readonly
  drivers/edac/edac_mc_sysfs.c: poll timeout cannot be zero
  fs/file.c:fdtable: avoid triggering OOMs from alloc_fdmem
  xen: properly account for _PAGE_NUMA during xen pte translations
  mm/slub.c: list_lock may not be held in some circumstances
  drivers/md/bcache/extents.c: use %zi to format size_t
  vmcore: prevent PT_NOTE p_memsz overflow during header update
  drivers/message/i2o/i2o_config.c: fix deadlock in compat_ioctl(I2OGETIOPS)
  Documentation/: update 00-INDEX files
  checkpatch: fix detection of git repository
  get_maintainer: fix detection of git repository
  drivers/misc/sgi-gru/grukdump.c: unlocking should be conditional in gru_dump_context()
2014-02-10 16:03:16 -08:00
Xue jiufei 0e048316ff ocfs2: check existence of old dentry in ocfs2_link()
System call linkat first calls user_path_at(), check the existence of
old dentry, and then calls vfs_link()->ocfs2_link() to do the actual
work.  There may exist a race when Node A create a hard link for file
while node B rm it.

         Node A                          Node B
user_path_at()
  ->ocfs2_lookup(),
find old dentry exist
                                rm file, add inode say inodeA
                                to orphan_dir

call ocfs2_link(),create a
hard link for inodeA.

                                rm the link, add inodeA to orphan_dir
                                again

When orphan_scan work start, it calls ocfs2_queue_orphans() to do the
main work.  It first tranverses entrys in orphan_dir, linking all inodes
in this orphan_dir to a list look like this:

	inodeA->inodeB->...->inodeA

When tranvering this list, it will fall into loop, calling iput() again
and again.  And finally trigger BUG_ON(inode->i_state & I_CLEAR).

Signed-off-by: joyce <xuejiufei@huawei.com>
Reviewed-by: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-02-10 16:01:43 -08:00
Junxiao Bi c7d2cbc364 ocfs2: update inode size after zeroing the hole
fs-writeback will release the dirty pages without page lock whose offset
are over inode size, the release happens at
block_write_full_page_endio().  If not update, dirty pages in file holes
may be released before flushed to the disk, then file holes will contain
some non-zero data, this will cause sparse file md5sum error.

To reproduce the bug, find a big sparse file with many holes, like vm
image file, its actual size should be bigger than available mem size to
make writeback work more frequently, tar it with -S option, then keep
untar it and check its md5sum again and again until you get a wrong
md5sum.

Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Younger Liu <younger.liu@huawei.com>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-02-10 16:01:43 -08:00
Younger Liu d62e74be12 ocfs2: fix issue that ocfs2_setattr() does not deal with new_i_size==i_size
The issue scenario is as following:

- Create a small file and fallocate a large disk space for a file with
  FALLOC_FL_KEEP_SIZE option.

- ftruncate the file back to the original size again.  but the disk free
  space is not changed back.  This is a real bug that be fixed in this
  patch.

In order to solve the issue above, we modified ocfs2_setattr(), if
attr->ia_size != i_size_read(inode), It calls ocfs2_truncate_file(), and
truncate disk space to attr->ia_size.

Signed-off-by: Younger Liu <younger.liu@huawei.com>
Reviewed-by: Jie Liu <jeff.liu@oracle.com>
Tested-by: Jie Liu <jeff.liu@oracle.com>
Cc: Joel Becker <jlbec@evilplan.org>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Cc: Sunil Mushran <sunil.mushran@gmail.com>
Reviewed-by: Jensen <shencanquan@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-02-10 16:01:43 -08:00
Rafael Aquini a0b54adda3 mm: fix page leak at nfs_symlink()
Changes in commit a0b8cab3b9 ("mm: remove lru parameter from
__pagevec_lru_add and remove parts of pagevec API") have introduced a
call to add_to_page_cache_lru() which causes a leak in nfs_symlink() as
now the page gets an extra refcount that is not dropped.

Jan Stancek observed and reported the leak effect while running test8
from Connectathon Testsuite.  After several iterations over the test
case, which creates several symlinks on a NFS mountpoint, the test
system was quickly getting into an out-of-memory scenario.

This patch fixes the page leak by dropping that extra refcount
add_to_page_cache_lru() is grabbing.

Signed-off-by: Jan Stancek <jstancek@redhat.com>
Signed-off-by: Rafael Aquini <aquini@redhat.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Jeff Layton <jlayton@redhat.com>
Cc: Trond Myklebust <trond.myklebust@primarydata.com>
Cc: <stable@vger.kernel.org>	[3.11.x+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-02-10 16:01:42 -08:00
Younger Liu a987c7ca7f ocfs2: fix ocfs2_sync_file() if filesystem is readonly
If filesystem is readonly, there is no need to flush drive's caches or
force any uncommitted transactions.

[akpm@linux-foundation.org: return -EROFS, not 0]
Signed-off-by: Younger Liu <younger.liucn@gmail.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-02-10 16:01:42 -08:00
Eric W. Biederman 96c7a2ff21 fs/file.c:fdtable: avoid triggering OOMs from alloc_fdmem
Recently due to a spike in connections per second memcached on 3
separate boxes triggered the OOM killer from accept.  At the time the
OOM killer was triggered there was 4GB out of 36GB free in zone 1.  The
problem was that alloc_fdtable was allocating an order 3 page (32KiB) to
hold a bitmap, and there was sufficient fragmentation that the largest
page available was 8KiB.

I find the logic that PAGE_ALLOC_COSTLY_ORDER can't fail pretty dubious
but I do agree that order 3 allocations are very likely to succeed.

There are always pathologies where order > 0 allocations can fail when
there are copious amounts of free memory available.  Using the pigeon
hole principle it is easy to show that it requires 1 page more than 50%
of the pages being free to guarantee an order 1 (8KiB) allocation will
succeed, 1 page more than 75% of the pages being free to guarantee an
order 2 (16KiB) allocation will succeed and 1 page more than 87.5% of
the pages being free to guarantee an order 3 allocate will succeed.

A server churning memory with a lot of small requests and replies like
memcached is a common case that if anything can will skew the odds
against large pages being available.

Therefore let's not give external applications a practical way to kill
linux server applications, and specify __GFP_NORETRY to the kmalloc in
alloc_fdmem.  Unless I am misreading the code and by the time the code
reaches should_alloc_retry in __alloc_pages_slowpath (where
__GFP_NORETRY becomes signification).  We have already tried everything
reasonable to allocate a page and the only thing left to do is wait.  So
not waiting and falling back to vmalloc immediately seems like the
reasonable thing to do even if there wasn't a chance of triggering the
OOM killer.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Cong Wang <cwang@twopensource.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-02-10 16:01:41 -08:00
Greg Pearson 38dfac843c vmcore: prevent PT_NOTE p_memsz overflow during header update
Currently, update_note_header_size_elf64() and
update_note_header_size_elf32() will add the size of a PT_NOTE entry to
real_sz even if that causes real_sz to exceeds max_sz.  This patch
corrects the while loop logic in those routines to ensure that does not
happen and prints a warning if a PT_NOTE entry is dropped.  If zero
PT_NOTE entries are found or this condition is encountered because the
only entry was dropped, a warning is printed and an error is returned.

One possible negative side effect of exceeding the max_sz limit is an
allocation failure in merge_note_headers_elf64() or
merge_note_headers_elf32() which would produce console output such as
the following while booting the crash kernel.

  vmalloc: allocation failure: 14076997632 bytes
  swapper/0: page allocation failure: order:0, mode:0x80d2
  CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.10.0-gbp1 #7
  Call Trace:
    dump_stack+0x19/0x1b
    warn_alloc_failed+0xf0/0x160
    __vmalloc_node_range+0x19e/0x250
    vmalloc_user+0x4c/0x70
    merge_note_headers_elf64.constprop.9+0x116/0x24a
    vmcore_init+0x2d4/0x76c
    do_one_initcall+0xe2/0x190
    kernel_init_freeable+0x17c/0x207
    kernel_init+0xe/0x180
    ret_from_fork+0x7c/0xb0

  Kdump: vmcore not initialized

  kdump: dump target is /dev/sda4
  kdump: saving to /sysroot//var/crash/127.0.0.1-2014.01.28-13:58:52/
  kdump: saving vmcore-dmesg.txt
  Cannot open /proc/vmcore: No such file or directory
  kdump: saving vmcore-dmesg.txt failed
  kdump: saving vmcore
  kdump: saving vmcore failed

This type of failure has been seen on a four socket prototype system
with certain memory configurations.  Most PT_NOTE sections have a single
entry similar to:

  n_namesz = 0x5
  n_descsz = 0x150
  n_type   = 0x1

Occasionally, a second entry is encountered with very large n_namesz and
n_descsz sizes:

  n_namesz = 0x80000008
  n_descsz = 0x510ae163
  n_type   = 0x80000008

Not yet sure of the source of these extra entries, they seem bogus, but
they shouldn't cause crash dump to fail.

Signed-off-by: Greg Pearson <greg.pearson@hp.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
Cc: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-02-10 16:01:40 -08:00
Steve French 42eacf9e57 [CIFS] Fix cifsacl mounts over smb2 to not call cifs
When mounting with smb2/smb3 (e.g. vers=2.1) and cifsacl mount option,
it was trying to get the mode by querying the acl over the cifs
rather than smb2 protocol.  This patch makes that protocol
independent and makes cifsacl smb2 mounts return a more intuitive
operation not supported error (until we add a worker function
for smb2_get_acl).

Note that a previous patch fixed getxattr/setxattr for the CIFSACL xattr
which would unconditionally call cifs_get_acl and cifs_set_acl (even when
mounted smb2). I made those protocol independent last week (new protocol
version operations "get_acl" and "set_acl" but did not add an
smb2_get_acl and smb2_set_acl yet so those now simply return EOPNOTSUPP
which at least is better than sending cifs requests on smb2 mount)

The previous patches did not fix the one remaining case though ie
mounting with "cifsacl" when getting mode from acl would unconditionally
end up calling "cifs_get_acl_from_fid" even for smb2 - so made that protocol
independent but to make that protocol independent had to make sure that the callers
were passing the protocol independent handle structure (cifs_fid) instead
of cifs specific _u16 network file handle (ie cifs_fid instead of cifs_fid->fid)

Now mount with smb2 and cifsacl mount options will return EOPNOTSUP (instead
of timing out) and a future patch will add smb2 operations (e.g. get_smb2_acl)
to enable this.

Signed-off-by: Steve French <smfrench@gmail.com>
2014-02-10 14:08:16 -06:00
Linus Torvalds cbf2822a7d Merge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6
Pull CIFS fixes from Steve French:
 "Small fix from Jeff for writepages leak, and some fixes for ACLs and
  xattrs when SMB2 enabled.

  Am expecting another fix from Jeff and at least one more fix (for
  mounting SMB2 with cifsacl) in the next week"

* 'for-next' of git://git.samba.org/sfrench/cifs-2.6:
  [CIFS] clean up page array when uncached write send fails
  cifs: use a flexarray in cifs_writedata
  retrieving CIFS ACLs when mounted with SMB2 fails dropping session
  Add protocol specific operation for CIFS xattrs
2014-02-10 10:33:50 -08:00
Trond Myklebust fd1defc257 NFS: Do not set NFS_INO_INVALID_LABEL unless server supports labeled NFS
Commit aa9c266962 (NFS: Client implementation of Labeled-NFS) introduces
a performance regression. When nfs_zap_caches_locked is called, it sets
the NFS_INO_INVALID_LABEL flag irrespectively of whether or not the
NFS server supports security labels. Since that flag is never cleared,
it means that all calls to nfs_revalidate_inode() will now trigger
an on-the-wire GETATTR call.

This patch ensures that we never set the NFS_INO_INVALID_LABEL unless the
server advertises support for labeled NFS.
It also causes nfs_setsecurity() to clear NFS_INO_INVALID_LABEL when it
has successfully set the security label for the inode.
Finally it gets rid of the NFS_INO_INVALID_LABEL cruft from nfs_update_inode,
which has nothing to do with labeled NFS.

Reported-by: Neil Brown <neilb@suse.de>
Cc: stable@vger.kernel.org # 3.11+
Tested-by: Neil Brown <neilb@suse.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-02-10 08:44:12 -05:00
Linus Torvalds f94aa7c7f1 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs fixes from Al Viro:
 "A couple of fixes, both -stable fodder.  The O_SYNC bug is fairly
  old..."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  fix a kmap leak in virtio_console
  fix O_SYNC|O_APPEND syncing the wrong range on write()
2014-02-09 18:12:07 -08:00
Dave Chinner 3895e51f6d xfs: ensure correct log item buffer alignment
On 32 bit platforms, the log item vector headers are not 64 bit
aligned or sized. hence if we don't take care to align them
correctly or pad the buffer appropriately for 8 byte alignment, we
can end up with alignment issues when accessing the user buffer
directly as a structure.

To solve this, simply pad the buffer headers to 64 bit offset so
that the data section is always 8 byte aligned.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reported-by: Michael L. Semon <mlsemon35@gmail.com>
Tested-by: Michael L. Semon <mlsemon35@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-02-10 10:37:18 +11:00
Christoph Hellwig fe60a8a091 xfs: ensure correct timestamp updates from truncate
The VFS doesn't set the proper ATTR_CTIME and ATTR_MTIME values for
truncate, so filesystems have to manually add them.  The
introduction of xfs_setattr_time accidentally broke this special
case an caused a regression in generic/313.  Fix this by removing
the local mask variable in xfs_setattr_size so that we only have a
single place to keep the attribute information.

cc: <stable@vger.kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Jie Liu <jeff.liu@oracle.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-02-10 10:35:22 +11:00
Rashika Kheria 0b4ef8de09 fs: Mark function as static in fs/bio-integrity.c
Mark functions as static in bio-integrity.c because it is not used
outside this file.

This eliminates the following warnings in bio-integrity.c:
fs/bio-integrity.c:224:5: warning: no previous prototype for ‘bio_integrity_tag’ [-Wmissing-prototypes]

Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Jens Axboe <axboe@fb.com>
2014-02-09 13:56:23 -07:00
Al Viro d311d79de3 fix O_SYNC|O_APPEND syncing the wrong range on write()
It actually goes back to 2004 ([PATCH] Concurrent O_SYNC write support)
when sync_page_range() had been introduced; generic_file_write{,v}() correctly
synced
	pos_after_write - written .. pos_after_write - 1
but generic_file_aio_write() synced
	pos_before_write .. pos_before_write + written - 1
instead.  Which is not the same thing with O_APPEND, obviously.
A couple of years later correct variant had been killed off when
everything switched to use of generic_file_aio_write().

All users of generic_file_aio_write() are affected, and the same bug
has been copied into other instances of ->aio_write().

The fix is trivial; the only subtle point is that generic_write_sync()
ought to be inlined to avoid calculations useless for the majority of
calls.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-02-09 15:18:09 -05:00
Linus Torvalds 9c1db77981 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs fixes from Chris Mason:
 "This is a small collection of fixes"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  Btrfs: fix data corruption when reading/updating compressed extents
  Btrfs: don't loop forever if we can't run because of the tree mod log
  btrfs: reserve no transaction units in btrfs_ioctl_set_features
  btrfs: commit transaction after setting label and features
  Btrfs: fix assert screwup for the pending move stuff
2014-02-09 11:12:26 -08:00
Filipe David Borba Manana a2aa75e18a Btrfs: fix data corruption when reading/updating compressed extents
When using a mix of compressed file extents and prealloc extents, it
is possible to fill a page of a file with random, garbage data from
some unrelated previous use of the page, instead of a sequence of zeroes.

A simple sequence of steps to get into such case, taken from the test
case I made for xfstests, is:

   _scratch_mkfs
   _scratch_mount "-o compress-force=lzo"
   $XFS_IO_PROG -f -c "pwrite -S 0x06 -b 18670 266978 18670" $SCRATCH_MNT/foobar
   $XFS_IO_PROG -c "falloc 26450 665194" $SCRATCH_MNT/foobar
   $XFS_IO_PROG -c "truncate 542872" $SCRATCH_MNT/foobar
   $XFS_IO_PROG -c "fsync" $SCRATCH_MNT/foobar

This results in the following file items in the fs tree:

   item 4 key (257 INODE_ITEM 0) itemoff 15879 itemsize 160
       inode generation 6 transid 6 size 542872 block group 0 mode 100600
   item 5 key (257 INODE_REF 256) itemoff 15863 itemsize 16
       inode ref index 2 namelen 6 name: foobar
   item 6 key (257 EXTENT_DATA 0) itemoff 15810 itemsize 53
       extent data disk byte 0 nr 0 gen 6
       extent data offset 0 nr 24576 ram 266240
       extent compression 0
   item 7 key (257 EXTENT_DATA 24576) itemoff 15757 itemsize 53
       prealloc data disk byte 12849152 nr 241664 gen 6
       prealloc data offset 0 nr 241664
   item 8 key (257 EXTENT_DATA 266240) itemoff 15704 itemsize 53
       extent data disk byte 12845056 nr 4096 gen 6
       extent data offset 0 nr 20480 ram 20480
       extent compression 2
   item 9 key (257 EXTENT_DATA 286720) itemoff 15651 itemsize 53
       prealloc data disk byte 13090816 nr 405504 gen 6
       prealloc data offset 0 nr 258048

The on disk extent at offset 266240 (which corresponds to 1 single disk block),
contains 5 compressed chunks of file data. Each of the first 4 compress 4096
bytes of file data, while the last one only compresses 3024 bytes of file data.
Therefore a read into the file region [285648 ; 286720[ (length = 4096 - 3024 =
1072 bytes) should always return zeroes (our next extent is a prealloc one).

The solution here is the compression code path to zero the remaining (untouched)
bytes of the last page it uncompressed data into, as the information about how
much space the file data consumes in the last page is not known in the upper layer
fs/btrfs/extent_io.c:__do_readpage(). In __do_readpage we were correctly zeroing
the remainder of the page but only if it corresponds to the last page of the inode
and if the inode's size is not a multiple of the page size.

This would cause not only returning random data on reads, but also permanently
storing random data when updating parts of the region that should be zeroed.
For the example above, it means updating a single byte in the region [285648 ; 286720[
would store that byte correctly but also store random data on disk.

A test case for xfstests follows soon.

Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
Signed-off-by: Chris Mason <clm@fb.com>
2014-02-08 17:57:15 -08:00