Commit Graph

42859 Commits

Author SHA1 Message Date
Chao Yu
e84587250a f2fs: check node id earily when readaheading node page
Add node id check in ra_node_page and get_node_page_ra like get_node_page.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-01-08 11:45:16 -08:00
Jeff Layton
b4d629a39e locks: rename __posix_lock_file to posix_lock_inode
...a more descriptive name and we can drop the double underscore prefix.

Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Acked-by: "J. Bruce Fields" <bfields@fieldses.org>
2016-01-08 11:38:30 -05:00
Jeff Layton
e24dadab08 locks: prink more detail when there are leaked locks
Right now, we just get WARN_ON_ONCE, which is not particularly helpful.
Have it dump some info about the locks and the inode to make it easier
to track down leaked locks in the future.

Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Acked-by: "J. Bruce Fields" <bfields@fieldses.org>
2016-01-08 11:38:25 -05:00
Jeff Layton
f27a0fe083 locks: pass inode pointer to locks_free_lock_context
...so we can print information about it if there are leaked locks.

Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Acked-by: "J. Bruce Fields" <bfields@fieldses.org>
2016-01-08 11:38:19 -05:00
Jeff Layton
1890910fd0 locks: sprinkle some tracepoints around the file locking code
Add some tracepoints around the POSIX locking code. These were useful
when tracking down problems when handling the race between setlk and
close.

Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Acked-by: "J. Bruce Fields" <bfields@fieldses.org>
2016-01-08 11:38:13 -05:00
Jeff Layton
0752ba807b locks: don't check for race with close when setting OFD lock
We don't clean out OFD locks on close(), so there's no need to check
for a race with them here. They'll get cleaned out at the same time
that flock locks are.

Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Acked-by: "J. Bruce Fields" <bfields@fieldses.org>
2016-01-08 11:38:07 -05:00
Jeff Layton
7f3697e24d locks: fix unlock when fcntl_setlk races with a close
Dmitry reported that he was able to reproduce the WARN_ON_ONCE that
fires in locks_free_lock_context when the flc_posix list isn't empty.

The problem turns out to be that we're basically rebuilding the
file_lock from scratch in fcntl_setlk when we discover that the setlk
has raced with a close. If the l_whence field is SEEK_CUR or SEEK_END,
then we may end up with fl_start and fl_end values that differ from
when the lock was initially set, if the file position or length of the
file has changed in the interim.

Fix this by just reusing the same lock request structure, and simply
override fl_type value with F_UNLCK as appropriate. That ensures that
we really are unlocking the lock that was initially set.

While we're there, make sure that we do pop a WARN_ON_ONCE if the
removal ever fails. Also return -EBADF in this event, since that's
what we would have returned if the close had happened earlier.

Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: <stable@vger.kernel.org>
Fixes: c293621bbf (stale POSIX lock handling)
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Acked-by: "J. Bruce Fields" <bfields@fieldses.org>
2016-01-07 20:32:48 -05:00
Dave Chinner
e35438196c xfs: bmapbt checking on debug kernels too expensive
For large sparse or fragmented files, checking every single entry in
the bmapbt on every operation is prohibitively expensive. Especially
as such checks rarely discover problems during normal operations on
high extent coutn files. Our regression tests don't tend to exercise
files with hundreds of thousands to millions of extents, so mostly
this isn't noticed.

However, trying to run things like xfs_mdrestore of large filesystem
dumps on a debug kernel quickly becomes impossible as the CPU is
completely burnt up repeatedly walking the sparse file bmapbt that
is generated for every allocation that is made.

Hence, if the file has more than 10,000 extents, just don't bother
with walking the tree to check it exhaustively. The btree code has
checks that ensure that the newly inserted/removed/modified record
is correctly ordered, so the entrie tree walk in thses cases has
limited additional value.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-01-08 11:28:49 +11:00
Dave Chinner
121e213eab xfs: add tracepoints to readpage calls
This allows us to see page cache driven readahead in action as it
passes through XFS. This helps to understand buffered read
throughput problems such as readahead IO IO sizes being too small
for the underlying device to reach max throughput.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-01-08 11:28:35 +11:00
Fan Li
de1475cc53 f2fs: read isize while holding i_mutex in fiemap
make sure the isize we read doesn't change during the process.

Signed-off-by: Fan li <fanofcode.li@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-01-06 19:15:49 -08:00
Jaegeuk Kim
957efb0c21 Revert "f2fs: check the node block address of newly allocated nid"
Original issue is fixed by:

  f2fs: cover more area with nat_tree_lock

This reverts commit 24928634f8.
2016-01-06 19:15:49 -08:00
Jaegeuk Kim
a51311938e f2fs: cover more area with nat_tree_lock
There was a subtle bug on nat cache management which incurs wrong nid allocation
or wrong block addresses when try_to_free_nats is triggered heavily.
This patch enlarges the previous coverage of nat_tree_lock to avoid data race.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-01-06 19:15:48 -08:00
Geliang Tang
43584c1d42 jffs2: use to_delayed_work
Use to_delayed_work() instead of open-coding it.

Signed-off-by: Geliang Tang <geliangtang@163.com>
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
2016-01-06 15:16:46 -08:00
Dmitry Monakhov
a1c6f05733 fs: use block_device name vsprintf helper
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-01-06 13:03:18 -05:00
Dmitry Monakhov
424081f3c8 fs: use gendisk->disk_name where possible
gendisk with part==0 is obviously gendisk->disk_name.

Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-01-06 12:42:11 -05:00
Mateusz Guzik
ccec5ee302 poll: plug an unused argument to do_poll
Number of fds is already known based on passed list.

No functional changes.

Signed-off-by: Mateusz Guzik <mguzik@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-01-06 08:26:52 -05:00
Dave Chinner
4922be51ef Merge branch 'xfs-dax-fixes-for-4.5' into for-next 2016-01-05 08:08:47 +11:00
Dave Chinner
7eeabbd4b6 Merge branch 'xfs-misc-fixes-for-4.5' into for-next 2016-01-05 08:08:35 +11:00
Brian Foster
609adfc2ed xfs: debug mode log record crc error injection
XFS now uses CRC verification over a limited section of the log to
detect torn writes prior to a crash. This is difficult to test directly
due to the timing and hardware requirements to cause a short write.

Add a mechanism to inject CRC errors into log records to facilitate
testing torn write detection during log recovery. This mechanism is
dangerous and can result in filesystem corruption. Thus, it is only
available in DEBUG mode for testing/development purposes. Set a non-zero
value to the following sysfs entry to enable error injection:

	/sys/fs/xfs/<dev>/log/log_badcrc_factor

Once enabled, XFS intentionally writes an invalid CRC to a log record at
some random point in the future based on the provided frequency. The
filesystem immediately shuts down once the record has been written to
the physical log to prevent metadata writeback (e.g., AIL insertion)
once the log write completes. This helps reasonably simulate a torn
write to the log as the affected record must be safe to discard. The
next mount after the intentional shutdown requires log recovery and
should detect and recover from the torn write.

Note again that this _will_ result in data loss or worse. For testing
and development purposes only!

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-01-05 07:41:16 +11:00
Brian Foster
7088c4136f xfs: detect and trim torn writes during log recovery
Certain types of storage, such as persistent memory, do not provide
sector atomicity for writes. This means that if a crash occurs while XFS
is writing log records, only part of those records might make it to the
storage. This is problematic because log recovery uses the cycle value
packed at the top of each log block to locate the head/tail of the log.
This can lead to CRC verification failures during log recovery and an
unmountable fs for a filesystem that is otherwise consistent.

Update log recovery to incorporate log record CRC verification as part
of the head/tail discovery process. Once the head is located via the
traditional algorithm, run a CRC-only pass over the records up to the
head of the log. If CRC verification fails, assume that the records are
torn as a matter of policy and trim the head block back to the start of
the first bad record.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-01-05 07:40:16 +11:00
Al Viro
80f8dccf95 HFS wants 8Kb per-superblock allocation; just use kmalloc()
... rather than play with __get_free_pages() (and figuring out the
allocation order, etc.)

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-01-04 10:29:34 -05:00
Al Viro
76e8d7cb71 jfs: microoptimize get_zeroed_page / virt_to_page
get_zeroed_page does alloc_page and returns page_address of the result;
subsequent virt_to_page will recover the page, but since the caller
needs both page and its page_address() anyway, why bother going through
that wrapper at all?

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-01-04 10:29:28 -05:00
Al Viro
4e728cf8ff hpfs: missing endianness annotation
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-01-04 10:29:03 -05:00
Al Viro
62fb4a155f don't carry MAY_OPEN in op->acc_mode
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-01-04 10:28:40 -05:00
Al Viro
b40ef8696f saner calling conventions for copy_mount_options()
let it just return NULL, pointer to kernel copy or ERR_PTR().

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-01-04 10:28:32 -05:00
Al Viro
bb646cdb12 proc_pid_attr_write(): switch to memdup_user()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-01-04 10:28:00 -05:00
Al Viro
16e5c1fc36 convert a bunch of open-coded instances of memdup_user_nul()
A _lot_ of ->write() instances were open-coding it; some are
converted to memdup_user_nul(), a lot more remain...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-01-04 10:26:58 -05:00
Al Viro
7e935c7ca1 Merge branch 'memdup_user_nul' into work.misc 2016-01-04 10:25:34 -05:00
Pantelis Antoniou
03607ace80 configfs: implement binary attributes
ConfigFS lacked binary attributes up until now. This patch
introduces support for binary attributes in a somewhat similar
manner of sysfs binary attributes albeit with changes that
fit the configfs usage model.

Problems that configfs binary attributes fix are everything that
requires a binary blob as part of the configuration of a resource,
such as bitstream loading for FPGAs, DTBs for dynamically created
devices etc.

Look at Documentation/filesystems/configfs/configfs.txt for internals
and howto use them.

This patch is against linux-next as of today that contains
Christoph's configfs rework.

Signed-off-by: Pantelis Antoniou <pantelis.antoniou@konsulko.com>
[hch: folded a fix from Geert Uytterhoeven <geert+renesas@glider.be>]
[hch: a few tiny updates based on review feedback]
Signed-off-by: Christoph Hellwig <hch@lst.de>
2016-01-04 12:31:46 +01:00
Chao Yu
e0afc4d6d0 f2fs: introduce max_file_blocks in sbi
Introduce max_file_blocks in sbi to store max block index of file in f2fs,
it could be used to avoid unneeded calculation of max block index in
runtime.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
[Jaegeuk Kim: fix overflow of sbi->max_file_blocks]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-01-03 21:40:04 -08:00
Dave Chinner
a6d7636e8d xfs: fix recursive splice read locking with DAX
Doing a splice read (generic/249) generates a lockdep splat because
we recursively lock the inode iolock in this path:

SyS_sendfile64
do_sendfile
do_splice_direct
splice_direct_to_actor
do_splice_to
xfs_file_splice_read			<<<<<< lock here
default_file_splice_read
vfs_readv
do_readv_writev
do_iter_readv_writev
xfs_file_read_iter			<<<<<< then here

The issue here is that for DAX inodes we need to avoid the page
cache path and hence simply push it into the normal read path.
Unfortunately, we can't tell down at xfs_file_read_iter() whether we
are being called from the splice path and hence we cannot avoid the
locking at this layer. Hence we simply have to drop the inode
locking at the higher splice layer for DAX.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Tested-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-01-04 16:28:25 +11:00
Dave Chinner
3b0fe47805 xfs: Don't use reserved blocks for data blocks with DAX
Commit 1ca1915 ("xfs: Don't use unwritten extents for DAX") enabled
the DAX allocation call to dip into the reserve pool in case it was
converting unwritten extents rather than allocating blocks. This was
a direct copy of the unwritten extent conversion code, but had an
unintended side effect of allowing normal data block allocation to
use the reserve pool. Hence normal block allocation could deplete
the reserve pool and prevent unwritten extent conversion at ENOSPC,
hence violating fallocate guarantees on preallocated space.

Fix it by checking whether the incoming map from __xfs_get_blocks()
spans an unwritten extent and only use the reserve pool if the
allocation covers an unwritten extent.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Tested-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-01-04 16:22:45 +11:00
Markus Elfring
a841b64df2 XFS: Use a signed return type for suffix_kstrtoint()
The return type "unsigned long" was used by the suffix_kstrtoint()
function even though it will eventually return a negative error code.
Improve this implementation detail by using the type "int" instead.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-01-04 16:13:21 +11:00
Darrick J. Wong
c5ab131ba0 libxfs: refactor short btree block verification
Create xfs_btree_sblock_verify() to verify short-format btree blocks
(i.e. the per-AG btrees with 32-bit block pointers) instead of
open-coding them.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-01-04 16:13:21 +11:00
Darrick J. Wong
96f859d52b libxfs: pack the agfl header structure so XFS_AGFL_SIZE is correct
Because struct xfs_agfl is 36 bytes long and has a 64-bit integer
inside it, gcc will quietly round the structure size up to the nearest
64 bits -- in this case, 40 bytes.  This results in the XFS_AGFL_SIZE
macro returning incorrect results for v5 filesystems on 64-bit
machines (118 items instead of 119).  As a result, a 32-bit xfs_repair
will see garbage in AGFL item 119 and complain.

Therefore, tell gcc not to pad the structure so that the AGFL size
calculation is correct.

cc: <stable@vger.kernel.org> # 3.10 - 4.4
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-01-04 16:13:21 +11:00
Darrick J. Wong
6d3eb1eca0 libxfs: use a convenience variable instead of open-coding the fork
Use a convenience variable instead of open-coding the inode fork.
This isn't really needed for now, but will become important when we
add the copy-on-write fork later.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-01-04 16:12:42 +11:00
Darrick J. Wong
9b434a347c xfs: fix log ticket type printing
Update the log ticket reservation type printing code to reflect
all the types of log tickets, to avoid incorrect debug output and
avoid running off the end of the array.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-01-04 16:11:42 +11:00
Darrick J. Wong
2e9101da60 libxfs: make xfs_alloc_fix_freelist non-static
Since xfs_repair wants to use xfs_alloc_fix_freelist, remove the
static designation.  xfsprogs already has this; this simply brings
the kernel up to date.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-01-04 16:10:42 +11:00
Alexander Kuleshov
211fe1a4db xfs: make xfs_buf_ioend_async() static
There are no callers of the xfs_buf_ioend_async() function outside
of the fs/xfs/xfs_buf.c. So, let's make it static.

Signed-off-by: Alexander Kuleshov <kuleshovmail@gmail.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-01-04 16:10:42 +11:00
Masatake YAMATO
ffc671f1ea xfs: send warning of project quota to userspace via netlink
Linux's quota subsystem has an ability to handle project quota. This
commit just utilizes the ability from xfs side.  dbus-monitor and
quota_nld shipped as part of quota-tools can be used for testing.
See the patch posting on the XFS list for details on testing.

Signed-off-by: Masatake YAMATO <yamato@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-01-04 16:10:42 +11:00
Eric Sandeen
f1f96c4946 xfs: get mp from bma->ip in xfs_bmap code
In my earlier commit

  c29aad4 xfs: pass mp to XFS_WANT_CORRUPTED_GOTO

I added some local mp variables with code which indicates that
mp might be NULL.  Coverity doesn't like this now, because the
updated per-fs XFS_STATS macros dereference mp.

I don't think this is actually a problem; from what I can tell,
we cannot get to these functions with a null bma->tp, so my NULL
check was probably pointless.  Still, it's not super obvious.

So switch this code to get mp from the inode on the xfs_bmalloca
structure, with no conditional, because the functions are already
using bmap->ip directly.

Addresses-Coverity-Id: 1339552
Addresses-Coverity-Id: 1339553
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-01-04 16:10:42 +11:00
Eric Sandeen
233135b763 xfs: print name of verifier if it fails
This adds a name to each buf_ops structure, so that if
a verifier fails we can print the type of verifier that
failed it.  Should be a slight debugging aid, I hope.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-01-04 16:10:19 +11:00
Jia He
1d4292bfdc libxfs: Optimize the loop for xfs_bitmap_empty
If there is any non zero bit in a long bitmap, it can jump out of the
loop and finish the function as soon as possible.

Signed-off-by: Jia He <hejianet@gmail.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-01-04 16:10:19 +11:00
Brian Foster
eed6b462fb xfs: refactor log record start detection into a new helper
As part of the head/tail discovery process, log recovery locates the
head block and then reverse seeks to find the start of the last active
record in the log. This is non-trivial as the record itself could have
wrapped around the end of the physical log. Log recovery torn write
detection potentially needs to walk further behind the last record in
the log, as multiple log I/Os can be in-flight at one time during a
crash event.

Therefore, refactor the reverse log record header search mechanism into
a new helper that supports the ability to seek past an arbitrary number
of log records (or until the tail is hit). Update the head/tail search
mechanism to call the new helper, but otherwise there is no change in
log recovery behavior.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-01-04 15:55:10 +11:00
Brian Foster
6528250b71 xfs: support a crc verification only log record pass
Log recovery torn write detection uses CRC verification over a range of
the active log to identify torn writes. Since the generic log recovery
pass code implements a superset of the functionality required for CRC
verification, it can be easily modified to support a CRC verification
only pass.

Create a new CRC pass type and update the log record processing helper
to skip everything beyond CRC verification when in this mode. This pass
will be invoked in subsequent patches to implement torn write detection.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-01-04 15:55:10 +11:00
Brian Foster
d7f37692e3 xfs: return start block of first bad log record during recovery
Each log recovery pass walks from the tail block to the head block and
processes records appropriately based on the associated log pass type.
There are various failure conditions that can occur through this
sequence, such as I/O errors, CRC errors, etc. Log torn write detection
will perform CRC verification near the head of the log to detect torn
writes and trim torn records from the log appropriately.

As it is, xlog_do_recovery_pass() only returns an error code in the
event of CRC failure, which isn't enough information to trim the head of
the log. Update xlog_do_recovery_pass() to optionally return the start
block of the associated record when an error occurs. This patch contains
no functional changes.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-01-04 15:55:10 +11:00
Brian Foster
b94fb2d178 xfs: refactor and open code log record crc check
Log record CRC verification currently occurs during active log recovery,
immediately before a log record is unpacked. Therefore, the CRC
calculation code is buried within the data unpack function. CRC
verification pass support only needs to go so far as check the CRC, but
this is not easily allowed as the code is currently organized.

Since we now have a new log record processing helper, pull the record
CRC verification code out from the unpack helper and open-code it at the
top of the new process helper. This facilitates the ability to modify
how records are processed based on the type of the current pass. This
patch contains no functional changes.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-01-04 15:55:10 +11:00
Brian Foster
9d94901f6e xfs: refactor log record unpack and data processing
xlog_do_recovery_pass() duplicates a couple function calls related to
processing log records because the function must handle wrapping around
the end of the log if the head is behind the tail. This is implemented
as separate loops. CRC verification pass support will modify how records
are processed in both of these loops.

Rather than continue to duplicate code, factor the calls that process a
log record into a new helper and call that helper from both loops. This
patch contains no functional changes.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-01-04 15:55:10 +11:00
Brian Foster
a70f9fe52d xfs: detect and handle invalid iclog size set by mkfs
XFS log records have separate fields for the record size and the iclog
size used to write the record. mkfs.xfs zeroes the log and writes an
unmount record to generate a clean log for the subsequent mount. The
userspace record logging code has a bug where the iclog size (h_size)
field of the log record is hardcoded to 32k, even if a log stripe unit
is specified. The log record length is correctly extended to the stripe
unit. Since the kernel log recovery code uses the h_size field to
determine the log buffer size, this means that the kernel can attempt to
read/process records larger than the buffer size and overrun the buffer.

This has historically not been a problem because the kernel doesn't
actually run through log recovery in the clean unmount case. Instead,
the kernel detects that a single unmount record exists between the head
and tail and pushes the tail forward such that the log is viewed as
clean (head == tail). Once CRC verification is enabled, however, all
records at the head of the log are verified for CRC errors and thus we
are susceptible to overrun problems if the iclog field is not correct.

While the core problem must be fixed in userspace, this is historical
behavior that must be detected in the kernel to avoid severe side
effects such as memory corruption and crashes. Update the log buffer
size calculation code to detect this condition, warn the user and resize
the log buffer based on the log stripe unit. Return a corruption error
in cases where this does not look like a clean filesystem (i.e., the log
record header indicates more than one operation).

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-01-04 15:55:10 +11:00
Darrick J. Wong
2b3909f8a7 btrfs: use new dedupe data function pointer
Now that the VFS encapsulates the dedupe ioctl, wire up btrfs to it.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-01-01 02:36:40 -05:00
Darrick J. Wong
54dbc15172 vfs: hoist the btrfs deduplication ioctl to the vfs
Hoist the btrfs EXTENT_SAME ioctl up to the VFS and make the name
more systematic (FIDEDUPERANGE).

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-01-01 02:36:19 -05:00
Darrick J. Wong
d79bdd52d8 vfs: wire up compat ioctl for CLONE/CLONE_RANGE
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-01-01 02:36:02 -05:00
Chao Yu
3a9e6433a3 f2fs crypto: check CONFIG_F2FS_FS_XATTR for encrypted symlink
Add missed CONFIG_F2FS_FS_XATTR for encrypted symlink inode in order
to avoid unneeded registry of ->{get,set,remove}xattr.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-31 18:54:50 -08:00
Jaegeuk Kim
137d09f002 f2fs: introduce zombie list for fast shrinking extent trees
This patch removes refcount, and instead, adds zombie_list to shrink directly
without radix tree traverse.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-31 15:39:22 -08:00
Jaegeuk Kim
c00ba55485 f2fs: monitor zombie_tree count
This patch adds an entry to show the number of zombie extent_tree.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-31 15:33:00 -08:00
David S. Miller
c07f30ad68 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2015-12-31 18:20:10 -05:00
Jaegeuk Kim
c46a155bdf f2fs: use IPU for fdatasync
This patch fixes missing IPU condition when fdatasync is called.
With this patch, fdatasync is able to avoid additional node writes for recovery.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-31 13:49:17 -08:00
Jaegeuk Kim
8d4ea29b64 f2fs: write pending bios when cp_error is set
When testing ioc_shutdown, put_super is able to be hanged by waiting for
writebacking pages as follows.

INFO: task umount:2723 blocked for more than 120 seconds.
      Tainted: G           O    4.4.0-rc3+ #8
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
umount          D ffff88000859f9d8     0  2723   2110 0x00000000
 ffff88000859f9d8 0000000000000000 0000000000000000 ffffffff81e11540
 ffff880078c225c0 ffff8800085a0000 ffff88007fc17440 7fffffffffffffff
 ffffffff818239f0 ffff88000859fb48 ffff88000859f9f0 ffffffff8182310c
Call Trace:
 [<ffffffff818239f0>] ? bit_wait+0x50/0x50
 [<ffffffff8182310c>] schedule+0x3c/0x90
 [<ffffffff81827fb9>] schedule_timeout+0x2d9/0x430
 [<ffffffff810e0f8f>] ? mark_held_locks+0x6f/0xa0
 [<ffffffff8111614d>] ? ktime_get+0x7d/0x140
 [<ffffffff818239f0>] ? bit_wait+0x50/0x50
 [<ffffffff8106a655>] ? kvm_clock_get_cycles+0x25/0x30
 [<ffffffff8111617c>] ? ktime_get+0xac/0x140
 [<ffffffff818239f0>] ? bit_wait+0x50/0x50
 [<ffffffff81822564>] io_schedule_timeout+0xa4/0x110
 [<ffffffff81823a25>] bit_wait_io+0x35/0x50
 [<ffffffff818235bd>] __wait_on_bit+0x5d/0x90
 [<ffffffff811b9e8b>] wait_on_page_bit+0xcb/0xf0
 [<ffffffff810d5f90>] ? autoremove_wake_function+0x40/0x40
 [<ffffffff811cf84c>] truncate_inode_pages_range+0x4bc/0x840
 [<ffffffff811cfc3d>] truncate_inode_pages_final+0x4d/0x60
 [<ffffffffc023ced5>] f2fs_evict_inode+0x75/0x400 [f2fs]
 [<ffffffff812639bc>] evict+0xbc/0x190
 [<ffffffff81263d19>] iput+0x229/0x2c0
 [<ffffffffc0241885>] f2fs_put_super+0x105/0x1a0 [f2fs]
 [<ffffffff8124756a>] generic_shutdown_super+0x6a/0xf0
 [<ffffffff812478f7>] kill_block_super+0x27/0x70
 [<ffffffffc0241290>] kill_f2fs_super+0x20/0x30 [f2fs]
 [<ffffffff81247b03>] deactivate_locked_super+0x43/0x70
 [<ffffffff81247f4c>] deactivate_super+0x5c/0x60
 [<ffffffff81268d2f>] cleanup_mnt+0x3f/0x90
 [<ffffffff81268dc2>] __cleanup_mnt+0x12/0x20
 [<ffffffff810ac463>] task_work_run+0x73/0xa0
 [<ffffffff810032ac>] exit_to_usermode_loop+0xcc/0xd0
 [<ffffffff81003e7c>] syscall_return_slowpath+0xcc/0xe0
 [<ffffffff81829ea2>] int_ret_from_sys_call+0x25/0x9f

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-31 13:08:02 -08:00
Jaegeuk Kim
1f6fa26199 f2fs: remove f2fs_bug_on in terms of max_depth
There is no report on this bug_on case, but if malicious attacker changed this
field intentionally, we can just reset it as a MAX value.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-31 11:42:46 -08:00
Jaegeuk Kim
732d56489f f2fs: fix f2fs_ioc_abort_volatile_write
There are two rules to handle aborting volatile or atomic writes.

1. drop atomic writes
 - we don't need to keep any stale db data.

2. write journal data
 - we should keep the journal data with fsync for db recovery.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-30 10:53:25 -08:00
Chao Yu
4e0d836d5f f2fs: fix to skip recovering dot dentries in a readonly fs
If filesystem is readonly, leave user message info instead of recovering
inline dot inode.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-30 10:51:28 -08:00
Jaegeuk Kim
ed3d12561a f2fs: load largest extent all the time
Otherwise, we can get mismatched largest extent information.

One example is:
1. mount f2fs w/ extent_cache
2. make a small extent
3. umount
4. mount f2fs w/o extent_cache
5. update the largest extent
6. umount
7. mount f2fs w/ extent_cache
8. get the old extent made by #2

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-30 10:14:20 -08:00
Jaegeuk Kim
819d9153d4 f2fs: use i_size_read to get i_size
We need to use i_size_read() to get inode->i_size.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-30 10:14:19 -08:00
Jaegeuk Kim
8dc0d6a11e f2fs: early check broken symlink length in the encrypted case
If link is broken, its len is zero, and we don't need to move forward.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-30 10:14:19 -08:00
Chao Yu
e96248bb45 f2fs: clean up f2fs_ioc_write_checkpoint
Use f2fs_sync_fs to clean up codes in f2fs_ioc_write_checkpoint.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
[Jaegeuk Kim: remove unused err variable]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-30 10:14:18 -08:00
Yunlei He
179448bfe4 f2fs: add a max block check for get_data_block_bmap
This patch adds a max block check for get_data_block_bmap.

Trinity test program will send a block number as parameter into
ioctl_fibmap, which will be used in get_node_path(), when the block
number large than f2fs max blocks, it will trigger kernel bug.

Signed-off-by: Yunlei He <heyunlei@huawei.com>
Signed-off-by: Xue Liu <liuxueliu.liu@huawei.com>
[Jaegeuk Kim: fix missing condition, pointed by Chao Yu]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-30 10:14:17 -08:00
Fan Li
9a950d52b7 f2fs: fix bugs and simplify codes of f2fs_fiemap
fix bugs:
1. len could be updated incorrectly when start+len is beyond isize.
2. If there is a hole consisting of more than two blocks, it could
   fail to add FIEMAP_EXTENT_LAST flag for the last extent.
3. If there is an extent beyond isize, when we search extents in a range
   that ends at isize, it will also return the extent beyond isize,
   which is outside the range.

Signed-off-by: Fan li <fanofcode.li@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-30 10:14:16 -08:00
Chao Yu
6d5a1495ee f2fs: let user being aware of IO error
Sometimes we keep dumb when IO error occur in lower layer device, so user
will not receive any error return value for some operation, but actually,
the operation did not succeed.

This sould be avoided, so this patch reports such kind of error to user.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-30 10:14:15 -08:00
Chao Yu
d53841740f f2fs: add missing f2fs_balance_fs in __recover_dot_dentries
__recover_do_dentries will try to grab free space in storage, so fix to
add missing f2fs_balance_fs here.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-30 10:14:14 -08:00
Jaegeuk Kim
06d6f22639 f2fs: declare static function
The __f2fs_commit_super is static.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-30 10:14:14 -08:00
Jaegeuk Kim
b4d07a3e1a f2fs: avoid f2fs_lock_op in f2fs_write_begin
If f2fs_write_begin is to update data, we can bypass calling f2fs_lock_op() in
order to avoid the checkpoint latency in the write syscall.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-30 10:14:13 -08:00
Jaegeuk Kim
4aa69d5667 f2fs: return early when trying to read null nid
If get_node_page() gets zero nid, we can return early without getting a wrong
page. For example, get_dnode_of_data() can try to do that.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-30 10:14:12 -08:00
Jaegeuk Kim
2aadac085c f2fs: introduce prepare_write_begin to clean up
This patch adds prepare_write_begin to clean f2fs_write_begin.
The major role of this function is to convert any inline_data and allocate
or find block address.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-30 10:14:11 -08:00
Chao Yu
fba48a8b14 f2fs: don't convert inline inode when inline_data option is disable
If inline_data option is disable, when truncating an inline inode with
size which is not exceed maxinum inline size, we should not convert
inline inode to regular one to avoid the overhead of synchronizing
conversion.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-30 10:14:10 -08:00
Chao Yu
c34f42e2cb f2fs: report error of do_checkpoint
do_checkpoint and write_checkpoint can fail due to reasons like triggering
in a readonly fs or encountering IO error of storage device.

So it's better to report such error info to user, let user be aware of
failure of doing checkpoint.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-30 10:14:09 -08:00
Jaegeuk Kim
2a34076070 f2fs: call f2fs_balance_fs only when node was changed
If user tries to update or read data, we don't need to call f2fs_balance_fs
which triggers f2fs_gc, which increases unnecessary long latency.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-30 10:14:09 -08:00
Chao Yu
3104af35eb f2fs: reduce covered region of sbi->cp_rwsem in f2fs_map_blocks
Only cover sbi->cp_rwsem on one dnode page's allocation and modification
instead of multiple's in f2fs_map_blocks, it can reduce the covered region
of cp_rwsem, then we can avoid potential long time delay for concurrent
checkpointer.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-30 10:14:08 -08:00
Jaegeuk Kim
93bae099ea f2fs: record node block allocation in dnode_of_data
This patch introduces recording node block allocation in dnode_of_data.
This information helps to figure out whether any node block is allocated during
specific file operations.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-30 10:14:07 -08:00
Jaegeuk Kim
00623e6bcf f2fs: avoid unnecessary f2fs_gc for dir operations
The f2fs_balance_fs doesn't need to cover f2fs_new_inode or f2fs_find_entry
works.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-30 10:14:06 -08:00
Jaegeuk Kim
b9d777b85f f2fs: check inline_data flag at converting time
We can check inode's inline_data flag  when calling to convert it.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-30 10:14:05 -08:00
Jaegeuk Kim
74fd8d9927 f2fs: speed up shrinking extent tree entries
If there is no candidates for shrinking slab entries, we don't need to traverse
any trees at all.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
[Jaegeuk Kim: fix missing initialization reported by Yunlei He]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-30 10:13:00 -08:00
Al Viro
fceef393a5 switch ->get_link() to delayed_call, kill ->put_link()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-12-30 13:01:03 -05:00
xuejiufei
cc28d6d80f ocfs2/dlm: clear migration_pending when migration target goes down
We have found a BUG on res->migration_pending when migrating lock
resources.  The situation is as follows.

dlm_mark_lockres_migration
  res->migration_pending = 1;
  __dlm_lockres_reserve_ast
  dlm_lockres_release_ast returns with res->migration_pending remains
      because other threads reserve asts
  wait dlm_migration_can_proceed returns 1
  >>>>>>> o2hb found that target goes down and remove target
          from domain_map
  dlm_migration_can_proceed returns 1
  dlm_mark_lockres_migrating returns -ESHOTDOWN with
      res->migration_pending still remains.

When reentering dlm_mark_lockres_migrating(), it will trigger the BUG_ON
with res->migration_pending.  So clear migration_pending when target is
down.

Signed-off-by: Jiufei Xue <xuejiufei@huawei.com>
Reviewed-by: Joseph Qi <joseph.qi@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-12-29 17:45:49 -08:00
Junxiao Bi
b5a8bc338e ocfs2: fix flock panic issue
Commit 4f6563677a ("Move locks API users to locks_lock_inode_wait()")
move flock/posix lock indentify code to locks_lock_inode_wait(), but
missed to set fl_flags to FL_FLOCK which caused the following kernel
panic on 4.4.0_rc5.

  kernel BUG at fs/locks.c:1895!
  invalid opcode: 0000 [#1] SMP
  Modules linked in: ocfs2(O) ocfs2_dlmfs(O) ocfs2_stack_o2cb(O) ocfs2_dlm(O) ocfs2_nodemanager(O) ocfs2_stackglue(O) iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi xen_kbdfront xen_netfront xen_fbfront xen_blkfront
  CPU: 0 PID: 20268 Comm: flock_unit_test Tainted: G           O    4.4.0-rc5-next-20151217 #1
  Hardware name: Xen HVM domU, BIOS 4.3.1OVM 05/14/2014
  task: ffff88007b3672c0 ti: ffff880028b58000 task.ti: ffff880028b58000
  RIP: locks_lock_inode_wait+0x2e/0x160
  Call Trace:
    ocfs2_do_flock+0x91/0x160 [ocfs2]
    ocfs2_flock+0x76/0xd0 [ocfs2]
    SyS_flock+0x10f/0x1a0
    entry_SYSCALL_64_fastpath+0x12/0x71
  Code: e5 41 57 41 56 49 89 fe 41 55 41 54 53 48 89 f3 48 81 ec 88 00 00 00 8b 46 40 83 e0 03 83 f8 01 0f 84 ad 00 00 00 83 f8 02 74 04 <0f> 0b eb fe 4c 8d ad 60 ff ff ff 4c 8d 7b 58 e8 0e 8e 73 00 4d
  RIP  locks_lock_inode_wait+0x2e/0x160
   RSP <ffff880028b5bce8>
  ---[ end trace dfca74ec9b5b274c ]---

Fixes: 4f6563677a ("Move locks API users to locks_lock_inode_wait()")
Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Joseph Qi <joseph.qi@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-12-29 17:45:49 -08:00
Joseph Qi
5c9ee4cbf2 ocfs2: fix BUG when calculate new backup super
When resizing, it firstly extends the last gd.  Once it should backup
super in the gd, it calculates new backup super and update the
corresponding value.

But it currently doesn't consider the situation that the backup super is
already done.  And in this case, it still sets the bit in gd bitmap and
then decrease from bg_free_bits_count, which leads to a corrupted gd and
trigger the BUG in ocfs2_block_group_set_bits:

    BUG_ON(le16_to_cpu(bg->bg_free_bits_count) < num_bits);

So check whether the backup super is done and then do the updates.

Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
Reviewed-by: Jiufei Xue <xuejiufei@huawei.com>
Reviewed-by: Yiwen Jiang <jiangyiwen@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-12-29 17:45:49 -08:00
Al Viro
cd3417c8fc kill free_page_put_link()
all callers are better off with kfree_put_link()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-12-29 16:03:53 -05:00
Al Viro
b25472f9b9 new helpers: no_seek_end_llseek{,_size}()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-12-23 10:41:31 -05:00
Linus Torvalds
0bee6ec80b Just one fix for a NFSv4 callback bug introduced in 4.4.
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJWeF4NAAoJECebzXlCjuG+PvcQAL3AvxDzDnaNFhZJgWZMnRyC
 OlXlPE4clfiFXSB7C39xNBcn7eCJYLkINCQLu4ywAS+y7/22sX7unCTt7UXL99K3
 GffV/QvxOatSssik+CtS9gIMkRLW9Fs6fuQZ4k5w+UtveISpyFoRfw8hbISABL1w
 NtgGIESXL8WXO+OSbVF/wRV8g1+FVi/gXWAOAoUtHBzyUho2JfECXO1XYz6mQ44M
 HN4Bvx75dU3SieECHRKsq8yRbkPYHP9ron/+MskBZm7VkV/6mboFlFfivNncid0Y
 ivpjeYP5xTj4KoXPlQ3feA9AbADNVshAKeDQYpDxRJimMjr6VVRFVDpzbKJc+5ou
 if9AjZUiX02mHZShKMDsJR3kHBu+OzWLtQIDJUtLTIAaeb+V/2NEScnCyCIibXv7
 l52zqJ7upEYFuUGFYIZgsEKZgOAm7e3appIAtGG5Nt9ejUVR1LVPfsa8u2xXhUgp
 FN1TLmeQw6ZLRXcXa7vHcyQh/gJbPsm3PH514QYS+G3nMyXG8XnYKlMe98uhReno
 A3MH5MxfgyiuUITJopVpZfKoEFpYcid21osmVqiZfawoxr4iTocogDArETW7prCL
 QjN9sF+drlG70m/unDBKpQMPI0fhlmjY/VrK9YNlgvNaYKsJFVJnVFE1rCOuzj01
 ekT3egZmGUR7kX94DuTt
 =UJhV
 -----END PGP SIGNATURE-----

Merge tag 'nfsd-4.4-1' of git://linux-nfs.org/~bfields/linux

Pull nfsd fix from Bruce Fields:
 "Just one fix for a NFSv4 callback bug introduced in 4.4"

* tag 'nfsd-4.4-1' of git://linux-nfs.org/~bfields/linux:
  nfsd: don't hold ls_mutex across a layout recall
2015-12-22 15:52:32 -08:00
Jaegeuk Kim
7441ccef33 f2fs: use atomic variable for total_extent_tree
It would be better to use atomic variable for total_extent_tree.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-22 10:31:41 -08:00
Junxiao Bi
a93a998382 gfs2: fix flock panic issue
Commit 4f6563677a ("Move locks API users to locks_lock_inode_wait()")
moved flock/posix lock identify code to locks_lock_inode_wait(), but
missed to set fl_flags to FL_FLOCK which will cause kernel panic in
locks_lock_inode_wait().

Fixes: 4f6563677a ("Move locks API users to locks_lock_inode_wait()")
Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2015-12-22 08:06:08 -06:00
Borislav Petkov
362f924b64 x86/cpufeature: Remove unused and seldomly used cpu_has_xx macros
Those are stupid and code should use static_cpu_has_safe() or
boot_cpu_has() instead. Kill the least used and unused ones.

The remaining ones need more careful inspection before a conversion can
happen. On the TODO.

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1449481182-27541-4-git-send-email-bp@alien8.de
Cc: David Sterba <dsterba@suse.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Matt Mackall <mpm@selenic.com>
Cc: Chris Mason <clm@fb.com>
Cc: Josef Bacik <jbacik@fb.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-12-19 11:49:55 +01:00
Linus Torvalds
fc315e3e5c Merge branch 'for-linus-4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs fixes from Chris Mason:
 "A couple of small fixes"

* 'for-linus-4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  Btrfs: check prepare_uptodate_page() error code earlier
  Btrfs: check for empty bitmap list in setup_cluster_bitmaps
  btrfs: fix misleading warning when space cache failed to load
  Btrfs: fix transaction handle leak in balance
  Btrfs: fix unprotected list move from unused_bgs to deleted_bgs list
2015-12-18 15:35:08 -08:00
Colin Ian King
41a0c249cb proc: fix -ESRCH error when writing to /proc/$pid/coredump_filter
Writing to /proc/$pid/coredump_filter always returns -ESRCH because commit
774636e19e ("proc: convert to kstrto*()/kstrto*_from_user()") removed
the setting of ret after the get_proc_task call and incorrectly left it as
-ESRCH.  Instead, return 0 when successful.

Example breakage:

  echo 0 > /proc/self/coredump_filter
  bash: echo: write error: No such process

Fixes: 774636e19e ("proc: convert to kstrto*()/kstrto*_from_user()")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: <stable@vger.kernel.org> [4.3+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-12-18 14:25:40 -08:00
Bob Peterson
6cc4b6e801 GFS2: Don't do glock put on when inode creation fails
Currently the error path of function gfs2_inode_lookup calls function
gfs2_glock_put corresponding to an earlier call to gfs2_glock_get for
the inode glock. That's wrong because the error path also calls
iget_failed() which eventually calls iput, which eventually calls
gfs2_evict_inode, which does another gfs2_glock_put. This double-put
can cause the glock reference count to get off.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2015-12-18 11:04:46 -06:00
Bob Peterson
5ea31bc0a6 GFS2: Always use iopen glock for gl_deletes
Before this patch, when function try_rgrp_unlink queued a glock for
delete_work to reclaim the space, it used the inode glock to do so.
That's different from the iopen callback which uses the iopen glock
for the same purpose. We should be consistent and always use the
iopen glock. This may also save us reference counting problems with
the inode glock, since clear_glock does an extra glock_put() for the
inode glock.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2015-12-18 11:02:52 -06:00
Bob Peterson
783013c0f5 GFS2: Release iopen glock in gfs2_create_inode error cases
Some error cases in gfs2_create_inode were not unlocking the iopen
glock, getting the reference count off. This adds the proper unlock.
The error logic in function gfs2_create_inode was also convoluted,
so this patch simplifies it. It also takes care of a bug in
which gfs2_qa_delete() was not called in an error case.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2015-12-18 10:57:21 -06:00
Bob Peterson
ee530beafe GFS2: Truncate address space mapping when deleting an inode
In function gfs2_delete_inode() we write and flush the mapping for
a glock, among other things. We truncate the mapping for the inode,
but we never truncate the mapping for the glock. This patch makes it
also truncate the metamapping. This avoid cases where the glock is
reused by another process who is trying to recreate an inode in its
place using the same block.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Acked-by: Steven Whitehouse <swhiteho@redhat.com>
2015-12-18 10:52:21 -06:00
Bob Peterson
86d067a797 GFS2: Wait for iopen glock dequeues
This patch changes every glock_dq for iopen glocks into a dq_wait.
This makes sure that iopen glocks do not outlive the inode itself.
In turn, that ensures that anyone trying to unlink the glock will
be able to find the inode when it receives a remote iopen callback.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Acked-by: Steven Whitehouse <swhiteho@redhat.com>
2015-12-18 10:49:22 -06:00
Paul Gortmaker
9189922675 fs: make locks.c explicitly non-modular
The Kconfig currently controlling compilation of this code is:

config FILE_LOCKING
     bool "Enable POSIX file locking API" if EXPERT

...meaning that it currently is not being built as a module by anyone.

Lets remove the couple traces of modularity so that when reading the
driver there is no doubt it is builtin-only.

Since module_init translates to device_initcall in the non-modular
case, the init ordering gets bumped to one level earlier when we
use the more appropriate fs_initcall here.  However we've made similar
changes before without any fallout and none is expected here either.

Cc: Jeff Layton <jlayton@poochiereds.net>
Acked-by: Jeff Layton <jlayton@poochiereds.net>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
2015-12-18 07:05:06 -05:00
David S. Miller
b3e0d3d7ba Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/geneve.c

Here we had an overlapping change, where in 'net' the extraneous stats
bump was being removed whilst in 'net-next' the final argument to
udp_tunnel6_xmit_skb() was being changed.

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-17 22:08:28 -05:00