Commit Graph

53568 Commits

Author SHA1 Message Date
David Howells 0c3a5ac281 afs: Make it possible to get the data version in readpage
Store the data version number indicated by an FS.FetchData op into the read
request structure so that it's accessible by the page reader.

Signed-off-by: David Howells <dhowells@redhat.com>
2018-04-09 21:53:56 +01:00
David Howells 5800db810a afs: Init inode before accessing cache
We no longer parse symlinks when we get the inode to determine if this
symlink is actually a mountpoint as we detect that by examining the mode
instead (symlinks are always 0777 and mountpoints 0644).

Access the cache after mapping the status so that we don't have to manually
set the inode size now.

Note that this may need adjusting if the disconnected operation is
implemented as the file metadata may have to be obtained from the cache.

Signed-off-by: David Howells <dhowells@redhat.com>
2018-04-09 21:53:55 +01:00
David Howells d55b4da433 afs: Introduce a statistics proc file
Introduce a proc file that displays a bunch of statistics for the AFS
filesystem in the current network namespace.

Signed-off-by: David Howells <dhowells@redhat.com>
2018-04-09 21:53:54 +01:00
David Howells 888b338461 afs: Dump bad status record
Dump an AFS FileStatus record that is detected as invalid.

Signed-off-by: David Howells <dhowells@redhat.com>
2018-04-09 21:53:52 +01:00
David Howells 37ab636880 afs: Implement @cell substitution handling
Implement @cell substitution handling such that if @cell is seen as a name
in a dynamic root mount, then the name of the root cell for that network
namespace will be substituted for @cell during lookup.

The substitution of @cell for the current net namespace is set by writing
the cell name to /proc/fs/afs/rootcell.  The value can be obtained by
reading the file.

For example:

	# mount -t afs none /kafs -o dyn
	# echo grand.central.org >/proc/fs/afs/rootcell
	# ls /kafs/@cell
	archive/  cvs/  doc/  local/  project/  service/  software/  user/  www/
	# cat /proc/fs/afs/rootcell
	grand.central.org

Signed-off-by: David Howells <dhowells@redhat.com>
2018-04-09 21:18:58 +01:00
David Howells 6f8880d8e6 afs: Implement @sys substitution handling
Implement the AFS feature by which @sys at the end of a pathname component
may be substituted for one of a list of values, typically naming the
operating system.  Up to 16 alternatives may be specified and these are
tried in turn until one works.  Each network namespace has[*] a separate
independent list.

Upon creation of a new network namespace, the list of values is
initialised[*] to a single OpenAFS-compatible string representing arch type
plus "_linux26".  For example, on x86_64, the sysname is "amd64_linux26".

[*] Or will, once network namespace support is finalised in kAFS.

The list may be set by:

	# for i in foo bar linux-x86_64; do echo $i; done >/proc/fs/afs/sysname

for which separate writes to the same fd are amalgamated and applied on
close.  The LF character may be used as a separator to specify multiple
items in the same write() call.

The list may be cleared by:

	# echo >/proc/fs/afs/sysname

and read by:

	# cat /proc/fs/afs/sysname
	foo
	bar
	linux-x86_64

Signed-off-by: David Howells <dhowells@redhat.com>
2018-04-09 21:12:31 +01:00
David Howells 5cf9dd55a0 afs: Prospectively look up extra files when doing a single lookup
When afs_lookup() is called, prospectively look up the next 50 uncached
fids also from that same directory and cache the results, rather than just
looking up the one file requested.

This allows us to use the FS.InlineBulkStatus RPC op to increase efficiency
by fetching up to 50 file statuses at a time.

Signed-off-by: David Howells <dhowells@redhat.com>
2018-04-09 21:12:31 +01:00
David Howells 17814aef57 afs: Don't over-increment the cell usage count when pinning it
AFS cells that are added or set as the workstation cell through /proc are
pinned against removal by setting the AFS_CELL_FL_NO_GC flag on them and
taking a ref.  The ref should be only taken if the flag wasn't already set.

Fix this by making it conditional.

Without this an assertion failure will occur during module removal
indicating that the refcount is too elevated.

Signed-off-by: David Howells <dhowells@redhat.com>
2018-04-09 21:12:31 +01:00
David Howells fe342cf77b afs: Fix checker warnings
Fix warnings raised by checker, including:

 (*) Warnings raised by unequal comparison for the purposes of sorting,
     where the endianness doesn't matter:

fs/afs/addr_list.c:246:21: warning: restricted __be16 degrades to integer
fs/afs/addr_list.c:246:30: warning: restricted __be16 degrades to integer
fs/afs/addr_list.c:248:21: warning: restricted __be32 degrades to integer
fs/afs/addr_list.c:248:49: warning: restricted __be32 degrades to integer
fs/afs/addr_list.c:283:21: warning: restricted __be16 degrades to integer
fs/afs/addr_list.c:283:30: warning: restricted __be16 degrades to integer

 (*) afs_set_cb_interest() is not actually used and can be removed.

 (*) afs_cell_gc_delay() should be provided with a sysctl.

 (*) afs_cell_destroy() needs to use rcu_access_pointer() to read
     cell->vl_addrs.

 (*) afs_init_fs_cursor() should be static.

 (*) struct afs_vnode::permit_cache needs to be marked __rcu.

 (*) afs_server_rcu() needs to use rcu_access_pointer().

 (*) afs_destroy_server() should use rcu_access_pointer() on
     server->addresses as the server object is no longer accessible.

 (*) afs_find_server() casts __be16/__be32 values to int in order to
     directly compare them for the purpose of finding a match in a list,
     but is should also annotate the cast with __force to avoid checker
     warnings.

 (*) afs_check_permit() accesses vnode->permit_cache outside of the RCU
     readlock, though it doesn't then access the value; the extraneous
     access is deleted.

False positives:

 (*) Conditional locking around the code in xdr_decode_AFSFetchStatus.  This
     can be dealt with in a separate patch.

fs/afs/fsclient.c:148:9: warning: context imbalance in 'xdr_decode_AFSFetchStatus' - different lock contexts for basic block

 (*) Incorrect handling of seq-retry lock context balance:

fs/afs/inode.c:455:38: warning: context imbalance in 'afs_getattr' - different
lock contexts for basic block
fs/afs/server.c:52:17: warning: context imbalance in 'afs_find_server' - different lock contexts for basic block
fs/afs/server.c:128:17: warning: context imbalance in 'afs_find_server_by_uuid' - different lock contexts for basic block

Errors:

 (*) afs_lookup_cell_rcu() needs to break out of the seq-retry loop, not go
     round again if it successfully found the workstation cell.

 (*) Fix UUID decode in afs_deliver_cb_probe_uuid().

 (*) afs_cache_permit() has a missing rcu_read_unlock() before one of the
     jumps to the someone_else_changed_it label.  Move the unlock to after
     the label.

 (*) afs_vl_get_addrs_u() is using ntohl() rather than htonl() when
     encoding to XDR.

 (*) afs_deliver_yfsvl_get_endpoints() is using htonl() rather than ntohl()
     when decoding from XDR.

Signed-off-by: David Howells <dhowells@redhat.com>
2018-04-09 21:12:31 +01:00
Linus Torvalds fd3b36d275 Merge branch 'work.namei' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs namei updates from Al Viro:

 - make lookup_one_len() safe with parent locked only shared(incoming
   afs series wants that)

 - fix of getname_kernel() regression from 2015 (-stable fodder, that
   one).

* 'work.namei' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  getname_kernel() needs to make sure that ->name != ->iname in long case
  make lookup_one_len() safe to use with directory locked shared
  new helper: __lookup_slow()
  merge common parts of lookup_one_len{,_unlocked} into common helper
2018-04-09 12:48:05 -07:00
Linus Torvalds 8ea4a5d84e orangefs: fixes and cleanups
+ Documentation cleanups
  + removal of unused code
  + cause some structs to be static
  + implement Orangefs vm_operations fault callout
  + eliminate two single-use functions and put their cleaned up code in line.
  + replace a vmalloc/memset instance with vzalloc
  + fix a race condition bug in wait code.
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJayk8pAAoJEM9EDqnrzg2+92IP/2AqMOUodqF+AfKrAebQIAEk
 DvQefRC2eQXN2Ak8KYSO7oMdqfY7xeVSaKllWsSHdnktI5L9l6AOBa7uYQYF4OQt
 sqT5Wj5sbnDS54gOfFt95sUHEQXs5wegTXBE15t4uxaLSpv22mPjzqlTzyZcK7IX
 nhSIxEi+utXOW1WodT2xLgS08ac693KXciNNN14NKi5DJ+h6a8Z2kZBq/6ssC1oV
 nW2tIfRVWNVk6cfJ+Wq12JksEZOPemzjv2zojv3PZwCcwObOyxvf16nxnLjdkJvw
 obYWmR8ZcF3TItBbkxOTfanr+UAhZ55YVYDpitJmV9fpPM9eoPAR0gNWBgq45kaS
 s27pvMYf5uvGRJHgEO6hBpmi1dExDDMLfH6Y4IN+zyHKjNfgvfQKwIkzrjlmG6q6
 AgO3mbbMJfid2x7lkShuP1uYqUfv03fo2xW3zxWjjxo3tEbADSgzJQ5kjPixDvcB
 0ru2SlRVqgYIFqBN8VgWkXKbfcpSEcDITmFEkBKg/SkA08ry4iLk744UqIPa5n76
 5DYB0u99nHPMf2BBVsz6L/mUfVuHBYUBx1iII95PqJhVlj2Yi0sIj3AHUzQ4neKS
 YWGMk4VQ+B5zCObSohCRv9l3ECUqugpiJzCuZjZZS5LUkj9RO5EJ1LDhOU/31xDL
 OUTgZl7f5NykxwIHh3bD
 =zJ1Z
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-4.17-ofs' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux

Pull orangefs updates from Mike Marshall:
 "Fixes and cleanups:

   - Documentation cleanups

   - removal of unused code

   - make some structs static

   - implement Orangefs vm_operations fault callout

   - eliminate two single-use functions and put their cleaned up code in
     line.

   - replace a vmalloc/memset instance with vzalloc

   - fix a race condition bug in wait code"

* tag 'for-linus-4.17-ofs' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux:
  Orangefs: documentation updates
  orangefs: document package install and xfstests procedure
  orangefs: remove unused code
  orangefs: make several *_operations structs static
  orangefs: implement vm_ops->fault
  orangefs: open code short single-use functions
  orangefs: replace vmalloc and memset with vzalloc
  orangefs: bug fix for a race condition when getting a slot
2018-04-09 12:45:47 -07:00
Linus Torvalds 190f2ace0e - Fix another compression Kconfig combination missed in testing (Tobias Regnery)
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 Comment: Kees Cook <kees@outflux.net>
 
 iQIcBAABCgAGBQJayk5UAAoJEIly9N/cbcAmUBoQAK7Z7HEQJed0/Z1/JxW/Fpxz
 iXHD3IC4D7SVn1dhwUe7lkM7n1X+VaZSalXOXTorqt5mXgGkrXa1vaf2itLSafcz
 3/MSN7DvT2yjooBugI6sMwEEg898vdu1FIN/3PugS09y5Uh0tJbEY9s4NXmvaYC7
 Ze0e5CfCrg25Ic9oa4xI2qiVI1OUzoVNEA4S/KeE+WpClrLbxnavMZYtyFqbFfdX
 REARDCxpPH3hk0xKEN1zR1RvoSMBgc1aYdZVHq9Qp+kI73aXcf4UwAGy/a1Jkm8V
 uEkPX3jcKNHeVPmxk5Z/XgyMieDeh85RG+wHCmEeeZinnx0iOVmv0HhM/1T5NJ/A
 shpIOYmC+F+mjZUD/q/WBMH7+VTskYTG6h3avn7skAHtmVGMipVhxuRHbZh/dyMf
 zuKEfo/DloD/fP5wtuUqNjduznMVfnDOazxzogX4omvUeLRQvVsVyfqA3vv2Dv6c
 q+Gx1hBMKTMewlCyDv6EIQZ7l6ejdBnB0ZIxvhY+blp4aJuDA/j7mXbX+/NhwuBg
 QHDj4d2vA/3rgCsknVsVQjaMdX2JagoHI2XM5NNbPO31ki/m/mult/HcusHOYc8U
 5WjxoqqNRUAVjwIGKBMWUNKTIU1XnPOZpqYbsNjM/1INnisX5rhvxOgAvqQIRll2
 MXZYvxGEkmYv4pnnSPlB
 =aZpk
 -----END PGP SIGNATURE-----

Merge tag 'pstore-v4.17-rc1-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull pstore fix from Kees Cook:
 "Fix another compression Kconfig combination missed in testing (Tobias
  Regnery)"

* tag 'pstore-v4.17-rc1-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  pstore: fix crypto dependencies without compression
2018-04-09 12:43:18 -07:00
Dan Williams e13e75b86e Merge branch 'for-4.17/dax' into libnvdimm-for-next 2018-04-09 10:50:17 -07:00
Eric Sandeen a1f69417c6 xfs: non-scrub - remove unused function parameters
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-04-09 10:23:42 -07:00
Christoph Hellwig 7fcd3efa1e xfs: remove filestream item xfs_inode reference
The filestreams allocator stores an xfs_fstrm_item structure in the MRU to
cache inode number to agno mappings for a particular length of time.  Each
xfs_fstrm_item contains the internal MRU structure, an inode pointer and
agno value.

The inode pointer stored in the xfs_fstrm_item is not referenced, however,
which means the inode itself can be removed and reclaimed before the MRU
item is freed. If this occurs, xfs_fstrm_free_func() can access freed or
unrelated memory through xfs_fstrm_item->ip and crash.

The obvious solution is to grab an inode reference for xfs_fstrm_item.
The filestream mechanism only actually uses the inode pointer as a means
to access the xfs_mount, however.  Rather than add unnecessary
complexity, simplify the implementation to store an xfs_mount pointer in
struct xfs_mru_cache, and pass it to the free callback.  This also
requires updates to the tracepoint class to provide the associated data
via parameters rather than the inode and a minor hack to peek at the MRU
key to establish the inode number at free time.

Based on debugging work and an earlier patch from Brian Foster, who
also wrote most of this changelog.

Reported-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-04-09 10:23:39 -07:00
Jia-Ju Bai 1aa3b3e0cb fs: quota: Replace GFP_ATOMIC with GFP_KERNEL in dquot_init
dquot_init() is never called in atomic context.
This function is only set as a parameter of fs_initcall().

Despite never getting called from atomic context,
dquot_init() calls __get_free_pages() with GFP_ATOMIC,
which waits busily for allocation.
GFP_ATOMIC is not necessary and can be replaced with GFP_KERNEL,
to avoid busy waiting and improve the possibility of sucessful allocation.

This is found by a static analysis tool named DCNS written by myself.
And I also manually check it.

Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2018-04-09 17:48:54 +02:00
Amir Goldstein 54a307ba8d fanotify: fix logic of events on child
When event on child inodes are sent to the parent inode mark and
parent inode mark was not marked with FAN_EVENT_ON_CHILD, the event
will not be delivered to the listener process. However, if the same
process also has a mount mark, the event to the parent inode will be
delivered regadless of the mount mark mask.

This behavior is incorrect in the case where the mount mark mask does
not contain the specific event type. For example, the process adds
a mark on a directory with mask FAN_MODIFY (without FAN_EVENT_ON_CHILD)
and a mount mark with mask FAN_CLOSE_NOWRITE (without FAN_ONDIR).

A modify event on a file inside that directory (and inside that mount)
should not create a FAN_MODIFY event, because neither of the marks
requested to get that event on the file.

Fixes: 1968f5eed5 ("fanotify: use both marks when possible")
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2018-04-09 16:30:14 +02:00
Al Viro 30ce4d1903 getname_kernel() needs to make sure that ->name != ->iname in long case
missed it in "kill struct filename.separate" several years ago.

Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-04-08 11:57:10 -04:00
Linus Torvalds f8cf2f16a7 Merge branch 'next-integrity' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull integrity updates from James Morris:
 "A mixture of bug fixes, code cleanup, and continues to close
  IMA-measurement, IMA-appraisal, and IMA-audit gaps.

  Also note the addition of a new cred_getsecid LSM hook by Matthew
  Garrett:

     For IMA purposes, we want to be able to obtain the prepared secid
     in the bprm structure before the credentials are committed. Add a
     cred_getsecid hook that makes this possible.

  which is used by a new CREDS_CHECK target in IMA:

     In ima_bprm_check(), check with both the existing process
     credentials and the credentials that will be committed when the new
     process is started. This will not change behaviour unless the
     system policy is extended to include CREDS_CHECK targets -
     BPRM_CHECK will continue to check the same credentials that it did
     previously"

* 'next-integrity' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
  ima: Fallback to the builtin hash algorithm
  ima: Add smackfs to the default appraise/measure list
  evm: check for remount ro in progress before writing
  ima: Improvements in ima_appraise_measurement()
  ima: Simplify ima_eventsig_init()
  integrity: Remove unused macro IMA_ACTION_RULE_FLAGS
  ima: drop vla in ima_audit_measurement()
  ima: Fix Kconfig to select TPM 2.0 CRB interface
  evm: Constify *integrity_status_msg[]
  evm: Move evm_hmac and evm_hash from evm_main.c to evm_crypto.c
  fuse: define the filesystem as untrusted
  ima: fail signature verification based on policy
  ima: clear IMA_HASH
  ima: re-evaluate files on privileged mounted filesystems
  ima: fail file signature verification on non-init mounted filesystems
  IMA: Support using new creds in appraisal policy
  security: Add a cred_getsecid hook
2018-04-07 16:53:59 -07:00
Linus Torvalds 3612605a5a Merge branch 'next-general' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull general security layer updates from James Morris:

 - Convert security hooks from list to hlist, a nice cleanup, saving
   about 50% of space, from Sargun Dhillon.

 - Only pass the cred, not the secid, to kill_pid_info_as_cred and
   security_task_kill (as the secid can be determined from the cred),
   from Stephen Smalley.

 - Close a potential race in kernel_read_file(), by making the file
   unwritable before calling the LSM check (vs after), from Kees Cook.

* 'next-general' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
  security: convert security hooks to use hlist
  exec: Set file unwritable before LSM check
  usb, signal, security: only pass the cred, not the secid, to kill_pid_info_as_cred and security_task_kill
2018-04-07 11:11:41 -07:00
Linus Torvalds 62f8e6c5dc fscache development
-----BEGIN PGP SIGNATURE-----
 
 iQIVAwUAWsdxrvu3V2unywtrAQJVmQ/9Fv8d/Ecdwv5nxVBmN7uA8lOYcHEbZWmd
 FhFQE8qYLjKMo9Fy4tPkBbu1l6CVnetaTRE5qwixACJAftrdjABKJAazGR3Uxief
 0jMSWScrV1XCeRErPcczHcx52Hefl8f1DQdA3zpoF0ewz7CjyxMxkl67bsYJbNKE
 T4ebCu5IJk+5PPwwMM3REKjQbunSXXnzgCLUI2cc0Yf76CTVpx6p+NpxV+2wq0p7
 vym83F68qACAEzNH+oozN7IwqjkWyYOnTtCLiMsh4iq30jP6ohtLom6RcRp7QUxM
 Z9hxgG3NptypuVBO1jKxaQ6XZGgAasYmppOmJ/SoALv2PKsAbxi372lTR4ikceKq
 H4oNTbs5tVmyvu3qFwtLN+vX+GdfaoSUnUG8vTvnCB3tHHtYj7q5QeFE0HaX4QSq
 oLANkCOZU8TJsT30pxsCNYiqc5HK9kaLjUQId9K+xq7mM/IuhtNtBQ+ZpqAh5IxB
 4bXKYLdeJ1myZrkYTa6gcTqeFax3djCBJ3UvjTnuqRZAaQg079WkG84Kdq1ZjDRp
 IQpKQnPX9JGhjW1zqLK1Ay8h+HFPgWR5BBVOaLwImr1mH+ccG0iNIeDjrOc8h6J5
 e60XM/x2dIYxpXyFYAkldbAI24aRg1FNzfniG4rSAPecf3SwWrxg/qK7uujLbJHM
 fKNA80yifHo=
 =ukqs
 -----END PGP SIGNATURE-----

Merge tag 'fscache-next-20180406' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs

Pull fscache updates from David Howells:
 "Three patches that fix some of AFS's usage of fscache:

   (1) Need to invalidate the cache if a foreign data change is detected
       on the server.

   (2) Move the vnode ID uniquifier (equivalent to i_generation) from
       the auxiliary data to the index key to prevent a race between
       file delete and a subsequent file create seeing the same index
       key.

   (3) Need to retire cookies that correspond to files that we think got
       deleted on the server.

  Four patches to fix some things in fscache and cachefiles:

   (4) Fix a couple of checker warnings.

   (5) Correctly indicate to the end-of-operation callback whether an
       operation completed or was cancelled.

   (6) Add a check for multiple cookie relinquishment.

   (7) Fix a path through the asynchronous write that doesn't wake up a
       waiter for a page if the cache decides not to write that page,
       but discards it instead.

  A couple of patches to add tracepoints to fscache and cachefiles:

   (8) Add tracepoints for cookie operators, object state machine
       execution, cachefiles object management and cachefiles VFS
       operations.

   (9) Add tracepoints for fscache operation management and page
       wrangling.

  And then three development patches:

  (10) Attach the index key and auxiliary data to the cookie, pass this
       information through various fscache-netfs API functions and get
       rid of the callbacks to the netfs to get it.

       This means that the cache can get at this information, even if
       the netfs goes away. It also means that the cache can be lazy in
       updating the coherency data.

  (11) Pass the object data size through various fscache-netfs API
       rather than calling back to the netfs for it, and store the value
       in the object.

       This makes it easier to correctly resize the object, as the size
       is updated on writes to the cache, rather than calling back out
       to the netfs.

  (12) Maintain a catalogue of allocated cookies. This makes it possible
       to catch cookie collision up front rather than down in the bowels
       of the cache being run from a service thread from the object
       state machine.

       This will also make it possible in the future to reconnect to a
       cookie that's not gone dead yet because it's waiting for
       finalisation of the storage and also make it possible to bring
       cookies online if the cache is added after the cookie has been
       obtained"

* tag 'fscache-next-20180406' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
  fscache: Maintain a catalogue of allocated cookies
  fscache: Pass object size in rather than calling back for it
  fscache: Attach the index key and aux data to the cookie
  fscache: Add more tracepoints
  fscache: Add tracepoints
  fscache: Fix hanging wait on page discarded by writeback
  fscache: Detect multiple relinquishment of a cookie
  fscache: Pass the correct cancelled indications to fscache_op_complete()
  fscache, cachefiles: Fix checker warnings
  afs: Be more aggressive in retiring cached vnodes
  afs: Use the vnode ID uniquifier in the cache key not the aux data
  afs: Invalidate cache on server data change
2018-04-07 09:08:24 -07:00
Tobias Regnery e698aaf37f pstore: fix crypto dependencies without compression
Commit 58eb5b6707 ("pstore: fix crypto dependencies") fixed up the crypto
dependencies but missed the case when no compression is selected.

With CONFIG_PSTORE=y, CONFIG_PSTORE_COMPRESS=n  and CONFIG_CRYPTO=m we see
the following link error:

fs/pstore/platform.o: In function `pstore_register':
(.text+0x1b1): undefined reference to `crypto_has_alg'
(.text+0x205): undefined reference to `crypto_alloc_base'
fs/pstore/platform.o: In function `pstore_unregister':
(.text+0x3b0): undefined reference to `crypto_destroy_tfm'

Fix this by checking at compile-time if CONFIG_PSTORE_COMPRESS is enabled.

Fixes: 58eb5b6707 ("pstore: fix crypto dependencies")
Signed-off-by: Tobias Regnery <tobias.regnery@gmail.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Kees Cook <keescook@chromium.org>
2018-04-06 15:45:33 -07:00
Linus Torvalds 6ad11bdd57 audit/stable-4.17 PR 20180403
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCAAyFiEEcQCq365ubpQNLgrWVeRaWujKfIoFAlrD6T4UHHBhdWxAcGF1
 bC1tb29yZS5jb20ACgkQVeRaWujKfIqGOg/9FPgs5cESBrocEOBAqqcmO3qjxaEy
 NKQWGTPppZwI5f5pOStL5GT3oU8jQp3IMjzUM2yIElFUg+RM5cwb0bLmhAMCJFCd
 vtrJmGDdQ0QEj5wqkprupaVEKENlSKaKePJq3NESFtcHs2cgfcIRsycj1LaOThNi
 fUcltiocBDS/jxurCgi2s4O2JTGEXfZaI0GojKjWDddL3N6QcD5aZgPQd/67T0Pt
 5dDgkXbGkd5pR97F+LovaTuLTaMXnUx5plMUd/LsueZbOxHjZL2O2E/h4aoXATMX
 zKdtG03wEebb65cQyczeTXRIBURIQCka0U0fHx7ZhS8vK2HVgr6oGfsJfyZhSp+l
 IIb/T1dSbgUURpMH0DiGs/pQrXO/9o7Rp7wIakycIHD0kcw503hbauqJEc6pwlx6
 /WQQTo6GKwHWW67OQ7AbIt4Gh9P/s96s6kEZGRH2NAjKY9xTZVM7+nnKL8hHk0xq
 uDN20AZuD5i9cZpVqw+MYdmeuHRuNPglY9S33MyaBbFeWl48voFxiabVpV3ENfLB
 Iyc5WzpxekJi9JLneEt6/r6XIissvHxsoIPL1lCYSAPIJQRmqg4sGHKAQ9o5NtFD
 MrRZSbBQVwt3+YFKixUcU+nvnhroJsQExejZoFmAdQl8f0TiihwYl8E4lSmy7ntr
 IzNm7li+y9VRJ54=
 =n1dk
 -----END PGP SIGNATURE-----

Merge tag 'audit-pr-20180403' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit

Pull audit updates from Paul Moore:
 "We didn't have anything to send for v4.16, but we're back with a
  little more than usual for v4.17.

  Eleven patches in total, most fall into the small fix category, but
  there are three non-trivial changes worth calling out:

   - the audit entry filter is being removed after deprecating it for
     quite a while (years of no one really using it because it turns out
     to be not very practical)

   - created our own version of "__mutex_owner()" because the locking
     folks were upset we were using theirs

   - improved our handling of kernel command line parameters to make
     them more forgiving

   - we fixed auditing of symlink operations

  Everything passes the audit-testsuite and as of a few minutes ago it
  merges well with your tree"

* tag 'audit-pr-20180403' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit:
  audit: add refused symlink to audit_names
  audit: remove path param from link denied function
  audit: link denied should not directly generate PATH record
  audit: make ANOM_LINK obey audit_enabled and audit_dummy_context
  audit: do not panic on invalid boot parameter
  audit: track the owner of the command mutex ourselves
  audit: return on memory error to avoid null pointer dereference
  audit: bail before bug check if audit disabled
  audit: deprecate the AUDIT_FILTER_ENTRY filter
  audit: session ID should not set arch quick field pointer
  audit: update bugtracker and source URIs
2018-04-06 15:01:25 -07:00
Linus Torvalds 69824bcc4b - Add lz4hc and 842 to pstore compression options (Geliang Tang)
- Refactor to use crypto compression API (Geliang Tang)
 - Fix up Kconfig dependencies for compression (Arnd Bergmann)
 - Allow for run-time compression selection
 - Remove stack VLA usage
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 Comment: Kees Cook <kees@outflux.net>
 
 iQIcBAABCgAGBQJawlj9AAoJEIly9N/cbcAm4wMP/1LsEBcZIUrdYD2WtaDoCrMJ
 lSPws0SjzoPhAPF264Adk0PRbr8XaQ093bRHqi7QGGAjRI+GwD4bJl+mzuZwkPq5
 ZNfBpZ9nST7KYQzy37f756VAa/CA5F+ta4aFkXkY1Ab6xxAf1vjKa1yKgA+ewHg6
 dve1C13CeZSbq3Fl8UpfevG9w+y0AsD5MiibAujc7UFE8Qbi8OSX7boeoE20VqRx
 TN+VFfNXLEPbDXkPDAvhmWkEjnKI/BGCkr2+hAgMLj/UNy7Odf8WhpjAHqrj2EPi
 pG4gHPKxAbyII87UyDW8ZvQbysx/TysRmTSwHirHSSh6BHQDOQ2WccP1DQbL+Mnc
 8XcIdE4snxSPZ/dj7WpwWvxNSRq3gInZ0fc+bvpVfUenh8kX2n8oy/hUtc0RimiF
 wg0fn9rbwOL41UMzBAZIbJjqBSApSMdTP/vNPPh09oYgK+GuPuQvG/KEz/y8Jh1T
 DpDx28XD5UpH4opD5rz4KDIT1zbLNkxXckAztrtJUfWE0ILU7IOn4N/4No+IPidA
 Hq4kQyHCk7BnZbQ1eEWAG/WdJYHEs/xbgw+IaVBa6b5y790+KhBdTNZ6YxIPg1i/
 rWdXgn6c80K3QiMwidh4EGe0JuQQIlEI04YurW47/YZWZ66A/lD/fAMOedaG3g4q
 0/04pWseEAjg+a44y/8+
 =gqnB
 -----END PGP SIGNATURE-----

Merge tag 'pstore-v4.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull pstore updates from Kees Cook:
 "This cycle was almost entirely improvements to the pstore compression
  options, noted below:

   - Add lz4hc and 842 to pstore compression options (Geliang Tang)

   - Refactor to use crypto compression API (Geliang Tang)

   - Fix up Kconfig dependencies for compression (Arnd Bergmann)

   - Allow for run-time compression selection

   - Remove stack VLA usage"

* tag 'pstore-v4.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  pstore: fix crypto dependencies
  pstore: Use crypto compress API
  pstore/ram: Do not use stack VLA for parity workspace
  pstore: Select compression at runtime
  pstore: Avoid size casts for 842 compression
  pstore: Add lz4hc and 842 compression support
2018-04-06 14:59:01 -07:00
Linus Torvalds 3b54765cca Merge branch 'akpm' (patches from Andrew)
Merge updates from Andrew Morton:

 - a few misc things

 - ocfs2 updates

 - the v9fs maintainers have been missing for a long time. I've taken
   over v9fs patch slinging.

 - most of MM

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (116 commits)
  mm,oom_reaper: check for MMF_OOM_SKIP before complaining
  mm/ksm: fix interaction with THP
  mm/memblock.c: cast constant ULLONG_MAX to phys_addr_t
  headers: untangle kmemleak.h from mm.h
  include/linux/mmdebug.h: make VM_WARN* non-rvals
  mm/page_isolation.c: make start_isolate_page_range() fail if already isolated
  mm: change return type to vm_fault_t
  mm, oom: remove 3% bonus for CAP_SYS_ADMIN processes
  mm, page_alloc: wakeup kcompactd even if kswapd cannot free more memory
  kernel/fork.c: detect early free of a live mm
  mm: make counting of list_lru_one::nr_items lockless
  mm/swap_state.c: make bool enable_vma_readahead and swap_vma_readahead() static
  block_invalidatepage(): only release page if the full page was invalidated
  mm: kernel-doc: add missing parameter descriptions
  mm/swap.c: remove @cold parameter description for release_pages()
  mm/nommu: remove description of alloc_vm_area
  zram: drop max_zpage_size and use zs_huge_class_size()
  zsmalloc: introduce zs_huge_class_size()
  mm: fix races between swapoff and flush dcache
  fs/direct-io.c: minor cleanups in do_blockdev_direct_IO
  ...
2018-04-06 14:19:26 -07:00
Al Viro 8613a209ff make lookup_one_len() safe to use with directory locked shared
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-04-06 16:45:33 -04:00
Al Viro 88d8331afb new helper: __lookup_slow()
lookup_slow() sans locking/unlocking the directory

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-04-06 16:43:47 -04:00
Al Viro 3c95f0dce8 merge common parts of lookup_one_len{,_unlocked} into common helper
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-04-06 16:33:40 -04:00
Linus Torvalds 3fd14cdcc0 MTD changes:
Core:
     * Remove support for asynchronous erase (not implemented by any of
       the existing drivers anyway)
     * Remove Cyrille from the list of SPI NOR and MTD maintainers
     * Fix kernel doc headers
     * Allow users to define the partitions parsers they want to test
       through a DT property (compatible of the partitions subnode)
     * Remove the bfin-async-flash driver (the only architecture using
       it has been removed)
     * Fix pagetest test
     * Add extra checks in mtd_erase()
     * Simplify the MTD partition creation logic and get rid of
       mtd_add_device_partitions()
 
    Drivers:
     * Add endianness information to the physmap DT binding
     * Add Eon EN29LV400A IDs to JEDEC probe logic
     * Use %*ph where appropriate
 
 SPI NOR changes:
   Drivers:
     * Make fsl-quaspi assign different names to MTD devices connected
       to the same QSPI controller
     * Remove an unneeded driver.bus assigned in the fsl-qspi driver
 
 NAND changes:
   Core:
     * Prepare arrival of the SPI NAND subsystem by implementing a
       generic (interface-agnostic) layer to ease manipulation of NAND
       devices
     * Move onenand code base to the drivers/mtd/nand/ dir
     * Rework timing mode selection
     * Provide a generic way for NAND chip drivers to flag a specific
       GET/SET FEATURE operation as supported/unsupported
     * Stop embedding ONFI/JEDEC param page in nand_chip
 
   Drivers:
     * Rework/cleanup of the mxc driver
     * Various cleanups in the vf610 driver
     * Migrate the fsmc and vf610 to ->exec_op()
     * Get rid of the pxa driver (replaced by marvell_nand)
     * Support ->setup_data_interface() in the GPMI driver
     * Fix probe error path in several drivers
     * Remove support for unused hw_syndrome mode in sunxi_nand
     * Various minor improvements
 -----BEGIN PGP SIGNATURE-----
 
 iQI5BAABCAAjBQJaxTXfHBxib3Jpcy5icmV6aWxsb25AYm9vdGxpbi5jb20ACgkQ
 Ze02AX4ItwDeFhAAs+OmGip0LHk+D7gFCBONtypOGRi0nmGsi6PkhKtUZBob6/Y2
 JoJhTrHx/LsRpqcuh3Y8KSCp0CG5WlZX2m1RU1yrcqiWvTfiXHlv6aQGfiE/esRb
 Ei4QNiND8hg+hyzc6I5wRrAuJ7jPP5BanX+n9TyvygaA1Ic5pcur0gssoYKeJTia
 18pUV+RLe67wfP02uT0GJvmXd5ecIpu0OVhmJPye2UQqnfl+eiJ2zI4TJAZN3zRW
 tD9XiX3/GP8oCnvC+1kZw48a2c/qiHs/DKCdDcH6SDTKzKha4zCgaX4220X3AqPK
 rScrFTKxiCa1utZPn9EplnW6ZW3Ud46GqReWGRQZuyphu/ntdkNim7nuJMUZfEFL
 RFtMLhXDs1aXUaYJ6F3YQeajHxVU6Ugl34UQCraCXbfmzqCkaKfgXm3EVb9d2duY
 rCkrS5N4gSQXp5SMvc0aiu2TnsRX2OCthGg0NBdHG9AaVhwuj0neeSFQ/XFPxOeh
 5jQKo4umR7tr2Md7osSx0PVxVo8uO5smqcl90SsdwrjxQxL7z1u/SJBjRf6pJDzQ
 cR1o+H9jBXecqD6m+dN3H/h0sMwHzefsASkxYNxl2OeRGYsJvBpuoQF8yUD9jnRS
 AzHhWhUNo8g3FxDe7kKgsMXoqnr/pKopMCRxVbeEkhDFNsTvBde10/Xo09U=
 =EG7L
 -----END PGP SIGNATURE-----

Merge tag 'mtd/for-4.17' of git://git.infradead.org/linux-mtd

Pull MTD updates from Boris Brezillon:
 "MTD Core:
   - Remove support for asynchronous erase (not implemented by any of
     the existing drivers anyway)
   - Remove Cyrille from the list of SPI NOR and MTD maintainers
   - Fix kernel doc headers
   - Allow users to define the partitions parsers they want to test
     through a DT property (compatible of the partitions subnode)
   - Remove the bfin-async-flash driver (the only architecture using it
     has been removed)
   - Fix pagetest test
   - Add extra checks in mtd_erase()
   - Simplify the MTD partition creation logic and get rid of
     mtd_add_device_partitions()

  MTD Drivers:
   - Add endianness information to the physmap DT binding
   - Add Eon EN29LV400A IDs to JEDEC probe logic
   - Use %*ph where appropriate

  SPI NOR Drivers:
   - Make fsl-quaspi assign different names to MTD devices connected to
     the same QSPI controller
   - Remove an unneeded driver.bus assigned in the fsl-qspi driver

  NAND Core:
   - Prepare arrival of the SPI NAND subsystem by implementing a generic
     (interface-agnostic) layer to ease manipulation of NAND devices
   - Move onenand code base to the drivers/mtd/nand/ dir
   - Rework timing mode selection
   - Provide a generic way for NAND chip drivers to flag a specific
     GET/SET FEATURE operation as supported/unsupported
   - Stop embedding ONFI/JEDEC param page in nand_chip

  NAND Drivers:
   - Rework/cleanup of the mxc driver
   - Various cleanups in the vf610 driver
   - Migrate the fsmc and vf610 to ->exec_op()
   - Get rid of the pxa driver (replaced by marvell_nand)
   - Support ->setup_data_interface() in the GPMI driver
   - Fix probe error path in several drivers
   - Remove support for unused hw_syndrome mode in sunxi_nand
   - Various minor improvements"

* tag 'mtd/for-4.17' of git://git.infradead.org/linux-mtd: (89 commits)
  dt-bindings: fsl-quadspi: Add the example of two SPI NOR
  mtd: fsl-quadspi: Distinguish the mtd device names
  mtd: nand: Fix some function description mismatches in core.c
  mtd: fsl-quadspi: Remove unneeded driver.bus assignment
  mtd: rawnand: marvell: Rename ->ecc_clk into ->core_clk
  mtd: rawnand: s3c2410: enhance the probe function error path
  mtd: rawnand: tango: fix probe function error path
  mtd: rawnand: sh_flctl: fix the probe function error path
  mtd: rawnand: omap2: fix the probe function error path
  mtd: rawnand: mxc: fix probe function error path
  mtd: rawnand: denali: fix probe function error path
  mtd: rawnand: davinci: fix probe function error path
  mtd: rawnand: cafe: fix probe function error path
  mtd: rawnand: brcmnand: fix probe function error path
  mtd: rawnand: sunxi: Stop supporting ECC_HW_SYNDROME mode
  mtd: rawnand: marvell: Fix clock resource by adding a register clock
  mtd: ftl: Use DIV_ROUND_UP()
  mtd: Fix some function description mismatches in mtdcore.c
  mtd: physmap_of: update struct map_info's swap as per map requirement
  dt-bindings: mtd-physmap: Add endianness supports
  ...
2018-04-06 12:15:41 -07:00
Linus Torvalds 9022ca6b11 Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull misc vfs updates from Al Viro:
 "Assorted stuff, including Christoph's I_DIRTY patches"

* 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  fs: move I_DIRTY_INODE to fs.h
  ubifs: fix bogus __mark_inode_dirty(I_DIRTY_SYNC | I_DIRTY_DATASYNC) call
  ntfs: fix bogus __mark_inode_dirty(I_DIRTY_SYNC | I_DIRTY_DATASYNC) call
  gfs2: fix bogus __mark_inode_dirty(I_DIRTY_SYNC | I_DIRTY_DATASYNC) calls
  fs: fold open_check_o_direct into do_dentry_open
  vfs: Replace stray non-ASCII homoglyph characters with their ASCII equivalents
  vfs: make sure struct filename->iname is word-aligned
  get rid of pointless includes of fs_struct.h
  [poll] annotate SAA6588_CMD_POLL users
2018-04-06 11:07:08 -07:00
David Howells ec0328e46d fscache: Maintain a catalogue of allocated cookies
Maintain a catalogue of allocated cookies so that cookie collisions can be
handled properly.  For the moment, this just involves printing a warning
and returning a NULL cookie to the caller of fscache_acquire_cookie(), but
in future it might make sense to wait for the old cookie to finish being
cleaned up.

This requires the cookie key to be stored attached to the cookie so that we
still have the key available if the netfs relinquishes the cookie.  This is
done by an earlier patch.

The catalogue also renders redundant fscache_netfs_list (used for checking
for duplicates), so that can be removed.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Anna Schumaker <anna.schumaker@netapp.com>
Tested-by: Steve Dickson <steved@redhat.com>
2018-04-06 14:05:14 +01:00
David Howells ee1235a9a0 fscache: Pass object size in rather than calling back for it
Pass the object size in to fscache_acquire_cookie() and
fscache_write_page() rather than the netfs providing a callback by which it
can be received.  This makes it easier to update the size of the object
when a new page is written that extends the object.

The current object size is also passed by fscache to the check_aux
function, obviating the need to store it in the aux data.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Anna Schumaker <anna.schumaker@netapp.com>
Tested-by: Steve Dickson <steved@redhat.com>
2018-04-06 14:05:14 +01:00
Jeff Moyer 3172485f4f block_invalidatepage(): only release page if the full page was invalidated
Prior to commit d47992f86b ("mm: change invalidatepage prototype to
accept length"), an offset of 0 meant that the full page was being
invalidated.  After that commit, we need to instead check the length.

Jan said:
:
: The only possible issue is that try_to_release_page() was called more
: often than necessary.  Otherwise the issue is harmless but still it's good
: to have this fixed.

Link: http://lkml.kernel.org/r/x49fu5rtnzs.fsf@segfault.boston.devel.redhat.com
Fixes: d47992f86b ("mm: change invalidatepage prototype to accept length")
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Lukas Czerner <lczerner@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-05 21:36:27 -07:00
Nikolay Borisov 1c0ff0f1bd fs/direct-io.c: minor cleanups in do_blockdev_direct_IO
We already get the block counts and calculate the end block at the
beginning of the function.  Let's use the local variables for
consistency and readability.  No functional changes

[akpm@linux-foundation.org: constify the locals to prevent future slipups]
Link: http://lkml.kernel.org/r/1519638870-17756-1-git-send-email-nborisov@suse.com
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-05 21:36:26 -07:00
shunki-fujita 849cf55963 fs: don't flush pagecache when expanding block device
When changing the size of a block device, its all caches are freed.
It's necessary on shrinking to prevent spurious I/Os to the disappeared
region.  However, on expanding, such kind of I/Os doesn't happen.

Similar things can be considered for btrfs filesystem resize and
resize2fs, but they are designed not to drop caches when expanding.
Therefore this patch removes unnecessary cache drop.

Link: http://lkml.kernel.org/r/1521457240-153390-1-git-send-email-shunki-fujita@cybozu.co.jp
Signed-off-by: Shunki Fujita <shunki-fujita@cybozu.co.jp>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-05 21:36:23 -07:00
Yiwen Jiang 7ff3c20468 fs/9p: don't set SB_NOATIME by default
When the user uses some syscall, for example mmap(v9fs_file_mmap), it
will not update atime even if user's was set mnt_flags without
MNT_NOATIME, because v9fs defaults to settine SB_NOATIME in
v9fs_set_super.

For supporting access time updating when the user mounts with relatime,
we should not set SB_NOATIME by default.

Link: http://lkml.kernel.org/r/5AB9A377.6080906@huawei.com
Signed-off-by: Yiwen Jiang <jiangyiwen@huawei.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Cc: Eric Van Hensbergen <ericvh@gmail.com>
Cc: Ron Minnich <rminnich@sandia.gov>
Cc: Latchesar Ionkov <lucho@ionkov.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-05 21:36:22 -07:00
Chengguang Xu a25c36577c 9p: check memory allocation result for cachetag
Check memory allocation result for cachetag in mount option parsing and
fix potential memory leak in the error case.

Link: http://lkml.kernel.org/r/1521614889-73446-1-git-send-email-cgxu519@gmx.com
Signed-off-by: Chengguang Xu <cgxu519@gmx.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Eric Van Hensbergen <ericvh@gmail.com>
Cc: Ron Minnich <rminnich@sandia.gov>
Cc: Latchesar Ionkov <lucho@ionkov.net>
Cc: <v9fs-developer@lists.sourceforge.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-05 21:36:22 -07:00
Eryu Guan ac89b2ef9b 9p: don't maintain dir i_nlink if the exported fs doesn't either
If the exported filesystem dir on 9p server doesn't maintain accurate
i_nlink count, e.g.  always reports i_nlink as 1, then 9p should not
maintain nlink count either, otherwise drop_link would report warning
with i_nlink being zero.

For example:

 - overlayfs sets nlink to 1 for merged dir

 - ext4 (with dir_nlink feature enabled) sets nlink to 1 if a dir has
   more than EXT4_LINK_MAX (65000) links.

In this case, everytime a stat(2) call (getattr) on such exported dirs
on 9p client side, the i_nlink gets reset to 1, then operations like
rmdir(2), unlink(2) and rename(2) would cause the dir nlink to go to
zero (then negative), which results in warnings in drop_nlink() and/or
inc_nlink() calls.

This can be reproduced easily as the following steps:

 - export a merged overlayfs dir via qemu virtfs to guest

 - mount the exported virtfs in guest

 - create two sub-directories in the root dir of the mounted 9pfs

 - stat the root dir of 9pfs, this resets nlink to 1

 - remove all subdirs, the second unlink/rmdir would trigger warning

  ------------[ cut here ]------------
  WARNING: CPU: 3 PID: 1284 at fs/inode.c:282 drop_nlink+0x3e/0x50
  ...
  Call Trace:
    dump_stack+0x63/0x81
    __warn+0xcb/0xf0
    warn_slowpath_null+0x1d/0x20
    drop_nlink+0x3e/0x50
    v9fs_remove+0xaa/0x130 [9p]
    v9fs_vfs_rmdir+0x13/0x20 [9p]
    vfs_rmdir+0xb7/0x130
    do_rmdir+0x1b8/0x230
    SyS_unlinkat+0x22/0x30
    do_syscall_64+0x67/0x180
  ---[ end trace 43758d8ba91e603b ]---

Fix it by leaving i_nlink to be 1 and don't drop nlink if a directory
has nlink <= 2, which indicates that the underlying exported fs doesn't
maintain nlink count accurately.  This follows what ext4 does in
ext4_dec_count().

Link: http://lkml.kernel.org/r/20180312053829.4367-1-eguan@linux.alibaba.com
Signed-off-by: Eryu Guan <eguan@linux.alibaba.com>
Reviewed-by: Yiwen Jiang <jiangyiwen@huawei.com>
Tested-by: Roman Kapl <code@rkapl.cz>
Cc: Caspar Zhang <caspar@linux.alibaba.com>
Cc: Eric Van Hensbergen <ericvh@gmail.com>
Cc: Ron Minnich <rminnich@sandia.gov>
Cc: Latchesar Ionkov <lucho@ionkov.net>
Cc: <v9fs-developer@lists.sourceforge.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-05 21:36:22 -07:00
Changwei Ge de37428638 ocfs2/dlm: clean up unused variable in dlm_process_recovery_data
Link: http://lkml.kernel.org/r/1522734135-7933-1-git-send-email-ge.changwei@h3c.com
Signed-off-by: Changwei Ge <ge.changwei@h3c.com>
Acked-by: Joseph Qi <jiangqi903@gmail.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <ge.changwei@h3c.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-05 21:36:22 -07:00
piaojun ba16ddfbeb ocfs2/o2hb: check len for bio_add_page() to avoid getting incorrect bio
We need to check len for bio_add_page() to make sure the bio has been
set up correctly, otherwise we may submit incorrect data to device.

Link: http://lkml.kernel.org/r/5ABC3EBE.5020807@huawei.com
Signed-off-by: Jun Piao <piaojun@huawei.com>
Reviewed-by: Yiwen Jiang <jiangyiwen@huawei.com>
Reviewed-by: Changwei Ge <ge.changwei@h3c.com>
Acked-by: Joseph Qi <jiangqi903@gmail.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <ge.changwei@h3c.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-05 21:36:22 -07:00
Gang He 39ec3774e3 ocfs2: add duplicated ino number check
Add duplicated ino number check, to avoid adding a file into the file
check list when this file is being checked.

Link: http://lkml.kernel.org/r/1495611866-27360-5-git-send-email-ghe@suse.com
Signed-off-by: Gang He <ghe@suse.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <jiangqi903@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-05 21:36:22 -07:00
Gang He 5f483c4abb ocfs2: add kobject for online file check
Use embedded kobject mechanism for online file check feature, this will
avoid to use a global list to save/search per-device online file check
related data, meanwhile, reduce the code lines and make the code logic
clear.  The changed code is based on Goldwyn Rodrigues's patches and
ext4 fs code.

Link: http://lkml.kernel.org/r/1495611866-27360-4-git-send-email-ghe@suse.com
Signed-off-by: Gang He <ghe@suse.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <jiangqi903@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-05 21:36:22 -07:00
Gang He 8fc2cb4ba0 ocfs2: fix some small problems
First, move setting fe_done = 1 in spin lock, avoid bring any potential
race condition.

Second, tune mlog message level from ERROR to NOTICE, since the message
should not belong to error message.

Third, tune errno to -EAGAIN when file check queue is full, this errno
is more appropriate in the case.

Link: http://lkml.kernel.org/r/1495611866-27360-3-git-send-email-ghe@suse.com
Signed-off-by: Gang He <ghe@suse.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <jiangqi903@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-05 21:36:22 -07:00
Gang He 5ac9438619 ocfs2: move some definitions to header file
Patch series "ocfs2: use kobject for online file check", v3.

Use embedded kobject mechanism for online file check feature, this will
avoid to use a global list to save/search per-device online file check
related data.  The changed code is based on Goldwyn Rodrigues's patches
and ext4 fs code, there is not any new features added, except some very
small fixes during this code refactoring.  Second, the code change does
not affect the underlying file check code.  Thank Goldwyn very much.

Compare with second version, add more comments in the patch
descriptions, to make sure each modification is mentioned.  Compare with
first version, split the code change into four patches, make sure each
patch will not bring ocfs2 kernel modules compiling errors.

This patch (of 3):

Move some definitions to header file, which will be referenced by other
source files when kobject mechanism is introduced.

Link: http://lkml.kernel.org/r/1495611866-27360-2-git-send-email-ghe@suse.com
Signed-off-by: Gang He <ghe@suse.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <jiangqi903@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-05 21:36:22 -07:00
Changwei Ge baa31b89b0 ocfs2: correct spelling mistake for migratable for all
Inspired by the ocfs2 patch to fix the spelling of migrateable to
migratable, I checked all ocfs2 files and found more spelling mistakes.
So correct them all.

Link: http://lkml.kernel.org/r/1521525734-19576-1-git-send-email-ge.changwei@h3c.com
Signed-off-by: Changwei Ge <ge.changwei@h3c.com>
Cc: Mark Fasheh <mfasheh@versity.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <joseph.qi@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-05 21:36:22 -07:00
Colin Ian King 510c48795d ocfs2: fix spelling mistake: "Migrateable" -> "Migratable"
Trivial fix to spelling mistake in mlog message text

Link: http://lkml.kernel.org/r/20180319114101.2051-1-colin.king@canonical.com
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <jiangqi903@gmail.com>
Cc: Changwei Ge <ge.changwei@h3c.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-05 21:36:22 -07:00
piaojun 60c7ec9ee4 ocfs2/dlm: wait for dlm recovery done when migrating all lock resources
Wait for dlm recovery done when migrating all lock resources in case that
new lock resource left after leaving dlm domain.  And the left lock
resource will cause other nodes BUG.

        NodeA                       NodeB                NodeC

  umount:
    dlm_unregister_domain()
      dlm_migrate_all_locks()

                                   NodeB down

  do recovery for NodeB
  and collect a new lockres
  form other live nodes:

    dlm_do_recovery
      dlm_remaster_locks
        dlm_request_all_locks:

    dlm_mig_lockres_handler
      dlm_new_lockres
        __dlm_insert_lockres

  at last NodeA become the
  master of the new lockres
  and leave domain:
    dlm_leave_domain()

                                                    mount:
                                                      dlm_join_domain()

                                                    touch file and request
                                                    for the owner of the new
                                                    lockres, but all the
                                                    other nodes said 'NO',
                                                    so NodeC decide to be
                                                    the owner, and send do
                                                    assert msg to other
                                                    nodes:
                                                    dlmlock()
                                                      dlm_get_lock_resource()
                                                        dlm_do_assert_master()

                                                    other nodes receive the msg
                                                    and found two masters exist.
                                                    at last cause BUG in
                                                    dlm_assert_master_handler()
                                                    -->BUG();

Link: http://lkml.kernel.org/r/5AAA6E25.7090303@huawei.com
Fixes: bc9838c4d4 ("dlm: allow dlm do recovery during shutdown")
Signed-off-by: Jun Piao <piaojun@huawei.com>
Reviewed-by: Alex Chen <alex.chen@huawei.com>
Reviewed-by: Yiwen Jiang <jiangyiwen@huawei.com>
Acked-by: Joseph Qi <jiangqi903@gmail.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <ge.changwei@h3c.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-05 21:36:22 -07:00
Changwei Ge a43d24cb3b ocfs2/dlm: clean up unused stack variable in dlm_do_local_ast()
Link: http://lkml.kernel.org/r/1521116681-14602-2-git-send-email-ge.changwei@h3c.com
Signed-off-by: Changwei Ge <ge.changwei@h3c.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joseph Qi <jiangqi903@gmail.com>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Changwei Ge <ge.changwei@h3c.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-05 21:36:22 -07:00
Changwei Ge 2bcb654c93 ocfs2/dlm: clean up unused argument for dlm_destroy_recovery_area()
Link: http://lkml.kernel.org/r/1521116681-14602-1-git-send-email-ge.changwei@h3c.com
Signed-off-by: Changwei Ge <ge.changwei@h3c.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joseph Qi <jiangqi903@gmail.com>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Changwei Ge <ge.changwei@h3c.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-05 21:36:22 -07:00
Changwei Ge 26ec1615ca fs/ocfs2/dlm/dlmrecovery.c: remove unrelated comment
Obviously, the comment before dlm_do_local_recovery_cleanup() has nothing
to do with it.  So remove it.

Link: http://lkml.kernel.org/r/1519371054-4648-1-git-send-email-ge.changwei@h3c.com
Signed-off-by: Changwei Ge <ge.changwei@h3c.com>
Acked-by: Joseph Qi <jiangqi903@gmail.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <ge.changwei@h3c.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-05 21:36:22 -07:00
Changwei Ge a012ab4d43 ocfs2: remove two unused functions from suballoc.c
The two functions are no longer used.

Link: http://lkml.kernel.org/r/1519609595-26229-1-git-send-email-ge.changwei@h3c.com
Signed-off-by: Changwei Ge <ge.changwei@h3c.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <jiangqi903@gmail.com>
Cc: Changwei Ge <ge.changwei@h3c.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-05 21:36:22 -07:00
piaojun a17b485aae ocfs2: remove unnecessary null pointer check before kmem_cache_destroy()
As kmem_cache_destroy() already handles null pointers, so we can remove
the conditional test entirely.

Link: http://lkml.kernel.org/r/5A9EB21D.3000209@huawei.com
Signed-off-by: Jun Piao <piaojun@huawei.com>
Reviewed-by: Yiwen Jiang <jiangyiwen@huawei.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <jiangqi903@gmail.com>
Cc: Changwei Ge <ge.changwei@h3c.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-05 21:36:22 -07:00
Jun Piao bb34f24c7d ocfs2/dlm: don't handle migrate lockres if already in shutdown
We should not handle migrate lockres if we are already in
'DLM_CTXT_IN_SHUTDOWN', as that will cause lockres remains after leaving
dlm domain.  At last other nodes will get stuck into infinite loop when
requsting lock from us.

The problem is caused by concurrency umount between nodes.  Before
receiveing N1's DLM_BEGIN_EXIT_DOMAIN_MSG, N2 has picked up N1 as the
migrate target.  So N2 will continue sending lockres to N1 even though
N1 has left domain.

        N1                             N2 (owner)
                                       touch file

    access the file,
    and get pr lock

                                       begin leave domain and
                                       pick up N1 as new owner

    begin leave domain and
    migrate all lockres done

                                       begin migrate lockres to N1

    end leave domain, but
    the lockres left
    unexpectedly, because
    migrate task has passed

[piaojun@huawei.com: v3]
  Link: http://lkml.kernel.org/r/5A9CBD19.5020107@huawei.com
Link: http://lkml.kernel.org/r/5A99F028.2090902@huawei.com
Signed-off-by: Jun Piao <piaojun@huawei.com>
Reviewed-by: Yiwen Jiang <jiangyiwen@huawei.com>
Reviewed-by: Joseph Qi <jiangqi903@gmail.com>
Reviewed-by: Changwei Ge <ge.changwei@h3c.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-05 21:36:22 -07:00
Jia Guo 1202d4ba28 ocfs2: keep the trace point consistent with the function name
Keep the trace point consistent with the function name.

Link: http://lkml.kernel.org/r/02609aba-84b2-a22d-3f3b-bc1944b94260@huawei.com
Fixes: 3ef045c3d8 ("ocfs2: switch to ->write_iter()")
Signed-off-by: Jia Guo <guojia12@huawei.com>
Reviewed-by: Jun Piao <piaojun@huawei.com>
Reviewed-by: Yiwen Jiang <jiangyiwen@huawei.com>
Reviewed-by: Alex Chen <alex.chen@huawei.com>
Acked-by: Gang He <ghe@suse.com>
Acked-by: Joseph Qi <jiangqi903@gmail.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <ge.changwei@h3c.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-05 21:36:21 -07:00
piaojun bb4c9d6765 ocfs2: remove some unused function declarations
Remove some unused function declarations in dlmcommon.h.

Link: http://lkml.kernel.org/r/5A7D1034.7050807@huawei.com
Signed-off-by: Jun Piao <piaojun@huawei.com>
Reviewed-by: Yiwen Jiang <jiangyiwen@huawei.com>
Reviewed-by: Changwei Ge <ge.changwei@h3c.com>
Reviewed-by: Alex Chen <alex.chen@huawei.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <jiangqi903@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-05 21:36:21 -07:00
piaojun d324cd4c80 ocfs2: use 'oi' instead of 'OCFS2_I()'
We could use 'oi' instead of 'OCFS2_I()' to make code more elegant.

Link: http://lkml.kernel.org/r/5A7020FE.5050906@huawei.com
Signed-off-by: Jun Piao <piaojun@huawei.com>
Reviewed-by: Yiwen Jiang <jiangyiwen@huawei.com>
Reviewed-by: Alex Chen <alex.chen@huawei.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <jiangqi903@gmail.com>
Cc: Changwei Ge <ge.changwei@h3c.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-05 21:36:21 -07:00
piaojun 1119d3c06f ocfs2: use 'osb' instead of 'OCFS2_SB()'
We could use 'osb' instead of 'OCFS2_SB()' to make code more elegant.

Link: http://lkml.kernel.org/r/5A702111.7090907@huawei.com
Signed-off-by: Jun Piao <piaojun@huawei.com>
Reviewed-by: Yiwen Jiang <jiangyiwen@huawei.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <jiangqi903@gmail.com>
Cc: Changwei Ge <ge.changwei@h3c.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-05 21:36:21 -07:00
Mike Kravetz 5df63c2a14 hugetlbfs: fix bug in pgoff overflow checking
This is a fix for a regression in 32 bit kernels caused by an invalid
check for pgoff overflow in hugetlbfs mmap setup.  The check incorrectly
specified that the size of a loff_t was the same as the size of a long.
The regression prevents mapping hugetlbfs files at offsets greater than
4GB on 32 bit kernels.

On 32 bit kernels conversion from a page based unsigned long can not
overflow a loff_t byte offset.  Therefore, skip this check if
sizeof(unsigned long) != sizeof(loff_t).

Link: http://lkml.kernel.org/r/20180330145402.5053-1-mike.kravetz@oracle.com
Fixes: 63489f8e82 ("hugetlbfs: check for pgoff value overflow")
Reported-by: Dan Rue <dan.rue@linaro.org>
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Tested-by: Anders Roxell <anders.roxell@linaro.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Yisheng Xie <xieyisheng1@huawei.com>
Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Nic Losby <blurbdust@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-05 21:36:21 -07:00
Linus Torvalds be88751f32 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull misc filesystem updates from Jan Kara:
 "udf, ext2, quota, fsnotify fixes & cleanups:

   - udf fixes for handling of media without uid/gid

   - udf fixes for some corner cases in parsing of volume recognition
     sequence

   - improvements of fsnotify handling of ENOMEM

   - new ioctl to allow setting of watch descriptor id for inotify (for
     checkpoint - restart)

   - small ext2, reiserfs, quota cleanups"

* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  quota: Kill an unused extern entry form quota.h
  reiserfs: Remove VLA from fs/reiserfs/reiserfs.h
  udf: fix potential refcnt problem of nls module
  ext2: change return code to -ENOMEM when failing memory allocation
  udf: Do not mark possibly inconsistent filesystems as closed
  fsnotify: Let userspace know about lost events due to ENOMEM
  fanotify: Avoid lost events due to ENOMEM for unlimited queues
  udf: Remove never implemented mount options
  udf: Update mount option documentation
  udf: Provide saner default for invalid uid / gid
  udf: Clean up handling of invalid uid/gid
  udf: Apply uid/gid mount options also to new inodes & chown
  udf: Ignore [ug]id=ignore mount options
  udf: Fix handling of Partition Descriptors
  udf: Unify common handling of descriptors
  udf: Convert descriptor index definitions to enum
  udf: Allow volume descriptor sequence to be terminated by unrecorded block
  udf: Simplify handling of Volume Descriptor Pointers
  udf: Fix off-by-one in volume descriptor sequence length
  inotify: Extend ioctl to allow to request id of new watch descriptor
2018-04-05 19:17:50 -07:00
Linus Torvalds 5e4d659713 Chuck Lever did a bunch of work on nfsd tracepoints, on RDMA, and on
server xdr decoding (with an eye towards eliminating a data copy in the
 RDMA case).
 
 I did some refactoring of the delegation code in preparation for
 eliminating some delegation self-conflicts and implementing write
 delegations.
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJaxi5LAAoJECebzXlCjuG+deAQAL9NHsv6bIydkE6wX305c/bR
 gm73yryF1kfOuHmiLq15mrljiKyCEbRSPqzzM8k2TkywHhyvOLEjxHbhZnwDyQec
 DaZUzWLKNkK64UFXEvTKyNwfGObwsGQ+QLkV7N9mF3Ps9M9/u2vMHKQypvA9hJ7z
 DGN7MO7Ud7N0Viu03vp4m+p7gypoWGFj6Sh1QAkR/7TE/supcS+qqOWU4vLpYFhu
 /l2gJym59FWqHajwqs0Qu9LpHfsEx5HySZbj7GczbGRMka3y/AnjgnngcriP+63B
 ZcPpqSdD4Yeq1OJklU5Wicy+u54rFkA9VE1EArrC9RAEwav6iMhhhaESpRH2JFJE
 SO7cgUSCb2+65XgeSBfDygn+09PcN+eRF3sxXkQpHKsozQOH+qdyr7F4/ePunwi8
 Ah7pIkczRUrj7gMmlNOg97wpHbffO4YnpRESA934qf7MMHRQwDsEkl512kFAyadZ
 g1DI3iByfUpBQvRWJSLasyjyWUqRZDMmyO3yi3i/08sMI3XE1IOWpNkJAooNYC1X
 1FCDn1VXlTdcmC8yw6Da1L05PCVf25tSjpYQZ6r25KtrVi9iBOGV741vmyVMvDEw
 OwUuVatRd+AY+YJ6iUraNH4SlnHWZ/qKDFJECWbrG/4uUHyo8FIXGAgL8RgtbS8+
 fQeRjaDyhHD0XH7TJ10x
 =fJI1
 -----END PGP SIGNATURE-----

Merge tag 'nfsd-4.17' of git://linux-nfs.org/~bfields/linux

Pull nfsd updates from Bruce Fields:
 "Chuck Lever did a bunch of work on nfsd tracepoints, on RDMA, and on
  server xdr decoding (with an eye towards eliminating a data copy in
  the RDMA case).

  I did some refactoring of the delegation code in preparation for
  eliminating some delegation self-conflicts and implementing write
  delegations"

* tag 'nfsd-4.17' of git://linux-nfs.org/~bfields/linux: (40 commits)
  nfsd: fix incorrect umasks
  sunrpc: remove incorrect HMAC request initialization
  NFSD: Clean up legacy NFS SYMLINK argument XDR decoders
  NFSD: Clean up legacy NFS WRITE argument XDR decoders
  nfsd: Trace NFSv4 COMPOUND execution
  nfsd: Add I/O trace points in the NFSv4 read proc
  nfsd: Add I/O trace points in the NFSv4 write path
  nfsd: Add "nfsd_" to trace point names
  nfsd: Record request byte count, not count of vectors
  nfsd: Fix NFSD trace points
  svc: Report xprt dequeue latency
  sunrpc: Report per-RPC execution stats
  sunrpc: Re-purpose trace_svc_process
  sunrpc: Save remote presentation address in svc_xprt for trace events
  sunrpc: Simplify trace_svc_recv
  sunrpc: Simplify do_enqueue tracing
  sunrpc: Move trace_svc_xprt_dequeue()
  sunrpc: Update show_svc_xprt_flags() to include recently added flags
  svc: Simplify ->xpo_secure_port
  sunrpc: Remove unneeded pointer dereference
  ...
2018-04-05 19:15:29 -07:00
Linus Torvalds 274c0e74e5 f2fs-for-4.17-rc1
In this round, we've mainly focused on performance tuning and critical bug fixes
 occurred in low-end devices. Sheng Yong introduced lost_found feature to keep
 missing files during recovery instead of thrashing them. We're preparing coming
 fsverity implementation. And, we've got more features to communicate with users
 for better performance. In low-end devices, some memory-related issues were
 fixed, and subtle race condtions and corner cases were addressed as well.
 
 Enhancement:
  - large nat bitmaps for more free node ids
  - add three block allocation policies to pass down write hints given by user
  - expose extension list to user and introduce hot file extension
  - tune small devices seamlessly for low-end devices
  - set readdir_ra by default
  - give more resources under gc_urgent mode regarding to discard and cleaning
  - introduce fsync_mode to enforce posix or not
  - nowait aio support
  - add lost_found feature to keep dangling inodes
  - reserve bits for future fsverity feature
  - add test_dummy_encryption for FBE
 
 Bug fix:
  - don't use highmem for dentry pages
  - align memory boundary for bitops
  - truncate preallocated blocks in write errors
  - guarantee i_times on fsync call
  - clear CP_TRIMMED_FLAG correctly
  - prevent node chain loop during recovery
  - avoid data race between atomic write and background cleaning
  - avoid unnecessary selinux violation warnings on resgid option
  - GFP_NOFS to avoid deadlock in quota and read paths
  - fix f2fs_skip_inode_update to allow i_size recovery
 
 In addition to them, there are several minor bug fixes and clean-ups.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE00UqedjCtOrGVvQiQBSofoJIUNIFAlrFmaYACgkQQBSofoJI
 UNKaSw//RZraDQmODfDieyjGAd8y1NgGDju1jgk62QJUqe/G3jIKN/A39HU3I2Ak
 jLIotwZ8j1yNBUGbXMXZTY+6sRkQGJrcgawgvZ0gwWSeqTAiCi+AbqnNysTv8pDp
 Rh1iMCGvNKv1/vbFizJumcK2haH/KzX9Z7uDBjji4N7tTT1nUGVz+vmUO/qS6Xi0
 44BrEN1rhaF71JNaiQE0WuEc0GY2IU0W7l4IWirJeNqsK0BTVPnq1vxc8SsQQRmn
 Wdr4rUw2SRAynGMCQZlMW/sG0sxV0WMAE0xVE8WCFMXzHx2GNVakUgXMTZ2DgUKT
 JlgdsB5NQYrZJqSEAdNq3WclJpznlhOLHp8hjITxSRxhd2j7GH7xujUoGSsNiXDr
 n2s3Xg6wWMx+D4vsfXYJuZQKYqZUcBlPRgXa74/aXnsxRTLaormJtFNWkLqELIzR
 Qaw7XIhZoOyKkH7CPSWk0eYyQVi13l9QV70iyBfY4bcN2aLZLxTX2Axwws3ESOCt
 Nf0jIe3L4k7MSdoMR0K51rhCMlPWDuddnQSs9Vsw3w8U5pVph+BKG5RBUI9N4hPv
 U0gKJfj8LIaxhHTfborEHf+hjq/0sN0TBPFBsAtqvv1LBXkMeWEeo0wXtxSJC89b
 3xBRQ0p4Nox4fEHs36AZID2OMOV8iamiEqzP/o9ja0sMzEjnYmo=
 =Ligi
 -----END PGP SIGNATURE-----

Merge tag 'f2fs-for-4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs

Pull f2fs update from Jaegeuk Kim:
 "In this round, we've mainly focused on performance tuning and critical
  bug fixes occurred in low-end devices. Sheng Yong introduced
  lost_found feature to keep missing files during recovery instead of
  thrashing them. We're preparing coming fsverity implementation. And,
  we've got more features to communicate with users for better
  performance. In low-end devices, some memory-related issues were
  fixed, and subtle race condtions and corner cases were addressed as
  well.

  Enhancements:
   - large nat bitmaps for more free node ids
   - add three block allocation policies to pass down write hints given by user
   - expose extension list to user and introduce hot file extension
   - tune small devices seamlessly for low-end devices
   - set readdir_ra by default
   - give more resources under gc_urgent mode regarding to discard and cleaning
   - introduce fsync_mode to enforce posix or not
   - nowait aio support
   - add lost_found feature to keep dangling inodes
   - reserve bits for future fsverity feature
   - add test_dummy_encryption for FBE

  Bug fixes:
   - don't use highmem for dentry pages
   - align memory boundary for bitops
   - truncate preallocated blocks in write errors
   - guarantee i_times on fsync call
   - clear CP_TRIMMED_FLAG correctly
   - prevent node chain loop during recovery
   - avoid data race between atomic write and background cleaning
   - avoid unnecessary selinux violation warnings on resgid option
   - GFP_NOFS to avoid deadlock in quota and read paths
   - fix f2fs_skip_inode_update to allow i_size recovery

  In addition to the above, there are several minor bug fixes and clean-ups"

* tag 'f2fs-for-4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (50 commits)
  f2fs: remain written times to update inode during fsync
  f2fs: make assignment of t->dentry_bitmap more readable
  f2fs: truncate preallocated blocks in error case
  f2fs: fix a wrong condition in f2fs_skip_inode_update
  f2fs: reserve bits for fs-verity
  f2fs: Add a segment type check in inplace write
  f2fs: no need to initialize zero value for GFP_F2FS_ZERO
  f2fs: don't track new nat entry in nat set
  f2fs: clean up with F2FS_BLK_ALIGN
  f2fs: check blkaddr more accuratly before issue a bio
  f2fs: Set GF_NOFS in read_cache_page_gfp while doing f2fs_quota_read
  f2fs: introduce a new mount option test_dummy_encryption
  f2fs: introduce F2FS_FEATURE_LOST_FOUND feature
  f2fs: release locks before return in f2fs_ioc_gc_range()
  f2fs: align memory boundary for bitops
  f2fs: remove unneeded set_cold_node()
  f2fs: add nowait aio support
  f2fs: wrap all options with f2fs_sb_info.mount_opt
  f2fs: Don't overwrite all types of node to keep node chain
  f2fs: introduce mount option for fsync mode
  ...
2018-04-05 19:12:55 -07:00
Linus Torvalds 3526dd0c78 for-4.17/block-20180402
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABCAAGBQJawr05AAoJEPfTWPspceCmT2UP/1uuaqwzyl4VjFNb/k7KS7UM
 +Cs/1HBlGomgMA8orDTGqtWqLRdR3z4RSh0+MvXTzQ78HpFVYz7CbDc9itHm+G9M
 X0ypD4kF/JGCFb5cxk+x6qv28uO2nv4DP3+0hHqJWLH4UVJBWDY6bs4BPShsf9QB
 I6XjioNMhoqylXgdOITLODJZz+TcChlJMDAqwhpJwh9TH1wjobleAZ6AdmCPfgi5
 h0UCKMUKzcVJlNZwQUrzrs2cxcx9Uhunnbz7HK0ZV4n/FKFtDpGynFpQQ71pZxKe
 Be0ZOBPCQvC3ykOM/egCIvC/e5y7FgrjORD6jxyu1PTwAugI5E1VYSMxHkXvgPAx
 zOo9A7RT4GPO2tDQv+DbzNFpqeSAclTgSmr+/y1wmheBs8DiSt7MPVBiNM4zdCNv
 NLk9z7IEjFhdmluSB/LbTb1aokypMb/q7QTLouPHdwGn80k7yrhFyLHgdjpNTQ2K
 UHfHZvGxkOX6SmFhBNOtIFUkuSceenh64a0RkRle7filx+ImpbCVm2/GYi9zZNCu
 EtctgzLbLmz40zMiyDaZS2bxBgGzfn6yf4xd9LsaAJPMhvZnmXogT0D9ctWXB0WU
 mMaS7sOkLnNjnGkzF1fHkeiZ/oigrstJbe+CA7BtOdwxpWn6MZBgKEoFQ6iA2b3X
 5J1axMgVH5LAsIEcEQVq
 =RVhK
 -----END PGP SIGNATURE-----

Merge tag 'for-4.17/block-20180402' of git://git.kernel.dk/linux-block

Pull block layer updates from Jens Axboe:
 "It's a pretty quiet round this time, which is nice. This contains:

   - series from Bart, cleaning up the way we set/test/clear atomic
     queue flags.

   - series from Bart, fixing races between gendisk and queue
     registration and removal.

   - set of bcache fixes and improvements from various folks, by way of
     Michael Lyle.

   - set of lightnvm updates from Matias, most of it being the 1.2 to
     2.0 transition.

   - removal of unused DIO flags from Nikolay.

   - blk-mq/sbitmap memory ordering fixes from Omar.

   - divide-by-zero fix for BFQ from Paolo.

   - minor documentation patches from Randy.

   - timeout fix from Tejun.

   - Alpha "can't write a char atomically" fix from Mikulas.

   - set of NVMe fixes by way of Keith.

   - bsg and bsg-lib improvements from Christoph.

   - a few sed-opal fixes from Jonas.

   - cdrom check-disk-change deadlock fix from Maurizio.

   - various little fixes, comment fixes, etc from various folks"

* tag 'for-4.17/block-20180402' of git://git.kernel.dk/linux-block: (139 commits)
  blk-mq: Directly schedule q->timeout_work when aborting a request
  blktrace: fix comment in blktrace_api.h
  lightnvm: remove function name in strings
  lightnvm: pblk: remove some unnecessary NULL checks
  lightnvm: pblk: don't recover unwritten lines
  lightnvm: pblk: implement 2.0 support
  lightnvm: pblk: implement get log report chunk
  lightnvm: pblk: rename ppaf* to addrf*
  lightnvm: pblk: check for supported version
  lightnvm: implement get log report chunk helpers
  lightnvm: make address conversions depend on generic device
  lightnvm: add support for 2.0 address format
  lightnvm: normalize geometry nomenclature
  lightnvm: complete geo structure with maxoc*
  lightnvm: add shorten OCSSD version in geo
  lightnvm: add minor version to generic geometry
  lightnvm: simplify geometry structure
  lightnvm: pblk: refactor init/exit sequences
  lightnvm: Avoid validation of default op value
  lightnvm: centralize permission check for lightnvm ioctl
  ...
2018-04-05 14:27:02 -07:00
Linus Torvalds 672a9c1069 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
Pull trivial tree updates from Jiri Kosina.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial:
  kfifo: fix inaccurate comment
  tools/thermal: tmon: fix for segfault
  net: Spelling s/stucture/structure/
  edd: don't spam log if no EDD information is present
  Documentation: Fix early-microcode.txt references after file rename
  tracing: Block comments should align the * on each line
  treewide: Fix typos in printk
  GenWQE: Fix a typo in two comments
  treewide: Align function definition open/close braces
2018-04-05 11:56:35 -07:00
Nikolay Borisov 1e1c50a929 btrfs: Fix possible softlock on single core machines
do_chunk_alloc implements a loop checking whether there is a pending
chunk allocation and if so causes the caller do loop. Generally this
loop is executed only once, however testing with btrfs/072 on a single
core vm machines uncovered an extreme case where the system could loop
indefinitely. This is due to a missing cond_resched when loop which
doesn't give a chance to the previous chunk allocator finish its job.

The fix is to simply add the missing cond_resched.

Fixes: 6d74119f1a ("Btrfs: avoid taking the chunk_mutex in do_chunk_alloc")
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-04-05 19:22:35 +02:00
Liu Bo b98def7ca6 Btrfs: bail out on error during replay_dir_deletes
If errors were returned by btrfs_next_leaf(), replay_dir_deletes needs
to bail out, otherwise @ret would be forced to be 0 after 'break;' and
the caller won't be aware of it.

Fixes: e02119d5a7 ("Btrfs: Add a write ahead tree log to optimize synchronous operations")
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-04-05 19:22:26 +02:00
Liu Bo 80c0b4210a Btrfs: fix NULL pointer dereference in log_dir_items
0, 1 and <0 can be returned by btrfs_next_leaf(), and when <0 is
returned, path->nodes[0] could be NULL, log_dir_items lacks such a
check for <0 and we may run into a null pointer dereference panic.

Fixes: e02119d5a7 ("Btrfs: Add a write ahead tree log to optimize synchronous operations")
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-04-05 19:22:17 +02:00
Linus Torvalds 06dd3dfeea Char/Misc patches for 4.17-rc1
Here is the big set of char/misc driver patches for 4.17-rc1.
 
 There are a lot of little things in here, nothing huge, but all
 important to the different hardware types involved:
 	- thunderbolt driver updates
 	- parport updates (people still care...)
 	- nvmem driver updates
 	- mei updates (as always)
 	- hwtracing driver updates
 	- hyperv driver updates
 	- extcon driver updates
 	- and a handfull of even smaller driver subsystem and individual
 	  driver updates
 
 All of these have been in linux-next with no reported issues.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCWsShSQ8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ykNqwCfUbfvopswb1PesHCLABDBsFQChgoAniDa6pS9
 kI8TN5MdLN85UU27Mkb6
 =BzFR
 -----END PGP SIGNATURE-----

Merge tag 'char-misc-4.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc

Pull char/misc updates from Greg KH:
 "Here is the big set of char/misc driver patches for 4.17-rc1.

  There are a lot of little things in here, nothing huge, but all
  important to the different hardware types involved:

   -  thunderbolt driver updates

   -  parport updates (people still care...)

   -  nvmem driver updates

   -  mei updates (as always)

   -  hwtracing driver updates

   -  hyperv driver updates

   -  extcon driver updates

   -  ... and a handful of even smaller driver subsystem and individual
      driver updates

  All of these have been in linux-next with no reported issues"

* tag 'char-misc-4.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (149 commits)
  hwtracing: Add HW tracing support menu
  intel_th: Add ACPI glue layer
  intel_th: Allow forcing host mode through drvdata
  intel_th: Pick up irq number from resources
  intel_th: Don't touch switch routing in host mode
  intel_th: Use correct method of finding hub
  intel_th: Add SPDX GPL-2.0 header to replace GPLv2 boilerplate
  stm class: Make dummy's master/channel ranges configurable
  stm class: Add SPDX GPL-2.0 header to replace GPLv2 boilerplate
  MAINTAINERS: Bestow upon myself the care for drivers/hwtracing
  hv: add SPDX license id to Kconfig
  hv: add SPDX license to trace
  Drivers: hv: vmbus: do not mark HV_PCIE as perf_device
  Drivers: hv: vmbus: respect what we get from hv_get_synint_state()
  /dev/mem: Avoid overwriting "err" in read_mem()
  eeprom: at24: use SPDX identifier instead of GPL boiler-plate
  eeprom: at24: simplify the i2c functionality checking
  eeprom: at24: fix a line break
  eeprom: at24: tweak newlines
  eeprom: at24: refactor at24_probe()
  ...
2018-04-04 20:07:20 -07:00
Linus Torvalds 9abf8acea2 TTY/Serial driver patches for 4.17-rc1
Here is the big set of tty and serial driver patches for 4.17-rc1
 
 Not all that big really, most are just small fixes and additions to
 existing drivers.  There's a bunch of work on the imx serial driver
 recently for some reason, and a new embedded serial driver added as
 well.
 
 Full details are in the shortlog.
 
 All of these have been in the linux-next tree for a while with no
 reported issues.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCWsSn+w8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ynb6wCdEf5dAUrSB37ptZY78n4kc6nI6NAAniDO+rjL
 ppZZp7QTIB/bnPfW8cOH
 =+tfY
 -----END PGP SIGNATURE-----

Merge tag 'tty-4.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty

Pull tty/serial driver updates from Greg KH:
 "Here is the big set of tty and serial driver patches for 4.17-rc1

  Not all that big really, most are just small fixes and additions to
  existing drivers. There's a bunch of work on the imx serial driver
  recently for some reason, and a new embedded serial driver added as
  well.

  Full details are in the shortlog.

  All of these have been in the linux-next tree for a while with no
  reported issues"

* tag 'tty-4.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (66 commits)
  serial: expose buf_overrun count through proc interface
  serial: mvebu-uart: fix tx lost characters
  tty: serial: msm_geni_serial: Fix return value check in qcom_geni_serial_probe()
  tty: serial: msm_geni_serial: Add serial driver support for GENI based QUP
  8250-men-mcb: add support for 16z025 and 16z057
  powerpc: Mark the variable earlycon_acpi_spcr_enable maybe_unused
  serial: stm32: fix initialization of RS485 mode
  ARM: dts: STi: Remove "console=ttyASN" from bootargs for STi boards
  vt: change SGR 21 to follow the standards
  serdev: Fix typo in serdev_device_alloc
  ARM: dts: STi: Fix aliases property name for STi boards
  tty: st-asc: Update tty alias
  serial: stm32: add support for RS485 hardware control mode
  dt-bindings: serial: stm32: add RS485 optional properties
  selftests: add devpts selftests
  devpts: comment devpts_mntget()
  devpts: resolve devpts bind-mounts
  devpts: hoist out check for DEVPTS_SUPER_MAGIC
  serial: 8250: Add Nuvoton NPCM UART
  serial: mxs-auart: disable clks of Alphascale ASM9260
  ...
2018-04-04 18:43:49 -07:00
Jiang Biao 4fb1cd8230 ubifs: Remove useless parameter of lpt_heap_replace
The parameter *old_lprops* is never used in lpt_heap_replace(),
remove it to avoid compile warning.

Signed-off-by: Jiang Biao <jiang.biao2@zte.com.cn>
Signed-off-by: Richard Weinberger <richard@nod.at>
2018-04-04 23:48:11 +02:00
Jiang Biao cc19478348 ubifs: Constify struct ubifs_lprops in scan_for_leb_for_idx
Constify struct ubifs_lprops in scan_for_leb_for_idx to be
consistent with other references.

Signed-off-by: Jiang Biao <jiang.biao2@zte.com.cn>
Signed-off-by: Richard Weinberger <richard@nod.at>
2018-04-04 23:48:10 +02:00
Stefan Agner ae4c8081eb ubifs: remove unnecessary assignment
Assigning a value of a variable to itself is not useful. This
fixes a warning shown when using clang:
  warning: explicitly assigning value of variable of type 'int' to itself [-Wself-assign]

Signed-off-by: Stefan Agner <stefan@agner.ch>
Signed-off-by: Richard Weinberger <richard@nod.at>
2018-04-04 23:48:10 +02:00
Richard Weinberger aac17948a7 ubifs: Check ubifs_wbuf_sync() return code
If ubifs_wbuf_sync() fails we must not write a master node with the
dirty marker cleared.
Otherwise it is possible that in case of an IO error while syncing we
mark the filesystem as clean and UBIFS refuses to recover upon next
mount.

Cc: <stable@vger.kernel.org>
Fixes: 1e51764a3c ("UBIFS: add new flash file system")
Signed-off-by: Richard Weinberger <richard@nod.at>
2018-04-04 23:41:44 +02:00
Linus Torvalds 3e968c9f14 Cleanups and bugfixes for ext4, including some fixes to make ext4 more
robust against maliciously crafted file system images.  (I still don't
 recommend that container folks hold any delusions that mounting
 arbitary images that can be crafted by malicious attackers should be
 considered sane thing to do, though!)
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEK2m5VNv+CHkogTfJ8vlZVpUNgaMFAlrE13oACgkQ8vlZVpUN
 gaM/fQf/bw5IDWNZ9vAYNEBDsu4Je14XVTZVgpDFESk5BGjdVusAv562QCQjC3Pf
 O3uqRSh7DbulICzYGy0BmjM7Gw0Kj5WhfDjPbVXhnGoNpF7NUEhNos8xWLRWukky
 P8mPxv8cj2oprtfM1Cy8mtl0KT8jywkOahq6OmRo6+F1V32IzQAfpPpItIw9lCv8
 JcFgc/2K9zjAG89TR/vWChi8AGhNUXsYEYt+l3tMu+CRC+FWaJ7aEGPn9pmOlqOV
 svHG9skImvINOO4FPydOu+yw9spqFKBX9NO0J9MsCAHmmmKW2GW3RANGIhY9kgBb
 a/T+/cQZTTc+Xu6VzRf4e/SGUa5mgA==
 =HndN
 -----END PGP SIGNATURE-----

Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 updates from Ted Ts'o:
 "Cleanups and bugfixes for ext4, including some fixes to make ext4 more
  robust against maliciously crafted file system images.

  (I still don't recommend that container folks hold any delusions that
  mounting arbitary images that can be crafted by malicious attackers
  should be considered sane thing to do, though!)"

* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (29 commits)
  ext4: force revalidation of directory pointer after seekdir(2)
  ext4: add extra checks to ext4_xattr_block_get()
  ext4: add bounds checking to ext4_xattr_find_entry()
  ext4: move call to ext4_error() into ext4_xattr_check_block()
  ext4: don't show data=<mode> option if defaulted
  ext4: omit init_itable=n in procfs when disabled
  ext4: show more binary mount options in procfs
  ext4: simplify kobject usage
  ext4: remove unused parameters in sysfs code
  ext4: null out kobject* during sysfs cleanup
  ext4: don't allow r/w mounts if metadata blocks overlap the superblock
  ext4: always initialize the crc32c checksum driver
  ext4: fail ext4_iget for root directory if unallocated
  ext4: limit xattr size to INT_MAX
  ext4: add validity checks for bitmap block numbers
  ext4: fix comments in ext4_swap_extents()
  ext4: use generic_writepages instead of __writepage/write_cache_pages
  ext4: don't complain about incorrect features when probing
  ext4: remove EXT4_STATE_DIOREAD_LOCK flag
  ext4: fix offset overflow on 32-bit archs in ext4_iomap_begin()
  ...
2018-04-04 14:19:24 -07:00
Linus Torvalds a8f8e8ac76 various SMB3/CIFS fixes for 4.17
-----BEGIN PGP SIGNATURE-----
 
 iQGwBAABCAAaBQJaxRzZExxzbWZyZW5jaEBnbWFpbC5jb20ACgkQiiy9cAdyT1Fb
 FQv/Rd/5CrYhZumBrPvFW2jcbRQ2ANTnSRTA3rpd/jJM52DZ7nvcePr/qm9wRLrT
 puMfd8e0a4Df5Mo/ns806iphRtYctKpMKLnkBqPL0WqrXLYSi/Nz3wy/DFyuh3C7
 U22gDYAjQ4dy6Am0CG/y4i1h8D0hRkmMS6PQECpjmNwqjtmfZn5kWJRv+W5UNNj9
 QPldz5PdyNpPw7DxDRetl5uGqUKqsvUATo109hL7ks97qgHUzMHeXWmQpOSS+exh
 P7tPNphIPJYM2VG+uDvIg15l00lgQxzzN0uOs+x7ZDnZ1Bil/a3So823SfJyXNU2
 utJNWSuN/OSHUCmmd7yn+rLd2oJa55+U+Bb3gWaZ8beP639d8P1kEF/isJzu6ede
 gh92lyU2ecfyHNbjKzAbQwwQxnkmMC5XGhP/2+eawsCyo+vk/NQR4NIehqawj/OB
 eQSRnT0vNgi4p4OXgJ3FpBBKNBFtTmWrxqKyl0U9C5nw+YdxBdkr16qlFi2b9DjC
 bW0/
 =VW5l
 -----END PGP SIGNATURE-----

Merge tag '4.17-SMB3-Fixes' of git://git.samba.org/sfrench/cifs-2.6

Pull cifs updates from Steve French:
 "Includes SMB3.11 security improvements, as well as various fixes for
  stable and some debugging improvements"

* tag '4.17-SMB3-Fixes' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: Add minor debug message during negprot
  smb3: Fix root directory when server returns inode number of zero
  cifs: fix sparse warning on previous patch in a few printks
  cifs: add server->vals->header_preamble_size
  cifs: smbd: disconnect transport on RDMA errors
  cifs: smbd: avoid reconnect lockup
  Don't log confusing message on reconnect by default
  Don't log expected error on DFS referral request
  fs: cifs: Replace _free_xid call in cifs_root_iget function
  SMB3.1.1 dialect is no longer experimental
  Tree connect for SMB3.1.1 must be signed for non-encrypted shares
  fix smb3-encryption breakage when CONFIG_DEBUG_SG=y
  CIFS: fix sha512 check in cifs_crypto_secmech_release
  CIFS: implement v3.11 preauth integrity
  CIFS: add sha512 secmech
  CIFS: refactor crypto shash/sdesc allocation&free
  Update README file for cifs.ko
  Update TODO list for cifs.ko
  cifs: fix memory leak in SMB2_open()
  CIFS: SMBD: fix spelling mistake: "faield" and "legnth"
2018-04-04 14:09:27 -07:00
Boris Brezillon fe5f31a801 Linux 4.16-rc2
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAlqKKI0eHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGRNAH/0v3+nuJ0oiHE1Cl
 fH89F9Ma17j8oTo28byRPi7X5XJfJAqANhHa209rguvnC27y3ew/l9k93HoxG12i
 ttvyKFDQulQbytfJZXw8lhUyYGXVsTpyNaihPe/NtqPdIxNgfrXsUN9EIEtcnuS2
 SiAj51jUySDRNR4ST6TOx4ulDm1zLrmA28WHOBNOTvDi4jTQMt1TsngHfF5AySBB
 lD4RTRDDwWDWtdMI7euYSq019TiDXCxmwQ94vZjrqmjmSQcl/yCK/JzEV33SZslg
 4WqGIllxONvP/UlwxZLaJ+RrslqxNgDVqQKwJdfYhGaWvpgPFtS1s86zW6IgyXny
 02jJfD0=
 =DLWn
 -----END PGP SIGNATURE-----

Merge tag 'v4.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux into mtd/next

Backmerge v4.16-rc2 into mtd/next to resolve a conflict between Linus'
master branch and nand/for-4.17.
2018-04-04 22:13:35 +02:00
Linus Torvalds 2bd99df54f We've only got 9 GFS2 patches for this merge window:
1. Abhi Das contributed a patch to report journal recovery times more
    accurately during journal replay.
 2. Andreas Gruenbacher contributed a patch to fix fallocate chunk size.
 3. Andreas added a patch to correctly dirty inodes during rename.
 4. Andreas added a patch to improve the comment for function gfs2_block_map.
 5. Andreas added a patch to improve kernel trace point iomap end:
    The physical block address was added.
 6. Andreas added a patch to fix a nasty file system corruption bug that
    surfaced in xfstests 476 in punch-hole/truncate.
 7. Andreas fixed a problem Christoph Helwig pointed out, namely, that GFS2
    was misusing the IOMAP_ZERO flag. The zeroing of new blocks was moved
    to the proper fallocate code.
 8. I contributed a patch to declare function gfs2_remove_from_ail as static.
 9. I added a patch to only set PageChecked for jdata page writes.
 -----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJaw7mxAAoJENeLYdPf93o7kbUH/3MVpJKJe+jdtb5zMGE+byPZ
 T4x+kjZcGwIjWtqIMnga2X1mMIpJ/MCQzZriT+zno2BRRxPS8MQsKp9rcmEGiwoP
 mKZnVGm85xvBDg/JaAnCOeH2S90BlXX3roaB4U4031tAmcZOz/tVbBiTCBXbZ7FP
 F9tnKt/CkRwBBE8CBvstpgYYvbM+kkfiQ8x38FJFx79lOBytIRhPCHpbR3mgko+p
 HrJNTadGJie65nsplKZcQpDQTPrslwP/ynyGh313ReztThHfCB6teL6sNOgJ5u+T
 ThCA95QesH3P45Dd+xl/SuCDd0pZLLslNgOl/q7qiFWaoxFFBKldN3Bnc7ItnDw=
 =iw4e
 -----END PGP SIGNATURE-----

Merge tag 'gfs2-4.17.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2

Pull gfs2 updates from Bob Peterson:
 "We've only got nine GFS2 patches for this merge window:

   - report journal recovery times more accurately during journal replay
     (Abhi Das)

   - fix fallocate chunk size (Andreas Gruenbacher)

   - correctly dirty inodes during rename (Andreas Gruenbacher)

   - improve the comment for function gfs2_block_map (Andreas
     Gruenbacher)

   - improve kernel trace point iomap end: The physical block address
     was added (Andreas Gruenbacher)

   - fix a nasty file system corruption bug that surfaced in xfstests
     476 in punch-hole/truncate (Andreas Gruenbacher)

   - fix a problem Christoph Helwig pointed out, namely, that GFS2 was
     misusing the IOMAP_ZERO flag. The zeroing of new blocks was moved
     to the proper fallocate code (Andreas Gruenbacher)

   - declare function gfs2_remove_from_ail as static (Bob Peterson)

   - only set PageChecked for jdata page writes (Bob Peterson)"

* tag 'gfs2-4.17.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
  gfs2: time journal recovery steps accurately
  gfs2: Zero out fallocated blocks in fallocate_chunk
  gfs2: Check for the end of metadata in punch_hole
  gfs2: gfs2_iomap_end tracepoint: log block address
  gfs2: Improve gfs2_block_map comment
  GFS2: Only set PageChecked for jdata pages
  GFS2: Make function gfs2_remove_from_ail static
  gfs2: Dirty source inode during rename
  gfs2: Fix fallocate chunk size
2018-04-04 13:09:42 -07:00
Linus Torvalds 94514bbe9e for-4.17-tag
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAlrDg0UACgkQxWXV+ddt
 WDvtOQ//QCk0zH2EcPQnqOW6HqAfkkDc7D9P51sK1izNM3vBEYtbuPlY6wp3xnJr
 0hbPjGNU7vMC/4SIwSgEdXyAvlpOr1gm9n+w1GcxpkjLa8l4P+2wt9OX0BSzRUMu
 X7LQxqg2zmQibFy4b1MSmDGsO2dxB2eqvVUT/Ir4b56uqkdtValYRWY75APJIZ5l
 6w0Ja3HVvgOX3pVwSmadCpfMEonN4JE+mfHaP8RajAlTGQcUPq8If9w4BtEoWQRl
 QC7kUCCTmp+isnzH7u4EqEQC6XUqEqeuQH+Bli1pNYTvipHY+9EO1dSxHZoCelgk
 M9PpQz8x+N6ZMcMNtJQVifkfN6tAp/acdWBTZtlpqB8nZR4v5bBndS/5TNBMu3/v
 JfhMEsIUz5o2mWz9qUneyK80RoTRkL5SfdgOclx2Yd7K1fKuJsUjmdXaT08BzotS
 5cxTFZu7EcFJyg4eHemjdyRYr3cUkS19P2uIJle9nj3DAMpCvIyL1c1vn5eB/7MN
 3JeRME6AOQcD7sFgVNuYhGVdIBuwHU6kj4mf1WN27YKGdbsaMZsFz7/HH3SsBLOF
 E0p7Q25HcHSrAimcmLTifN+gol8Y5m6dT0Pjuf9y9QbWvHwh7oRq7DVQ3L8hJkGz
 j64b0jlv43P0QorWvsX6/VJBevWedhe1hZtrBhPFSE0CtJ3Ml4Q=
 =wkSk
 -----END PGP SIGNATURE-----

Merge tag 'for-4.17-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs updates from David Sterba:
 "There are a several user visible changes, the rest is mostly invisible
  and continues to clean up the whole code base.

  User visible changes:
   - new mount option nossd_spread (pair for ssd_spread)

   - mount option subvolid will detect junk after the number and fail
     the mount

   - add message after cancelled device replace

   - direct module dependency on libcrc32, removed own crc wrappers

   - removed user space transaction ioctls

   - use lighter locking when reading /proc/self/mounts, RCU instead of
     mutex to avoid unnecessary contention

  Enhancements:
   - skip writeback of last page when truncating file to same size

   - send: do not issue unnecessary truncate operations

   - mount option token specifiers: use %u for unsigned values, more
     validation

   - selftests: more tree block validations

  qgroups:
   - preparatory work for splitting reservation types for data and
     metadata, this should allow for more accurate tracking and fix some
     issues with underflows or do further enhancements

   - split metadata reservations for started and joined transaction so
     they do not get mixed up and are accounted correctly at commit time

   - with the above, it's possible to revert patch that potentially
     deadlocks when trying to make more space by explicitly committing
     when the quota limit is hit

   - fix root item corruption when multiple same source snapshots are
     created with quota enabled

  RAID56:
   - make sure target is identical to source when raid56 rebuild fails
     after dev-replace

   - faster rebuild during scrub, batch by stripes and not
     block-by-block

   - make more use of cached data when rebuilding from a missing device

  Fixes:
   - null pointer deref when device replace target is missing

   - fix fsync after hole punching when using no-holes feature

   - fix lockdep splat when allocating percpu data with wrong GFP flags

  Cleanups, refactoring, core changes:
   - drop redunant parameters from various functions

   - kill and opencode trivial helpers

   - __cold/__exit function annotations

   - dead code removal

   - continued audit and documentation of memory barriers

   - error handling: handle removal from uuid tree

   - error handling: remove handling of impossible condtitons

   - more debugging or error messages

   - updated tracepoints

   - one VLA use removal (and one still left)"

* tag 'for-4.17-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (164 commits)
  btrfs: lift errors from add_extent_changeset to the callers
  Btrfs: print error messages when failing to read trees
  btrfs: user proper type for btrfs_mask_flags flags
  btrfs: split dev-replace locking helpers for read and write
  btrfs: remove stale comments about fs_mutex
  btrfs: use RCU in btrfs_show_devname for device list traversal
  btrfs: update barrier in should_cow_block
  btrfs: use lockdep_assert_held for mutexes
  btrfs: use lockdep_assert_held for spinlocks
  btrfs: Validate child tree block's level and first key
  btrfs: tests/qgroup: Fix wrong tree backref level
  Btrfs: fix copy_items() return value when logging an inode
  Btrfs: fix fsync after hole punching when using no-holes feature
  btrfs: use helper to set ulist aux from a qgroup
  Revert "btrfs: qgroups: Retry after commit on getting EDQUOT"
  btrfs: qgroup: Update trace events for metadata reservation
  btrfs: qgroup: Use root::qgroup_meta_rsv_* to record qgroup meta reserved space
  btrfs: delayed-inode: Use new qgroup meta rsv for delayed inode and item
  btrfs: qgroup: Use separate meta reservation type for delalloc
  btrfs: qgroup: Introduce function to convert META_PREALLOC into META_PERTRANS
  ...
2018-04-04 13:03:38 -07:00
Linus Torvalds 547c43d777 Changes for this release:
- Various cleanups and code fixes
 - Implement lazytime as a mount option
 - Convert various on-disk metadata checks from asserts to -EFSCORRUPTED
 - Fix accounting problems with the rmap per-ag reservations
 - Refactorings and cleanups for xfs_log_force
 - Various bugfixes for the reflink code
 - Work around v5 AGFL padding problems to prevent fs shutdowns
 - Establish inode fork verifiers to inspect on-disk metadata correctness
 - Various online scrub fixes
 - Fix v5 swapext blowing up on deleted inodes
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCgAGBQJawZs5AAoJEPh/dxk0SrTrUbAQAKCT0zaYDHViC6p0yxVMTa1z
 7fivnwtKNYc2LiihV6wPp+Hj5YtTGExYncJOLuTsAIuBZ6px+jlV9bpA8X9mWgbN
 e5XXyqz1O8nn/5iBwKRQm2yFdSnsQQfWXNm0XPNTuPGxuzlzxF/rpFN4UlWGdZul
 tigHom5gZD//GYfYHrsOb/7CIRGw90ebpqM3Nt4eAi5o0H5eK46sHKUYtAngSfPm
 FdPHJwmw5Kx+yZW5EdR+ELbLqGsBKsOfsp9SG+un0R+kvj/CKC2ovgwS6tuU+gsi
 MRD8C0zHlz4ikQrmJ0bV+no7T+9bC8fQDIZu0h7dQ1acWb2F1Epr1LRIxNH/1bLi
 qbtchVZkCNXiV0GMQ2iNo1cDJO3AICsQwTuktpoUMU1QOWgQenvzdZCUOQAUqne6
 xwnrCq19UbmNlCdkRWChrVn9Gb7FNYVhe15W/y0qZhzJxWam6yIzKBm91Zc/XLp8
 L5VUc+FVmtSiHXpEVttSwVeMSzhDfG6qOL42dFmw7xwh7JO/vXi0MlxjGe215ApS
 lhBWjEOGB9kbUxMjhqS5KsFn8E1DhL0AMD7N53z7eBTh5Eani81ytf1PzXWhvLbI
 1auY0+7cVggXFltcW6rfAJFC0EEuw6wsx86rl3G+dQ9vmlhy4zaWlt0EJEGmNC90
 Kw4GpFLDmtV93K++lD1C
 =fdIf
 -----END PGP SIGNATURE-----

Merge tag 'xfs-4.17-merge-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull xfs updates from Darrick Wong:
 "Here's the first round of fixes for XFS for 4.17.

  The biggest new features this time around are the addition of lazytime
  support, further enhancement of the on-disk inode metadata verifiers,
  and a patch to smooth over some of the AGFL padding problems that have
  intermittently plagued users since 4.5. I forsee sending a second pull
  request next week with further bug fixes and speedups in the online
  scrub code and elsewhere.

  This series has been run through a full xfstests run over the weekend
  and through a quick xfstests run against this morning's master, with
  no major failures reported.

  Summary of changes for this release:

   - Various cleanups and code fixes

   - Implement lazytime as a mount option

   - Convert various on-disk metadata checks from asserts to -EFSCORRUPTED

   - Fix accounting problems with the rmap per-ag reservations

   - Refactorings and cleanups for xfs_log_force

   - Various bugfixes for the reflink code

   - Work around v5 AGFL padding problems to prevent fs shutdowns

   - Establish inode fork verifiers to inspect on-disk metadata
     correctness

   - Various online scrub fixes

   - Fix v5 swapext blowing up on deleted inodes"

* tag 'xfs-4.17-merge-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (49 commits)
  xfs: do not log/recover swapext extent owner changes for deleted inodes
  xfs: clean up xfs_mount allocation and dynamic initializers
  xfs: remove dead inode version setting code
  xfs: catch inode allocation state mismatch corruption
  xfs: xfs_scrub_iallocbt_xref_rmap_inodes should use xref_set_corrupt
  xfs: flag inode corruption if parent ptr doesn't get us a real inode
  xfs: don't accept inode buffers with suspicious unlinked chains
  xfs: move inode extent size hint validation to libxfs
  xfs: record inode buf errors as a xref error in inobt scrubber
  xfs: remove xfs_buf parameter from inode scrub methods
  xfs: inode scrubber shouldn't bother with raw checks
  xfs: bmap scrubber should do rmap xref with bmap for sparse files
  xfs: refactor inode buffer verifier error logging
  xfs: refactor inode verifier error logging
  xfs: refactor bmap record validation
  xfs: sanity-check the unused space before trying to use it
  xfs: detect agfl count corruption and reset agfl
  xfs: unwind the try_again loop in xfs_log_force
  xfs: refactor xfs_log_force_lsn
  xfs: minor cleanup for xfs_reflink_end_cow
  ...
2018-04-04 12:44:02 -07:00
Linus Torvalds 2e08edc5c5 Merge branch 'work.dcache' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs dcache updates from Al Viro:
 "Part of this is what the trylock loop elimination series has turned
  into, part making d_move() preserve the parent (and thus the path) of
  victim, plus some general cleanups"

* 'work.dcache' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (22 commits)
  d_genocide: move export to definition
  fold dentry_lock_for_move() into its sole caller and clean it up
  make non-exchanging __d_move() copy ->d_parent rather than swap them
  oprofilefs: don't oops on allocation failure
  lustre: get rid of pointless casts to struct dentry *
  debugfs_lookup(): switch to lookup_one_len_unlocked()
  fold lookup_real() into __lookup_hash()
  take out orphan externs (empty_string/slash_string)
  split d_path() and friends into a separate file
  dcache.c: trim includes
  fs/dcache: Avoid a try_lock loop in shrink_dentry_list()
  get rid of trylock loop around dentry_kill()
  handle move to LRU in retain_dentry()
  dput(): consolidate the "do we need to retain it?" into an inlined helper
  split the slow part of lock_parent() off
  now lock_parent() can't run into killed dentry
  get rid of trylock loop in locking dentries on shrink list
  d_delete(): get rid of trylock loop
  fs/dcache: Move dentry_kill() below lock_parent()
  fs/dcache: Remove stale comment from dentry_kill()
  ...
2018-04-04 12:05:25 -07:00
David Howells 402cb8dda9 fscache: Attach the index key and aux data to the cookie
Attach copies of the index key and auxiliary data to the fscache cookie so
that:

 (1) The callbacks to the netfs for this stuff can be eliminated.  This
     can simplify things in the cache as the information is still
     available, even after the cache has relinquished the cookie.

 (2) Simplifies the locking requirements of accessing the information as we
     don't have to worry about the netfs object going away on us.

 (3) The cache can do lazy updating of the coherency information on disk.
     As long as the cache is flushed before reboot/poweroff, there's no
     need to update the coherency info on disk every time it changes.

 (4) Cookies can be hashed or put in a tree as the index key is easily
     available.  This allows:

     (a) Checks for duplicate cookies can be made at the top fscache layer
     	 rather than down in the bowels of the cache backend.

     (b) Caching can be added to a netfs object that has a cookie if the
     	 cache is brought online after the netfs object is allocated.

A certain amount of space is made in the cookie for inline copies of the
data, but if it won't fit there, extra memory will be allocated for it.

The downside of this is that live cache operation requires more memory.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Anna Schumaker <anna.schumaker@netapp.com>
Tested-by: Steve Dickson <steved@redhat.com>
2018-04-04 13:41:28 +01:00
David Howells 08c2e3d087 fscache: Add more tracepoints
Add more tracepoints to fscache, including:

 (*) fscache_page - Tracks netfs pages known to fscache.

 (*) fscache_check_page - Tracks the netfs querying whether a page is
     pending storage.

 (*) fscache_wake_cookie - Tracks cookies being woken up after a page
     completes/aborts storage in the cache.

 (*) fscache_op - Tracks operations being initialised.

 (*) fscache_wrote_page - Tracks return of the backend write_page op.

 (*) fscache_gang_lookup - Tracks lookup of pages to be stored in the write
     operation.

Signed-off-by: David Howells <dhowells@redhat.com>
2018-04-04 13:41:27 +01:00
David Howells a18feb5576 fscache: Add tracepoints
Add some tracepoints to fscache:

 (*) fscache_cookie - Tracks a cookie's usage count.

 (*) fscache_netfs - Logs registration of a network filesystem, including
     the pointer to the cookie allocated.

 (*) fscache_acquire - Logs cookie acquisition.

 (*) fscache_relinquish - Logs cookie relinquishment.

 (*) fscache_enable - Logs enablement of a cookie.

 (*) fscache_disable - Logs disablement of a cookie.

 (*) fscache_osm - Tracks execution of states in the object state machine.

and cachefiles:

 (*) cachefiles_ref - Tracks a cachefiles object's usage count.

 (*) cachefiles_lookup - Logs result of lookup_one_len().

 (*) cachefiles_mkdir - Logs result of vfs_mkdir().

 (*) cachefiles_create - Logs result of vfs_create().

 (*) cachefiles_unlink - Logs calls to vfs_unlink().

 (*) cachefiles_rename - Logs calls to vfs_rename().

 (*) cachefiles_mark_active - Logs an object becoming active.

 (*) cachefiles_wait_active - Logs a wait for an old object to be
     destroyed.

 (*) cachefiles_mark_inactive - Logs an object becoming inactive.

 (*) cachefiles_mark_buried - Logs the burial of an object.

Signed-off-by: David Howells <dhowells@redhat.com>
2018-04-04 13:41:27 +01:00
David Howells 2c98425720 fscache: Fix hanging wait on page discarded by writeback
If the fscache asynchronous write operation elects to discard a page that's
pending storage to the cache because the page would be over the store limit
then it needs to wake the page as someone may be waiting on completion of
the write.

The problem is that the store limit may be updated by a different
asynchronous operation - and so may miss the write - and that the store
limit may not even get updated until later by the netfs.

Fix the kernel hang by making fscache_write_op() mark as written any pages
that are over the limit.

Signed-off-by: David Howells <dhowells@redhat.com>
2018-04-04 13:41:26 +01:00
David Howells d0fb31ecda fscache: Detect multiple relinquishment of a cookie
Report if an fscache cookie is relinquished multiple times by the netfs.

Signed-off-by: David <dhowells@redhat.com>
2018-04-04 13:41:26 +01:00
David Howells b27ddd4624 fscache: Pass the correct cancelled indications to fscache_op_complete()
The last parameter to fscache_op_complete() is a bool indicating whether or
not the operation was cancelled.  A lot of the time the inverse value is
given or no differentiation is made.  Fix this.

Signed-off-by: David Howells <dhowells@redhat.com>
2018-04-04 13:41:26 +01:00
David Howells bfa3837ec3 fscache, cachefiles: Fix checker warnings
Fix a couple of checker warnings in fscache and cachefiles:

 (1) fscache_n_op_requeue is never used, so get rid of it.

 (2) cachefiles_uncache_page() is passed in a lock that it releases, so
     this needs annotating.

Signed-off-by: David Howells <dhowells@redhat.com>
2018-04-04 13:41:26 +01:00
David Howells 678edd09c2 afs: Be more aggressive in retiring cached vnodes
When relinquishing cookies, either due to iget failure or to inode
eviction, retire a cookie if we think the corresponding vnode got deleted
on the server rather than just letting it lie in the cache.

Signed-off-by: David Howells <dhowells@redhat.com>
2018-04-04 13:41:26 +01:00
David Howells 27a3ee3a04 afs: Use the vnode ID uniquifier in the cache key not the aux data
AFS vnodes (files) are referenced by a triplet of { volume ID, vnode ID,
uniquifier }.  Currently, kafs is only using the vnode ID as the file key
in the volume fscache index and checking the uniquifier on cookie
acquisition against the contents of the auxiliary data stored in the cache.

Unfortunately, this is subject to a race in which an FS.RemoveFile or
FS.RemoveDir op is issued against the server but the local afs inode isn't
torn down and disposed off before another thread issues something like
FS.CreateFile.  The latter then gets given the vnode ID that just got
removed, but with a new uniquifier and a cookie collision occurs in the
cache because the cookie is only keyed on the vnode ID whereas the inode is
keyed on the vnode ID plus the uniquifier.

Fix this by keying the cookie on the uniquifier in addition to the vnode ID
and dropping the uniquifier from the auxiliary data supplied.

Signed-off-by: David Howells <dhowells@redhat.com>
2018-04-04 13:41:25 +01:00
David Howells c1515999bd afs: Invalidate cache on server data change
Invalidate any data stored in fscache for a vnode that changes on the
server so that we don't end up with the cache in a bad state locally.

Signed-off-by: David Howells <dhowells@redhat.com>
2018-04-04 13:41:25 +01:00
Martin Brandenburg 209469d978 orangefs: remove unused code
Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2018-04-03 21:55:28 -04:00
Martin Brandenburg bdd6f08358 orangefs: make several *_operations structs static
Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2018-04-03 21:55:27 -04:00
Martin Brandenburg a5135eeab2 orangefs: implement vm_ops->fault
Must retrieve size before running filemap_fault so the kernel has
an up-to-date size.

This should have been caught by xfstests generic/246, but it was masked
by orangefs_new_inode, which set i_size to PAGE_SIZE.  When nothing
caused a getattr prior to a pagefault, i_size was still PAGE_SIZE.
Since xfstests only read 10 bytes, it did not catch this bug.

When orangefs_new_inode was modified to perform a getattr instead,
i_size was set to zero, as it was a newly created file.  Then
orangefs_file_write_iter did NOT set i_size.  Instead it invalidated the
attribute cache, which should have caused the next caller to retrieve
i_size.  But the fault handler did not know it was supposed to retrieve
i_size.  So during xfstests, i_size was still zero, and filemap_fault
returned VM_FAULT_SIGBUS.

Fixes xfstests generic/452.

Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2018-04-03 21:55:27 -04:00
Jaegeuk Kim 214c2461a8 f2fs: remain written times to update inode during fsync
This fixes xfstests/generic/392.

The failure was caused by different times between 1) one marked in the last
fsync(2) call and 2) the other given by roll-forward recovery after power-cut.
The reason was that we skipped updating inode block at 1), since its i_size
was recoverable along with 4KB-aligned data writes, which was fixed by:
  "f2fs: fix a wrong condition in f2fs_skip_inode_update"

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-04-03 18:52:47 -07:00
Linus Torvalds d92cd810e6 Merge branch 'for-4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq
Pull workqueue updates from Tejun Heo:
 "rcu_work addition and a couple trivial changes"

* 'for-4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
  workqueue: remove the comment about the old manager_arb mutex
  workqueue: fix the comments of nr_idle
  fs/aio: Use rcu_work instead of explicit rcu and work item
  cgroup: Use rcu_work instead of explicit rcu and work item
  RCU, workqueue: Implement rcu_work
2018-04-03 18:00:13 -07:00
Linus Torvalds 5bb053bef8 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:

 1) Support offloading wireless authentication to userspace via
    NL80211_CMD_EXTERNAL_AUTH, from Srinivas Dasari.

 2) A lot of work on network namespace setup/teardown from Kirill Tkhai.
    Setup and cleanup of namespaces now all run asynchronously and thus
    performance is significantly increased.

 3) Add rx/tx timestamping support to mv88e6xxx driver, from Brandon
    Streiff.

 4) Support zerocopy on RDS sockets, from Sowmini Varadhan.

 5) Use denser instruction encoding in x86 eBPF JIT, from Daniel
    Borkmann.

 6) Support hw offload of vlan filtering in mvpp2 dreiver, from Maxime
    Chevallier.

 7) Support grafting of child qdiscs in mlxsw driver, from Nogah
    Frankel.

 8) Add packet forwarding tests to selftests, from Ido Schimmel.

 9) Deal with sub-optimal GSO packets better in BBR congestion control,
    from Eric Dumazet.

10) Support 5-tuple hashing in ipv6 multipath routing, from David Ahern.

11) Add path MTU tests to selftests, from Stefano Brivio.

12) Various bits of IPSEC offloading support for mlx5, from Aviad
    Yehezkel, Yossi Kuperman, and Saeed Mahameed.

13) Support RSS spreading on ntuple filters in SFC driver, from Edward
    Cree.

14) Lots of sockmap work from John Fastabend. Applications can use eBPF
    to filter sendmsg and sendpage operations.

15) In-kernel receive TLS support, from Dave Watson.

16) Add XDP support to ixgbevf, this is significant because it should
    allow optimized XDP usage in various cloud environments. From Tony
    Nguyen.

17) Add new Intel E800 series "ice" ethernet driver, from Anirudh
    Venkataramanan et al.

18) IP fragmentation match offload support in nfp driver, from Pieter
    Jansen van Vuuren.

19) Support XDP redirect in i40e driver, from Björn Töpel.

20) Add BPF_RAW_TRACEPOINT program type for accessing the arguments of
    tracepoints in their raw form, from Alexei Starovoitov.

21) Lots of striding RQ improvements to mlx5 driver with many
    performance improvements, from Tariq Toukan.

22) Use rhashtable for inet frag reassembly, from Eric Dumazet.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1678 commits)
  net: mvneta: improve suspend/resume
  net: mvneta: split rxq/txq init and txq deinit into SW and HW parts
  ipv6: frags: fix /proc/sys/net/ipv6/ip6frag_low_thresh
  net: bgmac: Fix endian access in bgmac_dma_tx_ring_free()
  net: bgmac: Correctly annotate register space
  route: check sysctl_fib_multipath_use_neigh earlier than hash
  fix typo in command value in drivers/net/phy/mdio-bitbang.
  sky2: Increase D3 delay to sky2 stops working after suspend
  net/mlx5e: Set EQE based as default TX interrupt moderation mode
  ibmvnic: Disable irqs before exiting reset from closed state
  net: sched: do not emit messages while holding spinlock
  vlan: also check phy_driver ts_info for vlan's real device
  Bluetooth: Mark expected switch fall-throughs
  Bluetooth: Set HCI_QUIRK_SIMULTANEOUS_DISCOVERY for BTUSB_QCA_ROME
  Bluetooth: btrsi: remove unused including <linux/version.h>
  Bluetooth: hci_bcm: Remove DMI quirk for the MINIX Z83-4
  sh_eth: kill useless check in __sh_eth_get_regs()
  sh_eth: add sh_eth_cpu_data::no_xdfar flag
  ipv6: factorize sk_wmem_alloc updates done by __ip6_append_data()
  ipv4: factorize sk_wmem_alloc updates done by __ip_append_data()
  ...
2018-04-03 14:04:18 -07:00
J. Bruce Fields 880a3a5325 nfsd: fix incorrect umasks
We're neglecting to clear the umask after it's set, which can cause a
later unrelated rpc to (incorrectly) use the same umask if it happens to
be processed by the same thread.

There's a more subtle problem here too:

An NFSv4 compound request is decoded all in one pass before any
operations are executed.

Currently we're setting current->fs->umask at the time we decode the
compound.  In theory a single compound could contain multiple creates
each setting a umask.  In that case we'd end up using whichever umask
was passed in the *last* operation as the umask for all the creates,
whether that was correct or not.

So, we should just be saving the umask at decode time and waiting to set
it until we actually process the corresponding operation.

In practice it's unlikely any client would do multiple creates in a
single compound.  And even if it did they'd likely be from the same
process (hence carry the same umask).  So this is a little academic, but
we should get it right anyway.

Fixes: 47057abde5 (nfsd: add support for the umask attribute)
Cc: stable@vger.kernel.org
Reported-by: Lucash Stach <l.stach@pengutronix.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-04-03 16:27:08 -04:00
Chuck Lever 38a7031559 NFSD: Clean up legacy NFS SYMLINK argument XDR decoders
Move common code in NFSD's legacy SYMLINK decoders into a helper.
The immediate benefits include:

 - one fewer data copies on transports that support DDP
 - consistent error checking across all versions
 - reduction of code duplication
 - support for both legal forms of SYMLINK requests on RDMA
   transports for all versions of NFS (in particular, NFSv2, for
   completeness)

In the long term, this helper is an appropriate spot to perform a
per-transport call-out to fill the pathname argument using, say,
RDMA Reads.

Filling the pathname in the proc function also means that eventually
the incoming filehandle can be interpreted so that filesystem-
specific memory can be allocated as a sink for the pathname
argument, rather than using anonymous pages.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-04-03 15:08:16 -04:00
Chuck Lever 8154ef2776 NFSD: Clean up legacy NFS WRITE argument XDR decoders
Move common code in NFSD's legacy NFS WRITE decoders into a helper.
The immediate benefit is reduction of code duplication and some nice
micro-optimizations (see below).

In the long term, this helper can perform a per-transport call-out
to fill the rq_vec (say, using RDMA Reads).

The legacy WRITE decoders and procs are changed to work like NFSv4,
which constructs the rq_vec just before it is about to call
vfs_writev.

Why? Calling a transport call-out from the proc instead of the XDR
decoder means that the incoming FH can be resolved to a particular
filesystem and file. This would allow pages from the backing file to
be presented to the transport to be filled, rather than presenting
anonymous pages and copying or flipping them into the file's page
cache later.

I also prefer using the pages in rq_arg.pages, instead of pulling
the data pages directly out of the rqstp::rq_pages array. This is
currently the way the NFSv3 write decoder works, but the other two
do not seem to take this approach. Fixing this removes the only
reference to rq_pages found in NFSD, eliminating an NFSD assumption
about how transports use the pages in rq_pages.

Lastly, avoid setting up the first element of rq_vec as a zero-
length buffer. This happens with an RDMA transport when a normal
Read chunk is present because the data payload is in rq_arg's
page list (none of it is in the head buffer).

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-04-03 15:08:16 -04:00
Chuck Lever fff4080b2f nfsd: Trace NFSv4 COMPOUND execution
This helps record the identity and timing of the ops in each NFSv4
COMPOUND, replacing dprintk calls that did much the same thing.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-04-03 15:08:15 -04:00
Chuck Lever 87c5942e8f nfsd: Add I/O trace points in the NFSv4 read proc
NFSv4 read compound processing invokes nfsd_splice_read and
nfs_readv directly, so the trace points currently in nfsd_read are
not invoked for NFSv4 reads.

Move the NFSD READ trace points to common helpers so that NFSv4
reads are captured.

Also, record any local I/O error that occurs, the total count of
bytes that were actually returned, and whether splice or vectored
read was used.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-04-03 15:08:15 -04:00
Chuck Lever d890be159a nfsd: Add I/O trace points in the NFSv4 write path
NFSv4 write compound processing invokes nfsd_vfs_write directly. The
trace points currently in nfsd_write are not effective for NFSv4
writes.

Move the trace points into the shared nfsd_vfs_write() helper.

After the I/O, we also want to record any local I/O error that
might have occurred, and the total count of bytes that were actually
moved (rather than the requested number).

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-04-03 15:08:15 -04:00
Chuck Lever f394b62b7b nfsd: Add "nfsd_" to trace point names
Follow naming convention used in client and in sunrpc layers.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-04-03 15:08:14 -04:00
Chuck Lever 79e0b4e247 nfsd: Record request byte count, not count of vectors
Byte count is more helpful to know than vector count.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-04-03 15:08:14 -04:00
Chuck Lever afa720a091 nfsd: Fix NFSD trace points
nfsd-1915  [003] 77915.780959: write_opened:
	[FAILED TO PARSE] xid=3286130958 fh=0 offset=154624 len=1
nfsd-1915  [003] 77915.780960: write_io_done:
	[FAILED TO PARSE] xid=3286130958 fh=0 offset=154624 len=1
nfsd-1915  [003] 77915.780964: write_done:
	[FAILED TO PARSE] xid=3286130958 fh=0 offset=154624 len=1

Byte swapping and knfsd_fh_hash() are not available in "trace-cmd
report", where the print format string is actually used. These
data transformations have to be done during the TP_fast_assign step.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-04-03 15:08:13 -04:00
Stefan Agner 47299f79ea nfsd: use correct enum type in decode_cb_op_status
Use enum nfs_cb_opnum4 in decode_cb_op_status. This fixes warnings
seen with clang:
  fs/nfsd/nfs4callback.c:451:36: warning: implicit conversion from
      enumeration type 'enum nfs_cb_opnum4' to different enumeration
      type 'enum nfs_opnum4' [-Wenum-conversion]
        status = decode_cb_op_status(xdr, OP_CB_SEQUENCE, &cb->cb_seq_status);
                 ~~~~~~~~~~~~~~~~~~~      ^~~~~~~~~~~~~~

Signed-off-by: Stefan Agner <stefan@agner.ch>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-04-03 15:08:08 -04:00
Fengguang Wu 51d87bc2bf nfsd: fix boolreturn.cocci warnings
fs/nfsd/nfs4state.c:926:8-9: WARNING: return of 0/1 in function 'nfs4_delegation_exists' with return type bool
fs/nfsd/nfs4state.c:2955:9-10: WARNING: return of 0/1 in function 'nfsd4_compound_in_session' with return type bool

 Return statements in functions returning bool should use
 true/false instead of 1/0.
Generated by: scripts/coccinelle/misc/boolreturn.cocci

Fixes: 68b18f5294 ("nfsd: make nfs4_get_existing_delegation less confusing")
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
[bfields: also fix -EAGAIN]
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-04-03 15:08:08 -04:00
Dan Williams d2c997c0f1 fs, dax: use page->mapping to warn if truncate collides with a busy page
Catch cases where extent unmap operations encounter pages that are
pinned / busy. Typically this is pinned pages that are under active dma.
This warning is a canary for potential data corruption as truncated
blocks could be allocated to a new file while the device is still
performing i/o.

Here is an example of a collision that this implementation catches:

 WARNING: CPU: 2 PID: 1286 at fs/dax.c:343 dax_disassociate_entry+0x55/0x80
 [..]
 Call Trace:
  __dax_invalidate_mapping_entry+0x6c/0xf0
  dax_delete_mapping_entry+0xf/0x20
  truncate_exceptional_pvec_entries.part.12+0x1af/0x200
  truncate_inode_pages_range+0x268/0x970
  ? tlb_gather_mmu+0x10/0x20
  ? up_write+0x1c/0x40
  ? unmap_mapping_range+0x73/0x140
  xfs_free_file_space+0x1b6/0x5b0 [xfs]
  ? xfs_file_fallocate+0x7f/0x320 [xfs]
  ? down_write_nested+0x40/0x70
  ? xfs_ilock+0x21d/0x2f0 [xfs]
  xfs_file_fallocate+0x162/0x320 [xfs]
  ? rcu_read_lock_sched_held+0x3f/0x70
  ? rcu_sync_lockdep_assert+0x2a/0x50
  ? __sb_start_write+0xd0/0x1b0
  ? vfs_fallocate+0x20c/0x270
  vfs_fallocate+0x154/0x270
  SyS_fallocate+0x43/0x80
  entry_SYSCALL_64_fastpath+0x1f/0x96

Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-04-03 05:41:19 -07:00
Dan Williams fb094c9074 ext2, dax: introduce ext2_dax_aops
In preparation for the dax implementation to start associating dax pages
to inodes via page->mapping, we need to provide a 'struct
address_space_operations' instance for dax. Otherwise, direct-I/O
triggers incorrect page cache assumptions and warnings.

Reviewed-by: Jan Kara <jack@suse.com>
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-04-03 05:41:05 -07:00
Linus Torvalds 642e7fd233 Merge branch 'syscalls-next' of git://git.kernel.org/pub/scm/linux/kernel/git/brodo/linux
Pull removal of in-kernel calls to syscalls from Dominik Brodowski:
 "System calls are interaction points between userspace and the kernel.
  Therefore, system call functions such as sys_xyzzy() or
  compat_sys_xyzzy() should only be called from userspace via the
  syscall table, but not from elsewhere in the kernel.

  At least on 64-bit x86, it will likely be a hard requirement from
  v4.17 onwards to not call system call functions in the kernel: It is
  better to use use a different calling convention for system calls
  there, where struct pt_regs is decoded on-the-fly in a syscall wrapper
  which then hands processing over to the actual syscall function. This
  means that only those parameters which are actually needed for a
  specific syscall are passed on during syscall entry, instead of
  filling in six CPU registers with random user space content all the
  time (which may cause serious trouble down the call chain). Those
  x86-specific patches will be pushed through the x86 tree in the near
  future.

  Moreover, rules on how data may be accessed may differ between kernel
  data and user data. This is another reason why calling sys_xyzzy() is
  generally a bad idea, and -- at most -- acceptable in arch-specific
  code.

  This patchset removes all in-kernel calls to syscall functions in the
  kernel with the exception of arch/. On top of this, it cleans up the
  three places where many syscalls are referenced or prototyped, namely
  kernel/sys_ni.c, include/linux/syscalls.h and include/linux/compat.h"

* 'syscalls-next' of git://git.kernel.org/pub/scm/linux/kernel/git/brodo/linux: (109 commits)
  bpf: whitelist all syscalls for error injection
  kernel/sys_ni: remove {sys_,sys_compat} from cond_syscall definitions
  kernel/sys_ni: sort cond_syscall() entries
  syscalls/x86: auto-create compat_sys_*() prototypes
  syscalls: sort syscall prototypes in include/linux/compat.h
  net: remove compat_sys_*() prototypes from net/compat.h
  syscalls: sort syscall prototypes in include/linux/syscalls.h
  kexec: move sys_kexec_load() prototype to syscalls.h
  x86/sigreturn: use SYSCALL_DEFINE0
  x86: fix sys_sigreturn() return type to be long, not unsigned long
  x86/ioport: add ksys_ioperm() helper; remove in-kernel calls to sys_ioperm()
  mm: add ksys_readahead() helper; remove in-kernel calls to sys_readahead()
  mm: add ksys_mmap_pgoff() helper; remove in-kernel calls to sys_mmap_pgoff()
  mm: add ksys_fadvise64_64() helper; remove in-kernel call to sys_fadvise64_64()
  fs: add ksys_fallocate() wrapper; remove in-kernel calls to sys_fallocate()
  fs: add ksys_p{read,write}64() helpers; remove in-kernel calls to syscalls
  fs: add ksys_truncate() wrapper; remove in-kernel calls to sys_truncate()
  fs: add ksys_sync_file_range helper(); remove in-kernel calls to syscall
  kernel: add ksys_setsid() helper; remove in-kernel call to sys_setsid()
  kernel: add ksys_unshare() helper; remove in-kernel calls to sys_unshare()
  ...
2018-04-02 21:22:12 -07:00
Linus Torvalds f5a8eb632b arch: remove obsolete architecture ports
This removes the entire architecture code for blackfin, cris, frv, m32r,
 metag, mn10300, score, and tile, including the associated device drivers.
 
 I have been working with the (former) maintainers for each one to ensure
 that my interpretation was right and the code is definitely unused in
 mainline kernels. Many had fond memories of working on the respective
 ports to start with and getting them included in upstream, but also saw
 no point in keeping the port alive without any users.
 
 In the end, it seems that while the eight architectures are extremely
 different, they all suffered the same fate: There was one company
 in charge of an SoC line, a CPU microarchitecture and a software
 ecosystem, which was more costly than licensing newer off-the-shelf
 CPU cores from a third party (typically ARM, MIPS, or RISC-V). It seems
 that all the SoC product lines are still around, but have not used the
 custom CPU architectures for several years at this point. In contrast,
 CPU instruction sets that remain popular and have actively maintained
 kernel ports tend to all be used across multiple licensees.
 
 The removal came out of a discussion that is now documented at
 https://lwn.net/Articles/748074/. Unlike the original plans, I'm not
 marking any ports as deprecated but remove them all at once after I made
 sure that they are all unused. Some architectures (notably tile, mn10300,
 and blackfin) are still being shipped in products with old kernels,
 but those products will never be updated to newer kernel releases.
 
 After this series, we still have a few architectures without mainline
 gcc support:
 
 - unicore32 and hexagon both have very outdated gcc releases, but the
   maintainers promised to work on providing something newer. At least
   in case of hexagon, this will only be llvm, not gcc.
 
 - openrisc, risc-v and nds32 are still in the process of finishing their
   support or getting it added to mainline gcc in the first place.
   They all have patched gcc-7.3 ports that work to some degree, but
   complete upstream support won't happen before gcc-8.1. Csky posted
   their first kernel patch set last week, their situation will be similar.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJawdL2AAoJEGCrR//JCVInuH0P/RJAZh1nTD+TR34ZhJq2TBoo
 PgygwDU7Z2+tQVU+EZ453Gywz9/NMRFk1RWAZqrLix4ZtyIMvC6A1qfT2yH1Y7Fb
 Qh6tccQeLe4ezq5u4S/46R/fQXu3Txr92yVwzJJUuPyU0arF9rv5MmI8e6p7L1en
 yb74kSEaCe+/eMlsEj1Cc1dgthDNXGKIURHkRsILoweysCpesjiTg4qDcL+yTibV
 FP2wjVbniKESMKS6qL71tiT5sexvLsLwMNcGiHPj94qCIQuI7DLhLdBVsL5Su6gI
 sbtgv0dsq4auRYAbQdMaH1hFvu6WptsuttIbOMnz2Yegi2z28H8uVXkbk2WVLbqG
 ZESUwutGh8MzOL2RJ4jyyQq5sfo++CRGlfKjr6ImZRv03dv0pe/W85062cK5cKNs
 cgDDJjGRorOXW7dyU6jG2gRqODOQBObIv3w5efdq5OgzOWlbI4EC+Y5u1Z0JF/76
 pSwtGXA6YhwC+9LLAlnVTHG+yOwuLmAICgoKcTbzTVDKA2YQZG/cYuQfI5S1wD8e
 X6urPx3Md2GCwLXQ9mzKBzKZUpu/Tuhx0NvwF4qVxy6x1PELjn68zuP7abDHr46r
 57/09ooVN+iXXnEGMtQVS/OPvYHSa2NgTSZz6Y86lCRbZmUOOlK31RDNlMvYNA+s
 3iIVHovno/JuJnTOE8LY
 =fQ8z
 -----END PGP SIGNATURE-----

Merge tag 'arch-removal' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic

Pul removal of obsolete architecture ports from Arnd Bergmann:
 "This removes the entire architecture code for blackfin, cris, frv,
  m32r, metag, mn10300, score, and tile, including the associated device
  drivers.

  I have been working with the (former) maintainers for each one to
  ensure that my interpretation was right and the code is definitely
  unused in mainline kernels. Many had fond memories of working on the
  respective ports to start with and getting them included in upstream,
  but also saw no point in keeping the port alive without any users.

  In the end, it seems that while the eight architectures are extremely
  different, they all suffered the same fate: There was one company in
  charge of an SoC line, a CPU microarchitecture and a software
  ecosystem, which was more costly than licensing newer off-the-shelf
  CPU cores from a third party (typically ARM, MIPS, or RISC-V). It
  seems that all the SoC product lines are still around, but have not
  used the custom CPU architectures for several years at this point. In
  contrast, CPU instruction sets that remain popular and have actively
  maintained kernel ports tend to all be used across multiple licensees.

  [ See the new nds32 port merged in the previous commit for the next
    generation of "one company in charge of an SoC line, a CPU
    microarchitecture and a software ecosystem"   - Linus ]

  The removal came out of a discussion that is now documented at
  https://lwn.net/Articles/748074/. Unlike the original plans, I'm not
  marking any ports as deprecated but remove them all at once after I
  made sure that they are all unused. Some architectures (notably tile,
  mn10300, and blackfin) are still being shipped in products with old
  kernels, but those products will never be updated to newer kernel
  releases.

  After this series, we still have a few architectures without mainline
  gcc support:

   - unicore32 and hexagon both have very outdated gcc releases, but the
     maintainers promised to work on providing something newer. At least
     in case of hexagon, this will only be llvm, not gcc.

   - openrisc, risc-v and nds32 are still in the process of finishing
     their support or getting it added to mainline gcc in the first
     place. They all have patched gcc-7.3 ports that work to some
     degree, but complete upstream support won't happen before gcc-8.1.
     Csky posted their first kernel patch set last week, their situation
     will be similar

  [ Palmer Dabbelt points out that RISC-V support is in mainline gcc
    since gcc-7, although gcc-7.3.0 is the recommended minimum  - Linus ]"

This really says it all:

 2498 files changed, 95 insertions(+), 467668 deletions(-)

* tag 'arch-removal' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic: (74 commits)
  MAINTAINERS: UNICORE32: Change email account
  staging: iio: remove iio-trig-bfin-timer driver
  tty: hvc: remove tile driver
  tty: remove bfin_jtag_comm and hvc_bfin_jtag drivers
  serial: remove tile uart driver
  serial: remove m32r_sio driver
  serial: remove blackfin drivers
  serial: remove cris/etrax uart drivers
  usb: Remove Blackfin references in USB support
  usb: isp1362: remove blackfin arch glue
  usb: musb: remove blackfin port
  usb: host: remove tilegx platform glue
  pwm: remove pwm-bfin driver
  i2c: remove bfin-twi driver
  spi: remove blackfin related host drivers
  watchdog: remove bfin_wdt driver
  can: remove bfin_can driver
  mmc: remove bfin_sdh driver
  input: misc: remove blackfin rotary driver
  input: keyboard: remove bf54x driver
  ...
2018-04-02 20:20:12 -07:00
Yunlong Song c79d152094 f2fs: make assignment of t->dentry_bitmap more readable
In make_dentry_ptr_block, it is confused with "&" for t->dentry_bitmap
but without "&" for t->dentry, so delete "&" to make code more readable.

Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-04-02 20:09:35 -07:00
Jaegeuk Kim dc7a10ddee f2fs: truncate preallocated blocks in error case
If write is failed, we must deallocate the blocks that we couldn't write.

Cc: stable@vger.kernel.org
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-04-02 20:09:34 -07:00
Dave Chinner 0612d11663 xfs: fix intent use-after-free on abort
When an intent is aborted during it's initial commit through
xfs_defer_trans_abort(), there is a use after free. The current
report is for a RUI  through this path in generic/388:

 Freed by task 6274:
  __kasan_slab_free+0x136/0x180
  kmem_cache_free+0xe7/0x4b0
  xfs_trans_free_items+0x198/0x2e0
  __xfs_trans_commit+0x27f/0xcc0
  xfs_trans_roll+0x17b/0x2a0
  xfs_defer_trans_roll+0x6ad/0xe60
  xfs_defer_finish+0x2a6/0x2140
  xfs_alloc_file_space+0x53a/0xf90
  xfs_file_fallocate+0x5c6/0xac0
  vfs_fallocate+0x2f5/0x930
  ioctl_preallocate+0x1dc/0x320
  do_vfs_ioctl+0xfe4/0x1690

The problem is that the RUI has two active references - one in the
current transaction, and another held by the defer_ops structure
that is passed to the RUD (intent done) so that both the intent and
the intent done structures are freed on commit of the intent done.

Hence during abort, we need to release the intent item, because the
defer_ops reference is released separately via ->abort_intent
callback. Fix all the intent code to do this correctly.

Signed-Off-By: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-04-02 20:08:27 -07:00
Linus Torvalds ce6eba3dba Merge branch 'sched-wait-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull wait_var_event updates from Ingo Molnar:
 "This introduces the new wait_var_event() API, which is a more flexible
  waiting primitive than wait_on_atomic_t().

  All wait_on_atomic_t() users are migrated over to the new API and
  wait_on_atomic_t() is removed. The migration fixes one bug and should
  result in no functional changes for the other usecases"

* 'sched-wait-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/wait: Improve __var_waitqueue() code generation
  sched/wait: Remove the wait_on_atomic_t() API
  sched/wait, arch/mips: Fix and convert wait_on_atomic_t() usage to the new wait_var_event() API
  sched/wait, fs/ocfs2: Convert wait_on_atomic_t() usage to the new wait_var_event() API
  sched/wait, fs/nfs: Convert wait_on_atomic_t() usage to the new wait_var_event() API
  sched/wait, fs/fscache: Convert wait_on_atomic_t() usage to the new wait_var_event() API
  sched/wait, fs/btrfs: Convert wait_on_atomic_t() usage to the new wait_var_event() API
  sched/wait, fs/afs: Convert wait_on_atomic_t() usage to the new wait_var_event() API
  sched/wait, drivers/media: Convert wait_on_atomic_t() usage to the new wait_var_event() API
  sched/wait, drivers/drm: Convert wait_on_atomic_t() usage to the new wait_var_event() API
  sched/wait: Introduce wait_var_event()
2018-04-02 16:50:39 -07:00
Chandan Rajendra c959025eda xfs: Remove "committed" argument of xfs_dir_ialloc
xfs_dir_ialloc() rolls the current transaction when allocation of a new
inode required the space manager to perform an allocation and replinish
the Inode btree.

None of the callers of xfs_dir_ialloc() need to know if the
transaction was committed. Hence this commit removes the "committed"
argument of xfs_dir_ialloc.

Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-04-02 15:47:43 -07:00
Junling Zheng 235831d7dd f2fs: fix a wrong condition in f2fs_skip_inode_update
Fix commit 97dd26ad83 (f2fs: fix wrong AUTO_RECOVER condition).
We should use ~PAGE_MASK to determine whether i_size is aligned to
the f2fs's block size or not.

Signed-off-by: Junling Zheng <zhengjunling@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-04-02 13:21:51 -07:00
Dominik Brodowski edf292c76b fs: add ksys_fallocate() wrapper; remove in-kernel calls to sys_fallocate()
Using the ksys_fallocate() wrapper allows us to get rid of in-kernel
calls to the sys_fallocate() syscall. The ksys_ prefix denotes that this
function is meant as a drop-in replacement for the syscall. In
particular, it uses the same calling convention as sys_fallocate().

This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02 20:16:09 +02:00
Dominik Brodowski 36028d5dd7 fs: add ksys_p{read,write}64() helpers; remove in-kernel calls to syscalls
Using the ksys_p{read,write}64() wrappers allows us to get rid of
in-kernel calls to the sys_pread64() and sys_pwrite64() syscalls.
The ksys_ prefix denotes that this function is meant as a drop-in
replacement for the syscall. In particular, it uses the same calling
convention as sys_p{read,write}64().

This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02 20:16:09 +02:00
Dominik Brodowski df260e21e6 fs: add ksys_truncate() wrapper; remove in-kernel calls to sys_truncate()
Using the ksys_truncate() wrapper allows us to get rid of in-kernel
calls to the sys_truncate() syscall. The ksys_ prefix denotes that this
function is meant as a drop-in replacement for the syscall. In
particular, it uses the same calling convention as sys_truncate().

This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02 20:16:08 +02:00
Dominik Brodowski 806cbae122 fs: add ksys_sync_file_range helper(); remove in-kernel calls to syscall
Using this helper allows us to avoid the in-kernel calls to the
sys_sync_file_range() syscall. The ksys_ prefix denotes that this function
is meant as a drop-in replacement for the syscall. In particular, it uses
the same calling convention as sys_sync_file_range().

This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02 20:16:07 +02:00
Dominik Brodowski 70f68ee81e fs: add ksys_sync() helper; remove in-kernel calls to sys_sync()
Using this helper allows us to avoid the in-kernel calls to the
sys_sync() syscall. The ksys_ prefix denotes that this function
is meant as a drop-in replacement for the syscall. In particular, it
uses the same calling convention as sys_sync().

This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02 20:16:05 +02:00
Dominik Brodowski 3ce4a7bf66 fs: add ksys_read() helper; remove in-kernel calls to sys_read()
Using this helper allows us to avoid the in-kernel calls to the
sys_read() syscall. The ksys_ prefix denotes that this function
is meant as a drop-in replacement for the syscall. In particular, it
uses the same calling convention as sys_read().

This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02 20:16:04 +02:00
Dominik Brodowski 76847e4344 fs: add ksys_lseek() helper; remove in-kernel calls to sys_lseek()
Using this helper allows us to avoid the in-kernel calls to the
sys_lseek() syscall. The ksys_ prefix denotes that this function
is meant as a drop-in replacement for the syscall. In particular, it
uses the same calling convention as sys_lseek().

This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02 20:16:03 +02:00
Dominik Brodowski cbb60b924b fs: add ksys_ioctl() helper; remove in-kernel calls to sys_ioctl()
Using this helper allows us to avoid the in-kernel calls to the
sys_ioctl() syscall. The ksys_ prefix denotes that this function
is meant as a drop-in replacement for the syscall. In particular, it
uses the same calling convention as sys_ioctl().

After careful review, at least some of these calls could be converted
to do_vfs_ioctl() in future.

This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02 20:16:03 +02:00
Dominik Brodowski 454dab3f96 fs: add ksys_getdents64() helper; remove in-kernel calls to sys_getdents64()
Using this helper allows us to avoid the in-kernel calls to the
sys_getdents64() syscall. The ksys_ prefix denotes that this function
is meant as a drop-in replacement for the syscall. In particular, it
uses the same calling convention as sys_getdents64().

This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02 20:16:02 +02:00
Dominik Brodowski bae217ea8c fs: add ksys_open() wrapper; remove in-kernel calls to sys_open()
Using this wrapper allows us to avoid the in-kernel calls to the
sys_open() syscall. The ksys_ prefix denotes that this function is meant
as a drop-in replacement for the syscall. In particular, it uses the
same calling convention as sys_open().

This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02 20:16:01 +02:00
Dominik Brodowski 2ca2a09d62 fs: add ksys_close() wrapper; remove in-kernel calls to sys_close()
Using the ksys_close() wrapper allows us to get rid of in-kernel calls
to the sys_close() syscall. The ksys_ prefix denotes that this function
is meant as a drop-in replacement for the syscall. In particular, it
uses the same calling convention as sys_close(), with one subtle
difference:

The few places which checked the return value did not care about the return
value re-writing in sys_close(), so simply use a wrapper around
__close_fd().

This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02 20:16:00 +02:00
Dominik Brodowski 411d9475cf fs: add ksys_ftruncate() wrapper; remove in-kernel calls to sys_ftruncate()
Using the ksys_ftruncate() wrapper allows us to get rid of in-kernel
calls to the sys_ftruncate() syscall. The ksys_ prefix denotes that this
function is meant as a drop-in replacement for the syscall. In
particular, it uses the same calling convention as sys_ftruncate().

This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02 20:16:00 +02:00
Dominik Brodowski 55731b3cda fs: add do_fchownat(), ksys_fchown() helpers and ksys_{,l}chown() wrappers
Using the fs-interal do_fchownat() wrapper allows us to get rid of
fs-internal calls to the sys_fchownat() syscall.

Introducing the ksys_fchown() helper and the ksys_{,}chown() wrappers
allows us to avoid the in-kernel calls to the sys_{,l,f}chown() syscalls.
The ksys_ prefix denotes that these functions are meant as a drop-in
replacement for the syscalls. In particular, they use the same calling
convention as sys_{,l,f}chown().

This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02 20:15:59 +02:00
Dominik Brodowski cbfe20f565 fs: add do_faccessat() helper and ksys_access() wrapper; remove in-kernel calls to syscall
Using the fs-internal do_faccessat() helper allows us to get rid of
fs-internal calls to the sys_faccessat() syscall.

Introducing the ksys_access() wrapper allows us to avoid the in-kernel
calls to the sys_access() syscall. The ksys_ prefix denotes that this
function is meant as a drop-in replacement for the syscall. In
particular, it uses the same calling convention as sys_access().

This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02 20:15:58 +02:00
Dominik Brodowski 03450e271a fs: add ksys_fchmod() and do_fchmodat() helpers and ksys_chmod() wrapper; remove in-kernel calls to syscall
Using the fs-internal do_fchmodat() helper allows us to get rid of
fs-internal calls to the sys_fchmodat() syscall.

Introducing the ksys_fchmod() helper and the ksys_chmod() wrapper allows
us to avoid the in-kernel calls to the sys_fchmod() and sys_chmod()
syscalls. The ksys_ prefix denotes that these functions are meant as a
drop-in replacement for the syscalls. In particular, they use the same
calling convention as sys_fchmod() and sys_chmod().

This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02 20:15:57 +02:00
Dominik Brodowski 46ea89eb65 fs: add do_linkat() helper and ksys_link() wrapper; remove in-kernel calls to syscall
Using the fs-internal do_linkat() helper allows us to get rid of
fs-internal calls to the sys_linkat() syscall.

Introducing the ksys_link() wrapper allows us to avoid the in-kernel
calls to sys_link() syscall. The ksys_ prefix denotes that this function
is meant as a drop-in replacement for the syscall. In particular, it uses
the same calling convention as sys_link().

In the near future, the only fs-external user of ksys_link() should be
converted to use vfs_link() instead.

This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02 20:15:57 +02:00
Dominik Brodowski 87c4e19262 fs: add do_mknodat() helper and ksys_mknod() wrapper; remove in-kernel calls to syscall
Using the fs-internal do_mknodat() helper allows us to get rid of
fs-internal calls to the sys_mknodat() syscall.

Introducing the ksys_mknod() wrapper allows us to avoid the in-kernel
calls to sys_mknod() syscall. The ksys_ prefix denotes that this function
is meant as a drop-in replacement for the syscall. In particular, it uses
the same calling convention as sys_mknod().

This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02 20:15:56 +02:00
Dominik Brodowski b724e846b4 fs: add do_symlinkat() helper and ksys_symlink() wrapper; remove in-kernel calls to syscall
Using the fs-internal do_symlinkat() helper allows us to get rid of
fs-internal calls to the sys_symlinkat() syscall.

Introducing the ksys_symlink() wrapper allows us to avoid the in-kernel
calls to the sys_symlink() syscall. The ksys_ prefix denotes that this
function is meant as a drop-in replacement for the syscall. In particular,
it uses the same calling convention as sys_symlink().

This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02 20:15:55 +02:00
Dominik Brodowski 0101db7a30 fs: add do_mkdirat() helper and ksys_mkdir() wrapper; remove in-kernel calls to syscall
Using the fs-internal do_mkdirat() helper allows us to get rid of
fs-internal calls to the sys_mkdirat() syscall.

Introducing the ksys_mkdir() wrapper allows us to avoid the in-kernel calls
to the sys_mkdir() syscall. The ksys_ prefix denotes that this function is
meant as a drop-in replacement for the syscall. In particular, it uses the
same calling convention as sys_mkdir().

This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02 20:15:54 +02:00
Dominik Brodowski f459dffae1 fs: add ksys_rmdir() wrapper; remove in-kernel calls to sys_rmdir()
Using this wrapper allows us to avoid the in-kernel calls to the
sys_rmdir() syscall. The ksys_ prefix denotes that this function is meant
as a drop-in replacement for the syscall. In particular, it uses the same
calling convention as sys_rmdir().

This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02 20:15:54 +02:00
Dominik Brodowski 6380161ce9 hostfs: rename do_rmdir() to hostfs_do_rmdir()
do_rmdir() is used in the VFS layer at fs/namei.c, so use a different
name in hostfs.

Cc: Jeff Dike <jdike@addtoit.com>
Cc: user-mode-linux-devel@lists.sourceforge.net
Acked-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02 20:15:53 +02:00
Dominik Brodowski 447016e968 fs: add ksys_chdir() helper; remove in-kernel calls to sys_chdir()
Using this helper allows us to avoid the in-kernel calls to the sys_chdir()
syscall. The ksys_ prefix denotes that this function is meant as a drop-in
replacement for the syscall. In particular, it uses the same calling
convention as sys_chdir().

This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02 20:15:51 +02:00
Dominik Brodowski e7a3e8b2ed fs: add ksys_write() helper; remove in-kernel calls to sys_write()
Using this helper allows us to avoid the in-kernel calls to the sys_write()
syscall. The ksys_ prefix denotes that this function is meant as a drop-in
replacement for the syscall. In particular, it uses the same calling
convention as sys_write().

In the near future, the do_mounts / initramfs callers of ksys_write()
should be converted to use filp_open() and vfs_write() instead.

This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: linux-s390@vger.kernel.org
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02 20:15:51 +02:00
Dominik Brodowski a16fe33ab5 fs: add ksys_chroot() helper; remove-in kernel calls to sys_chroot()
Using this helper allows us to avoid the in-kernel calls to the
sys_chroot() syscall. The ksys_ prefix denotes that this function is
meant as a drop-in replacement for the syscall. In particular, it uses the
same calling convention as sys_chroot().

In the near future, the fs-external callers of ksys_chroot() should be
converted to use kern_path()/set_fs_root() directly. Then ksys_chroot()
can be moved within sys_chroot() again.

This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02 20:15:50 +02:00
Dominik Brodowski c7248321a3 fs: add ksys_dup{,3}() helper; remove in-kernel calls to sys_dup{,3}()
Using ksys_dup() and ksys_dup3() as helper functions allows us to
avoid the in-kernel calls to the sys_dup() and sys_dup3() syscalls.
The ksys_ prefix denotes that these functions are meant as a drop-in
replacement for the syscalls. In particular, they use the same
calling convention as sys_dup{,3}().

In the near future, the fs-external callers of ksys_dup{,3}() should be
converted to call do_dup2() directly. Then, ksys_dup{,3}() can be moved
within sys_dup{,3}() again.

This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02 20:15:49 +02:00
Dominik Brodowski 3a18ef5c1b fs: add ksys_umount() helper; remove in-kernel call to sys_umount()
Using this helper allows us to avoid the in-kernel call to the sys_umount()
syscall. The ksys_ prefix denotes that this function is meant as a drop-in
replacement for the syscall. In particular, it uses the same calling
convention as ksys_umount().

In the near future, the only fs-external caller of ksys_umount() should be
converted to call do_umount() directly. Then, ksys_umount() can be moved
within sys_umount() again.

This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02 20:15:48 +02:00
Dominik Brodowski 312db1aa1d fs: add ksys_mount() helper; remove in-kernel calls to sys_mount()
Using this helper allows us to avoid the in-kernel calls to the sys_mount()
syscall. The ksys_ prefix denotes that this function is meant as a drop-in
replacement for the syscall. In particular, it uses the same calling
convention as sys_mount().

In the near future, all callers of ksys_mount() should be converted to call
do_mount() directly.

This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02 20:15:48 +02:00
Dominik Brodowski ab0d1e85bf fs/quota: use COMPAT_SYSCALL_DEFINE for sys32_quotactl()
While sys32_quotactl() is only needed on x86, it can use the recommended
COMPAT_SYSCALL_DEFINEx() machinery for its setup.

Acked-by: Jan Kara <jack@suse.cz>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02 20:15:47 +02:00
Dominik Brodowski cb0b476ab1 fs/quota: add kernel_quotactl() helper; remove in-kernel call to syscall
Using the fs-internal kernel_quotactl() helper allows us to get rid of
the fs-internal call to the sys_quotactl() syscall.

This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02 20:15:46 +02:00
Dominik Brodowski 183caa3c86 fanotify: add do_fanotify_mark() helper; remove in-kernel call to syscall
Using the fs-internal do_fanotify_mark() helper allows us to get rid of
the fs-internal call to the sys_fanotify_mark() syscall.

This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

Acked-by: Jan Kara <jack@suse.cz>
Cc: Amir Goldstein <amir73il@gmail.com>
Cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02 20:15:45 +02:00
Dominik Brodowski d0d89d1ed3 inotify: add do_inotify_init() helper; remove in-kernel call to syscall
Using the inotify-internal do_inotify_init() helper allows us to get rid
of the in-kernel call to sys_inotify_init1() syscall.

This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

Acked-by: Jan Kara <jack@suse.cz>
Cc: Amir Goldstein <amir73il@gmail.com>
Cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02 20:15:45 +02:00
Dominik Brodowski ab641afa73 fs: add do_compat_futimesat() helper; remove in-kernel call to compat syscall
Using the fs-internal do_compat_futimesat() helper allows us to get rid of
the fs-internal call to the compat_sys_futimesat() syscall.

This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02 20:15:44 +02:00
Dominik Brodowski 570484bfe8 fs: add do_compat_signalfd4() helper; remove in-kernel call to compat syscall
Using the fs-internal do_compat_signalfd4() helper allows us to get rid of
the fs-internal call to the compat_sys_signalfd4() syscall.

This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02 20:15:43 +02:00
Dominik Brodowski 05585e4495 fs: add do_compat_select() helper; remove in-kernel call to compat syscall
Using the fs-internal do_compat_select() helper allows us to get rid of
the fs-internal call to the compat_sys_select() syscall.

This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02 20:15:42 +02:00
Dominik Brodowski e02af2ff65 fs: add do_compat_fcntl64() helper; remove in-kernel call to compat syscall
Using the fs-internal do_compat_fcntl64() helper allows us to get rid of
the fs-internal call to the compat_sys_fcntl64() syscall.

This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02 20:15:42 +02:00
Dominik Brodowski 4bdb9acabf fs: add kern_select() helper; remove in-kernel call to sys_select()
Using this helper allows us to avoid the in-kernel call to the sys_umount()
syscall.

This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02 20:15:41 +02:00
Dominik Brodowski 30cfe4ef8b fs: add do_vmsplice() helper; remove in-kernel call to syscall
Using the fs-internal do_vmsplice() helper allows us to get rid of the
fs-internal call to the sys_vmsplice() syscall.

This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02 20:15:40 +02:00
Dominik Brodowski 98e5f7bd2c fs: add do_lookup_dcookie() helper; remove in-kernel call to syscall
Using the fs-internal do_lookup_dcookie() helper allows us to get rid of
fs-internal calls to the sys_lookup_dcookie() syscall.

This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02 20:15:39 +02:00
Dominik Brodowski 2fc96f8331 fs: add do_eventfd() helper; remove internal call to sys_eventfd()
Using this helper removes an in-kernel call to the sys_eventfd() syscall.

This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02 20:15:39 +02:00
Dominik Brodowski 52fb6db0fd fs: add do_signalfd4() helper; remove internal calls to sys_signalfd4()
Using this helper removes in-kernel calls to the sys_signalfd4() syscall
function.

This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02 20:15:38 +02:00
Dominik Brodowski 791eb22eef fs: add do_epoll_*() helpers; remove internal calls to sys_epoll_*()
Using the helper functions do_epoll_create() and do_epoll_wait() allows us
to remove in-kernel calls to the related syscall functions.

This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02 20:15:37 +02:00
Dominik Brodowski f13903587c fs: add do_futimesat() helper; remove internal call to sys_futimesat()
Using this helper removes the in-kernel call to the sys_futimesat()
syscall.

This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02 20:15:37 +02:00
Dominik Brodowski ee81feb64e fs: add do_renameat2() helper; remove internal call to sys_renameat2()
Using this helper removes in-kernel calls to the sys_renameat2() syscall.

This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02 20:15:36 +02:00
Dominik Brodowski 0a216dd1cf fs: add do_pipe2() helper; remove internal call to sys_pipe2()
Using this helper removes an in-kernel call to the sys_pipe2() syscall.

This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02 20:15:35 +02:00
Dominik Brodowski 2dae024806 fs: add do_readlinkat() helper; remove internal call to sys_readlinkat()
Using the do_readlinkat() helper removes an in-kernel call to the
sys_readlinkat() syscall.

This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02 20:15:34 +02:00
Steve French 07108d0e7c cifs: Add minor debug message during negprot
Check for unknown security mode flags during negotiate protocol
if debugging enabled.

Signed-off-by: Steve French <smfrench@gmail.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
2018-04-02 13:11:15 -05:00
Steve French 7ea884c77e smb3: Fix root directory when server returns inode number of zero
Some servers return inode number zero for the root directory, which
causes ls to display incorrect data (missing "." and "..").

If the server returns zero for the inode number of the root directory,
fake an inode number for it.

Signed-off-by: Steve French <smfrench@gmail.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
CC: Stable <stable@vger.kernel.org>
2018-04-02 13:11:03 -05:00
Steve French 6c4ba31133 cifs: fix sparse warning on previous patch in a few printks
Signed-off-by: Steve French <smfrench@gmail.com>
CC: Ronnie Sahlberg <lsahlber@redhat.com>
2018-04-02 13:10:40 -05:00
Ronnie Sahlberg 93012bf984 cifs: add server->vals->header_preamble_size
This variable is set to 4 for all protocol versions and replaces
the hardcoded constant 4 throughought the code.
This will later be updated to reflect whether a response packet
has a 4 byte length preamble or not once we start removing this
field from the SMB2+ dialects.

Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
2018-04-02 13:09:44 -05:00
Martin Brandenburg dbcb5e7fc4 orangefs: open code short single-use functions
Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2018-04-02 08:10:17 -04:00
Colin Ian King 81e3d0253f orangefs: replace vmalloc and memset with vzalloc
Use vzalloc instead of the vmalloc, memset combo

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2018-04-02 08:10:17 -04:00
David Reynolds c2676ef801 orangefs: bug fix for a race condition when getting a slot
When a slot becomes free, call wake_up_locked regardless of the number
of slots available.

Without this patch, wake_up_locked is only called when going from no
free slots to one. This means that there is a chance a waiting task
will not be woken up. In many cases, the system will bounce between 0
and 1 free slots, and the waiting tasks will be woken up. But if there
is still a waiting task and another slot becomes available before the
number of free slots reaches zero, that waiting task may never be woken
up since the number of free slots may never reach zero again.

The bug behavior is easy to reproduce with the following script,
where /mnt/orangefs is an OrangeFS file system.

for i in {1..100}; do
	for j in {1..20}; do
		dd if=/dev/zero of=/mnt/orangefs/tmp$j bs=32768 count=32 &
	done
	wait
done

Signed-off-by: David Reynolds <david@omnibond.com>
Reviewed-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2018-04-02 08:10:17 -04:00
Luis Henriques 9122eed528 ceph: quota: report root dir quota usage in statfs
This commit changes statfs default behaviour when reporting usage
statistics.  Instead of using the overall filesystem usage, statfs now
reports the quota for the filesystem root, if ceph.quota.max_bytes has
been set for this inode.  If quota hasn't been set, it falls back to the
old statfs behaviour.

A new mount option is also added ('noquotadf') to disable this behaviour.

Signed-off-by: Luis Henriques <lhenriques@suse.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02 11:17:53 +02:00
Luis Henriques d557c48db7 ceph: quota: add counter for snaprealms with quota
By keeping a counter with the number of snaprealms that have quota set
allows to optimize the functions that need to walk throught the realms
hierarchy looking for quotas.  Thus, if this counter is zero it's safe to
assume that there are no realms with quota.

Signed-off-by: Luis Henriques <lhenriques@suse.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02 11:17:53 +02:00
Luis Henriques e3161f17d9 ceph: quota: cache inode pointer in ceph_snap_realm
Keep a pointer to the inode in struct ceph_snap_realm.  This allows to
optimize functions that walk the realms hierarchy (e.g. in quotas).

Signed-off-by: Luis Henriques <lhenriques@suse.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02 11:17:53 +02:00
Yan, Zheng 0eb6bbe4d9 ceph: fix root quota realm check
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02 11:17:52 +02:00
Yan, Zheng 2596366907 ceph: don't check quota for snap inode
snap inode's i_snap_realm is not pointing to ceph_snap_realm.

Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02 11:17:52 +02:00
Luis Henriques 1ab302a0cb ceph: quota: update MDS when max_bytes is approaching
When we're reaching the ceph.quota.max_bytes limit, i.e., when writing
more than 1/16th of the space left in a quota realm, update the MDS with
the new file size.

This mirrors the fuse-client approach with commit 122c50315ed1 ("client:
Inform mds file size when approaching quota limit"), in the ceph git tree.

Signed-off-by: Luis Henriques <lhenriques@suse.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02 11:17:52 +02:00
Luis Henriques 2b83845f8b ceph: quota: support for ceph.quota.max_bytes
Signed-off-by: Luis Henriques <lhenriques@suse.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02 11:17:52 +02:00
Luis Henriques cafe21a4fb ceph: quota: don't allow cross-quota renames
This patch changes ceph_rename so that -EXDEV is returned if an attempt is
made to mv a file between two different dir trees with different quotas
setup.

Signed-off-by: Luis Henriques <lhenriques@suse.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02 11:17:52 +02:00
Luis Henriques b7a2921765 ceph: quota: support for ceph.quota.max_files
This patch adds support for the max_files quota.  It hooks into all the
ceph functions that add new filesystem objects that need to be checked
against the quota limits.  When these limits are hit, -EDQUOT is returned.

Note that we're not checking quotas on ceph_link().  ceph_link doesn't
really create a new inode,  and since the MDS doesn't update the directory
statistics when a new (hard) link is created (only with symlinks), they
are not accounted as a new file.

Signed-off-by: Luis Henriques <lhenriques@suse.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02 11:17:51 +02:00
Luis Henriques fb18a57568 ceph: quota: add initial infrastructure to support cephfs quotas
This patch adds the infrastructure required to support cephfs quotas as it
is currently implemented in the ceph fuse client.  Cephfs quotas can be
set on any directory, and can restrict the number of bytes or the number
of files stored beneath that point in the directory hierarchy.

Quotas are set using the extended attributes 'ceph.quota.max_files' and
'ceph.quota.max_bytes', and can be removed by setting these attributes to
'0'.

Link: http://tracker.ceph.com/issues/22372
Signed-off-by: Luis Henriques <lhenriques@suse.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02 11:17:51 +02:00
Yan, Zheng 7aac453a03 ceph: rename function drop_leases() to a more descriptive name
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02 10:12:49 +02:00
Chengguang Xu 50c55aeca2 ceph: fix invalid point dereference for error case in mdsc destroy
1. set fsc->mdsc after successfully allocate all necessary memory
in mdsc init.
2. if fsc->mdsc is NULL, just skip destroy operation in mdsc destroy.

Signed-off-by: Chengguang Xu <cgxu519@gmx.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02 10:12:49 +02:00
Chengguang Xu 98cfda8104 ceph: return proper bool type to caller instead of pointer
Change to return true/false only for bool type return code.

Signed-off-by: Chengguang Xu <cgxu519@gmx.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02 10:12:49 +02:00
Chengguang Xu bb48bd4dc4 ceph: optimize memory usage
In current code, regular file and directory use same struct
ceph_file_info to store fs specific data so the struct has to
include some fields which are only used for directory
(e.g., readdir related info), when having plenty of regular files,
it will lead to memory waste.

This patch introduces dedicated ceph_dir_file_info cache for
readdir related thins. So that regular file does not include those
unused fields anymore.

Signed-off-by: Chengguang Xu <cgxu519@gmx.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02 10:12:49 +02:00
Chengguang Xu 47474d0b01 ceph: optimize mds session register
Do memory allocation first, so that avoid unnecessary
initialization of newly allocated session in error case.

Signed-off-by: Chengguang Xu <cgxu519@gmx.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02 10:12:48 +02:00
Chengguang Xu 57a35dfb52 libceph, ceph: add __init attribution to init funcitons
Add __init attribution to the functions which are called only once
during initiating/registering operations and deleting unnecessary
symbol exports.

Signed-off-by: Chengguang Xu <cgxu519@gmx.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02 10:12:48 +02:00
Chengguang Xu 51b10f3fe4 ceph: filter out used flags when printing unused open flags
Filter out used access mode flags when printing unused open flags.

Signed-off-by: Chengguang Xu <cgxu519@gmx.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02 10:12:48 +02:00
Yan, Zheng 1582af2eaa ceph: don't wait on writeback when there is no more dirty pages
In sync mode, writepages() needs to write all dirty pages. But
it can only write dirty pages associated with the oldest snapc.
To write dirty pages associated with next snapc, it needs to wait
until current writes complete.

If there is no more dirty pages, writepages() should not wait on
writeback. Otherwise, dirty page writeback becomes very slow.

Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02 10:12:48 +02:00
Yan, Zheng af9cc401ce ceph: invalidate pages that beyond EOF in ceph_writepages_start()
Dirty pages can be associated with different capsnap. Different capsnap
may have different EOF value. So invalidating dirty pages according to
the largest EOF value is wrong. Dirty pages beyond EOF, but associated
with other capsnap, do not get invalidated.

Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02 10:12:47 +02:00
Chengguang Xu bc4b5ad3a6 ceph: mark the cap cache as unreclaimable
Releasing cap is affected by many factors (e.g., avail_count/reserve_count/min_count)
and min_count could be specified high volume in client mount option. Hence it's better
to mark cap cache as unreclaimable in case of non-trivial discrepancies between memory
shown as reclaimable and what is actually reclaimed.

Signed-off-by: Chengguang Xu <cgxu519@icloud.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02 10:12:47 +02:00
Chengguang Xu 73737682e0 ceph: change variable name to follow common rule
Variable name ci is mostly used for ceph_inode_info.
Variable name fi is mostly used for ceph_file_info.
Variable name cf is mostly used for ceph_cap_flush.

Change variable name to follow above common rules
in case of confusing.

Signed-off-by: Chengguang Xu <cgxu519@icloud.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02 10:12:47 +02:00
Chengguang Xu 79cd674aed ceph: optimizing cap reservation
When caps_avail_count is in a low level, most newly
trimmed caps will probably go into ->caps_list and
caps_avail_count will be increased. Hence after trimming,
should recheck caps_avail_count to effectly reuse
newly trimmed caps. Also, when releasing unnecessary
caps follow the same rule of ceph_put_cap.

Signed-off-by: Chengguang Xu <cgxu519@icloud.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02 10:12:47 +02:00
Chengguang Xu b517c1d87f ceph: release unreserved caps if having enough available caps
When unreserving caps check if there is too mamy available caps
in the ->caps_list, if so release unreserved caps.

Signed-off-by: Chengguang Xu <cgxu519@icloud.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02 10:12:47 +02:00
Chengguang Xu e327ce0685 ceph: optimizing cap allocation
When setting high volume of caps_min_count or having many
unreserved caps, unused caps may always keep in the ->caps_list
even can't get new cap from kmem_cache_alloc because lack of
maximum limitation of caps_avail_count. Hence reuse caps in
->caps_list if available, it's maybe better than setting max
limitation of caps_avail_count and releasing unused caps when
reaching the limit.

Signed-off-by: Chengguang Xu <cgxu519@icloud.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02 10:12:46 +02:00
Chengguang Xu b884014a91 ceph: adding protection for showing cap reservation info
Adding spinlock protection during getting cap reservation
ralated fields so that the numbers match below BUG_ON condition
in the code.

BUG_ON(mdsc->caps_total_count != mdsc->caps_use_count +
				 mdsc->caps_reserve_count +
				 mdsc->caps_avail_count);

Signed-off-by: Chengguang Xu <cgxu519@icloud.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02 10:12:46 +02:00
Chengguang Xu 4d8969af28 ceph: use seq_show_option for string type options
Using seq_show_option to replace seq_printf for string type options.

Signed-off-by: Chengguang Xu <cgxu519@icloud.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02 10:12:45 +02:00
Chengguang Xu 11e1478df9 libceph, ceph: change permission for readonly debugfs entries
Remove write permission for debugfs entries which only have readonly
function.

Signed-off-by: Chengguang Xu <cgxu519@icloud.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02 10:12:45 +02:00
Chengguang Xu 7ae7a828d9 ceph: keep consistent semantic in fscache related option combination
When specifying multiple fscache related options, the result isn't always
the same as option order, this fix will keep strict consistent meaning
by order.

Signed-off-by: Chengguang Xu <cgxu519@icloud.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02 10:12:45 +02:00
Chengguang Xu 4c069a5821 ceph: add newline to end of debug message format
Some of dout format do not include newline in the end,
fix for the files which are in fs/ceph and net/ceph directories,
and changing printk to dout for printing debug info in super.c

Signed-off-by: Chengguang Xu <cgxu519@icloud.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02 10:12:44 +02:00
Ilya Dryomov 08c1ac508b libceph, ceph: move ceph_calc_file_object_mapping() to striper.c
ceph_calc_file_object_mapping() has nothing to do with osdmaps.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02 10:12:43 +02:00
Ilya Dryomov dccbf08005 libceph, ceph: change ceph_calc_file_object_mapping() signature
- make it void
- xlen (object extent length) out parameter should be u32 because only
  a single stripe unit is mapped at a time

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Alex Elder <elder@linaro.org>
2018-04-02 10:12:38 +02:00
Theodore Ts'o e40ff21389 ext4: force revalidation of directory pointer after seekdir(2)
A malicious user could force the directory pointer to be in an invalid
spot by using seekdir(2).  Use the mechanism we already have to notice
if the directory has changed since the last time we called
ext4_readdir() to force a revalidation of the pointer.

Reported-by: syzbot+1236ce66f79263e8a862@syzkaller.appspotmail.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
2018-04-01 23:21:03 -04:00
Long Li 21a4e14aae cifs: smbd: disconnect transport on RDMA errors
On RDMA errors, transport should disconnect the RDMA CM connection. This
will notify the upper layer, and it will attempt transport reconnect.

Signed-off-by: Long Li <longli@microsoft.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
CC: Stable <stable@vger.kernel.org>
2018-04-01 20:24:40 -05:00
Long Li 48f238a79f cifs: smbd: avoid reconnect lockup
During transport reconnect, other processes may have registered memory
and blocked on transport. This creates a deadlock situation because the
transport resources can't be freed, and reconnect is blocked.

Fix this by returning to upper layer on timeout. Before returning,
transport status is set to reconnecting so other processes will release
memory registration resources.

Upper layer will retry the reconnect. This is not in fast I/O path so
setting the timeout to 5 seconds.

Signed-off-by: Long Li <longli@microsoft.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
CC: Stable <stable@vger.kernel.org>
2018-04-01 20:24:40 -05:00
Steve French 2a18287b54 Don't log confusing message on reconnect by default
Change the following message (which can occur on reconnect) from
a warning to an FYI message.  It is confusing to users.

   [58360.523634] CIFS VFS: Free previous auth_key.response = 00000000a91cdc84

By default this message won't show up on reconnect unless the user bumps
up the log level to include FYI messages.

Signed-off-by: Steve French <smfrench@gmail.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
2018-04-01 20:24:40 -05:00
Steve French 2564f2ff83 Don't log expected error on DFS referral request
STATUS_FS_DRIVER_REQUIRED is expected when DFS is not turned
on on the server.  Do not log it on DFS referral response.
It clutters the dmesg log unnecessarily at mount time.

Signed-off-by: Steve French <smfrench@gmail.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com
Reviewed-by: Ronnie sahlberg <lsahlber@redhat.com>
2018-04-01 20:24:40 -05:00
Phillip Potter 31cd106bb1 fs: cifs: Replace _free_xid call in cifs_root_iget function
Modify end of cifs_root_iget function in fs/cifs/inode.c to call
free_xid(xid) instead of _free_xid(xid), thereby allowing debug
notification of this action when enabled.

Signed-off-by: Phillip Potter <phil@philpotter.co.uk>
Signed-off-by: Steve French <smfrench@gmail.com>
2018-04-01 20:24:40 -05:00
Steve French d68f353fc9 SMB3.1.1 dialect is no longer experimental
SMB3.1.1 is a very important dialect, with much improved security.
We can remove the ExPERIMENTAL comments about it. It is widely
supported by servers.

Signed-off-by: Steve French <smfrench@gmail.com>
CC: Stable <stable@vger.kernel.org>
2018-04-01 20:24:40 -05:00
Steve French 6188f28bf6 Tree connect for SMB3.1.1 must be signed for non-encrypted shares
SMB3.1.1 tree connect was only being signed when signing was mandatory
but needs to always be signed (for non-guest users).

See MS-SMB2 section 3.2.4.1.1

Signed-off-by: Steve French <smfrench@gmail.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
CC: Stable <stable@vger.kernel.org>
2018-04-01 20:24:40 -05:00
Ronnie Sahlberg 262916bc69 fix smb3-encryption breakage when CONFIG_DEBUG_SG=y
We can not use the standard sg_set_buf() fucntion since when
CONFIG_DEBUG_SG=y this adds a check that will BUG_ON for cifs.ko
when we pass it an object from the stack.

Create a new wrapper smb2_sg_set_buf() which avoids doing that particular check
and use it for smb3 encryption instead.

Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
CC: Stable <stable@vger.kernel.org>
2018-04-01 20:24:40 -05:00
Gustavo A. R. Silva 70e80655f5 CIFS: fix sha512 check in cifs_crypto_secmech_release
It seems this is a copy-paste error and that the proper variable to use
in this particular case is _sha512_ instead of _md5_.

Addresses-Coverity-ID: 1465358 ("Copy-paste error")
Fixes: 1c6614d229e7 ("CIFS: add sha512 secmech")
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
CC: Stable <stable@vger.kernel.org>
Signed-off-by: Steve French <smfrench@gmail.com>
2018-04-01 20:24:40 -05:00
Aurelien Aptel 8bd68c6e47 CIFS: implement v3.11 preauth integrity
SMB3.11 clients must implement pre-authentification integrity.

* new mechanism to certify requests/responses happening before Tree
  Connect.
* supersedes VALIDATE_NEGOTIATE
* fixes signing for SMB3.11

Signed-off-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <smfrench@gmail.com>
CC: Stable <stable@vger.kernel.org>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
2018-04-01 20:24:40 -05:00
Aurelien Aptel 5fcd7f3f96 CIFS: add sha512 secmech
* prepare for SMB3.11 pre-auth integrity
* enable sha512 when SMB311 is enabled in Kconfig
* add sha512 as a soft dependency

Signed-off-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <smfrench@gmail.com>
CC: Stable <stable@vger.kernel.org>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
2018-04-01 20:24:39 -05:00
Aurelien Aptel 82fb82be05 CIFS: refactor crypto shash/sdesc allocation&free
shash and sdesc and always allocated and freed together.
* abstract this in new functions cifs_alloc_hash() and cifs_free_hash().
* make smb2/3 crypto allocation independent from each other.

Signed-off-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
CC: Stable <stable@vger.kernel.org>
2018-04-01 20:24:39 -05:00
Ronnie Sahlberg b7a73c84eb cifs: fix memory leak in SMB2_open()
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
CC: Stable <stable@vger.kernel.org>
2018-04-01 20:24:39 -05:00
Colin Ian King ac65cb6203 CIFS: SMBD: fix spelling mistake: "faield" and "legnth"
Trivial fix to spelling mistake in log_rdma_send and log_rdma_mr
message text.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
2018-04-01 20:24:39 -05:00
David S. Miller c0b458a946 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Minor conflicts in drivers/net/ethernet/mellanox/mlx5/core/en_rep.c,
we had some overlapping changes:

1) In 'net' MLX5E_PARAMS_LOG_{SQ,RQ}_SIZE -->
   MLX5E_REP_PARAMS_LOG_{SQ,RQ}_SIZE

2) In 'net-next' params->log_rq_size is renamed to be
   params->log_rq_mtu_frames.

3) In 'net-next' params->hard_mtu is added.

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-01 19:49:34 -04:00
Theodore Ts'o 54dd0e0a1b ext4: add extra checks to ext4_xattr_block_get()
Add explicit checks in ext4_xattr_block_get() just in case the
e_value_offs and e_value_size fields in the the xattr block are
corrupted in memory after the buffer_verified bit is set on the xattr
block.

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
2018-03-30 20:04:11 -04:00
David Sterba 57599c7e77 btrfs: lift errors from add_extent_changeset to the callers
The missing error handling in add_extent_changeset was hidden, so make
it at least visible in the callers.

Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-31 02:03:25 +02:00
Liu Bo f50f435390 Btrfs: print error messages when failing to read trees
When mount fails to read trees like fs tree, checksum tree, extent
tree, etc, there is not enough information about where went wrong.

With this, messages like

"BTRFS warning (device sdf): failed to read root (objectid=7): -5"

would help us a bit.

Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-31 02:02:14 +02:00
David Sterba 38e82de8cc btrfs: user proper type for btrfs_mask_flags flags
All users pass a local unsigned int and not the __uXX types that are
supposed to be used for userspace interfaces.

Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-31 02:01:07 +02:00
David Sterba 7e79cb86be btrfs: split dev-replace locking helpers for read and write
The current calls are unclear in what way btrfs_dev_replace_lock takes
the locks, so drop the argument, split the helpers and use similar
naming as for read and write locks.

Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-31 02:01:07 +02:00
David Sterba e7ab0af6c3 btrfs: remove stale comments about fs_mutex
The fs_mutex has been killed in 2008, a213501153 ("Btrfs: Replace
the big fs_mutex with a collection of other locks"), still remembered in
some comments.

We don't have any extra needs for locking in the ACL handlers.

Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-31 02:01:07 +02:00
David Sterba 88c14590cd btrfs: use RCU in btrfs_show_devname for device list traversal
The show_devname callback is used to print device name in
/proc/self/mounts, we need to traverse the device list consistently and
read the name that's copied to a seq buffer so we don't need further
locking.

If the first device is being deleted at the same time, the RCU will
allow us to read the device name, though it will become stale right
after the RCU protection ends. This is unavoidable and the user can
expect that the device will disappear from the filesystem's list at some
point.

The device_list_mutex was pretty heavy as it is used eg. for writing
superblock and a few other IO related contexts. This can stall any
application that reads the proc file for no reason.

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-31 02:01:06 +02:00
David Sterba d1980131ca btrfs: update barrier in should_cow_block
Once there was a simple int force_cow that was used with the plain
barriers, and then converted to a bit, so we should use the appropriate
barrier helper.

Other variables in the complex if condition do not depend on a barrier,
so we should be fine in case the atomic barrier becomes a no-op.

Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-31 02:01:06 +02:00
David Sterba a32bf9a302 btrfs: use lockdep_assert_held for mutexes
Using lockdep_assert_held is preferred, replace mutex_is_locked.

Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-31 02:01:06 +02:00
David Sterba a4666e688f btrfs: use lockdep_assert_held for spinlocks
Using lockdep_assert_held is preferred, replace assert_spin_locked.

Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-31 02:01:06 +02:00
Qu Wenruo 581c176041 btrfs: Validate child tree block's level and first key
We have several reports about node pointer points to incorrect child
tree blocks, which could have even wrong owner and level but still with
valid generation and checksum.

Although btrfs check could handle it and print error message like:
leaf parent key incorrect 60670574592

Kernel doesn't have enough check on this type of corruption correctly.
At least add such check to read_tree_block() and btrfs_read_buffer(),
where we need two new parameters @level and @first_key to verify the
child tree block.

The new @level check is mandatory and all call sites are already
modified to extract expected level from its call chain.

While @first_key is optional, the following call sites are skipping such
check:
1) Root node/leaf
   As ROOT_ITEM doesn't contain the first key, skip @first_key check.
2) Direct backref
   Only parent bytenr and level is known and we need to resolve the key
   all by ourselves, skip @first_key check.

Another note of this verification is, it needs extra info from nodeptr
or ROOT_ITEM, so it can't fit into current tree-checker framework, which
is limited to node/leaf boundary.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-31 02:01:06 +02:00
Qu Wenruo 3c0efdf03b btrfs: tests/qgroup: Fix wrong tree backref level
The extent tree of the test fs is like the following:

 BTRFS info (device (null)): leaf 16327509003777336587 total ptrs 1 free space 3919
  item 0 key (4096 168 4096) itemoff 3944 itemsize 51
          extent refs 1 gen 1 flags 2
          tree block key (68719476736 0 0) level 1
                                           ^^^^^^^
          ref#0: tree block backref root 5

And it's using an empty tree for fs tree, so there is no way that its
level can be 1.

For REAL (created by mkfs) fs tree backref with no skinny metadata, the
result should look like:

 item 3 key (30408704 EXTENT_ITEM 4096) itemoff 3845 itemsize 51
         refs 1 gen 4 flags TREE_BLOCK
         tree block key (256 INODE_ITEM 0) level 0
                                           ^^^^^^^
         tree block backref root 5

Fix the level to 0, so it won't break later tree level checker.

Fixes: faa2dbf004 ("Btrfs: add sanity tests for new qgroup accounting code")
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-31 02:01:05 +02:00
Filipe Manana 8434ec46c6 Btrfs: fix copy_items() return value when logging an inode
When logging an inode, at tree-log.c:copy_items(), if we call
btrfs_next_leaf() at the loop which checks for the need to log holes, we
need to make sure copy_items() returns the value 1 to its caller and
not 0 (on success). This is because the path the caller passed was
released and is now different from what is was before, and the caller
expects a return value of 0 to mean both success and that the path
has not changed, while a return value of 1 means both success and
signals the caller that it can not reuse the path, it has to perform
another tree search.

Even though this is a case that should not be triggered on normal
circumstances or very rare at least, its consequences can be very
unpredictable (especially when replaying a log tree).

Fixes: 16e7549f04 ("Btrfs: incompatible format change to remove hole extents")
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-31 02:01:05 +02:00
Filipe Manana 4ee3fad34a Btrfs: fix fsync after hole punching when using no-holes feature
When we have the no-holes mode enabled and fsync a file after punching a
hole in it, we can end up not logging the whole hole range in the log tree.
This happens if the file has extent items that span more than one leaf and
we punch a hole that covers a range that starts in a leaf but does not go
beyond the offset of the first extent in the next leaf.

Example:

  $ mkfs.btrfs -f -O no-holes -n 65536 /dev/sdb
  $ mount /dev/sdb /mnt
  $ for ((i = 0; i <= 831; i++)); do
	offset=$((i * 2 * 256 * 1024))
	xfs_io -f -c "pwrite -S 0xab -b 256K $offset 256K" \
		/mnt/foobar >/dev/null
    done
  $ sync

  # We now have 2 leafs in our filesystem fs tree, the first leaf has an
  # item corresponding the extent at file offset 216530944 and the second
  # leaf has a first item corresponding to the extent at offset 217055232.
  # Now we punch a hole that partially covers the range of the extent at
  # offset 216530944 but does go beyond the offset 217055232.

  $ xfs_io -c "fpunch $((216530944 + 128 * 1024 - 4000)) 256K" /mnt/foobar
  $ xfs_io -c "fsync" /mnt/foobar

  <power fail>

  # mount to replay the log
  $ mount /dev/sdb /mnt

  # Before this patch, only the subrange [216658016, 216662016[ (length of
  # 4000 bytes) was logged, leaving an incorrect file layout after log
  # replay.

Fix this by checking if there is a hole between the last extent item that
we processed and the first extent item in the next leaf, and if there is
one, log an explicit hole extent item.

Fixes: 16e7549f04 ("Btrfs: incompatible format change to remove hole extents")
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-31 02:01:05 +02:00
David Sterba a1840b5023 btrfs: use helper to set ulist aux from a qgroup
We have a nice helper to do proper casting of a qgroup to a ulist aux
value. And several places that could make use of it.

Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-31 02:01:05 +02:00
Qu Wenruo 0b78877a2a Revert "btrfs: qgroups: Retry after commit on getting EDQUOT"
This reverts commit 48a89bc4f2.

The idea to commit transaction and free some space after hitting qgroup
limit is good, although the problem is it can easily cause deadlocks.

One deadlock example is caused by trying to flush data while still
holding it:

Call Trace:
 __schedule+0x49d/0x10f0
 schedule+0xc6/0x290
 schedule_timeout+0x187/0x1c0
 wait_for_completion+0x204/0x3a0
 btrfs_wait_ordered_extents+0xa40/0xaf0 [btrfs]
 qgroup_reserve+0x913/0xa10 [btrfs]
 btrfs_qgroup_reserve_data+0x3ef/0x580 [btrfs]
 btrfs_check_data_free_space+0x96/0xd0 [btrfs]
 __btrfs_buffered_write+0x3ac/0xd40 [btrfs]
 btrfs_file_write_iter+0x62a/0xba0 [btrfs]
 __vfs_write+0x320/0x430
 vfs_write+0x107/0x270
 SyS_write+0xbf/0x150
 do_syscall_64+0x1b0/0x3d0
 entry_SYSCALL64_slow_path+0x25/0x25

Another can be caused by trying to commit one transaction while nesting
with trans handle held by ourselves:

btrfs_start_transaction()
|- btrfs_qgroup_reserve_meta_pertrans()
   |- qgroup_reserve()
      |- btrfs_join_transaction()
      |- btrfs_commit_transaction()

The retry is causing more problems than exppected when limit is enabled.
At least a graceful EDQUOT is way better than deadlock.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-31 02:01:05 +02:00
Qu Wenruo 4ee0d8832c btrfs: qgroup: Update trace events for metadata reservation
Now trace_qgroup_meta_reserve() will have extra type parameter.

And introduce two new trace events:

1) trace_qgroup_meta_free_all_pertrans()
   For btrfs_qgroup_free_meta_all_pertrans()

2) trace_qgroup_meta_convert()
   For btrfs_qgroup_convert_reserved_meta()

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-31 02:01:05 +02:00
Qu Wenruo 8287475a20 btrfs: qgroup: Use root::qgroup_meta_rsv_* to record qgroup meta reserved space
For quota disabled->enable case, it's possible that at reservation time
quota was not enabled so no bytes were really reserved, while at release
time, quota was enabled so we will try to release some bytes we didn't
really own.

Such situation can cause metadata reserveation underflow, for both types,
also less possible for per-trans type since quota enable will commit
transaction.

To address this, record qgroup meta reserved bytes into
root::qgroup_meta_rsv_pertrans and ::prealloc.
So at releasing time we won't free any bytes we didn't reserve.

For DATA, it's already handled by io_tree, so nothing needs to be done
there.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-31 02:01:04 +02:00
Qu Wenruo 4f5427ccce btrfs: delayed-inode: Use new qgroup meta rsv for delayed inode and item
Quite similar for delalloc, some modification to delayed-inode and
delayed-item reservation.  Also needs extra parameter for release case
to distinguish normal release and error release.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-31 02:01:03 +02:00
Theodore Ts'o 9496005d6c ext4: add bounds checking to ext4_xattr_find_entry()
Add some paranoia checks to make sure we don't stray beyond the end of
the valid memory region containing ext4 xattr entries while we are
scanning for a match.

Also rename the function to xattr_find_entry() since it is static and
thus only used in fs/ext4/xattr.c

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
2018-03-30 20:00:56 -04:00
Qu Wenruo 43b18595d6 btrfs: qgroup: Use separate meta reservation type for delalloc
Before this patch, btrfs qgroup is mixing per-transcation meta rsv with
preallocated meta rsv, making it quite easy to underflow qgroup meta
reservation.

Since we have the new qgroup meta rsv types, apply it to delalloc
reservation.

Now for delalloc, most of its reserved space will use META_PREALLOC qgroup
rsv type.

And for callers reducing outstanding extent like btrfs_finish_ordered_io(),
they will convert corresponding META_PREALLOC reservation to
META_PERTRANS.

This is mainly due to the fact that current qgroup numbers will only be
updated in btrfs_commit_transaction(), that's to say if we don't keep
such placeholder reservation, we can exceed qgroup limitation.

And for callers freeing outstanding extent in error handler, we will
just free META_PREALLOC bytes.

This behavior makes callers of btrfs_qgroup_release_meta() or
btrfs_qgroup_convert_meta() to be aware of which type they are.
So in this patch, btrfs_delalloc_release_metadata() and its callers get
an extra parameter to info qgroup to do correct meta convert/release.

The good news is, even we use the wrong type (convert or free), it won't
cause obvious bug, as prealloc type is always in good shape, and the
type only affects how per-trans meta is increased or not.

So the worst case will be at most metadata limitation can be sometimes
exceeded (no convert at all) or metadata limitation is reached too soon
(no free at all).

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-31 01:41:14 +02:00
Qu Wenruo 64cfaef636 btrfs: qgroup: Introduce function to convert META_PREALLOC into META_PERTRANS
For meta_prealloc reservation users, after btrfs_join_transaction()
caller will modify tree so part (or even all) meta_prealloc reservation
should be converted to meta_pertrans until transaction commit time.

This patch introduces a new function,
btrfs_qgroup_convert_reserved_meta() to do this for META_PREALLOC
reservation user.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-31 01:41:14 +02:00
Qu Wenruo e1211d0e89 btrfs: qgroup: Don't use root->qgroup_meta_rsv for qgroup
Since qgroup has seperate metadata reservation types now, we can
completely get rid of the old root->qgroup_meta_rsv, which mostly acts
as current META_PERTRANS reservation type.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-31 01:41:14 +02:00
Qu Wenruo 733e03a0b2 btrfs: qgroup: Split meta rsv type into meta_prealloc and meta_pertrans
Btrfs uses 2 different methods to reseve metadata qgroup space.

1) Reserve at btrfs_start_transaction() time
   This is quite straightforward, caller will use the trans handler
   allocated to modify b-trees.

   In this case, reserved metadata should be kept until qgroup numbers
   are updated.

2) Reserve by using block_rsv first, and later btrfs_join_transaction()
   This is more complicated, caller will reserve space using block_rsv
   first, and then later call btrfs_join_transaction() to get a trans
   handle.

   In this case, before we modify trees, the reserved space can be
   modified on demand, and after btrfs_join_transaction(), such reserved
   space should also be kept until qgroup numbers are updated.

Since these two types behave differently, split the original "META"
reservation type into 2 sub-types:

  META_PERTRANS:
    For above case 1)

  META_PREALLOC:
    For reservations that happened before btrfs_join_transaction() of
    case 2)

NOTE: This patch will only convert existing qgroup meta reservation
callers according to its situation, not ensuring all callers are at
correct timing.
Such fix will be added in later patches.

Signed-off-by: Qu Wenruo <wqu@suse.com>
[ update comments ]
Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-31 01:41:14 +02:00
Qu Wenruo 5c40507ffb btrfs: qgroup: Cleanup the remaining old reservation counters
So qgroup is switched to new separate types reservation system.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-31 01:41:13 +02:00
Qu Wenruo 64ee4e751a btrfs: qgroup: Update trace events to use new separate rsv types
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-31 01:41:13 +02:00
Qu Wenruo 429d6275d5 btrfs: qgroup: Fix wrong qgroup reservation update for relationship modification
When modifying qgroup relationship, for qgroup which only owns exclusive
extents, we will go through quick update path.

In this path, we will add/subtract exclusive and reference number for
parent qgroup, since the source (child) qgroup only has exclusive
extents, destination (parent) qgroup will also own or lose those extents
exclusively.

The same should be the same for reservation, since later reservation
adding/releasing will also affect parent qgroup, without the reservation
carried from child, parent will underflow reservation or have dead
reservation which will never be freed.

However original code doesn't do the same thing for reservation.
It handles qgroup reservation quite differently:

It removes qgroup reservation, as it's allocating space from the
reserved qgroup for relationship adding.
But does nothing for qgroup reservation if we're removing a qgroup
relationship.

According to the original code, it looks just like because we're adding
qgroup->rfer, the code assumes we're writing new data, so it's follows
the normal write routine, by reducing qgroup->reserved and adding
qgroup->rfer/excl.

This old behavior is wrong, and should be fixed to follow the same
excl/rfer behavior.

Just fix it by using the correct behavior described above.

Fixes: 31193213f1 ("Btrfs: qgroup: Introduce a may_use to account space_info->bytes_may_use.")
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-31 01:41:13 +02:00
Qu Wenruo dba213242f btrfs: qgroup: Make qgroup_reserve and its callers to use separate reservation type
Since most callers of qgroup_reserve() are already defined by type,
converting qgroup_reserve() is quite an easy work.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-31 01:41:13 +02:00
Qu Wenruo f59c0347d4 btrfs: qgroup: Introduce helpers to update and access new qgroup rsv
Introduce helpers to:

1) Get total reserved space
   For limit calculation
2) Add/release reserved space for given type
   With underflow detection and warning
3) Add/release reserved space according to child qgroup

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-31 01:41:13 +02:00
Qu Wenruo d4e5c92055 btrfs: qgroup: Skeleton to support separate qgroup reservation type
Instead of single qgroup->reserved, use a new structure btrfs_qgroup_rsv
to store different types of reservation.

This patch only updates the header needed to compile.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-31 01:41:13 +02:00
Omar Sandoval 0a0d4415e3 Btrfs: delete dead code in btrfs_orphan_add()
btrfs_orphan_add() has had this case commented out since it was first
introduced in commit d68fc57b7e ("Btrfs: Metadata reservation for
orphan inodes"). Most of the orphan cleanup code has been rewritten
since then, so it's safe to say that this code isn't needed.

Signed-off-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
[ switch to bool ]
Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-31 01:41:12 +02:00
Misono, Tomohiro 4408ea7c5f btrfs: ctree.h: Fix wrong comment position about csum size
Signed-off-by: Tomohiro Misono <misono.tomohiro@jp.fujitsu.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-31 01:41:12 +02:00
Jeff Mahoney 75cb379d26 btrfs: defer adding raid type kobject until after chunk relocation
Any time the first block group of a new type is created, we add a new
kobject to sysfs to hold the attributes for that type.  Kobject-internal
allocations always use GFP_KERNEL, making them prone to fs-reclaim races.
While it appears as if this can occur any time a block group is created,
the only times the first block group of a new type can be created in
memory is at mount and when we create the first new block group during
raid conversion.

This patch adds a new list to track pending kobject additions and then
handles them after we do chunk relocation.  Between relocating the
target chunk (or forcing allocation of a new chunk in the case of data)
and removing the old chunk, we're in a safe place for fs-reclaim to
occur.  We're holding the volume mutex, which is already held across
page faults, and the delete_unused_bgs_mutex, which will only stall
the cleaner thread.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-31 01:41:12 +02:00
Jeff Mahoney dc2d3005d2 btrfs: remove dead create_space_info calls
Since commit 2be12ef79 (btrfs: Separate space_info create/update), we've
separated out the creation and updating of the space info structures.
That commit was a straightforward refactoring of the two parts of
update_space_info, but we can go a step further.  Since commits
c59021f84 (Btrfs: fix OOPS of empty filesystem after balance) and
b742bb82f (Btrfs: Link block groups of different raid types), we know
that the space_info structures will be created at mount and there will
only ever be, at most, three of them.

This patch cleans out the create_space_info calls after __find_space_info
returns NULL since __find_space_info *can't* return NULL.

The initial cause for reviewing this was the kobject_add calls from
create_space_info occuring in sites where fs-reclaim wasn't allowed.  Now
we are certain they occur only early in the mount process and are safe.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-31 01:41:12 +02:00
Liu Bo 580c6efaf9 Btrfs: replace: cache rbio when rebuild data on missing device
Rebuild on missing device is as same as recover, after it's done, rbio
has data which is consistent with on-disk data, so it can be cached to
avoid further reads.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-31 01:41:12 +02:00
Jeff Mahoney 8a5a916d9a btrfs: fix lockdep splat in btrfs_alloc_subvolume_writers
While running btrfs/011, I hit the following lockdep splat.

This is the important bit:
   pcpu_alloc+0x1ac/0x5e0
   __percpu_counter_init+0x4e/0xb0
   btrfs_init_fs_root+0x99/0x1c0 [btrfs]
   btrfs_get_fs_root.part.54+0x5b/0x150 [btrfs]
   resolve_indirect_refs+0x130/0x830 [btrfs]
   find_parent_nodes+0x69e/0xff0 [btrfs]
   btrfs_find_all_roots_safe+0xa0/0x110 [btrfs]
   btrfs_find_all_roots+0x50/0x70 [btrfs]
   btrfs_qgroup_prepare_account_extents+0x53/0x90 [btrfs]
   btrfs_commit_transaction+0x3ce/0x9b0 [btrfs]

The percpu_counter_init call in btrfs_alloc_subvolume_writers
uses GFP_KERNEL, which we can't do during transaction commit.

This switches it to GFP_NOFS.

========================================================
WARNING: possible irq lock inversion dependency detected
4.12.14-kvmsmall #8 Tainted: G        W
--------------------------------------------------------
kswapd0/50 just changed the state of lock:
 (&delayed_node->mutex){+.+.-.}, at: [<ffffffffc06994fa>] __btrfs_release_delayed_node+0x3a/0x1f0 [btrfs]
but this lock took another, RECLAIM_FS-unsafe lock in the past:
 (pcpu_alloc_mutex){+.+.+.}

and interrupts could create inverse lock ordering between them.

other info that might help us debug this:
Chain exists of:
  &delayed_node->mutex --> &found->groups_sem --> pcpu_alloc_mutex

 Possible interrupt unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(pcpu_alloc_mutex);
                               local_irq_disable();
                               lock(&delayed_node->mutex);
                               lock(&found->groups_sem);
  <Interrupt>
    lock(&delayed_node->mutex);

 *** DEADLOCK ***

2 locks held by kswapd0/50:
 #0:  (shrinker_rwsem){++++..}, at: [<ffffffff811dc11f>] shrink_slab+0x7f/0x5b0
 #1:  (&type->s_umount_key#30){+++++.}, at: [<ffffffff8126dec6>] trylock_super+0x16/0x50

the shortest dependencies between 2nd lock and 1st lock:
   -> (pcpu_alloc_mutex){+.+.+.} ops: 4904 {
      HARDIRQ-ON-W at:
                          __mutex_lock+0x4e/0x8c0
                          pcpu_alloc+0x1ac/0x5e0
                          alloc_kmem_cache_cpus.isra.70+0x25/0xa0
                          __do_tune_cpucache+0x2c/0x220
                          do_tune_cpucache+0x26/0xc0
                          enable_cpucache+0x6d/0xf0
                          kmem_cache_init_late+0x42/0x75
                          start_kernel+0x343/0x4cb
                          x86_64_start_kernel+0x127/0x134
                          secondary_startup_64+0xa5/0xb0
      SOFTIRQ-ON-W at:
                          __mutex_lock+0x4e/0x8c0
                          pcpu_alloc+0x1ac/0x5e0
                          alloc_kmem_cache_cpus.isra.70+0x25/0xa0
                          __do_tune_cpucache+0x2c/0x220
                          do_tune_cpucache+0x26/0xc0
                          enable_cpucache+0x6d/0xf0
                          kmem_cache_init_late+0x42/0x75
                          start_kernel+0x343/0x4cb
                          x86_64_start_kernel+0x127/0x134
                          secondary_startup_64+0xa5/0xb0
      RECLAIM_FS-ON-W at:
                             __kmalloc+0x47/0x310
                             pcpu_extend_area_map+0x2b/0xc0
                             pcpu_alloc+0x3ec/0x5e0
                             alloc_kmem_cache_cpus.isra.70+0x25/0xa0
                             __do_tune_cpucache+0x2c/0x220
                             do_tune_cpucache+0x26/0xc0
                             enable_cpucache+0x6d/0xf0
                             __kmem_cache_create+0x1bf/0x390
                             create_cache+0xba/0x1b0
                             kmem_cache_create+0x1f8/0x2b0
                             ksm_init+0x6f/0x19d
                             do_one_initcall+0x50/0x1b0
                             kernel_init_freeable+0x201/0x289
                             kernel_init+0xa/0x100
                             ret_from_fork+0x3a/0x50
      INITIAL USE at:
                         __mutex_lock+0x4e/0x8c0
                         pcpu_alloc+0x1ac/0x5e0
                         alloc_kmem_cache_cpus.isra.70+0x25/0xa0
                         setup_cpu_cache+0x2f/0x1f0
                         __kmem_cache_create+0x1bf/0x390
                         create_boot_cache+0x8b/0xb1
                         kmem_cache_init+0xa1/0x19e
                         start_kernel+0x270/0x4cb
                         x86_64_start_kernel+0x127/0x134
                         secondary_startup_64+0xa5/0xb0
    }
    ... key      at: [<ffffffff821d8e70>] pcpu_alloc_mutex+0x70/0xa0
    ... acquired at:
   pcpu_alloc+0x1ac/0x5e0
   __percpu_counter_init+0x4e/0xb0
   btrfs_init_fs_root+0x99/0x1c0 [btrfs]
   btrfs_get_fs_root.part.54+0x5b/0x150 [btrfs]
   resolve_indirect_refs+0x130/0x830 [btrfs]
   find_parent_nodes+0x69e/0xff0 [btrfs]
   btrfs_find_all_roots_safe+0xa0/0x110 [btrfs]
   btrfs_find_all_roots+0x50/0x70 [btrfs]
   btrfs_qgroup_prepare_account_extents+0x53/0x90 [btrfs]
   btrfs_commit_transaction+0x3ce/0x9b0 [btrfs]
   transaction_kthread+0x176/0x1b0 [btrfs]
   kthread+0x102/0x140
   ret_from_fork+0x3a/0x50

  -> (&fs_info->commit_root_sem){++++..} ops: 1566382 {
     HARDIRQ-ON-W at:
                        down_write+0x3e/0xa0
                        cache_block_group+0x287/0x420 [btrfs]
                        find_free_extent+0x106c/0x12d0 [btrfs]
                        btrfs_reserve_extent+0xd8/0x170 [btrfs]
                        cow_file_range.isra.66+0x133/0x470 [btrfs]
                        run_delalloc_range+0x121/0x410 [btrfs]
                        writepage_delalloc.isra.50+0xfe/0x180 [btrfs]
                        __extent_writepage+0x19a/0x360 [btrfs]
                        extent_write_cache_pages.constprop.56+0x249/0x3e0 [btrfs]
                        extent_writepages+0x4d/0x60 [btrfs]
                        do_writepages+0x1a/0x70
                        __filemap_fdatawrite_range+0xa7/0xe0
                        btrfs_rename+0x5ee/0xdb0 [btrfs]
                        vfs_rename+0x52a/0x7e0
                        SyS_rename+0x351/0x3b0
                        do_syscall_64+0x79/0x1e0
                        entry_SYSCALL_64_after_hwframe+0x42/0xb7
     HARDIRQ-ON-R at:
                        down_read+0x35/0x90
                        caching_thread+0x57/0x560 [btrfs]
                        normal_work_helper+0x1c0/0x5e0 [btrfs]
                        process_one_work+0x1e0/0x5c0
                        worker_thread+0x44/0x390
                        kthread+0x102/0x140
                        ret_from_fork+0x3a/0x50
     SOFTIRQ-ON-W at:
                        down_write+0x3e/0xa0
                        cache_block_group+0x287/0x420 [btrfs]
                        find_free_extent+0x106c/0x12d0 [btrfs]
                        btrfs_reserve_extent+0xd8/0x170 [btrfs]
                        cow_file_range.isra.66+0x133/0x470 [btrfs]
                        run_delalloc_range+0x121/0x410 [btrfs]
                        writepage_delalloc.isra.50+0xfe/0x180 [btrfs]
                        __extent_writepage+0x19a/0x360 [btrfs]
                        extent_write_cache_pages.constprop.56+0x249/0x3e0 [btrfs]
                        extent_writepages+0x4d/0x60 [btrfs]
                        do_writepages+0x1a/0x70
                        __filemap_fdatawrite_range+0xa7/0xe0
                        btrfs_rename+0x5ee/0xdb0 [btrfs]
                        vfs_rename+0x52a/0x7e0
                        SyS_rename+0x351/0x3b0
                        do_syscall_64+0x79/0x1e0
                        entry_SYSCALL_64_after_hwframe+0x42/0xb7
     SOFTIRQ-ON-R at:
                        down_read+0x35/0x90
                        caching_thread+0x57/0x560 [btrfs]
                        normal_work_helper+0x1c0/0x5e0 [btrfs]
                        process_one_work+0x1e0/0x5c0
                        worker_thread+0x44/0x390
                        kthread+0x102/0x140
                        ret_from_fork+0x3a/0x50
     INITIAL USE at:
                       down_write+0x3e/0xa0
                       cache_block_group+0x287/0x420 [btrfs]
                       find_free_extent+0x106c/0x12d0 [btrfs]
                       btrfs_reserve_extent+0xd8/0x170 [btrfs]
                       cow_file_range.isra.66+0x133/0x470 [btrfs]
                       run_delalloc_range+0x121/0x410 [btrfs]
                       writepage_delalloc.isra.50+0xfe/0x180 [btrfs]
                       __extent_writepage+0x19a/0x360 [btrfs]
                       extent_write_cache_pages.constprop.56+0x249/0x3e0 [btrfs]
                       extent_writepages+0x4d/0x60 [btrfs]
                       do_writepages+0x1a/0x70
                       __filemap_fdatawrite_range+0xa7/0xe0
                       btrfs_rename+0x5ee/0xdb0 [btrfs]
                       vfs_rename+0x52a/0x7e0
                       SyS_rename+0x351/0x3b0
                       do_syscall_64+0x79/0x1e0
                       entry_SYSCALL_64_after_hwframe+0x42/0xb7
   }
   ... key      at: [<ffffffffc0729578>] __key.61970+0x0/0xfffffffffff9aa88 [btrfs]
   ... acquired at:
   cache_block_group+0x287/0x420 [btrfs]
   find_free_extent+0x106c/0x12d0 [btrfs]
   btrfs_reserve_extent+0xd8/0x170 [btrfs]
   btrfs_alloc_tree_block+0x12f/0x4c0 [btrfs]
   btrfs_create_tree+0xbb/0x2a0 [btrfs]
   btrfs_create_uuid_tree+0x37/0x140 [btrfs]
   open_ctree+0x23c0/0x2660 [btrfs]
   btrfs_mount+0xd36/0xf90 [btrfs]
   mount_fs+0x3a/0x160
   vfs_kern_mount+0x66/0x150
   btrfs_mount+0x18c/0xf90 [btrfs]
   mount_fs+0x3a/0x160
   vfs_kern_mount+0x66/0x150
   do_mount+0x1c1/0xcc0
   SyS_mount+0x7e/0xd0
   do_syscall_64+0x79/0x1e0
   entry_SYSCALL_64_after_hwframe+0x42/0xb7

 -> (&found->groups_sem){++++..} ops: 2134587 {
    HARDIRQ-ON-W at:
                      down_write+0x3e/0xa0
                      __link_block_group+0x34/0x130 [btrfs]
                      btrfs_read_block_groups+0x33d/0x7b0 [btrfs]
                      open_ctree+0x2054/0x2660 [btrfs]
                      btrfs_mount+0xd36/0xf90 [btrfs]
                      mount_fs+0x3a/0x160
                      vfs_kern_mount+0x66/0x150
                      btrfs_mount+0x18c/0xf90 [btrfs]
                      mount_fs+0x3a/0x160
                      vfs_kern_mount+0x66/0x150
                      do_mount+0x1c1/0xcc0
                      SyS_mount+0x7e/0xd0
                      do_syscall_64+0x79/0x1e0
                      entry_SYSCALL_64_after_hwframe+0x42/0xb7
    HARDIRQ-ON-R at:
                      down_read+0x35/0x90
                      btrfs_calc_num_tolerated_disk_barrier_failures+0x113/0x1f0 [btrfs]
                      open_ctree+0x207b/0x2660 [btrfs]
                      btrfs_mount+0xd36/0xf90 [btrfs]
                      mount_fs+0x3a/0x160
                      vfs_kern_mount+0x66/0x150
                      btrfs_mount+0x18c/0xf90 [btrfs]
                      mount_fs+0x3a/0x160
                      vfs_kern_mount+0x66/0x150
                      do_mount+0x1c1/0xcc0
                      SyS_mount+0x7e/0xd0
                      do_syscall_64+0x79/0x1e0
                      entry_SYSCALL_64_after_hwframe+0x42/0xb7
    SOFTIRQ-ON-W at:
                      down_write+0x3e/0xa0
                      __link_block_group+0x34/0x130 [btrfs]
                      btrfs_read_block_groups+0x33d/0x7b0 [btrfs]
                      open_ctree+0x2054/0x2660 [btrfs]
                      btrfs_mount+0xd36/0xf90 [btrfs]
                      mount_fs+0x3a/0x160
                      vfs_kern_mount+0x66/0x150
                      btrfs_mount+0x18c/0xf90 [btrfs]
                      mount_fs+0x3a/0x160
                      vfs_kern_mount+0x66/0x150
                      do_mount+0x1c1/0xcc0
                      SyS_mount+0x7e/0xd0
                      do_syscall_64+0x79/0x1e0
                      entry_SYSCALL_64_after_hwframe+0x42/0xb7
    SOFTIRQ-ON-R at:
                      down_read+0x35/0x90
                      btrfs_calc_num_tolerated_disk_barrier_failures+0x113/0x1f0 [btrfs]
                      open_ctree+0x207b/0x2660 [btrfs]
                      btrfs_mount+0xd36/0xf90 [btrfs]
                      mount_fs+0x3a/0x160
                      vfs_kern_mount+0x66/0x150
                      btrfs_mount+0x18c/0xf90 [btrfs]
                      mount_fs+0x3a/0x160
                      vfs_kern_mount+0x66/0x150
                      do_mount+0x1c1/0xcc0
                      SyS_mount+0x7e/0xd0
                      do_syscall_64+0x79/0x1e0
                      entry_SYSCALL_64_after_hwframe+0x42/0xb7
    INITIAL USE at:
                     down_write+0x3e/0xa0
                     __link_block_group+0x34/0x130 [btrfs]
                     btrfs_read_block_groups+0x33d/0x7b0 [btrfs]
                     open_ctree+0x2054/0x2660 [btrfs]
                     btrfs_mount+0xd36/0xf90 [btrfs]
                     mount_fs+0x3a/0x160
                     vfs_kern_mount+0x66/0x150
                     btrfs_mount+0x18c/0xf90 [btrfs]
                     mount_fs+0x3a/0x160
                     vfs_kern_mount+0x66/0x150
                     do_mount+0x1c1/0xcc0
                     SyS_mount+0x7e/0xd0
                     do_syscall_64+0x79/0x1e0
                     entry_SYSCALL_64_after_hwframe+0x42/0xb7
  }
  ... key      at: [<ffffffffc0729488>] __key.59101+0x0/0xfffffffffff9ab78 [btrfs]
  ... acquired at:
   find_free_extent+0xcb4/0x12d0 [btrfs]
   btrfs_reserve_extent+0xd8/0x170 [btrfs]
   btrfs_alloc_tree_block+0x12f/0x4c0 [btrfs]
   __btrfs_cow_block+0x110/0x5b0 [btrfs]
   btrfs_cow_block+0xd7/0x290 [btrfs]
   btrfs_search_slot+0x1f6/0x960 [btrfs]
   btrfs_lookup_inode+0x2a/0x90 [btrfs]
   __btrfs_update_delayed_inode+0x65/0x210 [btrfs]
   btrfs_commit_inode_delayed_inode+0x121/0x130 [btrfs]
   btrfs_evict_inode+0x3fe/0x6a0 [btrfs]
   evict+0xc4/0x190
   __dentry_kill+0xbf/0x170
   dput+0x2ae/0x2f0
   SyS_rename+0x2a6/0x3b0
   do_syscall_64+0x79/0x1e0
   entry_SYSCALL_64_after_hwframe+0x42/0xb7

-> (&delayed_node->mutex){+.+.-.} ops: 5580204 {
   HARDIRQ-ON-W at:
                    __mutex_lock+0x4e/0x8c0
                    btrfs_delayed_update_inode+0x46/0x6e0 [btrfs]
                    btrfs_update_inode+0x83/0x110 [btrfs]
                    btrfs_dirty_inode+0x62/0xe0 [btrfs]
                    touch_atime+0x8c/0xb0
                    do_generic_file_read+0x818/0xb10
                    __vfs_read+0xdc/0x150
                    vfs_read+0x8a/0x130
                    SyS_read+0x45/0xa0
                    do_syscall_64+0x79/0x1e0
                    entry_SYSCALL_64_after_hwframe+0x42/0xb7
   SOFTIRQ-ON-W at:
                    __mutex_lock+0x4e/0x8c0
                    btrfs_delayed_update_inode+0x46/0x6e0 [btrfs]
                    btrfs_update_inode+0x83/0x110 [btrfs]
                    btrfs_dirty_inode+0x62/0xe0 [btrfs]
                    touch_atime+0x8c/0xb0
                    do_generic_file_read+0x818/0xb10
                    __vfs_read+0xdc/0x150
                    vfs_read+0x8a/0x130
                    SyS_read+0x45/0xa0
                    do_syscall_64+0x79/0x1e0
                    entry_SYSCALL_64_after_hwframe+0x42/0xb7
   IN-RECLAIM_FS-W at:
                       __mutex_lock+0x4e/0x8c0
                       __btrfs_release_delayed_node+0x3a/0x1f0 [btrfs]
                       btrfs_evict_inode+0x22c/0x6a0 [btrfs]
                       evict+0xc4/0x190
                       dispose_list+0x35/0x50
                       prune_icache_sb+0x42/0x50
                       super_cache_scan+0x139/0x190
                       shrink_slab+0x262/0x5b0
                       shrink_node+0x2eb/0x2f0
                       kswapd+0x2eb/0x890
                       kthread+0x102/0x140
                       ret_from_fork+0x3a/0x50
   INITIAL USE at:
                   __mutex_lock+0x4e/0x8c0
                   btrfs_delayed_update_inode+0x46/0x6e0 [btrfs]
                   btrfs_update_inode+0x83/0x110 [btrfs]
                   btrfs_dirty_inode+0x62/0xe0 [btrfs]
                   touch_atime+0x8c/0xb0
                   do_generic_file_read+0x818/0xb10
                   __vfs_read+0xdc/0x150
                   vfs_read+0x8a/0x130
                   SyS_read+0x45/0xa0
                   do_syscall_64+0x79/0x1e0
                   entry_SYSCALL_64_after_hwframe+0x42/0xb7
 }
 ... key      at: [<ffffffffc072d488>] __key.56935+0x0/0xfffffffffff96b78 [btrfs]
 ... acquired at:
   __lock_acquire+0x264/0x11c0
   lock_acquire+0xbd/0x1e0
   __mutex_lock+0x4e/0x8c0
   __btrfs_release_delayed_node+0x3a/0x1f0 [btrfs]
   btrfs_evict_inode+0x22c/0x6a0 [btrfs]
   evict+0xc4/0x190
   dispose_list+0x35/0x50
   prune_icache_sb+0x42/0x50
   super_cache_scan+0x139/0x190
   shrink_slab+0x262/0x5b0
   shrink_node+0x2eb/0x2f0
   kswapd+0x2eb/0x890
   kthread+0x102/0x140
   ret_from_fork+0x3a/0x50

stack backtrace:
CPU: 1 PID: 50 Comm: kswapd0 Tainted: G        W        4.12.14-kvmsmall #8 SLE15 (unreleased)
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014
Call Trace:
 dump_stack+0x78/0xb7
 print_irq_inversion_bug.part.38+0x19f/0x1aa
 check_usage_forwards+0x102/0x120
 ? ret_from_fork+0x3a/0x50
 ? check_usage_backwards+0x110/0x110
 mark_lock+0x16c/0x270
 __lock_acquire+0x264/0x11c0
 ? pagevec_lookup_entries+0x1a/0x30
 ? truncate_inode_pages_range+0x2b3/0x7f0
 lock_acquire+0xbd/0x1e0
 ? __btrfs_release_delayed_node+0x3a/0x1f0 [btrfs]
 __mutex_lock+0x4e/0x8c0
 ? __btrfs_release_delayed_node+0x3a/0x1f0 [btrfs]
 ? __btrfs_release_delayed_node+0x3a/0x1f0 [btrfs]
 ? btrfs_evict_inode+0x1f6/0x6a0 [btrfs]
 __btrfs_release_delayed_node+0x3a/0x1f0 [btrfs]
 btrfs_evict_inode+0x22c/0x6a0 [btrfs]
 evict+0xc4/0x190
 dispose_list+0x35/0x50
 prune_icache_sb+0x42/0x50
 super_cache_scan+0x139/0x190
 shrink_slab+0x262/0x5b0
 shrink_node+0x2eb/0x2f0
 kswapd+0x2eb/0x890
 kthread+0x102/0x140
 ? mem_cgroup_shrink_node+0x2c0/0x2c0
 ? kthread_create_on_node+0x40/0x40
 ret_from_fork+0x3a/0x50

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Reviewed-by: Liu Bo <bo.liu@linux.alibaba.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-31 01:41:11 +02:00
Anand Jain 8ba0ae7821 btrfs: drop optimal argument from find_live_mirror()
Drop optimal argument from the function find_live_mirror() as we can
deduce it in the function itself. Also rename optimal to
preferred_mirror.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-31 01:41:11 +02:00
Anand Jain 99f92a7c1e btrfs: drop num argument from find_live_mirror()
Obtain the stripes info from the map directly and so no need
to pass it as an argument.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-31 01:41:11 +02:00
Nikolay Borisov 0a1e458a1e btrfs: Drop fs_info parameter from __btrfs_run_delayed_refs
It's provided by transaction handle.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-31 01:41:11 +02:00
Nikolay Borisov 5ead2dd02c btrfs: Drop fs_info parameter from btrfs_finish_extent_commit
It's provided by the transaction handle.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-31 01:41:11 +02:00
Nikolay Borisov 460fb20a4b btrfs: Drop fs_info parameter from btrfs_qgroup_account_extents
It's provided by the transaction handle.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-31 01:41:10 +02:00
Nikolay Borisov c79a70b133 btrfs: drop fs_info parameter from btrfs_run_delayed_refs
It's provided by the transaction handle.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-31 01:41:10 +02:00
Nikolay Borisov 39d7d09dc2 btrfs: Remove unused flush var in shrink_delalloc
Added by 08e007d2e5 ("Btrfs: improve the noflush reservation") and
made redundant by 17024ad0a0 ("Btrfs: fix early ENOSPC due to
delalloc").

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-31 01:41:10 +02:00
Nikolay Borisov 101d2dc0b2 btrfs: Remove unused extent_root var from caching_thread
Added by b4570aa994 ("btrfs: fix compiling with CONFIG_BTRFS_DEBUG
enabled.") and obsoleted by 2ff7e61e0d ("btrfs: take an fs_info
directly when the root is not used otherwise").

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-31 01:41:10 +02:00
Nikolay Borisov 338dae1ae6 btrfs: remove max_active var from open_ctree
Introduced by 5cdc7ad337 ("btrfs: Replace fs_info->workers with
btrfs_workqueue.") but obsoleted by 2a4581983f ("btrfs: factor
btrfs_init_workqueues() out of open_ctree()").

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-31 01:26:57 +02:00
Nikolay Borisov 8535dc1967 btrfs: Remove unused root var from relink_file_extents
Added in 38c227d87c ("Btrfs: snapshot-aware defrag") but subsequently
made redundant by 0b246afa62 ("btrfs: root->fs_info cleanup, add
fs_info convenience variables").

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-31 01:26:57 +02:00
Nikolay Borisov 4eeb97c67a btrfs: Remove unused tot_len var from lzo_decompress
Added already unused in a6fa6fae40 ("btrfs: Add lzo compression
support").

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-31 01:26:57 +02:00
Nikolay Borisov d6e823a578 btrfs: Remove unused length var from scrub_handle_errored_block
Added in b5d67f64f9 ("Btrfs: change scrub to support big blocks") but
rendered redundant by be50a8ddaa ("Btrfs: Simplify
scrub_setup_recheck_block()'s argument").

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-31 01:26:57 +02:00
Nikolay Borisov a6dbceafb9 btrfs: Remove unused op_key var from add_delayed_refs
Added as part of 86d5f99442 ("btrfs: convert prelimary reference
tracking to use rbtrees") but never used. tmp_op_key essentially
subsumed that variable.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-31 01:26:57 +02:00
Qu Wenruo ba89b80268 btrfs: volumes: Remove the meaningless condition of minimal nr_devs when allocating a chunk
When checking the minimal nr_devs, there is one dead and meaningless
condition:

if (ndevs < devs_increment * sub_stripes || ndevs < devs_min) {
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This condition is meaningless, @devs_increment has nothing to do with
@sub_stripes.

In fact, in btrfs_raid_array[], profile with sub_stripes larger than 1
(RAID10) already has the @devs_increment set to 2.
So no need to multiple it by @sub_stripes.

And above condition is also dead.
For RAID10, @devs_increment * @sub_stripes equals 4, which is also the
@devs_min of RAID10.
For other profiles, @sub_stripes is always 1, and since @ndevs is
rounded down to @devs_increment, the condition will always be true.

Remove the meaningless condition to make later reader wander less.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-31 01:26:56 +02:00
Nikolay Borisov 6f47c706d9 btrfs: Document parameters of btrfs_reserve_extent
This function is the entry to the extent allocator and as such has
quite a number of parameters. Some of those have subtle effects on the
allocation algorithm. Document the parameters.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-31 01:26:56 +02:00
Nikolay Borisov d87ff75863 btrfs: Handle error from btrfs_uuid_tree_rem call in _btrfs_ioctl_set_received_subvol
As with every function which deals with modifying the btree
btrfs_uuid_tree_rem can fail for any number of reasons (ie. EIO/ENOMEM).
Handle return error value from this function gracefully by aborting the
transaction.

Fixes: dd5f9615fc ("Btrfs: maintain subvolume items in the UUID tree")
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-31 01:26:56 +02:00
Nikolay Borisov 776c4a7ce8 btrfs: Use sizeof directly instead of a constant variable
The kernel would like to have all stack VLA usage removed[1].
Unfortunately using an integer constant variable as the size of an
array is still considered a VLA. Instead let's use directly sizeof(var)
which removes the VLA usage. Use the occasion to remove csum_size
altogether and use sizeof() also for the size passed to memcmp

[1]: https://lkml.org/lkml/2018/3/7/621

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-31 01:26:56 +02:00
David Sterba d0ee393493 btrfs: rename submit callbacks and drop double underscores
Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-31 01:26:56 +02:00
David Sterba 6c55343587 btrfs: remove unused parameters from extent_submit_bio_done_t
Remove parameters not used by any of the callbacks.

Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-31 01:26:55 +02:00
David Sterba d0779291b1 btrfs: remove unused parameters from extent_submit_bio_start_t
Remove parameters not used by any of the callbacks.

Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-31 01:26:55 +02:00
David Sterba a758781d4b btrfs: separate types for submit_bio_start and submit_bio_done
The callbacks make use of different parameters that are passed to the
other type unnecessarily. This patch adds separate types for each and
the unused parameters will be removed.

The type extent_submit_bio_hook_t keeps all parameters and can be used
where the start/done types are not appropriate.

Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-31 01:26:55 +02:00
David Sterba d9d19a010b btrfs: kill tree_mod_log_set_root_pointer helper
A useless wrapper around tree_mod_log_insert_root that hides missing
error handling. Move it to the callers.

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-31 01:26:55 +02:00
David Sterba 0e82bcfe3c btrfs: kill tree_mod_log_set_node_key helper
A trivial wrapper that can be simply opencoded and makes the GFP
allocation request more visible. The error handling is now moved to the
callers.

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-31 01:26:55 +02:00
David Sterba bf1d342510 btrfs: kill trivial wrapper tree_mod_log_eb_move
The wrapper is effectively an alias for tree_mod_log_insert_move but
also hides the missing error handling. To make that more visible, lift
the BUG_ON to the callers.

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-31 01:26:55 +02:00
David Sterba b1a09f1ec5 btrfs: remove trivial locking wrappers of tree mod log
The wrappers are trivial and do not bring any extra value on top of the
plain locking primitives.

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-31 01:26:54 +02:00
David Sterba bcd24dabe0 btrfs: drop fs_info parameter from __tree_mod_log_oldest_root
It's provided by the extent_buffer.

Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-31 01:26:54 +02:00
David Sterba b6dfa35bd5 btrfs: embed tree_mod_move structure to tree_mod_elem
The tree_mod_move is not used anywhere and can be embedded as anonymous
structure.

Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-31 01:26:54 +02:00
David Sterba a446a979ff btrfs: drop unused fs_info parameter from tree_mod_log_eb_move
Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-31 01:26:54 +02:00
David Sterba 95b757c164 btrfs: drop fs_info parameter from tree_mod_log_free_eb
It's provided by the extent_buffer.

Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-31 01:26:54 +02:00
David Sterba db7279a20b btrfs: drop fs_info parameter from tree_mod_log_free_eb
It's provided by the extent_buffer.

Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-31 01:26:53 +02:00
David Sterba e09c2efe7e btrfs: drop fs_info parameter from tree_mod_log_insert_key
It's provided by the extent_buffer.

Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-31 01:26:53 +02:00
David Sterba 6074d45f60 btrfs: drop fs_info parameter from tree_mod_log_insert_move
It's provided by the extent_buffer.

Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-31 01:26:53 +02:00
David Sterba 3ac6de1abd btrfs: drop fs_info parameter from tree_mod_log_set_node_key
It's provided by the extent_buffer.

Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-31 01:26:53 +02:00
David Sterba b8b3d625ce btrfs: document more parameters of submit_extent_page
Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-31 01:26:53 +02:00
David Sterba 0c8508a6e7 btrfs: cleanup merging conditions in submit_extent_page
The merge call was factored out to a separate helper but it's a trivial
one and arguably we can opencode it and cache the value.

Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-31 01:26:53 +02:00
David Sterba 8eec8296a0 btrfs: remove redundant variable in __do_readpage
The value of page_end is only stored to end, no other use.

Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-31 01:26:52 +02:00
David Sterba 5c2b1fd753 btrfs: assume that bio_ret is always valid in submit_extent_page
All callers pass a valid pointer so we can drop the redundant checks.
The call to submit_one_bio never happend and can be removed.

Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-31 01:26:52 +02:00
Liu Bo 6ca1765b36 Btrfs: scrub: batch rebuild for raid56
In case of raid56, writes and rebuilds always take BTRFS_STRIPE_LEN(64K)
as unit, however, scrub_extent() sets blocksize as unit, so rebuild
process may be triggered on every block on a same stripe.

A typical example would be that when we're replacing a disappeared disk,
all reads on the disks get -EIO, every block (size is 4K if blocksize is
4K) would go thru these,

scrub_handle_errored_block
  scrub_recheck_block # re-read pages one by one
  scrub_recheck_block # rebuild by calling raid56_parity_recover()
                        page by page

Although with raid56 stripe cache most of reads during rebuild can be
avoided, the parity recover calculation(xor or raid6 algorithms) needs to
be done $(BTRFS_STRIPE_LEN / blocksize) times.

This makes it smarter by doing raid56 scrub/replace on stripe length.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-31 01:26:52 +02:00
David Sterba 416a72022e btrfs: sort and group mount option definitions
Sort mount options by the primary name, followed by the 'no-'
counterpart if it exists. Group the deprecated and debugging options.
Enum and token defintions are synced.

Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-31 01:26:52 +02:00
Howard McLauchlan 62b8e07731 btrfs: Add nossd_spread mount option
Btrfs has two mount options for SSD optimizations: ssd and ssd_spread.
Presently there is an option to disable all SSD optimizations, but there
isn't an option to disable just ssd_spread.

This patch adds a mount option nossd_spread that disables ssd_spread
only.

Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Howard McLauchlan <hmclauchlan@fb.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-31 01:26:52 +02:00
Nikolay Borisov 92e2f7e370 btrfs: Remove btrfs_fs_info::open_ioctl_trans
Since userspace transaction have been removed we no longer have use
for this field so delete it.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-31 01:26:51 +02:00
Nikolay Borisov bcf3a3e7fb btrfs: Remove code referencing unused TRANS_USERSPACE
Now that the userspace transaction ioctls have been removed,
TRANS_USERSPACE is no longer used hence we can remove it.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-31 01:26:51 +02:00
Nikolay Borisov 859e682d58 btrfs: Remove btrfs_file_private::trans
Now that the userspace transaction IOCTL have been removed, this member
is no longer used so just remove it

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-31 01:26:51 +02:00
Nikolay Borisov 7a5a07a810 btrfs: Remove userspace transaction ioctls
Commit 3558d4f88e ("btrfs: Deprecate userspace transaction ioctls")
marked the beginning of the end of userspace transaction. This commit
finishes the job! There are no known users and ceph does not use the
ioctl anymore.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Acked-by: Sage Weil <sage@redhat.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-31 01:26:51 +02:00
Qu Wenruo 4d31778aa2 btrfs: qgroup: Fix root item corruption when multiple same source snapshots are created with quota enabled
When multiple pending snapshots referring to the same source subvolume
are executed, enabled quota will cause root item corruption, where root
items are using old bytenr (no backref in extent tree).

This can be triggered by fstests btrfs/152.

The cause is when source subvolume is still dirty, extra commit
(simplied transaction commit) of qgroup_account_snapshot() can skip
dirty roots not recorded in current transaction, making root item of
source subvolume not updated.

Fix it by forcing recording source subvolume in current transaction
before qgroup sub-transaction commit.

Reported-by: Justin Maggard <jmaggard@netgear.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-31 01:26:51 +02:00
Nikolay Borisov 2e32ef87b0 btrfs: Relax memory barrier in btrfs_tree_unlock
When performing an unlock on an extent buffer we'd like to order the
decrement of extent_buffer::blocking_writers with waking up any
waiters. In such situations it's sufficient to use smp_mb__after_atomic
rather than the heavy smp_mb. On architectures where atomic operations
are fully ordered (such as x86 or s390) unconditionally executing
a heavyweight smp_mb instruction causes a severe hit to performance
while bringin no improvements in terms of correctness.

The better thing is to use the appropriate smp_mb__after_atomic routine
which will do the correct thing (invoke a full smp_mb or in the case
of ordered atomics insert a compiler barrier). Put another way,
an RMW atomic op + smp_load__after_atomic equals, in terms of
semantics, to a full smp_mb. This ensures that none of the problems
described in the accompanying comment of waitqueue_active occur.
No functional changes.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-31 01:26:51 +02:00
Anand Jain 7c829b722d btrfs: add define for oldest generation
Some functions can filter metadata by the generation. Add a define that
will annotate such arguments.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ update changelog ]
Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-31 01:26:50 +02:00
David Sterba 051c98eb11 btrfs: open code trivial helper btrfs_page_exists_in_range
The called function name is self explanatory.

Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-31 01:26:50 +02:00
Matthew Wilcox 965aab1cfc btrfs: Use filemap_range_has_page()
The current implementation of btrfs_page_exists_in_range() gives the
wrong answer if the workingset code has stored a shadow entry in the
page cache.  The filemap_range_has_page() function does not have this
problem, and it's shared code, so use it instead.

eigned-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-31 01:26:45 +02:00
Theodore Ts'o de05ca8526 ext4: move call to ext4_error() into ext4_xattr_check_block()
Refactor the call to EXT4_ERROR_INODE() into ext4_xattr_check_block().
This simplifies the code, and fixes a problem where not all callers of
ext4_xattr_check_block() were not resulting in ext4_error() getting
called when the xattr block is corrupted.

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
2018-03-30 15:42:25 -04:00
Dan Williams 5f0663bb4a ext4, dax: introduce ext4_dax_aops
In preparation for the dax implementation to start associating dax pages
to inodes via page->mapping, we need to provide a 'struct
address_space_operations' instance for dax. Otherwise, direct-I/O
triggers incorrect page cache assumptions and warnings.

Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: linux-ext4@vger.kernel.org
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-03-30 11:34:55 -07:00
Dan Williams 6e2608dfd9 xfs, dax: introduce xfs_dax_aops
In preparation for the dax implementation to start associating dax pages
to inodes via page->mapping, we need to provide a 'struct
address_space_operations' instance for dax. Otherwise, direct-I/O
triggers incorrect page cache assumptions and warnings like the
following:

 WARNING: CPU: 27 PID: 1783 at fs/xfs/xfs_aops.c:1468
 xfs_vm_set_page_dirty+0xf3/0x1b0 [xfs]
 [..]
 CPU: 27 PID: 1783 Comm: dma-collision Tainted: G           O 4.15.0-rc2+ #984
 [..]
 Call Trace:
  set_page_dirty_lock+0x40/0x60
  bio_set_pages_dirty+0x37/0x50
  iomap_dio_actor+0x2b7/0x3b0
  ? iomap_dio_zero+0x110/0x110
  iomap_apply+0xa4/0x110
  iomap_dio_rw+0x29e/0x3b0
  ? iomap_dio_zero+0x110/0x110
  ? xfs_file_dio_aio_read+0x7c/0x1a0 [xfs]
  xfs_file_dio_aio_read+0x7c/0x1a0 [xfs]
  xfs_file_read_iter+0xa0/0xc0 [xfs]
  __vfs_read+0xf9/0x170
  vfs_read+0xa6/0x150
  SyS_pread64+0x93/0xb0
  entry_SYSCALL_64_fastpath+0x1f/0x96

...where the default set_page_dirty() handler assumes that dirty state
is being tracked in 'struct page' flags.

Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Suggested-by: Jan Kara <jack@suse.cz>
Suggested-by: Dave Chinner <david@fromorbit.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-03-30 11:34:55 -07:00
Dan Williams 15aa8a0118 block, dax: remove dead code in blkdev_writepages()
Block device inodes never have S_DAX set, so kill the check for DAX and
diversion to dax_writeback_mapping_range().

Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Dave Chinner <david@fromorbit.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-03-30 11:34:55 -07:00
Dan Williams f44c77630d fs, dax: prepare for dax-specific address_space_operations
In preparation for the dax implementation to start associating dax pages
to inodes via page->mapping, we need to provide a 'struct
address_space_operations' instance for dax. Define some generic VFS aops
helpers for dax. These noop implementations are there in the dax case to
prevent the VFS from falling back to operations with page-cache
assumptions, dax_writeback_mapping_range() may not be referenced in the
FS_DAX=n case.

Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Suggested-by: Matthew Wilcox <mawilcox@microsoft.com>
Suggested-by: Jan Kara <jack@suse.cz>
Suggested-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Suggested-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-03-30 11:34:55 -07:00
Dan Williams 3fe0791c29 dax: store pfns in the radix
In preparation for examining the busy state of dax pages in the truncate
path, switch from sectors to pfns in the radix.

Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-03-30 11:34:54 -07:00
Yan, Zheng 85784f9395 ceph: only dirty ITER_IOVEC pages for direct read
If a page is already locked, attempting to dirty it leads to a deadlock
in lock_page().  This is what currently happens to ITER_BVEC pages when
a dio-enabled loop device is backed by ceph:

  $ losetup --direct-io /dev/loop0 /mnt/cephfs/img
  $ xfs_io -c 'pread 0 4k' /dev/loop0

Follow other file systems and only dirty ITER_IOVEC pages.

Cc: stable@kernel.org
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-03-30 11:17:48 +02:00
Tyson Nottingham 27f394a771 ext4: don't show data=<mode> option if defaulted
Previously, mount -l would show data=<mode> even if the ext4 default
journaling mode was being used. Change this to be consistent with the
rest of the options.

Ext4 already did the right thing when the journaling mode being used
matched the one specified in the superblock's default mount options. The
reason it failed to do the right thing for the ext4 defaults is that,
when set, they were never included in sbi->s_def_mount_opt (unlike the
superblock's defaults, which were).

Signed-off-by: Tyson Nottingham <tgnottingham@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2018-03-30 00:56:10 -04:00
Tyson Nottingham ceec03764a ext4: omit init_itable=n in procfs when disabled
Don't show init_itable=n in /proc/fs/ext4/<dev>/options when filesystem
is mounted with noinit_itable.

Signed-off-by: Tyson Nottingham <tgnottingham@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2018-03-30 00:53:33 -04:00
Tyson Nottingham 68afa7e083 ext4: show more binary mount options in procfs
Previously, /proc/fs/ext4/<dev>/options would only show binary options
if they were set (1 in the options bit mask). E.g. it would show "grpid"
if it was set, but it would not show "nogrpid" if grpid was not set.

This seems sensible, but when an option is absent from the file, it can
be hard for the unfamiliar to know what is being used. E.g. if there
isn't a (no)grpid entry, nogrpid is in effect. But if there isn't a
(no)auto_da_alloc entry, auto_da_alloc is in effect. If there isn't a
(minixdf|bsddf) entry, it turns out bsddf is in effect. It all depends
on how the option is implemented.

It's clearer to be explicit, so print the corresponding option
regardless of whether it means a 1 or a 0 in the bit mask.

Note that options which do not have an explicit disable option aren't
indicated as being disabled even with this change (e.g. dax).

Signed-off-by: Tyson Nottingham <tgnottingham@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2018-03-30 00:51:10 -04:00
Tyson Nottingham bc1420ae56 ext4: simplify kobject usage
Replace kset with generic kobject provided by kobject_create_and_add(),
since the latter is sufficient.

Signed-off-by: Tyson Nottingham <tgnottingham@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2018-03-30 00:41:34 -04:00
Tyson Nottingham 6ca06829fb ext4: remove unused parameters in sysfs code
Signed-off-by: Tyson Nottingham <tgnottingham@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2018-03-30 00:13:10 -04:00
Tyson Nottingham c2e5df7626 ext4: null out kobject* during sysfs cleanup
Make cleanup of ext4_feat kobject consistent with similar objects.

Signed-off-by: Tyson Nottingham <tgnottingham@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2018-03-30 00:03:38 -04:00
Theodore Ts'o 18db4b4e6f ext4: don't allow r/w mounts if metadata blocks overlap the superblock
If some metadata block, such as an allocation bitmap, overlaps the
superblock, it's very likely that if the file system is mounted
read/write, the results will not be pretty.  So disallow r/w mounts
for file systems corrupted in this particular way.

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
2018-03-29 22:10:35 -04:00
Theodore Ts'o a45403b515 ext4: always initialize the crc32c checksum driver
The extended attribute code now uses the crc32c checksum for hashing
purposes, so we should just always always initialize it.  We also want
to prevent NULL pointer dereferences if one of the metadata checksum
features is enabled after the file sytsem is originally mounted.

This issue has been assigned CVE-2018-1094.

https://bugzilla.kernel.org/show_bug.cgi?id=199183
https://bugzilla.redhat.com/show_bug.cgi?id=1560788

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
2018-03-29 22:10:31 -04:00
Theodore Ts'o 8e4b5eae5d ext4: fail ext4_iget for root directory if unallocated
If the root directory has an i_links_count of zero, then when the file
system is mounted, then when ext4_fill_super() notices the problem and
tries to call iput() the root directory in the error return path,
ext4_evict_inode() will try to free the inode on disk, before all of
the file system structures are set up, and this will result in an OOPS
caused by a NULL pointer dereference.

This issue has been assigned CVE-2018-1092.

https://bugzilla.kernel.org/show_bug.cgi?id=199179
https://bugzilla.redhat.com/show_bug.cgi?id=1560777

Reported-by: Wen Xu <wen.xu@gatech.edu>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
2018-03-29 21:56:09 -04:00
Al Viro cbd4a5bcb2 d_genocide: move export to definition
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-03-29 15:08:21 -04:00
Al Viro 42177007aa fold dentry_lock_for_move() into its sole caller and clean it up
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-03-29 15:07:49 -04:00
Al Viro 076515fc92 make non-exchanging __d_move() copy ->d_parent rather than swap them
Currently d_move(from, to) does the following:
	* name/parent of from <- old name/parent of to, from hashed there
	* to is unhashed
	* name of to is preserved
	* if from used to be detached, to gets detached
	* if from used to be attached, parent of to <- old parent of from.

That's both user-visibly bogus and complicates reasoning a lot.
Much saner semantics would be
	* name/parent of from <- name/parent of to, from hashed there.
	* to is unhashed
	* name/parent of to is unchanged.

The price, of course, is that old parent of from might lose a reference.
However,
	* all potentially cross-directory callers of d_move() have both
parents pinned directly; typically, dentries themselves are grabbed
only after we have grabbed and locked both parents.  IOW, the decrement
of old parent's refcount in case of d_move() won't reach zero.
	* __d_move() from d_splice_alias() is done to detached alias.
No refcount decrements in that case
	* __d_move() from __d_unalias() *can* get the refcount to zero.
So let's grab a reference to alias' old parent before calling __d_unalias()
and dput() it after we'd dropped rename_lock.

That does make d_splice_alias() potentially blocking.  However, it has
no callers in non-sleepable contexts (and the case where we'd grown
that dget/dput pair is _very_ rare, so performance is not an issue).

Another thing that needs adjustment is unlocking in the end of __d_move();
folded it in.  And cleaned the remnants of bogus ordering from the
"lock them in the beginning" counterpart - it's never been right and
now (well, for 7 years now) we have that thing always serialized on
rename_lock anyway.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-03-29 15:07:49 -04:00
Al Viro cd1c0c9321 debugfs_lookup(): switch to lookup_one_len_unlocked()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-03-29 15:07:47 -04:00
Al Viro a03ece5ff2 fold lookup_real() into __lookup_hash()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-03-29 15:07:47 -04:00
Al Viro 7a5cf791a7 split d_path() and friends into a separate file
Those parts of fs/dcache.c are pretty much self-contained.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-03-29 15:07:46 -04:00
Al Viro 43986d63b6 dcache.c: trim includes
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-03-29 15:07:45 -04:00
John Ogness 8f04da2adb fs/dcache: Avoid a try_lock loop in shrink_dentry_list()
shrink_dentry_list() holds dentry->d_lock and needs to acquire
dentry->d_inode->i_lock. This cannot be done with a spin_lock()
operation because it's the reverse of the regular lock order.
To avoid ABBA deadlocks it is done with a trylock loop.

Trylock loops are problematic in two scenarios:

  1) PREEMPT_RT converts spinlocks to 'sleeping' spinlocks, which are
     preemptible. As a consequence the i_lock holder can be preempted
     by a higher priority task. If that task executes the trylock loop
     it will do so forever and live lock.

  2) In virtual machines trylock loops are problematic as well. The
     VCPU on which the i_lock holder runs can be scheduled out and a
     task on a different VCPU can loop for a whole time slice. In the
     worst case this can lead to starvation. Commits 47be61845c
     ("fs/dcache.c: avoid soft-lockup in dput()") and 046b961b45
     ("shrink_dentry_list(): take parent's d_lock earlier") are
     addressing exactly those symptoms.

Avoid the trylock loop by using dentry_kill(). When pruning ancestors,
the same code applies that is used to kill a dentry in dput(). This
also has the benefit that the locking order is now the same. First
the inode is locked, then the parent.

Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-03-29 15:07:44 -04:00
Al Viro f657a666fd get rid of trylock loop around dentry_kill()
In case when trylock in there fails, deal with it directly in
dentry_kill().  Note that in cases when we drop and retake
->d_lock, we need to recheck whether to retain the dentry.
Another thing is that dropping/retaking ->d_lock might have
ended up with negative dentry turning into positive; that,
of course, can happen only once...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-03-29 15:07:44 -04:00
Al Viro 62d9956cef handle move to LRU in retain_dentry()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-03-29 15:07:43 -04:00
Al Viro a338579f2f dput(): consolidate the "do we need to retain it?" into an inlined helper
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-03-29 15:07:43 -04:00
Al Viro 8b987a46a1 split the slow part of lock_parent() off
Turn the "trylock failed" part into uninlined __lock_parent().

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-03-29 15:07:42 -04:00
Al Viro 65d8eb5a8f now lock_parent() can't run into killed dentry
all remaining callers hold either a reference or ->i_lock

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-03-29 15:07:42 -04:00
Al Viro 3b3f09f48b get rid of trylock loop in locking dentries on shrink list
In case of trylock failure don't re-add to the list - drop the locks
and carefully get them in the right order.  For shrink_dentry_list(),
somebody having grabbed a reference to dentry means that we can
kick it off-list, so if we find dentry being modified under us we
don't need to play silly buggers with retries anyway - off the list
it is.

The locking logics taken out into a helper of its own; lock_parent()
is no longer used for dentries that can be killed under us.

[fix from Eric Biggers folded]

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-03-29 15:07:02 -04:00
Eric Biggers ce3fd194fc ext4: limit xattr size to INT_MAX
ext4 isn't validating the sizes of xattrs where the value of the xattr
is stored in an external inode.  This is problematic because
->e_value_size is a u32, but ext4_xattr_get() returns an int.  A very
large size is misinterpreted as an error code, which ext4_get_acl()
translates into a bogus ERR_PTR() for which IS_ERR() returns false,
causing a crash.

Fix this by validating that all xattrs are <= INT_MAX bytes.

This issue has been assigned CVE-2018-1095.

https://bugzilla.kernel.org/show_bug.cgi?id=199185
https://bugzilla.redhat.com/show_bug.cgi?id=1560793

Reported-by: Wen Xu <wen.xu@gatech.edu>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
Fixes: e50e5129f3 ("ext4: xattr-in-inode support")
2018-03-29 14:31:42 -04:00
Abhi Das 5e86d9d122 gfs2: time journal recovery steps accurately
This patch spits out the time taken by the various steps in the
journal recover process. Previously, the journal recovery time
didn't account for finding the journal head in the log which takes
up a significant portion of time.

Signed-off-by: Abhi Das <adas@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2018-03-29 10:41:27 -07:00
Eric Sandeen dc1baa715b xfs: do not log/recover swapext extent owner changes for deleted inodes
Today if we run xfs_fsr and crash[1], log replay can fail because
the recovery code tries to instantiate the donor inode from
disk to replay the swapext, but it's been deleted and we get
verifier failures when we try to read the inode off disk with
i_mode == 0.

This fixes both sides: We don't log the swapext change if the
inode has been deleted, and we don't try to recover it either.

[1] or if systemd doesn't cleanly unmount root, as it is wont
    to do ...

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-03-29 10:19:15 -07:00
Andreas Gruenbacher fffb64127a gfs2: Zero out fallocated blocks in fallocate_chunk
Instead of zeroing out fallocated blocks in gfs2_iomap_alloc, zero them
out in fallocate_chunk, much higher up the call stack.  This gets rid of
gfs2's abuse of the IOMAP_ZERO flag as well as the gfs2 specific zeronew
buffer flag.  I can't think of a reason why zeroing out the blocks in
gfs2_iomap_alloc would have any benefits: there is no additional locking
at that level that would add protection to the newly allocated blocks.

While at it, change fallocate over from gs2_block_map to gfs2_iomap_begin.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Acked-by: Christoph Hellwig <hch@lst.de>
2018-03-29 06:50:32 -07:00
Colin Ian King f62fd7a777 ecryptfs: fix spelling mistake: "cadidate" -> "candidate"
Trivial fix to spelling mistake in debug message text.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
2018-03-29 01:33:30 +00:00
Guenter Roeck ab13a9218c ecryptfs: lookup: Don't check if mount_crypt_stat is NULL
mount_crypt_stat is assigned to
&ecryptfs_superblock_to_private(ecryptfs_dentry->d_sb)->mount_crypt_stat,
and mount_crypt_stat is not the first object in struct ecryptfs_sb_info.
mount_crypt_stat is therefore never NULL. At the same time, no crash
in ecryptfs_lookup() has been reported, and the lookup functions in
other file systems don't check if d_sb is NULL either.
Given that, remove the NULL check.

Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
2018-03-29 01:33:29 +00:00
Eric Biggers 53fedcc09c f2fs: reserve bits for fs-verity
Reserve an F2FS feature flag and inode flag for fs-verity.  This is an
in-development feature that is planned be discussed at LSF/MM 2018 [1].
It will provide file-based integrity and authenticity for read-only
files.  Most code will be in a filesystem-independent module, with
smaller changes needed to individual filesystems that opt-in to
supporting the feature.  An early prototype supporting F2FS is available
[2].  Reserving the F2FS on-disk bits for fs-verity will prevent users
of the prototype from conflicting with other new F2FS features.

Note that we're reserving the inode flag in f2fs_inode.i_advise, which
isn't really appropriate since it's not a hint or advice.  But
->i_advise is already being used to hold the 'encrypt' flag; and F2FS's
->i_flags uses the generic FS_* values, so it seems ->i_flags can't be
used for an F2FS-specific flag without additional work to remove the
assumption that ->i_flags uses the generic flags namespace.

[1] https://marc.info/?l=linux-fsdevel&m=151690752225644
[2] https://git.kernel.org/pub/scm/linux/kernel/git/mhalcrow/linux.git/log/?h=fs-verity-dev

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-03-28 12:56:49 -07:00
Greg Kroah-Hartman b24d0d5b12 Merge 4.16-rc7 into char-misc-next
We want the hyperv fix in here for merging and testing.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-28 12:27:35 +02:00
Christoph Hellwig 0e11f6443f fs: move I_DIRTY_INODE to fs.h
And use it in a few more places rather than opencoding the values.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-03-28 01:39:02 -04:00
Christoph Hellwig f35562549f ubifs: fix bogus __mark_inode_dirty(I_DIRTY_SYNC | I_DIRTY_DATASYNC) call
I_DIRTY_DATASYNC is a strict superset of I_DIRTY_SYNC semantics, as
in mark dirty to be written out by fdatasync as well.  So dirtying
for both flags makes no sense.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-03-28 01:39:02 -04:00
Christoph Hellwig 2c2acd2d19 ntfs: fix bogus __mark_inode_dirty(I_DIRTY_SYNC | I_DIRTY_DATASYNC) call
I_DIRTY_DATASYNC is a strict superset of I_DIRTY_SYNC semantics, as
in mark dirty to be written out by fdatasync as well.  So dirtying
for both flags makes no sense.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-03-28 01:39:02 -04:00
Christoph Hellwig 937d330512 gfs2: fix bogus __mark_inode_dirty(I_DIRTY_SYNC | I_DIRTY_DATASYNC) calls
I_DIRTY_DATASYNC is a strict superset of I_DIRTY_SYNC semantics, as
in mark dirty to be written out by fdatasync as well.  So dirtying
for both flags makes no sense.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-03-28 01:39:01 -04:00
Christoph Hellwig cab64df194 fs: fold open_check_o_direct into do_dentry_open
do_dentry_open is where we do the actual open of the file, so this is
where we should do our O_DIRECT sanity check to cover all potential
callers.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-03-28 01:39:01 -04:00
Yunlei He d21b0f238a f2fs: Add a segment type check in inplace write
This patch add a segment type check in IPU, in
case of something wrong with blkadd in dnode.

Signed-off-by: Yunlei He <heyunlei@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-03-27 20:13:54 -07:00
Yunlong Song 2f52052929 f2fs: no need to initialize zero value for GFP_F2FS_ZERO
Since f2fs_inode_info is allocated with flag GFP_F2FS_ZERO, so we do not
need to initialize zero value for its member any more.

Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-03-27 20:12:53 -07:00
Chao Yu 780de47cf6 f2fs: don't track new nat entry in nat set
Nat entry set is used only in checkpoint(), and during checkpoint() we
won't flush new nat entry with unallocated address, so we don't need to
add new nat entry into nat set, then nat_entry_set::entry_cnt can
indicate actual entry count we need to flush in checkpoint().

Signed-off-by: Yunlei He <heyunlei@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-03-27 20:10:29 -07:00
Chao Yu df033caf51 f2fs: clean up with F2FS_BLK_ALIGN
Clean up F2FS_BYTES_TO_BLK(x + F2FS_BLKSIZE - 1) with F2FS_BLK_ALIGN(x).

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-03-27 20:09:32 -07:00
David Howells a25e21f0bc rxrpc, afs: Use debug_ids rather than pointers in traces
In rxrpc and afs, use the debug_ids that are monotonically allocated to
various objects as they're allocated rather than pointers as kernel
pointers are now hashed making them less useful.  Further, the debug ids
aren't reused anywhere nearly as quickly.

In addition, allow kernel services that use rxrpc, such as afs, to take
numbers from the rxrpc counter, assign them to their own call struct and
pass them in to rxrpc for both client and service calls so that the trace
lines for each will have the same ID tag.

Signed-off-by: David Howells <dhowells@redhat.com>
2018-03-27 23:03:00 +01:00
Kirill Tkhai 2f635ceeb2 net: Drop pernet_operations::async
Synchronous pernet_operations are not allowed anymore.
All are asynchronous. So, drop the structure member.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-27 13:18:09 -04:00
Kirill Tkhai 67441c2472 net: Convert nfsd_net_ops
These pernet_operations look similar to rpcsec_gss_net_ops,
they just create and destroy another caches. So, they also
can be async.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Acked-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-27 13:18:07 -04:00