Commit Graph

186 Commits

Author SHA1 Message Date
Jan Kara 1822193b5d quota: Cleanup list iteration in dqcache_shrink_scan()
Use list_first_entry() and list_empty() instead of opencoded variants.

Reviewed-by: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2018-06-20 11:04:26 +02:00
Greg Thelen 9560ba306d quota: reclaim least recently used dquots
The dquots in the free_dquots list are not reclaimed in LRU way.
put_dquot_last() puts entries to the tail and dqcache_shrink_scan()
frees from the tail. Free unreferenced dquots in LRU order because it
seems more reasonable than freeing most recently used.

Signed-off-by: Greg Thelen <gthelen@google.com>
Signed-off-by: Shakeel Butt <shakeelb@google.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2018-06-20 11:04:26 +02:00
Jia-Ju Bai 1aa3b3e0cb fs: quota: Replace GFP_ATOMIC with GFP_KERNEL in dquot_init
dquot_init() is never called in atomic context.
This function is only set as a parameter of fs_initcall().

Despite never getting called from atomic context,
dquot_init() calls __get_free_pages() with GFP_ATOMIC,
which waits busily for allocation.
GFP_ATOMIC is not necessary and can be replaced with GFP_KERNEL,
to avoid busy waiting and improve the possibility of sucessful allocation.

This is found by a static analysis tool named DCNS written by myself.
And I also manually check it.

Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2018-04-09 17:48:54 +02:00
Tetsuo Handa 88bc0ede8d quota: Check for register_shrinker() failure.
register_shrinker() might return -ENOMEM error since Linux 3.12.
Call panic() as with other failure checks in this function if
register_shrinker() failed.

Fixes: 1d3d4437ea ("vmscan: per-node deferred work")
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Jan Kara <jack@suse.com>
Cc: Michal Hocko <mhocko@suse.com>
Reviewed-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2017-11-29 16:46:48 +01:00
Chao Yu 1a6152d36d quota: propagate error from __dquot_initialize
In commit 6184fc0b8d ("quota: Propagate error from ->acquire_dquot()"),
we have propagated error from __dquot_initialize to caller, but we forgot
to handle such error in add_dquot_ref(), so, currently, during quota
accounting information initialization flow, if we failed for some of
inodes, we just ignore such error, and do account for others, which is
not a good implementation.

In this patch, we choose to let user be aware of such error, so after
turning on quota successfully, we can make sure all inodes disk usage
can be accounted, which will be more reasonable.

Suggested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2017-11-28 16:08:08 +01:00
Linus Torvalds f14fc0ccee Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull quota, ext2, isofs and udf fixes from Jan Kara:

 - two small quota error handling fixes

 - two isofs fixes for architectures with signed char

 - several udf block number overflow and signedness fixes

 - ext2 rework of mount option handling to avoid GFP_KERNEL allocation
   with spinlock held

 - ... it also contains a patch to implement auditing of responses to
   fanotify permission events. That should have been in the fanotify
   pull request but I mistakenly merged that patch into a wrong branch
   and noticed only now at which point I don't think it's worth rebasing
   and redoing.

* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  quota: be aware of error from dquot_initialize
  quota: fix potential infinite loop
  isofs: use unsigned char types consistently
  isofs: fix timestamps beyond 2027
  udf: Fix some sign-conversion warnings
  udf: Fix signed/unsigned format specifiers
  udf: Fix 64-bit sign extension issues affecting blocks > 0x7FFFFFFF
  udf: Remove some outdate references from documentation
  udf: Avoid overflow when session starts at large offset
  ext2: Fix possible sleep in atomic during mount option parsing
  ext2: Parse mount options into a dedicated structure
  audit: Record fanotify access control decisions
2017-11-14 14:13:11 -08:00
Jan Kara 838bee9e75 Merge udf, isofs, quota, ext2 changes for 4.15-rc1. 2017-11-14 11:09:53 +01:00
Chao Yu 88d8ff976a quota: be aware of error from dquot_initialize
Commit 6184fc0b8d ("quota: Propagate error from ->acquire_dquot()")
missed to handle error from dquot_initialize in dquot_file_open, fix it.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2017-11-13 10:36:56 +01:00
Greg Kroah-Hartman b24413180f License cleanup: add SPDX GPL-2.0 license identifier to files with no license
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.

By default all files without license information are under the default
license of the kernel, which is GPL version 2.

Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier.  The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.

This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.

How this work was done:

Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
 - file had no licensing information it it.
 - file was a */uapi/* one with no licensing information in it,
 - file was a */uapi/* one with existing licensing information,

Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.

The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from of the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne.  Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.

The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed.  Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.

Criteria used to select files for SPDX license identifier tagging was:
 - Files considered eligible had to be source code files.
 - Make and config files were included as candidates if they contained >5
   lines of source
 - File already had some variant of a license header in it (even if <5
   lines).

All documentation files were explicitly excluded.

The following heuristics were used to determine which SPDX license
identifiers to apply.

 - when both scanners couldn't find any license traces, file was
   considered to have no license information in it, and the top level
   COPYING file license applied.

   For non */uapi/* files that summary was:

   SPDX license identifier                            # files
   ---------------------------------------------------|-------
   GPL-2.0                                              11139

   and resulted in the first patch in this series.

   If that file was a */uapi/* path one, it was "GPL-2.0 WITH
   Linux-syscall-note" otherwise it was "GPL-2.0".  Results of that was:

   SPDX license identifier                            # files
   ---------------------------------------------------|-------
   GPL-2.0 WITH Linux-syscall-note                        930

   and resulted in the second patch in this series.

 - if a file had some form of licensing information in it, and was one
   of the */uapi/* ones, it was denoted with the Linux-syscall-note if
   any GPL family license was found in the file or had no licensing in
   it (per prior point).  Results summary:

   SPDX license identifier                            # files
   ---------------------------------------------------|------
   GPL-2.0 WITH Linux-syscall-note                       270
   GPL-2.0+ WITH Linux-syscall-note                      169
   ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)    21
   ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)    17
   LGPL-2.1+ WITH Linux-syscall-note                      15
   GPL-1.0+ WITH Linux-syscall-note                       14
   ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)    5
   LGPL-2.0+ WITH Linux-syscall-note                       4
   LGPL-2.1 WITH Linux-syscall-note                        3
   ((GPL-2.0 WITH Linux-syscall-note) OR MIT)              3
   ((GPL-2.0 WITH Linux-syscall-note) AND MIT)             1

   and that resulted in the third patch in this series.

 - when the two scanners agreed on the detected license(s), that became
   the concluded license(s).

 - when there was disagreement between the two scanners (one detected a
   license but the other didn't, or they both detected different
   licenses) a manual inspection of the file occurred.

 - In most cases a manual inspection of the information in the file
   resulted in a clear resolution of the license that should apply (and
   which scanner probably needed to revisit its heuristics).

 - When it was not immediately clear, the license identifier was
   confirmed with lawyers working with the Linux Foundation.

 - If there was any question as to the appropriate license identifier,
   the file was flagged for further research and to be revisited later
   in time.

In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.

Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there was new insights.  The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.

Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.

In initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.

Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
 - a full scancode scan run, collecting the matched texts, detected
   license ids and scores
 - reviewing anything where there was a license detected (about 500+
   files) to ensure that the applied SPDX license was correct
 - reviewing anything where there was no detection but the patch license
   was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
   SPDX license was correct

This produced a worksheet with 20 files needing minor correction.  This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.

These .csv files were then reviewed by Greg.  Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected.  This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.)  Finally Greg ran the script using the .csv files to
generate the patches.

Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-02 11:10:55 +01:00
zhangyi (F) dd5f627973 quota: fix potential infinite loop
In dquot_writeback_dquots(), we write back dquot from dirty dquots
list. There is a potential infinite loop if ->write_dquot() failure
and forget remove dquot from the list. This patch clear dirty bit
anyway to avoid it.

Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2017-11-01 14:42:57 +01:00
Jan Kara ac3d79392f quota: Generate warnings for DQUOT_SPACE_NOFAIL allocations
Eryu has reported that since commit 7b9ca4c61b "quota: Reduce
contention on dq_data_lock" test generic/233 occasionally fails. This is
caused by the fact that since that commit we don't generate warning and
set grace time for quota allocations that have DQUOT_SPACE_NOFAIL set
(these are for example some metadata allocations in ext4). We need these
allocations to behave regularly wrt warning generation and grace time
setting so fix the code to return to the original behavior.

Reported-and-tested-by: Eryu Guan <eguan@redhat.com>
CC: stable@vger.kernel.org
Fixes: 7b9ca4c61b
Signed-off-by: Jan Kara <jack@suse.cz>
2017-10-10 17:24:46 +02:00
Konstantin Khlebnikov 0a51fb7174 quota: add missing lock into __dquot_transfer()
Lock dq_dqb_lock around dquot_decr_inodes()

Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Fixes: 7b9ca4c61b ("quota: Reduce contention on dq_data_lock")
Signed-off-by: Jan Kara <jack@suse.cz>
2017-09-18 12:23:58 +02:00
Linus Torvalds ae8ac6b7db Merge branch 'quota_scaling' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull quota scaling updates from Jan Kara:
 "This contains changes to make the quota subsystem more scalable.

  Reportedly it improves number of files created per second on ext4
  filesystem on fast storage by about a factor of 2x"

* 'quota_scaling' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: (28 commits)
  quota: Add lock annotations to struct members
  quota: Reduce contention on dq_data_lock
  fs: Provide __inode_get_bytes()
  quota: Inline dquot_[re]claim_reserved_space() into callsite
  quota: Inline inode_{incr,decr}_space() into callsites
  quota: Inline functions into their callsites
  ext4: Disable dirty list tracking of dquots when journalling quotas
  quota: Allow disabling tracking of dirty dquots in a list
  quota: Remove dq_wait_unused from dquot
  quota: Move locking into clear_dquot_dirty()
  quota: Do not dirty bad dquots
  quota: Fix possible corruption of dqi_flags
  quota: Propagate ->quota_read errors from v2_read_file_info()
  quota: Fix error codes in v2_read_file_info()
  quota: Push dqio_sem down to ->read_file_info()
  quota: Push dqio_sem down to ->write_file_info()
  quota: Push dqio_sem down to ->get_next_id()
  quota: Push dqio_sem down to ->release_dqblk()
  quota: Remove locking for writing to the old quota format
  quota: Do not acquire dqio_sem for dquot overwrites in v2 format
  ...
2017-09-07 15:19:35 -07:00
Jan Kara 7b9ca4c61b quota: Reduce contention on dq_data_lock
dq_data_lock is currently used to protect all modifications of quota
accounting information, consistency of quota accounting on the inode,
and dquot pointers from inode. As a result contention on the lock can be
pretty heavy.

Reduce the contention on the lock by protecting quota accounting
information by a new dquot->dq_dqb_lock and consistency of quota
accounting with inode usage by inode->i_lock.

This change reduces time to create 500000 files on ext4 on ramdisk by 50
different processes in separate directories by 6% when user quota is
turned on. When those 50 processes belong to 50 different users, the
improvement is about 9%.

Signed-off-by: Jan Kara <jack@suse.cz>
2017-08-17 22:07:59 +02:00
Jan Kara 3ab167d2ba quota: Inline dquot_[re]claim_reserved_space() into callsite
dquot_claim_reserved_space() and dquot_reclaim_reserved_space() have
only a single callsite. Inline them there.

Signed-off-by: Jan Kara <jack@suse.cz>
2017-08-17 22:03:03 +02:00
Jan Kara a478e522e3 quota: Inline inode_{incr,decr}_space() into callsites
inode_incr_space() and inode_decr_space() have only two callsites.
Inline them there as that will make locking changes simpler.

Signed-off-by: Jan Kara <jack@suse.cz>
2017-08-17 22:01:14 +02:00
Jan Kara 0ed60de34a quota: Inline functions into their callsites
inode_add_rsv_space() and inode_sub_rsv_space() had only one callsite.
Inline them there directly. inode_claim_rsv_space() and
inode_reclaim_rsv_space() had two callsites so inline them there as
well. This will simplify further locking changes.

Signed-off-by: Jan Kara <jack@suse.cz>
2017-08-17 22:00:59 +02:00
Jan Kara 834057bf84 quota: Allow disabling tracking of dirty dquots in a list
Filesystems that are journalling quotas generally don't need tracking of
dirty dquots in a list since forcing a transaction commit flushes all
quotas anyway. Allow filesystem to say it doesn't want dquots to be
tracked as it reduces contention on the dq_list_lock.

Signed-off-by: Jan Kara <jack@suse.cz>
2017-08-17 22:00:45 +02:00
Jan Kara 503330f382 quota: Remove dq_wait_unused from dquot
Currently every dquot carries a wait_queue_head_t used only when we are
turning quotas off to wait for last users to drop dquot references.
Since such rare case is not performance sensitive in any means, just use
a global waitqueue for this and save space in struct dquot. Also convert
the logic to use wait_event() instead of open-coding it.

Signed-off-by: Jan Kara <jack@suse.cz>
2017-08-17 22:00:40 +02:00
Jan Kara 1e0b7cb062 quota: Move locking into clear_dquot_dirty()
Move locking of dq_list_lock into clear_dquot_dirty(). It makes the
function more self-contained and will simplify our life later.

Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Jan Kara <jack@suse.cz>
2017-08-17 22:00:24 +02:00
Jan Kara 4580b30ea8 quota: Do not dirty bad dquots
Currently we mark dirty even dquots that are not active (i.e.,
initialization or reading failed for them). Thus later we have to check
whether dirty dquot is really active and just clear the dirty bit if
not. Avoid this complication by just never marking non-active dquot as
dirty.

Signed-off-by: Jan Kara <jack@suse.cz>
2017-08-17 22:00:07 +02:00
Jan Kara 15512377bd quota: Fix possible corruption of dqi_flags
dqi_flags modifications are protected by dq_data_lock. However the
modifications in vfs_load_quota_inode() and in mark_info_dirty() were
not which could lead to corruption of dqi_flags. Since modifications to
dqi_flags are rare, this is hard to observe in practice but in theory it
could happen. Fix the problem by always using dq_data_lock for
protection.

Signed-off-by: Jan Kara <jack@suse.cz>
2017-08-17 22:00:04 +02:00
Jan Kara 42fdb8583d quota: Push dqio_sem down to ->read_file_info()
Push down acquisition of dqio_sem into ->read_file_info() callback. This
is for consistency with other operations and it also allows us to get
rid of an ugliness in OCFS2.

Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Jan Kara <jack@suse.cz>
2017-08-17 19:16:24 +02:00
Jan Kara 9a8ae30e73 quota: Push dqio_sem down to ->write_file_info()
Push down acquisition of dqio_sem into ->write_file_info() callback.
Mostly for consistency with other operations.

Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Jan Kara <jack@suse.cz>
2017-08-17 19:11:23 +02:00
Jan Kara f14618c682 quota: Push dqio_sem down to ->get_next_id()
Push down acquisition of dqio_sem into ->get_next_id() callback. Mostly
for consistency with other operations.

Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Jan Kara <jack@suse.cz>
2017-08-17 19:05:16 +02:00
Jan Kara b9a1a7f4b6 quota: Push dqio_sem down to ->release_dqblk()
Push down acquisition of dqio_sem into ->release_dqblk() callback. It
will allow quota formats to decide whether they need it or not.

Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Jan Kara <jack@suse.cz>
2017-08-17 19:04:01 +02:00
Jan Kara 8fc32c2b0d quota: Push dqio_sem down to ->write_dqblk()
Push down acquisition of dqio_sem into ->write_dqblk() callback. It will
allow quota formats to decide whether they need it or not.

Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Jan Kara <jack@suse.cz>
2017-08-17 19:03:09 +02:00
Jan Kara e342e38df9 quota: Push dqio_sem down to ->read_dqblk()
Push down acquisition of dqio_sem into ->read_dqblk() callback. It will
allow quota formats to decide whether they need it or not.

Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Jan Kara <jack@suse.cz>
2017-08-17 19:01:00 +02:00
Jan Kara 5e8cb9b624 quota: Protect dquot writeout with dq_lock
Currently dquot writeout is only protected by dqio_sem held for writing.
As we transition to a finer grained locking we will use dquot->dq_lock
instead. So acquire it in dquot_commit() and move dqio_sem just around
->commit_dqblk() call as it is still needed to serialize quota file
changes.

Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Jan Kara <jack@suse.cz>
2017-08-17 19:00:50 +02:00
Jan Kara d6ab366102 quota: Acquire dqio_sem for reading in vfs_load_quota_inode()
vfs_load_quota_inode() needs dqio_sem only for reading. In fact dqio_sem
is not needed there at all since the function can be called only during
quota on when quota file cannot be modified but let's leave the
protection there since it is logical and the path is in no way
performance critical.

Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Jan Kara <jack@suse.cz>
2017-08-17 18:59:04 +02:00
Jan Kara 0cff9151d3 quota: Acquire dqio_sem for reading in dquot_get_next_id()
dquot_get_next_id() needs dqio_sem only for reading to protect against
racing with modification of quota file structure.

Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Jan Kara <jack@suse.cz>
2017-08-17 18:58:44 +02:00
Jan Kara 62676838cb quota: Do more fine-grained locking in dquot_acquire()
We need dqio_sem held just for reading when calling ->read_dqblk() in
dquot_acquire(). Also dqio_sem is not needed when setting DQ_READ_B and
DQ_ACTIVE_B as concurrent reads and dquot activations are serialized by
dq_lock. So acquire and release dqio_sem closer to the place where it is
needed. This reduces lock hold time and will make locking changes
easier.

Signed-off-by: Jan Kara <jack@suse.cz>
2017-08-17 18:56:07 +02:00
Jan Kara bc8230ee8e quota: Convert dqio_mutex to rwsem
Convert dqio_mutex to rwsem and call it dqio_sem. No functional changes
yet.

Signed-off-by: Jan Kara <jack@suse.cz>
2017-08-17 18:52:48 +02:00
zhangyi (F) 41e327b586 quota: correct space limit check
Currently we compare total space (curspace + rsvspace)
with space limit in quota-tools when setting grace time
and also in check_bdq(), but we missing rsvspace in
somewhere else, correct them. This patch also fix incorrect
zero dqb_btime and grace time updating failure when we use
rsvspace(e.g. ext4 dalloc feature).

Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2017-08-07 16:51:28 +02:00
Tahsin Erdogan 7a9ca53aea quota: add get_inode_usage callback to transfer multi-inode charges
Ext4 ea_inode feature allows storing xattr values in external inodes to
be able to store values that are bigger than a block in size. Ext4 also
has deduplication support for these type of inodes. With deduplication,
the actual storage waste is eliminated but the users of such inodes are
still charged full quota for the inodes as if there was no sharing
happening in the background.

This design requires ext4 to manually charge the users because the
inodes are shared.

An implication of this is that, if someone calls chown on a file that
has such references we need to transfer the quota for the file and xattr
inodes. Current dquot_transfer() function implicitly transfers one inode
charge. With ea_inode feature, we would like to transfer multiple inode
charges.

Add get_inode_usage callback which can interrogate the total number of
inodes that were charged for a given inode.

[ Applied fix from Colin King to make sure the 'ret' variable is
  initialized on the successful return path.  Detected by
  CoverityScan, CID#1446616 ("Uninitialized scalar variable") --tytso]

Signed-off-by: Tahsin Erdogan <tahsin@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Jan Kara <jack@suse.cz>
2017-06-22 11:46:48 -04:00
Tahsin Erdogan b8cb5a545c ext4: fix quota charging for shared xattr blocks
ext4_xattr_block_set() calls dquot_alloc_block() to charge for an xattr
block when new references are made. However if dquot_initialize() hasn't
been called on an inode, request for charging is effectively ignored
because ext4_inode_info->i_dquot is not initialized yet.

Add dquot_initialize() to call paths that lead to ext4_xattr_block_set().

Signed-off-by: Tahsin Erdogan <tahsin@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
2017-05-24 18:24:07 -04:00
Jan Kara 139c279fb9 quota: Remove dquot_quotactl_ops
Nobody uses them anymore.

Signed-off-by: Jan Kara <jack@suse.cz>
2017-04-19 14:21:23 +02:00
Jan Kara aad6cde9ad quota: Stop setting IMMUTABLE and NOATIME flags on quota files
Currently we set IMMUTABLE and NOATIME flags on quota files to stop
userspace from messing with them. Now that all filesystems set these
flags in their quota_on handlers, we can stop setting the flags in
generic quota code. This will allow filesystems to stop copying i_flags
to their on-disk flags on various occasions.

Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Jan Kara <jack@suse.cz>
2017-04-19 14:21:23 +02:00
Ingo Molnar 5b825c3af1 sched/headers: Prepare to remove <linux/cred.h> inclusion from <linux/sched.h>
Add #include <linux/cred.h> dependencies to all .c files rely on sched.h
doing that for them.

Note that even if the count where we need to add extra headers seems high,
it's still a net win, because <linux/sched.h> is included in over
2,200 files ...

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02 08:42:31 +01:00
Linus Torvalds e93b1cc8a8 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull quota, fsnotify and ext2 updates from Jan Kara:
 "Changes to locking of some quota operations from dedicated quota mutex
  to s_umount semaphore, a fsnotify fix and a simple ext2 fix"

* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  quota: Fix bogus warning in dquot_disable()
  fsnotify: Fix possible use-after-free in inode iteration on umount
  ext2: reject inodes with negative size
  quota: Remove dqonoff_mutex
  ocfs2: Use s_umount for quota recovery protection
  quota: Remove dqonoff_mutex from dquot_scan_active()
  ocfs2: Protect periodic quota syncing with s_umount semaphore
  quota: Use s_umount protection for quota operations
  quota: Hold s_umount in exclusive mode when enabling / disabling quotas
  fs: Provide function to get superblock with exclusive s_umount
2016-12-19 08:23:53 -08:00
Jan Kara 2700e6067c quota: Fix bogus warning in dquot_disable()
dquot_disable() was warning when sb_has_quota_loaded() was true when
invalidating page cache for quota files. The thinking behind this
warning was that we must have raced with somebody else turning quotas on
and this should not happen because all places modifying quota state must
hold s_umount exclusively now. However sb_has_quota_loaded() can be also
true at this point when we are just suspending quotas on remount
read-only. Just restore the behavior to situation before commit
c3b004460d ("quota: Remove dqonoff_mutex") which introduced the
warning.

The code in dquot_disable() can be further simplified with the new
locking of quota state changes however let's leave that to a separate
commit that can get more testing exposure.

Fixes: c3b004460d
Signed-off-by: Jan Kara <jack@suse.cz>
2016-12-19 14:01:39 +01:00
Al Viro 8c54ca9c68 quota: constify struct path in quota_on
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-12-05 19:03:06 -05:00
Jan Kara c3b004460d quota: Remove dqonoff_mutex
The only places that were grabbing dqonoff_mutex are functions turning
quotas on and off and these are properly serialized using s_umount
semaphore. Remove dqonoff_mutex.

Signed-off-by: Jan Kara <jack@suse.cz>
2016-11-30 08:38:07 +01:00
Jan Kara ee1ac541a2 quota: Remove dqonoff_mutex from dquot_scan_active()
All callers of dquot_scan_active() now hold s_umount so we can rely on
that lock to protect us against quota state changes.

Signed-off-by: Jan Kara <jack@suse.cz>
2016-11-30 08:37:11 +01:00
Jan Kara 9d1ccbe70e quota: Use s_umount protection for quota operations
Writeback quota is protected by s_umount semaphore held for reading
because every writeback must be protected by that lock (grabbed either
by the generic writeback code or by quotactl handler). Getting next
available ID in quota file, querying quota state, setting quota
information, getting quota format are all quotactl operations protected
by s_umount semaphore held for reading grabbed in quotactl handler.

This also fixes lockdep splat about possible deadlock during filesystem
freezing where sync_filesystem() is called with page-faults already
blocked but sync_filesystem() calls into dquot_writeback_dquots() which
grabs dqonoff_mutex which ranks above i_mutex (vfs_load_quota_inode()
grabs i_mutex under dqonoff_mutex) which clearly ranks below page fault
freeze protection (e.g. via mmap_sem dependencies). The reported problem
is not a real deadlock possibility since during quota on we check
whether filesystem freezing is not in progress but still it is good to
have this fixed.

Reported-by: Ted Tso <tytso@mit.edu>
Reported-by: Eric Whitney <enwlinux@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2016-11-24 15:27:16 +01:00
Jan Kara 7d6cd73d33 quota: Hold s_umount in exclusive mode when enabling / disabling quotas
Currently we hold s_umount semaphore only in shared mode when enabling
or disabling quotas and use dqonoff_mutex for serializing quota state
changes on a filesystem and also quota state changes with other places
depending on current quota state. Using dedicated mutex for this causes
possible deadlocks during filesystem freezing (see following commit for
details) so we transition to using s_umount semaphore for the necessary
synchronization whose lock ordering is properly handled by the
filesystem freezing code. As a start grab s_umount in exclusive mode
when enabling / disabling quotas.

Signed-off-by: Jan Kara <jack@suse.cz>
2016-11-24 15:26:53 +01:00
Linus Torvalds a867d7349e Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull userns vfs updates from Eric Biederman:
 "This tree contains some very long awaited work on generalizing the
  user namespace support for mounting filesystems to include filesystems
  with a backing store.  The real world target is fuse but the goal is
  to update the vfs to allow any filesystem to be supported.  This
  patchset is based on a lot of code review and testing to approach that
  goal.

  While looking at what is needed to support the fuse filesystem it
  became clear that there were things like xattrs for security modules
  that needed special treatment.  That the resolution of those concerns
  would not be fuse specific.  That sorting out these general issues
  made most sense at the generic level, where the right people could be
  drawn into the conversation, and the issues could be solved for
  everyone.

  At a high level what this patchset does a couple of simple things:

   - Add a user namespace owner (s_user_ns) to struct super_block.

   - Teach the vfs to handle filesystem uids and gids not mapping into
     to kuids and kgids and being reported as INVALID_UID and
     INVALID_GID in vfs data structures.

  By assigning a user namespace owner filesystems that are mounted with
  only user namespace privilege can be detected.  This allows security
  modules and the like to know which mounts may not be trusted.  This
  also allows the set of uids and gids that are communicated to the
  filesystem to be capped at the set of kuids and kgids that are in the
  owning user namespace of the filesystem.

  One of the crazier corner casees this handles is the case of inodes
  whose i_uid or i_gid are not mapped into the vfs.  Most of the code
  simply doesn't care but it is easy to confuse the inode writeback path
  so no operation that could cause an inode write-back is permitted for
  such inodes (aka only reads are allowed).

  This set of changes starts out by cleaning up the code paths involved
  in user namespace permirted mounts.  Then when things are clean enough
  adds code that cleanly sets s_user_ns.  Then additional restrictions
  are added that are possible now that the filesystem superblock
  contains owner information.

  These changes should not affect anyone in practice, but there are some
  parts of these restrictions that are changes in behavior.

   - Andy's restriction on suid executables that does not honor the
     suid bit when the path is from another mount namespace (think
     /proc/[pid]/fd/) or when the filesystem was mounted by a less
     privileged user.

   - The replacement of the user namespace implicit setting of MNT_NODEV
     with implicitly setting SB_I_NODEV on the filesystem superblock
     instead.

     Using SB_I_NODEV is a stronger form that happens to make this state
     user invisible.  The user visibility can be managed but it caused
     problems when it was introduced from applications reasonably
     expecting mount flags to be what they were set to.

  There is a little bit of work remaining before it is safe to support
  mounting filesystems with backing store in user namespaces, beyond
  what is in this set of changes.

   - Verifying the mounter has permission to read/write the block device
     during mount.

   - Teaching the integrity modules IMA and EVM to handle filesystems
     mounted with only user namespace root and to reduce trust in their
     security xattrs accordingly.

   - Capturing the mounters credentials and using that for permission
     checks in d_automount and the like.  (Given that overlayfs already
     does this, and we need the work in d_automount it make sense to
     generalize this case).

  Furthermore there are a few changes that are on the wishlist:

   - Get all filesystems supporting posix acls using the generic posix
     acls so that posix_acl_fix_xattr_from_user and
     posix_acl_fix_xattr_to_user may be removed.  [Maintainability]

   - Reducing the permission checks in places such as remount to allow
     the superblock owner to perform them.

   - Allowing the superblock owner to chown files with unmapped uids and
     gids to something that is mapped so the files may be treated
     normally.

  I am not considering even obvious relaxations of permission checks
  until it is clear there are no more corner cases that need to be
  locked down and handled generically.

  Many thanks to Seth Forshee who kept this code alive, and putting up
  with me rewriting substantial portions of what he did to handle more
  corner cases, and for his diligent testing and reviewing of my
  changes"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (30 commits)
  fs: Call d_automount with the filesystems creds
  fs: Update i_[ug]id_(read|write) to translate relative to s_user_ns
  evm: Translate user/group ids relative to s_user_ns when computing HMAC
  dquot: For now explicitly don't support filesystems outside of init_user_ns
  quota: Handle quota data stored in s_user_ns in quota_setxquota
  quota: Ensure qids map to the filesystem
  vfs: Don't create inodes with a uid or gid unknown to the vfs
  vfs: Don't modify inodes with a uid or gid unknown to the vfs
  cred: Reject inodes with invalid ids in set_create_file_as()
  fs: Check for invalid i_uid in may_follow_link()
  vfs: Verify acls are valid within superblock's s_user_ns.
  userns: Handle -1 in k[ug]id_has_mapping when !CONFIG_USER_NS
  fs: Refuse uid/gid changes which don't map into s_user_ns
  selinux: Add support for unprivileged mounts from user namespaces
  Smack: Handle labels consistently in untrusted mounts
  Smack: Add support for unprivileged mounts from user namespaces
  fs: Treat foreign mounts as nosuid
  fs: Limit file caps to the user namespace of the super block
  userns: Remove the now unnecessary FS_USERNS_DEV_MOUNT flag
  userns: Remove implicit MNT_NODEV fragility.
  ...
2016-07-29 15:54:19 -07:00
Eric W. Biederman 5c0048280b dquot: For now explicitly don't support filesystems outside of init_user_ns
Mostly supporting filesystems outside of init_user_ns is
s/&init_usre_ns/dquot->dq_sb->s_user_ns/.  An actual need for
supporting quotas on filesystems outside of s_user_ns is quite a ways
away and to be done responsibily needs an audit on what can happen
with hostile quota files.  Until that audit is complete don't attempt
to support quota files on filesystems outside of s_user_ns.

Cc: Jan Kara <jack@suse.cz>
Acked-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2016-07-05 15:12:39 -05:00
Eric W. Biederman d49d37624a quota: Ensure qids map to the filesystem
Introduce the helper qid_has_mapping and use it to ensure that the
quota system only considers qids that map to the filesystems
s_user_ns.

In practice for quota supporting filesystems today this is the exact
same check as qid_valid.  As only 0xffffffff aka (qid_t)-1 does not
map into init_user_ns.

Replace the qid_valid calls with qid_has_mapping as values come in
from userspace.  This is harmless today and it prepares the quota
system to work on filesystems with quotas but mounted by unprivileged
users.

Call qid_has_mapping from dqget.  This ensures the passed in qid has a
prepresentation on the underlying filesystem.  Previously this was
unnecessary as filesystesm never had qids that could not map.  With
the introduction of filesystems outside of s_user_ns this will not
remain true.

All of this ensures the quota code never has to deal with qids that
don't map to the underlying filesystem.

Cc: Jan Kara <jack@suse.cz>
Acked-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2016-07-05 15:12:05 -05:00
Arnd Bergmann e008bb6134 quota: use time64_t internally
The quota subsystem has two formats, the old v1 format using architecture
specific time_t values on the on-disk format, while the v2 format
(introduced in Linux 2.5.16 and 2.4.22) uses fixed 64-bit little-endian.

While there is no future for the v1 format beyond y2038, the v2 format
is almost there on 32-bit architectures, as both the user interface
and the on-disk format use 64-bit timestamps, just not the time_t
inbetween.

This changes the internal representation to use time64_t, which will
end up doing the right thing everywhere for v2 format.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jan Kara <jack@suse.cz>
2016-06-19 18:09:31 +02:00