Commit Graph

325 Commits

Author SHA1 Message Date
Linus Torvalds 101105b171 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull more vfs updates from Al Viro:
 ">rename2() work from Miklos + current_time() from Deepa"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  fs: Replace current_fs_time() with current_time()
  fs: Replace CURRENT_TIME_SEC with current_time() for inode timestamps
  fs: Replace CURRENT_TIME with current_time() for inode timestamps
  fs: proc: Delete inode time initializations in proc_alloc_inode()
  vfs: Add current_time() api
  vfs: add note about i_op->rename changes to porting
  fs: rename "rename2" i_op to "rename"
  vfs: remove unused i_op->rename
  fs: make remaining filesystems use .rename2
  libfs: support RENAME_NOREPLACE in simple_rename()
  fs: support RENAME_NOREPLACE for local filesystems
  ncpfs: fix unused variable warning
2016-10-10 20:16:43 -07:00
Al Viro 3873691e5a Merge remote-tracking branch 'ovl/rename2' into for-linus 2016-10-10 23:02:51 -04:00
Deepa Dinamani 02027d42c3 fs: Replace CURRENT_TIME_SEC with current_time() for inode timestamps
CURRENT_TIME_SEC is not y2038 safe. current_time() will
be transitioned to use 64 bit time along with vfs in a
separate patch.
There is no plan to transistion CURRENT_TIME_SEC to use
y2038 safe time interfaces.

current_time() will also be extended to use superblock
range checking parameters when range checking is introduced.

This works because alloc_super() fills in the the s_time_gran
in super block to NSEC_PER_SEC.

Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-09-27 21:06:22 -04:00
Miklos Szeredi 2773bf00ae fs: rename "rename2" i_op to "rename"
Generated patch:

sed -i "s/\.rename2\t/\.rename\t\t/" `git grep -wl rename2`
sed -i "s/\brename2\b/rename/g" `git grep -wl rename2`

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-09-27 11:03:58 +02:00
Miklos Szeredi f03b8ad8d3 fs: support RENAME_NOREPLACE for local filesystems
This is trivial to do:

 - add flags argument to foo_rename()
 - check if flags doesn't have any other than RENAME_NOREPLACE
 - assign foo_rename() to .rename2 instead of .rename

Filesystems converted:

affs, bfs, exofs, ext2, hfs, hfsplus, jffs2, jfs, logfs, minix, msdos,
nilfs2, omfs, reiserfs, sysvfs, ubifs, udf, ufs, vfat.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Acked-by: Boaz Harrosh <ooo@electrozaur.com>
Acked-by: Richard Weinberger <richard@nod.at>
Acked-by: Bob Copeland <me@bobcopeland.com>
Acked-by: Jan Kara <jack@suse.cz>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Cc: Mikulas Patocka <mpatocka@redhat.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Dave Kleikamp <shaggy@kernel.org>
Cc: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: Christoph Hellwig <hch@infradead.org>
2016-09-27 11:03:57 +02:00
Jan Kara 31051c85b5 fs: Give dentry to inode_change_ok() instead of inode
inode_change_ok() will be resposible for clearing capabilities and IMA
extended attributes and as such will need dentry. Give it as an argument
to inode_change_ok() instead of an inode. Also rename inode_change_ok()
to setattr_prepare() to better relect that it does also some
modifications in addition to checks.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
2016-09-22 10:56:19 +02:00
Linus Torvalds 6784725ab0 Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs updates from Al Viro:
 "Assorted cleanups and fixes.

  Probably the most interesting part long-term is ->d_init() - that will
  have a bunch of followups in (at least) ceph and lustre, but we'll
  need to sort the barrier-related rules before it can get used for
  really non-trivial stuff.

  Another fun thing is the merge of ->d_iput() callers (dentry_iput()
  and dentry_unlink_inode()) and a bunch of ->d_compare() ones (all
  except the one in __d_lookup_lru())"

* 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (26 commits)
  fs/dcache.c: avoid soft-lockup in dput()
  vfs: new d_init method
  vfs: Update lookup_dcache() comment
  bdev: get rid of ->bd_inodes
  Remove last traces of ->sync_page
  new helper: d_same_name()
  dentry_cmp(): use lockless_dereference() instead of smp_read_barrier_depends()
  vfs: clean up documentation
  vfs: document ->d_real()
  vfs: merge .d_select_inode() into .d_real()
  unify dentry_iput() and dentry_unlink_inode()
  binfmt_misc: ->s_root is not going anywhere
  drop redundant ->owner initializations
  ufs: get rid of redundant checks
  orangefs: constify inode_operations
  missed comment updates from ->direct_IO() prototype change
  file_inode(f)->i_mapping is f->f_mapping
  trim fsnotify hooks a bit
  9p: new helper - v9fs_parent_fid()
  debugfs: ->d_parent is never NULL or negative
  ...
2016-07-28 12:59:05 -07:00
Mike Christie dfec8a14fc fs: have ll_rw_block users pass in op and flags separately
This has ll_rw_block users pass in the operation and flags separately,
so ll_rw_block can setup the bio op and bi_rw flags on the bio that
is submitted.

Signed-off-by: Mike Christie <mchristi@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-06-07 13:41:38 -06:00
Mike Christie 2a222ca992 fs: have submit_bh users pass in op and flags separately
This has submit_bh users pass in the operation and flags separately,
so submit_bh_wbc can setup the bio op and bi_rw flags on the bio that
is submitted.

Signed-off-by: Mike Christie <mchristi@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-06-07 13:41:38 -06:00
Al Viro e0d508f109 ufs: get rid of redundant checks
ufs_check_page() makes sure there's no entries with zero ->reclen

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-05-29 19:07:07 -04:00
Al Viro 3b0a3c1ac1 simple local filesystems: switch to ->iterate_shared()
no changes needed (XFS isn't simple, but it has the same parallelism
in the interesting parts exercised from CXFS).

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-05-02 19:49:32 -04:00
Al Viro be5b82dbfe make ext2_get_page() and friends work without external serialization
Right now ext2_get_page() (and its analogues in a bunch of other filesystems)
relies upon the directory being locked - the way it sets and tests Checked and
Error bits would be racy without that.  Switch to a slightly different scheme,
_not_ setting Checked in case of failure.  That way the logics becomes
	if Checked => OK
	else if Error => fail
	else if !validate => fail
	else => OK
with validation setting Checked or Error on success and failure resp. and
returning which one had happened.  Equivalent to the current logics, but unlike
the current logics not sensitive to the order of set_bit, test_bit getting
reordered by CPU, etc.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-05-02 19:47:25 -04:00
Al Viro 84695ffee7 Merge getxattr prototype change into work.lookups
The rest of work.xattr stuff isn't needed for this branch
2016-05-02 19:45:47 -04:00
Al Viro fc64005c93 don't bother with ->d_inode->i_sb - it's always equal to ->d_sb
... and neither can ever be NULL

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-04-10 17:11:51 -04:00
Kirill A. Shutemov 09cbfeaf1a mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros
PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
ago with promise that one day it will be possible to implement page
cache with bigger chunks than PAGE_SIZE.

This promise never materialized.  And unlikely will.

We have many places where PAGE_CACHE_SIZE assumed to be equal to
PAGE_SIZE.  And it's constant source of confusion on whether
PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
especially on the border between fs and mm.

Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much
breakage to be doable.

Let's stop pretending that pages in page cache are special.  They are
not.

The changes are pretty straight-forward:

 - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;

 - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;

 - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};

 - page_cache_get() -> get_page();

 - page_cache_release() -> put_page();

This patch contains automated changes generated with coccinelle using
script below.  For some reason, coccinelle doesn't patch header files.
I've called spatch for them manually.

The only adjustment after coccinelle is revert of changes to
PAGE_CAHCE_ALIGN definition: we are going to drop it later.

There are few places in the code where coccinelle didn't reach.  I'll
fix them manually in a separate patch.  Comments and documentation also
will be addressed with the separate patch.

virtual patch

@@
expression E;
@@
- E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E

@@
expression E;
@@
- E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E

@@
@@
- PAGE_CACHE_SHIFT
+ PAGE_SHIFT

@@
@@
- PAGE_CACHE_SIZE
+ PAGE_SIZE

@@
@@
- PAGE_CACHE_MASK
+ PAGE_MASK

@@
expression E;
@@
- PAGE_CACHE_ALIGN(E)
+ PAGE_ALIGN(E)

@@
expression E;
@@
- page_cache_get(E)
+ get_page(E)

@@
expression E;
@@
- page_cache_release(E)
+ put_page(E)

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-04-04 10:41:08 -07:00
Vladimir Davydov 5d097056c9 kmemcg: account certain kmem allocations to memcg
Mark those kmem allocations that are known to be easily triggered from
userspace as __GFP_ACCOUNT/SLAB_ACCOUNT, which makes them accounted to
memcg.  For the list, see below:

 - threadinfo
 - task_struct
 - task_delay_info
 - pid
 - cred
 - mm_struct
 - vm_area_struct and vm_region (nommu)
 - anon_vma and anon_vma_chain
 - signal_struct
 - sighand_struct
 - fs_struct
 - files_struct
 - fdtable and fdtable->full_fds_bits
 - dentry and external_name
 - inode for all filesystems. This is the most tedious part, because
   most filesystems overwrite the alloc_inode method.

The list is far from complete, so feel free to add more objects.
Nevertheless, it should be close to "account everything" approach and
keep most workloads within bounds.  Malevolent users will be able to
breach the limit, but this was possible even with the former "account
everything" approach (simply because it did not account everything in
fact).

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Greg Thelen <gthelen@google.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-14 16:00:49 -08:00
Al Viro 21fc61c73c don't put symlink bodies in pagecache into highmem
kmap() in page_follow_link_light() needed to go - allowing to hold
an arbitrary number of kmaps for long is a great way to deadlocking
the system.

new helper (inode_nohighmem(inode)) needs to be used for pagecache
symlinks inodes; done for all in-tree cases.  page_follow_link_light()
instrumented to yell about anything missed.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-12-08 22:41:36 -05:00
Al Viro 9cdce3c074 ufs: get rid of ->setattr() for symlinks
It was to needed for a couple of months in 2010, until UFS
quota support got dropped.  Since then it's equivalent to
simple_setattr() (i.e. the default) for everything except the
regular files.  And dropping it there allows to convert all
UFS symlinks to {page,simple}_symlink_inode_operations, getting
rid of fs/ufs/symlink.c completely.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-12-06 20:43:26 -05:00
Al Viro bd2843fe1f fix ufs write vs readpage race when writing into a hole
Followup to the UFS series - with the way we clear the new blocks (via
buffer cache, possibly on more than a page worth of file) we really
should not insert a reference to new block into inode block tree until
after we'd cleared it.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-09 10:43:12 -07:00
Al Viro 4e317ce73a ufs_inode_get{frag,block}(): get rid of 'phys' argument
Just pass NULL as locked_page in case of first block in the indirect
chain.  Old calling conventions aside, a reason for having 'phys'
was that ufs_inode_getfrag() used to be able to do _two_ allocations
- indirect block and extending/reallocating a tail.  We needed
locked_page for the latter (it's a data), but we also needed to
figure out that indirect block is metadata.  So we used to pass
non-NULL locked_page in all cases *and* used NULL phys as
indication of being asked to allocate an indirect.

With tail unpacking taken into a separate function we don't need
those convolutions anymore.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-06 17:40:05 -04:00
Al Viro 0385f1f9e3 ufs_getfrag_block(): tidy up a bit
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-06 17:40:04 -04:00
Al Viro 5fbfb238f7 ufs_inode_getblock(): failure to read an indirect block is -EIO
... and not "write to beginning of the disk", TYVM...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-06 17:40:03 -04:00
Al Viro 4eeff4c932 ufs_getfrag_block(): turn following indirects into a loop
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-06 17:40:02 -04:00
Al Viro 5336970be0 ufs_inode_getfrag(): pass index instead of 'fragment'
same story as with ufs_inode_getblock()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-06 17:40:01 -04:00
Al Viro 0f3c1294be ufs_inode_getfrag(): split extending the partial blocks off
ufs_extend_tail() is handling that now.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-06 17:40:00 -04:00
Al Viro 619cfac091 ufs_inode_getblock(): pass indirect block number and full index
... instead of messing with buffer_head.  We can bloody well do
sb_bread() in there.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-06 17:39:59 -04:00
Al Viro 721435a767 ufs_inode_getblock(): pass index instead of 'fragment'
The value passed to ufs_inode_getblock() as the 3rd argument
had lower bits ignored; the upper bits were shifted down
and used and they actually make sense - those are _lower_ bits
of index in indirect block (i.e. they form the index within
a fragment within an indirect block).

Pass those as argument.  Upper bits of index (i.e. the number
of fragment within indirect block) will join them shortly.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-06 17:39:58 -04:00
Al Viro 177848a018 ufs_inode_get{frag,block}(): leave sb_getblk() to caller
just return the damn block number

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-06 17:39:57 -04:00
Al Viro 8d9dcf1436 ufs_getfrag_block(): get rid of macro jungles
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-06 17:39:56 -04:00
Al Viro bbb3eb9d34 ufs_inode_get{frag,block}(): consolidate success exits
These calling conventions are rudiments of pre-2.3 times; they
really need to be sanitized.  This is the first step; next
will be _always_ returning a block number, instead of this
"return a pointer to buffer_head, except when we get to the
actual data" crap.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-06 17:39:55 -04:00
Al Viro 71dd42846f ufs: use the branch depth in ufs_getfrag_block()
we'd already calculated it...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-06 17:39:54 -04:00
Al Viro 4b7068c8b1 ufs: move calculation of offsets into ufs_getfrag_block()
... and massage ufs_frag_map() to take those instead of fragment number.

As it is, we duplicate the damn thing on the write side, open-coded and
bloody hard to follow.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-06 17:39:53 -04:00
Al Viro 5a39c25562 ufs_inode_get{frag,block}(): get rid of retries
We are holding ->truncate_mutex, so nobody else can alter our
block pointers.  Rechecks/retries were needed back when we
only held BKL there, and had to cope with write_begin/writepage
and writepage/truncate races.  Can't happen anymore...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-06 17:39:52 -04:00
Al Viro f53bd1421b __ufs_truncate_blocks(): avoid excessive dirtying of indirect blocks
There's a case when an indirect block gets dirtied for no good
reason - when there's a hole starting in the middle of area
covered by it and spanning past its end, and truncate() is done
precisely to the beginning of the hole.

The block is obviously not modified at all - all removals happen
beyond it.  However, existing code ends up dirtying it just in
case.  It's trivial to fix and while it's not a real bug by any
stretch of imagination, it makes the damn thing harder to follow.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-06 17:39:51 -04:00
Al Viro cc7231e309 free_full_branch(): don't bother modifying the block we are going to free
Note that it's already made unreachable from the inode, so we don't have
to worry about ufs_frag_map() walking into something already freed.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-06 17:39:50 -04:00
Al Viro b6eede0ec6 move marking inode dirty to the end of __ufs_truncate_blocks()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-06 17:39:49 -04:00
Al Viro 163073db51 free_full_branch(): saner calling conventions
Have caller fetch the block number *and* remove it from wherever
it was.  Pass the block number instead.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-06 17:39:48 -04:00
Al Viro 7b4e4f7f81 ufs_trunc_branch(): kill recursion
turn recursion into a pair of loops

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-06 17:39:47 -04:00
Al Viro 6aab6dd379 ufs_trunc_branch(): massage towards killing recursion
We always have 0 < depth2 <= depth in there, so
if (--depth) {
	if (--depth2)
		A
	B
} else {
	C // not using depth2
}
D // not using depth2

is equivalent to

if (--depth2)
	A with s/depth/depth - 1/
if (--depth)
	B
else
	C
D

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-06 17:39:46 -04:00
Al Viro 6d1ebbca2b split ufs_truncate_branch() into full- and partial-branch variants
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-06 17:39:45 -04:00
Al Viro a138b4b688 ufs: unify the logics for collecting adjacent data blocks to free
open-coded in several places...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-06 17:39:44 -04:00
Al Viro a96574233c ufs_trunc_branch(): separate the calls with non-NULL offsets
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-06 17:39:43 -04:00
Al Viro 97e0f8f87c ufs_trunc_branch(): never call with offsets != NULL && depth2 == 0
For calls in __ufs_truncate_blocks() it's just a matter of not
incrementing offsets[0] and not making that call - immediately
following loop will be executed one extra time and we'll be just
fine.  For recursive call in ufs_trunc_branch() itself, just
assing NULL to offsets if we would be about to make such call.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-06 17:39:42 -04:00
Al Viro 42432739b5 __ufs_trunc_blocks(): turn the part after switch into a loop
... and turn the switch into if (), since all cases with
depth != 1 have just become identical.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-06 17:39:41 -04:00
Al Viro ef3a315d4c __ufs_truncate_blocks(): unify freeing the full branches
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-06 17:39:40 -04:00
Al Viro 9e0fbbde27 unify ufs_trunc_..indirect()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-06 17:39:39 -04:00
Al Viro 6775e24d9c ufs_trunc_..indirect(): more massage towards unifying
Instead of manually checking that the array contains only zeroes,
find the position of the last non-zero (in __ufs_truncate(), where
we can conveniently do that) and use that to tell if there's
any non-zero in the array tail passed to ufs_trunc_...indirect().

The goal of all that clumsiness is to get fold these functions
together.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-06 17:39:38 -04:00
Al Viro 85416288bf ufs_trunc_...indirect(): pass the array of indices instead of offsets
rather than bitslicing the offset just formed as sum of shifted indices,
pass the array of those indices itself.  NULL is used as equivalent
of "all zeroes" (== free the entire branch).

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-06 17:39:37 -04:00
Al Viro 7a4fdda724 __ufs_truncate(); find cutoff distances into branches by offsets[] array
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-06 17:39:36 -04:00
Al Viro 7bad5939fc ufs_trunc_dindirect(): pass the number of blocks to keep
same as the previous two.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-06 17:39:35 -04:00