linux

Commit Graph

Author	SHA1	Message	Date
Linus Torvalds	83373f7028	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs fixes from Al Viro: "double iput() on failure exit in lustre, racy removal of spliced dentries from ->s_anon in __d_materialise_dentry() plus a bunch of assorted RCU pathwalk fixes" The RCU pathwalk fixes end up fixing a couple of cases where we incorrectly dropped out of RCU walking, due to incorrect initialization and testing of the sequence locks in some corner cases. Since dropping out of RCU walk mode forces the slow locked accesses, those corner cases slowed down quite dramatically. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: be careful with nd->inode in path_init() and follow_dotdot_rcu() don't bugger nd->seq on set_root_rcu() from follow_dotdot_rcu() fix bogus read_seqretry() checks introduced in `b37199e` move the call of __d_drop(anon) into __d_materialise_unique(dentry, anon) [fix] lustre: d_make_root() does iput() on dentry allocation failure	2014-09-14 17:37:36 -07:00
Al Viro	6f18493e54	move the call of __d_drop(anon) into __d_materialise_unique(dentry, anon) and lock the right list there Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-09-13 22:14:03 -04:00
Linus Torvalds	99d263d4c5	vfs: fix bad hashing of dentries Josef Bacik found a performance regression between 3.2 and 3.10 and narrowed it down to commit `bfcfaa77bd` ("vfs: use 'unsigned long' accesses for dcache name comparison and hashing"). He reports: "The test case is essentially for (i = 0; i < 1000000; i++) mkdir("a$i"); On xfs on a fio card this goes at about 20k dir/sec with 3.2, and 12k dir/sec with 3.10. This is because we spend waaaaay more time in __d_lookup on 3.10 than in 3.2. The new hashing function for strings is suboptimal for < sizeof(unsigned long) string names (and hell even > sizeof(unsigned long) string names that I've tested). I broke out the old hashing function and the new one into a userspace helper to get real numbers and this is what I'm getting: Old hash table had 1000000 entries, 0 dupes, 0 max dupes New hash table had 12628 entries, 987372 dupes, 900 max dupes We had 11400 buckets with a p50 of 30 dupes, p90 of 240 dupes, p99 of 567 dupes for the new hash My test does the hash, and then does the d_hash into a integer pointer array the same size as the dentry hash table on my system, and then just increments the value at the address we got to see how many entries we overlap with. As you can see the old hash function ended up with all 1 million entries in their own bucket, whereas the new one they are only distributed among ~12.5k buckets, which is why we're using so much more CPU in __d_lookup". The reason for this hash regression is two-fold: - On 64-bit architectures the down-mixing of the original 64-bit word-at-a-time hash into the final 32-bit hash value is very simplistic and suboptimal, and just adds the two 32-bit parts together. In particular, because there is no bit shuffling and the mixing boundary is also a byte boundary, similar character patterns in the low and high word easily end up just canceling each other out. - the old byte-at-a-time hash mixed each byte into the final hash as it hashed the path component name, resulting in the low bits of the hash generally being a good source of hash data. That is not true for the word-at-a-time case, and the hash data is distributed among all the bits. The fix is the same in both cases: do a better job of mixing the bits up and using as much of the hash data as possible. We already have the "hash_32\|64()" functions to do that. Reported-by: Josef Bacik <jbacik@fb.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@infradead.org> Cc: Chris Mason <clm@fb.com> Cc: linux-fsdevel@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-09-13 11:30:10 -07:00
Fengguang Wu	49c7dd287a	fs: mark __d_obtain_alias static Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-08-07 14:40:11 -04:00
J. Bruce Fields	95ad5c2913	dcache: d_splice_alias should detect loops I believe this can only happen in the case of a corrupted filesystem. So -EIO looks like the appropriate error. Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-08-07 14:40:11 -04:00
J. Bruce Fields	8d80d7dabe	dcache: d_find_alias needn't recheck IS_ROOT && DCACHE_DISCONNECTED If we get to this point and discover the dentry is not a root dentry, or not DCACHE_DISCONNECTED--great, we always prefer that anyway. Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-08-07 14:40:10 -04:00
J. Bruce Fields	52ed46f0fa	dcache: remove unused d_find_alias parameter Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-08-07 14:40:10 -04:00
J. Bruce Fields	1a0a397e41	dcache: d_obtain_alias callers don't all want DISCONNECTED There are a few d_obtain_alias callers that are using it to get the root of a filesystem which may already have an alias somewhere else. This is not the same as the filehandle-lookup case, and none of them actually need DCACHE_DISCONNECTED set. It isn't really a serious problem, but it would really be clearer if we reserved DCACHE_DISCONNECTED for those cases where it's actually needed. In the btrfs case this was causing a spurious printk from nfsd/nfsfh.c:fh_verify when it found an unexpected DCACHE_DISCONNECTED dentry. Josef worked around this by unsetting DCACHE_DISCONNECTED manually in `3a0dfa6a12` "Btrfs: unset DCACHE_DISCONNECTED when mounting default subvol", and this replaces that workaround. Cc: Josef Bacik <jbacik@fb.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-08-07 14:40:10 -04:00
J. Bruce Fields	da093a9b76	dcache: d_splice_alias should ignore DCACHE_DISCONNECTED Any IS_ROOT() alias should be safe to use; there's nothing special about DCACHE_DISCONNECTED dentries. Note that this is in fact useful for filesystems such as btrfs which can legimately encounter a directory with a preexisting IS_ROOT alias on a lookup that crosses into a subvolume. (Those aliases are currently marked DCACHE_DISCONNECTED--but not really for any good reason, and we'll change that soon.) Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-08-07 14:40:10 -04:00
J. Bruce Fields	908790fa3b	dcache: d_splice_alias mustn't create directory aliases Currently if d_splice_alias finds a directory with an alias that is not IS_ROOT or not DCACHE_DISCONNECTED, it creates a duplicate directory. Duplicate directory dentries are unacceptable; it is better just to error out. (In the case of a local filesystem the most likely case is filesystem corruption: for example, perhaps two directories point to the same child directory, and the other parent has already been found and cached.) Note that distributed filesystems may encounter this case in normal operation if a remote host moves a directory to a location different from the one we last cached in the dcache. For that reason, such filesystems should instead use d_materialise_unique, which tries to move the old directory alias to the right place instead of erroring out. Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-08-07 14:40:10 -04:00
J. Bruce Fields	75a2352d01	dcache: close d_move race in d_splice_alias d_splice_alias will d_move an IS_ROOT() directory dentry into place if one exists. This should be safe as long as the dentry remains IS_ROOT, but I can't see what guarantees that: once we drop the i_lock all we hold here is the i_mutex on an unrelated parent directory. Instead copy the logic of d_materialise_unique. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-08-07 14:40:10 -04:00
J. Bruce Fields	3f70bd51cb	dcache: move d_splice_alias Just a trivial move to locate it near (similar) d_materialise_unique code and save some forward references in a following patch. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-08-07 14:40:10 -04:00
Linus Torvalds	16b9057804	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs updates from Al Viro: "This the bunch that sat in -next + lock_parent() fix. This is the minimal set; there's more pending stuff. In particular, I really hope to get acct.c fixes merged this cycle - we need that to deal sanely with delayed-mntput stuff. In the next pile, hopefully - that series is fairly short and localized (kernel/acct.c, fs/super.c and fs/namespace.c). In this pile: more iov_iter work. Most of prereqs for ->splice_write with sane locking order are there and Kent's dio rewrite would also fit nicely on top of this pile" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (70 commits) lock_parent: don't step on stale ->d_parent of all-but-freed one kill generic_file_splice_write() ceph: switch to iter_file_splice_write() shmem: switch to iter_file_splice_write() nfs: switch to iter_splice_write_file() fs/splice.c: remove unneeded exports ocfs2: switch to iter_file_splice_write() ->splice_write() via ->write_iter() bio_vec-backed iov_iter optimize copy_page_{to,from}_iter() bury generic_file_aio_{read,write} lustre: get rid of messing with iovecs ceph: switch to ->write_iter() ceph_sync_direct_write: stop poking into iov_iter guts ceph_sync_read: stop poking into iov_iter guts new helper: copy_page_from_iter() fuse: switch to ->write_iter() btrfs: switch to ->write_iter() ocfs2: switch to ->write_iter() xfs: switch to ->write_iter() ...	2014-06-12 10:30:18 -07:00
Al Viro	c2338f2dc7	lock_parent: don't step on stale ->d_parent of all-but-freed one Dentry that had been through (or into) __dentry_kill() might be seen by shrink_dentry_list(); that's normal, it'll be taken off the shrink list and freed if __dentry_kill() has already finished. The problem is, its ->d_parent might be pointing to already freed dentry, so lock_parent() needs to be careful. We need to check that dentry hasn't already gone into __dentry_kill() and grab rcu_read_lock() before dropping ->d_lock - the latter makes sure that whatever we see in ->d_parent after dropping ->d_lock it won't be freed until we drop rcu_read_lock(). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-06-12 00:29:13 -04:00
Joe Perches	1f7e0616cd	fs: convert use of typedef ctl_table to struct ctl_table This typedef is unnecessary and should just be removed. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-06-06 16:08:16 -07:00
Linus Torvalds	9f12600fe4	dcache: add missing lockdep annotation lock_parent() very much on purpose does nested locking of dentries, and is careful to maintain the right order (lock parent first). But because it didn't annotate the nested locking order, lockdep thought it might be a deadlock on d_lock, and complained. Add the proper annotation for the inner locking of the child dentry to make lockdep happy. Introduced by commit `046b961b45` ("shrink_dentry_list(): take parent's ->d_lock earlier"). Reported-and-tested-by: Josh Boyer <jwboyer@fedoraproject.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-05-31 09:13:21 -07:00
Al Viro	8cbf74da43	dentry_kill() doesn't need the second argument now it's 1 in the only remaining caller. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-05-30 11:10:33 -04:00
Al Viro	b2b80195d8	dealing with the rest of shrink_dentry_list() livelock We have the same problem with ->d_lock order in the inner loop, where we are dropping references to ancestors. Same solution, basically - instead of using dentry_kill() we use lock_parent() (introduced in the previous commit) to get that lock in a safe way, recheck ->d_count (in case if lock_parent() has ended up dropping and retaking ->d_lock and somebody managed to grab a reference during that window), trylock the inode->i_lock and use __dentry_kill() to do the rest. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-05-30 11:10:33 -04:00
Al Viro	046b961b45	shrink_dentry_list(): take parent's ->d_lock earlier The cause of livelocks there is that we are taking ->d_lock on dentry and its parent in the wrong order, forcing us to use trylock on the parent's one. d_walk() takes them in the right order, and unfortunately it's not hard to create a situation when shrink_dentry_list() can't make progress since trylock keeps failing, and shrink_dcache_parent() or check_submounts_and_drop() keeps calling d_walk() disrupting the very shrink_dentry_list() it's waiting for. Solution is straightforward - if that trylock fails, let's unlock the dentry itself and take locks in the right order. We need to stabilize ->d_parent without holding ->d_lock, but that's doable using RCU. And we'd better do that in the very beginning of the loop in shrink_dentry_list(), since the checks on refcount, etc. would need to be redone anyway. That deals with a half of the problem - killing dentries on the shrink list itself. Another one (dropping their parents) is in the next commit. locking parent is interesting - it would be easy to do rcu_read_lock(), lock whatever we think is a parent, lock dentry itself and check if the parent is still the right one. Except that we need to check that before locking the dentry, or we are risking taking ->d_lock out of order. Fortunately, once the D1 is locked, we can check if D2->d_parent is equal to D1 without the need to lock D2; D2->d_parent can start or stop pointing to D1 only under D1->d_lock, so taking D1->d_lock is enough. In other words, the right solution is rcu_read_lock/lock what looks like parent right now/check if it's still our parent/rcu_read_unlock/lock the child. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-05-30 11:03:21 -04:00
Al Viro	ff2fde9929	expand dentry_kill(dentry, 0) in shrink_dentry_list() Result will be massaged to saner shape in the next commits. It is ugly, no questions - the point of that one is to be a provably equivalent transformation (and it might be worth splitting a bit more). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-05-29 08:50:08 -04:00
Al Viro	e55fd01154	split dentry_kill() ... into trylocks and everything else. The latter (actual killing) is __dentry_kill(). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-05-29 08:46:08 -04:00
Al Viro	64fd72e0a4	lift the "already marked killed" case into shrink_dentry_list() It can happen only when dentry_kill() is called with unlock_on_failure equal to 0 - other callers had dentry pinned until the moment they've got ->d_lock and DCACHE_DENTRY_KILLED is set only after lockref_mark_dead(). IOW, only one of three call sites of dentry_kill() might end up reaching that code. Just move it there. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-05-28 09:48:44 -04:00
Miklos Szeredi	60942f2f23	dcache: don't need rcu in shrink_dentry_list() Since now the shrink list is private and nobody can free the dentry while it is on the shrink list, we can remove RCU protection from this. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-05-03 16:46:16 -04:00
Al Viro	9c8c10e262	more graceful recovery in umount_collect() Start with shrink_dcache_parent(), then scan what remains. First of all, BUG() is very much an overkill here; we are holding ->s_umount, and hitting BUG() means that a lot of interesting stuff will be hanging after that point (sync(2), for example). Moreover, in cases when there had been more than one leak, we'll be better off reporting all of them. And more than just the last component of pathname - %pd is there for just such uses... That was the last user of dentry_lru_del(), so kill it off... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-05-03 16:46:13 -04:00
Al Viro	fe91522a7b	don't remove from shrink list in select_collect() If we find something already on a shrink list, just increment data->found and do nothing else. Loops in shrink_dcache_parent() and check_submounts_and_drop() will do the right thing - everything we did put into our list will be evicted and if there had been nothing, but data->found got non-zero, well, we have somebody else shrinking those guys; just try again. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-05-03 16:45:06 -04:00
Al Viro	41edf278fc	dentry_kill(): don't try to remove from shrink list If the victim in on the shrink list, don't remove it from there. If shrink_dentry_list() manages to remove it from the list before we are done - fine, we'll just free it as usual. If not - mark it with new flag (DCACHE_MAY_FREE) and leave it there. Eventually, shrink_dentry_list() will get to it, remove the sucker from shrink list and call dentry_kill(dentry, 0). Which is where we'll deal with freeing. Since now dentry_kill(dentry, 0) may happen after or during dentry_kill(dentry, 1), we need to recognize that (by seeing DCACHE_DENTRY_KILLED already set), unlock everything and either free the sucker (in case DCACHE_MAY_FREE has been set) or leave it for ongoing dentry_kill(dentry, 1) to deal with. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-05-01 10:30:00 -04:00
Al Viro	01b6035190	expand the call of dentry_lru_del() in dentry_kill() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-04-30 18:02:52 -04:00
Al Viro	b4f0354e96	new helper: dentry_free() The part of old d_free() that dealt with actual freeing of dentry. Taken out of dentry_kill() into a separate function. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-04-30 18:02:52 -04:00
Al Viro	5c47e6d0ad	fold try_prune_one_dentry() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-04-30 18:02:51 -04:00
Al Viro	03b3b889e7	fold d_kill() and d_free() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-04-30 18:02:51 -04:00
Al Viro	22213318af	fix races between __d_instantiate() and checks of dentry flags in non-lazy walk we need to be careful about dentry switching from negative to positive - both ->d_flags and ->d_inode are updated, and in some places we might see only one store. The cases where dentry has been obtained by dcache lookup with ->i_mutex held on parent are safe - ->d_lock and ->i_mutex provide all the barriers we need. However, there are several places where we run into trouble: * do_last() fetches ->d_inode, then checks ->d_flags and assumes that inode won't be NULL unless d_is_negative() is true. Race with e.g. creat() - we might have fetched the old value of ->d_inode (still NULL) and new value of ->d_flags (already not DCACHE_MISS_TYPE). Lin Ming has observed and reported the resulting oops. * a bunch of places checks ->d_inode for being non-NULL, then checks ->d_flags for "is it a symlink". Race with symlink(2) in case if our CPU sees ->d_inode update first - we see non-NULL there, but ->d_flags still contains DCACHE_MISS_TYPE instead of DCACHE_SYMLINK_TYPE. Result: false negative on "should we follow link here?", with subsequent unpleasantness. Cc: stable@vger.kernel.org # 3.13 and 3.14 need that one Reported-and-tested-by: Lin Ming <minggr@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-04-19 12:30:58 -04:00
Linus Torvalds	e9f37d3a8d	Merge branch 'drm-next' of git://people.freedesktop.org/~airlied/linux Pull drm updates from Dave Airlie: "Highlights: - drm: Generic display port aux features, primary plane support, drm master management fixes, logging cleanups, enforced locking checks (instead of docs), documentation improvements, minor number handling cleanup, pseudofs for shared inodes. - ttm: add ability to allocate from both ends - i915: broadwell features, power domain and runtime pm, per-process address space infrastructure (not enabled) - msm: power management, hdmi audio support - nouveau: ongoing GPU fault recovery, initial maxwell support, random fixes - exynos: refactored driver to clean up a lot of abstraction, DP support moved into drm, LVDS bridge support added, parallel panel support - gma500: SGX MMU support, SGX irq handling, asle irq work fixes - radeon: video engine bringup, ring handling fixes, use dp aux helpers - vmwgfx: add rendernode support" * 'drm-next' of git://people.freedesktop.org/~airlied/linux: (849 commits) DRM: armada: fix corruption while loading cursors drm/dp_helper: don't return EPROTO for defers (v2) drm/bridge: export ptn3460_init function drm/exynos: remove MODULE_DEVICE_TABLE definitions ARM: dts: exynos4412-trats2: enable exynos/fimd node ARM: dts: exynos4210-trats: enable exynos/fimd node ARM: dts: exynos4412-trats2: add panel node ARM: dts: exynos4210-trats: add panel node ARM: dts: exynos4: add MIPI DSI Master node drm/panel: add S6E8AA0 driver ARM: dts: exynos4210-universal_c210: add proper panel node drm/panel: add ld9040 driver panel/ld9040: add DT bindings panel/s6e8aa0: add DT bindings drm/exynos: add DSIM driver exynos/dsim: add DT bindings drm/exynos: disallow fbdev initialization if no device is connected drm/mipi_dsi: create dsi devices only for nodes with reg property drm/mipi_dsi: add flags to DSI messages Skip intel_crt_init for Dell XPS 8700 ...	2014-04-08 09:52:16 -07:00
Miklos Szeredi	da1ce0670c	vfs: add cross-rename If flags contain RENAME_EXCHANGE then exchange source and destination files. There's no restriction on the type of the files; e.g. a directory can be exchanged with a symlink. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: J. Bruce Fields <bfields@redhat.com>	2014-04-01 17:08:43 +02:00
Daniel Vetter	0654a65f26	Linux 3.14 -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJTOOOnAAoJEHm+PkMAQRiGsBAH/2PAOL3TbOG6tEedxQrTwsr2 muRIRTVWawjT8/npbHupxGnAyAVdmdffBHpmCmcftKdKNryT3YZW8/JWoYc+WSlo 3vTDJHDOYAe6yCBjjhYwcu150THBQdOymOi5mbbclo0XWYG18jd3+abYprRH6SiD XqNSzYqoiv91JHBAWKBIpo1cyRDuwoM7+jZ7gX41r2800EL7loY3e08cPDDNU6HA CKaLXMwLwYTefE+Wnr+4UUr08NbNBbBUKLUSXVqKKIpd+MtbyhV1SnWzz8VQSkag K/uzsnGnE7nrqoepMSx3nXxzOWxUSY2EMbwhEjaKK4xBq9C9pzv3sG/o2/IyopU= =Nuom -----END PGP SIGNATURE----- Merge tag 'v3.14' into drm-intel-next-queued Linux 3.14 The vt-d w/a merged late in 3.14-rc needs a bit of fine-tuning, hence backmerge. Conflicts: drivers/gpu/drm/i915/i915_gem_gtt.c drivers/gpu/drm/i915/intel_ddi.c drivers/gpu/drm/i915/intel_dp.c All trivial adjacent lines changed type conflicts, so trivial git doesn't even show them in the merg commit. Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-03-31 10:45:15 +02:00
Al Viro	e825196d48	make prepend_name() work correctly when called with negative buflen In all callchains leading to prepend_name(), the value left in buflen is eventually discarded unused if prepend_name() has returned a negative. So we are free to do what prepend() does, and subtract from buflen before* checking for underflow (which turns into checking the sign of subtraction result, of course). Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-03-23 00:28:40 -04:00
David Herrmann	31bbe16f6d	drm: add pseudo filesystem for shared inodes Our current DRM design uses a single address_space for all users of the same DRM device. However, there is no way to create an anonymous address_space without an underlying inode. Therefore, we wait for the first ->open() callback on a registered char-dev and take-over the inode of the char-dev. This worked well so far, but has several drawbacks: - We screw with FS internals and rely on some non-obvious invariants like inode->i_mapping being the same as inode->i_data for char-devs. - We don't have any address_space prior to the first ->open() from user-space. This leads to ugly fallback code and we cannot allocate global objects early. As pointed out by Al-Viro, fs/anon_inode.c is not supposed to be used by drivers for anonymous inode-allocation. Therefore, this patch follows the proposed alternative solution and adds a pseudo filesystem mount-point to DRM. We can then allocate private inodes including a private address_space for each DRM device at initialization time. Note that we could use: sysfs_get_inode(sysfs_mnt->mnt_sb, drm_device->dev->kobj.sd); to get access to the underlying sysfs-inode of a "struct device" object. However, most of this information is currently hidden and it's not clear whether this address_space is suitable for driver access. Thus, unless linux allows anonymous address_space objects or driver-core provides a public inode per device, we're left with our own private internal mount point. Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David Herrmann <dh.herrmann@gmail.com>	2014-03-16 12:17:03 +01:00
Al Viro	f650080152	__dentry_path() fixes * we need to save the starting point for restarts * reject pathologically short buffers outright Spotted-by: Denys Vlasenko <dvlasenk@redhat.com> Spotted-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-01-26 12:37:55 -05:00
Eric W. Biederman	a8323da036	vfs: Remove second variable named error in __dentry_path In commit `232d2d60aa` Author: Waiman Long <Waiman.Long@hp.com> Date: Mon Sep 9 12:18:13 2013 -0400 dcache: Translating dentry into pathname without taking rename_lock The __dentry_path locking was changed and the variable error was intended to be moved outside of the loop. Unfortunately the inner declaration of error was not removed. Resulting in a version of __dentry_path that will never return an error. Remove the problematic inner declaration of error and allow __dentry_path to return errors once again. Cc: stable@vger.kernel.org Cc: Waiman Long <Waiman.Long@hp.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-01-26 08:26:43 -05:00
Linus Torvalds	48ba620aab	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull namespace fixes from Eric Biederman: "This is a set of 3 regression fixes. This fixes /proc/mounts when using "ip netns add <netns>" to display the actual mount point. This fixes a regression in clone that broke lxc-attach. This fixes a regression in the permission checks for mounting /proc that made proc unmountable if binfmt_misc was in use. Oops. My apologies for sending this pull request so late. Al Viro gave interesting review comments about the d_path fix that I wanted to address in detail before I sent this pull request. Unfortunately a bad round of colds kept from addressing that in detail until today. The executive summary of the review was: Al: Is patching d_path really sufficient? The prepend_path, d_path, d_absolute_path, and __d_path family of functions is a really mess. Me: Yes, patching d_path is really sufficient. Yes, the code is mess. No it is not appropriate to rewrite all of d_path for a regression that has existed for entirely too long already, when a two line change will do" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: vfs: Fix a regression in mounting proc fork: Allow CLONE_PARENT after setns(CLONE_NEWPID) vfs: In d_path don't call d_dname on a mount point	2014-01-17 17:29:36 -08:00
Will Deacon	a5c21dcefa	dcache: allow word-at-a-time name hashing with big-endian CPUs When explicitly hashing the end of a string with the word-at-a-time interface, we have to be careful which end of the word we pick up. On big-endian CPUs, the upper-bits will contain the data we're after, so ensure we generate our masks accordingly (and avoid hashing whatever random junk may have been sitting after the string). This patch adds a new dcache helper, bytemask_from_count, which creates a mask appropriate for the CPU endianness. Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-12-12 10:39:01 -08:00
Eric W. Biederman	f48cfddc67	vfs: In d_path don't call d_dname on a mount point Aditya Kali (adityakali@google.com) wrote: > Commit bf056bfa80596a5d14b26b17276a56a0dcb080e5: > "proc: Fix the namespace inode permission checks." converted > the namespace files into symlinks. The same commit changed > the way namespace bind mounts appear in /proc/mounts: > $ mount --bind /proc/self/ns/ipc /mnt/ipc > Originally: > $ cat /proc/mounts \| grep ipc > proc /mnt/ipc proc rw,nosuid,nodev,noexec 0 0 > > After commit bf056bfa80596a5d14b26b17276a56a0dcb080e5: > $ cat /proc/mounts \| grep ipc > proc ipc:[4026531839] proc rw,nosuid,nodev,noexec 0 0 > > This breaks userspace which expects the 2nd field in > /proc/mounts to be a valid path. The symlink /proc/<pid>/ns/{ipc,mnt,net,pid,user,uts} point to dentries allocated with d_alloc_pseudo that we can mount, and that have interesting names printed out with d_dname. When these files are bind mounted /proc/mounts is not currently displaying the mount point correctly because d_dname is called instead of just displaying the path where the file is mounted. Solve this by adding an explicit check to distinguish mounted pseudo inodes and unmounted pseudo inodes. Unmounted pseudo inodes always use mount of their filesstem as the mnt_root in their path making these two cases easy to distinguish. CC: stable@vger.kernel.org Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Reported-by: Aditya Kali <adityakali@google.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2013-11-26 20:53:58 -08:00
Al Viro	31dec1327e	fold try_to_ascend() into the sole remaining caller There used to be a bunch of tree-walkers in dcache.c, all alike. try_to_ascend() had been introduced to abstract a piece of logics duplicated in all of them. These days all these tree-walkers are implemented via the same iterator (d_walk()), which is the only remaining caller of try_to_ascend(), so let's fold it back... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-11-15 22:04:17 -05:00
Al Viro	482db90661	dcache.c: get rid of pointless macros D_HASH{MASK,BITS} are used once each, both in the same function (d_hash()). At this point they are actively misguiding - they imply that values are compiler constants, which is no longer true. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-11-15 22:04:17 -05:00
Al Viro	2bc74feba1	take read_seqbegin_or_lock() and friends to seqlock.h Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-11-15 22:04:17 -05:00
Linus Torvalds	5e30025a31	Merge branch 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull core locking changes from Ingo Molnar: "The biggest changes: - add lockdep support for seqcount/seqlocks structures, this unearthed both bugs and required extra annotation. - move the various kernel locking primitives to the new kernel/locking/ directory" * 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (21 commits) block: Use u64_stats_init() to initialize seqcounts locking/lockdep: Mark __lockdep_count_forward_deps() as static lockdep/proc: Fix lock-time avg computation locking/doc: Update references to kernel/mutex.c ipv6: Fix possible ipv6 seqlock deadlock cpuset: Fix potential deadlock w/ set_mems_allowed seqcount: Add lockdep functionality to seqcount/seqlock structures net: Explicitly initialize u64_stats_sync structures for lockdep locking: Move the percpu-rwsem code to kernel/locking/ locking: Move the lglocks code to kernel/locking/ locking: Move the rwsem code to kernel/locking/ locking: Move the rtmutex code to kernel/locking/ locking: Move the semaphore core to kernel/locking/ locking: Move the spinlock code to kernel/locking/ locking: Move the lockdep code to kernel/locking/ locking: Move the mutex code to kernel/locking/ hung_task debugging: Add tracepoint to report the hang x86/locking/kconfig: Update paravirt spinlock Kconfig description lockstat: Report avg wait and hold times lockdep, x86/alternatives: Drop ancient lockdep fixup message ...	2013-11-14 16:30:30 +09:00
Al Viro	ede4cebce1	prepend_path() needs to reinitialize dentry/vfsmount/mnt on restarts ... and equivalent is needed in 3.12; it's broken there as well Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-11-13 07:45:40 -05:00
Li Zhong	4ec6c2aeab	fix unpaired rcu lock in prepend_path() Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-11-13 07:43:10 -05:00
Linus Torvalds	9bc9ccd7db	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs updates from Al Viro: "All kinds of stuff this time around; some more notable parts: - RCU'd vfsmounts handling - new primitives for coredump handling - files_lock is gone - Bruce's delegations handling series - exportfs fixes plus misc stuff all over the place" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (101 commits) ecryptfs: ->f_op is never NULL locks: break delegations on any attribute modification locks: break delegations on link locks: break delegations on rename locks: helper functions for delegation breaking locks: break delegations on unlink namei: minor vfs_unlink cleanup locks: implement delegations locks: introduce new FL_DELEG lock flag vfs: take i_mutex on renamed file vfs: rename I_MUTEX_QUOTA now that it's not used for quotas vfs: don't use PARENT/CHILD lock classes for non-directories vfs: pull ext4's double-i_mutex-locking into common code exportfs: fix quadratic behavior in filehandle lookup exportfs: better variable name exportfs: move most of reconnect_path to helper function exportfs: eliminate unused "noprogress" counter exportfs: stop retrying once we race with rename/remove exportfs: clear DISCONNECTED on all parents sooner exportfs: more detailed comment for path_reconnect ...	2013-11-13 15:34:18 +09:00
J. Bruce Fields	f80de2cde1	dcache: don't clear DCACHE_DISCONNECTED too early DCACHE_DISCONNECTED should not be cleared until we're sure the dentry is connected all the way up to the root of the filesystem. It shouldn't be cleared as soon as the dentry is connected to a parent. That will cause bugs at least on exportable filesystems. Acked-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-11-09 00:16:34 -05:00
J. Bruce Fields	e1a24bb0aa	dcache: Don't set DISCONNECTED on "pseudo filesystem" dentries I can't for the life of me see any reason why anyone should care whether a dentry that is never hooked into the dentry cache would need DCACHE_DISCONNECTED set. This originates from `4b936885ab` "fs: improve scalability of pseudo filesystems", which probably just made the false assumption the DCACHE_DISCONNECTED was meant to be set on anything not connected to a parent somehow. So this is just confusing. Ideally the only uses of DCACHE_DISCONNECTED would be in the filehandle-lookup code, which needs it to ensure dentries are connected into the dentry tree before use. I left d_alloc_pseudo there even though it's now equivalent to __d_alloc(), just on the theory the name is better documentation of its intended use outside dcache.c. Cc: Nick Piggin <npiggin@kernel.dk> Acked-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-11-09 00:16:33 -05:00

1 2 3 4 5 ...

356 Commits