linux

Commit Graph

Author	SHA1	Message	Date
Linus Torvalds	c95389b4cd	Merge branch 'akpm' (patches from Andrew Morton) Merge fixes from Andrew Morton: "Five fixes. err, make that six. let me try again" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: fs/ocfs2/super.c: Use bigger nodestr to accomodate 32-bit node numbers memcg: check that kmem_cache has memcg_params before accessing it drivers/base/memory.c: fix show_mem_removable() to handle missing sections IPC: bugfix for msgrcv with msgtyp < 0 Omnikey Cardman 4000: pull in ioctl.h in user header timer_list: correct the iterator for timer_list	2013-08-28 19:31:33 -07:00
Goldwyn Rodrigues	49fa8140e4	fs/ocfs2/super.c: Use bigger nodestr to accomodate 32-bit node numbers While using pacemaker/corosync, the node numbers are generated using IP address as opposed to serial node number generation. This may not fit in a 8-byte string. Use a bigger string to print the complete node number. Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-08-28 19:26:38 -07:00
Waiman Long	98474236f7	vfs: make the dentry cache use the lockref infrastructure This just replaces the dentry count/lock combination with the lockref structure that contains both a count and a spinlock, and does the mechanical conversion to use the lockref infrastructure. There are no semantic changes here, it's purely syntactic. The reference lockref implementation uses the spinlock exactly the same way that the old dcache code did, and the bulk of this patch is just expanding the internal "d_count" use in the dcache code to use "d_lockref.count" instead. This is purely preparation for the real change to make the reference count updates be lockless during the 3.12 merge window. [ As with the previous commit, this is a rewritten version of a concept originally from Waiman, so credit goes to him, blame for any errors goes to me. Waiman's patch had some semantic differences for taking advantage of the lockless update in dget_parent(), while this patch is intentionally a pure search-and-replace change with no semantic changes. - Linus ] Signed-off-by: Waiman Long <Waiman.Long@hp.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-08-28 18:24:59 -07:00
Linus Torvalds	f0cc6ffb8c	Revert "fs: Allow unprivileged linkat(..., AT_EMPTY_PATH) aka flink" This reverts commit `bb2314b479`. It wasn't necessarily wrong per se, but we're still busily discussing the exact details of this all, so I'm going to revert it for now. It's true that you can already do flink() through /proc and that flink() isn't new. But as Brad Spengler points out, some secure environments do not mount proc, and flink adds a new interface that can avoid path lookup of the source for those kinds of environments. We may re-do this (and even mark it for stable backporting back in 3.11 and possibly earlier) once the whole discussion about the interface is done. Cc: Andy Lutomirski <luto@amacapital.net> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Brad Spengler <spender@grsecurity.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-08-28 09:18:05 -07:00
Linus Torvalds	83c425d222	One JFS patch to fix an incompatibility with NFSv4 resulting in the nfs client reporting a readdir loop. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.21 (GNU/Linux) iQIcBAABAgAGBQJSG9IHAAoJEDaohF61QIxkptgP/jTEooTZ2uMmIouqj6rhrc81 avrqtB26Ww74XBzuyiTuSBLUUJXGaLveC/i9rR2XOyA7cXJUbrqvqpZ2OExOURsf gHeLX20RKwKgaRQ6R9Xcri6wjWql4YHL/z3tI/DGpDkMfA0siocbHW+GZSfmMZ8c EOcAECY+tqrAgFL1oESTinglGNZ2V1f/IyKB8DiUDgDuMkV6Zw3083Ph7xB/bw9T Jffg6nvkST2mzZNTKgU8W3extW/X9GxxleYLED2jwEYioOFVAlvSKZTFD65NVUya 1l3YH68jROnDSoHmgOup0C6i51/e7QFuBNdyR/8UK2OyOXkJ1wneXutUIiysWWty dY9+kxSSdpNuZvflGrZlP7yoEjGgRO/893owUcusiSgLTpQpNs2y7OVjCBwsHGa7 AIWyu+FyOnNnmO6oiYkmNQlE3bAoz3z0CO7IuP5lm5HRgMIxp+k2k8yOHon/PcuF juQsbMydGcNVAjJlQuxCDh1uijOGLbDol/NekpUnyL02oy294raiWdy4IUaqWIHG Y5i1yeVwFbr7QsI8RCbEliCRwP5YMWSp4irEUozJTDplAn4AnJ5AJdYq9KM5JHMm qBHXl/asWbxGQZhmig2xzojZ+HVPFVWilpBz7OvsMXJaulrRqxV3DgAudIlZPP4B mB8DPzctwmgzmlgCqzRW =+aYs -----END PGP SIGNATURE----- Merge tag 'jfs-3.11-rc8' of git://github.com/kleikamp/linux-shaggy Pull jfs fix from Dave Kleikamp: "One JFS patch to fix an incompatibility with NFSv4 resulting in the nfs client reporting a readdir loop" * tag 'jfs-3.11-rc8' of git://github.com/kleikamp/linux-shaggy: jfs: fix readdir cookie incompatibility with NFSv4	2013-08-26 19:22:49 -07:00
Linus Torvalds	4d4323ea2d	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs fixes from Al Viro: "Assorted fixes from the last week or so" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: VFS: collect_mounts() should return an ERR_PTR bfs: iget_locked() doesn't return an ERR_PTR efs: iget_locked() doesn't return an ERR_PTR() proc: kill the extra proc_readfd_common()->dir_emit_dots() cope with potentially long ->d_dname() output for shmem/hugetlb	2013-08-25 12:25:38 -07:00
Linus Torvalds	5befb98b30	SCSI fixes on 20130824 This is a set of small bug fixes for lpfc and zfcp and a fix for a fairly nasty bug in sg where a process which cancels I/O completes in a kernel thread which would then try to write back to the now gone userspace and end up writing to a random kernel address instead. Signed-off-by: James Bottomley <JBottomley@Parallels.com> -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.19 (GNU/Linux) iQEcBAABAgAGBQJSGIaHAAoJEDeqqVYsXL0MbPUH/3UXceHlgYRrwYZ0C10Ao5XB WA8RWsDsX9UJxG68zEd8ED1aRHhmkfm4pEdMQ8DHW7+B7mvNhpb6mF0wxvmS5aIj OVI0G+3KmghA3aDWbTtg8ED0wJ4q3ftcyzl4Fhpat+yA4g/BW7iJNDCv17nvZ90f hNmdGm23wuYCid7JWNDO79spSp0q6wPJhG6ynJYOtzX1GvpEliZiGB0IOR3K44nW cF6+Uigs3+6RGXX9UHOMrk9Ug3YFHfok224vvydcbRXVh8uneB8RQ6tziJVFI8tE WPgAv2oDzly8l+ku71CqrjzG7fSwCr9Urlog9cEugE1iUmFCIQm6xJcSSnGJqaY= =jKT6 -----END PGP SIGNATURE----- Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fixes from James Bottomley: "This is a set of small bug fixes for lpfc and zfcp and a fix for a fairly nasty bug in sg where a process which cancels I/O completes in a kernel thread which would then try to write back to the now gone userspace and end up writing to a random kernel address instead" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: [SCSI] zfcp: remove access control tables interface (keep sysfs files) [SCSI] zfcp: fix schedule-inside-lock in scsi_device list loops [SCSI] zfcp: fix lock imbalance by reworking request queue locking [SCSI] sg: Fix user memory corruption when SG_IO is interrupted by a signal [SCSI] lpfc: Don't force CONFIG_GENERIC_CSUM on	2013-08-24 11:33:21 -07:00
Dan Carpenter	52e220d357	VFS: collect_mounts() should return an ERR_PTR This should actually be returning an ERR_PTR on error instead of NULL. That was how it was designed and all the callers expect it. [AV: actually, that's what "VFS: Make clone_mnt()/copy_tree()/collect_mounts() return errors" missed - originally collect_mounts() was expected to return NULL on failure] Cc: <stable@vger.kernel.org> # 3.10+ Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-08-24 12:10:29 -04:00
Dan Carpenter	821ff77c6c	bfs: iget_locked() doesn't return an ERR_PTR iget_locked() returns a NULL on error, it doesn't return an ERR_PTR. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-08-24 12:10:22 -04:00
Dan Carpenter	136eefa48d	efs: iget_locked() doesn't return an ERR_PTR() The iget_locked() function returns NULL on error and never an ERR_PTR. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-08-24 12:10:22 -04:00
Oleg Nesterov	a5a1955e0c	proc: kill the extra proc_readfd_common()->dir_emit_dots() proc_readfd_common() does dir_emit_dots() twice in a row, we need to do this only once. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-08-24 12:10:22 -04:00
Al Viro	118b230225	cope with potentially long ->d_dname() output for shmem/hugetlb dynamic_dname() is both too much and too little for those - the output may be well in excess of 64 bytes dynamic_dname() assumes to be enough (thanks to ashmem feeding really long names to shmem_file_setup()) and vsnprintf() is an overkill for those guys. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-08-24 12:10:17 -04:00
Vyacheslav Dubeyko	4bf93b50fd	nilfs2: fix issue with counting number of bio requests for BIO_EOPNOTSUPP error detection Fix the issue with improper counting number of flying bio requests for BIO_EOPNOTSUPP error detection case. The sb_nbio must be incremented exactly the same number of times as complete() function was called (or will be called) because nilfs_segbuf_wait() will call wail_for_completion() for the number of times set to sb_nbio: do { wait_for_completion(&segbuf->sb_bio_event); } while (--segbuf->sb_nbio > 0); Two functions complete() and wait_for_completion() must be called the same number of times for the same sb_bio_event. Otherwise, wait_for_completion() will hang or leak. Signed-off-by: Vyacheslav Dubeyko <slava@dubeyko.com> Cc: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Tested-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-08-23 09:51:22 -07:00
Vyacheslav Dubeyko	2df37a19c6	nilfs2: remove double bio_put() in nilfs_end_bio_write() for BIO_EOPNOTSUPP error Remove double call of bio_put() in nilfs_end_bio_write() for the case of BIO_EOPNOTSUPP error detection. The issue was found by Dan Carpenter and he suggests first version of the fix too. Signed-off-by: Vyacheslav Dubeyko <slava@dubeyko.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Tested-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-08-23 09:51:22 -07:00
Roland Dreier	35dc248383	[SCSI] sg: Fix user memory corruption when SG_IO is interrupted by a signal There is a nasty bug in the SCSI SG_IO ioctl that in some circumstances leads to one process writing data into the address space of some other random unrelated process if the ioctl is interrupted by a signal. What happens is the following: - A process issues an SG_IO ioctl with direction DXFER_FROM_DEV (ie the underlying SCSI command will transfer data from the SCSI device to the buffer provided in the ioctl) - Before the command finishes, a signal is sent to the process waiting in the ioctl. This will end up waking up the sg_ioctl() code: result = wait_event_interruptible(sfp->read_wait, (srp_done(sfp, srp) \|\| sdp->detached)); but neither srp_done() nor sdp->detached is true, so we end up just setting srp->orphan and returning to userspace: srp->orphan = 1; write_unlock_irq(&sfp->rq_list_lock); return result; /* -ERESTARTSYS because signal hit process / At this point the original process is done with the ioctl and blithely goes ahead handling the signal, reissuing the ioctl, etc. - Eventually, the SCSI command issued by the first ioctl finishes and ends up in sg_rq_end_io(). At the end of that function, we run through: write_lock_irqsave(&sfp->rq_list_lock, iflags); if (unlikely(srp->orphan)) { if (sfp->keep_orphan) srp->sg_io_owned = 0; else done = 0; } srp->done = done; write_unlock_irqrestore(&sfp->rq_list_lock, iflags); if (likely(done)) { / Now wake up any sg_read() that is waiting for this * packet. / wake_up_interruptible(&sfp->read_wait); kill_fasync(&sfp->async_qp, SIGPOLL, POLL_IN); kref_put(&sfp->f_ref, sg_remove_sfp); } else { INIT_WORK(&srp->ew.work, sg_rq_end_io_usercontext); schedule_work(&srp->ew.work); } Since srp->orphan is* set, we set done to 0 (assuming the userspace app has not set keep_orphan via an SG_SET_KEEP_ORPHAN ioctl), and therefore we end up scheduling sg_rq_end_io_usercontext() to run in a workqueue. - In workqueue context we go through sg_rq_end_io_usercontext() -> sg_finish_rem_req() -> blk_rq_unmap_user() -> ... -> bio_uncopy_user() -> __bio_copy_iov() -> copy_to_user(). The key point here is that we are doing copy_to_user() on a workqueue -- that is, we're on a kernel thread with current->mm equal to whatever random previous user process was scheduled before this kernel thread. So we end up copying whatever data the SCSI command returned to the virtual address of the buffer passed into the original ioctl, but it's quite likely we do this copying into a different address space! As suggested by James Bottomley <James.Bottomley@hansenpartnership.com>, add a check for current->mm (which is NULL if we're on a kernel thread without a real userspace address space) in bio_uncopy_user(), and skip the copy if we're on a kernel thread. There's no reason that I can think of for any caller of bio_uncopy_user() to want to do copying on a kernel thread with a random active userspace address space. Huge thanks to Costa Sapuntzakis <costa@purestorage.com> for the original pointer to this bug in the sg code. Signed-off-by: Roland Dreier <roland@purestorage.com> Tested-by: David Milburn <dmilburn@redhat.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: <stable@vger.kernel.org> Signed-off-by: James Bottomley <JBottomley@Parallels.com>	2013-08-21 10:58:35 -07:00
Linus Torvalds	fd3930f70c	proc: more readdir conversion bug-fixes In the previous commit, Richard Genoud fixed proc_root_readdir(), which had lost the check for whether all of the non-process /proc entries had been returned or not. But that in turn exposed _another_ bug, namely that the original readdir conversion patch had yet another problem: it had lost the return value of proc_readdir_de(), so now checking whether it had completed successfully or not didn't actually work right anyway. This reinstates the non-zero return for the "end of base entries" that had also gotten lost in commit `f0c3b5093a` ("[readdir] convert procfs"). So now you get all the base entries and you get all the process entries, regardless of getdents buffer size. (Side note: the Linux "getdents" manual page actually has a nice example application for testing getdents, which can be easily modified to use different buffers. Who knew? Man-pages can be useful) Reported-by: Emmanuel Benisty <benisty.e@gmail.com> Reported-by: Marc Dionne <marc.c.dionne@gmail.com> Cc: Richard Genoud <richard.genoud@gmail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-08-19 16:26:12 -07:00
Richard Genoud	94fc5d9de5	proc: return on proc_readdir error Commit `f0c3b5093a` ("[readdir] convert procfs") introduced a bug on the listing of the proc file-system. The return value of proc_readdir() isn't tested anymore in the proc_root_readdir function. This lead to an "interesting" behaviour when we are using the getdents() system call with a buffer too small: instead of failing, it returns the first entries of /proc (enough to fill the given buffer), plus the PID directories. This is not triggered on glibc (as getdents is called with a 32KB buffer), but on uclibc, the buffer size is only 1KB, thus some proc entries are missing. See https://lkml.org/lkml/2013/8/12/288 for more background. Signed-off-by: Richard Genoud <richard.genoud@gmail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-08-19 09:47:27 -07:00
Linus Torvalds	d6a5e06cd1	Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-fixes Pull gfs2 fixes from Steven Whitehouse: "Out of these five patches, the one for ensuring that the number of revokes is not exceeded, and the one for checking the glock is not already held in gfs2_getxattr are the two most important. The latter can be triggered by selinux. The other three patches are very small and fix mostly fairly trivial issues" * git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-fixes: GFS2: Check for glock already held in gfs2_getxattr GFS2: alloc_workqueue() doesn't return an ERR_PTR GFS2: don't overrun reserved revokes GFS2: WQ_NON_REENTRANT is meaningless and going away GFS2: Fix typo in gfs2_create_inode()	2013-08-19 09:30:12 -07:00
Steven Whitehouse	7bd9ee58a4	GFS2: Check for glock already held in gfs2_getxattr Since the introduction of atomic_open, gfs2_getxattr can be called with the glock already held, so we need to allow for this. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Reported-by: David Teigland <teigland@redhat.com> Tested-by: David Teigland <teigland@redhat.com>	2013-08-19 09:33:57 +01:00
Dan Carpenter	dfc4616dde	GFS2: alloc_workqueue() doesn't return an ERR_PTR alloc_workqueue() returns a NULL on error, it doesn't return an ERR_PTR. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2013-08-19 09:33:43 +01:00
Benjamin Marzinski	1bc333f4cf	GFS2: don't overrun reserved revokes When run during fsync, a gfs2_log_flush could happen between the time when gfs2_ail_flush checked the number of blocks to revoke, and when it actually started the transaction to do those revokes. This occassionally caused it to need more revokes than it reserved, causing gfs2 to crash. Instead of just reserving enough revokes to handle the blocks that currently need them, this patch makes gfs2_ail_flush reserve the maximum number of revokes it can, without increasing the total number of reserved log blocks. This patch also passes the number of reserved revokes to __gfs2_ail_flush() so that it doesn't go over its limit and cause a crash like we're seeing. Non-fsync calls to __gfs2_ail_flush will still cause a BUG() necessary revokes are skipped. Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2013-08-19 09:33:16 +01:00
Tejun Heo	d08fa65a81	GFS2: WQ_NON_REENTRANT is meaningless and going away `dbf2576e37` ("workqueue: make all workqueues non-reentrant") made WQ_NON_REENTRANT no-op and the flag is going away. Remove its usages. This patch doesn't introduce any behavior changes. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: cluster-devel@redhat.com	2013-08-19 09:33:01 +01:00
Steven Whitehouse	2523d47a79	GFS2: Fix typo in gfs2_create_inode() PTR_RET should be PTR_ERR Reported-by: Sachin Kamat <sachin.kamat@linaro.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2013-08-19 09:32:29 +01:00
Linus Torvalds	a08797e853	Two jbd2 bug fixes, one of which is a regression fix. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) iQIcBAABCAAGBQJSD2uJAAoJENNvdpvBGATwPwEP/R1Yu0UReTDVSmRwn8MGbk5O M7QwPgUVnEdqr86v4qx7kqOmKtwSQKfSr7+07UGwEBVasPPI1ezXT5c2VAwz146z 5EClKIJoZoKIdWwM0NvUYxb7p5HgTLubho5RTsHmjOGIy37ju+5rFKZ44aHt9siY J3w5v1EruHdDXwjyiZcIeHFUBGV34x9zPtIGy3yY06gEPAUMeEXShiH/Uj3EroKH 50PkwECBvqFuO4HpunhSYfNc3csTir2ZAx4judaV0s5ebyxzyZ398ql581q73mvP m5KREYYVLQnN9VzCnsjLsnyZboh6XoTxmlmblNVOr0LihyKLS/Nof6pvrWEqgnBf 6NpR6Y8cPVUq+rhhXwFx/RIkVh2EIME0T5dZplr7fVaPxzI1ZKud8p18WifxMAix y3rbdZiEA//OuH5u5XqV0/zv3aJUP7XjdT8dia5iq3FazFOdyT1VfVXzAfaXyqZk JlFguSrOJbQe+vgk2gFLWLH8L4wsnGlSUv1AeDEboriushmZqYlAn10y6ENnM5yW IMFNA6Wyz35MghsKMKV57H7B05VS8em6NGtHZVhTBt7JoiFn7PBvic/KcK/czpDy W2Y8K8azhuscmQfa/RjuVDtwakbNwTu341e7hIUOy41qL/x9nNr07ZRAeF68Nesd p4uRmJxLSW9PReYp9n4V =OroQ -----END PGP SIGNATURE----- Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull jbd2 bug fixes from Ted Ts'o: "Two jbd2 bug fixes, one of which is a regression fix" * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: jbd2: Fix oops in jbd2_journal_file_inode() jbd2: Fix use after free after error in jbd2_journal_dirty_metadata()	2013-08-17 10:43:19 -07:00
Jan Kara	a361293f5f	jbd2: Fix oops in jbd2_journal_file_inode() Commit `0713ed0cde` added jbd2_journal_file_inode() call into ext4_block_zero_page_range(). However that function gets called from truncate path and thus inode needn't have jinode attached - that happens in ext4_file_open() but the file needn't be ever open since mount. Calling jbd2_journal_file_inode() without jinode attached results in the oops. We fix the problem by attaching jinode to inode also in ext4_truncate() and ext4_punch_hole() when we are going to zero out partial blocks. Reported-by: majianpeng <majianpeng@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2013-08-16 21:19:41 -04:00
Linus Torvalds	2b047252d0	Fix TLB gather virtual address range invalidation corner cases Ben Tebulin reported: "Since v3.7.2 on two independent machines a very specific Git repository fails in 9/10 cases on git-fsck due to an SHA1/memory failures. This only occurs on a very specific repository and can be reproduced stably on two independent laptops. Git mailing list ran out of ideas and for me this looks like some very exotic kernel issue" and bisected the failure to the backport of commit `53a59fc67f` ("mm: limit mmu_gather batching to fix soft lockups on !CONFIG_PREEMPT"). That commit itself is not actually buggy, but what it does is to make it much more likely to hit the partial TLB invalidation case, since it introduces a new case in tlb_next_batch() that previously only ever happened when running out of memory. The real bug is that the TLB gather virtual memory range setup is subtly buggered. It was introduced in commit `597e1c3580` ("mm/mmu_gather: enable tlb flush range in generic mmu_gather"), and the range handling was already fixed at least once in commit `e6c495a96c` ("mm: fix the TLB range flushed when __tlb_remove_page() runs out of slots"), but that fix was not complete. The problem with the TLB gather virtual address range is that it isn't set up by the initial tlb_gather_mmu() initialization (which didn't get the TLB range information), but it is set up ad-hoc later by the functions that actually flush the TLB. And so any such case that forgot to update the TLB range entries would potentially miss TLB invalidates. Rather than try to figure out exactly which particular ad-hoc range setup was missing (I personally suspect it's the hugetlb case in zap_huge_pmd(), which didn't have the same logic as zap_pte_range() did), this patch just gets rid of the problem at the source: make the TLB range information available to tlb_gather_mmu(), and initialize it when initializing all the other tlb gather fields. This makes the patch larger, but conceptually much simpler. And the end result is much more understandable; even if you want to play games with partial ranges when invalidating the TLB contents in chunks, now the range information is always there, and anybody who doesn't want to bother with it won't introduce subtle bugs. Ben verified that this fixes his problem. Reported-bisected-and-tested-by: Ben Tebulin <tebulin@googlemail.com> Build-testing-by: Stephen Rothwell <sfr@canb.auug.org.au> Build-testing-by: Richard Weinberger <richard.weinberger@gmail.com> Reviewed-by: Michal Hocko <mhocko@suse.cz> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-08-16 08:52:46 -07:00
Dave Kleikamp	44512449c0	jfs: fix readdir cookie incompatibility with NFSv4 NFSv4 reserves readdir cookie values 0-2 for special entries (. and ..), but jfs allows a value of 2 for a non-special entry. This incompatibility can result in the nfs client reporting a readdir loop. This patch doesn't change the value stored internally, but adds one to the value exposed to the iterate method. Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com> Tested-by: Christian Kujau <lists@nerdbynature.de>	2013-08-15 17:22:29 -05:00
yonghua zheng	8c8296223f	fs/proc/task_mmu.c: fix buffer overflow in add_page_map() Recently we met quite a lot of random kernel panic issues after enabling CONFIG_PROC_PAGE_MONITOR. After debuggind we found this has something to do with following bug in pagemap: In struct pagemapread: struct pagemapread { int pos, len; pagemap_entry_t *buffer; bool v2; }; pos is number of PM_ENTRY_BYTES in buffer, but len is the size of buffer, it is a mistake to compare pos and len in add_page_map() for checking buffer is full or not, and this can lead to buffer overflow and random kernel panic issue. Correct len to be total number of PM_ENTRY_BYTES in buffer. [akpm@linux-foundation.org: document pagemapread.pos and .len units, fix PM_ENTRY_BYTES definition] Signed-off-by: Yonghua Zheng <younghua.zheng@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-08-13 17:57:50 -07:00
Jeff Liu	d6394b5900	ocfs2: fix null pointer dereference in ocfs2_dir_foreach_blk_id() Fix a NULL pointer deference while removing an empty directory, which was introduced by commit `3704412bdb` ("[readdir] convert ocfs2"). BUG: unable to handle kernel NULL pointer dereference at (null) IP: [<(null)>] (null) PGD 6da85067 PUD 6da89067 PMD 0 Oops: 0010 [#1] SMP CPU: 0 PID: 6564 Comm: rmdir Tainted: G O 3.11.0-rc1 #4 RIP: 0010:[<0000000000000000>] [< (null)>] (null) Call Trace: ocfs2_dir_foreach+0x49/0x50 [ocfs2] ocfs2_empty_dir+0x12c/0x3e0 [ocfs2] ocfs2_unlink+0x56e/0xc10 [ocfs2] vfs_rmdir+0xd5/0x140 do_rmdir+0x1cb/0x1e0 SyS_rmdir+0x16/0x20 system_call_fastpath+0x16/0x1b Code: Bad RIP value. RIP [< (null)>] (null) RSP <ffff88006daddc10> CR2: 0000000000000000 [dan.carpenter@oracle.com: fix pointer math] Signed-off-by: Jie Liu <jeff.liu@oracle.com> Reported-by: David Weber <wb@munzinger.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-08-13 17:57:49 -07:00
Tiger Yang	c7dd3392ad	ocfs2: fix NULL pointer dereference in ocfs2_duplicate_clusters_by_page Since ocfs2_cow_file_pos will invoke ocfs2_refcount_icow with a NULL as the struct file pointer, it finally result in a null pointer dereference in ocfs2_duplicate_clusters_by_page. This patch replace file pointer with inode pointer in cow_duplicate_clusters to fix this issue. [jeff.liu@oracle.com: rebased patch against linux-next tree] Signed-off-by: Tiger Yang <tiger.yang@oracle.com> Signed-off-by: Jie Liu <jeff.liu@oracle.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Acked-by: Tao Ma <tm@tao.ma> Tested-by: David Weber <wb@munzinger.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-08-13 17:57:49 -07:00
Jie Liu	6115ea2884	ocfs2: Revert `40bd62e` to avoid regression in extended allocation Revert commit `40bd62eb7f` ("fs/ocfs2/journal.h: add bits_wanted while calculating credits in ocfs2_calc_extend_credits"). Unfortunately this change broke fallocate even if there is insufficient disk space for the preallocation, which is a serious problem. # df -h /dev/sda8 22G 1.2G 21G 6% /ocfs2 # fallocate -o 0 -l 200M /ocfs2/testfile fallocate: /ocfs2/test: fallocate failed: No space left on device and a kernel warning: CPU: 3 PID: 3656 Comm: fallocate Tainted: G W O 3.11.0-rc3 #2 Call Trace: dump_stack+0x77/0x9e warn_slowpath_common+0xc4/0x110 warn_slowpath_null+0x2a/0x40 start_this_handle+0x6c/0x640 [jbd2] jbd2__journal_start+0x138/0x300 [jbd2] jbd2_journal_start+0x23/0x30 [jbd2] ocfs2_start_trans+0x166/0x300 [ocfs2] __ocfs2_extend_allocation+0x38f/0xdb0 [ocfs2] ocfs2_allocate_unwritten_extents+0x3c9/0x520 __ocfs2_change_file_space+0x5e0/0xa60 [ocfs2] ocfs2_fallocate+0xb1/0xe0 [ocfs2] do_fallocate+0x1cb/0x220 SyS_fallocate+0x6f/0xb0 system_call_fastpath+0x16/0x1b JBD2: fallocate wants too many credits (51216 > 4381) Signed-off-by: Jie Liu <jeff.liu@oracle.com> Cc: Goldwyn Rodrigues <rgoldwyn@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-08-13 17:57:49 -07:00
Michal Hocko	b610ded719	hugetlb: fix lockdep splat caused by pmd sharing Dave has reported the following lockdep splat: ================================= [ INFO: inconsistent lock state ] 3.11.0-rc1+ #9 Not tainted --------------------------------- inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage. kswapd0/49 [HC0[0]:SC0[0]:HE1:SE1] takes: (&mapping->i_mmap_mutex){+.+.?.}, at: [<c114971b>] page_referenced+0x87/0x5e3 {RECLAIM_FS-ON-W} state was registered at: mark_held_locks+0x81/0xe7 lockdep_trace_alloc+0x5e/0xbc __alloc_pages_nodemask+0x8b/0x9b6 __get_free_pages+0x20/0x31 get_zeroed_page+0x12/0x14 __pmd_alloc+0x1c/0x6b huge_pmd_share+0x265/0x283 huge_pte_alloc+0x5d/0x71 hugetlb_fault+0x7c/0x64a handle_mm_fault+0x255/0x299 __do_page_fault+0x142/0x55c do_page_fault+0xd/0x16 error_code+0x6c/0x74 irq event stamp: 3136917 hardirqs last enabled at (3136917): _raw_spin_unlock_irq+0x27/0x50 hardirqs last disabled at (3136916): _raw_spin_lock_irq+0x15/0x78 softirqs last enabled at (3136180): __do_softirq+0x137/0x30f softirqs last disabled at (3136175): irq_exit+0xa8/0xaa other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&mapping->i_mmap_mutex); <Interrupt> lock(&mapping->i_mmap_mutex); * DEADLOCK * no locks held by kswapd0/49. stack backtrace: CPU: 1 PID: 49 Comm: kswapd0 Not tainted 3.11.0-rc1+ #9 Hardware name: Dell Inc. Precision WorkStation 490 /0DT031, BIOS A08 04/25/2008 Call Trace: dump_stack+0x4b/0x79 print_usage_bug+0x1d9/0x1e3 mark_lock+0x1e0/0x261 __lock_acquire+0x623/0x17f2 lock_acquire+0x7d/0x195 mutex_lock_nested+0x6c/0x3a7 page_referenced+0x87/0x5e3 shrink_page_list+0x3d9/0x947 shrink_inactive_list+0x155/0x4cb shrink_lruvec+0x300/0x5ce shrink_zone+0x53/0x14e kswapd+0x517/0xa75 kthread+0xa8/0xaa ret_from_kernel_thread+0x1b/0x28 which is a false positive caused by hugetlb pmd sharing code which allocates a new pmd from withing mapping->i_mmap_mutex. If this allocation causes reclaim then the lockdep detector complains that we might self-deadlock. This is not correct though, because hugetlb pages are not reclaimable so their mapping will be never touched from the reclaim path. The patch tells lockup detector that hugetlb i_mmap_mutex is special by assigning it a separate lockdep class so it won't report possible deadlocks on unrelated mappings. [peterz@infradead.org: comment for annotation] Reported-by: Dave Jones <davej@redhat.com> Signed-off-by: Michal Hocko <mhocko@suse.cz> Cc: Peter Zijlstra <peterz@infradead.org> Reviewed-by: Minchan Kim <minchan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-08-13 17:57:48 -07:00
Cyrill Gorcunov	41bb3476b3	mm: save soft-dirty bits on file pages Andy reported that if file page get reclaimed we lose the soft-dirty bit if it was there, so save _PAGE_BIT_SOFT_DIRTY bit when page address get encoded into pte entry. Thus when #pf happens on such non-present pte we can restore it back. Reported-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Acked-by: Pavel Emelyanov <xemul@parallels.com> Cc: Matt Mackall <mpm@selenic.com> Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Peter Zijlstra <peterz@infradead.org> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-08-13 17:57:48 -07:00
Cyrill Gorcunov	179ef71cbc	mm: save soft-dirty bits on swapped pages Andy Lutomirski reported that if a page with _PAGE_SOFT_DIRTY bit set get swapped out, the bit is getting lost and no longer available when pte read back. To resolve this we introduce _PTE_SWP_SOFT_DIRTY bit which is saved in pte entry for the page being swapped out. When such page is to be read back from a swap cache we check for bit presence and if it's there we clear it and restore the former _PAGE_SOFT_DIRTY bit back. One of the problem was to find a place in pte entry where we can save the _PTE_SWP_SOFT_DIRTY bit while page is in swap. The _PAGE_PSE was chosen for that, it doesn't intersect with swap entry format stored in pte. Reported-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Acked-by: Pavel Emelyanov <xemul@parallels.com> Cc: Matt Mackall <mpm@selenic.com> Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Peter Zijlstra <peterz@infradead.org> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Reviewed-by: Minchan Kim <minchan@kernel.org> Reviewed-by: Wanpeng Li <liwanp@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-08-13 17:57:47 -07:00
Linus Torvalds	584d88b2cd	Merge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6 Pull CIFS fixes from Steve French: "A set of small cifs fixes, including 3 relating to symlink handling" * 'for-next' of git://git.samba.org/sfrench/cifs-2.6: cifs: don't instantiate new dentries in readdir for inodes that need to be revalidated immediately cifs: set sb->s_d_op before calling d_make_root() cifs: fix bad error handling in crypto code cifs: file: initialize oparms.reconnect before using it Do not attempt to do cifs operations reading symlinks with SMB2 cifs: extend the buffer length enought for sprintf() using	2013-08-12 15:02:53 -07:00
Linus Torvalds	fd4f35d0fa	A number of miscellaneous ext4 bugs fixes for v3.11, including a fix so that if ext4 is built as a module, to allow it to be unloaded. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) iQIcBAABCAAGBQJSCOS0AAoJENNvdpvBGATwOYkQALgOTX2ma7hgCr91ldg3Q69e bejRm2Prdjs3HXa3MwC5yZ0i2nDXxTjMXidnVcD2QgLwS6rPGQQfrGhQ1OtJqLxA Gzf6mHIWIaRG9/JtTkzp3ucONbhc13LnaJd9B2bHASlq4+5aJhlziblwapMSvVbI YixhejVREmXdZ/typvnQOsoPfXzH9NLOTY/asT9IcZK6flhczNIlMC0/zRZQw0+I pDK/akmdOLpkd57UvJcwdcGIgcioSrg+JXk++EsDe82v2bpxleG2IdLbVZ76whIS A3FHEJvUfiFPj+GhKMQJyI1sa1ZGmD7nH8n2Yf6TVqZ4jgFx4ox9d83c0PpSLn0+ VTZD601lfAoThUwKRTk45FUI4siZ1zBP13cbTH1DJXvGVllehhC9zBCezlNma4vr cI+K3d8VBHV7bKSRSgRVcB2pY+bJE1qP5XJXGnfDhjEiecNgXFRm0i2UtYSEl6R7 aQGAdfjxp6tf+gme6c13zFK43FEJ9/DMaC8l0eamiQyH5U4OhOD8Q4ExnHjrfN69 V3yZc4AZq46J9+2H/Jc457NYiFH2YK74jpAcpsVV0fXKFDR8Ss+rKXmfB+TGS45f Q1eHbKLuA3fHzHfwJCozqnUBxQO41M4vEd/ktqenwsKzeOw0qH3qynxEUd6xgka5 1pcnJZnSGbm1d8pwYLMc =NuDG -----END PGP SIGNATURE----- Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull more ext4 bugfixes from Ted Ts'o: "A number of miscellaneous ext4 bugs fixes for v3.11, including a fix so that if ext4 is built as a module, to allow it to be unloaded" * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: flush the extent status cache during EXT4_IOC_SWAP_BOOT ext4: fix mount/remount error messages for incompatible mount options ext4: allow the mount options nodelalloc and data=journal	2013-08-12 15:00:40 -07:00
Jan Kara	91aa11fae1	jbd2: Fix use after free after error in jbd2_journal_dirty_metadata() When jbd2_journal_dirty_metadata() returns error, __ext4_handle_dirty_metadata() stops the handle. However callers of this function do not count with that fact and still happily used now freed handle. This use after free can result in various issues but very likely we oops soon. The motivation of adding __ext4_journal_stop() into __ext4_handle_dirty_metadata() in commit `9ea7a0df` seems to be only to improve error reporting. So replace __ext4_journal_stop() with ext4_journal_abort_handle() which was there before that commit and add WARN_ON_ONCE() to dump stack to provide useful information. Reported-by: Sage Weil <sage@inktank.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org # 3.2+	2013-08-12 09:53:28 -04:00
Theodore Ts'o	cde2d7a796	ext4: flush the extent status cache during EXT4_IOC_SWAP_BOOT Previously we weren't swapping only some of the extent_status LRU fields during the processing of the EXT4_IOC_SWAP_BOOT ioctl. The much safer thing to do is to just completely flush the extent status tree when doing the swap. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: Zheng Liu <gnehzuil.liu@gmail.com> Cc: stable@vger.kernel.org	2013-08-12 09:29:30 -04:00
Linus Torvalds	d92581fcad	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs Pull btrfs fixes from Chris Mason: "These are assorted fixes, mostly from Josef nailing down xfstests runs. Zach also has a long standing fix for problems with readdir wrapping f_pos (or ctx->pos) These patches were spread out over different bases, so I rebased things on top of rc4 and retested overnight" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: btrfs: don't loop on large offsets in readdir Btrfs: check to see if root_list is empty before adding it to dead roots Btrfs: release both paths before logging dir/changed extents Btrfs: allow splitting of hole em's when dropping extent cache Btrfs: make sure the backref walker catches all refs to our extent Btrfs: fix backref walking when we hit a compressed extent Btrfs: do not offset physical if we're compressed Btrfs: fix extent buffer leak after backref walking Btrfs: fix a bug of snapshot-aware defrag to make it work on partial extents btrfs: fix file truncation if FALLOC_FL_KEEP_SIZE is specified	2013-08-10 15:21:47 -07:00
Linus Torvalds	b8ea0d06ff	NFS client bugfixes for 3.11 - Stable patch for lockd to fix Oopses due to inappropriate calls to utsname()->nodename - Stable patches for sunrpc to fix Oopses on shutdown when using AF_LOCAL sockets with rpcbind - Fix memory leak and error checking issues in nfs4_proc_lookup_mountpoint - Fix a regression with the sync mount option failing to work for nfs4 mounts - Fix a writeback performance issue when doing cache invalidation - Remove an incorrect call to nfs_setsecurity in nfs_fhget -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.13 (GNU/Linux) iQIcBAABAgAGBQJSBRqqAAoJEGcL54qWCgDyejoP/RNfkebEex2ugAAfymMOUNjA qc8X/YO8KDQD5T0X8P9MFXzgSSMPT5YfemxfKDpkjJlYwij8O9D8r1BAfgmhhKFI bVp9J8pH5Y3jXgLFVsubEc9N9CXLI0yRzcOFfzapYmGJV/ZO4cBAi8HMowFtbREj l2uk9H8XQ1p8KqVWXhFKoXVL2b9q0CGj5z3+RkE/rD5uuZyusbTB6OhNFArPnwXk aCx373mlvpIAhU6DkueCiXaHH06Mff2Vlu6eBkGrdfBGC6l+x1nwevxYH750fqSU 8O94rQkw9++qnIIvBJF/g1NzqyDhychJcXtgGLdxYUdWH3c8tevJZxCEj7U/dIJQ ndEaZGFxSFfdnxrwJBWtB+xsEfe9K4no9JwlkyVi8oZ5j2NUv7cJpA5cdQ4IUf/1 uqTlIxtPHQhHWCUUKpGLlhLiZyvwOPtJvuBl/Pc9UYQbyNtSqjYBk9mamrrIC6FK mF6jXgWe9x+miBqWYrEdPNLGdx/hUhhqGweYPJa6jTcxif+2l2xGfscWYI89Io/e qy8YNcHUrRci+o+YfY4lLhk88WBZogzFOYc4jDaLCEL1TE5B2k/jHr9v39V5S7Ks 63bhmfCTxB82uMzFeUWbiPzWQzt030pXPYy/PUPaucy/+QaZC/0lZ3HJKRKI37JJ ygSD7ndCZJCG+PxLkk49 =W12x -----END PGP SIGNATURE----- Merge tag 'nfs-for-3.11-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs Pull NFS client bugfixes from Trond Myklebust: - Stable patch for lockd to fix Oopses due to inappropriate calls to utsname()->nodename - Stable patches for sunrpc to fix Oopses on shutdown when using AF_LOCAL sockets with rpcbind - Fix memory leak and error checking issues in nfs4_proc_lookup_mountpoint - Fix a regression with the sync mount option failing to work for nfs4 mounts - Fix a writeback performance issue when doing cache invalidation - Remove an incorrect call to nfs_setsecurity in nfs_fhget * tag 'nfs-for-3.11-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: NFSv4: Fix up nfs4_proc_lookup_mountpoint NFS: Remove unnecessary call to nfs_setsecurity in nfs_fhget() NFSv4: Fix the sync mount option for nfs4 mounts NFS: Fix writeback performance issue on cache invalidation SUNRPC: If the rpcbind channel is disconnected, fail the call to unregister SUNRPC: Don't auto-disconnect from the local rpcbind socket LOCKD: Don't call utsname()->nodename from nlmclnt_setlockargs	2013-08-10 15:20:37 -07:00
Linus Torvalds	022e5d098b	Merge branch 'for-3.11' of git://linux-nfs.org/~bfields/linux Pull nfsd fixes from Bruce Fields: "Some fixes for a 4.1 feature that in retrospect probably should have waited for 3.12.... But it appears to be working now" * 'for-3.11' of git://linux-nfs.org/~bfields/linux: nfsd: Fix SP4_MACH_CRED negotiation in EXCHANGE_ID nfsd4: Fix MACH_CRED NULL dereference	2013-08-10 15:19:58 -07:00
Zach Brown	db62efbbf8	btrfs: don't loop on large offsets in readdir When btrfs readdir() hits the last entry it sets the readdir offset to a huge value to stop buggy apps from breaking when the same name is returned by readdir() with concurrent rename()s. But unconditionally setting the offset to INT_MAX causes readdir() to loop returning any entries with offsets past INT_MAX. It only takes a few hours of constant file creation and removal to create entries past INT_MAX. So let's set the huge offset to LLONG_MAX if the last entry has already overflowed 32bit loff_t. Without large offsets behaviour is identical. With large offsets 64bit apps will work and 32bit apps will be no more broken than they currently are if they see large offsets. Signed-off-by: Zach Brown <zab@redhat.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>	2013-08-09 19:34:56 -04:00
Josef Bacik	cfad392b22	Btrfs: check to see if root_list is empty before adding it to dead roots A user reported a panic when running with autodefrag and deleting snapshots. This is because we could end up trying to add the root to the dead roots list twice. To fix this check to see if we are empty before adding ourselves to the dead roots list. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>	2013-08-09 19:30:23 -04:00
Josef Bacik	f3b15ccdbb	Btrfs: release both paths before logging dir/changed extents The ceph guys tripped over this bug where we were still holding onto the original path that we used to copy the inode with when logging. This is based on Chris's fix which was reported to fix the problem. We need to drop the paths in two cases anyway so just move the drop up so that we don't have duplicate code. Thanks, Cc: stable@vger.kernel.org Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>	2013-08-09 19:30:16 -04:00
Josef Bacik	ee20a98314	Btrfs: allow splitting of hole em's when dropping extent cache I noticed while running multi-threaded fsync tests that sometimes fsck would complain about an improper gap. This happens because we fail to add a hole extent to the file, which was happening when we'd split a hole EM because btrfs_drop_extent_cache was just discarding the whole em instead of splitting it. So this patch fixes this by allowing us to split a hole em properly, which means that added holes actually get logged properly and we no longer see this fsck error. Thankfully we're tolerant of these sort of problems so a user would not see any adverse effects of this bug, other than fsck complaining. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>	2013-08-09 19:30:09 -04:00
Josef Bacik	ed8c4913da	Btrfs: make sure the backref walker catches all refs to our extent Because we don't mess with the offset into the extent for compressed we will properly find both extents for this case [extent a][extent b][rest of extent a] but because we already added a ref for the front half we won't add the inode information for the second half. This causes us to leak that memory and not print out the other offset when we do logical-resolve. So fix this by calling ulist_add_merge and then add our eie to the existing entry if there is one. With this patch we get both offsets out of logical-resolve. With this and the other 2 patches I've sent we now pass btrfs/276 on my vm with compress-force=lzo set. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>	2013-08-09 19:30:03 -04:00
Josef Bacik	8ca15e05e6	Btrfs: fix backref walking when we hit a compressed extent If you do btrfs inspect-internal logical-resolve on a compressed extent that has been partly overwritten it won't find anything. This is because we try and match the extent offset we've searched for based on the extent offset in the data extent entry. However this doesn't work for compressed extents because the offsets are for the uncompressed size, not the compressed size. So instead only do this check if we are not compressed, that way we can get an actual entry for the physical offset rather than nothing for compressed. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>	2013-08-09 19:29:56 -04:00
Josef Bacik	b76bb70136	Btrfs: do not offset physical if we're compressed xfstest btrfs/276 was freaking out on slower boxes partly because fiemap was offsetting the physical based on the extent offset. This is perfectly fine with uncompressed extents, however the extent offset is into the uncompressed area, not the compressed. So we can return a physical value that isn't at all within the area we have allocated on disk. Fix this by returning the start of the extent if it is compressed no matter what the offset. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>	2013-08-09 19:29:50 -04:00
Liu Bo	b5b9b5b318	Btrfs: fix extent buffer leak after backref walking commit 47fb091fb787420cd195e66f162737401cce023f(Btrfs: fix unlock after free on rewinded tree blocks) takes an extra increment on the reference of allocated dummy extent buffer, so now we cannot free this dummy one, and end up with extent buffer leak. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Reviewed-by: Jan Schmidt <list.btrfs@jan-o-sch.net> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>	2013-08-09 19:29:42 -04:00
Liu Bo	e68afa49ae	Btrfs: fix a bug of snapshot-aware defrag to make it work on partial extents For partial extents, snapshot-aware defrag does not work as expected, since a) we use the wrong logical offset to search for parents, which should be disk_bytenr + extent_offset, not just disk_bytenr, b) 'offset' returned by the backref walking just refers to key.offset, not the 'offset' stored in btrfs_extent_data_ref which is (key.offset - extent_offset). The reproducer: $ mkfs.btrfs sda $ mount sda /mnt $ btrfs sub create /mnt/sub $ for i in `seq 5 -1 1`; do dd if=/dev/zero of=/mnt/sub/foo bs=5k count=1 seek=$i conv=notrunc oflag=sync; done $ btrfs sub snap /mnt/sub /mnt/snap1 $ btrfs sub snap /mnt/sub /mnt/snap2 $ sync; btrfs filesystem defrag /mnt/sub/foo; $ umount /mnt $ btrfs-debug-tree sda (Here we can check whether the defrag operation is snapshot-awared. This addresses the above two problems. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>	2013-08-09 19:29:17 -04:00

1 2 3 4 5 ...

32713 Commits