linux

Author	SHA1	Message	Date
Namhyung Kim	13ce34df11	perf tools: Use tid for finding thread I believe that passing pid (instead of tid) as the 3rd arg of the machine__find*_thread() was to find a main thread so that it can search proper map group for symbols. However with the map sharing patch applied, it now can do it in any thread. It fixes a bug when each thread has different name, it only reports a main thread for samples in other threads. Cc: Adrian Hunter <adrian.hunter@intel.com> Acked-by: David Ahern <dsahern@gmail.com> Acked-by: Stephane Eranian <eranian@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: http://lkml.kernel.org/r/1399856202-26221-1-git-send-email-namhyung@kernel.org Signed-off-by: Jiri Olsa <jolsa@kernel.org>	2014-05-12 11:09:50 +02:00
Namhyung Kim	bac1e4d103	perf tools: Get rid of on_exit() feature test The on_exit() function was only used in perf record but it's gone in previous patch. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Stephane Eranian <eranian@google.com> Cc: Bernhard Rosenkraenzer <Bernhard.Rosenkranzer@linaro.org> Cc: Irina Tirdea <irina.tirdea@intel.com> Link: http://lkml.kernel.org/r/1399855645-25815-2-git-send-email-namhyung@kernel.org Signed-off-by: Jiri Olsa <jolsa@kernel.org>	2014-05-12 11:09:50 +02:00
Namhyung Kim	4560471053	perf record: Propagate exit status of a command line workload Currently perf record doesn't propagate the exit status of a workload given by the command line. But sometimes it'd useful if it's propagated so that a monitoring script can handle errors appropriately. To do that, it moves most of logic out of the exit handlers and run them directly in the __cmd_record(). The only thing needs to be done in the handler is propagating terminating signal so that the shell can terminate its loop properly when Ctrl-C was pressed. Also it cleaned up the resource management code in record__exit(). With this change, perf record returns the child exit status in case of normal termination and send signal to itself when terminated by signal. Example run of Stephane's case: $ perf record true && echo yes \|\| echo no [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.013 MB perf.data (~589 samples) ] yes $ perf record false && echo yes \|\| echo no [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.013 MB perf.data (~589 samples) ] no Jiri's case (error in parent): $ perf record -m 10G true && echo yes \|\| echo no rounding mmap pages size to 17179869184 bytes (4194304 pages) failed to mmap with 12 (Cannot allocate memory) no $ ulimit -n 6 $ perf record sleep 1 && echo yes \|\| echo no failed to create 'go' pipe: Too many open files Couldn't run the workload! no And Peter's case (interrupted by signal): $ while :; do perf record sleep 1; done ^C[ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.014 MB perf.data (~593 samples) ] Reported-by: Stephane Eranian <eranian@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Stephane Eranian <eranian@google.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Ingo Molnar <mingo@kernel.org> Link: http://lkml.kernel.org/r/1399855645-25815-1-git-send-email-namhyung@kernel.org Signed-off-by: Jiri Olsa <jolsa@kernel.org>	2014-05-12 11:09:49 +02:00
Dongsheng	6bcab4e1ea	perf tools: Clarify the output of perf sched map. In output of perf sched map, any shortname of thread will be explained at the first time when it appear. Example: A0 228836.978985 secs A0 => perf:23032 . A0 228836.979016 secs B0 => swapper:0 . C0 228836.979099 secs C0 => migration/3:22 A0 . C0 228836.979115 secs A0 . . 228836.979115 secs But B0, which is explained as swapper:0 did not appear in the left part of output. Instead, we use '.' as the shortname of swapper:0. So the comment of "B0 => swapper:0" is not easy to understand. This patch clarify the output of perf sched map with not allocating one letter-number shortname for swapper:0 and print ". => swapper:0" as the explanation for swapper:0. Example: A0 228836.978985 secs A0 => perf:23032 * . A0 228836.979016 secs . => swapper:0 . B0 228836.979099 secs B0 => migration/3:22 A0 . B0 228836.979115 secs A0 . * . 228836.979115 secs A0 C0 . 228836.979225 secs C0 => ksoftirqd/2:18 A0 D0 . 228836.979236 secs D0 => rcu_sched:7 Signed-off-by: Dongsheng <yangds.fnst@cn.fujitsu.com> Acked-by: Ingo Molnar <mingo@kernel.org> Link: http://lkml.kernel.org/r/1399354741-19522-1-git-send-email-yangds.fnst@cn.fujitsu.com [ small style fixes to make checkpatch happy ] Signed-off-by: Jiri Olsa <jolsa@kernel.org>	2014-05-12 11:09:05 +02:00
Dongsheng	e936e8e459	perf tools: Adapt the TASK_STATE_TO_CHAR_STR to new value in kernel space. Currently, TASK_STATE_TO_CHAR_STR in kernel space is already expanded to RSDTtZXxKWP, but it is still RSDTtZX in perf sched tool. This patch update TASK_STATE_TO_CHAR_STR to the new value in kernel space. Signed-off-by: Dongsheng <yangds.fnst@cn.fujitsu.com> Link: http://lkml.kernel.org/r/6d2f55dc1e02c1e29a5d70bfeb9d6e8863caf2aa.1399273302.git.yangds.fnst@cn.fujitsu.com Signed-off-by: Jiri Olsa <jolsa@kernel.org>	2014-05-12 10:01:49 +02:00
Dongsheng	7fff959783	perf tools: Add missing event for perf sched record. We should record and process sched:sched_wakeup_new event in perf sched tool, but currently, there is the process function for it, without recording it in record subcommand. This patch add -e sched:sched_wakeup_new to perf sched record. Signed-off-by: Dongsheng <yangds.fnst@cn.fujitsu.com> Link: http://lkml.kernel.org/r/710c6edd2162b2cea1711443f54de47c0210d9fd.1399273302.git.yangds.fnst@cn.fujitsu.com Signed-off-by: Jiri Olsa <jolsa@kernel.org>	2014-05-12 10:01:41 +02:00
Peter Zijlstra	3a497f4863	perf: Simplify perf_event_exit_task_context() Instead of jumping through hoops to make sure to find (and exit) each event, do it the simple straight fwd way. Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-tij931199thfkys8vbnokdpf@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-05-07 13:44:19 +02:00
Peter Zijlstra	683ede43dd	perf: Rework free paths Primarily make perf_event_release_kernel() into put_event(), this will allow kernel space to create per-task inherited events, and is safer in general. Also, document the free_event() assumptions. Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-rk9pvr6e1d0559lxstltbztc@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-05-07 13:44:19 +02:00
Peter Zijlstra	63342411ef	perf: Validate locking assumption Document and validate the locking assumption of event_sched_in(). Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-sybq1publ9xt5no77cwvi0eo@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-05-07 13:44:18 +02:00
Peter Zijlstra	15a2d4de0e	perf: Always destroy groups on exit Commit `38b435b16c` ("perf: Fix tear-down of inherited group events") states that we need to destroy groups for inherited events, but it doesn't make any sense to not also destroy groups for normal events. And while it usually makes no difference (the normal events won't leak, and its very likely all the group events will die in quick succession) it does make the code more consistent and closes a potential hole for trouble. Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-426egt8zmsm12d2q8k2xz4tt@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-05-07 13:44:18 +02:00
Peter Zijlstra	1f4ee5038f	perf: Ensure consistent inherit state in groups Make sure all events in a group have the same inherit state. It was possible for group leaders to have inherit set while sibling events would not have inherit set. In this case we'd still inherit the siblings, leading to some non-fatal weirdness. Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-r32tt8yldvic3jlcghd3g35u@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-05-07 13:44:17 +02:00
Ingo Molnar	37b16beaa9	Merge branch 'perf/urgent' into perf/core, to avoid conflicts Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-05-07 13:39:22 +02:00
Yan, Zheng	a4b4f11b27	perf/x86/intel: Fix Silvermont's event constraints Event 0x013c is not the same as fixed counter2, remove it from Silvermont's event constraints. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1398755081-12471-1-git-send-email-zheng.z.yan@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-05-07 11:33:16 +02:00
Peter Zijlstra	ffb4ef21ac	perf: Fix perf_event_init_context() perf_pin_task_context() can return NULL but perf_event_init_context() assumes it will not, correct this. Reported-by: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Link: http://lkml.kernel.org/r/20140505171428.GU26782@laptop.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-05-07 11:33:15 +02:00
Peter Zijlstra	46ce0fe97a	perf: Fix race in removing an event When removing a (sibling) event we do: raw_spin_lock_irq(&ctx->lock); perf_group_detach(event); raw_spin_unlock_irq(&ctx->lock); <hole> perf_remove_from_context(event); raw_spin_lock_irq(&ctx->lock); ... raw_spin_unlock_irq(&ctx->lock); Now, assuming the event is a sibling, it will be 'unreachable' for things like ctx_sched_out() because that iterates the groups->siblings, and we just unhooked the sibling. So, if during <hole> we get ctx_sched_out(), it will miss the event and not call event_sched_out() on it, leaving it programmed on the PMU. The subsequent perf_remove_from_context() call will find the ctx is inactive and only call list_del_event() to remove the event from all other lists. Hereafter we can proceed to free the event; while still programmed! Close this hole by moving perf_group_detach() inside the same ctx->lock region(s) perf_remove_from_context() has. The condition on inherited events only in __perf_event_exit_task() is likely complete crap because non-inherited events are part of groups too and we're tearing down just the same. But leave that for another patch. Most-likely-Fixes: `e03a9a55b4` ("perf: Change close() semantics for group events") Reported-by: Vince Weaver <vincent.weaver@maine.edu> Tested-by: Vince Weaver <vincent.weaver@maine.edu> Much-staring-at-traces-by: Vince Weaver <vincent.weaver@maine.edu> Much-staring-at-traces-by: Thomas Gleixner <tglx@linutronix.de> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20140505093124.GN17778@laptop.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-05-07 11:33:14 +02:00
Linus Torvalds	38583f095c	Merge branch 'akpm' (incoming from Andrew) Merge misc fixes from Andrew Morton: "13 fixes" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: agp: info leak in agpioc_info_wrap() fs/affs/super.c: bugfix / double free fanotify: fix -EOVERFLOW with large files on 64-bit slub: use sysfs'es release mechanism for kmem_cache revert "mm: vmscan: do not swap anon pages just because free+file is low" autofs: fix lockref lookup mm: filemap: update find_get_pages_tag() to deal with shadow entries mm/compaction: make isolate_freepages start at pageblock boundary MAINTAINERS: zswap/zbud: change maintainer email address mm/page-writeback.c: fix divide by zero in pos_ratio_polynom hugetlb: ensure hugepage access is denied if hugepages are not supported slub: fix memcg_propagate_slab_attrs drivers/rtc/rtc-pcf8523.c: fix month definition	2014-05-06 13:07:41 -07:00
Dan Carpenter	3ca9e5d36a	agp: info leak in agpioc_info_wrap() On 64 bit systems the agp_info struct has a 4 byte hole between ->agp_mode and ->aper_base. We need to clear it to avoid disclosing stack information to userspace. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Cc: David Airlie <airlied@linux.ie> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-05-06 13:05:00 -07:00
Fabian Frederick	d353efd023	fs/affs/super.c: bugfix / double free Commit `842a859db2` ("affs: use ->kill_sb() to simplify ->put_super() and failure exits of ->mount()") adds .kill_sb which frees sbi but doesn't remove sbi free in case of parse_options error causing double free+random crash. Signed-off-by: Fabian Frederick <fabf@skynet.be> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: <stable@vger.kernel.org> [3.14.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-05-06 13:05:00 -07:00
Will Woods	1e2ee49f7f	fanotify: fix -EOVERFLOW with large files on 64-bit On 64-bit systems, O_LARGEFILE is automatically added to flags inside the open() syscall (also openat(), blkdev_open(), etc). Userspace therefore defines O_LARGEFILE to be 0 - you can use it, but it's a no-op. Everything should be O_LARGEFILE by default. But: when fanotify does create_fd() it uses dentry_open(), which skips all that. And userspace can't set O_LARGEFILE in fanotify_init() because it's defined to 0. So if fanotify gets an event regarding a large file, the read() will just fail with -EOVERFLOW. This patch adds O_LARGEFILE to fanotify_init()'s event_f_flags on 64-bit systems, using the same test as open()/openat()/etc. Addresses https://bugzilla.redhat.com/show_bug.cgi?id=696821 Signed-off-by: Will Woods <wwoods@redhat.com> Acked-by: Eric Paris <eparis@redhat.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-05-06 13:04:59 -07:00
Christoph Lameter	41a212859a	slub: use sysfs'es release mechanism for kmem_cache debugobjects warning during netfilter exit: ------------[ cut here ]------------ WARNING: CPU: 6 PID: 4178 at lib/debugobjects.c:260 debug_print_object+0x8d/0xb0() ODEBUG: free active (active state 0) object type: timer_list hint: delayed_work_timer_fn+0x0/0x20 Modules linked in: CPU: 6 PID: 4178 Comm: kworker/u16:2 Tainted: G W 3.11.0-next-20130906-sasha #3984 Workqueue: netns cleanup_net Call Trace: dump_stack+0x52/0x87 warn_slowpath_common+0x8c/0xc0 warn_slowpath_fmt+0x46/0x50 debug_print_object+0x8d/0xb0 __debug_check_no_obj_freed+0xa5/0x220 debug_check_no_obj_freed+0x15/0x20 kmem_cache_free+0x197/0x340 kmem_cache_destroy+0x86/0xe0 nf_conntrack_cleanup_net_list+0x131/0x170 nf_conntrack_pernet_exit+0x5d/0x70 ops_exit_list+0x5e/0x70 cleanup_net+0xfb/0x1c0 process_one_work+0x338/0x550 worker_thread+0x215/0x350 kthread+0xe7/0xf0 ret_from_fork+0x7c/0xb0 Also during dcookie cleanup: WARNING: CPU: 12 PID: 9725 at lib/debugobjects.c:260 debug_print_object+0x8c/0xb0() ODEBUG: free active (active state 0) object type: timer_list hint: delayed_work_timer_fn+0x0/0x20 Modules linked in: CPU: 12 PID: 9725 Comm: trinity-c141 Not tainted 3.15.0-rc2-next-20140423-sasha-00018-gc4ff6c4 #408 Call Trace: dump_stack (lib/dump_stack.c:52) warn_slowpath_common (kernel/panic.c:430) warn_slowpath_fmt (kernel/panic.c:445) debug_print_object (lib/debugobjects.c:262) __debug_check_no_obj_freed (lib/debugobjects.c:697) debug_check_no_obj_freed (lib/debugobjects.c:726) kmem_cache_free (mm/slub.c:2689 mm/slub.c:2717) kmem_cache_destroy (mm/slab_common.c:363) dcookie_unregister (fs/dcookies.c:302 fs/dcookies.c:343) event_buffer_release (arch/x86/oprofile/../../../drivers/oprofile/event_buffer.c:153) __fput (fs/file_table.c:217) ____fput (fs/file_table.c:253) task_work_run (kernel/task_work.c:125 (discriminator 1)) do_notify_resume (include/linux/tracehook.h:196 arch/x86/kernel/signal.c:751) int_signal (arch/x86/kernel/entry_64.S:807) Sysfs has a release mechanism. Use that to release the kmem_cache structure if CONFIG_SYSFS is enabled. Only slub is changed - slab currently only supports /proc/slabinfo and not /sys/kernel/slab/*. We talked about adding that and someone was working on it. [akpm@linux-foundation.org: fix CONFIG_SYSFS=n build] [akpm@linux-foundation.org: fix CONFIG_SYSFS=n build even more] Signed-off-by: Christoph Lameter <cl@linux.com> Reported-by: Sasha Levin <sasha.levin@oracle.com> Tested-by: Sasha Levin <sasha.levin@oracle.com> Acked-by: Greg KH <greg@kroah.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Pekka Enberg <penberg@kernel.org> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Bart Van Assche <bvanassche@acm.org> Cc: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-05-06 13:04:59 -07:00
Johannes Weiner	623762517e	revert "mm: vmscan: do not swap anon pages just because free+file is low" This reverts commit `0bf1457f0c` ("mm: vmscan: do not swap anon pages just because free+file is low") because it introduced a regression in mostly-anonymous workloads, where reclaim would become ineffective and trap every allocating task in direct reclaim. The problem is that there is a runaway feedback loop in the scan balance between file and anon, where the balance tips heavily towards a tiny thrashing file LRU and anonymous pages are no longer being looked at. The commit in question removed the safe guard that would detect such situations and respond with forced anonymous reclaim. This commit was part of a series to fix premature swapping in loads with relatively little cache, and while it made a small difference, the cure is obviously worse than the disease. Revert it. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Reported-by: Christian Borntraeger <borntraeger@de.ibm.com> Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Acked-by: Rafael Aquini <aquini@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: <stable@kernel.org> [3.12+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-05-06 13:04:59 -07:00
Ian Kent	6b6751f7fe	autofs: fix lockref lookup autofs needs to be able to see private data dentry flags for its dentrys that are being created but not yet hashed and for its dentrys that have been rmdir()ed but not yet freed. It needs to do this so it can block processes in these states until a status has been returned to indicate the given operation is complete. It does this by keeping two lists, active and expring, of dentrys in this state and uses ->d_release() to keep them stable while it checks the reference count to determine if they should be used. But with the recent lockref changes dentrys being freed sometimes don't transition to a reference count of 0 before being freed so autofs can occassionally use a dentry that is invalid which can lead to a panic. Signed-off-by: Ian Kent <raven@themaw.net> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-05-06 13:04:59 -07:00
Johannes Weiner	139b6a6fb1	mm: filemap: update find_get_pages_tag() to deal with shadow entries Dave Jones reports the following crash when find_get_pages_tag() runs into an exceptional entry: kernel BUG at mm/filemap.c:1347! RIP: find_get_pages_tag+0x1cb/0x220 Call Trace: find_get_pages_tag+0x36/0x220 pagevec_lookup_tag+0x21/0x30 filemap_fdatawait_range+0xbe/0x1e0 filemap_fdatawait+0x27/0x30 sync_inodes_sb+0x204/0x2a0 sync_inodes_one_sb+0x19/0x20 iterate_supers+0xb2/0x110 sys_sync+0x44/0xb0 ia32_do_call+0x13/0x13 1343 /* 1344 * This function is never used on a shmem/tmpfs 1345 * mapping, so a swap entry won't be found here. 1346 / 1347 BUG(); After commit `0cd6144aad` ("mm + fs: prepare for non-page entries in page cache radix trees") this comment and BUG() are out of date because exceptional entries can now appear in all mappings - as shadows of recently evicted pages. However, as Hugh Dickins notes, "it is truly surprising for a PAGECACHE_TAG_WRITEBACK (and probably any other PAGECACHE_TAG_) to appear on an exceptional entry. I expect it comes down to an occasional race in RCU lookup of the radix_tree: lacking absolute synchronization, we might sometimes catch an exceptional entry, with the tag which really belongs with the unexceptional entry which was there an instant before." And indeed, not only is the tree walk lockless, the tags are also read in chunks, one radix tree node at a time. There is plenty of time for page reclaim to swoop in and replace a page that was already looked up as tagged with a shadow entry. Remove the BUG() and update the comment. While reviewing all other lookup sites for whether they properly deal with shadow entries of evicted pages, update all the comments and fix memcg file charge moving to not miss shmem/tmpfs swapcache pages. Fixes: `0cd6144aad` ("mm + fs: prepare for non-page entries in page cache radix trees") Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Reported-by: Dave Jones <davej@redhat.com> Acked-by: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-05-06 13:04:59 -07:00
Vlastimil Babka	49e068f0b7	mm/compaction: make isolate_freepages start at pageblock boundary The compaction freepage scanner implementation in isolate_freepages() starts by taking the current cc->free_pfn value as the first pfn. In a for loop, it scans from this first pfn to the end of the pageblock, and then subtracts pageblock_nr_pages from the first pfn to obtain the first pfn for the next for loop iteration. This means that when cc->free_pfn starts at offset X rather than being aligned on pageblock boundary, the scanner will start at offset X in all scanned pageblock, ignoring potentially many free pages. Currently this can happen when a) zone's end pfn is not pageblock aligned, or b) through zone->compact_cached_free_pfn with CONFIG_HOLES_IN_ZONE enabled and a hole spanning the beginning of a pageblock This patch fixes the problem by aligning the initial pfn in isolate_freepages() to pageblock boundary. This also permits replacing the end-of-pageblock alignment within the for loop with a simple pageblock_nr_pages increment. Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Reported-by: Heesub Shin <heesub.shin@samsung.com> Acked-by: Minchan Kim <minchan@kernel.org> Cc: Mel Gorman <mgorman@suse.de> Acked-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Cc: Michal Nazarewicz <mina86@mina86.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Christoph Lameter <cl@linux.com> Acked-by: Rik van Riel <riel@redhat.com> Cc: Dongjun Shin <d.j.shin@samsung.com> Cc: Sunghwan Yun <sunghwan.yun@samsung.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-05-06 13:04:59 -07:00
Seth Jennings	0e3b7e5402	MAINTAINERS: zswap/zbud: change maintainer email address sjenning@linux.vnet.ibm.com is no longer a viable entity. Signed-off-by: Seth Jennings <sjennings@variantweb.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-05-06 13:04:58 -07:00
Rik van Riel	d5c9fde3da	mm/page-writeback.c: fix divide by zero in pos_ratio_polynom It is possible for "limit - setpoint + 1" to equal zero, after getting truncated to a 32 bit variable, and resulting in a divide by zero error. Using the fully 64 bit divide functions avoids this problem. It also will cause pos_ratio_polynom() to return the correct value when (setpoint - limit) exceeds 2^32. Also uninline pos_ratio_polynom, at Andrew's request. Signed-off-by: Rik van Riel <riel@redhat.com> Reviewed-by: Michal Hocko <mhocko@suse.cz> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Nishanth Aravamudan <nacc@linux.vnet.ibm.com> Cc: Luiz Capitulino <lcapitulino@redhat.com> Cc: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-05-06 13:04:58 -07:00
Nishanth Aravamudan	457c1b27ed	hugetlb: ensure hugepage access is denied if hugepages are not supported Currently, I am seeing the following when I `mount -t hugetlbfs /none /dev/hugetlbfs`, and then simply do a `ls /dev/hugetlbfs`. I think it's related to the fact that hugetlbfs is properly not correctly setting itself up in this state?: Unable to handle kernel paging request for data at address 0x00000031 Faulting instruction address: 0xc000000000245710 Oops: Kernel access of bad area, sig: 11 [#1] SMP NR_CPUS=2048 NUMA pSeries .... In KVM guests on Power, in a guest not backed by hugepages, we see the following: AnonHugePages: 0 kB HugePages_Total: 0 HugePages_Free: 0 HugePages_Rsvd: 0 HugePages_Surp: 0 Hugepagesize: 64 kB HPAGE_SHIFT == 0 in this configuration, which indicates that hugepages are not supported at boot-time, but this is only checked in hugetlb_init(). Extract the check to a helper function, and use it in a few relevant places. This does make hugetlbfs not supported (not registered at all) in this environment. I believe this is fine, as there are no valid hugepages and that won't change at runtime. [akpm@linux-foundation.org: use pr_info(), per Mel] [akpm@linux-foundation.org: fix build when HPAGE_SHIFT is undefined] Signed-off-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Acked-by: Mel Gorman <mgorman@suse.de> Cc: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-05-06 13:04:58 -07:00
Vladimir Davydov	93030d83b9	slub: fix memcg_propagate_slab_attrs After creating a cache for a memcg we should initialize its sysfs attrs with the values from its parent. That's what memcg_propagate_slab_attrs is for. Currently it's broken - we clearly muddled root-vs-memcg caches there. Let's fix it up. Signed-off-by: Vladimir Davydov <vdavydov@parallels.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Michal Hocko <mhocko@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-05-06 13:04:58 -07:00
Chris Cui	35738392b6	drivers/rtc/rtc-pcf8523.c: fix month definition PCF8523 uses 1-12 to represent month according to datasheet. link: www.nxp.com/documents/data_sheet/PCF8523.pdf. Signed-off-by: Chris Cui <chris.wei.cui@gmail.com> Cc: Alessandro Zummo <a.zummo@towertech.it> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-05-06 13:04:58 -07:00
Linus Torvalds	8169d3005e	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs fixes from Al Viro: "dcache fixes + kvfree() (uninlined, exported by mm/util.c) + posix_acl bugfix from hch" The dcache fixes are for a subtle LRU list corruption bug reported by Miklos Szeredi, where people inside IBM saw list corruptions with the LTP/host01 test. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: nick kvfree() from apparmor posix_acl: handle NULL ACL in posix_acl_equiv_mode dcache: don't need rcu in shrink_dentry_list() more graceful recovery in umount_collect() don't remove from shrink list in select_collect() dentry_kill(): don't try to remove from shrink list expand the call of dentry_lru_del() in dentry_kill() new helper: dentry_free() fold try_prune_one_dentry() fold d_kill() and d_free() fix races between __d_instantiate() and checks of dentry flags	2014-05-06 12:22:20 -07:00
Al Viro	39f1f78d53	nick kvfree() from apparmor too many places open-code it Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-05-06 14:02:53 -04:00
Christoph Hellwig	50c6e282bd	posix_acl: handle NULL ACL in posix_acl_equiv_mode Various filesystems don't bother checking for a NULL ACL in posix_acl_equiv_mode, and thus can dereference a NULL pointer when it gets passed one. This usually happens from the NFS server, as the ACL tools never pass a NULL ACL, but instead of one representing the mode bits. Instead of adding boilerplat to all filesystems put this check into one place, which will allow us to remove the check from other filesystems as well later on. Signed-off-by: Christoph Hellwig <hch@lst.de> Reported-by: Ben Greear <greearb@candelatech.com> Reported-by: Marco Munderloh <munderl@tnt.uni-hannover.de>, Cc: Chuck Lever <chuck.lever@oracle.com> Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-05-06 13:58:42 -04:00
Linus Torvalds	256cf4c438	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse Pull fuse fixes from Miklos Szeredi: "This adds ctime update in the new cached writeback mode and also fixes/simplifies the mtime update handling. Support for rename flags (aka renameat2) is also added to the userspace API" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: fuse: add renameat2 support fuse: clear MS_I_VERSION fuse: clear FUSE_I_CTIME_DIRTY flag on setattr fuse: trust kernel i_ctime only fuse: remove .update_time fuse: allow ctime flushing to userspace fuse: fuse: add time_gran to INIT_OUT fuse: add .write_inode fuse: clean up fsync fuse: fuse: fallocate: use file_update_time() fuse: update mtime on open(O_TRUNC) in atomic_o_trunc mode fuse: update mtime on truncate(2) fuse: do not use uninitialized i_mode fuse: fix mtime update error in fsync fuse: check fallocate mode fuse: add __exit to fuse_ctl_cleanup	2014-05-06 09:09:35 -07:00
Linus Torvalds	9bd29c56ad	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc Pull sparc fixes from David Miller: "I've been auditing the THP support on sparc64 and found several bugs, hopefully most of which are fixed completely here. Also an RT kernel locking fix from Kirill Tkhai" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc: sparc64: Give more detailed information in {pgd,pmd}_ERROR() and kill pte_ERROR(). sparc64: Add basic validations to {pud,pmd}_bad(). sparc64: Use 'ILOG2_4MB' instead of constant '22'. sparc64: Fix range check in kern_addr_valid(). sparc64: Fix top-level fault handling bugs. sparc64: Handle 32-bit tasks properly in compute_effective_address(). sparc64: Don't use _PAGE_PRESENT in pte_modify() mask. sparc64: Fix hex values in comment above pte_modify(). sparc64: Fix bugs in get_user_pages_fast() wrt. THP. sparc64: Fix huge PMD invalidation. sparc64: Fix executable bit testing in set_pmd_at() paths. sparc64: Normalize NMI watchdog logging and behavior. sparc64: Make itc_sync_lock raw sparc64: Fix argument sign extension for compat_sys_futex().	2014-05-06 09:08:03 -07:00
David Miller	30321c7b65	slab: Fix off by one in object max number tests. If freelist_idx_t is a byte, SLAB_OBJ_MAX_NUM should be 255 not 256, and likewise if freelist_idx_t is a short, then it should be 65535 not 65536. This was leading to all kinds of random crashes on sparc64 where PAGE_SIZE is 8192. One problem shown was that if spinlock debugging was enabled, we'd get deadlocks in copy_pte_range() or do_wp_page() with the same cpu already holding a lock it shouldn't hold, or the lock belonging to a completely unrelated process. Fixes: `a41adfaa23` ("slab: introduce byte sized index for the freelist of a slab") Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-05-05 20:38:49 -07:00
Joonsoo Kim	7cc68973c3	slab: fix the type of the index on freelist index accessor Commit `a41adfaa23` ("slab: introduce byte sized index for the freelist of a slab") changes the size of freelist index and also changes prototype of accessor function to freelist index. And there was a mistake. The mistake is that although it changes the size of freelist index correctly, it changes the size of the index of freelist index incorrectly. With patch, freelist index can be 1 byte or 2 bytes, that means that num of object on on a slab can be more than 255. So we need more than 1 byte for the index to find the index of free object on freelist. But, above patch makes this index type 1 byte, so slab which have more than 255 objects cannot work properly and in consequence of it, the system cannot boot. This issue was reported by Steven King on m68knommu which would use 2 bytes freelist index: https://lkml.org/lkml/2014/4/16/433 To fix is easy. To change the type of the index of freelist index on accessor functions is enough to fix this bug. Although 2 bytes is enough, I use 4 bytes since it have no bad effect and make things more easier. This fix was suggested and tested by Steven in his original report. Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Reported-and-acked-by: Steven King <sfking@fdwdc.com> Acked-by: Christoph Lameter <cl@linux.com> Tested-by: James Hogan <james.hogan@imgtec.com> Tested-by: David Miller <davem@davemloft.net> Cc: Pekka Enberg <penberg@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-05-05 20:38:49 -07:00
Linus Torvalds	2080cee435	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Pull networking fixes from David Miller: 1) e1000e computes header length incorrectly wrt vlans, fix from Vlad Yasevich. 2) ns_capable() check in sock_diag netlink code, from Andrew Lutomirski. 3) Fix invalid queue pairs handling in virtio_net, from Amos Kong. 4) Checksum offloading busted in sxgbe driver due to incorrect descriptor layout, fix from Byungho An. 5) Fix build failure with SMC_DEBUG set to 2 or larger, from Zi Shen Lim. 6) Fix uninitialized A and X registers in BPF interpreter, from Alexei Starovoitov. 7) Fix arch dependencies of candence driver. 8) Fix netlink capabilities checking tree-wide, from Eric W Biederman. 9) Don't dump IFLA_VF_PORTS if netlink request didn't ask for it in IFLA_EXT_MASK, from David Gibson. 10) IPV6 FIB dump restart doesn't handle table changes that happen meanwhile, causing the code to loop forever or emit dups, fix from Kumar Sandararajan. 11) Memory leak on VF removal in bnx2x, from Yuval Mintz. 12) Bug fixes for new Altera TSE driver from Vince Bridgers. 13) Fix route lookup key in SCTP, from Xugeng Zhang. 14) Use BH blocking spinlocks in SLIP, as per a similar fix to CAN/SLCAN driver. From Oliver Hartkopp. 15) TCP doesn't bump retransmit counters in some code paths, fix from Eric Dumazet. 16) Clamp delayed_ack in tcp_cubic to prevent theoretical divides by zero. Fix from Liu Yu. 17) Fix locking imbalance in error paths of HHF packet scheduler, from John Fastabend. 18) Properly reference the transport module when vsock_core_init() runs, from Andy King. 19) Fix buffer overflow in cdc_ncm driver, from Bjørn Mork. 20) IP_ECN_decapsulate() doesn't see a correct SKB network header in ip_tunnel_rcv(), fix from Ying Cai. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (132 commits) net: macb: Fix race between HW and driver net: macb: Remove 'unlikely' optimization net: macb: Re-enable RX interrupt only when RX is done net: macb: Clear interrupt flags net: macb: Pass same size to DMA_UNMAP as used for DMA_MAP ip_tunnel: Set network header properly for IP_ECN_decapsulate() e1000e: Restrict MDIO Slow Mode workaround to relevant parts e1000e: Fix issue with link flap on 82579 e1000e: Expand workaround for 10Mb HD throughput bug e1000e: Workaround for dropped packets in Gig/100 speeds on 82579 net/mlx4_core: Don't issue PCIe speed/width checks for VFs net/mlx4_core: Load the Eth driver first net/mlx4_core: Fix slave id computation for single port VF net/mlx4_core: Adjust port number in qp_attach wrapper when detaching net: cdc_ncm: fix buffer overflow Altera TSE: ALTERA_TSE should depend on HAS_DMA vsock: Make transport the proto owner net: sched: lock imbalance in hhf qdisc net: mvmdio: Check for a valid interrupt instead of an error net phy: Check for aneg completion before setting state to PHY_RUNNING ...	2014-05-05 15:59:46 -07:00
Linus Torvalds	783e9e8ede	USB fixes for 3.15-rc4 Here are some small fixes and device ids for 3.15-rc4. All have been in linux-next just fine. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iEYEABECAAYFAlNoAMYACgkQMUfUDdst+ymm5QCgzEdgN+YTpuivw63Z/r4ZHGJW u/QAoLxnz5vjXk9bZDGfnUMEWHrw98Lu =epx3 -----END PGP SIGNATURE----- Merge tag 'usb-3.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb Pull USB fixes from Greg KH: "Here are some small fixes and device ids for 3.15-rc4. All have been in linux-next just fine" * tag 'usb-3.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: USB: Nokia 5300 should be treated as unusual dev USB: Nokia 305 should be treated as unusual dev fsl-usb: do not test for PHY_CLK_VALID bit on controller version 1.6 usb: storage: shuttle_usbat: fix discs being detected twice usb: qcserial: add a number of Dell devices USB: OHCI: fix problem with global suspend on ATI controllers usb: gadget: at91-udc: fix irq and iomem resource retrieval usb: phy: fsm: change "\|" to "\|\|" for condition OTG_STATE_A_WAIT_BCON at statemachine usb: phy: fsm: update OTG HNP state transition	2014-05-05 15:51:17 -07:00
Linus Torvalds	df862f625d	TTY/Serial fixes for 3.15-rc4 Here are some tty and serial driver fixes for things reported recently. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iEYEABECAAYFAlNn/10ACgkQMUfUDdst+ynKXgCg1BE7sLm2Nmxkm+nfYceWmRG2 pQAAnRnY2wBwtpVa4mVy1ZR3ykiKPvmY =42O2 -----END PGP SIGNATURE----- Merge tag 'tty-3.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty Pull tty/serial fixes from Greg KH: "Here are some tty and serial driver fixes for things reported recently" * tag 'tty-3.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: tty: Fix lockless tty buffer race Revert "tty: Fix race condition between __tty_buffer_request_room and flush_to_ldisc" drivers/tty/hvc: don't free hvc_console_setup after init n_tty: Fix n_tty_write crash when echoing in raw mode tty: serial: 8250_core.c Bug fix for Exar chips.	2014-05-05 15:50:16 -07:00
Linus Torvalds	a1e74464ff	Staging / IIO fixes for 3.15-rc4 Here are some small IIO driver fixes for 3.15-rc4 that resolve some reported issues. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iEYEABECAAYFAlNn+48ACgkQMUfUDdst+yn06wCgzKTAHNtiZDir6xddEr4DfkH+ XNsAoLRJAr+w9hdRuA9JcxnQBK1srTvq =vuRJ -----END PGP SIGNATURE----- Merge tag 'staging-3.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging Pull staging / iio fixes from Greg KH: "Here are some small IIO driver fixes for 3.15-rc4 that resolve some reported issues" * tag 'staging-3.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: iio: adc: Nothing in ADC should be a bool CONFIG iio: exynos_adc: use indio_dev->dev structure to handle child nodes iio:imu:mpu6050: Fixed segfault in Invensens MPU driver due to null dereference staging:iio:ad2s1200 fix missing parenthesis in a for statment.	2014-05-05 15:49:38 -07:00
Linus Torvalds	03787ff6f9	Xtensa patchset for v3.15. Fixes allmodconfig, allnoconfig builds Adds highmem support Enables build-time exception table sorting. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) iQIcBAABAgAGBQJTZ7iuAAoJEI9vqH3mFV2snKYP/AqEdj9JtWxg0MfB+mllOBmu nyhBW/AuVnsymy+WkOomJkxrUFAK+XTMfiOs45tdiW1dN2uJXIJ388C9pKLKtf6K vWzetOE6PbVusHuOQp6G+/jpjMwFZWeVvfWZ/DuvayjrAf281UENAD/4Wlmzakgw Qu/Qx+iQHfseMj/WdQ6N4C+zH0fjqpI6LwKkrWQ4x51tB7H9fMq9qS49BJ7WlxNg s+BQ44Qc71a5YZVwuH2mk8UqWRgCLYao+Ptp2z+buRmyz04kXvNWeVAHw2oSdeTr ug4pPPlSAdcid5fWhFOgmkvCSs1pw+fdi7honUQHmpGZRiwUdXHctlzgf4OUZjnX bsGeQ/klAmni8Ufgu0Ue+WIg7hvJUoE+AY7qp+Q32d0ln09FA89fitlb9WTT9zdq Y+6q7C/QPSqEoGe8GaLrE2o7tWqOlQ4Gd8ukbhwLNK3nInluT4PJkuknD+0s0Gmx Dc6+YXrUxMu+w0QoQ1aHrMVoAg4V3EtG7pFav3vW8jwhRaLO0hwEBlDPcQsAdoTt 250FYwL3TQoKRr3+j3OfjJBuQ61PfkEPjPfqJL5htkVQ2nZKsmyGnyWUs15ZbAiT bxT5CLrEH7rgvM1mL86TJqEu0IEYkaEnkCMHS7EYWRL7bRRHRRkNO8HmEJIwtWa0 0Io96EHboIFq3J+kWTnn =mD84 -----END PGP SIGNATURE----- Merge tag 'xtensa-next-20140503' of git://github.com/czankel/xtensa-linux Pull Xtensa fixes from Chris Zankel: - Fixes allmodconfig, allnoconfig builds - Adds highmem support - Enables build-time exception table sorting. * tag 'xtensa-next-20140503' of git://github.com/czankel/xtensa-linux: xtensa: ISS: don't depend on CONFIG_TTY xtensa: xt2000: drop redundant sysmem initialization xtensa: add support for KC705 xtensa: xtfpga: introduce SoC I/O bus xtensa: add HIGHMEM support xtensa: optimize local_flush_tlb_kernel_range xtensa: dump sysmem from the bootmem_init xtensa: handle memmap kernel option xtensa: keep sysmem banks ordered in mem_reserve xtensa: keep sysmem banks ordered in add_sysmem_bank xtensa: split bootparam and kernel meminfo xtensa: enable sorting extable at build time xtensa: export __{invalidate,flush}_dcache_range xtensa: Export __invalidate_icache_range	2014-05-05 15:36:59 -07:00
Linus Torvalds	5575eeb7b9	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client Pull Ceph fixes from Sage Weil: "First, there is a critical fix for the new primary-affinity function that went into -rc1. The second batch of patches from Zheng fix a range of problems with directory fragmentation, readdir, and a few odds and ends for cephfs" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: ceph: reserve caps for file layout/lock MDS requests ceph: avoid releasing caps that are being used ceph: clear directory's completeness when creating file libceph: fix non-default values check in apply_primary_affinity() ceph: use fpos_cmp() to compare dentry positions ceph: check directory's completeness before emitting directory entry	2014-05-05 15:17:02 -07:00
Soren Brinkmann	c8ea5a22bd	net: macb: Fix race between HW and driver Under "heavy" RX load, the driver cannot handle the descriptors fast enough. In detail, when a descriptor is consumed, its used flag is cleared and once the RX budget is consumed all descriptors with a cleared used flag are prepared to receive more data. Under load though, the HW may constantly receive more data and use those descriptors with a cleared used flag before they are actually prepared for next usage. The head and tail pointers into the RX-ring should always be valid and we can omit clearing and checking of the used flag. Signed-off-by: Soren Brinkmann <soren.brinkmann@xilinx.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-05 17:11:18 -04:00
Soren Brinkmann	504ad98df3	net: macb: Remove 'unlikely' optimization Coverage data suggests that the unlikely case of receiving data while the receive handler is running may not be that unlikely. Coverage data after running iperf for a while: 91320: 891: work_done = bp->macbgem_ops.mog_rx(bp, budget); 91320: 892: if (work_done < budget) { 2362: 893: napi_complete(napi); -: 894: -: 895: /* Packets received while interrupts were disabled */ 4724: 896: status = macb_readl(bp, RSR); 2362: 897: if (unlikely(status)) { 762: 898: if (bp->caps & MACB_CAPS_ISR_CLEAR_ON_WRITE) 762: 899: macb_writel(bp, ISR, MACB_BIT(RCOMP)); -: 900: napi_reschedule(napi); -: 901: } else { 1600: 902: macb_writel(bp, IER, MACB_RX_INT_FLAGS); -: 903: } -: 904: } Signed-off-by: Soren Brinkmann <soren.brinkmann@xilinx.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-05 17:11:18 -04:00
Soren Brinkmann	02f7a34f34	net: macb: Re-enable RX interrupt only when RX is done When data is received during the driver processing received data the NAPI is re-scheduled. In that case the RX interrupt should not be re-enabled. Signed-off-by: Soren Brinkmann <soren.brinkmann@xilinx.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-05 17:11:18 -04:00
Soren Brinkmann	6a027b705f	net: macb: Clear interrupt flags A few interrupt flags were not cleared in the ISR, resulting in a sytem trapped in the ISR in cases one of those interrupts occurred. Clear all flags to avoid such situations. Signed-off-by: Soren Brinkmann <soren.brinkmann@xilinx.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-05 17:11:18 -04:00
Soren Brinkmann	ccd6d0a910	net: macb: Pass same size to DMA_UNMAP as used for DMA_MAP Just as commit "net: macb: DMA-unmap full rx-buffer" (`48330e08fa`), pass the size that was used for mapping the memory also to the unmap routine to avoid warnings from the DMA_API. Signed-off-by: Soren Brinkmann <soren.brinkmann@xilinx.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-05 17:11:18 -04:00
Ying Cai	e96f2e7c43	ip_tunnel: Set network header properly for IP_ECN_decapsulate() In ip_tunnel_rcv(), set skb->network_header to inner IP header before IP_ECN_decapsulate(). Without the fix, IP_ECN_decapsulate() takes outer IP header as inner IP header, possibly causing error messages or packet drops. Note that this skb_reset_network_header() call was in this spot when the original feature for checking consistency of ECN bits through tunnels was added in `eccc1bb8d4` ("tunnel: drop packet if ECN present with not-ECT"). It was only removed from this spot in `3d7b46cd20` ("ip_tunnel: push generic protocol handling to ip_tunnel module."). Fixes: `3d7b46cd20` ("ip_tunnel: push generic protocol handling to ip_tunnel module.") Reported-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Ying Cai <ycai@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-05 16:32:17 -04:00
David S. Miller	780ce3a225	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net Jeff Kirsher says: ==================== Intel Wired LAN Driver Updates This series contains updates to e1000e only. David provides four fixes for e1000e, first is a workaround for a hardware erratum on 82579 devices which experienced packet loss in gigabit and 100 speeds when interconnect between the PHY and MAC is exiting K1 power saving state. Second expands the scope of a workaround to include i217 and i218 parts as well to address over aggressive transmit behavior when connecting at 10Mbs half-duplex. Next is to resolve a reported link flap issue on 82579 parts which was root caused as an interoperability problem between 82579 and at least some Broadcom PHYs in the Energy Efficient Ethernet wake mechanism. Lastly, restricts the workaround of putting the PHY into MDIO slow mode to access the PHY id to relevant parts since this issue has been fixed on the newer hardware. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-05 16:30:03 -04:00
David Ertman	2c9826243b	e1000e: Restrict MDIO Slow Mode workaround to relevant parts It has been determined that the workaround of putting the PHY into MDIO slow mode to access the PHY id is not necessary with Lynx Point and newer parts. The issue that necessitated the workaround has been fixed on the newer hardware. We will maintains, as a last ditch attempt, the conversion to MDIO Slow Mode in the failure branch when attempting to access the PHY id so as to cover all contingencies. Signed-off-by: Dave Ertman <davidx.m.ertman@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2014-05-05 13:03:27 -07:00

1 2 3 4 5 ...

442120 Commits