Commit Graph

17363 Commits

Author SHA1 Message Date
Linus Torvalds 5a667a0c02 Merge branches 'irq-urgent-for-linus' and 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq update from Thomas Gleixner:
 "Fix from the urgent branch: a trivial oneliner adding the missing
  Kconfig dependency curing build failures which have been discovered by
  several build robots.

  The update in the irq-core branch provides a new function in the
  irq/devres code, which is a prerequisite for driver developers to get
  rid of boilerplate code all over the place.

  Not a bugfix, but it has zero impact on the current kernel due to the
  lack of users.  It's simpler to provide the infrastructure to
  interested parties via your tree than fulfilling the wishlist of
  driver maintainers on which particular commit or tag this should be
  based on"

* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  genirq: Add missing irq_to_desc export for CONFIG_SPARSE_IRQ=n

* 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  genirq: Add devm_request_any_context_irq()
2014-02-15 16:06:12 -08:00
Linus Torvalds 3a19c07c56 Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fixes from Thomas Gleixner:
 "The following trilogy of patches brings you:

   - fix for a long standing math overflow issue with HZ < 60

   - an onliner fix for a corner case in the dreaded tick broadcast
     mechanism affecting a certain range of AMD machines which are
     infested with the infamous automagic C1E power control misfeature

   - a fix for one of the ARM platforms which allows the kernel to
     proceed and boot instead of stupidly panicing for no good reason.
     The patch is slightly larger than necessary, but it's less ugly
     than the alternative 5 liner"

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  tick: Clear broadcast pending bit when switching to oneshot
  clocksource: Kona: Print warning rather than panic
  time: Fix overflow when HZ is smaller than 60
2014-02-15 16:04:42 -08:00
Thomas Gleixner dd5fd9b91a tick: Clear broadcast pending bit when switching to oneshot
AMD systems which use the C1E workaround in the amd_e400_idle routine
trigger the WARN_ON_ONCE in the broadcast code when onlining a CPU.

The reason is that the idle routine of those AMD systems switches the
cpu into forced broadcast mode early on before the newly brought up
CPU can switch over to high resolution / NOHZ mode. The timer related
CPU1 bringup looks like this:

  clockevent_register_device(local_apic);
  tick_setup(local_apic);
  ...
  idle()
	tick_broadcast_on_off(FORCE);
	tick_broadcast_oneshot_control(ENTER)
	  cpumask_set(cpu, broadcast_oneshot_mask);
	halt();

Now the broadcast interrupt on CPU0 sets CPU1 in the
broadcast_pending_mask and wakes CPU1. So CPU1 continues:

	local_apic_timer_interrupt()
	   tick_handle_periodic();
	   softirq()
	     tick_init_highres();
	       cpumask_clr(cpu, broadcast_oneshot_mask);
	
	tick_broadcast_oneshot_control(ENTER)
	   WARN_ON(cpumask_test(cpu, broadcast_pending_mask);

So while we remove CPU1 from the broadcast_oneshot_mask when we switch
over to highres mode, we do not clear the pending bit, which then
triggers the warning when we go back to idle.

The reason why this is only visible on C1E affected AMD systems is
that the other machines enter the deep sleep states via
acpi_idle/intel_idle and exit the broadcast mode before executing the
remote triggered local_apic_timer_interrupt. So the pending bit is
already cleared when the switch over to highres mode is clearing the
oneshot mask.

The solution is simple: Clear the pending bit together with the mask
bit when we switch over to highres mode.

Stanislaw came up independently with the same patch by enforcing the
C1E workaround and debugging the fallout. I picked mine, because mine
has a changelog :)

Reported-by: poma <pomidorabelisima@gmail.com>
Debugged-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Olaf Hering <olaf@aepfle.de>
Cc: Dave Jones <davej@redhat.com>
Cc: Justin M. Forbes <jforbes@redhat.com>
Cc: Josh Boyer <jwboyer@redhat.com>
Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1402111434180.21991@ionos.tec.linutronix.de
Cc: stable@vger.kernel.org # 3.10+
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-02-13 21:55:54 +01:00
Tejun Heo 1a11533fbd Revert "cgroup: use an ordered workqueue for cgroup destruction"
This reverts commit ab3f5faa62.
Explanation from Hugh:

  It's because more thorough testing, by others here, found that it
  wasn't always solving the problem: so I asked Tejun privately to
  hold off from sending it in, until we'd worked out why not.

  Most of our testing being on a v3,11-based kernel, it was perfectly
  possible that the problem was merely our own e.g. missing Tejun's
  8a2b753844 ("workqueue: fix ordered workqueues in NUMA setups").

  But that turned out not to be enough to fix it either. Then Filipe
  pointed out how percpu_ref_kill_and_confirm() uses call_rcu_sched()
  before we ever get to put the offline on to the workqueue: by the
  time we get to the workqueue, the ordering has already been lost.

  So, thanks for the Acks, but I'm afraid that this ordered workqueue
  solution is just not good enough: we should simply forget that patch
  and provide a different answer."

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
2014-02-12 19:08:28 -05:00
Steven Rostedt (Red Hat) d651aa1d68 ring-buffer: Fix first commit on sub-buffer having non-zero delta
Each sub-buffer (buffer page) has a full 64 bit timestamp. The events on
that page use a 27 bit delta against that timestamp in order to save on
bits written to the ring buffer. If the time between events is larger than
what the 27 bits can hold, a "time extend" event is added to hold the
entire 64 bit timestamp again and the events after that hold a delta from
that timestamp.

As a "time extend" is always paired with an event, it is logical to just
allocate the event with the time extend, to make things a bit more efficient.

Unfortunately, when the pairing code was written, it removed the "delta = 0"
from the first commit on a page, causing the events on the page to be
slightly skewed.

Fixes: 69d1b839f7 "ring-buffer: Bind time extend and data events together"
Cc: stable@vger.kernel.org # 2.6.37+
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-02-11 13:38:54 -05:00
Li Zefan 0ab02ca8f8 cgroup: protect modifications to cgroup_idr with cgroup_mutex
Setup cgroupfs like this:
  # mount -t cgroup -o cpuacct xxx /cgroup
  # mkdir /cgroup/sub1
  # mkdir /cgroup/sub2

Then run these two commands:
  # for ((; ;)) { mkdir /cgroup/sub1/tmp && rmdir /mnt/sub1/tmp; } &
  # for ((; ;)) { mkdir /cgroup/sub2/tmp && rmdir /mnt/sub2/tmp; } &

After seconds you may see this warning:

------------[ cut here ]------------
WARNING: CPU: 1 PID: 25243 at lib/idr.c:527 sub_remove+0x87/0x1b0()
idr_remove called for id=6 which is not allocated.
...
Call Trace:
 [<ffffffff8156063c>] dump_stack+0x7a/0x96
 [<ffffffff810591ac>] warn_slowpath_common+0x8c/0xc0
 [<ffffffff81059296>] warn_slowpath_fmt+0x46/0x50
 [<ffffffff81300aa7>] sub_remove+0x87/0x1b0
 [<ffffffff810f3f02>] ? css_killed_work_fn+0x32/0x1b0
 [<ffffffff81300bf5>] idr_remove+0x25/0xd0
 [<ffffffff810f2bab>] cgroup_destroy_css_killed+0x5b/0xc0
 [<ffffffff810f4000>] css_killed_work_fn+0x130/0x1b0
 [<ffffffff8107cdbc>] process_one_work+0x26c/0x550
 [<ffffffff8107eefe>] worker_thread+0x12e/0x3b0
 [<ffffffff81085f96>] kthread+0xe6/0xf0
 [<ffffffff81570bac>] ret_from_fork+0x7c/0xb0
---[ end trace 2d1577ec10cf80d0 ]---

It's because allocating/removing cgroup ID is not properly synchronized.

The bug was introduced when we converted cgroup_ida to cgroup_idr.
While synchronization is already done inside ida_simple_{get,remove}(),
users are responsible for concurrent calls to idr_{alloc,remove}().

tj: Refreshed on top of b58c89986a ("cgroup: fix error return from
cgroup_create()").

Fixes: 4e96ee8e98 ("cgroup: convert cgroup_ida to cgroup_idr")
Cc: <stable@vger.kernel.org> #3.12+
Reported-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2014-02-11 10:38:30 -05:00
Paul Gortmaker 2c45aada34 genirq: Add missing irq_to_desc export for CONFIG_SPARSE_IRQ=n
In allmodconfig builds for sparc and any other arch which does
not set CONFIG_SPARSE_IRQ, the following will be seen at modpost:

  CC [M]  lib/cpu-notifier-error-inject.o
  CC [M]  lib/pm-notifier-error-inject.o
ERROR: "irq_to_desc" [drivers/gpio/gpio-mcp23s08.ko] undefined!
make[2]: *** [__modpost] Error 1

This happens because commit 3911ff30f5 ("genirq: export
handle_edge_irq() and irq_to_desc()") added one export for it, but
there were actually two instances of it, in an if/else clause for
CONFIG_SPARSE_IRQ.  Add the second one.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: stable@vger.kernel.org	# 3.4+
Link: http://lkml.kernel.org/r/1392057610-11514-1-git-send-email-paul.gortmaker@windriver.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-02-11 10:30:36 +01:00
Stephen Boyd 0668d30651 genirq: Add devm_request_any_context_irq()
Some drivers use request_any_context_irq() but there isn't a
devm_* function for it. Add one so that these drivers don't need
to explicitly free the irq on driver detach.

Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Link: http://lkml.kernel.org/r/1388709460-19222-3-git-send-email-sboyd@codeaurora.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-02-09 15:27:21 +01:00
Linus Torvalds c132adef53 Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fix from Thomas Gleixner:
 "Add a missing Kconfig dependency"

* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  genirq: Generic irq chip requires IRQ_DOMAIN
2014-02-08 12:08:48 -08:00
Tejun Heo 48573a8933 cgroup: fix locking in cgroup_cfts_commit()
cgroup_cfts_commit() walks the cgroup hierarchy that the target
subsystem is attached to and tries to apply the file changes.  Due to
the convolution with inode locking, it can't keep cgroup_mutex locked
while iterating.  It currently holds only RCU read lock around the
actual iteration and then pins the found cgroup using dget().

Unfortunately, this is incorrect.  Although the iteration does check
cgroup_is_dead() before invoking dget(), there's nothing which
prevents the dentry from going away inbetween.  Note that this is
different from the usual css iterations where css_tryget() is used to
pin the css - css_tryget() tests whether the css can be pinned and
fails if not.

The problem can be solved by simply holding cgroup_mutex instead of
RCU read lock around the iteration, which actually reduces LOC.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
Cc: stable@vger.kernel.org
2014-02-08 10:26:34 -05:00
Tejun Heo b58c89986a cgroup: fix error return from cgroup_create()
cgroup_create() was returning 0 after allocation failures.  Fix it.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
Cc: stable@vger.kernel.org
2014-02-08 10:26:33 -05:00
Tejun Heo eb46bf8969 cgroup: fix error return value in cgroup_mount()
When cgroup_mount() fails to allocate an id for the root, it didn't
set ret before jumping to unlock_drop ending up returning 0 after a
failure.  Fix it.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
Cc: stable@vger.kernel.org
2014-02-08 10:26:33 -05:00
Hugh Dickins ab3f5faa62 cgroup: use an ordered workqueue for cgroup destruction
Sometimes the cleanup after memcg hierarchy testing gets stuck in
mem_cgroup_reparent_charges(), unable to bring non-kmem usage down to 0.

There may turn out to be several causes, but a major cause is this: the
workitem to offline parent can get run before workitem to offline child;
parent's mem_cgroup_reparent_charges() circles around waiting for the
child's pages to be reparented to its lrus, but it's holding cgroup_mutex
which prevents the child from reaching its mem_cgroup_reparent_charges().

Just use an ordered workqueue for cgroup_destroy_wq.

tj: Committing as the temporary fix until the reverse dependency can
    be removed from memcg.  Comment updated accordingly.

Fixes: e5fca243ab ("cgroup: use a dedicated workqueue for cgroup destruction")
Suggested-by: Filipe Brandenburger <filbranden@google.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: stable@vger.kernel.org # 3.10+
Signed-off-by: Tejun Heo <tj@kernel.org>
2014-02-07 10:21:12 -05:00
Mikulas Patocka 80d767d770 time: Fix overflow when HZ is smaller than 60
When compiling for the IA-64 ski emulator, HZ is set to 32 because the
emulation is slow and we don't want to waste too many cycles processing
timers. Alpha also has an option to set HZ to 32.

This causes integer underflow in
kernel/time/jiffies.c:
kernel/time/jiffies.c:66:2: warning: large integer implicitly truncated to unsigned type [-Woverflow]
  .mult  = NSEC_PER_JIFFY << JIFFIES_SHIFT, /* details above */
  ^

This patch reduces the JIFFIES_SHIFT value to avoid the overflow.

Signed-off-by: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz>
Link: http://lkml.kernel.org/r/alpine.LRH.2.02.1401241639100.23871@file01.intranet.prod.int.rdu2.redhat.com
Cc: stable@vger.kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-02-06 16:01:40 +01:00
Linus Torvalds c4ad8f98be execve: use 'struct filename *' for executable name passing
This changes 'do_execve()' to get the executable name as a 'struct
filename', and to free it when it is done.  This is what the normal
users want, and it simplifies and streamlines their error handling.

The controlled lifetime of the executable name also fixes a
use-after-free problem with the trace_sched_process_exec tracepoint: the
lifetime of the passed-in string for kernel users was not at all
obvious, and the user-mode helper code used UMH_WAIT_EXEC to serialize
the pathname allocation lifetime with the execve() having finished,
which in turn meant that the trace point that happened after
mm_release() of the old process VM ended up using already free'd memory.

To solve the kernel string lifetime issue, this simply introduces
"getname_kernel()" that works like the normal user-space getname()
function, except with the source coming from kernel memory.

As Oleg points out, this also means that we could drop the tcomm[] array
from 'struct linux_binprm', since the pathname lifetime now covers
setup_new_exec().  That would be a separate cleanup.

Reported-by: Igor Zhbanov <i.zhbanov@samsung.com>
Tested-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-02-05 12:54:53 -08:00
Nitin A Kamble 923fa4ea38 genirq: Generic irq chip requires IRQ_DOMAIN
The generic_chip.c uses interfaces from irq_domain.c which is
controlled by the IRQ_DOMAIN config option, but there is no Kconfig
dependency so the build can fail:

linux/kernel/irq/generic-chip.c:400:11: error:
'irq_domain_xlate_onetwocell' undeclared here (not in a function)

Select IRQ_DOMAIN when GENERIC_IRQ_CHIP is selected.

Signed-off-by: Nitin A Kamble <nitin.a.kamble@intel.com>
Link: http://lkml.kernel.org/r/1391129410-54548-2-git-send-email-nitin.a.kamble@intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org # 3.11+
2014-02-05 10:17:32 +01:00
Tejun Heo 1ff6bbfd13 arm, pm, vmpressure: add missing slab.h includes
arch/arm/mach-tegra/pm.c, kernel/power/console.c and mm/vmpressure.c
were somehow getting slab.h indirectly through cgroup.h which in turn
was getting it indirectly through xattr.h.  A scheduled cgroup change
drops xattr.h inclusion from cgroup.h and breaks compilation of these
three files.  Add explicit slab.h includes to the three files.

A pending cgroup patch depends on this change and it'd be great if
this can be routed through cgroup/for-3.14-fixes branch.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Stephen Warren <swarren@wwwdotorg.org>
Cc: Thierry Reding <thierry.reding@gmail.com>
Cc: linux-tegra@vger.kernel.org
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: linux-pm@vger.kernel.org
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: cgroups@vger.kernel.org
2014-02-03 13:24:01 -05:00
Linus Torvalds aafd9d6a46 Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer/dynticks updates from Ingo Molnar:
 "This tree contains misc dynticks updates: a fix and three cleanups"

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/nohz: Fix overflow error in scheduler_tick_max_deferment()
  nohz_full: fix code style issue of tick_nohz_full_stop_tick
  nohz: Get timekeeping max deferment outside jiffies_lock
  tick: Rename tick_check_idle() to tick_irq_enter()
2014-01-31 09:02:51 -08:00
Linus Torvalds 595bf999e3 Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Ingo Molnar:
 "A crash fix and documentation updates"

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched: Make sched_class::get_rr_interval() optional
  sched/deadline: Add sched_dl documentation
  sched: Fix docbook parameter annotation error in wait.h
2014-01-31 09:00:44 -08:00
Linus Torvalds ab5318788c Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull core debug changes from Ingo Molnar:
 "This contains mostly kernel debugging related updates:

   - make hung_task detection more configurable to distros
   - add final bits for x86 UV NMI debugging, with related KGDB changes
   - update the mailing-list of MAINTAINERS entries I'm involved with"

* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  hung_task: Display every hung task warning
  sysctl: Add neg_one as a standard constraint
  x86/uv/nmi, kgdb/kdb: Fix UV NMI handler when KDB not configured
  x86/uv/nmi: Fix Sparse warnings
  kgdb/kdb: Fix no KDB config problem
  MAINTAINERS: Restore "L: linux-kernel@vger.kernel.org" entries
2014-01-31 08:59:46 -08:00
Roman Gushchin 73f945505b kernel/smp.c: remove cpumask_ipi
After commit 9a46ad6d6d ("smp: make smp_call_function_many() use logic
similar to smp_call_function_single()"), cfd->cpumask is accessed only
in smp_call_function_many().  So there is no more need to copy it into
cfd->cpumask_ipi before putting csd into the list.  The cpumask_ipi
field is obsolete and can be removed.

Signed-off-by: Roman Gushchin <klamm@yandex-team.ru>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Wang YanQing <udknight@gmail.com>
Cc: Xie XiuQi <xiexiuqi@huawei.com>
Cc: Shaohua Li <shli@fusionio.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-30 16:56:54 -08:00
Christoph Hellwig 6897fc22ea kernel: use lockless list for smp_call_function_single
Make smp_call_function_single and friends more efficient by using a
lockless list.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-30 16:56:54 -08:00
Linus Torvalds f568849eda Merge branch 'for-3.14/core' of git://git.kernel.dk/linux-block
Pull core block IO changes from Jens Axboe:
 "The major piece in here is the immutable bio_ve series from Kent, the
  rest is fairly minor.  It was supposed to go in last round, but
  various issues pushed it to this release instead.  The pull request
  contains:

   - Various smaller blk-mq fixes from different folks.  Nothing major
     here, just minor fixes and cleanups.

   - Fix for a memory leak in the error path in the block ioctl code
     from Christian Engelmayer.

   - Header export fix from CaiZhiyong.

   - Finally the immutable biovec changes from Kent Overstreet.  This
     enables some nice future work on making arbitrarily sized bios
     possible, and splitting more efficient.  Related fixes to immutable
     bio_vecs:

        - dm-cache immutable fixup from Mike Snitzer.
        - btrfs immutable fixup from Muthu Kumar.

  - bio-integrity fix from Nic Bellinger, which is also going to stable"

* 'for-3.14/core' of git://git.kernel.dk/linux-block: (44 commits)
  xtensa: fixup simdisk driver to work with immutable bio_vecs
  block/blk-mq-cpu.c: use hotcpu_notifier()
  blk-mq: for_each_* macro correctness
  block: Fix memory leak in rw_copy_check_uvector() handling
  bio-integrity: Fix bio_integrity_verify segment start bug
  block: remove unrelated header files and export symbol
  blk-mq: uses page->list incorrectly
  blk-mq: use __smp_call_function_single directly
  btrfs: fix missing increment of bi_remaining
  Revert "block: Warn and free bio if bi_end_io is not set"
  block: Warn and free bio if bi_end_io is not set
  blk-mq: fix initializing request's start time
  block: blk-mq: don't export blk_mq_free_queue()
  block: blk-mq: make blk_sync_queue support mq
  block: blk-mq: support draining mq queue
  dm cache: increment bi_remaining when bi_end_io is restored
  block: fixup for generic bio chaining
  block: Really silence spurious compiler warnings
  block: Silence spurious compiler warnings
  block: Kill bio_pair_split()
  ...
2014-01-30 11:19:05 -08:00
Linus Torvalds bf3d846b78 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs updates from Al Viro:
 "Assorted stuff; the biggest pile here is Christoph's ACL series.  Plus
  assorted cleanups and fixes all over the place...

  There will be another pile later this week"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (43 commits)
  __dentry_path() fixes
  vfs: Remove second variable named error in __dentry_path
  vfs: Is mounted should be testing mnt_ns for NULL or error.
  Fix race when checking i_size on direct i/o read
  hfsplus: remove can_set_xattr
  nfsd: use get_acl and ->set_acl
  fs: remove generic_acl
  nfs: use generic posix ACL infrastructure for v3 Posix ACLs
  gfs2: use generic posix ACL infrastructure
  jfs: use generic posix ACL infrastructure
  xfs: use generic posix ACL infrastructure
  reiserfs: use generic posix ACL infrastructure
  ocfs2: use generic posix ACL infrastructure
  jffs2: use generic posix ACL infrastructure
  hfsplus: use generic posix ACL infrastructure
  f2fs: use generic posix ACL infrastructure
  ext2/3/4: use generic posix ACL infrastructure
  btrfs: use generic posix ACL infrastructure
  fs: make posix_acl_create more useful
  fs: make posix_acl_chmod more useful
  ...
2014-01-28 08:38:04 -08:00
Peter Zijlstra a57beec5d4 sched: Make sched_class::get_rr_interval() optional
Not all classes implement (or can implement) a useful get_rr_interval()
function, default to a 0 time-slice for them.

This fixes a crash reported by Tommi Rantala.

Reported-by: Tommi Rantala <tt.rantala@gmail.com>
Cc: Dave Jones <davej@redhat.com>
Cc: Tommi Rantala <tt.rantala@gmail.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20140127105413.GC11314@laptop.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-28 13:08:41 +01:00
Dario Faggioli 712e5e34ae sched/deadline: Add sched_dl documentation
Add in Documentation/scheduler/ some hints about the design
choices, the usage and the future possible developments of the
sched_dl scheduling class and of the SCHED_DEADLINE policy.

Reviewed-by: Henrik Austad <henrik@austad.us>
Signed-off-by: Dario Faggioli <raistlin@linux.it>
Signed-off-by: Juri Lelli <juri.lelli@gmail.com>
[ Re-wrote sections 2 and 3. ]
Signed-off-by: Luca Abeni <luca.abeni@unitn.it>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1390821615-23247-1-git-send-email-juri.lelli@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-28 13:08:40 +01:00
Joe Perches ce85b4f2ea softirq: use const char * const for softirq_to_name, whitespace neatening
Reduce data size a little.
Reduce checkpatch noise.

$ size kernel/softirq.o*
   text	   data	    bss	    dec	    hex	filename
  11554	   6013	   4008	  21575	   5447	kernel/softirq.o.new
  11474	   6093	   4008	  21575	   5447	kernel/softirq.o.old

Signed-off-by: Joe Perches <joe@perches.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-27 21:02:40 -08:00
Joe Perches 4032276415 softirq: convert printks to pr_<level>
Use a more current logging style.

Signed-off-by: Joe Perches <joe@perches.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-27 21:02:40 -08:00
Joe Perches 2e702b9f6c softirq: use ffs() in __do_softirq()
Possible speed improvement of __do_softirq() by using ffs() instead of
using a while loop with an & 1 test then single bit shift.

Signed-off-by: Joe Perches <joe@perches.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-27 21:02:40 -08:00
Chen Gang a19428e5c3 kernel/kexec.c: use vscnprintf() instead of vsnprintf() in vmcoreinfo_append_str()
vsnprintf() may let 'r' larger than sizeof(buf), in this case, if 'r' is
also less than "vmcoreinfo_max_size - vmcoreinfo_size" (left size of
destination buffer), next memcpy() will read the unexpected addresses.

Signed-off-by: Chen Gang <gang.chen@asianux.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-27 21:02:40 -08:00
Linus Torvalds ba635f8cd2 The first two patches fix the debugfs README file to reflect better
the new features added to 3.14.
 
 The third patch is a minor bugfix to the trace_puts() functions that
 will crash the system if a developer adds one before the tracing system
 is setup. It also affects trace_printk() if it has no arguments, as
 the code will convert it to a trace_puts() as well. Note, this bug
 will not affect unmodified kernels, as trace_printk() and trace_puts()
 should only be used by developers for testing.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.15 (GNU/Linux)
 
 iQEcBAABAgAGBQJS5oAkAAoJEKQekfcNnQGuOfkH/1ScwoRslERCCjhPEZu958bG
 VugNvliB/uDwq3qkmBcncMHwqkFBgJay9ieah+JFeK6x3G6mO/uf+UhN5cmLzVBh
 vA5dKR2zW6WTxKYSvXYapD7OzC7R+V6CLvpMl5WMZ1t3ESQhR6gUrpPdigecBWs9
 015rMVA2xXjNnHNDM1nem1PhnMba78A/N98lFErYGpvVBjkCpB8mSt8adds9bYp8
 a83P6tGUfXvsYO3sOqqBnOOqcfzCjjJbmr94v/F+SmLyuxuV0tVzBkIwXkKTNhEK
 bQZj9Pfe+1oIkldXxstn5jWaOvI6RlAGW4b4qXt30AgrGRSyEo5xpvfLfyNUif4=
 =N3kq
 -----END PGP SIGNATURE-----

Merge tag 'trace-fixes-3.14' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing fixes from Steven Rostedt:
 "The first two patches fix the debugfs README file to reflect better
  the new features added to 3.14.

  The third patch is a minor bugfix to the trace_puts() functions that
  will crash the system if a developer adds one before the tracing
  system is setup.  It also affects trace_printk() if it has no
  arguments, as the code will convert it to a trace_puts() as well.

  Note, this bug will not affect unmodified kernels, as trace_printk()
  and trace_puts() should only be used by developers for testing"

* tag 'trace-fixes-3.14' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing: Check if tracing is enabled in trace_puts()
  tracing: Fix formatting of trace README file
  tracing/README: Add event file usage to tracing mini-HOWTO
2014-01-27 08:22:30 -08:00
Linus Torvalds f6d13daadd Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Ingo Molnar:
 "A couple of regression fixes mostly hitting virtualized setups, but
  also some bare metal systems"

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/x86/tsc: Initialize multiplier to 0
  sched/clock: Fixup early initialization
  sched/preempt/x86: Fix voluntary preempt for x86
  Revert "sched: Fix sleep time double accounting in enqueue entity"
2014-01-25 11:11:31 -08:00
Aaron Tomlin 270750dbc1 hung_task: Display every hung task warning
When khungtaskd detects hung tasks, it prints out
backtraces from a number of those tasks.

Limiting the number of backtraces being printed
out can result in the user not seeing the information
necessary to debug the issue. The hung_task_warnings
sysctl controls this feature.

This patch makes it possible for hung_task_warnings
to accept a special value to print an unlimited
number of backtraces when khungtaskd detects hung
tasks.

The special value is -1. To use this value it is
necessary to change types from ulong to int.

Signed-off-by: Aaron Tomlin <atomlin@redhat.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: oleg@redhat.com
Link: http://lkml.kernel.org/r/1390239253-24030-3-git-send-email-atomlin@redhat.com
[ Build warning fix. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-25 12:13:33 +01:00
Oleg Nesterov a8d4b8345e introduce __fcheck_files() to fix rcu_dereference_check_fdtable(), kill rcu_my_thread_group_empty()
rcu_dereference_check_fdtable() looks very wrong,

1. rcu_my_thread_group_empty() was added by 844b9a8707 "vfs: fix
   RCU-lockdep false positive due to /proc" but it doesn't really
   fix the problem. A CLONE_THREAD (without CLONE_FILES) task can
   hit the same race with get_files_struct().

   And otoh rcu_my_thread_group_empty() can suppress the correct
   warning if the caller is the CLONE_FILES (without CLONE_THREAD)
   task.

2. files->count == 1 check is not really right too. Even if this
   files_struct is not shared it is not safe to access it lockless
   unless the caller is the owner.

   Otoh, this check is sub-optimal. files->count == 0 always means
   it is safe to use it lockless even if files != current->files,
   but put_files_struct() has to take rcu_read_lock(). See the next
   patch.

This patch removes the buggy checks and turns fcheck_files() into
__fcheck_files() which uses rcu_dereference_raw(), the "unshared"
callers, fget_light() and fget_raw_light(), can use it to avoid
the warning from RCU-lockdep.

fcheck_files() is trivially reimplemented as rcu_lockdep_assert()
plus __fcheck_files().

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-01-25 03:14:36 -05:00
Aaron Tomlin 2397efb1bb sysctl: Add neg_one as a standard constraint
Add neg_one to the list of standard constraints - will be used by the next patch.

Signed-off-by: Aaron Tomlin <atomlin@redhat.com>
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: oleg@redhat.com
Link: http://lkml.kernel.org/r/1390239253-24030-2-git-send-email-atomlin@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-25 08:59:53 +01:00
Mike Travis fc8b13740b kgdb/kdb: Fix no KDB config problem
Some code added to the debug_core module had KDB dependencies
that it shouldn't have.  Move the KDB dependent REASON back to
the caller to remove the dependency in the debug core code.

Update the call from the UV NMI handler to conform to the new
interface.

Signed-off-by: Mike Travis <travis@sgi.com>
Reviewed-by: Hedi Berriche <hedi@sgi.com>
Cc: Russ Anderson <rja@sgi.com>
Cc: Jason Wessel <jason.wessel@windriver.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Link: http://lkml.kernel.org/r/20140114162551.318251993@asylum.americas.sgi.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-25 08:55:09 +01:00
Ingo Molnar a2b4c607c9 Merge branch 'timers/core' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks into timers/urgent
Pull dynticks cleanups from Frederic Weisbecker.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-25 08:27:26 +01:00
Linus Torvalds 09da8dfa98 ACPI and power management updates for 3.14-rc1
- ACPI core changes to make it create a struct acpi_device object for every
    device represented in the ACPI tables during all namespace scans regardless
    of the current status of that device.  In accordance with this, ACPI hotplug
    operations will not delete those objects, unless the underlying ACPI tables
    go away.
 
  - On top of the above, new sysfs attribute for ACPI device objects allowing
    user space to check device status by triggering the execution of _STA for
    its ACPI object.  From Srinivas Pandruvada.
 
  - ACPI core hotplug changes reducing code duplication, integrating the
    PCI root hotplug with the core and reworking container hotplug.
 
  - ACPI core simplifications making it use ACPI_COMPANION() in the code
    "glueing" ACPI device objects to "physical" devices.
 
  - ACPICA update to upstream version 20131218.  This adds support for the
    DBG2 and PCCT tables to ACPICA, fixes some bugs and improves debug
    facilities.  From Bob Moore, Lv Zheng and Betty Dall.
 
  - Init code change to carry out the early ACPI initialization earlier.
    That should allow us to use ACPI during the timekeeping initialization
    and possibly to simplify the EFI initialization too.  From Chun-Yi Lee.
 
  - Clenups of the inclusions of ACPI headers in many places all over from
    Lv Zheng and Rashika Kheria (work in progress).
 
  - New helper for ACPI _DSM execution and rework of the code in drivers
    that uses _DSM to execute it via the new helper.  From Jiang Liu.
 
  - New Win8 OSI blacklist entries from Takashi Iwai.
 
  - Assorted ACPI fixes and cleanups from Al Stone, Emil Goode, Hanjun Guo,
    Lan Tianyu, Masanari Iida, Oliver Neukum, Prarit Bhargava, Rashika Kheria,
    Tang Chen, Zhang Rui.
 
  - intel_pstate driver updates, including proper Baytrail support, from
    Dirk Brandewie and intel_pstate documentation from Ramkumar Ramachandra.
 
  - Generic CPU boost ("turbo") support for cpufreq from Lukasz Majewski.
 
  - powernow-k6 cpufreq driver fixes from Mikulas Patocka.
 
  - cpufreq core fixes and cleanups from Viresh Kumar, Jane Li, Mark Brown.
 
  - Assorted cpufreq drivers fixes and cleanups from Anson Huang, John Tobias,
    Paul Bolle, Paul Walmsley, Sachin Kamat, Shawn Guo, Viresh Kumar.
 
  - cpuidle cleanups from Bartlomiej Zolnierkiewicz.
 
  - Support for hibernation APM events from Bin Shi.
 
  - Hibernation fix to avoid bringing up nonboot CPUs with ACPI EC disabled
    during thaw transitions from Bjørn Mork.
 
  - PM core fixes and cleanups from Ben Dooks, Leonardo Potenza, Ulf Hansson.
 
  - PNP subsystem fixes and cleanups from Dmitry Torokhov, Levente Kurusa,
    Rashika Kheria.
 
  - New tool for profiling system suspend from Todd E Brandt and a cpupower
    tool cleanup from One Thousand Gnomes.
 
 /
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABCAAGBQJS3a1eAAoJEILEb/54YlRxnTgP/iGawvgjKWm6Qqp7WSIvd5gQ
 zZ6q75C6Pc/W2fq1+OzVGnpCF8WYFy+nFDAXOvUHjIXuoxSwFcuW5l4aMckgl/0a
 TXEWe9MJrCHHRfDApfFacCJ44U02bjJAD5vTyL/hKA+IHeinq4WCSojryYC+8jU0
 cBrUIV0aNH8r5JR2WJNAyv/U29rXsDUOu0I4qTqZ4YaZT6AignMjtLXn1e9AH1Pn
 DPZphTIo/HMnb+kgBOjt4snMk+ahVO9eCOxh/hH8ecnWExw9WynXoU5Nsna0tSZs
 ssyHC7BYexD3oYsG8D52cFUpp4FCsJ0nFQNa2kw0LY+0FBNay43LySisKYHZPXEs
 2WpESDv+/t7yhtnrvM+TtA7aBheKm2XMWGFSu/aERLE17jIidOkXKH5Y7ryYLNf/
 uyRKxNS0NcZWZ0G+/wuY02jQYNkfYz3k/nTr8BAUItRBjdporGIRNEnR9gPzgCUC
 uQhjXWMPulqubr8xbyefPWHTEzU2nvbXwTUWGjrBxSy8zkyy5arfqizUj+VG6afT
 NsboANoMHa9b+xdzigSFdA3nbVK6xBjtU6Ywntk9TIpODKF5NgfARx0H+oSH+Zrj
 32bMzgZtHw/lAbYsnQ9OnTY6AEWQYt6NMuVbTiLXrMHhM3nWwfg/XoN4nZqs6jPo
 IYvE6WhQZU6L6fptGHFC
 =dRf6
 -----END PGP SIGNATURE-----

Merge tag 'pm+acpi-3.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull ACPI and power management updates from Rafael Wysocki:
 "As far as the number of commits goes, the top spot belongs to ACPI
  this time with cpufreq in the second position and a handful of PM
  core, PNP and cpuidle updates.  They are fixes and cleanups mostly, as
  usual, with a couple of new features in the mix.

  The most visible change is probably that we will create struct
  acpi_device objects (visible in sysfs) for all devices represented in
  the ACPI tables regardless of their status and there will be a new
  sysfs attribute under those objects allowing user space to check that
  status via _STA.

  Consequently, ACPI device eject or generally hot-removal will not
  delete those objects, unless the table containing the corresponding
  namespace nodes is unloaded, which is extremely rare.  Also ACPI
  container hotplug will be handled quite a bit differently and cpufreq
  will support CPU boost ("turbo") generically and not only in the
  acpi-cpufreq driver.

  Specifics:

   - ACPI core changes to make it create a struct acpi_device object for
     every device represented in the ACPI tables during all namespace
     scans regardless of the current status of that device.  In
     accordance with this, ACPI hotplug operations will not delete those
     objects, unless the underlying ACPI tables go away.

   - On top of the above, new sysfs attribute for ACPI device objects
     allowing user space to check device status by triggering the
     execution of _STA for its ACPI object.  From Srinivas Pandruvada.

   - ACPI core hotplug changes reducing code duplication, integrating
     the PCI root hotplug with the core and reworking container hotplug.

   - ACPI core simplifications making it use ACPI_COMPANION() in the
     code "glueing" ACPI device objects to "physical" devices.

   - ACPICA update to upstream version 20131218.  This adds support for
     the DBG2 and PCCT tables to ACPICA, fixes some bugs and improves
     debug facilities.  From Bob Moore, Lv Zheng and Betty Dall.

   - Init code change to carry out the early ACPI initialization
     earlier.  That should allow us to use ACPI during the timekeeping
     initialization and possibly to simplify the EFI initialization too.
     From Chun-Yi Lee.

   - Clenups of the inclusions of ACPI headers in many places all over
     from Lv Zheng and Rashika Kheria (work in progress).

   - New helper for ACPI _DSM execution and rework of the code in
     drivers that uses _DSM to execute it via the new helper.  From
     Jiang Liu.

   - New Win8 OSI blacklist entries from Takashi Iwai.

   - Assorted ACPI fixes and cleanups from Al Stone, Emil Goode, Hanjun
     Guo, Lan Tianyu, Masanari Iida, Oliver Neukum, Prarit Bhargava,
     Rashika Kheria, Tang Chen, Zhang Rui.

   - intel_pstate driver updates, including proper Baytrail support,
     from Dirk Brandewie and intel_pstate documentation from Ramkumar
     Ramachandra.

   - Generic CPU boost ("turbo") support for cpufreq from Lukasz
     Majewski.

   - powernow-k6 cpufreq driver fixes from Mikulas Patocka.

   - cpufreq core fixes and cleanups from Viresh Kumar, Jane Li, Mark
     Brown.

   - Assorted cpufreq drivers fixes and cleanups from Anson Huang, John
     Tobias, Paul Bolle, Paul Walmsley, Sachin Kamat, Shawn Guo, Viresh
     Kumar.

   - cpuidle cleanups from Bartlomiej Zolnierkiewicz.

   - Support for hibernation APM events from Bin Shi.

   - Hibernation fix to avoid bringing up nonboot CPUs with ACPI EC
     disabled during thaw transitions from Bjørn Mork.

   - PM core fixes and cleanups from Ben Dooks, Leonardo Potenza, Ulf
     Hansson.

   - PNP subsystem fixes and cleanups from Dmitry Torokhov, Levente
     Kurusa, Rashika Kheria.

   - New tool for profiling system suspend from Todd E Brandt and a
     cpupower tool cleanup from One Thousand Gnomes"

* tag 'pm+acpi-3.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (153 commits)
  thermal: exynos: boost: Automatic enable/disable of BOOST feature (at Exynos4412)
  cpufreq: exynos4x12: Change L0 driver data to CPUFREQ_BOOST_FREQ
  Documentation: cpufreq / boost: Update BOOST documentation
  cpufreq: exynos: Extend Exynos cpufreq driver to support boost
  cpufreq / boost: Kconfig: Support for software-managed BOOST
  acpi-cpufreq: Adjust the code to use the common boost attribute
  cpufreq: Add boost frequency support in core
  intel_pstate: Add trace point to report internal state.
  cpufreq: introduce cpufreq_generic_get() routine
  ARM: SA1100: Create dummy clk_get_rate() to avoid build failures
  cpufreq: stats: create sysfs entries when cpufreq_stats is a module
  cpufreq: stats: free table and remove sysfs entry in a single routine
  cpufreq: stats: remove hotplug notifiers
  cpufreq: stats: handle cpufreq_unregister_driver() and suspend/resume properly
  cpufreq: speedstep: remove unused speedstep_get_state
  platform: introduce OF style 'modalias' support for platform bus
  PM / tools: new tool for suspend/resume performance optimization
  ACPI: fix module autoloading for ACPI enumerated devices
  ACPI: add module autoloading support for ACPI enumerated devices
  ACPI: fix create_modalias() return value handling
  ...
2014-01-24 15:51:02 -08:00
Linus Torvalds 3aacd625f2 Merge branch 'akpm' (incoming from Andrew)
Merge second patch-bomb from Andrew Morton:
 - various misc bits
 - the rest of MM
 - add generic fixmap.h, use it
 - backlight updates
 - dynamic_debug updates
 - printk() updates
 - checkpatch updates
 - binfmt_elf
 - ramfs
 - init/
 - autofs4
 - drivers/rtc
 - nilfs
 - hfsplus
 - Documentation/
 - coredump
 - procfs
 - fork
 - exec
 - kexec
 - kdump
 - partitions
 - rapidio
 - rbtree
 - userns
 - memstick
 - w1
 - decompressors

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (197 commits)
  lib/decompress_unlz4.c: always set an error return code on failures
  romfs: fix returm err while getting inode in fill_super
  drivers/w1/masters/w1-gpio.c: add strong pullup emulation
  drivers/memstick/host/rtsx_pci_ms.c: fix ms card data transfer bug
  userns: relax the posix_acl_valid() checks
  arch/sh/kernel/dwarf.c: use rbtree postorder iteration helper instead of solution using repeated rb_erase()
  fs-ext3-use-rbtree-postorder-iteration-helper-instead-of-opencoding-fix
  fs/ext3: use rbtree postorder iteration helper instead of opencoding
  fs/jffs2: use rbtree postorder iteration helper instead of opencoding
  fs/ext4: use rbtree postorder iteration helper instead of opencoding
  fs/ubifs: use rbtree postorder iteration helper instead of opencoding
  net/netfilter/ipset/ip_set_hash_netiface.c: use rbtree postorder iteration instead of opencoding
  rbtree/test: test rbtree_postorder_for_each_entry_safe()
  rbtree/test: move rb_node to the middle of the test struct
  rapidio: add modular rapidio core build into powerpc and mips branches
  partitions/efi: complete documentation of gpt kernel param purpose
  kdump: add /sys/kernel/vmcoreinfo ABI documentation
  kdump: fix exported size of vmcoreinfo note
  kexec: add sysctl to disable kexec_load
  fs/exec.c: call arch_pick_mmap_layout() only once
  ...
2014-01-23 19:11:50 -08:00
Linus Torvalds 13c789a6b2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto update from Herbert Xu:
 "Here is the crypto update for 3.14:

   - Improved crypto_memneq helper
   - Use cyprto_memneq in arch-specific crypto code
   - Replaced orphaned DCP driver with Freescale MXS DCP driver
   - Added AVX/AVX2 version of AESNI-GCM encode and decode
   - Added AMD Cryptographic Coprocessor (CCP) driver
   - Misc fixes"

* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (41 commits)
  crypto: aesni - fix build on x86 (32bit)
  crypto: mxs - Fix sparse non static symbol warning
  crypto: ccp - CCP device enabled/disabled changes
  crypto: ccp - Cleanup hash invocation calls
  crypto: ccp - Change data length declarations to u64
  crypto: ccp - Check for caller result area before using it
  crypto: ccp - Cleanup scatterlist usage
  crypto: ccp - Apply appropriate gfp_t type to memory allocations
  crypto: drivers - Sort drivers/crypto/Makefile
  ARM: mxs: dts: Enable DCP for MXS
  crypto: mxs - Add Freescale MXS DCP driver
  crypto: mxs - Remove the old DCP driver
  crypto: ahash - Fully restore ahash request before completing
  crypto: aesni - fix build on x86 (32bit)
  crypto: talitos - Remove redundant dev_set_drvdata
  crypto: ccp - Remove redundant dev_set_drvdata
  crypto: crypto4xx - Remove redundant dev_set_drvdata
  crypto: caam - simplify and harden key parsing
  crypto: omap-sham - Fix Polling mode for larger blocks
  crypto: tcrypt - Added speed tests for AEAD crypto alogrithms in tcrypt test suite
  ...
2014-01-23 18:11:00 -08:00
Linus Torvalds 6dd9158ae8 Merge git://git.infradead.org/users/eparis/audit
Pull audit update from Eric Paris:
 "Again we stayed pretty well contained inside the audit system.
  Venturing out was fixing a couple of function prototypes which were
  inconsistent (didn't hurt anything, but we used the same value as an
  int, uint, u32, and I think even a long in a couple of places).

  We also made a couple of minor changes to when a couple of LSMs called
  the audit system.  We hoped to add aarch64 audit support this go
  round, but it wasn't ready.

  I'm disappearing on vacation on Thursday.  I should have internet
  access, but it'll be spotty.  If anything goes wrong please be sure to
  cc rgb@redhat.com.  He'll make fixing things his top priority"

* git://git.infradead.org/users/eparis/audit: (50 commits)
  audit: whitespace fix in kernel-parameters.txt
  audit: fix location of __net_initdata for audit_net_ops
  audit: remove pr_info for every network namespace
  audit: Modify a set of system calls in audit class definitions
  audit: Convert int limit uses to u32
  audit: Use more current logging style
  audit: Use hex_byte_pack_upper
  audit: correct a type mismatch in audit_syscall_exit()
  audit: reorder AUDIT_TTY_SET arguments
  audit: rework AUDIT_TTY_SET to only grab spin_lock once
  audit: remove needless switch in AUDIT_SET
  audit: use define's for audit version
  audit: documentation of audit= kernel parameter
  audit: wait_for_auditd rework for readability
  audit: update MAINTAINERS
  audit: log task info on feature change
  audit: fix incorrect set of audit_sock
  audit: print error message when fail to create audit socket
  audit: fix dangling keywords in audit_log_set_loginuid() output
  audit: log on errors from filter user rules
  ...
2014-01-23 18:08:10 -08:00
Vivek Goyal 77019967f0 kdump: fix exported size of vmcoreinfo note
Right now we seem to be exporting the max data size contained inside
vmcoreinfo note.  But this does not include the size of meta data around
vmcore info data.  Like name of the note and starting and ending elf_note.

I think user space expects total size and that size is put in PT_NOTE elf
header.  Things seem to be fine so far because we are not using vmcoreinfo
note to the maximum capacity.  But as it starts filling up, to capacity,
at some point of time, problem will be visible.

I don't think user space will be broken with this change.  So there is no
need to introduce vmcoreinfo2.  This change is safe and backward
compatible.  More explanation on why this change is safe is below.

vmcoreinfo contains information about kernel which user space needs to
know to do things like filtering.  For example, various kernel config
options or information about size or offset of some data structures etc.
All this information is commmunicated to user space with an ELF note
present in ELF /proc/vmcore file.

Currently vmcoreinfo data size is 4096.  With some elf note meta data
around it, actual size is 4132 bytes.  But we are using barely 25% of that
size.  Rest is empty.  So even if we tell user space that size of ELf note
is 4096 and not 4132, nothing will be broken becase after around 1000
bytes, everything is zero anyway.

But once we start filling up the note to the capacity, and not report the
full size of note, bad things will start happening.  Either some data will
be lost or tools will be confused that they did not fine the zero note at
the end.

So I think this change is safe and should not break existing tools.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Cc: Ken'ichi Ohmichi <oomichi@mxs.nes.nec.co.jp>
Cc: Dan Aloni <da-x@monatomic.org>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-23 16:37:03 -08:00
Kees Cook 7984754b99 kexec: add sysctl to disable kexec_load
For general-purpose (i.e.  distro) kernel builds it makes sense to build
with CONFIG_KEXEC to allow end users to choose what kind of things they
want to do with kexec.  However, in the face of trying to lock down a
system with such a kernel, there needs to be a way to disable kexec_load
(much like module loading can be disabled).  Without this, it is too easy
for the root user to modify kernel memory even when CONFIG_STRICT_DEVMEM
and modules_disabled are set.  With this change, it is still possible to
load an image for use later, then disable kexec_load so the image (or lack
of image) can't be altered.

The intention is for using this in environments where "perfect"
enforcement is hard.  Without a verified boot, along with verified
modules, and along with verified kexec, this is trying to give a system a
better chance to defend itself (or at least grow the window of
discoverability) against attack in the face of a privilege escalation.

In my mind, I consider several boot scenarios:

1) Verified boot of read-only verified root fs loading fd-based
   verification of kexec images.
2) Secure boot of writable root fs loading signed kexec images.
3) Regular boot loading kexec (e.g. kcrash) image early and locking it.
4) Regular boot with no control of kexec image at all.

1 and 2 don't exist yet, but will soon once the verified kexec series has
landed.  4 is the state of things now.  The gap between 2 and 4 is too
large, so this change creates scenario 3, a middle-ground above 4 when 2
and 1 are not possible for a system.

Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-23 16:37:03 -08:00
Oleg Nesterov 8d38f203b4 kernel/signal.c: change do_signal_stop/do_sigaction to use while_each_thread()
Change do_signal_stop() and do_sigaction() to avoid next_thread() and use
while_each_thread() instead.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Kees Cook <keescook@chromium.org>
Reviewed-by: Sameer Nanda <snanda@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-23 16:37:02 -08:00
Oleg Nesterov 2e1f383582 kernel/sys.c: k_getrusage() can use while_each_thread()
Change k_getrusage() to use while_each_thread(), no changes in the
compiled code.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Kees Cook <keescook@chromium.org>
Reviewed-by: Sameer Nanda <snanda@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-23 16:37:02 -08:00
Oleg Nesterov 98611e4e6a exec: kill task_struct->did_exec
We can kill either task->did_exec or PF_FORKNOEXEC, they are mutually
exclusive.  The patch kills ->did_exec because it has a single user.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-23 16:37:02 -08:00
Daeseok Youn 68ce670b6e kernel/fork.c: remove redundant NULL check in dup_mm()
current->mm doesn't need a NULL check in dup_mm().  Becasue dup_mm() is
used only in copy_mm() and current->mm is checked whether it is NULL or
not in copy_mm() before calling dup_mm().

Signed-off-by: Daeseok Youn <daeseok.youn@gmail.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-23 16:37:02 -08:00
Daeseok Youn 5d59e18270 kernel/fork.c: fix coding style issues
Fix errors reported by checkpatch.pl.  One error is parentheses, the other
is a whitespace issue.

Signed-off-by: Daeseok Youn <daeseok.youn@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-23 16:37:02 -08:00
DaeSeok Youn ff252c1fc5 kernel/fork.c: make dup_mm() static
dup_mm() is used only in kernel/fork.c

Signed-off-by: Daeseok Youn <daeseok.youn@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-23 16:37:02 -08:00
Arun KS 1d3fa37034 printk: flush conflicting continuation line
An earlier newline was missing and current print is from different task.
In this scenario flush the continuation line and store this line
seperatly.

This patch fix the below scenario of timestamp interleaving,
   [   28.154370 ] read_word_reg : reg[0x 3], reg[0x 4]  data [0x 642]
   [   28.155428 ] uart disconnect
   [   31.947341 ] dvfs[cpufreq.c<275>]:plug-in cpu<1> done
   [   28.155445 ] UART detached : send switch state 201
   [   32.014112 ] read_reg : reg[0x 3] data[0x21]

[akpm@linux-foundation.org: simplify and condense the code]
Signed-off-by: Arun KS <getarunks@gmail.com>
Signed-off-by: Arun KS <arun.ks@broadcom.com>
Cc: Joe Perches <joe@perches.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Kay Sievers <kay@vrfy.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-23 16:36:56 -08:00
Andi Kleen 54a43d5498 numa: add a sysctl for numa_balancing
Add a working sysctl to enable/disable automatic numa memory balancing
at runtime.

This allows us to track down performance problems with this feature and
is generally a good idea.

This was possible earlier through debugfs, but only with special
debugging options set.  Also fix the boot message.

[akpm@linux-foundation.org: s/sched_numa_balancing/sysctl_numa_balancing/]
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-23 16:36:51 -08:00
Steven Rostedt (Red Hat) 3132e107d6 tracing: Check if tracing is enabled in trace_puts()
If trace_puts() is used very early in boot up, it can crash the machine
if it is called before the ring buffer is allocated. If a trace_printk()
is used with no arguments, then it will be converted into a trace_puts()
and suffer the same fate.

Cc: stable@vger.kernel.org # 3.10+
Fixes: 09ae72348e "tracing: Add trace_puts() for even faster trace_printk() tracing"
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-01-23 12:27:59 -05:00
Peter Zijlstra d375b4e0fa sched/clock: Fixup early initialization
The code would assume sched_clock_stable() and switch to !stable
later, this switch brings a discontinuity in time.

The discontinuity on switching from stable to unstable was always
present, but previously we would set stable/unstable before
initializing TSC and usually stick to the one we start out with.

So the static_key bits brought an extra switch where there previously
wasn't one.

Things are further complicated by the fact that we cannot use
static_key as early as we usually call set_sched_clock_stable().

Fix things by tracking the stable state in a regular variable and only
set the static_key to the right state on sched_clock_init(), which is
ran right after late_time_init->tsc_init().

Before this we would not be using the TSC anyway.

Reported-and-Tested-by: Sasha Levin <sasha.levin@oracle.com>
Reported-by: dyoung@redhat.com
Fixes: 35af99e646 ("sched/clock, x86: Use a static_key for sched_clock_stable")
Cc: jacob.jun.pan@linux.intel.com
Cc: Mike Galbraith <bitbucket@online.de>
Cc: hpa@zytor.com
Cc: paulmck@linux.vnet.ibm.com
Cc: John Stultz <john.stultz@linaro.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: lenb@kernel.org
Cc: rjw@rjwysocki.net
Cc: Eliezer Tamir <eliezer.tamir@linux.intel.com>
Cc: rui.zhang@intel.com
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20140122115918.GG3694@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-23 14:48:36 +01:00
Vincent Guittot 9390675af0 Revert "sched: Fix sleep time double accounting in enqueue entity"
This reverts commit 282cf499f0.

With the current implementation, the load average statistics of a sched entity
change according to other activity on the CPU even if this activity is done
between the running window of the sched entity and have no influence on the
running duration of the task.

When a task wakes up on the same CPU, we currently update last_runnable_update
with the return  of __synchronize_entity_decay without updating the
runnable_avg_sum and runnable_avg_period accordingly. In fact, we have to sync
the load_contrib of the se with the rq's blocked_load_contrib before removing
it from the latter (with __synchronize_entity_decay) but we must keep
last_runnable_update unchanged for updating runnable_avg_sum/period during the
next update_entity_load_avg.

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Reviewed-by: Ben Segall <bsegall@google.com>
Cc: pjt@google.com
Cc: alex.shi@linaro.org
Link: http://lkml.kernel.org/r/1390376734-6800-1-git-send-email-vincent.guittot@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-23 14:48:34 +01:00
Linus Torvalds 0dc3fd0249 You'll get a merge issue on Documentation/module-signing.txt: the one in
your tree is more recent, so ignore mine please.
 
 Cheers,
 Rusty.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.14 (GNU/Linux)
 
 iQIcBAABAgAGBQJS4IBbAAoJENkgDmzRrbjxvv4QAJRyid7F0kMMaImqNtLw+8Sc
 9QouX5nE0pt9VIDVs0v3JaJVcJrFrML498DaWM8yi1M4l11EIgWH0GYJ2DLHt3yf
 F9AGNmwZRzSJnPVyG7HWOlMbqycwh95P3ldJ3NmGClI++BKI/Qn9ffa/hdy7O+1E
 y7K0TWQQmG1HgIA7BVY491bzQTiUtelqrKShup0TQ6hRYYESs1g+JOClQgeGiFLG
 Q3JInsYjc7mOtijdfivln4sZKbZjW8oP1Sud/3FrXfLEyS6WiZPDSE82H17ZpZrc
 q+tlGXziXLJKo8G3jcjCDGvM0GsMDZZe5LJQ7Ax28K73I0o9N7JTbC1cePBg3AWz
 YW7tWS3DQ9aNfQOum6aCFv9Nu9RadsIVOqMCDdlULbe7fCtK0PeIBxdgLZQOBauM
 Z5+fCwtfiW6VASnz+UFo+n6V5yzmOEsJToem1a8+UySnlKuO+NkdACBOhCxfditj
 5nfOmj1rwoTKacc0jqTZ1twVBfVDSa3PYEqFf9e9DYx17ZjKEFZ0FV9NRX3EZnaN
 n5Z/Pw+jWMDL1FXFBMNfHUpfmq1spAul+h2+OMMc7+91v1TmRI4Xbeu1jWJeYv+S
 0woqfOEHhB+0a6vgxhsXg2SWkyTGHGq/UdX0J4tvN2txKwde8zi2f2TIDdyDhSw9
 YWghWp9FuiEuKJ8UgRX5
 =9fZu
 -----END PGP SIGNATURE-----

Merge tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux

Pull module updates from Rusty Russell.

* tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
  module: Add missing newline in printk call.
  module: fix coding style
  export: declare ksymtab symbols
  module.h: Remove unnecessary semicolon
  params: improve standard definitions
  Add Documentation/module-signing.txt file
2014-01-22 22:30:15 -08:00
Steven Rostedt (Red Hat) 71485c4589 tracing: Fix formatting of trace README file
Fix the formatting of the README file in the trace debugfs to fit in
an 80 character window.

Also add a comment about the event trigger counter with regards to
traceon and traceoff.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-01-23 00:10:04 -05:00
Tom Zanussi 26f255646e tracing/README: Add event file usage to tracing mini-HOWTO
It would be useful to have a cheat-sheet for everything under
tracing/events/ alongside the existing text describing the other files
in the tracing/ dir.

Add short descriptions of the directories and files under events/
along with examples, similar to the existing text for the other files
in tracing/.

Also clean up a few minor alignment problems noticed when adding the
new text.

Link: http://lkml.kernel.org/r/1389993104.3040.445.camel@empanada

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-01-22 23:06:57 -05:00
Linus Torvalds 60eaa0190f This pull request has a new feature to ftrace, namely the trace event
triggers by Tom Zanussi. A trigger is a way to enable an action when an
 event is hit. The actions are:
 
  o  trace on/off - enable or disable tracing
  o  snapshot     - save the current trace buffer in the snapshot
  o  stacktrace   - dump the current stack trace to the ringbuffer
  o  enable/disable events - enable or disable another event
 
 Namhyung Kim added updates to the tracing uprobes code. Having the
 uprobes add support for fetch methods.
 
 The rest are various bug fixes with the new code, and minor ones for
 the old code.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.15 (GNU/Linux)
 
 iQEcBAABAgAGBQJS3Z9fAAoJEKQekfcNnQGuFf0H/0CteaN+BJjpif6Tnxia15Sp
 pcftzU0lgqfNzsfitmbjiVTgXWqCghoZo8UI9tQZvBZ9wmDIxeXQR73uoBgVlSCQ
 ovyBO/R8r+lq+7EsDCwntZvrLbcdn6s/jzoruRvt7r35ghK5pH81DNR1BOzTQBhW
 x+361Xtc13aok7N7JN8KR96VDUP9f8KU6PWqJ5lgS2Zl+wbVw6b0p8OV8IMCHczP
 MdYrx8y4Jv4QWW7rMShAAVBe9qJQ56JWiWA17ysa4kY8BkKQ7QtlEFr+r1YY0nX5
 67brXiL8u0NFzRx5y2VRpGc25BbImnVBFpoLQ5Itluq9OdZE3aOQubzXlY70R6g=
 =Hkho
 -----END PGP SIGNATURE-----

Merge tag 'trace-3.14' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing updates from Steven Rostedt:
 "This pull request has a new feature to ftrace, namely the trace event
  triggers by Tom Zanussi.  A trigger is a way to enable an action when
  an event is hit.  The actions are:

   o  trace on/off - enable or disable tracing
   o  snapshot     - save the current trace buffer in the snapshot
   o  stacktrace   - dump the current stack trace to the ringbuffer
   o  enable/disable events - enable or disable another event

  Namhyung Kim added updates to the tracing uprobes code.  Having the
  uprobes add support for fetch methods.

  The rest are various bug fixes with the new code, and minor ones for
  the old code"

* tag 'trace-3.14' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (38 commits)
  tracing: Fix buggered tee(2) on tracing_pipe
  tracing: Have trace buffer point back to trace_array
  ftrace: Fix synchronization location disabling and freeing ftrace_ops
  ftrace: Have function graph only trace based on global_ops filters
  ftrace: Synchronize setting function_trace_op with ftrace_trace_function
  tracing: Show available event triggers when no trigger is set
  tracing: Consolidate event trigger code
  tracing: Fix counter for traceon/off event triggers
  tracing: Remove double-underscore naming in syscall trigger invocations
  tracing/kprobes: Add trace event trigger invocations
  tracing/probes: Fix build break on !CONFIG_KPROBE_EVENT
  tracing/uprobes: Add @+file_offset fetch method
  uprobes: Allocate ->utask before handler_chain() for tracing handlers
  tracing/uprobes: Add support for full argument access methods
  tracing/uprobes: Fetch args before reserving a ring buffer
  tracing/uprobes: Pass 'is_return' to traceprobe_parse_probe_arg()
  tracing/probes: Implement 'memory' fetch method for uprobes
  tracing/probes: Add fetch{,_size} member into deref fetch method
  tracing/probes: Move 'symbol' fetch method to kprobes
  tracing/probes: Implement 'stack' fetch method for uprobes
  ...
2014-01-22 16:35:21 -08:00
Linus Torvalds df32e43a54 Merge branch 'akpm' (incoming from Andrew)
Merge first patch-bomb from Andrew Morton:

 - a couple of misc things

 - inotify/fsnotify work from Jan

 - ocfs2 updates (partial)

 - about half of MM

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (117 commits)
  mm/migrate: remove unused function, fail_migrate_page()
  mm/migrate: remove putback_lru_pages, fix comment on putback_movable_pages
  mm/migrate: correct failure handling if !hugepage_migration_support()
  mm/migrate: add comment about permanent failure path
  mm, page_alloc: warn for non-blockable __GFP_NOFAIL allocation failure
  mm: compaction: reset scanner positions immediately when they meet
  mm: compaction: do not mark unmovable pageblocks as skipped in async compaction
  mm: compaction: detect when scanners meet in isolate_freepages
  mm: compaction: reset cached scanner pfn's before reading them
  mm: compaction: encapsulate defer reset logic
  mm: compaction: trace compaction begin and end
  memcg, oom: lock mem_cgroup_print_oom_info
  sched: add tracepoints related to NUMA task migration
  mm: numa: do not automatically migrate KSM pages
  mm: numa: trace tasks that fail migration due to rate limiting
  mm: numa: limit scope of lock for NUMA migrate rate limiting
  mm: numa: make NUMA-migrate related functions static
  lib/show_mem.c: show num_poisoned_pages when oom
  mm/hwpoison: add '#' to hwpoison_inject
  mm/memblock: use WARN_ONCE when MAX_NUMNODES passed as input parameter
  ...
2014-01-21 19:05:45 -08:00
Linus Torvalds f075e0f699 Merge branch 'for-3.14' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup updates from Tejun Heo:
 "The bulk of changes are cleanups and preparations for the upcoming
  kernfs conversion.

   - cgroup_event mechanism which is and will be used only by memcg is
     moved to memcg.

   - pidlist handling is updated so that it can be served by seq_file.

     Also, the list is not sorted if sane_behavior.  cgroup
     documentation explicitly states that the file is not sorted but it
     has been for quite some time.

   - All cgroup file handling now happens on top of seq_file.  This is
     to prepare for kernfs conversion.  In addition, all operations are
     restructured so that they map 1-1 to kernfs operations.

   - Other cleanups and low-pri fixes"

* 'for-3.14' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (40 commits)
  cgroup: trivial style updates
  cgroup: remove stray references to css_id
  doc: cgroups: Fix typo in doc/cgroups
  cgroup: fix fail path in cgroup_load_subsys()
  cgroup: fix missing unlock on error in cgroup_load_subsys()
  cgroup: remove for_each_root_subsys()
  cgroup: implement for_each_css()
  cgroup: factor out cgroup_subsys_state creation into create_css()
  cgroup: combine css handling loops in cgroup_create()
  cgroup: reorder operations in cgroup_create()
  cgroup: make for_each_subsys() useable under cgroup_root_mutex
  cgroup: css iterations and css_from_dir() are safe under cgroup_mutex
  cgroup: unify pidlist and other file handling
  cgroup: replace cftype->read_seq_string() with cftype->seq_show()
  cgroup: attach cgroup_open_file to all cgroup files
  cgroup: generalize cgroup_pidlist_open_file
  cgroup: unify read path so that seq_file is always used
  cgroup: unify cgroup_write_X64() and cgroup_write_string()
  cgroup: remove cftype->read(), ->read_map() and ->write()
  hugetlb_cgroup: convert away from cftype->read()
  ...
2014-01-21 17:51:34 -08:00
Linus Torvalds 4a2829b976 Merge branch 'for-3.14' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq
Pull workqueue update from Tejun Heo:
 "Just one patch to add destroy_work_on_stack() annotations to help
  debugobj debugging"

* 'for-3.14' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
  workqueue: Calling destroy_work_on_stack() to pair with INIT_WORK_ONSTACK()
2014-01-21 17:46:31 -08:00
Mel Gorman 286549dcaf sched: add tracepoints related to NUMA task migration
This patch adds three tracepoints
 o trace_sched_move_numa	when a task is moved to a node
 o trace_sched_swap_numa	when a task is swapped with another task
 o trace_sched_stick_numa	when a numa-related migration fails

The tracepoints allow the NUMA scheduler activity to be monitored and the
following high-level metrics can be calculated

 o NUMA migrated stuck	 nr trace_sched_stick_numa
 o NUMA migrated idle	 nr trace_sched_move_numa
 o NUMA migrated swapped nr trace_sched_swap_numa
 o NUMA local swapped	 trace_sched_swap_numa src_nid == dst_nid (should never happen)
 o NUMA remote swapped	 trace_sched_swap_numa src_nid != dst_nid (should == NUMA migrated swapped)
 o NUMA group swapped	 trace_sched_swap_numa src_ngid == dst_ngid
			 Maybe a small number of these are acceptable
			 but a high number would be a major surprise.
			 It would be even worse if bounces are frequent.
 o NUMA avg task migs.	 Average number of migrations for tasks
 o NUMA stddev task mig	 Self-explanatory
 o NUMA max task migs.	 Maximum number of migrations for a single task

In general the intent of the tracepoints is to help diagnose problems
where automatic NUMA balancing appears to be doing an excessive amount
of useless work.

[akpm@linux-foundation.org: remove semicolon-after-if, repair coding-style]
Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Alex Thorlton <athorlton@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-21 16:19:48 -08:00
Santosh Shilimkar c2f69cdafe kernel/power/snapshot.c: use memblock apis for early memory allocations
Switch to memblock interfaces for early memory allocator instead of
bootmem allocator.  No functional change in beahvior than what it is in
current code from bootmem users points of view.

Archs already converted to NO_BOOTMEM now directly use memblock
interfaces instead of bootmem wrappers build on top of memblock.  And
the archs which still uses bootmem, these new apis just fallback to
exiting bootmem APIs.

Acked-by: "Rafael J. Wysocki" <rjw@sisk.pl>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Grygorii Strashko <grygorii.strashko@ti.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Paul Walmsley <paul@pwsan.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Tejun Heo <tj@kernel.org>
Cc: Tony Lindgren <tony@atomide.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-21 16:19:47 -08:00
Santosh Shilimkar 9da791dfab kernel/printk/printk.c: use memblock apis for early memory allocations
Switch to memblock interfaces for early memory allocator instead of
bootmem allocator.  No functional change in beahvior than what it is in
current code from bootmem users points of view.

Archs already converted to NO_BOOTMEM now directly use memblock
interfaces instead of bootmem wrappers build on top of memblock.  And
the archs which still uses bootmem, these new apis just fallback to
exiting bootmem APIs.

Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Paul Walmsley <paul@pwsan.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Tony Lindgren <tony@atomide.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-21 16:19:47 -08:00
Oleg Nesterov 0c740d0afc introduce for_each_thread() to replace the buggy while_each_thread()
while_each_thread() and next_thread() should die, almost every lockless
usage is wrong.

1. Unless g == current, the lockless while_each_thread() is not safe.

   while_each_thread(g, t) can loop forever if g exits, next_thread()
   can't reach the unhashed thread in this case. Note that this can
   happen even if g is the group leader, it can exec.

2. Even if while_each_thread() itself was correct, people often use
   it wrongly.

   It was never safe to just take rcu_read_lock() and loop unless
   you verify that pid_alive(g) == T, even the first next_thread()
   can point to the already freed/reused memory.

This patch adds signal_struct->thread_head and task->thread_node to
create the normal rcu-safe list with the stable head.  The new
for_each_thread(g, t) helper is always safe under rcu_read_lock() as
long as this task_struct can't go away.

Note: of course it is ugly to have both task_struct->thread_node and the
old task_struct->thread_group, we will kill it later, after we change
the users of while_each_thread() to use for_each_thread().

Perhaps we can kill it even before we convert all users, we can
reimplement next_thread(t) using the new thread_head/thread_node.  But
we can't do this right now because this will lead to subtle behavioural
changes.  For example, do/while_each_thread() always sees at least one
task, while for_each_thread() can do nothing if the whole thread group
has died.  Or thread_group_empty(), currently its semantics is not clear
unless thread_group_leader(p) and we need to audit the callers before we
can change it.

So this patch adds the new interface which has to coexist with the old
one for some time, hopefully the next changes will be more or less
straightforward and the old one will go away soon.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Sergey Dyasly <dserrg@gmail.com>
Tested-by: Sergey Dyasly <dserrg@gmail.com>
Reviewed-by: Sameer Nanda <snanda@chromium.org>
Acked-by: David Rientjes <rientjes@google.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mandeep Singh Baines <msb@chromium.org>
Cc: "Ma, Xindong" <xindong.ma@intel.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: "Tu, Xiaobing" <xiaobing.tu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-21 16:19:46 -08:00
Jerome Marchand 49f0ce5f92 mm: add overcommit_kbytes sysctl variable
Some applications that run on HPC clusters are designed around the
availability of RAM and the overcommit ratio is fine tuned to get the
maximum usage of memory without swapping.  With growing memory, the
1%-of-all-RAM grain provided by overcommit_ratio has become too coarse
for these workload (on a 2TB machine it represents no less than 20GB).

This patch adds the new overcommit_kbytes sysctl variable that allow a
much finer grain.

[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: fix nommu build]
Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-21 16:19:44 -08:00
Jan Kara 56b27cf603 fsnotify: remove pointless NULL initializers
We usually rely on the fact that struct members not specified in the
initializer are set to NULL.  So do that with fsnotify function pointers
as well.

Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Eric Paris <eparis@parisplace.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-21 16:19:41 -08:00
Jan Kara 83c4c4b0a3 fsnotify: remove .should_send_event callback
After removing event structure creation from the generic layer there is
no reason for separate .should_send_event and .handle_event callbacks.
So just remove the first one.

Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Eric Paris <eparis@parisplace.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-21 16:19:41 -08:00
Jan Kara 7053aee26a fsnotify: do not share events between notification groups
Currently fsnotify framework creates one event structure for each
notification event and links this event into all interested notification
groups.  This is done so that we save memory when several notification
groups are interested in the event.  However the need for event
structure shared between inotify & fanotify bloats the event structure
so the result is often higher memory consumption.

Another problem is that fsnotify framework keeps path references with
outstanding events so that fanotify can return open file descriptors
with its events.  This has the undesirable effect that filesystem cannot
be unmounted while there are outstanding events - a regression for
inotify compared to a situation before it was converted to fsnotify
framework.  For fanotify this problem is hard to avoid and users of
fanotify should kind of expect this behavior when they ask for file
descriptors from notified files.

This patch changes fsnotify and its users to create separate event
structure for each group.  This allows for much simpler code (~400 lines
removed by this patch) and also smaller event structures.  For example
on 64-bit system original struct fsnotify_event consumes 120 bytes, plus
additional space for file name, additional 24 bytes for second and each
subsequent group linking the event, and additional 32 bytes for each
inotify group for private data.  After the conversion inotify event
consumes 48 bytes plus space for file name which is considerably less
memory unless file names are long and there are several groups
interested in the events (both of which are uncommon).  Fanotify event
fits in 56 bytes after the conversion (fanotify doesn't care about file
names so its events don't have to have it allocated).  A win unless
there are four or more fanotify groups interested in the event.

The conversion also solves the problem with unmount when only inotify is
used as we don't have to grab path references for inotify events.

[hughd@google.com: fanotify: fix corruption preventing startup]
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Eric Paris <eparis@parisplace.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-21 16:19:41 -08:00
Tetsuo Handa 22e669568d module: Add missing newline in printk call.
Add missing \n and also follow commit bddb12b3 "kernel/module.c: use pr_foo()".

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2014-01-21 09:59:16 +10:30
Linus Torvalds 6c64614356 Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer changes from Ingo Molnar:
  - ARM clocksource/clockevent improvements and fixes
  - generic timekeeping updates: TAI fixes/improvements, cleanups
  - Posix cpu timer cleanups and improvements
  - dynticks updates: full dynticks bugfixes, optimizations and cleanups

* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (46 commits)
  clocksource: Timer-sun5i: Switch to sched_clock_register()
  timekeeping: Remove comment that's mostly out of date
  rtc-cmos: Add an alarm disable quirk
  timekeeper: fix comment typo for tk_setup_internals()
  timekeeping: Fix missing timekeeping_update in suspend path
  timekeeping: Fix CLOCK_TAI timer/nanosleep delays
  tick/timekeeping: Call update_wall_time outside the jiffies lock
  timekeeping: Avoid possible deadlock from clock_was_set_delayed
  timekeeping: Fix potential lost pv notification of time change
  timekeeping: Fix lost updates to tai adjustment
  clocksource: sh_cmt: Add clk_prepare/unprepare support
  clocksource: bcm_kona_timer: Remove unused bcm_timer_ids
  clocksource: vt8500: Remove deprecated IRQF_DISABLED
  clocksource: tegra: Remove deprecated IRQF_DISABLED
  clocksource: misc drivers: Remove deprecated IRQF_DISABLED
  clocksource: sh_mtu2: Remove unnecessary platform_set_drvdata()
  clocksource: sh_tmu: Remove unnecessary platform_set_drvdata()
  clocksource: armada-370-xp: Enable timer divider only when needed
  clocksource: clksrc-of: Warn if no clock sources are found
  clocksource: orion: Switch to sched_clock_register()
  ...
2014-01-20 11:34:26 -08:00
Linus Torvalds a0fa1dd3cd Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler changes from Ingo Molnar:

 - Add the initial implementation of SCHED_DEADLINE support: a real-time
   scheduling policy where tasks that meet their deadlines and
   periodically execute their instances in less than their runtime quota
   see real-time scheduling and won't miss any of their deadlines.
   Tasks that go over their quota get delayed (Available to privileged
   users for now)

 - Clean up and fix preempt_enable_no_resched() abuse all around the
   tree

 - Do sched_clock() performance optimizations on x86 and elsewhere

 - Fix and improve auto-NUMA balancing

 - Fix and clean up the idle loop

 - Apply various cleanups and fixes

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (60 commits)
  sched: Fix __sched_setscheduler() nice test
  sched: Move SCHED_RESET_ON_FORK into attr::sched_flags
  sched: Fix up attr::sched_priority warning
  sched: Fix up scheduler syscall LTP fails
  sched: Preserve the nice level over sched_setscheduler() and sched_setparam() calls
  sched/core: Fix htmldocs warnings
  sched/deadline: No need to check p if dl_se is valid
  sched/deadline: Remove unused variables
  sched/deadline: Fix sparse static warnings
  m68k: Fix build warning in mac_via.h
  sched, thermal: Clean up preempt_enable_no_resched() abuse
  sched, net: Fixup busy_loop_us_clock()
  sched, net: Clean up preempt_enable_no_resched() abuse
  sched/preempt: Fix up missed PREEMPT_NEED_RESCHED folding
  sched/preempt, locking: Rework local_bh_{dis,en}able()
  sched/clock, x86: Avoid a runtime condition in native_sched_clock()
  sched/clock: Fix up clear_sched_clock_stable()
  sched/clock, x86: Use a static_key for sched_clock_stable
  sched/clock: Remove local_irq_disable() from the clocks
  sched/clock, x86: Rewrite cyc2ns() to avoid the need to disable IRQs
  ...
2014-01-20 10:42:08 -08:00
Linus Torvalds 9326657abe Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf updates from Ingo Molnar:
 "Kernel side changes:

   - Add Intel RAPL energy counter support (Stephane Eranian)
   - Clean up uprobes (Oleg Nesterov)
   - Optimize ring-buffer writes (Peter Zijlstra)

  Tooling side changes, user visible:

   - 'perf diff':
     - Add column colouring improvements (Ramkumar Ramachandra)

  - 'perf kvm':
     - Add guest related improvements, including allowing to specify a
       directory with guest specific /proc information (Dongsheng Yang)
     - Add shell completion support (Ramkumar Ramachandra)
     - Add '-v' option (Dongsheng Yang)
     - Support --guestmount (Dongsheng Yang)

   - 'perf probe':
     - Support showing source code, asking for variables to be collected
       at probe time and other 'perf probe' operations that use DWARF
       information.

       This supports only binaries with debugging information at this
       time, detached debuginfo (aka debuginfo packages) support should
       come in later patches (Masami Hiramatsu)

   - 'perf record':
     - Rename --no-delay option to --no-buffering, better reflecting its
       purpose and freeing up '--delay' to take the place of
       '--initial-delay', so that 'record' and 'stat' are consistent
       (Arnaldo Carvalho de Melo)
     - Default the -t/--thread option to no inheritance (Adrian Hunter)
     - Make per-cpu mmaps the default (Adrian Hunter)

   - 'perf report':
     - Improve callchain processing performance (Frederic Weisbecker)
     - Retain bfd reference to lookup source line numbers, greatly
       optimizing, among other use cases, 'perf report -s srcline'
       (Adrian Hunter)
     - Improve callchain processing performance even more (Namhyung Kim)
     - Add a perf.data file header window in the 'perf report' TUI,
       associated with the 'i' hotkey, providing a counterpart to the
       --header option in the stdio UI (Namhyung Kim)

   - 'perf script':
     - Add an option in 'perf script' to print the source line number
       (Adrian Hunter)
     - Add --header/--header-only options to 'script' and 'report', the
       default is not tho show the header info, but as this has been the
       default for some time, leave a single line explaining how to
       obtain that information (Jiri Olsa)
     - Add options to show comm, fork, exit and mmap PERF_RECORD_ events
       (Namhyung Kim)
     - Print callchains and symbols if they exist (David Ahern)

   - 'perf timechart'
     - Add backtrace support to CPU info
     - Print pid along the name
     - Add support for CPU topology
     - Add new option --highlight'ing threads, be it by name or, if a
       numeric value is provided, that run more than given duration
       (Stanislav Fomichev)

   - 'perf top':
     - Make 'perf top -g' refer to callchains, for consistency with
       other tools (David Ahern)

   - 'perf trace':
     - Handle old kernels where the "raw_syscalls" tracepoints were
       called plain "syscalls" (David Ahern)
     - Remove thread summary coloring, by Pekka Enberg.
     - Honour -m option in 'trace', the tool was offering the option to
       set the mmap size, but wasn't using it when doing the actual mmap
       on the events file descriptors (Jiri Olsa)

   - generic:
     - Backport libtraceevent plugin support (trace-cmd repository, with
       plugins for jbd2, hrtimer, kmem, kvm, mac80211, sched_switch,
       function, xen, scsi, cfg80211 (Jiri Olsa)
     - Print session information only if --stdio is given (Namhyung Kim)

  Tooling side changes, developer visible (plumbing):

   - Improve 'perf probe' exit path, release resources (Masami
     Hiramatsu)
   - Improve libtraceevent plugins exit path, allowing the registering
     of an unregister handler to be called at exit time (Namhyung Kim)
   - Add an alias to the build test makefile (make -C tools/perf
     build-test) (Namhyung Kim)
   - Get rid of die() and friends (good riddance!) in libtraceevent
     (Namhyung Kim)
   - Fix cross build problems related to pkgconfig and CROSS_COMPILE not
     being propagated to the feature tests, leading to features being
     tested in the host and then being enabled on the target (Mark
     Rutland)
   - Improve forked workload error reporting by sending the errno in the
     signal data queueing integer field, using sigqueue and by doing the
     signal setup in the evlist methods, removing open coded equivalents
     in various tools (Arnaldo Carvalho de Melo)
   - Do more auto exit cleanup chores in the 'evlist' destructor, so
     that the tools don't have to all do that sequence (Arnaldo Carvalho
     de Melo)
   - Pack 'struct perf_session_env' and 'struct trace' (Arnaldo Carvalho
     de Melo)
   - Add test for building detached source tarballs (Arnaldo Carvalho de
     Melo)
   - Move some header files (tools/perf/ to tools/include/ to make them
     available to other tools/ dwelling codebases (Namhyung Kim)
   - Move logic to warn about kptr_restrict'ed kernels to separate
     function in 'report' (Arnaldo Carvalho de Melo)
   - Move hist browser selection code to separate function (Arnaldo
     Carvalho de Melo)
   - Move histogram entries collapsing to separate function (Arnaldo
     Carvalho de Melo)
   - Introduce evlist__for_each() & friends (Arnaldo Carvalho de Melo)
   - Automate setup of FEATURE_CHECK_(C|LD)FLAGS-all variables (Jiri
     Olsa)
   - Move arch setup into seprate Makefile (Jiri Olsa)
   - Make libtraceevent install target quieter (Jiri Olsa)
   - Make tests/make output more compact (Jiri Olsa)
   - Ignore generated files in feature-checks (Chunwei Chen)
   - Introduce pevent_filter_strerror() in libtraceevent, similar in
     purpose to libc's strerror() function (Namhyung Kim)
   - Use perf_data_file methods to write output file in 'record' and
     'inject' (Jiri Olsa)
   - Use pr_*() functions where applicable in 'report' (Namhyumg Kim)
   - Add 'machine' 'addr_location' struct to have full picture (machine,
     thread, map, symbol, addr) for a (partially) resolved address,
     reducing function signatures (Arnaldo Carvalho de Melo)
   - Reduce code duplication in the histogram entry creation/insertion
     (Arnaldo Carvalho de Melo)
   - Auto allocate annotation histogram data structures (Arnaldo
     Carvalho de Melo)
   - No need to test against NULL before calling free, also set freed
     memory in struct pointers to NULL, to help fixing use after free
     bugs (Arnaldo Carvalho de Melo)
   - Rename some struct DSO binary_type related members and methods, to
     clarify its purpose and need for differentiation (symtab_type, ie
     one is about the files .text, CFI, etc, i.e.  its binary contents,
     and the other is about where the symbol table came from (Arnaldo
     Carvalho de Melo)
   - Convert to new topic libraries, starting with an API one (sysfs,
     debugfs, etc), renaming liblk in the process (Borislav Petkov)
   - Get rid of some more panic() like error handling in libtraceevent.
     (Namhyung Kim)
   - Get rid of panic() like calls in libtraceevent (Namyung Kim)
   - Start carving out symbol parsing routines (perf, just moving
     routines to topic files in tools/lib/symbol/, tools that want to
     use it need to integrate it directly, ie no
     tools/lib/symbol/Makefile is provided (Arnaldo Carvalho de Melo)
   - Assorted refactoring patches, moving code around and adding utility
     evlist methods that will be used in the IPT patchset (Adrian
     Hunter)
   - Assorted mmap_pages handling fixes (Adrian Hunter)
   - Several man pages typo fixes (Dongsheng Yang)
   - Get rid of several die() calls in libtraceevent (Namhyung Kim)
   - Use basename() in a more robust way, to avoid problems related to
     different system library implementations for that function
     (Stephane Eranian)
   - Remove open coded management of short_name_allocated member (Adrian
     Hunter)
   - Several cleanups in the "dso" methods, constifying some parameters
     and renaming some fields to clarify its purpose (Arnaldo Carvalho
     de Melo)
   - Add per-feature check flags, fixing libunwind related build
     problems on some architectures (Jean Pihet)
   - Do not disable source line lookup just because of one failure.
     (Adrian Hunter)
   - Several 'perf kvm' man page corrections (Dongsheng Yang)
   - Correct the message in feature-libnuma checking, swowing the right
     devel package names for various distros (Dongsheng Yang)
   - Polish 'readn()' function and introduce its counterpart,
     'writen()' (Jiri Olsa)
   - Start moving timechart state from global variables to a 'perf_tool'
     derived 'timechart' struct (Arnaldo Carvalho de Melo)

  ... and lots of fixes and improvements I forgot to list"

* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (282 commits)
  perf tools: Remove unnecessary callchain cursor state restore on unmatch
  perf callchain: Spare double comparison of callchain first entry
  perf tools: Do proper comm override error handling
  perf symbols: Export elf_section_by_name and reuse
  perf probe: Release all dynamically allocated parameters
  perf probe: Release allocated probe_trace_event if failed
  perf tools: Add 'build-test' make target
  tools lib traceevent: Unregister handler when xen plugin is unloaded
  tools lib traceevent: Unregister handler when scsi plugin is unloaded
  tools lib traceevent: Unregister handler when jbd2 plugin is is unloaded
  tools lib traceevent: Unregister handler when cfg80211 plugin is unloaded
  tools lib traceevent: Unregister handler when mac80211 plugin is unloaded
  tools lib traceevent: Unregister handler when sched_switch plugin is unloaded
  tools lib traceevent: Unregister handler when kvm plugin is unloaded
  tools lib traceevent: Unregister handler when kmem plugin is unloaded
  tools lib traceevent: Unregister handler when hrtimer plugin is unloaded
  tools lib traceevent: Unregister handler when function plugin is unloaded
  tools lib traceevent: Add pevent_unregister_print_function()
  tools lib traceevent: Add pevent_unregister_event_handler()
  tools lib traceevent: fix pointer-integer size mismatch
  ...
2014-01-20 10:28:30 -08:00
Linus Torvalds a693c46e14 Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RCU updates from Ingo Molnar:
 - add RCU torture scripts/tooling
 - static analysis improvements
 - update RCU documentation
 - miscellaneous fixes

* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (52 commits)
  rcu: Remove "extern" from function declarations in kernel/rcu/rcu.h
  rcu: Remove "extern" from function declarations in include/linux/*rcu*.h
  rcu/torture: Dynamically allocate SRCU output buffer to avoid overflow
  rcu: Don't activate RCU core on NO_HZ_FULL CPUs
  rcu: Warn on allegedly impossible rcu_read_unlock_special() from irq
  rcu: Add an RCU_INITIALIZER for global RCU-protected pointers
  rcu: Make rcu_assign_pointer's assignment volatile and type-safe
  bonding: Use RCU_INIT_POINTER() for better overhead and for sparse
  rcu: Add comment on evaluate-once properties of rcu_assign_pointer().
  rcu: Provide better diagnostics for blocking in RCU callback functions
  rcu: Improve SRCU's grace-period comments
  rcu: Fix CONFIG_RCU_FANOUT_EXACT for odd fanout/leaf values
  rcu: Fix coccinelle warnings
  rcutorture: Stop tracking FSF's postal address
  rcutorture: Move checkarg to functions.sh
  rcutorture: Flag errors and warnings with color coding
  rcutorture: Record results from repeated runs of the same test scenario
  rcutorture: Test summary at end of run with less chattiness
  rcutorture: Update comment in kvm.sh listing typical RCU trace events
  rcutorture: Add tracing-enabled version of TREE08
  ...
2014-01-20 10:25:12 -08:00
Linus Torvalds 6ffbe7d1fa Merge branch 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull core locking changes from Ingo Molnar:
 - futex performance increases: larger hashes, smarter wakeups
 - mutex debugging improvements
 - lots of SMP ordering documentation updates
 - introduce the smp_load_acquire(), smp_store_release() primitives.
   (There are WIP patches that make use of them - not yet merged)
 - lockdep micro-optimizations
 - lockdep improvement: better cover IRQ contexts
 - liblockdep at last. We'll continue to monitor how useful this is

* 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (34 commits)
  futexes: Fix futex_hashsize initialization
  arch: Re-sort some Kbuild files to hopefully help avoid some conflicts
  futexes: Avoid taking the hb->lock if there's nothing to wake up
  futexes: Document multiprocessor ordering guarantees
  futexes: Increase hash table size for better performance
  futexes: Clean up various details
  arch: Introduce smp_load_acquire(), smp_store_release()
  arch: Clean up asm/barrier.h implementations using asm-generic/barrier.h
  arch: Move smp_mb__{before,after}_atomic_{inc,dec}.h into asm/atomic.h
  locking/doc: Rename LOCK/UNLOCK to ACQUIRE/RELEASE
  mutexes: Give more informative mutex warning in the !lock->owner case
  powerpc: Full barrier for smp_mb__after_unlock_lock()
  rcu: Apply smp_mb__after_unlock_lock() to preserve grace periods
  Documentation/memory-barriers.txt: Downgrade UNLOCK+BLOCK
  locking: Add an smp_mb__after_unlock_lock() for UNLOCK+BLOCK barrier
  Documentation/memory-barriers.txt: Document ACCESS_ONCE()
  Documentation/memory-barriers.txt: Prohibit speculative writes
  Documentation/memory-barriers.txt: Add long atomic examples to memory-barriers.txt
  Documentation/memory-barriers.txt: Add needed ACCESS_ONCE() calls to memory-barriers.txt
  Revert "smp/cpumask: Make CONFIG_CPUMASK_OFFSTACK=y usable without debug dependency"
  ...
2014-01-20 10:23:08 -08:00
Linus Torvalds 897aea303f Merge branch 'core-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull core debug changes from Ingo Molnar:
 "Currently there are two methods to set the panic_timeout: via
  'panic=X' boot commandline option, or via /proc/sys/kernel/panic.

  This tree adds a third panic_timeout configuration method:
  configuration via Kconfig, via CONFIG_PANIC_TIMEOUT=X - useful to
  distros that generally want their kernel defaults to come with the
  .config.

  CONFIG_PANIC_TIMEOUT defaults to 0, which was the previous default
  value of panic_timeout.

  Doing that unearthed a few arch trickeries regarding arch-special
  panic_timeout values and related complications - hopefully all
  resolved to the satisfaction of everyone"

* 'core-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  powerpc: Clean up panic_timeout usage
  MIPS: Remove panic_timeout settings
  panic: Make panic_timeout configurable
2014-01-20 10:22:12 -08:00
Al Viro 92fdd98cf8 tracing: Fix buggered tee(2) on tracing_pipe
In kernel/trace/trace.c we have this:
static void tracing_pipe_buf_release(struct pipe_inode_info *pipe,
                                     struct pipe_buffer *buf)
{
        __free_page(buf->page);
}
static const struct pipe_buf_operations tracing_pipe_buf_ops = {
        .can_merge              = 0,
        .map                    = generic_pipe_buf_map,
        .unmap                  = generic_pipe_buf_unmap,
        .confirm                = generic_pipe_buf_confirm,
        .release                = tracing_pipe_buf_release,
        .steal                  = generic_pipe_buf_steal,
        .get                    = generic_pipe_buf_get,
};
with
void generic_pipe_buf_get(struct pipe_inode_info *pipe, struct pipe_buffer *buf)
{
        page_cache_get(buf->page);
}

and I don't see anything that would've prevented tee(2) called on the pipe
that got stuff spliced into it from that sucker.  ->ops->get() will be
called, then buf gets copied into target pipe's ->bufs[] and eventually
readers get to both copies of the buffer.  With
	get_page(page)
	look at that page
	__free_page(page)
	look at that page
	__free_page(page)
which is not a good thing, to put it mildly.  AFAICS, that ought to use
the normal generic_pipe_buf_release() (aka page_cache_release(buf->page)),
shouldn't it?

[
 SDR - As trace_pipe just allocates the page with alloc_page(GFP_KERNEL),
  and doesn't do anything special with it (no LRU logic). The __free_page()
  should be fine, as it wont actually free a page with reference count.
  Maybe there's a chance to leak memory? Anyway, This change is at a minimum
  good for being symmetric with generic_pipe_buf_get, it is fine to add.
]

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
[ SDR - Removed no longer used tracing_pipe_buf_release ]
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-01-19 16:53:13 -05:00
SeongJae Park dd4b0a4676 cgroup: trivial style updates
* Place newline before function opening brace in cgroup_kill_sb().

* Insert space before assignment in attach_task_by_pid()

tj: merged two patches into one.

Signed-off-by: SeongJae Park <sj38.park@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2014-01-18 08:56:11 -05:00
Linus Torvalds 48ba620aab Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull namespace fixes from Eric Biederman:
 "This is a set of 3 regression fixes.

  This fixes /proc/mounts when using "ip netns add <netns>" to display
  the actual mount point.

  This fixes a regression in clone that broke lxc-attach.

  This fixes a regression in the permission checks for mounting /proc
  that made proc unmountable if binfmt_misc was in use.  Oops.

  My apologies for sending this pull request so late.  Al Viro gave
  interesting review comments about the d_path fix that I wanted to
  address in detail before I sent this pull request.  Unfortunately a
  bad round of colds kept from addressing that in detail until today.
  The executive summary of the review was:

  Al: Is patching d_path really sufficient?
      The prepend_path, d_path, d_absolute_path, and __d_path family of
      functions is a really mess.

  Me: Yes, patching d_path is really sufficient.  Yes, the code is mess.
      No it is not appropriate to rewrite all of d_path for a regression
      that has existed for entirely too long already, when a two line
      change will do"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
  vfs: Fix a regression in mounting proc
  fork:  Allow CLONE_PARENT after setns(CLONE_NEWPID)
  vfs: In d_path don't call d_dname on a mount point
2014-01-17 17:29:36 -08:00
Richard Guy Briggs 8626877b52 audit: fix location of __net_initdata for audit_net_ops
Fixup caught by checkpatch.

Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
2014-01-17 17:14:32 -05:00
Eric Paris 4f066328ab audit: remove pr_info for every network namespace
A message about creating the audit socket might be fine at startup, but
a pr_info for every single network namespace created on a system isn't
useful.

Signed-off-by: Eric Paris <eparis@redhat.com>
2014-01-17 17:04:38 -05:00
Peter Zijlstra eaad45132c sched: Fix __sched_setscheduler() nice test
With the introduction of sched_attr::sched_nice we need to check
if we've got permission to actually change the nice value.

Daniel found that can_nice() would always fail; and upon
inspection it turns out that can_nice() only tests to see if we
can lower the nice value, but it doesn't validate if we're
lowering or not.

Therefore amend the test to only call can_nice() when we lower
the nice value.

Reported-and-Tested-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: raistlin@linux.it
Cc: juri.lelli@gmail.com
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Fixes: d50dde5a10 ("sched: Add new scheduler syscalls to support an extended scheduling parameters ABI")
Link: http://lkml.kernel.org/r/20140116165425.GA9481@laptop.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-16 18:07:08 +01:00
Heiko Carstens 63b1a81699 futexes: Fix futex_hashsize initialization
"futexes: Increase hash table size for better performance"
introduces a new alloc_large_system_hash() call.

alloc_large_system_hash() however may allocate less memory than
requested, e.g. limited by MAX_ORDER.

Hence pass a pointer to alloc_large_system_hash() which will
contain the hash shift when the function returns. Afterwards
correctly set futex_hashsize.

Fixes a crash on s390 where the requested allocation size was
4MB but only 1MB was allocated.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Darren Hart <dvhart@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Waiman Long <Waiman.Long@hp.com>
Cc: Jason Low <jason.low2@hp.com>
Cc: Davidlohr Bueso <davidlohr@hp.com>
Link: http://lkml.kernel.org/r/20140116135450.GA4345@osiris
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-16 15:14:32 +01:00
Ingo Molnar 860fc2f264 Merge branch 'perf/urgent' into perf/core
Pick up the latest fixes, refresh the development tree.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-16 09:33:30 +01:00
Peter Zijlstra 7479f3c9cf sched: Move SCHED_RESET_ON_FORK into attr::sched_flags
I noticed the new sched_{set,get}attr() calls didn't properly deal
with the SCHED_RESET_ON_FORK hack.

Instead of propagating the flags in high bits nonsense use the brand
spanking new attr::sched_flags field.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Juri Lelli <juri.lelli@gmail.com>
Cc: Dario Faggioli <raistlin@linux.it>
Link: http://lkml.kernel.org/r/20140115162242.GJ31570@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-16 09:27:17 +01:00
Peter Zijlstra 0bb040a443 sched: Fix up attr::sched_priority warning
Fengguang Wu reported the following build warning:

  > kernel/sched/core.c:3067 __sched_setscheduler() warn: unsigned 'attr->sched_priority' is never less than zero.

Since it doesn't make sense for attr::sched_priority to be negative,
remove the check, since we already test for an upper limit any actual
negative values passed in through the old param::sched_priority field
will still be detected.

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Juri Lelli <juri.lelli@gmail.com>
Cc: Dario Faggioli <raistlin@linux.it>
Fixes: d50dde5a10 ("sched: Add new scheduler syscalls to support an extended scheduling parameters ABI")
Link: http://lkml.kernel.org/n/tip-fid9nalzii2r5voxtf4eh5kz@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-16 09:27:16 +01:00
Peter Zijlstra 39fd8fd22b sched: Fix up scheduler syscall LTP fails
Wu reported LTP failures:

  > ltp.sched_setparam02.1.TFAIL
  > ltp.sched_setparam02.2.TFAIL
  > ltp.sched_setparam02.3.TFAIL
  > ltp.sched_setparam03.1.TFAIL

There were 2 things wrong; firstly __setscheduler() failed on
sched_setparam()'s policy = -1, fix that by reading from p->policy in
that case.

Secondly, getparam() (and getattr()) would still report !0
sched_priority for !FIFO/RR tasks after having been such. So
unconditionally set p->rt_priority.

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Juri Lelli <juri.lelli@gmail.com>
Cc: Dario Faggioli <raistlin@linux.it>
Fixes: d50dde5a10 ("sched: Add new scheduler syscalls to support an extended scheduling parameters ABI")
Link: http://lkml.kernel.org/r/20140115153320.GH31570@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-16 09:27:15 +01:00
Peter Zijlstra e3de300d12 sched: Preserve the nice level over sched_setscheduler() and sched_setparam() calls
Previously sched_setscheduler() and sched_setparam() would not affect
the nice value of a task, restore this behaviour.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: raistlin@linux.it
Cc: juri.lelli@gmail.com
Cc: Michael wang <wangyun@linux.vnet.ibm.com>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Fixes: d50dde5a10 ("sched: Add new scheduler syscalls to support an extended scheduling parameters ABI")
Link: http://lkml.kernel.org/r/20140115113015.GB31570@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-16 09:27:14 +01:00
Juri Lelli 5778fccf36 sched/core: Fix htmldocs warnings
Fengguang Wu's kbuild test robot reported the following new htmldocs warnings:

  >>> Warning(kernel/sched/core.c:3380): No description found for parameter 'uattr'
  >>> Warning(kernel/sched/core.c:3380): Excess function parameter 'attr' description in 'sys_sched_setattr'
  >>> Warning(kernel/sched/core.c:3520): No description found for parameter 'uattr'
  >>> Warning(kernel/sched/core.c:3520): Excess function parameter 'attr' description in 'sys_sched_getattr'

The second argument to sys_sched_{setattr,getattr}() is named uattr (not attr).

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Juri Lelli <juri.lelli@gmail.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Dario Faggioli <raistlin@linux.it>
Fixes: d50dde5a10 ("sched: Add new scheduler syscalls to support an extended scheduling parameters ABI")
Link: http://lkml.kernel.org/r/52D5552D.5000102@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-16 09:27:12 +01:00
Juri Lelli 71362650b5 sched/deadline: No need to check p if dl_se is valid
Dan Carpenter reported new 'Smatch' warnings:

  > tree:   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git sched/core
  > head:   130816ce4d
  > commit: 1baca4ce16 [17/50] sched/deadline: Add SCHED_DEADLINE SMP-related data structures & logic
  >
  > kernel/sched/deadline.c:937 pick_next_task_dl() warn: variable dereferenced before check 'p' (see line 934)

BUG_ON() already fires if pick_next_dl_entity() doesn't return a valid
dl_se. No need to check if p is valid afterward.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Juri Lelli <juri.lelli@gmail.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Fixes: 1baca4ce16 ("sched/deadline: Add SCHED_DEADLINE SMP-related data structures & logic")
Link: http://lkml.kernel.org/r/52D54E25.6060100@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-16 09:27:11 +01:00
Peter Zijlstra d8bf52311e sched/deadline: Remove unused variables
fix these new sparse warnings:

  >> kernel/sched/core.c:305:14: sparse: symbol 'sysctl_sched_dl_period' was not declared. Should it be static?
  >> kernel/sched/core.c:306:5: sparse: symbol 'sysctl_sched_dl_runtime' was not declared. Should it be static?

Better still, they're completely unused so remove them.

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Juri Lelli <juri.lelli@gmail.com>
Link: http://lkml.kernel.org/n/tip-ke0shkG7vMnzmcdqhhiymyem@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-16 09:27:10 +01:00
Fengguang Wu 88f1ebbc25 sched/deadline: Fix sparse static warnings
new sparse warnings:

  >> kernel/sched/cpudeadline.c:38:6: sparse: symbol 'cpudl_exchange' was not declared. Should it be static?
  >> kernel/sched/cpudeadline.c:46:6: sparse: symbol 'cpudl_heapify' was not declared. Should it be static?
  >> kernel/sched/cpudeadline.c:71:6: sparse: symbol 'cpudl_change_key' was not declared. Should it be static?
  >> kernel/sched/cpudeadline.c:195:15: sparse: memset with byte count of 163928

Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Juri Lelli <juri.lelli@gmail.com>
Fixes: 6bfd6d72f5 ("sched/deadline: speed up SCHED_DEADLINE pushes with a push-heap")
Link: http://lkml.kernel.org/r/52d47f8c.EYJsA5+mELPBk4t6\%fengguang.wu@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-16 09:27:03 +01:00
Linus Torvalds 85ce70fdf4 Merge branches 'sched-urgent-for-linus' and 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler and timer fixes from Ingo Molnar:
 "Contains a fix for a scheduler bug that manifested itself as a 3D
  performance regression and a crash fix for the ARM Cadence TTC clock
  driver"

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched: Calculate effective load even if local weight is 0

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  clocksource: cadence_ttc: Fix mutex taken inside interrupt context
2014-01-16 08:33:21 +07:00
Kevin Hilman 8fe8ff09ce sched/nohz: Fix overflow error in scheduler_tick_max_deferment()
While calculating the scheduler tick max deferment, the delta is
converted from microseconds to nanoseconds through a multiplication
against NSEC_PER_USEC.

But this microseconds operand is an unsigned int, thus the result may
likely overflow. The result is cast to u64 but only once the operation
is completed, which is too late to avoid overflown result.

This is currently not a problem because the scheduler tick max deferment
is 1 second. But this may become an issue as we plan to make this
value tunable.

So lets fix this by casting the usecs value to u64 before multiplying by
NSECS_PER_USEC.

Also to prevent from this kind of mistake to happen again, move this
ad-hoc jiffies -> nsecs conversion to a new helper.

Signed-off-by: Kevin Hilman <khilman@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Alex Shi <alex.shi@linaro.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Kevin Hilman <khilman@linaro.org>
Link: http://lkml.kernel.org/r/1387315388-31676-2-git-send-email-khilman@linaro.org
[move ad-hoc conversion to jiffies_to_nsecs helper]
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2014-01-16 00:08:12 +01:00
Alex Shi e9a2eb403b nohz_full: fix code style issue of tick_nohz_full_stop_tick
Code usually starts with 'tab' instead of 7 'space' in kernel

Signed-off-by: Alex Shi <alex.shi@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Alex Shi <alex.shi@linaro.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Kevin Hilman <khilman@linaro.org>
Link: http://lkml.kernel.org/r/1386074112-30754-2-git-send-email-alex.shi@linaro.org
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2014-01-15 23:07:11 +01:00
Frederic Weisbecker 855a0fc30b nohz: Get timekeeping max deferment outside jiffies_lock
We don't need to fetch the timekeeping max deferment under the
jiffies_lock seqlock.

If the clocksource is updated concurrently while we stop the tick,
stop machine is called and the tick will be reevaluated again along with
uptodate jiffies and its related values.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Alex Shi <alex.shi@linaro.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Kevin Hilman <khilman@linaro.org>
Link: http://lkml.kernel.org/r/1387320692-28460-9-git-send-email-fweisbec@gmail.com
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2014-01-15 23:07:11 +01:00
Frederic Weisbecker 5acac1be49 tick: Rename tick_check_idle() to tick_irq_enter()
This makes the code more symetric against the existing tick functions
called on irq exit: tick_irq_exit() and tick_nohz_irq_exit().

These function are also symetric as they mirror each other's action:
we start to account idle time on irq exit and we stop this accounting
on irq entry. Also the tick is stopped on irq exit and timekeeping
catches up with the tickless time elapsed until we reach irq entry.

This rename was suggested by Peter Zijlstra a long while ago but it
got forgotten in the mass.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Alex Shi <alex.shi@linaro.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Kevin Hilman <khilman@linaro.org>
Link: http://lkml.kernel.org/r/1387320692-28460-2-git-send-email-fweisbec@gmail.com
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2014-01-15 23:05:31 +01:00
Joe Perches 3e1d0bb622 audit: Convert int limit uses to u32
The equivalent uapi struct uses __u32 so make the kernel
uses u32 too.

This can prevent some oddities where the limit is
logged/emitted as a negative value.

Convert kstrtol to kstrtouint to disallow negative values.

Signed-off-by: Joe Perches <joe@perches.com>
[eparis: do not remove static from audit_default declaration]
2014-01-14 14:54:00 -05:00
Joe Perches d957f7b726 audit: Use more current logging style
Add pr_fmt to prefix "audit: " to output
Convert printk(KERN_<LEVEL> to pr_<level>
Coalesce formats
Use pr_cont
Move a brace after switch

Signed-off-by: Joe Perches <joe@perches.com>
2014-01-14 14:53:54 -05:00
Joe Perches b8dbc3241f audit: Use hex_byte_pack_upper
Using the generic kernel function causes the
object size to increase with gcc 4.8.1.

$ size kernel/audit.o*
   text	   data	    bss	    dec	    hex	filename
  18577	   6079	   8436	  33092	   8144	kernel/audit.o.new
  18579	   6015	   8420	  33014	   80f6	kernel/audit.o.old

Unsigned...
2014-01-14 14:53:50 -05:00