linux

Author	SHA1	Message	Date
Ingo Molnar	860fc2f264	Merge branch 'perf/urgent' into perf/core Pick up the latest fixes, refresh the development tree. Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-01-16 09:33:30 +01:00
Peter Zijlstra	7479f3c9cf	sched: Move SCHED_RESET_ON_FORK into attr::sched_flags I noticed the new sched_{set,get}attr() calls didn't properly deal with the SCHED_RESET_ON_FORK hack. Instead of propagating the flags in high bits nonsense use the brand spanking new attr::sched_flags field. Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Juri Lelli <juri.lelli@gmail.com> Cc: Dario Faggioli <raistlin@linux.it> Link: http://lkml.kernel.org/r/20140115162242.GJ31570@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-01-16 09:27:17 +01:00
Peter Zijlstra	0bb040a443	sched: Fix up attr::sched_priority warning Fengguang Wu reported the following build warning: > kernel/sched/core.c:3067 __sched_setscheduler() warn: unsigned 'attr->sched_priority' is never less than zero. Since it doesn't make sense for attr::sched_priority to be negative, remove the check, since we already test for an upper limit any actual negative values passed in through the old param::sched_priority field will still be detected. Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Juri Lelli <juri.lelli@gmail.com> Cc: Dario Faggioli <raistlin@linux.it> Fixes: `d50dde5a10` ("sched: Add new scheduler syscalls to support an extended scheduling parameters ABI") Link: http://lkml.kernel.org/n/tip-fid9nalzii2r5voxtf4eh5kz@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-01-16 09:27:16 +01:00
Peter Zijlstra	39fd8fd22b	sched: Fix up scheduler syscall LTP fails Wu reported LTP failures: > ltp.sched_setparam02.1.TFAIL > ltp.sched_setparam02.2.TFAIL > ltp.sched_setparam02.3.TFAIL > ltp.sched_setparam03.1.TFAIL There were 2 things wrong; firstly __setscheduler() failed on sched_setparam()'s policy = -1, fix that by reading from p->policy in that case. Secondly, getparam() (and getattr()) would still report !0 sched_priority for !FIFO/RR tasks after having been such. So unconditionally set p->rt_priority. Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Juri Lelli <juri.lelli@gmail.com> Cc: Dario Faggioli <raistlin@linux.it> Fixes: `d50dde5a10` ("sched: Add new scheduler syscalls to support an extended scheduling parameters ABI") Link: http://lkml.kernel.org/r/20140115153320.GH31570@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-01-16 09:27:15 +01:00
Peter Zijlstra	e3de300d12	sched: Preserve the nice level over sched_setscheduler() and sched_setparam() calls Previously sched_setscheduler() and sched_setparam() would not affect the nice value of a task, restore this behaviour. Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: raistlin@linux.it Cc: juri.lelli@gmail.com Cc: Michael wang <wangyun@linux.vnet.ibm.com> Cc: Daniel Lezcano <daniel.lezcano@linaro.org> Fixes: `d50dde5a10` ("sched: Add new scheduler syscalls to support an extended scheduling parameters ABI") Link: http://lkml.kernel.org/r/20140115113015.GB31570@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-01-16 09:27:14 +01:00
Juri Lelli	5778fccf36	sched/core: Fix htmldocs warnings Fengguang Wu's kbuild test robot reported the following new htmldocs warnings: >>> Warning(kernel/sched/core.c:3380): No description found for parameter 'uattr' >>> Warning(kernel/sched/core.c:3380): Excess function parameter 'attr' description in 'sys_sched_setattr' >>> Warning(kernel/sched/core.c:3520): No description found for parameter 'uattr' >>> Warning(kernel/sched/core.c:3520): Excess function parameter 'attr' description in 'sys_sched_getattr' The second argument to sys_sched_{setattr,getattr}() is named uattr (not attr). Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Juri Lelli <juri.lelli@gmail.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Dario Faggioli <raistlin@linux.it> Fixes: `d50dde5a10` ("sched: Add new scheduler syscalls to support an extended scheduling parameters ABI") Link: http://lkml.kernel.org/r/52D5552D.5000102@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-01-16 09:27:12 +01:00
Juri Lelli	71362650b5	sched/deadline: No need to check p if dl_se is valid Dan Carpenter reported new 'Smatch' warnings: > tree: git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git sched/core > head: `130816ce4d` > commit: `1baca4ce16` [17/50] sched/deadline: Add SCHED_DEADLINE SMP-related data structures & logic > > kernel/sched/deadline.c:937 pick_next_task_dl() warn: variable dereferenced before check 'p' (see line 934) BUG_ON() already fires if pick_next_dl_entity() doesn't return a valid dl_se. No need to check if p is valid afterward. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Juri Lelli <juri.lelli@gmail.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Fixes: `1baca4ce16` ("sched/deadline: Add SCHED_DEADLINE SMP-related data structures & logic") Link: http://lkml.kernel.org/r/52D54E25.6060100@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-01-16 09:27:11 +01:00
Peter Zijlstra	d8bf52311e	sched/deadline: Remove unused variables fix these new sparse warnings: >> kernel/sched/core.c:305:14: sparse: symbol 'sysctl_sched_dl_period' was not declared. Should it be static? >> kernel/sched/core.c:306:5: sparse: symbol 'sysctl_sched_dl_runtime' was not declared. Should it be static? Better still, they're completely unused so remove them. Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Juri Lelli <juri.lelli@gmail.com> Link: http://lkml.kernel.org/n/tip-ke0shkG7vMnzmcdqhhiymyem@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-01-16 09:27:10 +01:00
Fengguang Wu	88f1ebbc25	sched/deadline: Fix sparse static warnings new sparse warnings: >> kernel/sched/cpudeadline.c:38:6: sparse: symbol 'cpudl_exchange' was not declared. Should it be static? >> kernel/sched/cpudeadline.c:46:6: sparse: symbol 'cpudl_heapify' was not declared. Should it be static? >> kernel/sched/cpudeadline.c:71:6: sparse: symbol 'cpudl_change_key' was not declared. Should it be static? >> kernel/sched/cpudeadline.c:195:15: sparse: memset with byte count of 163928 Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Juri Lelli <juri.lelli@gmail.com> Fixes: `6bfd6d72f5` ("sched/deadline: speed up SCHED_DEADLINE pushes with a push-heap") Link: http://lkml.kernel.org/r/52d47f8c.EYJsA5+mELPBk4t6\%fengguang.wu@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-01-16 09:27:03 +01:00
Linus Torvalds	85ce70fdf4	Merge branches 'sched-urgent-for-linus' and 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler and timer fixes from Ingo Molnar: "Contains a fix for a scheduler bug that manifested itself as a 3D performance regression and a crash fix for the ARM Cadence TTC clock driver" * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched: Calculate effective load even if local weight is 0 * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: clocksource: cadence_ttc: Fix mutex taken inside interrupt context	2014-01-16 08:33:21 +07:00
Joe Perches	3e1d0bb622	audit: Convert int limit uses to u32 The equivalent uapi struct uses __u32 so make the kernel uses u32 too. This can prevent some oddities where the limit is logged/emitted as a negative value. Convert kstrtol to kstrtouint to disallow negative values. Signed-off-by: Joe Perches <joe@perches.com> [eparis: do not remove static from audit_default declaration]	2014-01-14 14:54:00 -05:00
Joe Perches	d957f7b726	audit: Use more current logging style Add pr_fmt to prefix "audit: " to output Convert printk(KERN_<LEVEL> to pr_<level> Coalesce formats Use pr_cont Move a brace after switch Signed-off-by: Joe Perches <joe@perches.com>	2014-01-14 14:53:54 -05:00
Joe Perches	b8dbc3241f	audit: Use hex_byte_pack_upper Using the generic kernel function causes the object size to increase with gcc 4.8.1. $ size kernel/audit.o* text data bss dec hex filename 18577 6079 8436 33092 8144 kernel/audit.o.new 18579 6015 8420 33014 80f6 kernel/audit.o.old Unsigned...	2014-01-14 14:53:50 -05:00
Steven Rostedt (Red Hat)	dced341b2d	tracing: Have trace buffer point back to trace_array The trace buffer has a descriptor pointer that goes back to the trace array. But it was never assigned. Luckily, nothing uses it (yet), but it will in the future. Although nothing currently uses this, if any of the new features get backported to older kernels, and because this is such a simple change, I'm marking it for stable too. Cc: stable@vger.kernel.org # v3.10+ Fixes: `12883efb67` "tracing: Consolidate max_tr into main trace_array structure" Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-01-14 10:19:46 -05:00
Eric Paris	1ce319f11c	audit: reorder AUDIT_TTY_SET arguments An admin is likely to want to see old and new values next to each other. Putting all of the old values followed by all of the new values is just hard to read as a human. Signed-off-by: Eric Paris <eparis@redhat.com>	2014-01-13 22:33:41 -05:00
Eric Paris	0e23baccaa	audit: rework AUDIT_TTY_SET to only grab spin_lock once We can simplify the AUDIT_TTY_SET code to only grab the spin_lock one time. We need to determine if the new values are valid and if so, set the new values at the same time we grab the old onces. While we are here get rid of 'res' and just use err. Signed-off-by: Eric Paris <eparis@redhat.com>	2014-01-13 22:33:41 -05:00
Eric Paris	3f0c5fad89	audit: remove needless switch in AUDIT_SET If userspace specified that it was setting values via the mask we do not need a second check to see if they also set the version field high enough to understand those values. (clearly if they set the mask they knew those values). Signed-off-by: Eric Paris <eparis@redhat.com>	2014-01-13 22:33:39 -05:00
Eric Paris	70249a9cfd	audit: use define's for audit version Give names to the audit versions. Just something for a userspace programmer to know what the version provides. Signed-off-by: Eric Paris <eparis@redhat.com>	2014-01-13 22:33:36 -05:00
Eric Paris	c81825dd6b	audit: wait_for_auditd rework for readability We had some craziness with signed to unsigned long casting which appears wholely unnecessary. Just use signed long. Even though 2 values of the math equation are unsigned longs the result is expected to be a signed long. So why keep casting the result to signed long? Just make it signed long and use it. We also remove the needless "timeout" variable. We already have the stack "sleep_time" variable. Just use that... Signed-off-by: Eric Paris <eparis@redhat.com>	2014-01-13 22:33:27 -05:00
Richard Guy Briggs	ad2ac26327	audit: log task info on feature change Add task information to the log when changing a feature state. Signed-off-by: Eric Paris <eparis@redhat.com>	2014-01-13 22:32:56 -05:00
Gao feng	de92fc97e1	audit: fix incorrect set of audit_sock NETLINK_CB(skb).sk is the socket of user space process, netlink_unicast in kauditd_send_skb wants the kernel side socket. Since the sk_state of audit netlink socket is not NETLINK_CONNECTED, so the netlink_getsockbyportid doesn't return -ECONNREFUSED. And the socket of userspace process can be released anytime, so the audit_sock may point to invalid socket. this patch sets the audit_sock to the kernel side audit netlink socket. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Acked-by: Eric Paris <eparis@redhat.com> Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Eric Paris <eparis@redhat.com>	2014-01-13 22:32:49 -05:00
Gao feng	11ee39ebf7	audit: print error message when fail to create audit socket print the error message and then return -ENOMEM. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Acked-by: Eric Paris <eparis@redhat.com> Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Eric Paris <eparis@redhat.com>	2014-01-13 22:32:44 -05:00
Richard Guy Briggs	5ee9a75c9f	audit: fix dangling keywords in audit_log_set_loginuid() output Remove spaces between "new", "old" label modifiers and "auid", "ses" labels in log output since userspace tools can't parse orphaned keywords. Make variable names more consistent and intuitive. Make audit_log_format() argument code easier to read. Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Eric Paris <eparis@redhat.com>	2014-01-13 22:32:38 -05:00
Richard Guy Briggs	724e4fcc8d	audit: log on errors from filter user rules An error on an AUDIT_NEVER rule disabled logging on that rule. On error on AUDIT_NEVER rules, log. Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Eric Paris <eparis@redhat.com>	2014-01-13 22:32:31 -05:00
Toshiyuki Okajima	6dd80aba90	audit: audit_log_start running on auditd should not stop The backlog cannot be consumed when audit_log_start is running on auditd even if audit_log_start calls wait_for_auditd to consume it. The situation is the deadlock because only auditd can consume the backlog. If the other process needs to send the backlog, it can be also stopped by the deadlock. So, audit_log_start running on auditd should not stop. You can see the deadlock with the following reproducer: # auditctl -a exit,always -S all # reboot Signed-off-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com> Reviewed-by: gaofeng@cn.fujitsu.com Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Eric Paris <eparis@redhat.com>	2014-01-13 22:32:22 -05:00
Richard Guy Briggs	1b7b533f65	audit: drop audit_cmd_lock in AUDIT_USER family of cases We do not need to hold the audit_cmd_mutex for this family of cases. The possible exception to this is the call to audit_filter_user(), so drop the lock immediately after. To help in fixing the race we are trying to avoid, make sure that nothing called by audit_filter_user() calls audit_log_start(). In particular, watch out for *_audit_rule_match(). This fix will take care of systemd and anything USING audit. It still means that we could race with something configuring audit and auditd shutting down. Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Reported-by: toshi.okajima@jp.fujitsu.com Tested-by: toshi.okajima@jp.fujitsu.com Signed-off-by: Eric Paris <eparis@redhat.com>	2014-01-13 22:32:11 -05:00
Eric Paris	4440e85481	audit: convert all sessionid declaration to unsigned int Right now the sessionid value in the kernel is a combination of u32, int, and unsigned int. Just use unsigned int throughout. Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Eric Paris <eparis@redhat.com>	2014-01-13 22:31:46 -05:00
Paul Davies C	ff235f51a1	audit: Added exe field to audit core dump signal log Currently when the coredump signals are logged by the audit system, the actual path to the executable is not logged. Without details of exe, the system admin may not have an exact idea on what program failed. This patch changes the audit_log_task() so that the path to the exe is also logged. This was copied from audit_log_task_info() and the latter enhanced to avoid disappearing text fields. Signed-off-by: Paul Davies C <pauldaviesc@gmail.com> Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Eric Paris <eparis@redhat.com>	2014-01-13 22:31:38 -05:00
Richard Guy Briggs	34eab0a7cd	audit: prevent an older auditd shutdown from orphaning a newer auditd startup There have been reports of auditd restarts resulting in kaudit not being able to find a newly registered auditd. It results in reports such as: kernel: [ 2077.233573] audit: NO daemon at audit_pid=1614 kernel: [ 2077.234712] audit: audit_lost=97 audit_rate_limit=0 audit_backlog_limit=320 kernel: [ 2077.234718] audit: auditd disappeared (previously mis-spelled "dissapeared") One possible cause is a race between the shutdown of an older auditd and a newer one. If the newer one sets the daemon pid to itself in kauditd before the older one has cleared the daemon pid, the newer daemon pid will be erased. This could be caused by an automated system, or by manual intervention, but in either case, there is no use in having the older daemon clear the daemon pid reference since its old pid is no longer being referenced. This patch will prevent that specific case, returning an error of EACCES. The case for preventing a newer auditd from registering itself if there is an existing auditd is a more difficult case that is beyond the scope of this patch. Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Eric Paris <eparis@redhat.com>	2014-01-13 22:31:27 -05:00
Richard Guy Briggs	ce0d9f0469	audit: refactor audit_receive_msg() to clarify AUDIT__RULE cases audit_receive_msg() needlessly contained a fallthrough case that called audit_receive_filter(), containing no common code between the cases. Separate them to make the logic clearer. Refactor AUDIT_LIST_RULES, AUDIT_ADD_RULE, AUDIT_DEL_RULE cases to create audit_rule_change(), audit_list_rules_send() functions. This should not functionally change the logic. Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Eric Paris <eparis@redhat.com>	2014-01-13 22:31:22 -05:00
Richard Guy Briggs	a06e56b2a1	audit: log AUDIT_TTY_SET config changes Log transition of config changes when AUDIT_TTY_SET is called, including both enabled and log_passwd values now in the struct. Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Eric Paris <eparis@redhat.com>	2014-01-13 22:31:15 -05:00
Richard Guy Briggs	04ee1a3b8f	audit: get rid of NO daemon at audit_pid=0 message kauditd_send_skb is called after audit_pid was checked to be non-zero. However, it can be set to 0 due to auditd exiting while kauditd_send_skb is still executed and this can result in a spurious warning about missing auditd. Re-check audit_pid before printing the message. Signed-off-by: Mateusz Guzik <mguzik@redhat.com> Cc: Eric Paris <eparis@redhat.com> Cc: linux-kernel@vger.kernel.org Acked-by: Eric Paris <eparis@redhat.com> Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Eric Paris <eparis@redhat.com>	2014-01-13 22:31:07 -05:00
Paul Davies C	61c0ee8792	audit: drop audit_log_abend() The audit_log_abend() is used only by the audit_core_dumps(). Thus there is no need of maintaining the audit_log_abend() as a separate function. This patch drops the audit_log_abend() and pushes its functionalities back to the audit_core_dumps(). Apart from that the "reason" field is also dropped from being logged since the reason can be deduced from the signal number. Signed-off-by: Paul Davies C <pauldaviesc@gmail.com> Acked-by: Eric Paris <eparis@redhat.com> Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Eric Paris <eparis@redhat.com>	2014-01-13 22:30:59 -05:00
Richard Guy Briggs	40c0775e5e	audit: allow unlimited backlog queue Since audit can already be disabled by "audit=0" on the kernel boot line, or by the command "auditctl -e 0", it would be more useful to have the audit_backlog_limit set to zero mean effectively unlimited (limited only by system RAM). Acked-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Eric Paris <eparis@redhat.com>	2014-01-13 22:30:38 -05:00
Gao feng	c2412d91c6	audit: don't generate loginuid log when audit disabled If audit is disabled, we shouldn't generate loginuid audit log. Acked-by: Eric Paris <eparis@redhat.com> Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Eric Paris <eparis@redhat.com>	2014-01-13 22:30:25 -05:00
Gao feng	4547b3bc43	audit: use old_lock in audit_set_feature we already have old_lock, no need to calculate it again. Acked-by: Eric Paris <eparis@redhat.com> Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Eric Paris <eparis@redhat.com>	2014-01-13 22:30:19 -05:00
Gao feng	b6c50fe0be	audit: don't generate audit feature changed log when audit disabled If audit is disabled,we shouldn't generate the audit log. Acked-by: Eric Paris <eparis@redhat.com> Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Eric Paris <eparis@redhat.com>	2014-01-13 22:29:06 -05:00
Gao feng	aabce351b5	audit: fix incorrect order of log new and old feature The order of new feature and old feature is incorrect, this patch fix it. Acked-by: Eric Paris <eparis@redhat.com> Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Eric Paris <eparis@redhat.com>	2014-01-13 22:29:00 -05:00
Gao feng	d3ca0344b2	audit: remove useless code in audit_enable Since kernel parameter is operated before initcall, so the audit_initialized must be AUDIT_UNINITIALIZED or DISABLED in audit_enable. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Eric Paris <eparis@redhat.com>	2014-01-13 22:28:53 -05:00
Richard Guy Briggs	51cc83f024	audit: add audit_backlog_wait_time configuration option reaahead-collector abuses the audit logging facility to discover which files are accessed at boot time to make a pre-load list Add a tuning option to audit_backlog_wait_time so that if auditd can't keep up, or gets blocked, the callers won't be blocked. Bump audit_status API version to "2". Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Eric Paris <eparis@redhat.com>	2014-01-13 22:28:45 -05:00
Richard Guy Briggs	09f883a902	audit: clean up AUDIT_GET/SET local variables and future-proof API Re-named confusing local variable names (status_set and status_get didn't agree with their command type name) and reduced their scope. Future-proof API changes by not depending on the exact size of the audit_status struct and by adding an API version field. Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Eric Paris <eparis@redhat.com>	2014-01-13 22:28:39 -05:00
Richard Guy Briggs	f910fde730	audit: add kernel set-up parameter to override default backlog limit The default audit_backlog_limit is 64. This was a reasonable limit at one time. systemd causes so much audit queue activity on startup that auditd doesn't start before the backlog queue has already overflowed by more than a factor of 2. On a system with audit= not set on the kernel command line, this isn't an issue since that history isn't kept for auditd when it is available. On a system with audit=1 set on the kernel command line, kaudit tries to keep that history until auditd is able to drain the queue. This default can be changed by the "-b" option in audit.rules once the system has booted, but won't help with lost messages on boot. One way to solve this would be to increase the default backlog queue size to avoid losing any messages before auditd is able to consume them. This would be overkill to the embedded community and insufficient for some servers. Another way to solve it might be to add a kconfig option to set the default based on the system type. An embedded system would get the current (or smaller) default, while Workstations might get more than now and servers might get more. None of these solutions helps if a system's compiled default is too small to see the lost messages without compiling a new kernel. This patch adds a kernel set-up parameter (audit already has one to enable/disable it) "audit_backlog_limit=<n>" that overrides the default to allow the system administrator to set the backlog limit. Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Eric Paris <eparis@redhat.com>	2014-01-13 22:28:31 -05:00
Dan Duval	7ecf69bf50	audit: efficiency fix 2: request exclusive wait since all need same resource These and similar errors were seen on a patched 3.8 kernel when the audit subsystem was overrun during boot: udevd[876]: worker [887] unexpectedly returned with status 0x0100 udevd[876]: worker [887] failed while handling '/devices/pci0000:00/0000:00:03.0/0000:40:00.0' udevd[876]: worker [880] unexpectedly returned with status 0x0100 udevd[876]: worker [880] failed while handling '/devices/LNXSYSTM:00/LNXPWRBN:00/input/input1/event1' udevadm settle - timeout of 180 seconds reached, the event queue contains: /sys/devices/LNXSYSTM:00/LNXPWRBN:00/input/input1/event1 (3995) /sys/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A08:00/INT3F0D:00 (4034) audit: audit_backlog=258 > audit_backlog_limit=256 audit: audit_lost=1 audit_rate_limit=0 audit_backlog_limit=256 The change below increases the efficiency of the audit code and prevents it from being overrun: Use add_wait_queue_exclusive() in wait_for_auditd() to put the thread on the wait queue. When kauditd dequeues an skb, all of the waiting threads are waiting for the same resource, but only one is going to get it, so there's no need to wake up more than one waiter. See: https://lkml.org/lkml/2013/9/2/479 Signed-off-by: Dan Duval <dan.duval@oracle.com> Signed-off-by: Chuck Anderson <chuck.anderson@oracle.com> Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Eric Paris <eparis@redhat.com>	2014-01-13 22:28:19 -05:00
Dan Duval	db89731940	audit: efficiency fix 1: only wake up if queue shorter than backlog limit These and similar errors were seen on a patched 3.8 kernel when the audit subsystem was overrun during boot: udevd[876]: worker [887] unexpectedly returned with status 0x0100 udevd[876]: worker [887] failed while handling '/devices/pci0000:00/0000:00:03.0/0000:40:00.0' udevd[876]: worker [880] unexpectedly returned with status 0x0100 udevd[876]: worker [880] failed while handling '/devices/LNXSYSTM:00/LNXPWRBN:00/input/input1/event1' udevadm settle - timeout of 180 seconds reached, the event queue contains: /sys/devices/LNXSYSTM:00/LNXPWRBN:00/input/input1/event1 (3995) /sys/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A08:00/INT3F0D:00 (4034) audit: audit_backlog=258 > audit_backlog_limit=256 audit: audit_lost=1 audit_rate_limit=0 audit_backlog_limit=256 The change below increases the efficiency of the audit code and prevents it from being overrun: Only issue a wake_up in kauditd if the length of the skb queue is less than the backlog limit. Otherwise, threads waiting in wait_for_auditd() will simply wake up, discover that the queue is still too long for them to proceed, and go back to sleep. This results in wasted context switches and machine cycles. kauditd_thread() is the only function that removes buffers from audit_skb_queue so we can't race. If we did, the timeout in wait_for_auditd() would expire and the waiting thread would continue. See: https://lkml.org/lkml/2013/9/2/479 Signed-off-by: Dan Duval <dan.duval@oracle.com> Signed-off-by: Chuck Anderson <chuck.anderson@oracle.com> Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Eric Paris <eparis@redhat.com>	2014-01-13 22:28:08 -05:00
Richard Guy Briggs	ae887e0bdc	audit: make use of remaining sleep time from wait_for_auditd If wait_for_auditd() times out, go immediately to the error function rather than retesting the loop conditions. Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Eric Paris <eparis@redhat.com>	2014-01-13 22:27:53 -05:00
Richard Guy Briggs	e789e561a5	audit: reset audit backlog wait time after error recovery When the audit queue overflows and times out (audit_backlog_wait_time), the audit queue overflow timeout is set to zero. Once the audit queue overflow timeout condition recovers, the timeout should be reset to the original value. See also: https://lkml.org/lkml/2013/9/2/473 Cc: stable@vger.kernel.org # v3.8-rc4+ Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com> Signed-off-by: Dan Duval <dan.duval@oracle.com> Signed-off-by: Chuck Anderson <chuck.anderson@oracle.com> Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Eric Paris <eparis@redhat.com>	2014-01-13 22:27:30 -05:00
Richard Guy Briggs	33faba7fa7	audit: listen in all network namespaces Convert audit from only listening in init_net to use register_pernet_subsys() to dynamically manage the netlink socket list. Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Eric Paris <eparis@redhat.com>	2014-01-13 22:27:24 -05:00
Richard Guy Briggs	2f2ad10133	audit: restore order of tty and ses fields in log output When being refactored from audit_log_start() to audit_log_task_info(), in commit `e23eb920` the tty and ses fields in the log output got transposed. Restore to original order to avoid breaking search tools. Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Eric Paris <eparis@redhat.com>	2014-01-13 22:27:16 -05:00
Richard Guy Briggs	f9441639e6	audit: fix netlink portid naming and types Normally, netlink ports use the PID of the userspace process as the port ID. If the PID is already in use by a port, the kernel will allocate another port ID to avoid conflict. Re-name all references to netlink ports from pid to portid to reflect this reality and avoid confusion with actual PIDs. Ports use the __u32 type, so re-type all portids accordingly. (This patch is very similar to ebiederman's 5deadd69) Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: Eric Paris <eparis@redhat.com>	2014-01-13 22:26:52 -05:00
Eric W. Biederman	ca24a23ebc	audit: Simplify and correct audit_log_capset - Always report the current process as capset now always only works on the current process. This prevents reporting 0 or a random pid in a random pid namespace. - Don't bother to pass the pid as is available. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> (cherry picked from commit bcc85f0af31af123e32858069eb2ad8f39f90e67) (cherry picked from commit f911cac4556a7a23e0b3ea850233d13b32328692) Signed-off-by: Richard Guy Briggs <rgb@redhat.com> [eparis: fix build error when audit disabled] Signed-off-by: Eric Paris <eparis@redhat.com>	2014-01-13 22:26:48 -05:00
Steven Rostedt (Red Hat)	a4c35ed241	ftrace: Fix synchronization location disabling and freeing ftrace_ops The synchronization needed after ftrace_ops are unregistered must happen after the callback is disabled from becing called by functions. The current location happens after the function is being removed from the internal lists, but not after the function callbacks were disabled, leaving the functions susceptible of being called after their callbacks are freed. This affects perf and any externel users of function tracing (LTTng and SystemTap). Cc: stable@vger.kernel.org # 3.0+ Fixes: `cdbe61bfe7` "ftrace: Allow dynamically allocated function tracers" Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-01-13 12:56:21 -05:00
Peter Zijlstra	8cb75e0c4e	sched/preempt: Fix up missed PREEMPT_NEED_RESCHED folding With various drivers wanting to inject idle time; we get people calling idle routines outside of the idle loop proper. Therefore we need to be extra careful about not missing TIF_NEED_RESCHED -> PREEMPT_NEED_RESCHED propagations. While looking at this, I also realized there's a small window in the existing idle loop where we can miss TIF_NEED_RESCHED; when it hits right after the tif_need_resched() test at the end of the loop but right before the need_resched() test at the start of the loop. So move preempt_fold_need_resched() out of the loop where we're guaranteed to have TIF_NEED_RESCHED set. Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/n/tip-x9jgh45oeayzajz2mjt0y7d6@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-01-13 17:38:55 +01:00
Peter Zijlstra	0bd3a173d7	sched/preempt, locking: Rework local_bh_{dis,en}able() Currently local_bh_disable() is out-of-line for no apparent reason. So inline it to save a few cycles on call/return nonsense, the function body is a single add on x86 (a few loads and store extra on load/store archs). Also expose two new local_bh functions: __local_bh_{dis,en}able_ip(unsigned long ip, unsigned int cnt); Which implement the actual local_bh_{dis,en}able() behaviour. The next patch uses the exposed @cnt argument to optimize bh lock functions. With build fixes from Jacob Pan. Cc: rjw@rjwysocki.net Cc: rui.zhang@intel.com Cc: jacob.jun.pan@linux.intel.com Cc: Mike Galbraith <bitbucket@online.de> Cc: hpa@zytor.com Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: lenb@kernel.org Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20131119151338.GF3694@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-01-13 17:32:27 +01:00
Steven Rostedt (Red Hat)	23a8e8441a	ftrace: Have function graph only trace based on global_ops filters Doing some different tests, I discovered that function graph tracing, when filtered via the set_ftrace_filter and set_ftrace_notrace files, does not always keep with them if another function ftrace_ops is registered to trace functions. The reason is that function graph just happens to trace all functions that the function tracer enables. When there was only one user of function tracing, the function graph tracer did not need to worry about being called by functions that it did not want to trace. But now that there are other users, this becomes a problem. For example, one just needs to do the following: # cd /sys/kernel/debug/tracing # echo schedule > set_ftrace_filter # echo function_graph > current_tracer # cat trace [..] 0) \| schedule() { ------------------------------------------ 0) <idle>-0 => rcu_pre-7 ------------------------------------------ 0) ! 2980.314 us \| } 0) \| schedule() { ------------------------------------------ 0) rcu_pre-7 => <idle>-0 ------------------------------------------ 0) + 20.701 us \| } # echo 1 > /proc/sys/kernel/stack_tracer_enabled # cat trace [..] 1) + 20.825 us \| } 1) + 21.651 us \| } 1) + 30.924 us \| } /* SyS_ioctl */ 1) \| do_page_fault() { 1) \| __do_page_fault() { 1) 0.274 us \| down_read_trylock(); 1) 0.098 us \| find_vma(); 1) \| handle_mm_fault() { 1) \| _raw_spin_lock() { 1) 0.102 us \| preempt_count_add(); 1) 0.097 us \| do_raw_spin_lock(); 1) 2.173 us \| } 1) \| do_wp_page() { 1) 0.079 us \| vm_normal_page(); 1) 0.086 us \| reuse_swap_page(); 1) 0.076 us \| page_move_anon_rmap(); 1) \| unlock_page() { 1) 0.082 us \| page_waitqueue(); 1) 0.086 us \| __wake_up_bit(); 1) 1.801 us \| } 1) 0.075 us \| ptep_set_access_flags(); 1) \| _raw_spin_unlock() { 1) 0.098 us \| do_raw_spin_unlock(); 1) 0.105 us \| preempt_count_sub(); 1) 1.884 us \| } 1) 9.149 us \| } 1) + 13.083 us \| } 1) 0.146 us \| up_read(); When the stack tracer was enabled, it enabled all functions to be traced, which now the function graph tracer also traces. This is a side effect that should not occur. To fix this a test is added when the function tracing is changed, as well as when the graph tracer is enabled, to see if anything other than the ftrace global_ops function tracer is enabled. If so, then the graph tracer calls a test trampoline that will look at the function that is being traced and compare it with the filters defined by the global_ops. As an optimization, if there's no other function tracers registered, or if the only registered function tracers also use the global ops, the function graph infrastructure will call the registered function graph callback directly and not go through the test trampoline. Cc: stable@vger.kernel.org # 3.3+ Fixes: `d2d45c7a03` "tracing: Have stack_tracer use a separate list of functions" Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-01-13 10:52:58 -05:00
Peter Zijlstra	6577e42a3e	sched/clock: Fix up clear_sched_clock_stable() The below tells us the static_key conversion has a problem; since the exact point of clearing that flag isn't too important, delay the flip and use a workqueue to process it. [ ] TSC synchronization [CPU#0 -> CPU#22]: [ ] Measured 8 cycles TSC warp between CPUs, turning off TSC clock. [ ] [ ] ====================================================== [ ] [ INFO: possible circular locking dependency detected ] [ ] 3.13.0-rc3-01745-g848b0d0322cb-dirty #637 Not tainted [ ] ------------------------------------------------------- [ ] swapper/0/1 is trying to acquire lock: [ ] (jump_label_mutex){+.+...}, at: [<ffffffff8115a637>] jump_label_lock+0x17/0x20 [ ] [ ] but task is already holding lock: [ ] (cpu_hotplug.lock){+.+.+.}, at: [<ffffffff8109408b>] cpu_hotplug_begin+0x2b/0x60 [ ] [ ] which lock already depends on the new lock. [ ] [ ] [ ] the existing dependency chain (in reverse order) is: [ ] [ ] -> #1 (cpu_hotplug.lock){+.+.+.}: [ ] [<ffffffff810def00>] lock_acquire+0x90/0x130 [ ] [<ffffffff81661f83>] mutex_lock_nested+0x63/0x3e0 [ ] [<ffffffff81093fdc>] get_online_cpus+0x3c/0x60 [ ] [<ffffffff8104cc67>] arch_jump_label_transform+0x37/0x130 [ ] [<ffffffff8115a3cf>] __jump_label_update+0x5f/0x80 [ ] [<ffffffff8115a48d>] jump_label_update+0x9d/0xb0 [ ] [<ffffffff8115aa6d>] static_key_slow_inc+0x9d/0xb0 [ ] [<ffffffff810c0f65>] sched_feat_set+0xf5/0x100 [ ] [<ffffffff810c5bdc>] set_numabalancing_state+0x2c/0x30 [ ] [<ffffffff81d12f3d>] numa_policy_init+0x1af/0x1b7 [ ] [<ffffffff81cebdf4>] start_kernel+0x35d/0x41f [ ] [<ffffffff81ceb5a5>] x86_64_start_reservations+0x2a/0x2c [ ] [<ffffffff81ceb6a2>] x86_64_start_kernel+0xfb/0xfe [ ] [ ] -> #0 (jump_label_mutex){+.+...}: [ ] [<ffffffff810de141>] __lock_acquire+0x1701/0x1eb0 [ ] [<ffffffff810def00>] lock_acquire+0x90/0x130 [ ] [<ffffffff81661f83>] mutex_lock_nested+0x63/0x3e0 [ ] [<ffffffff8115a637>] jump_label_lock+0x17/0x20 [ ] [<ffffffff8115aa3b>] static_key_slow_inc+0x6b/0xb0 [ ] [<ffffffff810ca775>] clear_sched_clock_stable+0x15/0x20 [ ] [<ffffffff810503b3>] mark_tsc_unstable+0x23/0x70 [ ] [<ffffffff810772cb>] check_tsc_sync_source+0x14b/0x150 [ ] [<ffffffff81076612>] native_cpu_up+0x3a2/0x890 [ ] [<ffffffff810941cb>] _cpu_up+0xdb/0x160 [ ] [<ffffffff810942c9>] cpu_up+0x79/0x90 [ ] [<ffffffff81d0af6b>] smp_init+0x60/0x8c [ ] [<ffffffff81cebf42>] kernel_init_freeable+0x8c/0x197 [ ] [<ffffffff8164e32e>] kernel_init+0xe/0x130 [ ] [<ffffffff8166beec>] ret_from_fork+0x7c/0xb0 [ ] [ ] other info that might help us debug this: [ ] [ ] Possible unsafe locking scenario: [ ] [ ] CPU0 CPU1 [ ] ---- ---- [ ] lock(cpu_hotplug.lock); [ ] lock(jump_label_mutex); [ ] lock(cpu_hotplug.lock); [ ] lock(jump_label_mutex); [ ] [ ] * DEADLOCK * [ ] [ ] 2 locks held by swapper/0/1: [ ] #0: (cpu_add_remove_lock){+.+.+.}, at: [<ffffffff81094037>] cpu_maps_update_begin+0x17/0x20 [ ] #1: (cpu_hotplug.lock){+.+.+.}, at: [<ffffffff8109408b>] cpu_hotplug_begin+0x2b/0x60 [ ] [ ] stack backtrace: [ ] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.13.0-rc3-01745-g848b0d0322cb-dirty #637 [ ] Hardware name: Supermicro X8DTN/X8DTN, BIOS 4.6.3 01/08/2010 [ ] ffffffff82c9c270 ffff880236843bb8 ffffffff8165c5f5 ffffffff82c9c270 [ ] ffff880236843bf8 ffffffff81658c02 ffff880236843c80 ffff8802368586a0 [ ] ffff880236858678 0000000000000001 0000000000000002 ffff880236858000 [ ] Call Trace: [ ] [<ffffffff8165c5f5>] dump_stack+0x4e/0x7a [ ] [<ffffffff81658c02>] print_circular_bug+0x1f9/0x207 [ ] [<ffffffff810de141>] __lock_acquire+0x1701/0x1eb0 [ ] [<ffffffff816680ff>] ? __atomic_notifier_call_chain+0x8f/0xb0 [ ] [<ffffffff810def00>] lock_acquire+0x90/0x130 [ ] [<ffffffff8115a637>] ? jump_label_lock+0x17/0x20 [ ] [<ffffffff8115a637>] ? jump_label_lock+0x17/0x20 [ ] [<ffffffff81661f83>] mutex_lock_nested+0x63/0x3e0 [ ] [<ffffffff8115a637>] ? jump_label_lock+0x17/0x20 [ ] [<ffffffff8115a637>] jump_label_lock+0x17/0x20 [ ] [<ffffffff8115aa3b>] static_key_slow_inc+0x6b/0xb0 [ ] [<ffffffff810ca775>] clear_sched_clock_stable+0x15/0x20 [ ] [<ffffffff810503b3>] mark_tsc_unstable+0x23/0x70 [ ] [<ffffffff810772cb>] check_tsc_sync_source+0x14b/0x150 [ ] [<ffffffff81076612>] native_cpu_up+0x3a2/0x890 [ ] [<ffffffff810941cb>] _cpu_up+0xdb/0x160 [ ] [<ffffffff810942c9>] cpu_up+0x79/0x90 [ ] [<ffffffff81d0af6b>] smp_init+0x60/0x8c [ ] [<ffffffff81cebf42>] kernel_init_freeable+0x8c/0x197 [ ] [<ffffffff8164e320>] ? rest_init+0xd0/0xd0 [ ] [<ffffffff8164e32e>] kernel_init+0xe/0x130 [ ] [<ffffffff8166beec>] ret_from_fork+0x7c/0xb0 [ ] [<ffffffff8164e320>] ? rest_init+0xd0/0xd0 [ ] ------------[ cut here ]------------ [ ] WARNING: CPU: 0 PID: 1 at /usr/src/linux-2.6/kernel/smp.c:374 smp_call_function_many+0xad/0x300() [ ] Modules linked in: [ ] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.13.0-rc3-01745-g848b0d0322cb-dirty #637 [ ] Hardware name: Supermicro X8DTN/X8DTN, BIOS 4.6.3 01/08/2010 [ ] 0000000000000009 ffff880236843be0 ffffffff8165c5f5 0000000000000000 [ ] ffff880236843c18 ffffffff81093d8c 0000000000000000 0000000000000000 [ ] ffffffff81ccd1a0 ffffffff810ca951 0000000000000000 ffff880236843c28 [ ] Call Trace: [ ] [<ffffffff8165c5f5>] dump_stack+0x4e/0x7a [ ] [<ffffffff81093d8c>] warn_slowpath_common+0x8c/0xc0 [ ] [<ffffffff810ca951>] ? sched_clock_tick+0x1/0xa0 [ ] [<ffffffff81093dda>] warn_slowpath_null+0x1a/0x20 [ ] [<ffffffff8110b72d>] smp_call_function_many+0xad/0x300 [ ] [<ffffffff8104f200>] ? arch_unregister_cpu+0x30/0x30 [ ] [<ffffffff8104f200>] ? arch_unregister_cpu+0x30/0x30 [ ] [<ffffffff810ca951>] ? sched_clock_tick+0x1/0xa0 [ ] [<ffffffff8110ba96>] smp_call_function+0x46/0x80 [ ] [<ffffffff8104f200>] ? arch_unregister_cpu+0x30/0x30 [ ] [<ffffffff8110bb3c>] on_each_cpu+0x3c/0xa0 [ ] [<ffffffff810ca950>] ? sched_clock_idle_sleep_event+0x20/0x20 [ ] [<ffffffff810ca951>] ? sched_clock_tick+0x1/0xa0 [ ] [<ffffffff8104f964>] text_poke_bp+0x64/0xd0 [ ] [<ffffffff810ca950>] ? sched_clock_idle_sleep_event+0x20/0x20 [ ] [<ffffffff8104ccde>] arch_jump_label_transform+0xae/0x130 [ ] [<ffffffff8115a3cf>] __jump_label_update+0x5f/0x80 [ ] [<ffffffff8115a48d>] jump_label_update+0x9d/0xb0 [ ] [<ffffffff8115aa6d>] static_key_slow_inc+0x9d/0xb0 [ ] [<ffffffff810ca775>] clear_sched_clock_stable+0x15/0x20 [ ] [<ffffffff810503b3>] mark_tsc_unstable+0x23/0x70 [ ] [<ffffffff810772cb>] check_tsc_sync_source+0x14b/0x150 [ ] [<ffffffff81076612>] native_cpu_up+0x3a2/0x890 [ ] [<ffffffff810941cb>] _cpu_up+0xdb/0x160 [ ] [<ffffffff810942c9>] cpu_up+0x79/0x90 [ ] [<ffffffff81d0af6b>] smp_init+0x60/0x8c [ ] [<ffffffff81cebf42>] kernel_init_freeable+0x8c/0x197 [ ] [<ffffffff8164e320>] ? rest_init+0xd0/0xd0 [ ] [<ffffffff8164e32e>] kernel_init+0xe/0x130 [ ] [<ffffffff8166beec>] ret_from_fork+0x7c/0xb0 [ ] [<ffffffff8164e320>] ? rest_init+0xd0/0xd0 [ ] ---[ end trace 6ff1df5620c49d26 ]--- [ ] tsc: Marking TSC unstable due to check_tsc_sync_source failed Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/n/tip-v55fgqj3nnyqnngmvuu8ep6h@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-01-13 15:13:15 +01:00
Peter Zijlstra	35af99e646	sched/clock, x86: Use a static_key for sched_clock_stable In order to avoid the runtime condition and variable load turn sched_clock_stable into a static_key. Also provide a shorter implementation of local_clock() and cpu_clock(int) when sched_clock_stable==1. MAINLINE PRE POST sched_clock_stable: 1 1 1 (cold) sched_clock: 329841 221876 215295 (cold) local_clock: 301773 234692 220773 (warm) sched_clock: 38375 25602 25659 (warm) local_clock: 100371 33265 27242 (warm) rdtsc: 27340 24214 24208 sched_clock_stable: 0 0 0 (cold) sched_clock: 382634 235941 237019 (cold) local_clock: 396890 297017 294819 (warm) sched_clock: 38194 25233 25609 (warm) local_clock: 143452 71234 71232 (warm) rdtsc: 27345 24245 24243 Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/n/tip-eummbdechzz37mwmpags1gjr@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-01-13 15:13:13 +01:00
Peter Zijlstra	ef08f0fff8	sched/clock: Remove local_irq_disable() from the clocks Now that x86 no longer requires IRQs disabled for sched_clock() and ia64 never had this requirement (it doesn't seem to do cpufreq at all), we can remove the requirement of disabling IRQs. MAINLINE PRE POST sched_clock_stable: 1 1 1 (cold) sched_clock: 329841 257223 221876 (cold) local_clock: 301773 309889 234692 (warm) sched_clock: 38375 25280 25602 (warm) local_clock: 100371 85268 33265 (warm) rdtsc: 27340 24247 24214 sched_clock_stable: 0 0 0 (cold) sched_clock: 382634 301224 235941 (cold) local_clock: 396890 399870 297017 (warm) sched_clock: 38194 25630 25233 (warm) local_clock: 143452 129629 71234 (warm) rdtsc: 27345 24307 24245 Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/n/tip-36e5kohiasnr106d077mgubp@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-01-13 15:13:11 +01:00
Peter Zijlstra	9ea4c38006	locking: Optimize lock_bh functions Currently all _bh_ lock functions do two preempt_count operations: local_bh_disable(); preempt_disable(); and for the unlock: preempt_enable_no_resched(); local_bh_enable(); Since its a waste of perfectly good cycles to modify the same variable twice when you can do it in one go; use the new __local_bh_{dis,en}able_ip() functions that allow us to provide a preempt_count value to add/sub. So define SOFTIRQ_LOCK_OFFSET as the offset a _bh_ lock needs to add/sub to be done in one go. As a bonus it gets rid of the preempt_enable_no_resched() usage. This reduces a 1000 loops of: spin_lock_bh(&bh_lock); spin_unlock_bh(&bh_lock); from 53596 cycles to 51995 cycles. I didn't do enough measurements to say for absolute sure that the result is significant but the the few runs I did for each suggest it is so. Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: jacob.jun.pan@linux.intel.com Cc: Mike Galbraith <bitbucket@online.de> Cc: hpa@zytor.com Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: lenb@kernel.org Cc: rjw@rjwysocki.net Cc: rui.zhang@intel.com Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/20131119151338.GF3694@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-01-13 13:47:36 +01:00
Daniel Lezcano	c726099ec2	sched: Factor out the on_null_domain() checks in trigger_load_balance() The test on_null_domain is done twice in the trigger_load_balance function. Move the test at the begin of the function, so there is only one check. Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1389008085-9069-9-git-send-email-daniel.lezcano@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-01-13 13:47:35 +01:00
Daniel Lezcano	208cb16ba3	sched: Pass 'struct rq' to nohz_idle_balance() The cpu information is stored in the struct rq. Pass the struct rq to nohz_idle_balance, so all the functions called in run_rebalance_domains have the same parameters and the 'this_cpu' variable becomes pointless. Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> [ Added !SMP build fix. ] Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1389008085-9069-8-git-send-email-daniel.lezcano@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-01-13 13:47:33 +01:00
Daniel Lezcano	f7ed0a895e	sched: Pass 'struct rq' to rebalance_domains() The cpu information is stored in the struct rq and the caller of the rebalance_domains function pass the cpu to retrieve the struct rq but it already has the struct rq info. Replace the cpu parameter with the struct rq. Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1389008085-9069-7-git-send-email-daniel.lezcano@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-01-13 13:47:31 +01:00
Daniel Lezcano	0aeeeebac8	sched: Remove unused parameter from nohz_balancer_kick() The cpu parameter is no longer needed in nohz_balancer_kick, let's remove the parameter. Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1389008085-9069-6-git-send-email-daniel.lezcano@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-01-13 13:47:30 +01:00
Daniel Lezcano	3dd0337d6d	sched: Remove unused parameter from find_new_ilb() The 'call_cpu' is never used in the function. Remove it. Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1389008085-9069-5-git-send-email-daniel.lezcano@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-01-13 13:47:29 +01:00
Daniel Lezcano	63f609b160	sched: Pass 'struct rq' to on_null_domain() The on_null_domain() function is getting the cpu to retrieve the struct rq associated with it. Pass 'struct rq' directly to the function as the caller already has the info. Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1389008085-9069-4-git-send-email-daniel.lezcano@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-01-13 13:47:28 +01:00
Daniel Lezcano	4a725627f2	sched: Reduce nohz_kick_needed() parameters The cpu information is already stored in the struct rq, so no need to pass it as parameter to the nohz_kick_needed function. The caller of this function just called idle_cpu() before to fill the rq->idle_balance field. Use rq->cpu and rq->idle_balance. Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1389008085-9069-3-git-send-email-daniel.lezcano@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-01-13 13:47:27 +01:00
Daniel Lezcano	7caff66f36	sched: Reduce trigger_load_balance() parameters The cpu information is already stored in the struct rq, so no need to pass it as parameter to the trigger_load_balance function. Cc: linaro-kernel@lists.linaro.org Cc: preeti.lkml@gmail.com Cc: mingo@redhat.com Cc: peterz@infradead.org Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1389008085-9069-2-git-send-email-daniel.lezcano@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-01-13 13:47:26 +01:00
Peter Zijlstra	de212f18e9	sched/deadline: Fix hotplug admission control The current hotplug admission control is broken because: CPU_DYING -> migration_call() -> migrate_tasks() -> __migrate_task() cannot fail and hard assumes it _will_ move all tasks off of the dying cpu, failing this will break hotplug. The much simpler solution is a DOWN_PREPARE handler that fails when removing one CPU gets us below the total allocated bandwidth. Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20131220171343.GL2480@laptop.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-01-13 13:47:25 +01:00
Peter Zijlstra	1724813d9f	sched/deadline: Remove the sysctl_sched_dl knobs Remove the deadline specific sysctls for now. The problem with them is that the interaction with the exisiting rt knobs is nearly impossible to get right. The current (as per before this patch) situation is that the rt and dl bandwidth is completely separate and we enforce rt+dl < 100%. This is undesirable because this means that the rt default of 95% leaves us hardly any room, even though dl tasks are saver than rt tasks. Another proposed solution was (a discarted patch) to have the dl bandwidth be a fraction of the rt bandwidth. This is highly confusing imo. Furthermore neither proposal is consistent with the situation we actually want; which is rt tasks ran from a dl server. In which case the rt bandwidth is a direct subset of dl. So whichever way we go, the introduction of dl controls at this point is painful. Therefore remove them and instead share the rt budget. This means that for now the rt knobs are used for dl admission control and the dl runtime is accounted against the rt runtime. I realise that this isn't entirely desirable either; but whatever we do we appear to need to change the interface later, so better have a small interface for now. Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/n/tip-zpyqbqds1r0vyxtxza1e7rdc@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-01-13 13:47:23 +01:00
Peter Zijlstra	e4099a5e92	sched/deadline: Fix up the smp-affinity mask tests For now deadline tasks are not allowed to set smp affinity; however the current tests are wrong, cure this. The test in __sched_setscheduler() also uses an on-stack cpumask_t which is a no-no. Change both tests to use cpumask_subset() such that we test the root domain span to be a subset of the cpus_allowed mask. This way we're sure the tasks can always run on all CPUs they can be balanced over, and have no effective affinity constraints. Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/n/tip-fyqtb1lapxca3lhsxv9cumdc@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-01-13 13:47:22 +01:00
Juri Lelli	6bfd6d72f5	sched/deadline: speed up SCHED_DEADLINE pushes with a push-heap Data from tests confirmed that the original active load balancing logic didn't scale neither in the number of CPU nor in the number of tasks (as sched_rt does). Here we provide a global data structure to keep track of deadlines of the running tasks in the system. The structure is composed by a bitmask showing the free CPUs and a max-heap, needed when the system is heavily loaded. The implementation and concurrent access scheme are kept simple by design. However, our measurements show that we can compete with sched_rt on large multi-CPUs machines [1]. Only the push path is addressed, the extension to use this structure also for pull decisions is straightforward. However, we are currently evaluating different (in order to decrease/avoid contention) data structures to solve possibly both problems. We are also going to re-run tests considering recent changes inside cpupri [2]. [1] http://retis.sssup.it/~jlelli/papers/Ospert11Lelli.pdf [2] http://www.spinics.net/lists/linux-rt-users/msg06778.html Signed-off-by: Juri Lelli <juri.lelli@gmail.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1383831828-15501-14-git-send-email-juri.lelli@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-01-13 13:46:46 +01:00
Dario Faggioli	332ac17ef5	sched/deadline: Add bandwidth management for SCHED_DEADLINE tasks In order of deadline scheduling to be effective and useful, it is important that some method of having the allocation of the available CPU bandwidth to tasks and task groups under control. This is usually called "admission control" and if it is not performed at all, no guarantee can be given on the actual scheduling of the -deadline tasks. Since when RT-throttling has been introduced each task group have a bandwidth associated to itself, calculated as a certain amount of runtime over a period. Moreover, to make it possible to manipulate such bandwidth, readable/writable controls have been added to both procfs (for system wide settings) and cgroupfs (for per-group settings). Therefore, the same interface is being used for controlling the bandwidth distrubution to -deadline tasks and task groups, i.e., new controls but with similar names, equivalent meaning and with the same usage paradigm are added. However, more discussion is needed in order to figure out how we want to manage SCHED_DEADLINE bandwidth at the task group level. Therefore, this patch adds a less sophisticated, but actually very sensible, mechanism to ensure that a certain utilization cap is not overcome per each root_domain (the single rq for !SMP configurations). Another main difference between deadline bandwidth management and RT-throttling is that -deadline tasks have bandwidth on their own (while -rt ones doesn't!), and thus we don't need an higher level throttling mechanism to enforce the desired bandwidth. This patch, therefore: - adds system wide deadline bandwidth management by means of: * /proc/sys/kernel/sched_dl_runtime_us, * /proc/sys/kernel/sched_dl_period_us, that determine (i.e., runtime / period) the total bandwidth available on each CPU of each root_domain for -deadline tasks; - couples the RT and deadline bandwidth management, i.e., enforces that the sum of how much bandwidth is being devoted to -rt -deadline tasks to stay below 100%. This means that, for a root_domain comprising M CPUs, -deadline tasks can be created until the sum of their bandwidths stay below: M * (sched_dl_runtime_us / sched_dl_period_us) It is also possible to disable this bandwidth management logic, and be thus free of oversubscribing the system up to any arbitrary level. Signed-off-by: Dario Faggioli <raistlin@linux.it> Signed-off-by: Juri Lelli <juri.lelli@gmail.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1383831828-15501-12-git-send-email-juri.lelli@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-01-13 13:46:42 +01:00
Dario Faggioli	2d3d891d33	sched/deadline: Add SCHED_DEADLINE inheritance logic Some method to deal with rt-mutexes and make sched_dl interact with the current PI-coded is needed, raising all but trivial issues, that needs (according to us) to be solved with some restructuring of the pi-code (i.e., going toward a proxy execution-ish implementation). This is under development, in the meanwhile, as a temporary solution, what this commits does is: - ensure a pi-lock owner with waiters is never throttled down. Instead, when it runs out of runtime, it immediately gets replenished and it's deadline is postponed; - the scheduling parameters (relative deadline and default runtime) used for that replenishments --during the whole period it holds the pi-lock-- are the ones of the waiting task with earliest deadline. Acting this way, we provide some kind of boosting to the lock-owner, still by using the existing (actually, slightly modified by the previous commit) pi-architecture. We would stress the fact that this is only a surely needed, all but clean solution to the problem. In the end it's only a way to re-start discussion within the community. So, as always, comments, ideas, rants, etc.. are welcome! :-) Signed-off-by: Dario Faggioli <raistlin@linux.it> Signed-off-by: Juri Lelli <juri.lelli@gmail.com> [ Added !RT_MUTEXES build fix. ] Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1383831828-15501-11-git-send-email-juri.lelli@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-01-13 13:42:56 +01:00
Peter Zijlstra	fb00aca474	rtmutex: Turn the plist into an rb-tree Turn the pi-chains from plist to rb-tree, in the rt_mutex code, and provide a proper comparison function for -deadline and -priority tasks. This is done mainly because: - classical prio field of the plist is just an int, which might not be enough for representing a deadline; - manipulating such a list would become O(nr_deadline_tasks), which might be to much, as the number of -deadline task increases. Therefore, an rb-tree is used, and tasks are queued in it according to the following logic: - among two -priority (i.e., SCHED_BATCH/OTHER/RR/FIFO) tasks, the one with the higher (lower, actually!) prio wins; - among a -priority and a -deadline task, the latter always wins; - among two -deadline tasks, the one with the earliest deadline wins. Queueing and dequeueing functions are changed accordingly, for both the list of a task's pi-waiters and the list of tasks blocked on a pi-lock. Signed-off-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Dario Faggioli <raistlin@linux.it> Signed-off-by: Juri Lelli <juri.lelli@gmail.com> Signed-off-again-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1383831828-15501-10-git-send-email-juri.lelli@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-01-13 13:41:50 +01:00
Dario Faggioli	af6ace764d	sched/deadline: Add latency tracing for SCHED_DEADLINE tasks It is very likely that systems that wants/needs to use the new SCHED_DEADLINE policy also want to have the scheduling latency of the -deadline tasks under control. For this reason a new version of the scheduling wakeup latency, called "wakeup_dl", is introduced. As a consequence of applying this patch there will be three wakeup latency tracer: * "wakeup", that deals with all tasks in the system; * "wakeup_rt", that deals with -rt and -deadline tasks only; * "wakeup_dl", that deals with -deadline tasks only. Signed-off-by: Dario Faggioli <raistlin@linux.it> Signed-off-by: Juri Lelli <juri.lelli@gmail.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1383831828-15501-9-git-send-email-juri.lelli@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-01-13 13:41:11 +01:00
Harald Gustafsson	755378a471	sched/deadline: Add period support for SCHED_DEADLINE tasks Make it possible to specify a period (different or equal than deadline) for -deadline tasks. Relative deadlines (D_i) are used on task arrivals to generate new scheduling (absolute) deadlines as "d = t + D_i", and periods (P_i) to postpone the scheduling deadlines as "d = d + P_i" when the budget is zero. This is in general useful to model (and schedule) tasks that have slow activation rates (long periods), but have to be scheduled soon once activated (short deadlines). Signed-off-by: Harald Gustafsson <harald.gustafsson@ericsson.com> Signed-off-by: Dario Faggioli <raistlin@linux.it> Signed-off-by: Juri Lelli <juri.lelli@gmail.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1383831828-15501-7-git-send-email-juri.lelli@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-01-13 13:41:09 +01:00
Dario Faggioli	239be4a982	sched/deadline: Add SCHED_DEADLINE avg_update accounting Make the core scheduler and load balancer aware of the load produced by -deadline tasks, by updating the moving average like for sched_rt. Signed-off-by: Dario Faggioli <raistlin@linux.it> Signed-off-by: Juri Lelli <juri.lelli@gmail.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1383831828-15501-6-git-send-email-juri.lelli@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-01-13 13:41:08 +01:00
Juri Lelli	1baca4ce16	sched/deadline: Add SCHED_DEADLINE SMP-related data structures & logic Introduces data structures relevant for implementing dynamic migration of -deadline tasks and the logic for checking if runqueues are overloaded with -deadline tasks and for choosing where a task should migrate, when it is the case. Adds also dynamic migrations to SCHED_DEADLINE, so that tasks can be moved among CPUs when necessary. It is also possible to bind a task to a (set of) CPU(s), thus restricting its capability of migrating, or forbidding migrations at all. The very same approach used in sched_rt is utilised: - -deadline tasks are kept into CPU-specific runqueues, - -deadline tasks are migrated among runqueues to achieve the following: * on an M-CPU system the M earliest deadline ready tasks are always running; * affinity/cpusets settings of all the -deadline tasks is always respected. Therefore, this very special form of "load balancing" is done with an active method, i.e., the scheduler pushes or pulls tasks between runqueues when they are woken up and/or (de)scheduled. IOW, every time a preemption occurs, the descheduled task might be sent to some other CPU (depending on its deadline) to continue executing (push). On the other hand, every time a CPU becomes idle, it might pull the second earliest deadline ready task from some other CPU. To enforce this, a pull operation is always attempted before taking any scheduling decision (pre_schedule()), as well as a push one after each scheduling decision (post_schedule()). In addition, when a task arrives or wakes up, the best CPU where to resume it is selected taking into account its affinity mask, the system topology, but also its deadline. E.g., from the scheduling point of view, the best CPU where to wake up (and also where to push) a task is the one which is running the task with the latest deadline among the M executing ones. In order to facilitate these decisions, per-runqueue "caching" of the deadlines of the currently running and of the first ready task is used. Queued but not running tasks are also parked in another rb-tree to speed-up pushes. Signed-off-by: Juri Lelli <juri.lelli@gmail.com> Signed-off-by: Dario Faggioli <raistlin@linux.it> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1383831828-15501-5-git-send-email-juri.lelli@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-01-13 13:41:07 +01:00
Dario Faggioli	aab03e05e8	sched/deadline: Add SCHED_DEADLINE structures & implementation Introduces the data structures, constants and symbols needed for SCHED_DEADLINE implementation. Core data structure of SCHED_DEADLINE are defined, along with their initializers. Hooks for checking if a task belong to the new policy are also added where they are needed. Adds a scheduling class, in sched/dl.c and a new policy called SCHED_DEADLINE. It is an implementation of the Earliest Deadline First (EDF) scheduling algorithm, augmented with a mechanism (called Constant Bandwidth Server, CBS) that makes it possible to isolate the behaviour of tasks between each other. The typical -deadline task will be made up of a computation phase (instance) which is activated on a periodic or sporadic fashion. The expected (maximum) duration of such computation is called the task's runtime; the time interval by which each instance need to be completed is called the task's relative deadline. The task's absolute deadline is dynamically calculated as the time instant a task (better, an instance) activates plus the relative deadline. The EDF algorithms selects the task with the smallest absolute deadline as the one to be executed first, while the CBS ensures each task to run for at most its runtime every (relative) deadline length time interval, avoiding any interference between different tasks (bandwidth isolation). Thanks to this feature, also tasks that do not strictly comply with the computational model sketched above can effectively use the new policy. To summarize, this patch: - introduces the data structures, constants and symbols needed; - implements the core logic of the scheduling algorithm in the new scheduling class file; - provides all the glue code between the new scheduling class and the core scheduler and refines the interactions between sched/dl and the other existing scheduling classes. Signed-off-by: Dario Faggioli <raistlin@linux.it> Signed-off-by: Michael Trimarchi <michael@amarulasolutions.com> Signed-off-by: Fabio Checconi <fchecconi@gmail.com> Signed-off-by: Juri Lelli <juri.lelli@gmail.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1383831828-15501-4-git-send-email-juri.lelli@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-01-13 13:41:06 +01:00
Dario Faggioli	d50dde5a10	sched: Add new scheduler syscalls to support an extended scheduling parameters ABI Add the syscalls needed for supporting scheduling algorithms with extended scheduling parameters (e.g., SCHED_DEADLINE). In general, it makes possible to specify a periodic/sporadic task, that executes for a given amount of runtime at each instance, and is scheduled according to the urgency of their own timing constraints, i.e.: - a (maximum/typical) instance execution time, - a minimum interval between consecutive instances, - a time constraint by which each instance must be completed. Thus, both the data structure that holds the scheduling parameters of the tasks and the system calls dealing with it must be extended. Unfortunately, modifying the existing struct sched_param would break the ABI and result in potentially serious compatibility issues with legacy binaries. For these reasons, this patch: - defines the new struct sched_attr, containing all the fields that are necessary for specifying a task in the computational model described above; - defines and implements the new scheduling related syscalls that manipulate it, i.e., sched_setattr() and sched_getattr(). Syscalls are introduced for x86 (32 and 64 bits) and ARM only, as a proof of concept and for developing and testing purposes. Making them available on other architectures is straightforward. Since no "user" for these new parameters is introduced in this patch, the implementation of the new system calls is just identical to their already existing counterpart. Future patches that implement scheduling policies able to exploit the new data structure must also take care of modifying the sched_*attr() calls accordingly with their own purposes. Signed-off-by: Dario Faggioli <raistlin@linux.it> [ Rewrote to use sched_attr. ] Signed-off-by: Juri Lelli <juri.lelli@gmail.com> [ Removed sched_setscheduler2() for now. ] Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1383831828-15501-3-git-send-email-juri.lelli@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-01-13 13:41:04 +01:00
Ingo Molnar	56b4811039	Merge branch 'sched/urgent' into sched/core Pick up the latest fixes before applying new changes. Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-01-13 13:35:28 +01:00
Davidlohr Bueso	b0c29f79ec	futexes: Avoid taking the hb->lock if there's nothing to wake up In futex_wake() there is clearly no point in taking the hb->lock if we know beforehand that there are no tasks to be woken. While the hash bucket's plist head is a cheap way of knowing this, we cannot rely 100% on it as there is a racy window between the futex_wait call and when the task is actually added to the plist. To this end, we couple it with the spinlock check as tasks trying to enter the critical region are most likely potential waiters that will be added to the plist, thus preventing tasks sleeping forever if wakers don't acknowledge all possible waiters. Furthermore, the futex ordering guarantees are preserved, ensuring that waiters either observe the changed user space value before blocking or is woken by a concurrent waker. For wakers, this is done by relying on the barriers in get_futex_key_refs() -- for archs that do not have implicit mb in atomic_inc(), we explicitly add them through a new futex_get_mm function. For waiters we rely on the fact that spin_lock calls already update the head counter, so spinners are visible even if the lock hasn't been acquired yet. For more details please refer to the updated comments in the code and related discussion: https://lkml.org/lkml/2013/11/26/556 Special thanks to tglx for careful review and feedback. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Reviewed-by: Darren Hart <dvhart@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Davidlohr Bueso <davidlohr@hp.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Jeff Mahoney <jeffm@suse.com> Cc: Scott Norton <scott.norton@hp.com> Cc: Tom Vaden <tom.vaden@hp.com> Cc: Aswin Chandramouleeswaran <aswin@hp.com> Cc: Waiman Long <Waiman.Long@hp.com> Cc: Jason Low <jason.low2@hp.com> Cc: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/1389569486-25487-5-git-send-email-davidlohr@hp.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-01-13 11:45:21 +01:00
Thomas Gleixner	99b60ce697	futexes: Document multiprocessor ordering guarantees That's essential, if you want to hack on futexes. Reviewed-by: Darren Hart <dvhart@linux.intel.com> Reviewed-by: Peter Zijlstra <peterz@infradead.org> Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Davidlohr Bueso <davidlohr@hp.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Jeff Mahoney <jeffm@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Scott Norton <scott.norton@hp.com> Cc: Tom Vaden <tom.vaden@hp.com> Cc: Aswin Chandramouleeswaran <aswin@hp.com> Cc: Waiman Long <Waiman.Long@hp.com> Cc: Jason Low <jason.low2@hp.com> Cc: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/1389569486-25487-4-git-send-email-davidlohr@hp.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-01-13 11:45:19 +01:00
Davidlohr Bueso	a52b89ebb6	futexes: Increase hash table size for better performance Currently, the futex global hash table suffers from its fixed, smallish (for today's standards) size of 256 entries, as well as its lack of NUMA awareness. Large systems, using many futexes, can be prone to high amounts of collisions; where these futexes hash to the same bucket and lead to extra contention on the same hb->lock. Furthermore, cacheline bouncing is a reality when we have multiple hb->locks residing on the same cacheline and different futexes hash to adjacent buckets. This patch keeps the current static size of 16 entries for small systems, or otherwise, 256 * ncpus (or larger as we need to round the number to a power of 2). Note that this number of CPUs accounts for all CPUs that can ever be available in the system, taking into consideration things like hotpluging. While we do impose extra overhead at bootup by making the hash table larger, this is a one time thing, and does not shadow the benefits of this patch. Furthermore, as suggested by tglx, by cache aligning the hash buckets we can avoid access across cacheline boundaries and also avoid massive cache line bouncing if multiple cpus are hammering away at different hash buckets which happen to reside in the same cache line. Also, similar to other core kernel components (pid, dcache, tcp), by using alloc_large_system_hash() we benefit from its NUMA awareness and thus the table is distributed among the nodes instead of in a single one. For a custom microbenchmark that pounds on the uaddr hashing -- making the wait path fail at futex_wait_setup() returning -EWOULDBLOCK for large amounts of futexes, we can see the following benefits on a 80-core, 8-socket 1Tb server: +---------+--------------------+------------------------+-----------------------+-------------------------------+ \| threads \| baseline (ops/sec) \| aligned-only (ops/sec) \| large table (ops/sec) \| large table+aligned (ops/sec) \| +---------+--------------------+------------------------+-----------------------+-------------------------------+ \| 512 \| 32426 \| 50531 (+55.8%) \| 255274 (+687.2%) \| 292553 (+802.2%) \| \| 256 \| 65360 \| 99588 (+52.3%) \| 443563 (+578.6%) \| 508088 (+677.3%) \| \| 128 \| 125635 \| 200075 (+59.2%) \| 742613 (+491.1%) \| 835452 (+564.9%) \| \| 80 \| 193559 \| 323425 (+67.1%) \| 1028147 (+431.1%) \| 1130304 (+483.9%) \| \| 64 \| 247667 \| 443740 (+79.1%) \| 997300 (+302.6%) \| 1145494 (+362.5%) \| \| 32 \| 628412 \| 721401 (+14.7%) \| 965996 (+53.7%) \| 1122115 (+78.5%) \| +---------+--------------------+------------------------+-----------------------+-------------------------------+ Reviewed-by: Darren Hart <dvhart@linux.intel.com> Reviewed-by: Peter Zijlstra <peterz@infradead.org> Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Waiman Long <Waiman.Long@hp.com> Reviewed-and-tested-by: Jason Low <jason.low2@hp.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Davidlohr Bueso <davidlohr@hp.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Jeff Mahoney <jeffm@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Scott Norton <scott.norton@hp.com> Cc: Tom Vaden <tom.vaden@hp.com> Cc: Aswin Chandramouleeswaran <aswin@hp.com> Link: http://lkml.kernel.org/r/1389569486-25487-3-git-send-email-davidlohr@hp.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-01-13 11:45:18 +01:00
Jason Low	0d00c7b20c	futexes: Clean up various details - Remove unnecessary head variables. - Delete unused parameter in queue_unlock(). Reviewed-by: Darren Hart <dvhart@linux.intel.com> Reviewed-by: Peter Zijlstra <peterz@infradead.org> Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Jason Low <jason.low2@hp.com> Signed-off-by: Davidlohr Bueso <davidlohr@hp.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Jeff Mahoney <jeffm@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Scott Norton <scott.norton@hp.com> Cc: Tom Vaden <tom.vaden@hp.com> Cc: Aswin Chandramouleeswaran <aswin@hp.com> Cc: Waiman Long <Waiman.Long@hp.com> Cc: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/1389569486-25487-2-git-send-email-davidlohr@hp.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-01-13 11:45:17 +01:00
Ingo Molnar	1c62448e39	Linux 3.13-rc8 -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJS0miqAAoJEHm+PkMAQRiGbfgIAJSWEfo8ludknhPcHJabBtxu 75SQAKJlL3sBVnxEc58Rtt8gsKYQIrm4IY5Slunklsn04RxuDUIQMgFoAYR5gQwz +Myqkw/HOqDe5VStGxtLYpWnfglxVwGDCd7ISfL9AOVy5adMWBxh4Tv+qqQc7aIZ eF7dy+DD+C6Q3Z5OoV8s0FZDxse29vOf17Nki7+7t8WMqyegYwjoOqNeqocGKsPi eHLrJgTl4T6jB4l9LKKC154DSKjKOTSwZMWgwK8mToyNLT/ufCiKgXloIjEvZZcY VVKUtncdHiTf+iqVojgpGBzOEeB5DM83iiapFeDiJg8C9yBzvT8lBtA9aPb5Wgw= =lEeV -----END PGP SIGNATURE----- Merge tag 'v3.13-rc8' into core/locking Refresh the tree with the latest fixes, before applying new changes. Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-01-13 11:44:41 +01:00
Rafael J. Wysocki	4ff913373a	Merge branches 'pm-sleep', 'pm-runtime' and 'pm-apm' * pm-sleep: PM / hibernate: Call platform_leave() in suspend path too PM / Sleep: Add macro to define common late/early system PM callbacks PM / hibernate: export hibernation_set_ops * pm-runtime: PM / Runtime: Implement the pm_generic_runtime functions for CONFIG_PM PM / Runtime: Add second macro for definition of runtime PM callbacks * pm-apm: apm-emulation: add hibernation APM events to support suspend2disk	2014-01-12 23:50:03 +01:00
Ingo Molnar	d05d24a984	Merge branch 'fortglx/3.14/time' of git://git.linaro.org/people/john.stultz/linux into timers/core Pull timekeeping updates from John Stultz. Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-01-12 14:13:31 +01:00
Ingo Molnar	dba861461f	Merge branch 'linus' into timers/core Pick up the latest fixes and refresh the branch. Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-01-12 14:12:44 +01:00
Yann Droneaud	a21b0b354d	perf: Introduce a flag to enable close-on-exec in perf_event_open() Unlike recent modern userspace API such as: epoll_create1 (EPOLL_CLOEXEC), eventfd (EFD_CLOEXEC), fanotify_init (FAN_CLOEXEC), inotify_init1 (IN_CLOEXEC), signalfd (SFD_CLOEXEC), timerfd_create (TFD_CLOEXEC), or the venerable general purpose open (O_CLOEXEC), perf_event_open() syscall lack a flag to atomically set FD_CLOEXEC (eg. close-on-exec) flag on file descriptor it returns to userspace. The present patch adds a PERF_FLAG_FD_CLOEXEC flag to allow perf_event_open() syscall to atomically set close-on-exec. Having this flag will enable userspace to remove the file descriptor from the list of file descriptors being inherited across exec, without the need to call fcntl(fd, F_SETFD, FD_CLOEXEC) and the associated race condition between the current thread and another thread calling fork(2) then execve(2). Links: - Secure File Descriptor Handling (Ulrich Drepper, 2008) http://udrepper.livejournal.com/20407.html - Excuse me son, but your code is leaking !!! (Dan Walsh, March 2012) http://danwalsh.livejournal.com/53603.html - Notes in DMA buffer sharing: leak and security hole http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/dma-buf-sharing.txt?id=v3.13-rc3#n428 Signed-off-by: Yann Droneaud <ydroneaud@opteya.com> Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/8c03f54e1598b1727c19706f3af03f98685d9fe6.1388952061.git.ydroneaud@opteya.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-01-12 10:16:59 +01:00
Stephane Eranian	f3ae75de98	perf/x86: Fix active_entry initialization This patch fixes a problem with the initialization of the struct perf_event active_entry field. It is defined inside an anonymous union and was initialized in perf_event_alloc() using INIT_LIST_HEAD(). However at that time, we do not know whether the event is going to use active_entry or hlist_entry (SW). Or at last, we don't want to make that determination there. The problem is that hlist and list_head are not initialized the same way. One is okay with NULL (from kzmalloc), the other needs to pointers to point to self. This patch resolves this problem by dropping the union. This will avoid problems later on, if someone starts using active_entry or hlist_entry without verifying that they actually overlap. This also solves the initialization problem. Signed-off-by: Stephane Eranian <eranian@google.com> Cc: ak@linux.intel.com Cc: acme@redhat.com Cc: jolsa@redhat.com Cc: zheng.z.yan@intel.com Cc: bp@alien8.de Cc: vincent.weaver@maine.edu Cc: maria.n.dimakopoulou@gmail.com Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1389176153-3128-2-git-send-email-eranian@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-01-12 10:16:07 +01:00
John Stultz	7a06c41cbe	sched_clock: Disable seqlock lockdep usage in sched_clock() Unfortunately the seqlock lockdep enablement can't be used in sched_clock(), since the lockdep infrastructure eventually calls into sched_clock(), which causes a deadlock. Thus, this patch changes all generic sched_clock() usage to use the raw_* methods. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Reviewed-by: Stephen Boyd <sboyd@codeaurora.org> Reported-by: Krzysztof Hałasa <khalasa@piap.pl> Signed-off-by: John Stultz <john.stultz@linaro.org> Cc: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Cc: Willy Tarreau <w@1wt.eu> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1388704274-5278-2-git-send-email-john.stultz@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-01-12 10:14:00 +01:00
Rik van Riel	9722c2dac7	sched: Calculate effective load even if local weight is 0 Thomas Hellstrom bisected a regression where erratic 3D performance is experienced on virtual machines as measured by glxgears. It identified commit `58d081b5` ("sched/numa: Avoid overloading CPUs on a preferred NUMA node") as the problem which had modified the behaviour of effective_load. Effective load calculates the difference to the system-wide load if a scheduling entity was moved to another CPU. The task group is not heavier as a result of the move but overall system load can increase/decrease as a result of the change. Commit `58d081b5` ("sched/numa: Avoid overloading CPUs on a preferred NUMA node") changed effective_load to make it suitable for calculating if a particular NUMA node was compute overloaded. To reduce the cost of the function, it assumed that a current sched entity weight of 0 was uninteresting but that is not the case. wake_affine() uses a weight of 0 for sync wakeups on the grounds that it is assuming the waking task will sleep and not contribute to load in the near future. In this case, we still want to calculate the effective load of the sched entity hierarchy. As effective_load is no longer used by task_numa_compare since commit `fb13c7ee` (sched/numa: Use a system-wide search to find swap/migration candidates), this patch simply restores the historical behaviour. Reported-and-tested-by: Thomas Hellstrom <thellstrom@vmware.com> Signed-off-by: Rik van Riel <riel@redhat.com> [ Wrote changelog] Signed-off-by: Mel Gorman <mgorman@suse.de> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20140106113912.GC6178@suse.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-01-12 09:22:15 +01:00
Chuansheng Liu	440a113603	workqueue: Calling destroy_work_on_stack() to pair with INIT_WORK_ONSTACK() In case CONFIG_DEBUG_OBJECTS_WORK is defined, it is needed to call destroy_work_on_stack() which frees the debug object to pair with INIT_WORK_ONSTACK(). Signed-off-by: Liu, Chuansheng <chuansheng.liu@intel.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2014-01-11 22:26:33 -05:00
Steven Rostedt (Red Hat)	405e1d8348	ftrace: Synchronize setting function_trace_op with ftrace_trace_function ftrace_trace_function is a variable that holds what function will be called directly by the assembly code (mcount). If just a single function is registered and it handles recursion itself, then the assembly will call that function directly without any helper function. It also passes in the ftrace_op that was registered with the callback. The ftrace_op to send is stored in the function_trace_op variable. The ftrace_trace_function and function_trace_op needs to be coordinated such that the called callback wont be called with the wrong ftrace_op, otherwise bad things can happen if it expected a different op. Luckily, there's no callback that doesn't use the helper functions that requires this. But there soon will be and this needs to be fixed. Use a set_function_trace_op to store the ftrace_op to set the function_trace_op to when it is safe to do so (during the update function within the breakpoint or stop machine calls). Or if dynamic ftrace is not being used (static tracing) then we have to do a bit more synchronization when the ftrace_trace_function is set as that takes affect immediately (as oppose to dynamic ftrace doing it with the modification of the trampoline). Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-01-09 22:00:25 -05:00
Steven Rostedt (Red Hat)	dd97b95438	tracing: Show available event triggers when no trigger is set Currently there's no way to know what triggers exist on a kernel without looking at the source of the kernel or randomly trying out triggers. Instead of creating another file in the debugfs system, simply show what available triggers are there when cat'ing the trigger file when it has no events: [root /sys/kernel/debug/tracing]# cat events/sched/sched_switch/trigger # Available triggers: # traceon traceoff snapshot stacktrace enable_event disable_event This stays consistent with other debugfs files where meta data like this is always proceeded with a '#' at the start of the line so that tools can strip these out. Link: http://lkml.kernel.org/r/20140107103548.0a84536d@gandalf.local.home Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-01-09 21:20:32 -05:00
Steven Rostedt (Red Hat)	13a1e4aef5	tracing: Consolidate event trigger code The event trigger code that checks for callback triggers before and after recording of an event has lots of flags checks. This code is duplicated throughout the ftrace events, kprobes and system calls. They all do the exact same checks against the event flags. Added helper functions ftrace_trigger_soft_disabled(), event_trigger_unlock_commit() and event_trigger_unlock_commit_regs() that consolidated the code and these are used instead. Link: http://lkml.kernel.org/r/20140106222703.5e7dbba2@gandalf.local.home Acked-by: Tom Zanussi <tom.zanussi@linux.intel.com> Tested-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-01-09 21:20:07 -05:00
Steven Rostedt (Red Hat)	e8dc637152	tracing: Fix counter for traceon/off event triggers The counters for the traceon and traceoff are only suppose to decrement when the trigger enables or disables tracing. It is not suppose to decrement every time the event is hit. Only decrement the counter if the trigger actually did something. Link: http://lkml.kernel.org/r/20140106223124.0e5fd0b4@gandalf.local.home Acked-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-01-09 21:19:44 -05:00
Tom Zanussi	4bf0566db1	tracing: Remove double-underscore naming in syscall trigger invocations There's no reason to use double-underscores for any variable name in ftrace_syscall_enter()/exit(), since those functions aren't generated and there's no need to avoid namespace collisions as with the event macros, which is where the original invocation code came from. Link: http://lkml.kernel.org/r/0b489c9d1f7ee315cff60fa0e4c2b433ade8ae0d.1389036657.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-01-06 15:22:07 -05:00
Tom Zanussi	0641d368f2	tracing/kprobes: Add trace event trigger invocations Add code to the kprobe/kretprobe event functions that will invoke any event triggers associated with a probe's ftrace_event_file. The code to do this is very similar to the invocation code already used to invoke the triggers associated with static events and essentially replaces the existing soft-disable checks with a superset that preserves the original behavior but adds the bits needed to support event triggers. Link: http://lkml.kernel.org/r/f2d49f157b608070045fdb26c9564d5a05a5a7d0.1389036657.git.tom.zanussi@linux.intel.com Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2014-01-06 15:21:43 -05:00
Bjørn Mork	362e77d1cb	PM / hibernate: Call platform_leave() in suspend path too Since create_image() only executes platform_leave() if in_suspend is not set, enable_nonboot_cpus() is run by it with EC transactions blocked (on ACPI systems) in the image creation code path (that is, for in_suspend set), which may cause CPU online to fail for the CPUs in question. In particular, this causes the acpi_cpufreq driver's initialization to fail for those CPUs on some systems with the following dmesg: cpufreq: adding CPU 1 acpi_cpufreq_cpu_init cpufreq: FREQ: 1401000 - CPU: 0 ACPI Exception: AE_BAD_PARAMETER, Returned by Handler for [EmbeddedControl] (20130725/evregion-287) ACPI Error: Method parse/execution failed [\_SB_.PCI0.LPC_.EC__.LPMD] (Node ffff88023249ab28), AE_BAD_PARAMETER (20130725/psparse-536) ACPI Error: Method parse/execution failed [\_PR_.CPU0._PPC] (Node ffff88023270e3f8), AE_BAD_PARAMETER (20130725/psparse-536) ACPI Error: Method parse/execution failed [\_PR_.CPU1._PPC] (Node ffff88023270e290), AE_BAD_PARAMETER (20130725/psparse-536) ACPI Exception: AE_BAD_PARAMETER, Evaluating _PPC (20130725/processor_perflib-140) cpufreq: initialization failed CPU1 is up To fix this problem, modify create_image() to execute platform_leave() unconditionally. [rjw: This shouldn't lead to any significant side effects on ACPI systems.] Signed-off-by: Bjørn Mork <bjorn@mork.no> [rjw: Changelog] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2014-01-06 14:08:39 +01:00

1 2 3 4 5 ...

17274 Commits