linux

Commit Graph

Author	SHA1	Message	Date
Gautham R Shenoy	d221938c04	cpu-hotplug: refcount based cpu hotplug This patch implements a Refcount + Waitqueue based model for cpu-hotplug. Now, a thread which wants to prevent cpu-hotplug, will bump up a global refcount and the thread which wants to perform a cpu-hotplug operation will block till the global refcount goes to zero. The readers, if any, during an ongoing cpu-hotplug operation are blocked until the cpu-hotplug operation is over. Signed-off-by: Gautham R Shenoy <ego@in.ibm.com> Signed-off-by: Paul Jackson <pj@sgi.com> [For !CONFIG_HOTPLUG_CPU ] Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-01-25 21:08:01 +01:00
Srivatsa Vaddagiri	6b2d770026	sched: group scheduler, fix fairness of cpu bandwidth allocation for task groups The current load balancing scheme isn't good enough for precise group fairness. For example: on a 8-cpu system, I created 3 groups as under: a = 8 tasks (cpu.shares = 1024) b = 4 tasks (cpu.shares = 1024) c = 3 tasks (cpu.shares = 1024) a, b and c are task groups that have equal weight. We would expect each of the groups to receive 33.33% of cpu bandwidth under a fair scheduler. This is what I get with the latest scheduler git tree: Signed-off-by: Ingo Molnar <mingo@elte.hu> -------------------------------------------------------------------------------- Col1 \| Col2 \| Col3 \| Col4 ------\|---------\|-------\|------------------------------------------------------- a \| 277.676 \| 57.8% \| 54.1% 54.1% 54.1% 54.2% 56.7% 62.2% 62.8% 64.5% b \| 116.108 \| 24.2% \| 47.4% 48.1% 48.7% 49.3% c \| 86.326 \| 18.0% \| 47.5% 47.9% 48.5% -------------------------------------------------------------------------------- Explanation of o/p: Col1 -> Group name Col2 -> Cumulative execution time (in seconds) received by all tasks of that group in a 60sec window across 8 cpus Col3 -> CPU bandwidth received by the group in the 60sec window, expressed in percentage. Col3 data is derived as: Col3 = 100 * Col2 / (NR_CPUS * 60) Col4 -> CPU bandwidth received by each individual task of the group. Col4 = 100 * cpu_time_recd_by_task / 60 [I can share the test case that produces a similar o/p if reqd] The deviation from desired group fairness is as below: a = +24.47% b = -9.13% c = -15.33% which is quite high. After the patch below is applied, here are the results: -------------------------------------------------------------------------------- Col1 \| Col2 \| Col3 \| Col4 ------\|---------\|-------\|------------------------------------------------------- a \| 163.112 \| 34.0% \| 33.2% 33.4% 33.5% 33.5% 33.7% 34.4% 34.8% 35.3% b \| 156.220 \| 32.5% \| 63.3% 64.5% 66.1% 66.5% c \| 160.653 \| 33.5% \| 85.8% 90.6% 91.4% -------------------------------------------------------------------------------- Deviation from desired group fairness is as below: a = +0.67% b = -0.83% c = +0.17% which is far better IMO. Most of other runs have yielded a deviation within +-2% at the most, which is good. Why do we see bad (group) fairness with current scheuler? ========================================================= Currently cpu's weight is just the summation of individual task weights. This can yield incorrect results. For ex: consider three groups as below on a 2-cpu system: CPU0 CPU1 --------------------------- A (10) B(5) C(5) --------------------------- Group A has 10 tasks, all on CPU0, Group B and C have 5 tasks each all of which are on CPU1. Each task has the same weight (NICE_0_LOAD = 1024). The current scheme would yield a cpu weight of 10240 (101024) for each cpu and the load balancer will think both CPUs are perfectly balanced and won't move around any tasks. This, however, would yield this bandwidth: A = 50% B = 25% C = 25% which is not the desired result. What's changing in the patch? ============================= - How cpu weights are calculated when CONFIF_FAIR_GROUP_SCHED is defined (see below) - API Change - Two tunables introduced in sysfs (under SCHED_DEBUG) to control the frequency at which the load balance monitor thread runs. The basic change made in this patch is how cpu weight (rq->load.weight) is calculated. Its now calculated as the summation of group weights on a cpu, rather than summation of task weights. Weight exerted by a group on a cpu is dependent on the shares allocated to it and also the number of tasks the group has on that cpu compared to the total number of (runnable) tasks the group has in the system. Let, W(K,i) = Weight of group K on cpu i T(K,i) = Task load present in group K's cfs_rq on cpu i T(K) = Total task load of group K across various cpus S(K) = Shares allocated to group K NRCPUS = Number of online cpus in the scheduler domain to which group K is assigned. Then, W(K,i) = S(K) NRCPUS * T(K,i) / T(K) A load balance monitor thread is created at bootup, which periodically runs and adjusts group's weight on each cpu. To avoid its overhead, two min/max tunables are introduced (under SCHED_DEBUG) to control the rate at which it runs. Fixes from: Peter Zijlstra <a.p.zijlstra@chello.nl> - don't start the load_balance_monitor when there is only a single cpu. - rename the kthread because its currently longer than TASK_COMM_LEN Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-01-25 21:08:00 +01:00
Srivatsa Vaddagiri	a183561567	sched: introduce a mutex and corresponding API to serialize access to doms_curarray doms_cur[] array represents various scheduling domains which are mutually exclusive. Currently cpusets code can modify this array (by calling partition_sched_domains()) as a result of user modifying sched_load_balance flag for various cpusets. This patch introduces a mutex and corresponding API (only when CONFIG_FAIR_GROUP_SCHED is defined) which allows a reader to safely read the doms_cur[] array w/o worrying abt concurrent modifications to the array. The fair group scheduler code (introduced in next patch of this series) makes use of this mutex to walk thr' doms_cur[] array while rebalancing shares of task groups across cpus. Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-01-25 21:08:00 +01:00
Srivatsa Vaddagiri	58e2d4ca58	sched: group scheduling, change how cpu load is calculated This patch changes how the cpu load exerted by fair_sched_class tasks is calculated. Load exerted by fair_sched_class tasks on a cpu is now a summation of the group weights, rather than summation of task weights. Weight exerted by a group on a cpu is dependent on the shares allocated to it. This version of patch has a minor impact on code size, but should have no runtime/functional impact for !CONFIG_FAIR_GROUP_SCHED. Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-01-25 21:08:00 +01:00
Srivatsa Vaddagiri	ec2c507fe8	sched: group scheduling, minor fixes Minor bug fixes for the group scheduler: - Use a mutex to serialize add/remove of task groups and also when changing shares of a task group. Use the same mutex when printing cfs_rq debugging stats for various task groups. - Use list_for_each_entry_rcu in for_each_leaf_cfs_rq macro (when walking task group list) Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-01-25 21:07:59 +01:00
Srivatsa Vaddagiri	93f992ccc0	sched: group scheduling code cleanup Minor cleanups: - Fix coding style - remove obsolete comment Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-01-25 21:07:59 +01:00
Ingo Molnar	b842271fbb	sched: remove printk_clock() printk_clock() is obsolete - it has been replaced with cpu_clock(). Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-01-25 21:07:59 +01:00
Ingo Molnar	d713f51933	sched: fix CONFIG_PRINT_TIME's reliance on sched_clock() Stefano Brivio reported weird printk timestamp behavior during CPU frequency changes: http://bugzilla.kernel.org/show_bug.cgi?id=9475 fix CONFIG_PRINT_TIME's reliance on sched_clock() and use cpu_clock() instead. Reported-and-bisected-by: Stefano Brivio <stefano.brivio@polimi.it> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-01-25 21:07:58 +01:00
Ingo Molnar	32a7600668	printk: make printk more robust by not allowing recursion make printk more robust by allowing recursion only if there's a crash going on. Also add recursion detection. I've tested it with an artificially injected printk recursion - instead of a lockup or spontaneous reboot or other crash, the output was a well controlled: [ 41.057335] SysRq : <2>BUG: recent printk recursion! [ 41.057335] loglevel0-8 reBoot Crashdump show-all-locks(D) tErm Full kIll saK showMem Nice powerOff showPc show-all-timers(Q) unRaw Sync showTasks Unmount shoW-blocked-tasks also do all this printk-debug logic with irqs disabled. Signed-off-by: Ingo Molnar <mingo@elte.hu> Reviewed-by: Nick Piggin <npiggin@suse.de>	2008-01-25 21:07:58 +01:00
Linus Torvalds	7556afa0e0	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hskinnemoen/avr32-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hskinnemoen/avr32-2.6: [AVR32] extint: Set initial irq type to low level [AVR32] extint: change set_irq_type() handling [AVR32] NMI debugging [AVR32] constify function pointer tables [AVR32] ATNGW100: Update defconfig [AVR32] ATSTK1002: Update defconfig [AVR32] Kconfig: Choose daughterboard instead of CPU [AVR32] Add support for ATSTK1003 and ATSTK1004 [AVR32] Clean up external DAC setup code [AVR32] ATSTK1000: Move gpio-leds setup to setup.c [AVR32] Add support for AT32AP7001 and AT32AP7002 [AVR32] Provide more CPU information in /proc/cpuinfo and dmesg [AVR32] Oprofile support [AVR32] Include instrumentation menu Disable VGA text console for AVR32 architecture [AVR32] Enable debugging only when needed ptrace: Call arch_ptrace_attach() when request=PTRACE_TRACEME [AVR32] Remove redundant try_to_freeze() call from do_signal() [AVR32] Drop GFP_COMP for DMA memory allocations	2008-01-25 08:40:02 -08:00
Haavard Skinnemoen	6ea6dd93c9	ptrace: Call arch_ptrace_attach() when request=PTRACE_TRACEME arch_ptrace_attach() is a hook that allows the architecture to do book-keeping after a ptrace attach. This patch adds a call to this hook when handling a PTRACE_TRACEME request as well. Currently only one architecture, m32r, implements this hook. When called, it initializes a number of debug trap slots in the ptraced task's thread struct, and it looks to me like this is the right thing to do after a PTRACE_TRACEME request as well, not only after PTRACE_ATTACH. Please correct me if I'm wrong. I want to use this hook on AVR32 to turn the debugging hardware on when a process is actually being debugged and keep it off otherwise. To be able to do this, I need to intercept PTRACE_TRACEME and PTRACE_ATTACH, as well as PTRACE_DETACH and thread exit. The latter two can be handled by existing hooks. Signed-off-by: Haavard Skinnemoen <hskinnemoen@atmel.com>	2008-01-25 08:31:39 +01:00
Kay Sievers	af5ca3f4ec	Driver core: change sysdev classes to use dynamic kobject names All kobjects require a dynamically allocated name now. We no longer need to keep track if the name is statically assigned, we can just unconditionally free() all kobject names on cleanup. Signed-off-by: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2008-01-24 20:40:40 -08:00
Greg Kroah-Hartman	78a2d906b4	Kobject: convert remaining kobject_unregister() to kobject_put() There is no need for kobject_unregister() anymore, thanks to Kay's kobject cleanup changes, so replace all instances of it with kobject_put(). Cc: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2008-01-24 20:40:40 -08:00
Greg Kroah-Hartman	7a6a41615b	Modules: remove unneeded release function Now that kobjects properly clean up their name structures, no matter if they have a release function or not, we can drop this empty module kobject release function too (it was needed prior to this because of the way we handled static kobject names, we based the fact that if a release function was present, then we could safely free the name string, now we are more smart about things and only free names we have previously set.) Cc: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2008-01-24 20:40:39 -08:00
Greg Kroah-Hartman	ac3c8141f6	Kobject: convert kernel/module.c to use kobject_init/add_ng() This converts the code to use the new kobject functions, cleaning up the logic in doing so. Cc: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2008-01-24 20:40:38 -08:00
Kay Sievers	97c146ef07	sysfs: fix /sys/module/*/holders after sysfs logic change Sysfs symlinks now require fully registered kobjects as a target, otherwise the call to create a symlink will fail. Here we register the kobject before we request the symlink in the holders directory. Signed-off-by: Kay Sievers <kay.sievers@vrfy.org> Cc: Tejun Heo <teheo@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2008-01-24 20:40:36 -08:00
Greg Kroah-Hartman	c63469a398	Driver core: move the driver specific module code into the driver core The module driver specific code should belong in the driver core, not in the kernel/ directory. So move this code. This is done in preparation for some struct device_driver rework that should be confined to the driver core code only. This also lets us keep from exporting these functions, as no external code should ever be calling it. Thanks to Andrew Morton for the !CONFIG_MODULES fix. Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2008-01-24 20:40:35 -08:00
Greg Kroah-Hartman	cf15126b3d	Kobject: convert kernel/user.c to use kobject_init/add_ng() This converts the code to use the new kobject functions, cleaning up the logic in doing so. Cc: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2008-01-24 20:40:31 -08:00
Greg Kroah-Hartman	e43b9192c5	Kobject: convert kernel/params.c to use kobject_init/add_ng() This converts the code to use the new kobject functions, cleaning up the logic in doing so. Cc: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2008-01-24 20:40:31 -08:00
Greg Kroah-Hartman	d76e15fb20	driver core: make /sys/power a kobject /sys/power should not be a kset, that's overkill. This patch renames it to power_kset and fixes up all usages of it in the tree. Cc: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2008-01-24 20:40:25 -08:00
Greg Kroah-Hartman	0ff21e4663	kobject: convert kernel_kset to be a kobject kernel_kset does not need to be a kset, but a much simpler kobject now that we have kobj_attributes. We also rename kernel_kset to kernel_kobj to catch all users of this symbol with a build error instead of an easy-to-ignore build warning. Cc: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2008-01-24 20:40:24 -08:00
Kay Sievers	eb41d9465c	fix struct user_info export's sysfs interaction Clean up the use of ksets and kobjects. Kobjects are instances of objects (like struct user_info), ksets are collections of objects of a similar type (like the uids directory containing the user_info directories). So, use kobjects for the user_info directories, and a kset for the "uids" directory. On object cleanup, the final kobject_put() was missing. Cc: Dhaval Giani <dhaval@linux.vnet.ibm.com> Cc: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com> Signed-off-by: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2008-01-24 20:40:18 -08:00
Kay Sievers	386f275f5d	Driver Core: switch all dynamic ksets to kobj_sysfs_ops Switch all dynamically created ksets, that export simple attributes, to kobj_attribute from subsys_attribute. Struct subsys_attribute will be removed. Signed-off-by: Kay Sievers <kay.sievers@vrfy.org> Cc: Mike Halcrow <mhalcrow@us.ibm.com> Cc: Phillip Hellewell <phillip@hellewell.homeip.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2008-01-24 20:40:18 -08:00
Greg Kroah-Hartman	039a5dcd2f	kset: convert /sys/power to use kset_create Dynamically create the kset instead of declaring it statically. We also rename power_subsys to power_kset to catch all users of the variable and we properly export it so that people don't have to guess that it really is present in the system. The pseries code is wierd, why is it createing /sys/power if CONFIG_PM is disabled? Oh well, stupid big boxes ignoring config options... Cc: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2008-01-24 20:40:16 -08:00
Greg Kroah-Hartman	7405c1e15e	kset: convert /sys/module to use kset_create Dynamically create the kset instead of declaring it statically. We also rename module_subsys to module_kset to catch all users of the variable. Cc: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2008-01-24 20:40:16 -08:00
Greg Kroah-Hartman	bd35b93d80	kset: convert kernel_subsys to use kset_create Dynamically create the kset instead of declaring it statically. We also rename kernel_subsys to kernel_kset to catch all users of this symbol with a build error instead of an easy-to-ignore build warning. Cc: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2008-01-24 20:40:14 -08:00
Greg Kroah-Hartman	4ff6abff83	kobject: get rid of kobject_add_dir kobject_create_and_add is the same as kobject_add_dir, so drop kobject_add_dir. Cc: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2008-01-24 20:40:11 -08:00
Greg Kroah-Hartman	3514faca19	kobject: remove struct kobj_type from struct kset We don't need a "default" ktype for a kset. We should set this explicitly every time for each kset. This change is needed so that we can make ksets dynamic, and cleans up one of the odd, undocumented assumption that the kset/kobject/ktype model has. This patch is based on a lot of help from Kay Sievers. Nasty bug in the block code was found by Dave Young <hidave.darkstar@gmail.com> Cc: Kay Sievers <kay.sievers@vrfy.org> Cc: Dave Young <hidave.darkstar@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2008-01-24 20:40:10 -08:00
Arjan van de Ven	fabe874a48	lockdep: fix kernel crash on module unload Michael Wu noticed in his lkml post at http://marc.info/?l=linux-kernel&m=119396182726091&w=2 that certain wireless drivers ended up having their name in module memory, which would then crash the kernel on module unload. The patch he proposed was a bit clumsy in that it increased the size of a lockdep entry significantly; the patch below tries another approach, it checks, on module teardown, if the name of a class is in module space and then zaps the class. This is very similar to what we already do with keys that are in module space. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-01-24 08:01:09 -08:00
Linus Torvalds	b2214fca2b	Merge git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched * git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched: sched: group scheduler, set uid share fix	2008-01-22 09:18:45 -08:00
Randy Dunlap	00e10776ff	rcu: fix section mismatch rcu_online_cpu() should be __cpuinit instead of __devinit. WARNING: vmlinux.o(.text+0x4b6d5): Section mismatch: reference to .init.text: (between 'rcu_cpu_notify' and 'wakeme_after_rcu') Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Cc: Sam Ravnborg <sam@ravnborg.org> Acked-by: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-01-22 09:17:48 -08:00
Ingo Molnar	c61935fd0e	sched: group scheduler, set uid share fix setting cpu share to 1 causes hangs, as reported in: http://bugzilla.kernel.org/show_bug.cgi?id=9779 as the default share is 1024, the values of 0 and 1 can indeed cause problems. Limit it to 2 or higher values. These values can only be set by the root user - but still it makes sense to protect against nonsensical values. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-01-22 11:24:58 +01:00
Randy Dunlap	48ccf3dac3	timer: fix section mismatch The caller is __cpuinit. Also, this code block and its caller are inside #ifdef CONFIG_HOTPLUG_CPU blocks, so this code should reflect that config symbol's usage. WARNING: vmlinux.o(.text+0x4252f): Section mismatch: reference to .init.text: (between 'timer_cpu_notify' and 'msleep') Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Cc: Sam Ravnborg <sam@ravnborg.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@akpm@linux-foundation.org>	2008-01-21 19:39:41 -08:00
Randy Dunlap	0ec160dd48	hrtimer: fix section mismatch Fix section mismatch in hrtimer.c: WARNING: vmlinux.o(.text+0x50c61): Section mismatch: reference to .init.text: (between 'hrtimer_cpu_notify' and 'down_read_trylock') Noticed by Johannes Berg and confirmed by Sam Ravnborg. Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Cc: Sam Ravnborg <sam@ravnborg.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@akpm@linux-foundation.org>	2008-01-21 19:39:41 -08:00
Nigel Cunningham	784680336b	Fix unbalanced helper_lock in kernel/kmod.c call_usermodehelper_exec() has an exit path that can leave the helper_lock() call at the top of the routine unbalanced. The attached patch fixes this issue. Signed-off-by: Nigel Cunningham <nigel@tuxonice.net> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-01-17 15:38:59 -08:00
Johannes Berg	eb13ba8738	lockdep: fix workqueue creation API lockdep interaction Dave Young reported warnings from lockdep that the workqueue API can sometimes try to register lockdep classes with the same key but different names. This is not permitted in lockdep. Unfortunately, I was unaware of that restriction when I wrote the code to debug workqueue problems with lockdep and used the workqueue name as the lockdep class name. This can obviously lead to the problem if the workqueue name is dynamic. This patch solves the problem by always using a constant name for the workqueue's lockdep class, namely either the constant name that was passed in or a string consisting of the variable name. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>	2008-01-16 09:51:58 +01:00
Nick Piggin	5a26db5bd2	lockdep: fix internal double unlock during self-test Lockdep, during self-test (when it was simulating double unlocks) was sometimes unconditionally unlocking a spinlock when it had not been locked. This won't work for ticket locks. Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>	2008-01-16 09:51:58 +01:00
Rusty Russell	cb2a52052c	modules: de-mutex more symbol lookup paths in the module code Kyle McMartin reports sysrq_timer_list_show() can hit the module mutex from hard interrupt context. These paths don't need to though, since we long ago changed all the module list manipulation to occur via stop_machine(). Disabling preemption is enough. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Ingo Molnar <mingo@elte.hu> Cc: Kyle McMartin <kyle@mcmartin.ca> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-01-14 08:52:22 -08:00
Linus Torvalds	417009f64f	Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6 * 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6: pnpacpi: print resource shortage message only once PM: ACPI and APM must not be enabled at the same time ACPI: apply quirk_ich6_lpc_acpi to more ICH8 and ICH9 ACPICA: fix acpi_serialize hang regression ACPI : Not register gsi for PCI IDE controller in legacy mode ACPI: Reintroduce run time configurable max_cstate for !CPU_IDLE case ACPI: Make sysfs interface in ACPI power optional. ACPI: EC: Enable boot EC before bus_scan increase PNP_MAX_PORT to 40 from 24	2008-01-13 09:58:22 -08:00
Roland McGrath	84427eaef1	remove task_ppid_nr_ns task_ppid_nr_ns is called in three places. One of these should never have called it. In the other two, using it broke the existing semantics. This was presumably accidental. If the function had not been there, it would have been much more obvious to the eye that those patches were changing the behavior. We don't need this function. In task_state, the pid of the ptracer is not the ppid of the ptracer. In do_task_stat, ppid is the tgid of the real_parent, not its pid. I also moved the call outside of lock_task_sighand, since it doesn't need it. In sys_getppid, ppid is the tgid of the real_parent, not its pid. Signed-off-by: Roland McGrath <roland@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-01-13 09:56:43 -08:00
Len Brown	02d5bccf8e	Pull bugzilla-9194 into release branch	2008-01-11 12:27:13 -05:00
Len Brown	9f9adecd2d	PM: ACPI and APM must not be enabled at the same time ACPI and APM used "pm_active" to guarantee that they would not be simultaneously active. But pm_active was recently moved under CONFIG_PM_LEGACY, so that without CONFIG_PM_LEGACY, pm_active became a NOP -- allowing ACPI and APM to both be simultaneously enabled. This caused unpredictable results, including boot hangs. Further, the code under CONFIG_PM_LEGACY is scheduled for removal. So replace pm_active with pm_flags. pm_flags depends only on CONFIG_PM, which is present for both CONFIG_APM and CONFIG_ACPI. http://bugzilla.kernel.org/show_bug.cgi?id=9194 Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>	2008-01-11 12:26:47 -05:00
Roland McGrath	fcfd50afb6	show_task: real_parent The show_task function invoked by sysrq-t et al displays the pid and parent's pid of each task. It seems more useful to show the actual process hierarchy here than who is using ptrace on each process. Signed-off-by: Roland McGrath <roland@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-01-09 08:03:58 -08:00
Thomas Gleixner	cdf71a10c7	futex: Prevent stale futex owner when interrupted/timeout Roland Westrelin did a great analysis of a long standing thinko in the return path of futex_lock_pi. While we fixed the lock steal case long ago, which was easy to trigger, we never had a test case which exposed this problem and stupidly never thought about the reverse lock stealing scenario and the return to user space with a stale state. When a blocked tasks returns from rt_mutex_timed_locked without holding the rt_mutex (due to a signal or timeout) and at the same time the task holding the futex is releasing the futex and assigning the ownership of the futex to the returning task, then it might happen that a third task acquires the rt_mutex before the final rt_mutex_trylock() of the returning task happens under the futex hash bucket lock. The returning task returns to user space with ETIMEOUT or EINTR, but the user space futex value is assigned to this task. The task which acquired the rt_mutex fixes the user space futex value right after the hash bucket lock has been released by the returning task, but for a short period of time the user space value is wrong. Detailed description is available at: https://bugzilla.redhat.com/show_bug.cgi?id=400541 The fix for this is the same as we do when the rt_mutex was acquired by a higher priority task via lock stealing from the designated new owner. In that case we already fix the user space value and the internal pi_state up before we return. This mechanism can be used to fixup the above corner case as well. When the returning task, which failed to acquire the rt_mutex, notices that it is the designated owner of the futex, then it fixes up the stale user space value and the pi_state, before returning to user space. This happens with the futex hash bucket lock held, so the task which acquired the rt_mutex is guaranteed to be blocked on the hash bucket lock. We can access the rt_mutex owner, which gives us the pid of the new owner, safely here as the owner is not able to modify (release) it while waiting on the hash bucket lock. Rename the "curr" argument of fixup_pi_state_owner() to "newowner" to avoid confusion with current and add the check for the stale state into the failure path of rt_mutex_trylock() in the return path of unlock_futex_pi(). If the situation is detected use fixup_pi_state_owner() to assign everything to the owner of the rt_mutex. Pointed-out-and-tested-by: Roland Westrelin <roland.westrelin@sun.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-01-08 16:21:39 -08:00
Ken'ichi Ohmichi	83a08e7c6e	vmcoreinfo: add the array length of "free_list" for filtering free pages This patch adds the array length of "free_area.free_list" to the vmcoreinfo data so that makedumpfile (dump filtering command) can exclude all free pages in linux-2.6.24. makedumpfile creates a small dumpfile by excluding unnecessary pages for the analysis. To distinguish unnecessary pages, makedumpfile gets the vmcoreinfo data which has the minimum debugging information only for dump filtering. In 2.6.24-rc1 or later, the free_area.free_list is an array which has one list for each migrate types instead of a single list. makedumpfile needs the array length of "free_area.free_list" and the vmcoreinfo data should contain it. Signed-off-by: Huang Ying <ying.huang@intel.com> Tested-by: Ken'ichi Ohmichi <oomichi@mxs.nes.nec.co.jp> Acked-by: Simon Horman <horms@verge.net.au> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-01-08 16:10:36 -08:00
Roland McGrath	b59f8197c5	acct: real_parent ppid The ac_ppid field reported in process accounting records should match what getppid() would have returned to that process, regardless of whether a debugger is attached. Signed-off-by: Roland McGrath <roland@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-01-07 14:55:37 -08:00
Linus Torvalds	39cd72de49	Merge git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched * git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched: sched: fix gcc warnings	2008-01-03 12:01:00 -08:00
Linus Torvalds	b8c9a18712	Fix kernel/ptrace.c compile problem (missing "may_attach()") The previous commit missed one use of "may_attach()" that had been renamed to __ptrace_may_attach(). Tssk, tssk, Al. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-01-02 13:48:27 -08:00
Al Viro	831830b5a2	restrict reading from /proc/<pid>/maps to those who share ->mm or can ptrace pid Contents of /proc/*/maps is sensitive and may become sensitive after open() (e.g. if target originally shares our ->mm and later does exec on suid-root binary). Check at read() (actually, ->start() of iterator) time that mm_struct we'd grabbed and locked is - still the ->mm of target - equal to reader's ->mm or the target is ptracable by reader. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Acked-by: Rik van Riel <riel@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-01-02 13:13:27 -08:00
Ingo Molnar	90b2628f1f	sched: fix gcc warnings Meelis Roos reported these warnings on sparc64: CC kernel/sched.o In file included from kernel/sched.c:879: kernel/sched_debug.c: In function 'nsec_high': kernel/sched_debug.c:38: warning: comparison of distinct pointer types lacks a cast the debug check in do_div() is over-eager here, because the long long is always positive in these places. Mark this by casting them to unsigned long long. no change in code output: text data bss dec hex filename 51471 6582 376 58429 e43d sched.o.before 51471 6582 376 58429 e43d sched.o.after md5: 7f7729c111f185bf3ccea4d542abc049 sched.o.before.asm 7f7729c111f185bf3ccea4d542abc049 sched.o.after.asm Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-12-30 17:24:35 +01:00

1 2 3 4 5 ...

3091 Commits