11441 Commits

Author SHA1 Message Date
Linus Torvalds
0586bed3e8 Merge branch 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  rtmutex: tester: Remove the remaining BKL leftovers
  lockdep/timers: Explain in detail the locking problems del_timer_sync() may cause
  rtmutex: Simplify PI algorithm and make highest prio task get lock
  rwsem: Remove redundant asmregparm annotation
  rwsem: Move duplicate function prototypes to linux/rwsem.h
  rwsem: Unify the duplicate rwsem_is_locked() inlines
  rwsem: Move duplicate init macros and functions to linux/rwsem.h
  rwsem: Move duplicate struct rwsem declaration to linux/rwsem.h
  x86: Cleanup rwsem_count_t typedef
  rwsem: Cleanup includes
  locking: Remove deprecated lock initializers
  cred: Replace deprecated spinlock initialization
  kthread: Replace deprecated spinlock initialization
  xtensa: Replace deprecated spinlock initialization
  um: Replace deprecated spinlock initialization
  sparc: Replace deprecated spinlock initialization
  mips: Replace deprecated spinlock initialization
  cris: Replace deprecated spinlock initialization
  alpha: Replace deprecated spinlock initialization
  rtmutex-tester: Remove BKL tests
2011-03-15 18:28:30 -07:00
Linus Torvalds
b80cd62b7d Merge branch 'core-futexes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'core-futexes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  arm: Remove bogus comment in futex_atomic_cmpxchg_inatomic()
  futex: Deobfuscate handle_futex_death()
  plist: Add priority list test
  plist: Shrink struct plist_head
  futex,plist: Remove debug lock assignment from plist_node
  futex,plist: Pass the real head of the priority list to plist_del()
  futex: Sanitize futex ops argument types
  futex: Sanitize cmpxchg_futex_value_locked API
  futex: Remove redundant pagefault_disable in futex_atomic_cmpxchg_inatomic()
  futex: Avoid redudant evaluation of task_pid_vnr()
  futex: Update futex_wait_setup comments about locking
2011-03-15 18:23:52 -07:00
Linus Torvalds
c345f60a5f Merge branch 'core-debugobjects-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'core-debugobjects-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  debugobjects: Add hint for better object identification
2011-03-15 18:23:25 -07:00
Linus Torvalds
422e6c4bc4 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (57 commits)
  tidy the trailing symlinks traversal up
  Turn resolution of trailing symlinks iterative everywhere
  simplify link_path_walk() tail
  Make trailing symlink resolution in path_lookupat() iterative
  update nd->inode in __do_follow_link() instead of after do_follow_link()
  pull handling of one pathname component into a helper
  fs: allow AT_EMPTY_PATH in linkat(), limit that to CAP_DAC_READ_SEARCH
  Allow passing O_PATH descriptors via SCM_RIGHTS datagrams
  readlinkat(), fchownat() and fstatat() with empty relative pathnames
  Allow O_PATH for symlinks
  New kind of open files - "location only".
  ext4: Copy fs UUID to superblock
  ext3: Copy fs UUID to superblock.
  vfs: Export file system uuid via /proc/<pid>/mountinfo
  unistd.h: Add new syscalls numbers to asm-generic
  x86: Add new syscalls for x86_64
  x86: Add new syscalls for x86_32
  fs: Remove i_nlink check from file system link callback
  fs: Don't allow to create hardlink for deleted file
  vfs: Add open by file handle support
  ...
2011-03-15 15:48:13 -07:00
James Morris
a002951c97 Merge branch 'next' into for-linus 2011-03-16 09:41:17 +11:00
David S. Miller
30df754ded Merge branch 'irq/numa' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2011-03-15 15:06:35 -07:00
Linus Torvalds
397fae0818 Merge branches 'stable/irq.rework' and 'stable/pcifront-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen
* 'stable/irq.rework' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
  xen/irq: Cleanup up the pirq_to_irq for DomU PV PCI passthrough guests as well.
  xen: Use IRQF_FORCE_RESUME
  xen/timer: Missing IRQF_NO_SUSPEND in timer code broke suspend.
  xen: Fix compile error introduced by "switch to new irq_chip functions"
  xen: Switch to new irq_chip functions
  xen: Remove stale irq_chip.end
  xen: events: do not free legacy IRQs
  xen: events: allocate GSIs and dynamic IRQs from separate IRQ ranges.
  xen: events: add xen_allocate_irq_{dynamic, gsi} and xen_free_irq
  xen:events: move find_unbound_irq inside CONFIG_PCI_MSI
  xen: handled remapped IRQs when enabling a pcifront PCI device.
  genirq: Add IRQF_FORCE_RESUME

* 'stable/pcifront-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
  pci/xen: When free-ing MSI-X/MSI irq->desc also use generic code.
  pci/xen: Cleanup: convert int** to int[]
  pci/xen: Use xen_allocate_pirq_msi instead of xen_allocate_pirq
  xen-pcifront: Sanity check the MSI/MSI-X values
  xen-pcifront: don't use flush_scheduled_work()
2011-03-15 10:47:16 -07:00
Aneesh Kumar K.V
becfd1f375 vfs: Add open by file handle support
[AV: duplicate of open() guts removed; file_open_root() used instead]

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-15 02:21:44 -04:00
Aneesh Kumar K.V
990d6c2d7a vfs: Add name to file handle conversion support
The syscall also return mount id which can be used
to lookup file system specific information such as uuid
in /proc/<pid>/mountinfo

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-15 02:21:37 -04:00
Rafael J. Wysocki
bea3864fb6 PM / Hibernate: Reduce autotuned default image size
The hibernate image size autotuning mechanism sets the default
image size to 5/2 of the total system RAM, but it is reported
that on some systems device drivers allocate substantial
amounts of memory during suspend and the creation of the image
fails as a result (too little memory is preallocated).

Modify the autotuning mechanism to use 1/3 instead of 2/5 of RAM
as the default image size, which is reported to be sufficient for
the affected systems.

References: https://bugzilla.kernel.org/show_bug.cgi?id=30482
Reported-and-tested-by: Martin Steigerwald <Martin@Lichtvoll.de>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2011-03-15 00:45:46 +01:00
Rafael J. Wysocki
40dc166cb5 PM / Core: Introduce struct syscore_ops for core subsystems PM
Some subsystems need to carry out suspend/resume and shutdown
operations with one CPU on-line and interrupts disabled.  The only
way to register such operations is to define a sysdev class and
a sysdev specifically for this purpose which is cumbersome and
inefficient.  Moreover, the arguments taken by sysdev suspend,
resume and shutdown callbacks are practically never necessary.

For this reason, introduce a simpler interface allowing subsystems
to register operations to be executed very late during system suspend
and shutdown and very early during resume in the form of
strcut syscore_ops objects.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-03-15 00:43:46 +01:00
Thomas Renninger
f9b9e806ae PM QoS: Make pm_qos settings readable
I have a machine where entering deep C-states broke.
pm_qos was a hot candidate, but I couldn't find any way to double
check without the need of recompiling.

While in this case it was a driver bug (ath9k):
https://bugzilla.kernel.org/show_bug.cgi?id=27532

powertop or others may want to read out cpu_dma_latency
restrictions which could be the cause of preventing a machine
entering deeper C-states.

Output with this patch:

# default value of 2000 * USEC_PER_SEC (0x77359400)
cat /dev/network_latency |hexdump
0000000 9400 7735
0000004

# value of 55 us which is the reason for not entering C2
cat /dev/cpu_dma_latency |hexdump
0000000 0037 0000
0000004

There is no reason to hide this info -> make pm_qos files readable.

Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2011-03-15 00:43:18 +01:00
Jan Beulich
cf4fb80ca3 PM: Simplify kernel/power/Kconfig
'n' defaults are pretty pointless and actually bogus when used with
prompt-less config options.

The "bool"/"default y" pair with no prompt can be expressed more
compactly using def_bool.

[rjw: Rebased on top of earlier patches modifying this file.]

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2011-03-15 00:43:17 +01:00
Rafael J. Wysocki
6831c6edc7 PM: Drop pm_flags that is not necessary
The variable pm_flags is used to prevent APM from being enabled
along with ACPI, which would lead to problems.  However, acpi_init()
is always called before apm_init() and after acpi_init() has
returned, it is known whether or not ACPI will be used.  Namely, if
acpi_disabled is not set after acpi_init() has returned, this means
that ACPI is enabled.  Thus, it is sufficient to check acpi_disabled
in apm_init() to prevent APM from being enabled in parallel with
ACPI.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Len Brown <len.brown@intel.com>
2011-03-15 00:43:16 +01:00
Rafael J. Wysocki
88a6f33e4d PM: Clean up PM_TRACE dependencies and drop unnecessary Kconfig option
CONFIG_PM_SLEEP_ADVANCED_DEBUG is not used any more, so drop it
and CONFIG_CAN_PM_TRACE need not depend on EXPERIMENTAL, so remove
that dependency.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2011-03-15 00:43:15 +01:00
Rafael J. Wysocki
aa33860158 PM: Remove CONFIG_PM_OPS
After redefining CONFIG_PM to depend on (CONFIG_PM_SLEEP ||
CONFIG_PM_RUNTIME) the CONFIG_PM_OPS option is redundant and can be
replaced with CONFIG_PM.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2011-03-15 00:43:15 +01:00
Rafael J. Wysocki
196ec24322 PM: Reorder power management Kconfig options
Reorder configuration options in kernel/power/Kconfig so that
the options depended on are at the top of the list.

This patch doesn't introduce any functional changes.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2011-03-15 00:43:15 +01:00
Rafael J. Wysocki
1eb208aea3 PM: Make CONFIG_PM depend on (CONFIG_PM_SLEEP || CONFIG_PM_RUNTIME)
From the users' point of view CONFIG_PM is really only used for
making it possible to set CONFIG_SUSPEND, CONFIG_HIBERNATION,
CONFIG_PM_RUNTIME and (surprisingly enough) CONFIG_XEN_SAVE_RESTORE
(CONFIG_PM_OPP also depends on CONFIG_PM, but quite artificially).
However, both CONFIG_SUSPEND and CONFIG_HIBERNATION require platform
support (independent of CONFIG_PM) and it is not quite obvious that
CONFIG_PM has to be set for CONFIG_XEN_SAVE_RESTORE to be available.
Thus, from the users' point of view, it would be more logical to
automatically select CONFIG_PM if any of the above options depending
on it are set.

Make CONFIG_PM depend on (CONFIG_PM_SLEEP || CONFIG_PM_RUNTIME),
which will cause it to be selected when any of CONFIG_SUSPEND,
CONFIG_HIBERNATION, CONFIG_PM_RUNTIME, CONFIG_XEN_SAVE_RESTORE is
set and will clarify its meaning.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2011-03-15 00:43:15 +01:00
Rafael J. Wysocki
cd51e61cf4 PM / ACPI: Remove references to pm_flags from bus.c
If direct references to pm_flags are removed from drivers/acpi/bus.c,
CONFIG_ACPI will not need to depend on CONFIG_PM any more.  Make that
happen.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Len Brown <len.brown@intel.com>
2011-03-15 00:43:15 +01:00
Thomas Gleixner
6e0aa9f8a8 futex: Deobfuscate handle_futex_death()
handle_futex_death() uses futex_atomic_cmpxchg_inatomic() without
disabling page faults. That's ok, but totally non obvious.

We don't hold locks so we actually can and want to fault here, because
the get_user() before futex_atomic_cmpxchg_inatomic() does not
guarantee a R/W mapping.

We could just add a big fat comment to explain this, but actually
changing the code so that the functionality is entirely clear is
better.

Use the helper function which disables page faults around the
futex_atomic_cmpxchg_inatomic() and handle a fault with a call to
fault_in_user_writeable() as all other places in the futex code do as
well.

Pointed-out-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Darren Hart <darren@dvhart.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: David Howells <dhowells@redhat.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Chris Metcalf <cmetcalf@tilera.com>
LKML-Reference: <alpine.LFD.2.00.1103141126590.2787@localhost6.localdomain6>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-03-14 21:08:47 +01:00
Linus Torvalds
5f40d42094 Merge branch 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6
* 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6:
  NFS: NFSROOT should default to "proto=udp"
  nfs4: remove duplicated #include
  NFSv4: nfs4_state_mark_reclaim_nograce() should be static
  NFSv4: Fix the setlk error handler
  NFSv4.1: Fix the handling of the SEQUENCE status bits
  NFSv4/4.1: Fix nfs4_schedule_state_recovery abuses
  NFSv4.1 reclaim complete must wait for completion
  NFSv4: remove duplicate clientid in struct nfs_client
  NFSv4.1: Retry CREATE_SESSION on NFS4ERR_DELAY
  sunrpc: Propagate errors from xs_bind() through xs_create_sock()
  (try3-resend) Fix nfs_compat_user_ino64 so it doesn't cause problems if bit 31 or 63 are set in fileid
  nfs: fix compilation warning
  nfs: add kmalloc return value check in decode_and_add_ds
  SUNRPC: Remove resource leak in svc_rdma_send_error()
  nfs: close NFSv4 COMMIT vs. CLOSE race
  SUNRPC: Close a race in __rpc_wait_for_completion_task()
2011-03-14 11:19:50 -07:00
Kay Sievers
9d90c8d9cd printk: do not mangle valid userspace syslog prefixes
printk: do not mangle valid userspace syslog prefixes with /dev/kmsg

Log messages passed to the kernel log by using /dev/kmsg or /dev/ttyprintk
might contain a syslog prefix including the syslog facility value.

This makes printk to recognize these headers properly, extract the real log
level from it to use, and add the prefix as a proper prefix to the
log buffer, instead of wrongly printing it as the log message text.

Before:
  $ echo '<14>text' > /dev/kmsg
  $ dmesg -r
  <4>[135159.594810] <14>text

After:
  $ echo '<14>text' > /dev/kmsg
  $ dmesg -r
  <14>[   50.750654] text

Cc: Lennart Poettering <lennart@poettering.net>
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-03-14 08:49:16 -07:00
Al Viro
73d049a40f open-style analog of vfs_path_lookup()
new function: file_open_root(dentry, mnt, name, flags) opens the file
vfs_path_lookup would arrive to.

Note that name can be empty; in that case the usual requirement that
dentry should be a directory is lifted.

open-coded equivalents switched to it, may_open() got down exactly
one caller and became static.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-14 09:15:28 -04:00
Al Viro
c9c6cac0c2 kill path_lookup()
all remaining callers pass LOOKUP_PARENT to it, so
flags argument can die; renamed to kern_path_parent()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-14 09:15:23 -04:00
Al Viro
15a9155fe3 fix race in audit_get_nd()
don't rely on pathname resolution ending up twice at the same point...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-14 09:15:23 -04:00
Torben Hohn
6e6823d17b posix-clocks: Check write permissions in posix syscalls
pc_clock_settime() and pc_clock_adjtime() do not check whether the fd
was opened in write mode, so a clock can be set with a read only fd.

[ tglx: We deliberately do not return -EPERM as we want this to be
  	distingushable from the capability based permission check ]

Signed-off-by: Torben Hohn <torbenh@gmx.de>
LKML-Reference: <1299173174-348-4-git-send-email-torbenh@gmx.de>
Cc: Richard Cochran <richard.cochran@omicron.at>
Cc: John Stultz <johnstul@us.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
2011-03-12 21:27:07 +01:00
Thomas Gleixner
995612178c Merge branch 'tip/futex/devel' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-rt into core/futexes
futex,plist: Pass the real head of the priority list to plist_del()
 futex,plist: Remove debug lock assignment from plist_node
 plist: Shrink struct plist_head
 plist: Add priority list test
2011-03-12 11:43:32 +01:00
Thomas Gleixner
d209a699a0 genirq: Add chip flag to force mask on suspend
On suspend we disable all interrupts in the core code, but this does
not mask the interrupt line in the default implementation as we use a
lazy disable approach. That means we mark the interrupt disabled, but
leave the hardware unmasked. That's an optimization because we avoid
the hardware access for the common case where no interrupt happens
after we marked it disabled. If an interrupt happens, then the
interrupt flow handler masks the line at the hardware level and marks
it pending.

Suspend makes use of this delayed disable as it "disables" all
interrupts when preparing the suspend transition. Right before the
system goes into hardware suspend state it checks whether one of the
interrupts which is marked as a wakeup interrupt came in after
disabling it.

Most interrupt chips have a separate register which selects the
interrupts which can wake up the system from suspend, so we don't have
to mask any on the non wakeup interrupts.

But now we have to deal with brilliant designed hardware which lacks
such a wakeup configuration facility. For such hardware it's necessary
to mask all non wakeup interrupts before going into suspend in order
to avoid the wakeup from random interrupts.

Rather than working around this in the affected interrupt chip
implementations we can solve this elegant in the core code itself.

Add a flag IRQCHIP_MASK_ON_SUSPEND which can be set by the irq chip
implementation to indicate, that the interrupts which are not selected
as wakeup sources must be masked in the suspend path. Mask them in the
loop which checks the wakeup interrupts pending flag.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Abhijeet Dharmapurikar <adharmap@codeaurora.org>
LKML-Reference: <alpine.LFD.2.00.1103112112310.2787@localhost6.localdomain6>
2011-03-12 11:12:58 +01:00
Lai Jiangshan
017f2b239d futex,plist: Remove debug lock assignment from plist_node
The original code uses &plist_node->plist as the fake head of
the priority list for plist_del(), these debug locks in
the fake head are needed for CONFIG_DEBUG_PI_LIST.

But now we always pass the real head to plist_del(), the debug locks
in plist_node will not be used, so we remove these assignments.

Acked-by: Darren Hart <dvhart@linux.intel.com>
Signed-off-by:  Lai Jiangshan <laijs@cn.fujitsu.com>
LKML-Reference: <4D10797E.7040803@cn.fujitsu.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-03-11 15:09:53 -05:00
Lai Jiangshan
2e12978a9f futex,plist: Pass the real head of the priority list to plist_del()
Some plist_del()s in kernel/futex.c are passed a faked head of the
priority list.

It does not fail because the current code does not require the real head
in plist_del(). The current code of plist_del() just uses the head for checking,
so it will not cause a bad result even when we use a faked head.

But it is undocumented usage:

/**
 * plist_del - Remove a @node from plist.
 *
 * @node:	&struct plist_node pointer - entry to be removed
 * @head:	&struct plist_head pointer - list head
 */

The document says that the @head is the "list head" head of the priority list.

In futex code, several places use "plist_del(&q->list, &q->list.plist);",
they pass a fake head. We need to fix them all.

Thanks to Darren Hart for many suggestions.

Acked-by: Darren Hart <dvhart@linux.intel.com>
Signed-off-by:  Lai Jiangshan <laijs@cn.fujitsu.com>
LKML-Reference: <4D11984A.5030203@cn.fujitsu.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-03-11 15:09:52 -05:00
Tao Ma
805f6b5e1c blktrace: Use rq->cmd_flags directly in blk_add_trace_rq.
In blk_add_trace_rq, we only chose the minor 2 bits from
request's cmd_flags and did some check for discard.
so most of other flags(e.g, REQ_SYNC) are missing.

For example, with a sync write after blkparse we get:
  8,16   1        1     0.001776503  7509  A  WS 1349632 + 1024 <- (8,17) 1347584
  8,16   1        2     0.001776813  7509  Q  WS 1349632 + 1024 [dd]
  8,16   1        3     0.001780395  7509  G  WS 1349632 + 1024 [dd]
  8,16   1        5     0.001783186  7509  I   W 1349632 + 1024 [dd]
  8,16   1       11     0.001816987  7509  D   W 1349632 + 1024 [dd]
  8,16   0        2     0.006218192     0  C   W 1349632 + 1024 [0]

Since now we have integrated the flags of both bio and request,
it is safe to pass rq->cmd_flags directly to __blk_add_trace.

With this patch, after a sync write we get:
  8,16   1        1     0.001776900  5425  A  WS 1189888 + 1024 <- (8,17) 1187840
  8,16   1        2     0.001777179  5425  Q  WS 1189888 + 1024 [dd]
  8,16   1        3     0.001780797  5425  G  WS 1189888 + 1024 [dd]
  8,16   1        5     0.001783402  5425  I  WS 1189888 + 1024 [dd]
  8,16   1       11     0.001817468  5425  D  WS 1189888 + 1024 [dd]
  8,16   0        2     0.005640709     0  C  WS 1189888 + 1024 [0]

Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Acked-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-03-11 20:11:59 +01:00
Michel Lespinasse
37a9d912b2 futex: Sanitize cmpxchg_futex_value_locked API
The cmpxchg_futex_value_locked API was funny in that it returned either
the original, user-exposed futex value OR an error code such as -EFAULT.
This was confusing at best, and could be a source of livelocks in places
that retry the cmpxchg_futex_value_locked after trying to fix the issue
by running fault_in_user_writeable().
    
This change makes the cmpxchg_futex_value_locked API more similar to the
get_futex_value_locked one, returning an error code and updating the
original value through a reference argument.
    
Signed-off-by: Michel Lespinasse <walken@google.com>
Acked-by: Chris Metcalf <cmetcalf@tilera.com>  [tile]
Acked-by: Tony Luck <tony.luck@intel.com>  [ia64]
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Michal Simek <monstr@monstr.eu>  [microblaze]
Acked-by: David Howells <dhowells@redhat.com> [frv]
Cc: Darren Hart <darren@dvhart.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
LKML-Reference: <20110311024851.GC26122@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-03-11 12:23:08 +01:00
Thomas Gleixner
c0c9ed1504 futex: Avoid redudant evaluation of task_pid_vnr()
The result is not going to change under us, so no need to reevaluate
this over and over. Seems to be a leftover from the mechanical mass
conversion of task->pid to task_pid_vnr(tsk).

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-03-11 12:23:07 +01:00
David S. Miller
33175d84ee Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	drivers/net/bnx2x/bnx2x_cmn.c
2011-03-10 14:26:00 -08:00
Linus Torvalds
bf98f77888 Merge branch 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  sched: Fix sched rt group scheduling when hierachy is enabled
2011-03-10 13:08:59 -08:00
Trond Myklebust
bf294b41ce SUNRPC: Close a race in __rpc_wait_for_completion_task()
Although they run as rpciod background tasks, under normal operation
(i.e. no SIGKILL), functions like nfs_sillyrename(), nfs4_proc_unlck()
and nfs4_do_close() want to be fully synchronous. This means that when we
exit, we want all references to the rpc_task to be gone, and we want
any dentry references etc. held by that task to be released.

For this reason these functions call __rpc_wait_for_completion_task(),
followed by rpc_put_task() in the expectation that the latter will be
releasing the last reference to the rpc_task, and thus ensuring that the
callback_ops->rpc_release() has been called synchronously.

This patch fixes a race which exists due to the fact that
rpciod calls rpc_complete_task() (in order to wake up the callers of
__rpc_wait_for_completion_task()) and then subsequently calls
rpc_put_task() without ensuring that these two steps are done atomically.

In order to avoid adding new spin locks, the patch uses the existing
waitqueue spin lock to order the rpc_task reference count releases between
the waiting process and rpciod.
The common case where nobody is waiting for completion is optimised for by
checking if the RPC_TASK_ASYNC flag is cleared and/or if the rpc_task
reference count is 1: in those cases we drop trying to grab the spin lock,
and immediately free up the rpc_task.

Those few processes that need to put the rpc_task from inside an
asynchronous context and that do not care about ordering are given a new
helper: rpc_put_task_async().

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-10 15:04:52 -05:00
Michel Lespinasse
8fe8f545c6 futex: Update futex_wait_setup comments about locking
Reviving a cleanup I had done about a year ago as part of a larger
futex_set_wait proposal. Over the years, the locking of the hashed
futex queue got improved, so that some of the "rare but normal" race
conditions described in comments can't actually happen anymore.

Signed-off-by: Michel Lespinasse <walken@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Darren Hart <dvhltc@us.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <20110307020750.GA31188@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-03-10 19:56:18 +01:00
Thomas Gleixner
a9e7acfff0 hrtimer: Remove empty hrtimer_init_hres_timer()
Leftover from earlier implementation. All empty, remove it.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-03-10 19:15:59 +01:00
Steven Rostedt
4a0b1665db tracing: Fix irqoff selftest expanding max buffer
If the kernel command line declares a tracer "ftrace=sometracer" and
that tracer is either not defined or is enabled after irqsoff,
then the irqs off selftest will fail with the following error:

Testing tracer irqsoff:
------------[ cut here ]------------
WARNING: at /home/rostedt/work/autotest/nobackup/linux-test.git/kernel/trace/tra
ce.c:713 update_max_tr_single+0xfa/0x11b()
Hardware name:
Modules linked in:
Pid: 1, comm: swapper Not tainted 2.6.38-rc8-test #1
Call Trace:
 [<c0441d9d>] ? warn_slowpath_common+0x65/0x7a
 [<c049adb2>] ? update_max_tr_single+0xfa/0x11b
 [<c0441dc1>] ? warn_slowpath_null+0xf/0x13
 [<c049adb2>] ? update_max_tr_single+0xfa/0x11b
 [<c049e454>] ? stop_critical_timing+0x154/0x204
 [<c049b54b>] ? trace_selftest_startup_irqsoff+0x5b/0xc1
 [<c049b54b>] ? trace_selftest_startup_irqsoff+0x5b/0xc1
 [<c049b54b>] ? trace_selftest_startup_irqsoff+0x5b/0xc1
 [<c049e529>] ? time_hardirqs_on+0x25/0x28
 [<c0468bca>] ? trace_hardirqs_on_caller+0x18/0x12f
 [<c0468cec>] ? trace_hardirqs_on+0xb/0xd
 [<c049b54b>] ? trace_selftest_startup_irqsoff+0x5b/0xc1
 [<c049b6b8>] ? register_tracer+0xf8/0x1a3
 [<c14e93fe>] ? init_irqsoff_tracer+0xd/0x11
 [<c040115e>] ? do_one_initcall+0x71/0x121
 [<c14e93f1>] ? init_irqsoff_tracer+0x0/0x11
 [<c14ce3a9>] ? kernel_init+0x13a/0x1b6
 [<c14ce26f>] ? kernel_init+0x0/0x1b6
 [<c0403842>] ? kernel_thread_helper+0x6/0x10
---[ end trace e93713a9d40cd06c ]---
.. no entries found ..FAILED!

What happens is the "ftrace=..." will expand the ring buffer to its
default size (from its minimum size) but it will not expand the
max ring buffer (the ring buffer to store maximum latencies).
When the irqsoff test runs, it will call the ring buffer swap routine
that checks if the max ring buffer is the same size as the normal
ring buffer, and will fail if it is not. This causes the test to fail.

The solution is to expand the max ring buffer before running the self
test if the max ring buffer is used by that tracer and the normal ring
buffer is expanded. The max ring buffer should be shrunk again after
the test is done to save space.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-03-10 10:34:58 -05:00
Steven Rostedt
9a24470b28 tracing: Align 4 byte ints together in struct tracer
Move elements in struct tracer for better alignment.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-03-10 10:34:54 -05:00
Yuanhan Liu
56355b83e2 tracing: Export trace_set_clr_event()
Trace events belonging to a module only exists when the module is
loaded. Well, we can use trace_set_clr_event funtion to enable some
trace event at the module init routine, so that we will not miss
something while loading then module.

So, Export the trace_set_clr_event function so that module can use it.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
LKML-Reference: <1289196312-25323-1-git-send-email-yuanhan.liu@linux.intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-03-10 10:34:51 -05:00
Jiri Olsa
31274d72f0 tracing: Explain about unstable clock on resume with ring buffer warning
The "Delta way too big" warning might appear on a system with a
unstable shed clock right after the system is resumed and tracing
was enabled at time of suspend.

Since it's not realy a bug, and the unstable sched clock is working
fast and reliable otherwise, Steven suggested to keep using the
sched clock in any case and just to make note in the warning itself.

v2 changes:
- added #ifdef CONFIG_HAVE_UNSTABLE_SCHED_CLOCK

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
LKML-Reference: <20110218145219.GD2604@jolsa.brq.redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-03-10 10:34:47 -05:00
David Sharp
10da37a645 tracing: Adjust conditional expression latency formatting.
Formatting change only to improve code readability. No code changes except to
introduce intermediate variables.

Signed-off-by: David Sharp <dhsharp@google.com>
LKML-Reference: <1291421609-14665-13-git-send-email-dhsharp@google.com>

[ Keep variable declarations and assignment separate ]

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-03-10 10:34:35 -05:00
David Sharp
140e4f2d1c tracing: Fix event alignment: ftrace:context_switch and ftrace:wakeup
Signed-off-by: David Sharp <dhsharp@google.com>
LKML-Reference: <1291421609-14665-6-git-send-email-dhsharp@google.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-03-10 10:34:16 -05:00
Steven Rostedt
e6e1e25935 tracing: Remove lock_depth from event entry
The lock_depth field in the event headers was added as a temporary
data point for help in removing the BKL. Now that the BKL is pretty
much been removed, we can remove this field.

This in turn changes the header from 12 bytes to 8 bytes,
removing the 4 byte buffer that gcc would insert if the first field
in the data load was 8 bytes in size.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-03-10 10:31:48 -05:00
Jens Axboe
4c63f5646e Merge branch 'for-2.6.39/stack-plug' into for-2.6.39/core
Conflicts:
	block/blk-core.c
	block/blk-flush.c
	drivers/md/raid1.c
	drivers/md/raid10.c
	drivers/md/raid5.c
	fs/nilfs2/btnode.c
	fs/nilfs2/mdt.c

Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-03-10 08:58:35 +01:00
Jens Axboe
721a9602e6 block: kill off REQ_UNPLUG
With the plugging now being explicitly controlled by the
submitter, callers need not pass down unplugging hints
to the block layer. If they want to unplug, it's because they
manually plugged on their own - in which case, they should just
unplug at will.

Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-03-10 08:52:27 +01:00
Jens Axboe
73c1010119 block: initial patch for on-stack per-task plugging
This patch adds support for creating a queuing context outside
of the queue itself. This enables us to batch up pieces of IO
before grabbing the block device queue lock and submitting them to
the IO scheduler.

The context is created on the stack of the process and assigned in
the task structure, so that we can auto-unplug it if we hit a schedule
event.

The current queue plugging happens implicitly if IO is submitted to
an empty device, yet callers have to remember to unplug that IO when
they are going to wait for it. This is an ugly API and has caused bugs
in the past. Additionally, it requires hacks in the vm (->sync_page()
callback) to handle that logic. By switching to an explicit plugging
scheme we make the API a lot nicer and can get rid of the ->sync_page()
hack in the vm.

Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-03-10 08:45:54 +01:00
Linus Torvalds
78833dd706 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
  nd->inode is not set on the second attempt in path_walk()
  unfuck proc_sysctl ->d_compare()
  minimal fix for do_filp_open() race
2011-03-09 13:55:51 -08:00
David Sharp
de29be5e71 ring-buffer: Remove unused #include <linux/trace_irq.h>
Signed-off-by: David Sharp <dhsharp@google.com>
LKML-Reference: <1291421609-14665-3-git-send-email-dhsharp@google.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-03-09 13:52:28 -05:00