Commit Graph

212233 Commits

Author SHA1 Message Date
Sage Weil efa4c1206e ceph: do not carry i_lock for readdir from dcache
We were taking dcache_lock inside of i_lock, which introduces a dependency
not found elsewhere in the kernel, complicationg the vfs locking
scalability work.  Since we don't actually need it here anyway, remove
it.

We only need i_lock to test for the I_COMPLETE flag, so be careful to do
so without dcache_lock held.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-10-20 15:38:27 -07:00
Julia Lawall 61413c2f59 fs/ceph/xattr.c: Use kmemdup
Convert a sequence of kmalloc and memcpy to use kmemdup.

The semantic patch that performs this transformation is:
(http://coccinelle.lip6.fr/)

// <smpl>
@@
expression a,flag,len;
expression arg,e1,e2;
statement S;
@@

  a =
-  \(kmalloc\|kzalloc\)(len,flag)
+  kmemdup(arg,len,flag)
  <... when != a
  if (a == NULL || ...) S
  ...>
- memcpy(a,arg,len+1);
// </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Sage Weil <sage@newdream.net>
2010-10-20 15:38:26 -07:00
Dan Carpenter 85b5aaa624 rbd: passing wrong variable to bvec_kunmap_irq()
We should be passing "buf" here insead of "bv".  This is tricky because
it's not the same as kmap() and kunmap().  GCC does warn about it if you
compile on i386 with CONFIG_HIGHMEM.

Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Sage Weil <sage@newdream.net>
2010-10-20 15:38:25 -07:00
Dan Carpenter b8d0638a98 rbd: null vs ERR_PTR
ceph_alloc_page_vector() returns ERR_PTR(-ENOMEM) on errors.

Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Sage Weil <sage@newdream.net>
2010-10-20 15:38:24 -07:00
Sage Weil 240634e9b3 ceph: fix num_pages_free accounting in pagelist
Decrement the free page counter when removing a page from the free_list.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-10-20 15:38:23 -07:00
Greg Farnum 571dba52a3 ceph: add CEPH_MDS_OP_SETDIRLAYOUT and associated ioctl.
Signed-off-by: Sage Weil <sage@newdream.net>
2010-10-20 15:38:23 -07:00
Yehuda Sadeh 010e3b48fc ceph: don't crash when passed bad mount options
This only happened when parse_extra_token was not passed
to ceph_parse_option() (hence, only happened in rbd).

Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
2010-10-20 15:38:22 -07:00
Randy Dunlap 6f453ed6c0 ceph: fix debugfs warnings
Include "super.h" outside of CONFIG_DEBUG_FS to eliminate a compiler warning:

fs/ceph/debugfs.c:266: warning: 'struct ceph_fs_client' declared inside parameter list
fs/ceph/debugfs.c:266: warning: its scope is only this definition or declaration, which is probably not what you want
fs/ceph/debugfs.c:271: warning: 'struct ceph_fs_client' declared inside parameter list

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
2010-10-20 15:38:21 -07:00
Yehuda Sadeh f4cf3deef4 block: rbd: removing unnecessary test
rbd_get_segment() can't return a negative value, we don't need to check
the return output.

Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
2010-10-20 15:38:20 -07:00
Vasiliy Kulikov 28f259b7cd block: rbd: fixed may leaks
rbd_client_create() doesn't free rbdc, this leads to many leaks.

seg_len in rbd_do_op() is unsigned, so (seg_len < 0) makes no sense.
Also if fixed check fails then seg_name is leaked.

Signed-off-by: Vasiliy Kulikov <segooon@gmail.com>
Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
2010-10-20 15:38:19 -07:00
Sage Weil 496e59553c ceph: switch from BKL to lock_flocks()
Switch from using the BKL explicitly to the new lock_flocks() interface.
Eventually this will turn into a spinlock.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-10-20 15:38:18 -07:00
Greg Farnum fca4451acf ceph: preallocate flock state without locks held
When the lock_kernel() turns into lock_flocks() and a spinlock, we won't
be able to do allocations with the lock held.  Preallocate space without
the lock, and retry if the lock state changes out from underneath us.

Signed-off-by: Greg Farnum <gregf@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>
2010-10-20 15:38:17 -07:00
Greg Farnum ac0b74d8a1 ceph: add pagelist_reserve, pagelist_truncate, pagelist_set_cursor
These facilitate preallocation of pages so that we can encode into the pagelist
in an atomic context.

Signed-off-by: Greg Farnum <gregf@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>
2010-10-20 15:38:16 -07:00
Sage Weil 18a38193ef ceph: use mapping->nrpages to determine if mapping is empty
This is simpler and faster.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-10-20 15:38:15 -07:00
Sage Weil 93afd449aa ceph: only invalidate on check_caps if we actually have pages
The i_rdcache_gen value only implies we MAY have cached pages; actually
check the mapping to see if it's worth bothering with an invalidate.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-10-20 15:38:15 -07:00
Sage Weil 4c32f5dda5 ceph: do not hide .snap in root directory
Snaps in the root directory are now supported by the MDS, and harmless on
older versions.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-10-20 15:38:14 -07:00
Yehuda Sadeh 602adf4002 rbd: introduce rados block device (rbd), based on libceph
The rados block device (rbd), based on osdblk, creates a block device
that is backed by objects stored in the Ceph distributed object storage
cluster.  Each device consists of a single metadata object and data
striped over many data objects.

The rbd driver supports read-only snapshots.

Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>
2010-10-20 15:38:13 -07:00
Yehuda Sadeh 3d14c5d2b6 ceph: factor out libceph from Ceph file system
This factors out protocol and low-level storage parts of ceph into a
separate libceph module living in net/ceph and include/linux/ceph.  This
is mostly a matter of moving files around.  However, a few key pieces
of the interface change as well:

 - ceph_client becomes ceph_fs_client and ceph_client, where the latter
   captures the mon and osd clients, and the fs_client gets the mds client
   and file system specific pieces.
 - Mount option parsing and debugfs setup is correspondingly broken into
   two pieces.
 - The mon client gets a generic handler callback for otherwise unknown
   messages (mds map, in this case).
 - The basic supported/required feature bits can be expanded (and are by
   ceph_fs_client).

No functional change, aside from some subtle error handling cases that got
cleaned up in the refactoring process.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-10-20 15:37:28 -07:00
Yehuda Sadeh ae1533b62b ceph-rbd: osdc support for osd call and rollback operations
This will be used for rbd snapshots administration.

Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
2010-10-20 15:37:25 -07:00
Yehuda Sadeh 68b4476b0b ceph: messenger and osdc changes for rbd
Allow the messenger to send/receive data in a bio.  This is added
so that we wouldn't need to copy the data into pages or some other buffer
when doing IO for an rbd block device.

We can now have trailing variable sized data for osd
ops.  Also osd ops encoding is more modular.

Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>
2010-10-20 15:37:18 -07:00
Yehuda Sadeh 3499e8a5d4 ceph: refactor osdc requests creation functions
The osd requests creation are being decoupled from the
vino parameter, allowing clients using the osd to use
other arbitrary object names that are not necessarily
vino based. Also, calc_raw_layout now takes a snap id.

Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>
2010-10-20 15:36:01 -07:00
Yehuda Sadeh 7669a2c95e ceph: lookup pool in osdmap by name
Implement a pool lookup by name.  This will be used by rbd.

Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>
2010-10-20 15:35:36 -07:00
Shaohua Li 9329672021 x86: Spread tlb flush vector between nodes
Currently flush tlb vector allocation is based on below equation:
	sender = smp_processor_id() % 8
This isn't optimal, CPUs from different node can have the same vector, this
causes a lot of lock contention. Instead, we can assign the same vectors to
CPUs from the same node, while different node has different vectors. This has
below advantages:
a. if there is lock contention, the lock contention is between CPUs from one
node. This should be much cheaper than the contention between nodes.
b. completely avoid lock contention between nodes. This especially benefits
kswapd, which is the biggest user of tlb flush, since kswapd sets its affinity
to specific node.

In my test, this could reduce > 20% CPU overhead in extreme case.The test
machine has 4 nodes and each node has 16 CPUs. I then bind each node's kswapd
to the first CPU of the node. I run a workload with 4 sequential mmap file
read thread. The files are empty sparse file. This workload will trigger a
lot of page reclaim and tlbflush. The kswapd bind is to easy trigger the
extreme tlb flush lock contention because otherwise kswapd keeps migrating
between CPUs of a node and I can't get stable result. Sure in real workload,
we can't always see so big tlb flush lock contention, but it's possible.

[ hpa: folded in fix from Eric Dumazet to use this_cpu_read() ]

Signed-off-by: Shaohua Li <shaohua.li@intel.com>
LKML-Reference: <1287544023.4571.8.camel@sli10-conroe.sh.intel.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-10-20 14:44:42 -07:00
Shaohua Li c957ef2c59 percpu: Introduce a read-mostly percpu API
Add a new readmostly percpu section and API.  This can be used to
avoid dirtying data lines which are generally not written to, which is
especially important for data which may be accessed by processors
other than the one for which the percpu area belongs to.

[ hpa: moved it *after* the page-aligned section, for obvious
  reasons. ]

Signed-off-by: Shaohua Li <shaohua.li@intel.com>
LKML-Reference: <1287544022.4571.7.camel@sli10-conroe.sh.intel.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-10-20 14:33:58 -07:00
Linus Torvalds f6f94e2ab1 Linux 2.6.36 2010-10-20 13:30:22 -07:00
Linus Torvalds 7d7c4d06be Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/upstream-linus
* 'upstream' of git://git.linux-mips.org/pub/scm/upstream-linus:
  MIPS: O32 compat/N32: Fix to use compat syscall wrappers for AIO syscalls.
  MAINTAINERS: Change list for ioc_serial to linux-serial.
  SERIAL: ioc3_serial: Return -ENOMEM on memory allocation failure
  MIPS: jz4740: Fix Kbuild Platform file.
  MIPS: Repair Kbuild make clean breakage.
2010-10-20 13:18:21 -07:00
Amit Shah 531295e63b virtio: console: Don't block entire guest if host doesn't read data
If the host is slow in reading data or doesn't read data at all,
blocking write calls not only blocked the program that called write()
but the entire guest itself.

To overcome this, let's not block till the host signals it has given
back the virtio ring element we passed it.  Instead, send the buffer to
the host and return to userspace.  This operation then becomes similar
to how non-blocking writes work, so let's use the existing code for this
path as well.

This code change also ensures blocking write calls do get blocked if
there's not enough room in the virtio ring as well as they don't return
-EAGAIN to userspace.

Signed-off-by: Amit Shah <amit.shah@redhat.com>
Acked-by: Hans de Goede <hdegoede@redhat.com>
CC: stable@kernel.org
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-20 13:18:04 -07:00
Linus Torvalds 30c278192f Merge git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-rc-fixes-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-rc-fixes-2.6:
  [SCSI] bsg: fix incorrect device_status value
  [SCSI] Fix VPD inquiry page wrapper
2010-10-20 13:13:09 -07:00
Borislav Petkov f01f7c56a1 x86, mm: Fix incorrect data type in vmalloc_sync_all()
arch/x86/mm/fault.c: In function 'vmalloc_sync_all':
arch/x86/mm/fault.c:238: warning: assignment makes integer from pointer without a cast

introduced by 617d34d9e5.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
LKML-Reference: <20101020103642.GA3135@kryptos.osrc.amd.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-10-20 12:54:04 -07:00
Linus Torvalds ef2533dae5 Merge branch 'kvm-updates/2.6.36' of git://git.kernel.org/pub/scm/virt/kvm/kvm
* 'kvm-updates/2.6.36' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: Fix fs/gs reload oops with invalid ldt
2010-10-20 09:00:44 -07:00
Robert Richter 27afdf2008 apic, x86: Use BIOS settings for IBS and MCE threshold interrupt LVT offsets
We want the BIOS to setup the EILVT APIC registers. The offsets
were hardcoded and BIOS settings were overwritten by the OS.
Now, the subsystems for MCE threshold and IBS determine the LVT
offset from the registers the BIOS has setup. If the BIOS setup
is buggy on a family 10h system, a workaround enables IBS. If
the OS determines an invalid register setup, a "[Firmware Bug]:
" error message is reported.

We need this change also for upcomming cpu families.

Signed-off-by: Robert Richter <robert.richter@amd.com>
LKML-Reference: <1286360874-1471-3-git-send-email-robert.richter@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-10-20 04:42:13 +02:00
Robert Richter a68c439b19 apic, x86: Check if EILVT APIC registers are available (AMD only)
This patch implements checks for the availability of LVT entries
(APIC500-530) and reserves it if used. The check becomes
necessary since we want to let the BIOS provide the LVT offsets.
 The offsets should be determined by the subsystems using it
like those for MCE threshold or IBS.  On K8 only offset 0
(APIC500) and MCE interrupts are supported. Beginning with
family 10h at least 4 offsets are available.

Since offsets must be consistent for all cores, we keep track of
the LVT offsets in software and reserve the offset for the same
vector also to be used on other cores. An offset is freed by
setting the entry to APIC_EILVT_MASKED.

If the BIOS is right, there should be no conflicts. Otherwise a
"[Firmware Bug]: ..." error message is generated. However, if
software does not properly determines the offsets, it is not
necessarily a BIOS bug.

Signed-off-by: Robert Richter <robert.richter@amd.com>
LKML-Reference: <1286360874-1471-2-git-send-email-robert.richter@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-10-20 04:42:13 +02:00
Ingo Molnar 14d4962dc8 Merge branch 'linus' into irq/core
Merge reason: update to almost-final-.36

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-10-20 04:38:59 +02:00
Jeremy Fitzhardinge 617d34d9e5 x86, mm: Hold mm->page_table_lock while doing vmalloc_sync
Take mm->page_table_lock while syncing the vmalloc region.  This prevents
a race with the Xen pagetable pin/unpin code, which expects that the
page_table_lock is already held.  If this race occurs, then Xen can see
an inconsistent page type (a page can either be read/write or a pagetable
page, and pin/unpin converts it between them), which will cause either
the pin or the set_p[gm]d to fail; either will crash the kernel.

vmalloc_sync_all() should be called rarely, so this extra use of
page_table_lock should not interfere with its normal users.

The mm pointer is stashed in the pgd page's index field, as that won't
be otherwise used for pgds.

Reported-by: Ian Campbell <ian.cambell@eu.citrix.com>
Originally-by: Jan Beulich <jbeulich@novell.com>
LKML-Reference: <4CB88A4C.1080305@goop.org>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-10-19 13:57:08 -07:00
Jeremy Fitzhardinge 44235dcde4 x86, mm: Fix bogus whitespace in sync_global_pgds()
Whitespace cleanup only.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-10-19 13:56:03 -07:00
Ingo Molnar 750ed158bf Merge branch 'tip/perf/core' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into perf/core 2010-10-19 20:41:38 +02:00
Michel Thebeau e2cc502c3f MIPS: O32 compat/N32: Fix to use compat syscall wrappers for AIO syscalls.
[Ralf: Michel's original patch only fixed N32; I replicated the same fix
for O32.]

Signed-off-by: Michel Thebeau <michel.thebeau@windriver.com>
Cc: paul.gortmaker@windriver.com
Cc: bruce.ashfield@windriver.com
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2010-10-19 18:32:41 +01:00
Ralf Baechle d39e072166 MAINTAINERS: Change list for ioc_serial to linux-serial.
IOC3 is also being used on SGI MIPS systems but this particular driver is
only being used on IA64 systems so linux-mips made no sense as a list.  Pat
also thinks linux-serial@vger.kernel.org is the better list.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2010-10-19 18:32:41 +01:00
Julia Lawall 6cc0cc4a35 SERIAL: ioc3_serial: Return -ENOMEM on memory allocation failure
In this code, 0 is returned on memory allocation failure, even though other
failures return -ENOMEM or other similar values.

A simplified version of the semantic match that finds this problem is as
follows: (http://coccinelle.lip6.fr/)

// <smpl>
@@
expression ret;
expression x,e1,e2,e3;
@@

ret = 0
... when != ret = e1
*x = \(kmalloc\|kcalloc\|kzalloc\)(...)
... when != ret = e2
if (x == NULL) { ... when != ret = e3
  return ret;
}
// </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>
To: Pat Gefre <pfg@sgi.com>
Cc: kernel-janitors@vger.kernel.org
Cc: linux-ia64@vger.kernel.org
Cc: linux-mips@linux-mips.org
Cc: linux-kernel@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/1704/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2010-10-19 18:32:40 +01:00
David Daney 08be7b2bc7 MIPS: jz4740: Fix Kbuild Platform file.
The platform specific files should be included via the platform-y
variable.

Signed-off-by: David Daney <ddaney@caviumnetworks.com>
Cc: Lars-Peter Clausen <lars@metafoo.de>
Patchwork: https://patchwork.linux-mips.org/patch/1719/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2010-10-19 18:32:39 +01:00
David Daney ad4b2b627c MIPS: Repair Kbuild make clean breakage.
When running make clean, Kbuild doesn't process the .config file, so nothing
generates a platform-y variable.  We can get it to descend into the platform
directories by setting $(obj-).

The dec Platform file was unconditionally setting platform-, obliterating
its previous contents and preventing some directories from being cleaned.
This is change to an append operation '+=' to allow cavium-octeon to be
cleaned.

Signed-off-by: David Daney <ddaney@caviumnetworks.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
Patchwork: https://patchwork.linux-mips.org/patch/1718/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2010-10-19 18:32:39 +01:00
Linus Torvalds 51ea8a88aa Merge branch 'drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6
* 'drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6:
  drm/radeon/kms: avivo cursor workaround applies to evergreen as well
2010-10-19 10:10:20 -07:00
Avi Kivity 9581d442b9 KVM: Fix fs/gs reload oops with invalid ldt
kvm reloads the host's fs and gs blindly, however the underlying segment
descriptors may be invalid due to the user modifying the ldt after loading
them.

Fix by using the safe accessors (loadsegment() and load_gs_index()) instead
of home grown unsafe versions.

This is CVE-2010-3698.

KVM-Stable-Tag.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-19 14:21:45 -02:00
Steven Rostedt 7e40798f40 tracing: Fix compile issue for trace_sched_wakeup.c
The function start_func_tracer() was incorrectly added in the
 #ifdef CONFIG_FUNCTION_TRACER condition, but is still used even
when function tracing is not enabled.

The calls to register_ftrace_function() and register_ftrace_graph()
become nops (and their arguments are even ignored), thus there is
no reason to hide start_func_tracer() when function tracing is
not enabled.

Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-10-19 10:56:19 -04:00
Heiko Carstens 3f7edb1656 [S390] hardirq: remove pointless header file includes
Remove a couple of pointless header file includes.
Fixes a compile bug caused by header file include dependencies with
"irq: Add tracepoint to softirq_raise" within linux-next.

Reported-by: Sachin Sant <sachinp@in.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
[ cherry-picked from the s390 tree to fix "2bf2160: irq: Add tracepoint to softirq_raise" ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-10-19 13:34:03 +02:00
Tony Luck 3c4ea5b498 [IA64] Move local_softirq_pending() definition
Ugly #include dependencies. We need to have local_softirq_pending()
defined before it gets used in <linux/interrupt.h>. But <asm/hardirq.h>
provides the definition *after* this #include chain:
  <linux/irq.h>
    <asm/irq.h>
      <asm/hw_irq.h>
        <linux/interrupt.h>

Signed-off-by: Tony Luck <tony.luck@intel.com>
[ cherry-picked from the ia64 tree to fix "2bf2160: irq: Add tracepoint to softirq_raise" ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-10-19 13:31:11 +02:00
Darren Hart 7ada876a87 futex: Fix errors in nested key ref-counting
futex_wait() is leaking key references due to futex_wait_setup()
acquiring an additional reference via the queue_lock() routine. The
nested key ref-counting has been masking bugs and complicating code
analysis. queue_lock() is only called with a previously ref-counted
key, so remove the additional ref-counting from the queue_(un)lock()
functions.

Also futex_wait_requeue_pi() drops one key reference too many in
unqueue_me_pi(). Remove the key reference handling from
unqueue_me_pi(). This was paired with a queue_lock() in
futex_lock_pi(), so the count remains unchanged.

Document remaining nested key ref-counting sites.

Signed-off-by: Darren Hart <dvhart@linux.intel.com>
Reported-and-tested-by: Matthieu Fertré<matthieu.fertre@kerlabs.com>
Reported-by: Louis Rilling<louis.rilling@kerlabs.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: John Kacur <jkacur@redhat.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
LKML-Reference: <4CBB17A8.70401@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@kernel.org
2010-10-19 11:41:54 +02:00
Yinghai Lu 9717967c4b x86: ioapic: Call free_irte only if interrupt remapping enabled
On a system that support intr-rempping when booting with "intremap=off"

[  177.895501] BUG: unable to handle kernel NULL pointer dereference at 00000000000000f8
[  177.913316] IP: [<ffffffff8145fc18>] free_irte+0x47/0xc0
...
[  178.173326] Call Trace:
[  178.173574]  [<ffffffff810515b4>] destroy_irq+0x3a/0x75
[  178.192934]  [<ffffffff81051834>] arch_teardown_msi_irq+0xe/0x10
[  178.193418]  [<ffffffff81458dc3>] arch_teardown_msi_irqs+0x56/0x7f
[  178.213021]  [<ffffffff81458e79>] free_msi_irqs+0x8d/0xeb

Call free_irte only when interrupt remapping is enabled.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <4CBCB274.7010108@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-10-19 09:25:33 +02:00
Paul Mackerras 57fa721433 perf, powerpc: Fix power_pmu_event_init to not use event->ctx
Commit c3f00c70 ("perf: Separate find_get_context() from event
initialization") changed the generic perf_event code to call
perf_event_alloc, which calls the arch-specific event_init code,
before looking up the context for the new event.  Unfortunately,
power_pmu_event_init uses event->ctx->task to see whether the
new event is a per-task event or a system-wide event, and thus
crashes since event->ctx is NULL at the point where
power_pmu_event_init gets called.

(The reason it needs to know whether it is a per-task event is
because there are some hardware events on Power systems which
only count when the processor is not idle, and there are some
fixed-function counters which count such events.  For example,
the "run cycles" event counts cycles when the processor is not
idle.  If the user asks to count cycles, we can use "run cycles"
if this is a per-task event, since the processor is running when
the task is running, by definition.  We can't use "run cycles"
if the user asks for "cycles" on a system-wide counter.)

Fortunately the information we need is in the
event->attach_state field, so we just use that instead.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <20101019055535.GA10398@drongo>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Reported-by: Alexey Kardashevskiy <aik@au1.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-10-19 09:18:34 +02:00
Ingo Molnar 1fa41266e9 Merge branch 'tip/perf/recordmcount-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into perf/core 2010-10-19 08:21:10 +02:00