Commit Graph

211385 Commits

Author SHA1 Message Date
Andi Kleen 4e1c19750a Clean up __page_set_anon_rmap
Linus asked for a cleanup of __page_set_anon_rmap to make
it look more like the cleaner huge pages version.

Factor out the duplicated PageAnon check into a single check
at the beginning of the function.

Remove obsolete comments and rewrite them into standard English.

No functional changes.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
2010-10-08 09:32:46 +02:00
Naoya Horiguchi 6a90181c7b HWPOISON, hugetlb: fix unpoison for hugepage
Currently unpoisoning hugepages doesn't work correctly because
clearing PG_HWPoison is done outside if (TestClearPageHWPoison).
This patch fixes it.

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
2010-10-08 09:32:45 +02:00
Naoya Horiguchi d950b95882 HWPOISON, hugetlb: soft offlining for hugepage
This patch extends soft offlining framework to support hugepage.
When memory corrected errors occur repeatedly on a hugepage,
we can choose to stop using it by migrating data onto another hugepage
and disabling the original (maybe half-broken) one.

ChangeLog since v4:
- branch soft_offline_page() for hugepage

ChangeLog since v3:
- remove comment about "ToDo: hugepage soft-offline"

ChangeLog since v2:
- move refcount handling into isolate_lru_page()

ChangeLog since v1:
- add double check in isolating hwpoisoned hugepage
- define free/non-free checker for hugepage
- postpone calling put_page() for hugepage in soft_offline_page()

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
2010-10-08 09:32:45 +02:00
Naoya Horiguchi 8c6c2ecb44 HWPOSION, hugetlb: recover from free hugepage error when !MF_COUNT_INCREASED
Currently error recovery for free hugepage works only for MF_COUNT_INCREASED.
This patch enables !MF_COUNT_INCREASED case.

Free hugepages can be handled directly by alloc_huge_page() and
dequeue_hwpoisoned_huge_page(), and both of them are protected
by hugetlb_lock, so there is no race between them.

Note that this patch defines the refcount of HWPoisoned hugepage
dequeued from freelist is 1, deviated from present 0, thereby we
can avoid race between unpoison and memory failure on free hugepage.
This is reasonable because unlikely to free buddy pages, free hugepage
is governed by hugetlbfs even after error handling finishes.
And it also makes unpoison code added in the later patch cleaner.

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
2010-10-08 09:32:45 +02:00
Naoya Horiguchi a9869b837c hugetlb: move refcounting in hugepage allocation inside hugetlb_lock
Currently alloc_huge_page() raises page refcount outside hugetlb_lock.
but it causes race when dequeue_hwpoison_huge_page() runs concurrently
with alloc_huge_page().
To avoid it, this patch moves set_page_refcounted() in hugetlb_lock.

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Reviewed-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
2010-10-08 09:32:45 +02:00
Naoya Horiguchi 6de2b1aab9 HWPOISON, hugetlb: add free check to dequeue_hwpoison_huge_page()
This check is necessary to avoid race between dequeue and allocation,
which can cause a free hugepage to be dequeued twice and get kernel unstable.

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Reviewed-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
2010-10-08 09:32:45 +02:00
Naoya Horiguchi 290408d4a2 hugetlb: hugepage migration core
This patch extends page migration code to support hugepage migration.
One of the potential users of this feature is soft offlining which
is triggered by memory corrected errors (added by the next patch.)

Todo:
- there are other users of page migration such as memory policy,
  memory hotplug and memocy compaction.
  They are not ready for hugepage support for now.

ChangeLog since v4:
- define migrate_huge_pages()
- remove changes on isolation/putback_lru_page()

ChangeLog since v2:
- refactor isolate/putback_lru_page() to handle hugepage
- add comment about race on unmap_and_move_huge_page()

ChangeLog since v1:
- divide migration code path for hugepage
- define routine checking migration swap entry for hugetlb
- replace "goto" with "if/else" in remove_migration_pte()

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
2010-10-08 09:32:45 +02:00
Naoya Horiguchi 0ebabb416f hugetlb: redefine hugepage copy functions
This patch modifies hugepage copy functions to have only destination
and source hugepages as arguments for later use.
The old ones are renamed from copy_{gigantic,huge}_page() to
copy_user_{gigantic,huge}_page().
This naming convention is consistent with that between copy_highpage()
and copy_user_highpage().

ChangeLog since v4:
- add blank line between local declaration and code
- remove unnecessary might_sleep()

ChangeLog since v2:
- change copy_huge_page() from macro to inline dummy function
  to avoid compile warning when !CONFIG_HUGETLB_PAGE.

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Reviewed-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
2010-10-08 09:32:44 +02:00
Naoya Horiguchi bf50bab2b3 hugetlb: add allocate function for hugepage migration
We can't use existing hugepage allocation functions to allocate hugepage
for page migration, because page migration can happen asynchronously with
the running processes and page migration users should call the allocation
function with physical addresses (not virtual addresses) as arguments.

ChangeLog since v3:
- unify alloc_buddy_huge_page() and alloc_buddy_huge_page_node()

ChangeLog since v2:
- remove unnecessary get/put_mems_allowed() (thanks to David Rientjes)

ChangeLog since v1:
- add comment on top of alloc_huge_page_no_vma()

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Reviewed-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
2010-10-08 09:32:44 +02:00
Naoya Horiguchi 998b4382c1 hugetlb: fix metadata corruption in hugetlb_fault()
Since the PageHWPoison() check is for avoiding hwpoisoned page remained
in pagecache mapping to the process, it should be done in "found in pagecache"
branch, not in the common path.
Otherwise, metadata corruption occurs if memory failure happens between
alloc_huge_page() and lock_page() because page fault fails with metadata
changes remained (such as refcount, mapcount, etc.)

This patch moves the check to "found in pagecache" branch and fix the problem.

ChangeLog since v2:
- remove retry check in "new allocation" path.
- make description more detailed
- change patch name from "HWPOISON, hugetlb: move PG_HWPoison bit check"

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Reviewed-by: Wu Fengguang <fengguang.wu@intel.com>
Reviewed-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
2010-10-08 09:32:44 +02:00
Linus Torvalds 6b0cd00bc3 Merge branch 'hwpoison-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-mce-2.6
* 'hwpoison-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-mce-2.6:
  HWPOISON: Stop shrinking at right page count
  HWPOISON: Report correct address granuality for AO huge page errors
  HWPOISON: Copy si_addr_lsb to user
  page-types.c: fix name of unpoison interface
2010-10-07 13:59:32 -07:00
Linus Torvalds c8d86d8ac4 Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block
* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
  elevator: fix oops on early call to elevator_change()
2010-10-07 13:54:56 -07:00
Linus Torvalds 74de82ed1e Merge branch 'for-linus' of git://neil.brown.name/md
* 'for-linus' of git://neil.brown.name/md:
  md: check return code of read_sb_page
  md/raid1: minor bio initialisation improvements.
  md/raid1:  avoid overflow in raid1 resync when bitmap is in use.
2010-10-07 13:50:48 -07:00
Linus Torvalds dda9cd9fb3 Merge branch 'drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6
* 'drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6:
  drm: don't drop handle reference on unload
  drm/ttm: Fix two race conditions + fix busy codepaths
2010-10-07 13:47:20 -07:00
Linus Torvalds afe147466c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Input: wacom - fix runtime PM related deadlock
  Input: joydev - fix JSIOCSAXMAP ioctl
  Input: uinput - setup MT usage during device creation
2010-10-07 13:46:33 -07:00
Linus Torvalds 5710c2b275 Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs
* 'for-linus' of git://oss.sgi.com/xfs/xfs:
  xfs: properly account for reclaimed inodes
2010-10-07 13:45:26 -07:00
Linus Torvalds a4099ae79d Merge branch 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-2.6
* 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-2.6: (37 commits)
  V4L/DVB: v4l: radio: si470x: fix unneeded free_irq() call
  V4L/DVB: v4l: videobuf: prevent passing a NULL to dma_free_coherent()
  V4L/DVB: ir-core: Fix null dereferences in the protocols sysfs interface
  V4L/DVB: v4l: s5p-fimc: Fix 3-planar formats handling and pixel offset error on S5PV210 SoCs
  V4L/DVB: v4l: s5p-fimc: Fix return value on probe() failure
  V4L/DVB: uvcvideo: Restrict frame rates for Chicony CNF7129 webcam
  V4L/DVB: uvcvideo: Fix support for Medion Akoya All-in-one PC integrated webcam
  V4L/DVB: ivtvfb: prevent reading uninitialized stack memory
  V4L/DVB: cx25840: Fix typo in volume control initialization: 65335 vs. 65535
  V4L/DVB: v4l: mem2mem_testdev: add missing release for video_device
  V4L/DVB: v4l: mem2mem_testdev: fix errorenous comparison
  V4L/DVB: mt9v022.c: Fixed compilation warning
  V4L/DVB: mt9m111: added current colorspace at g_fmt
  V4L/DVB: mt9m111: cropcap and s_crop check if type is VIDEO_CAPTURE
  V4L/DVB: mx2_camera: fix a race causing NULL dereference
  V4L/DVB: tm6000: bugfix data handling
  V4L/DVB: gspca - sn9c20x: Bad transfer size of Bayer images
  V4L/DVB: videobuf-dma-sg: set correct size in last sg element
  V4L/DVB: cx231xx: Avoid an OOPS when card is unknown (card=0)
  V4L/DVB: dvb: fix smscore_getbuffer() logic
  ...
2010-10-07 13:45:00 -07:00
Linus Torvalds 5672bc8181 Merge branch 'i2c-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging
* 'i2c-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging:
  of/i2c: Fix module load order issue caused by of_i2c.c
  i2c: Fix checks which cause legacy suspend to never get called
  i2c-pca: Fix waitforcompletion() return value
  i2c: Fix for suspend/resume issue
  i2c: Remove obsolete cleanup for clientdata
2010-10-07 13:44:30 -07:00
Eric Dumazet 27b3d80a7b sysctl: fix min/max handling in __do_proc_doulongvec_minmax()
When proc_doulongvec_minmax() is used with an array of longs, and no
min/max check requested (.extra1 or .extra2 being NULL), we dereference a
NULL pointer for the second element of the array.

Noticed while doing some changes in network stack for the "16TB problem"

Fix is to not change min & max pointers in __do_proc_doulongvec_minmax(),
so that all elements of the vector share an unique min/max limit, like
proc_dointvec_minmax().

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Americo Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-07 13:31:21 -07:00
Kyungmin Park aaac7d9e0a MAINTAINERS: add Samsung S5P series FIMC maintainers
Add Samsung S5P series FIMC(Camera Interface) maintainers.

Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Cc: Kyungmin Park <kyungmin.park@samsung.com>
Cc: Sylwester Nawrocki <s.nawrocki@samsung.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-07 13:31:21 -07:00
Andrew Morton 880b0e2649 MAINTAINERS: Haavard has moved
Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
Cc: Hans-Christian Egtvedt <hans-christian.egtvedt@atmel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-07 13:31:21 -07:00
Kirill A. Shutemov ad4ca5f4b7 memcg: fix thresholds with use_hierarchy == 1
We need to check parent's thresholds if parent has use_hierarchy == 1 to
be sure that parent's threshold events will be triggered even if parent
itself is not active (no MEM_CGROUP_EVENTS).

Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name>
Reviewed-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-07 13:31:21 -07:00
Robin Holt f241e6607b mm: alloc_large_system_hash() printk overflow on 16TB boot
During boot of a 16TB system, the following is printed:
Dentry cache hash table entries: -2147483648 (order: 22, 17179869184 bytes)

Signed-off-by: Robin Holt <holt@sgi.com>
Reviewed-by: WANG Cong <xiyou.wangcong@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-07 13:31:21 -07:00
Andi Kleen 47f43e7efa HWPOISON: Stop shrinking at right page count
When we call the slab shrinker to free a page we need to stop at
page count one because the caller always holds a single reference, not zero.

This avoids useless looping over slab shrinkers and freeing too much
memory.

Reviewed-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
2010-10-07 09:47:10 +02:00
Andi Kleen 0d9ee6a2d4 HWPOISON: Report correct address granuality for AO huge page errors
The SIGBUS user space signalling is supposed to report the
address granuality of a corruption. Pass this information correctly
for huge pages by querying the hpage order.

Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Reviewed-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
2010-10-07 09:45:44 +02:00
Andi Kleen a337fdac7a HWPOISON: Copy si_addr_lsb to user
The original hwpoison code added a new siginfo field si_addr_lsb to
pass the granuality of the fault address to user space. Unfortunately
this field was never copied to user space. Fix this here.

I added explicit checks for the MCEERR codes to avoid having
to patch all potential callers to initialize the field.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
2010-10-07 09:41:25 +02:00
Naoya Horiguchi 6715981312 page-types.c: fix name of unpoison interface
The page-types utility still uses an out of date name for the
unpoison interface: debugfs:hwpoison/renew-pfn
This patch renames and fixes it.

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Acked-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
2010-10-07 09:41:24 +02:00
Jens Axboe 430c62fb29 elevator: fix oops on early call to elevator_change()
2.6.36 introduces an API for drivers to switch the IO scheduler
instead of manually calling the elevator exit and init functions.
This API was added since q->elevator must be cleared in between
those two calls. And since we already have this functionality
directly from use by the sysfs interface to switch schedulers
online, it was prudent to reuse it internally too.

But this API needs the queue to be in a fully initialized state
before it is called, or it will attempt to unregister elevator
kobjects before they have been added. This results in an oops
like this:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000051
IP: [<ffffffff8116f15e>] sysfs_create_dir+0x2e/0xc0
PGD 47ddfc067 PUD 47c6a1067 PMD 0
Oops: 0000 [#1] PREEMPT SMP
last sysfs file: /sys/devices/pci0000:00/0000:00:02.0/0000:04:00.1/irq
CPU 2
Modules linked in: t(+) loop hid_apple usbhid ahci ehci_hcd uhci_hcd libahci usbcore nls_base igb

Pid: 7319, comm: modprobe Not tainted 2.6.36-rc6+ #132 QSSC-S4R/QSSC-S4R
RIP: 0010:[<ffffffff8116f15e>]  [<ffffffff8116f15e>] sysfs_create_dir+0x2e/0xc0
RSP: 0018:ffff88027da25d08  EFLAGS: 00010246
RAX: ffff88047c68c528 RBX: 00000000fffffffe RCX: 0000000000000000
RDX: 000000000000002f RSI: 000000000000002f RDI: ffff88047e196c88
RBP: ffff88027da25d38 R08: 0000000000000000 R09: d84156c5635688c0
R10: d84156c5635688c0 R11: 0000000000000000 R12: ffff88047e196c88
R13: 0000000000000000 R14: 0000000000000000 R15: ffff88047c68c528
FS:  00007fcb0b26f6e0(0000) GS:ffff880287400000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000051 CR3: 000000047e76e000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process modprobe (pid: 7319, threadinfo ffff88027da24000, task ffff88027d377090)
Stack:
 ffff88027da25d58 ffff88047c68c528 00000000fffffffe ffff88047e196c88
<0> ffff88047c68c528 ffff88047e05bd90 ffff88027da25d78 ffffffff8123fb77
<0> ffff88047e05bd90 0000000000000000 ffff88047e196c88 ffff88047c68c528
Call Trace:
 [<ffffffff8123fb77>] kobject_add_internal+0xe7/0x1f0
 [<ffffffff8123fd98>] kobject_add_varg+0x38/0x60
 [<ffffffff8123feb9>] kobject_add+0x69/0x90
 [<ffffffff8116efe0>] ? sysfs_remove_dir+0x20/0xa0
 [<ffffffff8103d48d>] ? sub_preempt_count+0x9d/0xe0
 [<ffffffff8143de20>] ? _raw_spin_unlock+0x30/0x50
 [<ffffffff8116efe0>] ? sysfs_remove_dir+0x20/0xa0
 [<ffffffff8116eff4>] ? sysfs_remove_dir+0x34/0xa0
 [<ffffffff81224204>] elv_register_queue+0x34/0xa0
 [<ffffffff81224aad>] elevator_change+0xfd/0x250
 [<ffffffffa007e000>] ? t_init+0x0/0x361 [t]
 [<ffffffffa007e000>] ? t_init+0x0/0x361 [t]
 [<ffffffffa007e0a8>] t_init+0xa8/0x361 [t]
 [<ffffffff810001de>] do_one_initcall+0x3e/0x170
 [<ffffffff8108c3fd>] sys_init_module+0xbd/0x220
 [<ffffffff81002f2b>] system_call_fastpath+0x16/0x1b
Code: e5 41 56 41 55 41 54 49 89 fc 53 48 83 ec 10 48 85 ff 74 52 48 8b 47 18 49 c7 c5 00 46 61 81 48 85 c0 74 04 4c 8b 68 30 45 31 f6 <41> 80 7d 51 00 74 0e 49 8b 44 24 28 4c 89 e7 ff 50 20 49 89 c6
RIP  [<ffffffff8116f15e>] sysfs_create_dir+0x2e/0xc0
 RSP <ffff88027da25d08>
CR2: 0000000000000051
---[ end trace a6541d3bf07945df ]---

Fix this by adding a registered bit to the elevator queue, which is
set when the sysfs kobjects have been registered.

Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-10-07 09:35:16 +02:00
Dave Airlie dab8dcfa3c drm: don't drop handle reference on unload
since the handle references are all tied to a file_priv, and when it disappears
all the handle refs go with it.

The fbcon ones we'd only notice on unload, but the nouveau notifier one
would would happen on reboot.

nouveau: Reported-by: Marc Dionne <marc.c.dionne@gmail.com>
nouveau: Tested-by: Marc Dionne <marc.c.dionne@gmail.com>
i915 unload: Reported-by: Keith Packard <keithp@keithp.com>
Acked-by: Ben Skeggs <bskeggs@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2010-10-07 14:01:17 +10:00
Johannes Weiner 081003fff4 xfs: properly account for reclaimed inodes
When marking an inode reclaimable, a per-AG counter is increased, the
inode is tagged reclaimable in its per-AG tree, and, when this is the
first reclaimable inode in the AG, the AG entry in the per-mount tree
is also tagged.

When an inode is finally reclaimed, however, it is only deleted from
the per-AG tree.  Neither the counter is decreased, nor is the parent
tree's AG entry untagged properly.

Since the tags in the per-mount tree are not cleared, the inode
shrinker iterates over all AGs that have had reclaimable inodes at one
point in time.

The counters on the other hand signal an increasing amount of slab
objects to reclaim.  Since "70e60ce xfs: convert inode shrinker to
per-filesystem context" this is not a real issue anymore because the
shrinker bails out after one iteration.

But the problem was observable on a machine running v2.6.34, where the
reclaimable work increased and each process going into direct reclaim
eventually got stuck on the xfs inode shrinking path, trying to scan
several million objects.

Fix this by properly unwinding the reclaimable-state tracking of an
inode when it is reclaimed.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: stable@kernel.org
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
2010-10-06 22:35:48 -05:00
Vasiliy Kulikov 5c04f5512f md: check return code of read_sb_page
Function read_sb_page may return ERR_PTR(...). Check for it.

Signed-off-by: Vasiliy Kulikov <segooon@gmail.com>
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: NeilBrown <neilb@suse.de>
2010-10-07 12:02:50 +11:00
NeilBrown db8d9d3591 md/raid1: minor bio initialisation improvements.
When performing a resync we pre-allocate some bios and repeatedly use
them.  This requires us to re-initialise them each time.
One field (bi_comp_cpu) and some flags weren't being initiaised
reliably.

Signed-off-by: NeilBrown <neilb@suse.de>
2010-10-07 12:00:50 +11:00
NeilBrown 7571ae887d md/raid1: avoid overflow in raid1 resync when bitmap is in use.
bitmap_start_sync returns - via a pass-by-reference variable - the
number of sectors before we need to check with the bitmap again.
Since commit ef42567335 this number can be substantially larger,
2^27 is a common value.

Unfortunately it is an 'int' and so when raid1.c:sync_request shifts
it 9 places to the left it becomes 0.  This results in a zero-length
read which the scsi layer justifiably complains about.

This patch just removes the shift so the common case becomes safe with
a trivially-correct patch.

In the next merge window we will convert this 'int' to a 'sector_t'

Reported-by: "George Spelvin" <linux@horizon.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2010-10-07 11:54:46 +11:00
Linus Torvalds cb655d0f3d Linux 2.6.36-rc7 2010-10-06 13:39:52 -07:00
Linus Torvalds 81c20b96e5 Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/upstream-linus
* 'upstream' of git://git.linux-mips.org/pub/scm/upstream-linus:
  MIPS: Octeon: Place cnmips_cu2_setup in __init memory.
  MIPS: Don't place cu2 notifiers in __cpuinitdata
  MIPS: Calculate VMLINUZ_LOAD_ADDRESS based on the length of vmlinux.bin
  MIPS: Alchemy: Resolve prom section mismatches
  MIPS: Fix syscall 64 bit number comments.
  MIPS: Hookup fanotify_init, fanotify_mark, and prlimit64 syscalls.
  MIPS: TX49xx: Rename ARCH_KMALLOC_MINALIGN to ARCH_DMA_MINALIGN
  MIPS: N32: Fix getdents64 syscall for n32
  MIPS: Remove pr_<level> uses of KERN_<level>
  MIPS: PNX8550: Sort out machine halt, restart and powerdown functions.
  MIPS: GIC: Remove dependencies from Malta files.
  MIPS: Kconfig: Fix and clarify kconfig help text for VSMP and SMTC.
  MIPS: DMA: Fix computation of DMA flags from device's coherent_dma_mask.
  MIPS: Audit: Fix hang in entry.S.
  MIPS: Document why RELOC_HIDE is there.
  MIPS: Octeon: Determine if helper needs to be built
  MIPS: Use generic atomic64 for 32-bit kernels
  MIPS: RM7000: Symbol should be static
  MIPS: kspd: Adjust confusing if indentation
  MIPS: Fix a typo.
2010-10-06 13:27:19 -07:00
Linus Torvalds 089eed29b4 Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block
* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
  writeback: always use sb->s_bdi for writeback purposes
2010-10-06 11:11:18 -07:00
Linus Torvalds 34984f54b7 Merge branch 'v2.6.36-rc6-urgent-fixes' of git://xenbits.xen.org/people/sstabellini/linux-pvhvm
* 'v2.6.36-rc6-urgent-fixes' of git://xenbits.xen.org/people/sstabellini/linux-pvhvm:
  xen: do not initialize PV timers on HVM if !xen_have_vector_callback
  xen: do not set xenstored_ready before xenbus_probe on hvm
2010-10-06 09:51:28 -07:00
Linus Torvalds 8fe9793af0 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
  fuse: Initialize total_len in fuse_retrieve()
2010-10-06 09:50:41 -07:00
Stephen Rothwell 7c6d45e665 powerpc: remove unused variable
Since powerpc uses -Werror on arch powerpc, the build was broken like
this:

  cc1: warnings being treated as errors
  arch/powerpc/kernel/module.c: In function 'module_finalize':
  arch/powerpc/kernel/module.c:66: error: unused variable 'err'

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-05 17:27:54 -07:00
Thomas Hellstrom 1df6a2ebd7 drm/ttm: Fix two race conditions + fix busy codepaths
This fixes a race pointed out by Dave Airlie where we don't take a buffer
object about to be destroyed off the LRU lists properly. It also fixes a rare
case where a buffer object could be destroyed in the middle of an
accelerated eviction.

The patch also adds a utility function that can be used to prematurely
release GPU memory space usage of an object waiting to be destroyed.
For example during eviction or swapout.

The above mentioned commit didn't queue the buffer on the delayed destroy
list under some rare circumstances. It also didn't completely honor the
remove_all parameter.

Fixes:
https://bugzilla.redhat.com/show_bug.cgi?id=615505
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=591061

Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2010-10-06 09:04:43 +10:00
Linus Torvalds e1d9694cae Merge branch 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  rcu: rcu_read_lock_bh_held(): disabling irqs also disables bh
  generic-ipi: Fix deadlock in __smp_call_function_single
2010-10-05 13:07:43 -07:00
Linus Torvalds 39c12be86a Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  perf trace scripting: Fix extern struct definitions
  perf ui hist browser: Fix segfault on 'a' for annotate
  perf tools: Fix build breakage
  perf, x86: Handle in flight NMIs on P4 platform
  oprofile, ARM: Release resources on failure
  oprofile: Add Support for Intel CPU Family 6 / Model 29
2010-10-05 11:57:37 -07:00
Evgeny Kuznetsov 231d0aefd8 wait: using uninitialized member of wait queue
The "flags" member of "struct wait_queue_t" is used in several places in
the kernel code without beeing initialized by init_wait().  "flags" is
used in bitwise operations.

If "flags" not initialized then unexpected behaviour may take place.
Incorrect flags might used later in code.

Added initialization of "wait_queue_t.flags" with zero value into
"init_wait".

Signed-off-by: Evgeny Kuznetsov <EXT-Eugeny.Kuznetsov@nokia.com>
[ The bit we care about does end up being initialized by both
   prepare_to_wait() and add_to_wait_queue(), so this doesn't seem to
   cause actual bugs, but is definitely the right thing to do -Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-05 11:47:18 -07:00
Linus Torvalds 5336377d62 modules: Fix module_bug_list list corruption race
With all the recent module loading cleanups, we've minimized the code
that sits under module_mutex, fixing various deadlocks and making it
possible to do most of the module loading in parallel.

However, that whole conversion totally missed the rather obscure code
that adds a new module to the list for BUG() handling.  That code was
doubly obscure because (a) the code itself lives in lib/bugs.c (for
dubious reasons) and (b) it gets called from the architecture-specific
"module_finalize()" rather than from generic code.

Calling it from arch-specific code makes no sense what-so-ever to begin
with, and is now actively wrong since that code isn't protected by the
module loading lock any more.

So this commit moves the "module_bug_{finalize,cleanup}()" calls away
from the arch-specific code, and into the generic code - and in the
process protects it with the module_mutex so that the list operations
are now safe.

Future fixups:
 - move the module list handling code into kernel/module.c where it
   belongs.
 - get rid of 'module_bug_list' and just use the regular list of modules
   (called 'modules' - imagine that) that we already create and maintain
   for other reasons.

Reported-and-tested-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Adrian Bunk <bunk@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-05 11:29:27 -07:00
Stefano Stabellini 31e7e931cd xen: do not initialize PV timers on HVM if !xen_have_vector_callback
if !xen_have_vector_callback do not initialize PV timer unconditionally
because we still don't know how many cpus are available and if there is
more than one we won't be able to receive the timer interrupts on
cpu > 0.

This patch fixes an hang at boot when Xen does not support vector
callbacks and the guest has multiple vcpus.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Jeremy Fitzhardinge <jeremy@goop.org>
2010-10-05 13:39:23 +01:00
Stefano Stabellini a947f0f8f7 xen: do not set xenstored_ready before xenbus_probe on hvm
Register_xenstore_notifier should guarantee that the caller gets
notified even if xenstore is already up.
Therefore we revert "do not notify callers from
register_xenstore_notifier" and set xenstored_read at the right time for
PV on HVM guests too.
In fact in case of PV on HVM guests xenstored is ready only after the
platform pci driver has completed the initialization, so do not set
xenstored_ready before the call to xenbus_probe().

This patch fixes a shutdown_event watcher registration bug that causes
"xm shutdown" not to work properly.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Jeremy Fitzhardinge <jeremy@goop.org>
2010-10-05 13:37:28 +01:00
Dmitry Torokhov f6cd378372 Input: wacom - fix runtime PM related deadlock
When runtime PM is enabled by default for input devices, X hangs in
wacom open:
[<ffffffff814a00ea>] mutex_lock+0x1a/0x40
[<ffffffffa02bc94b>] wacom_resume+0x3b/0x90 [wacom]
[<ffffffff81327a32>] usb_resume_interface+0xd2/0x190
[<ffffffff81327b5d>] usb_resume_both+0x6d/0x110
[<ffffffff81327c24>] usb_runtime_resume+0x24/0x40
[<ffffffff8130a2cf>] __pm_runtime_resume+0x26f/0x450
[<ffffffff8130a23a>] __pm_runtime_resume+0x1da/0x450
[<ffffffff8130a53a>] pm_runtime_resume+0x2a/0x50
[<ffffffff81328176>] usb_autopm_get_interface+0x26/0x60
[<ffffffffa02bc626>] wacom_open+0x36/0x90 [wacom]

wacom_open() takes wacom->lock and calls usb_autopm_get_interface(),
which in turn calls wacom_resume() which tries to acquire the lock
again.

The fix is to call usb_autopm_get_interface() first, before we take
the lock.

Since we do not do usb_autopm_put_interface() until wacom_close()
is called runtime PM is effectively disabled for the driver, however
changing it now would risk regressions so the complete fix will
have to wait till the next merge window.

Reported-by: Jiri Slaby <jslaby@suse.cz>
Acked-by: Oliver Neukum <oneukum@suse.de>
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
2010-10-04 22:36:41 -07:00
Linus Torvalds 2f6b3aa7a5 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/lrg/voltage-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/lrg/voltage-2.6:
  regulator: max8649 - fix setting extclk_freq
  regulator: fix typo in current units
  regulator: fix device_register() error handling
2010-10-04 13:35:48 -07:00
Linus Torvalds 3c06806e69 Merge branch 'merge-powerpc' of git://git.secretlab.ca/git/linux-2.6
* 'merge-powerpc' of git://git.secretlab.ca/git/linux-2.6:
  powerpc/5200: tighten up ac97 reset timing
  powerpc/5200: efika.c: Add of_node_put to avoid memory leak
  powerpc/512x: fix clk_get() return value
2010-10-04 11:45:35 -07:00
Linus Torvalds d9f73afcd3 Merge branch 'fix/misc' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6
* 'fix/misc' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6:
  ALSA: i2c/other/ak4xx-adda: Fix a compile warning with CONFIG_PROCFS=n
  ALSA: prevent heap corruption in snd_ctl_new()
2010-10-04 11:15:59 -07:00