radix_tree_tag_get() is not safe to use concurrently with radix_tree_tag_set()
or radix_tree_tag_clear(). The problem is that the double tag_get() in
radix_tree_tag_get():
if (!tag_get(node, tag, offset))
saw_unset_tag = 1;
if (height == 1) {
int ret = tag_get(node, tag, offset);
may see the value change due to the action of set/clear. RCU is no protection
against this as no pointers are being changed, no nodes are being replaced
according to a COW protocol - set/clear alter the node directly.
The documentation in linux/radix-tree.h, however, says that
radix_tree_tag_get() is an exception to the rule that "any function modifying
the tree or tags (...) must exclude other modifications, and exclude any
functions reading the tree".
The problem is that the next statement in radix_tree_tag_get() checks that the
tag doesn't vary over time:
BUG_ON(ret && saw_unset_tag);
This has been seen happening in FS-Cache:
https://www.redhat.com/archives/linux-cachefs/2010-April/msg00013.html
To this end, remove the BUG_ON() from radix_tree_tag_get() and note in various
comments that the value of the tag may change whilst the RCU read lock is held,
and thus that the return value of radix_tree_tag_get() may not be relied upon
unless radix_tree_tag_set/clear() and radix_tree_delete() are excluded from
running concurrently with it.
Reported-by: Romain DEGEZ <romain.degez@smartjog.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
As suggested by Linus, fix up kmem_ptr_validate() to handle non-kernel pointers
more graciously. The patch changes kmem_ptr_validate() to use the newly
introduced kern_ptr_validate() helper to check that a pointer is a valid kernel
pointer before we attempt to convert it into a 'struct page'.
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Matt Mackall <mpm@selenic.com>
Cc: Nick Piggin <npiggin@suse.de>
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Acked-by: Christoph Lameter <cl@linux-foundation.org>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
As suggested by Linus, introduce a kern_ptr_validate() helper that does some
sanity checks to make sure a pointer is a valid kernel pointer. This is a
preparational step for fixing SLUB kmem_ptr_validate().
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Matt Mackall <mpm@selenic.com>
Cc: Nick Piggin <npiggin@suse.de>
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This reverts commit ba168fc37d.
It changes user-visible sysfs interfaces, and breaks some existing user
space applications which apparently rely on the fact that the output
does not contain the "0x" prefix.
Requested-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The savesys_ipl_nss asm function is put into the .init.text section
however it is missing a ".previous" section which would restore the
previous section.
Luckily all functions in early.c are init functions so it doesn't
matter currently.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The default size of the vmalloc area is currently 1 GB. The memory resource
controller uses about 10 MB of vmalloc space per gigabyte of memory. That
turns a system with more than ~100 GB memory unbootable with the default
vmalloc size. It costs us nothing to increase the default size to some
more adequate value, e.g. 128 GB.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
commit 6a985c6194
([S390] s390: use change recording override for kernel mapping)
deactivated the change bit recording for the kernel mapping to
improve the performance. This works most of the time, but there
are cases (e.g. kernel runs in home space, futex atomic compare xcmg)
where we modify user memory with the kernel mapping instead of the
user mapping.
Instead of fixing these cases, this patch just deactivates change bit
override to avoid future problems with other kernel code that might
use the kernel mapping for user memory.
CC: stable@kernel.org
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
If a machine check interrupts the io interrupt handler on one of the
instructions between io_return and io_leave the critical section
cleanup code will move the return psw to io_work_loop. By doing that
the switch from the asynchronous interrupt stack to the process stack
is skipped. If e.g. TIF_NEED_RESCHED is set things break because
the scheduler is called with the asynchronous interrupts stack.
Moving the psw back to io_return instead fixes the problem.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
"len" hasn't been properly range checked so we shouldn't use it as an
array offset. This can only be written to by root but it would still be
annoying to accidentally write more than 3 characters and corrupt your
memory.
Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
In the default case the lock is not unlocked. The return is
converted to a goto, to share the unlock at the end of the function.
A simplified version of the semantic patch that finds this problem is as
follows: (http://coccinelle.lip6.fr/)
// <smpl>
@r exists@
expression E1;
identifier f;
@@
f (...) { <+...
* spin_lock_irq (E1,...);
... when != E1
* return ...;
...+> }
// </smpl>
Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
When CFQ dispatches requests forcefully due to a barrier or changing iosched,
it runs through all cfqq's dispatching requests and then expires each queue.
However, it does not activate a cfqq before flushing its IOs resulting in
using stale values for computing slice_used.
This patch fixes it by calling activate queue before flushing reuqests from
each queue.
This is useful mostly for barrier requests because when the iosched is changing
it really doesnt matter if we have incorrect accounting since we're going to
break down all structures anyway.
We also now expire the current timeslice before moving on with the dispatch
to accurately account slice used for that cfqq.
Signed-off-by: Divyesh Shah<dpshah@google.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* 'nouveau/for-airlied' of ../drm-nouveau-next: (21 commits)
drm/nouveau: bail out of auxch transaction if we repeatedly recieve defers
drm/nv50: implement gpio set/get routines
drm/nv50: parse/use some more de-magiced parts of gpio table entries
drm/nouveau: store raw gpio table entry in bios gpio structs
drm/nv40: Init some tiling-related PGRAPH state.
drm/nv50: Add NVA3 support in ctxprog/ctxvals generator.
drm/nv50: another dodgy DP hack
drm/nv50: punt hotplug irq handling out to workqueue
drm/nv50: preserve an unknown SOR_MODECTRL value for DP encoders
drm/nv50: Allow using the NVA3 new compute class.
drm/nv50: cleanup properly if PDISPLAY init fails
drm/nouveau: fixup the init failure paths some more
drm/nv50: fix instmem init on IGPs if stolen mem crosses 4GiB mark
drm/nv40: add LVDS table quirk for Dell Latitude D620
drm/nv40: rework lvds table parsing
drm/nouveau: detect vram amount once, and save the value
drm/nouveau: remove some unused members from drm_nouveau_private
drm/nouveau: Make use of TTM busy_placements.
drm/nv50: add more 0x100c80 flushy magic
drm/nv50: fix fbcon when framebuffer above 4GiB mark
...
Fixes garbled 3D on an nv46 card.
Reported-by: Francesco Marella <francesco.marella@gmail.com>
Signed-off-by: Francisco Jerez <currojerez@riseup.net>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Allows *some* DP cards to keep working in some corner cases that most
people shouldn't hit. I hit it all the time with development, so this
can stay for now.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
This value interacts with some registers we don't currently know how to
program properly ourselves. The default of 5 that we were using matches
what the VBIOS on early DP cards do, but later ones use 6, which would
cause nouveau to program an incorrect mode on these chips.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
All indications seem to be that the version 0x30 table should be handled
the same way as 0x40 (as used on G80), at least for the parts that we
currently try use.
This commit cleans up the parsing to make it clearer about what we're
actually trying to achieve, and unifies the 0x30/0x40 parsing.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
As opposed to repeatedly reading the amount back from the GPU every
time we need to know the VRAM size.
We should now fail to load gracefully on detecting no VRAM, rather than
something potentially messy happening.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Previously we were filling it the same as "placements", but in some
cases there're valid alternatives that we were ignoring completely.
Keeping a back-up memory type helps on several low-mem situations.
Signed-off-by: Francisco Jerez <currojerez@riseup.net>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Fixes the !vbo_fifo path in the 3D driver on certain chipsets. Still not
really any good idea of what exactly the magic achieves, but it makes
things work.
While we're at it, in the PCIEGART path, flush on unbinding also.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Update mtime when writing to backing filesystem using the address space
operations write_begin and write_end.
Signed-off-by: Nikanth Karthikesan <knikanth@suse.de>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6:
not overwriting file_lock structure after GET_LK
cifs: Fix a kernel BUG with remote OS/2 server (try #3)
[CIFS] initialize nbytes at the beginning of CIFSSMBWrite()
[CIFS] Add mmap for direct, nobrl cifs mount types
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev:
libata: Fix accesses at LBA28 boundary (old bug, but nasty) (v2)
Most drives from Seagate, Hitachi, and possibly other brands,
do not allow LBA28 access to sector number 0x0fffffff (2^28 - 1).
So instead use LBA48 for such accesses.
This bug could bite a lot of systems, especially when the user has
taken care to align partitions to 4KB boundaries. On misaligned systems,
it is less likely to be encountered, since a 4KB read would end at
0x10000000 rather than at 0x0fffffff.
Signed-off-by: Mark Lord <mlord@pobox.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/ide-2.6:
ide: Fix IDE taskfile with cfq scheduler
ide: Must hold queue lock when requeueing
ide: Requeue request after DMA timeout
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6:
ACPI / PM: Move ACPI video resume to a PM notifier
ACPI: Reduce ACPI resource conflict message to KERN_WARNING, printk cleanup
ACPI: battery drivers should call power_supply_changed()
ACPI: battery: Fix CONFIG_ACPI_SYSFS_POWER=n
PNPACPI: truncate _CRS windows with _LEN > _MAX - _MIN + 1
ACPI: Don't send KEY_UNKNOWN for random video notifications
ACPI: NUMA: map pxms to low node ids
ACPI: use _HID when supplied by root-level devices
ACPI / ACPICA: Do not check reference counters in acpi_ev_enable_gpe()
ACPI: fixes a false alarm from lockdep
ACPI dock: support multiple ACPI dock devices
ACPI: EC: Allow multibyte access to EC
generic setattr not longer responsible for quota transfer.
use udf_setattr for all udf's inodes.
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Jan Kara <jack@suse.cz>
bloc->logicalBlockNum is unsigned so it's never less than zero.
When I saw that, it made me worry that "bloc->logicalBlockNum + count"
could overflow. That's why I changed the check for less than zero
to an overflow check. (The test works because "count" is also
unsigned.)
Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
* git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus:
hvc_console: Fix race between hvc_close and hvc_remove
virtio: disable multiport console support.
virtio: console makes incorrect assumption about virtio API
virtio: console: Fix early_put_chars usage
MAINTAINERS: Put the virtio-console entry in correct alphabetical order
I don't claim to understand the tty layer, but it seems like hvc_open and
hvc_close should be balanced in their kref reference counting.
Right now we get a kref every call to hvc_open:
if (hp->count++ > 0) {
tty_kref_get(tty); <----- here
spin_unlock_irqrestore(&hp->lock, flags);
hvc_kick();
return 0;
} /* else count == 0 */
tty->driver_data = hp;
hp->tty = tty_kref_get(tty); <------ or here if hp->count was 0
But hvc_close has:
tty_kref_get(tty);
if (--hp->count == 0) {
...
/* Put the ref obtained in hvc_open() */
tty_kref_put(tty);
...
}
tty_kref_put(tty);
Since the outside kref get/put balance we only do a single kref_put when
count reaches 0.
The patch below changes things to call tty_kref_put once for every
hvc_close call, and with that my machine boots fine.
Signed-off-by: Anton Blanchard <anton@samba.org>
Acked-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Move MULTIPORT feature and related config changes
out of exported headers, and disable the feature
at runtime.
At this point, it seems less risky to keep code around
until we can enable it than rip it out completely.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The get_buf() API sets the second arg to the number of bytes *written*
by the other side; in this case it should be zero as these are output buffers.
lguest gets this right (obviously kvm's console doesn't), resulting in
continual buildup of console writes.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Amit Shah <amit.shah@redhat.com>
Currently early_put_chars is not used by virtio_console because it can
only be used once a port has been found, at which point it's too late
because it is no longer needed. This patch should fix it.
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Move around the entry for virtio-console to keep the file sorted.
Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>