Commit Graph

287950 Commits

Author SHA1 Message Date
Linus Torvalds
34ddc81a23 i387: re-introduce FPU state preloading at context switch time
After all the FPU state cleanups and finally finding the problem that
caused all our FPU save/restore problems, this re-introduces the
preloading of FPU state that was removed in commit b3b0870ef3 ("i387:
do not preload FPU state at task switch time").

However, instead of simply reverting the removal, this reimplements
preloading with several fixes, most notably

 - properly abstracted as a true FPU state switch, rather than as
   open-coded save and restore with various hacks.

   In particular, implementing it as a proper FPU state switch allows us
   to optimize the CR0.TS flag accesses: there is no reason to set the
   TS bit only to then almost immediately clear it again.  CR0 accesses
   are quite slow and expensive, don't flip the bit back and forth for
   no good reason.

 - Make sure that the same model works for both x86-32 and x86-64, so
   that there are no gratuitous differences between the two due to the
   way they save and restore segment state differently due to
   architectural differences that really don't matter to the FPU state.

 - Avoid exposing the "preload" state to the context switch routines,
   and in particular allow the concept of lazy state restore: if nothing
   else has used the FPU in the meantime, and the process is still on
   the same CPU, we can avoid restoring state from memory entirely, just
   re-expose the state that is still in the FPU unit.

   That optimized lazy restore isn't actually implemented here, but the
   infrastructure is set up for it.  Of course, older CPU's that use
   'fnsave' to save the state cannot take advantage of this, since the
   state saving also trashes the state.

In other words, there is now an actual _design_ to the FPU state saving,
rather than just random historical baggage.  Hopefully it's easier to
follow as a result.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-02-18 14:03:48 -08:00
Ben Hutchings
6c63522460 builddeb: Don't create files in /tmp with predictable names
The current use of /tmp for file lists is insecure.  Put them under
$objtree/debian instead.

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Cc: stable@vger.kernel.org  # 2.6.39+
Acked-by: maximilian attems <max@stro.at>
Signed-off-by: Michal Marek <mmarek@suse.cz>
2012-02-18 22:33:26 +01:00
Linus Torvalds
f94edacf99 i387: move TS_USEDFPU flag from thread_info to task_struct
This moves the bit that indicates whether a thread has ownership of the
FPU from the TS_USEDFPU bit in thread_info->status to a word of its own
(called 'has_fpu') in task_struct->thread.has_fpu.

This fixes two independent bugs at the same time:

 - changing 'thread_info->status' from the scheduler causes nasty
   problems for the other users of that variable, since it is defined to
   be thread-synchronous (that's what the "TS_" part of the naming was
   supposed to indicate).

   So perfectly valid code could (and did) do

	ti->status |= TS_RESTORE_SIGMASK;

   and the compiler was free to do that as separate load, or and store
   instructions.  Which can cause problems with preemption, since a task
   switch could happen in between, and change the TS_USEDFPU bit. The
   change to TS_USEDFPU would be overwritten by the final store.

   In practice, this seldom happened, though, because the 'status' field
   was seldom used more than once, so gcc would generally tend to
   generate code that used a read-modify-write instruction and thus
   happened to avoid this problem - RMW instructions are naturally low
   fat and preemption-safe.

 - On x86-32, the current_thread_info() pointer would, during interrupts
   and softirqs, point to a *copy* of the real thread_info, because
   x86-32 uses %esp to calculate the thread_info address, and thus the
   separate irq (and softirq) stacks would cause these kinds of odd
   thread_info copy aliases.

   This is normally not a problem, since interrupts aren't supposed to
   look at thread information anyway (what thread is running at
   interrupt time really isn't very well-defined), but it confused the
   heck out of irq_fpu_usable() and the code that tried to squirrel
   away the FPU state.

   (It also caused untold confusion for us poor kernel developers).

It also turns out that using 'task_struct' is actually much more natural
for most of the call sites that care about the FPU state, since they
tend to work with the task struct for other reasons anyway (ie
scheduling).  And the FPU data that we are going to save/restore is
found there too.

Thanks to Arjan Van De Ven <arjan@linux.intel.com> for pointing us to
the %esp issue.

Cc: Arjan van de Ven <arjan@linux.intel.com>
Reported-and-tested-by: Raphael Prevost <raphael@buro.asia>
Acked-and-tested-by: Suresh Siddha <suresh.b.siddha@intel.com>
Tested-by: Peter Anvin <hpa@zytor.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-02-18 10:19:41 -08:00
Alan Stern
fea6d607e1 [SCSI] scsi_pm: Fix bug in the SCSI power management handler
This patch (as1520) fixes a bug in the SCSI layer's power management
implementation.

LUN scanning can be carried out asynchronously in do_scan_async(), and
sd uses an asynchronous thread for the time-consuming parts of disk
probing in sd_probe_async().  Currently nothing coordinates these
async threads with system sleep transitions; they can and do attempt
to continue scanning/probing SCSI devices even after the host adapter
has been suspended.  As one might expect, the outcome is not ideal.

This is what the "prepare" stage of system suspend was created for.
After the prepare callback has been called for a host, target, or
device, drivers are not allowed to register any children underneath
them.  Currently the SCSI prepare callback is not implemented; this
patch rectifies that omission.

For SCSI hosts, the prepare routine calls scsi_complete_async_scans()
to wait until async scanning is finished.  It might be slightly more
efficient to wait only until the host in question has been scanned,
but there's currently no way to do that.  Besides, during a sleep
transition we will ultimately have to wait until all the host scanning
has finished anyway.

For SCSI devices, the prepare routine calls async_synchronize_full()
to wait until sd probing is finished.  The routine does nothing for
SCSI targets, because asynchronous target scanning is done only as
part of host scanning.

Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
CC: <stable@kernel.org>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-02-18 08:54:19 -06:00
Huajun Li
267a6ad4ae [SCSI] scsi_scan: Fix 'Poison overwritten' warning caused by using freed 'shost'
In do_scan_async(), calling scsi_autopm_put_host(shost) may reference
freed shost, and cause Posison overwitten warning.
Yes, this case can happen, for example, an USB is disconnected just
when do_scan_async() thread starts to run, then scsi_host_put() called
in scsi_finish_async_scan() will lead to shost be freed(because the
refcount of shost->shost_gendev decreases to 1 after USB disconnects),
at this point, if references shost again, system will show following
warning msg.

To make scsi_autopm_put_host(shost) always reference a valid shost,
put it just before scsi_host_put() in function
scsi_finish_async_scan().

[  299.281565] =============================================================================
[  299.281634] BUG kmalloc-4096 (Tainted: G          I ): Poison overwritten
[  299.281682] -----------------------------------------------------------------------------
[  299.281684]
[  299.281752] INFO: 0xffff880056c305d0-0xffff880056c305d0. First byte
0x6a instead of 0x6b
[  299.281816] INFO: Allocated in scsi_host_alloc+0x4a/0x490 age=1688
cpu=1 pid=2004
[  299.281870] 	__slab_alloc+0x617/0x6c1
[  299.281901] 	__kmalloc+0x28c/0x2e0
[  299.281931] 	scsi_host_alloc+0x4a/0x490
[  299.281966] 	usb_stor_probe1+0x5b/0xc40 [usb_storage]
[  299.282010] 	storage_probe+0xa4/0xe0 [usb_storage]
[  299.282062] 	usb_probe_interface+0x172/0x330 [usbcore]
[  299.282105] 	driver_probe_device+0x257/0x3b0
[  299.282138] 	__driver_attach+0x103/0x110
[  299.282171] 	bus_for_each_dev+0x8e/0xe0
[  299.282201] 	driver_attach+0x26/0x30
[  299.282230] 	bus_add_driver+0x1c4/0x430
[  299.282260] 	driver_register+0xb6/0x230
[  299.282298] 	usb_register_driver+0xe5/0x270 [usbcore]
[  299.282337] 	0xffffffffa04ab03d
[  299.282364] 	do_one_initcall+0x47/0x230
[  299.282396] 	sys_init_module+0xa0f/0x1fe0
[  299.282429] INFO: Freed in scsi_host_dev_release+0x18a/0x1d0 age=85
cpu=0 pid=2008
[  299.282482] 	__slab_free+0x3c/0x2a1
[  299.282510] 	kfree+0x296/0x310
[  299.282536] 	scsi_host_dev_release+0x18a/0x1d0
[  299.282574] 	device_release+0x74/0x100
[  299.282606] 	kobject_release+0xc7/0x2a0
[  299.282637] 	kobject_put+0x54/0xa0
[  299.282668] 	put_device+0x27/0x40
[  299.282694] 	scsi_host_put+0x1d/0x30
[  299.282723] 	do_scan_async+0x1fc/0x2b0
[  299.282753] 	kthread+0xdf/0xf0
[  299.282782] 	kernel_thread_helper+0x4/0x10
[  299.282817] INFO: Slab 0xffffea00015b0c00 objects=7 used=7 fp=0x
      (null) flags=0x100000000004080
[  299.282882] INFO: Object 0xffff880056c30000 @offset=0 fp=0x          (null)
[  299.282884]
...

Signed-off-by: Huajun Li <huajun.li.lee@gmail.com>
Cc: stable@kernel.org
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-02-18 08:52:48 -06:00
Chad Dupuis
477e3e9ffc [SCSI] qla2xxx: Update version number to 8.03.07.13-k.
Signed-off-by: Giridhar Malavali <giridhar.malavali@qlogic.com>
Signed-off-by: Chad Dupuis <chad.dupuis@qlogic.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-02-18 08:50:20 -06:00
Giridhar Malavali
2cc97965e4 [SCSI] qla2xxx: Proper detection of firmware abort error code for ISP82xx.
Signed-off-by: Giridhar Malavali <giridhar.malavali@qlogic.com>
Signed-off-by: Chad Dupuis <chad.dupuis@qlogic.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-02-18 08:49:58 -06:00
Shyam Sundar
5a034bb3c3 [SCSI] qla2xxx: Remove resetting memory during device initialization for ISP82xx.
With IOs running and PegHalt testing the system reboots when memory reset is
performed during device initialization.

Signed-off-by: Shyam Sundar <shyam.sundar@qlogic.com>
Signed-off-by: Giridhar Malavali <giridhar.malavali@qlogic.com>
Signed-off-by: Chad Dupuis <chad.dupuis@qlogic.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-02-18 08:49:37 -06:00
Giridhar Malavali
d33609607c [SCSI] qla2xxx: Complete mailbox command timedout to avoid initialization failures during next reset cycle.
Complete the mailbox command timed out before initiating another abort cycle
to recover so that mailbox commands issued during next reset cycle don't fail
due to pending mailbox access timeout.

Signed-off-by: Giridhar Malavali <giridhar.malavali@qlogic.com>
Signed-off-by: Chad Dupuis <chad.dupuis@qlogic.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-02-18 08:49:01 -06:00
Michael Christie
c7a9927842 [SCSI] qla2xxx: Remove check for null fcport from host reset handler.
Remove the check for a NULL fcport so that the host reset will run
unconditionally to unwedge any commands before the device is offlined and to
prevent a quick runthrough of the SCSI error handling.

Signed-off-by: Michael Christie <mchristi@redhat.com>
Signed-off-by: Chad Dupuis <chad.dupuis@qlogic.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-02-18 08:48:29 -06:00
Andrew Vasquez
67ddda353c [SCSI] qla2xxx: Correct out of bounds read of ISP2200 mailbox registers.
ISP2200 adapters only have 24 mailbox registers so read only that many.

Reported-by: Olatunji Ruwase <oor@cs.cmu.edu>
Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com>
Signed-off-by: Chad Dupuis <chad.dupuis@qlogic.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-02-18 08:47:27 -06:00
Andrew Vasquez
7cb0eb1c17 [SCSI] qla2xxx: Remove errant clearing of MBX_INTERRUPT flag during CT-IOCB processing.
This can cause instability in mailbox command state machine handling.

Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com>
Signed-off-by: Chad Dupuis <chad.dupuis@qlogic.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-02-18 08:46:46 -06:00
Andrew Vasquez
4ba988db8d [SCSI] qla2xxx: Clear options-flags while issuing stop-firmware mbx command.
Not clearing the options flags in mbx1 could lead the firmware
into interpreting old data in mbx1 through mbx8.  This could
lead to inadvertent DMA read/write operations to stale memory.

Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com>
Signed-off-by: Chad Dupuis <chad.dupuis@qlogic.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-02-18 08:44:51 -06:00
Andrew Vasquez
d051a5aa1c [SCSI] qla2xxx: Add an "is reset active" helper.
Many locations within the driver would use an inconsistent set of
checks to determine ISP-reset state.  Consolidate the checks into
this inline-helper.

Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com>
Signed-off-by: Chad Dupuis <chad.dupuis@qlogic.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-02-18 08:43:26 -06:00
Chad Dupuis
aa651be83d [SCSI] qla2xxx: Add check for null fcport references in qla2xxx_queuecommand.
Signed-off-by: Giridhar Malavali <giridhar.malavali@qlogic.com>
Signed-off-by: Chad Dupuis <chad.dupuis@qlogic.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-02-18 08:42:10 -06:00
Arun Easi
a55aac79de [SCSI] qla2xxx: Propagate up abort failures.
Signed-off-by: Arun Easi <arun.easi@qlogic.com>
Signed-off-by: Chad Dupuis <chad.dupuis@qlogic.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-02-18 08:40:20 -06:00
Dave Jiang
6d7938f46f [SCSI] isci: Fix NULL ptr dereference when no firmware is being loaded
NULL orom ptr passed in for verification which caused page fault.
We will set a default version when we don't have orom struct.

Reported-by: Dan Melnic <dan@seamicro.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-02-18 08:37:00 -06:00
Kleber Sacilotto de Souza
a92fa25c63 [SCSI] ipr: fix eeh recovery for 64-bit adapters
In some scenarios, an EEH error can take a long time to be detected, since the
driver issues an MMIO read only after a device reset command times out and we
try to reset the adapter. This patch adds some code in ipr_cancel_op() to read
a hardware register so we detect the error earlier in case the op is being
aborted because of a timeout caused by a frozen adapter slot.

Another problem in such scenarios is that in __ipr_eh_host_reset() we change the
dump state flag from WAIT_FOR_DUMP to GET_DUMP, and the flag is later changed
from GET_DUMP to READ_DUMP in ipr_reset_restore_cfg_space(). However, if when
__ipr_eh_host_reset() is called by the SCSI error handling the function
ipr_reset_restore_cfg_space() has already been called by the PCI EEH code, we
end up with the flag in an inconsistent state. This patch also prevents this
problem.

Signed-off-by: Kleber Sacilotto de Souza <klebers@linux.vnet.ibm.com>
Acked-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-02-18 08:33:13 -06:00
Weston Andros Adamson
abe9a6d57b NFSv4: fix server_scope memory leak
server_scope would never be freed if nfs4_check_cl_exchange_flags() returned
non-zero

Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-02-17 17:34:03 -05:00
Trond Myklebust
f86f36a6ae NFSv4.1: Fix a NFSv4.1 session initialisation regression
Commit aacd553 (NFSv4.1: cleanup init and reset of session slot tables)
introduces a regression in the session initialisation code. New tables
now find their sequence ids initialised to 0, rather than the mandated
value of 1 (see RFC5661).

Fix the problem by merging nfs4_reset_slot_table() and nfs4_init_slot_table().
Since the tbl->max_slots is initialised to 0, the test in
nfs4_reset_slot_table for max_reqs != tbl->max_slots will automatically
pass for an empty table.

Reported-by: Vitaliy Gusev <gusev.vitaliy@nexenta.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-02-17 17:33:39 -05:00
Martin Schwidefsky
cf1eb40f8f [S390] correct ktime to tod clock comparator conversion
The conversion of the ktime to a value suitable for the clock comparator
does not take changes to wall_to_monotonic into account. In fact the
conversion just needs the boot clock (sched_clock_base_cc) and the
total_sleep_time.

This is applicable to 3.2+ kernels.

CC: stable@vger.kernel.org
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2012-02-17 10:29:33 +01:00
Martin Schwidefsky
656d912537 [S390] 3215 deadlock with tty_wakeup
The 3215 driver calls tty_wakeup from irq context while holding the
device spinlock. If printk is called by any function on the callchain
starting from tty_wakeup the system deadlocks on the device spinlock.
Using a tasklet to call tty_wakup solves the problem.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2012-02-17 10:29:33 +01:00
Martin Schwidefsky
2320c57937 [S390] incorrect PageTables counter for kvm page tables
The page_table_free_pgste function is used for kvm processes to free page
tables that have the pgste extension. It calls pgtable_page_ctor instead of
pgtable_page_dtor which increases NR_PAGETABLE instead of decreasing it.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2012-02-17 10:29:33 +01:00
Heiko Carstens
f3612304ee [S390] idle: avoid RCU usage in extended quiescent state
Avoid calling wake_up() from our NMI "bottom halve" from RCU extended
quiescent state in idle. wake_up() has RCU read-side critical sections
but this will be completely ignored by RCU if the cpu is in extended
quiescent state.
Which means that whatever object is being accessed from within the
read-side critical section can be freed concurrently from a different
cpu.
So make sure we leave extended quiescent state before calling wake_up().

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2012-02-17 10:29:32 +01:00
Takashi Iwai
ef8d60fb79 ALSA: hda/realtek - Fix surround output regression on Acer Aspire 5935
The previous fix for the speaker on Acer Aspire 59135 introduced
another problem for surround outputs.  It changed the connections on
the line-in/mic pins for limiting the routes, but it left the modified
connections.  Thus wrong connection indices were written when set to
4ch or 6ch mode.

This patch fixes it by restoring the right connections just after
parsing the tree but before the initialization.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=42740

Cc: <stable@kernel.org> [v3.2+]
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2012-02-17 10:28:06 +01:00
Linus Torvalds
4903062b54 i387: move AMD K7/K8 fpu fxsave/fxrstor workaround from save to restore
The AMD K7/K8 CPUs don't save/restore FDP/FIP/FOP unless an exception is
pending.  In order to not leak FIP state from one process to another, we
need to do a floating point load after the fxsave of the old process,
and before the fxrstor of the new FPU state.  That resets the state to
the (uninteresting) kernel load, rather than some potentially sensitive
user information.

We used to do this directly after the FPU state save, but that is
actually very inconvenient, since it

 (a) corrupts what is potentially perfectly good FPU state that we might
     want to lazy avoid restoring later and

 (b) on x86-64 it resulted in a very annoying ordering constraint, where
     "__unlazy_fpu()" in the task switch needs to be delayed until after
     the DS segment has been reloaded just to get the new DS value.

Coupling it to the fxrstor instead of the fxsave automatically avoids
both of these issues, and also ensures that we only do it when actually
necessary (the FP state after a save may never actually get used).  It's
simply a much more natural place for the leaked state cleanup.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-02-16 19:11:15 -08:00
Linus Torvalds
b3b0870ef3 i387: do not preload FPU state at task switch time
Yes, taking the trap to re-load the FPU/MMX state is expensive, but so
is spending several days looking for a bug in the state save/restore
code.  And the preload code has some rather subtle interactions with
both paravirtualization support and segment state restore, so it's not
nearly as simple as it should be.

Also, now that we no longer necessarily depend on a single bit (ie
TS_USEDFPU) for keeping track of the state of the FPU, we migth be able
to do better.  If we are really switching between two processes that
keep touching the FP state, save/restore is inevitable, but in the case
of having one process that does most of the FPU usage, we may actually
be able to do much better than the preloading.

In particular, we may be able to keep track of which CPU the process ran
on last, and also per CPU keep track of which process' FP state that CPU
has.  For modern CPU's that don't destroy the FPU contents on save time,
that would allow us to do a lazy restore by just re-enabling the
existing FPU state - with no restore cost at all!

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-02-16 15:45:23 -08:00
Cong Wang
465c9343c5 ecryptfs: remove the second argument of k[un]map_atomic()
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
2012-02-16 16:06:27 -06:00
Tyler Hicks
545d680938 eCryptfs: Copy up lower inode attrs after setting lower xattr
After passing through a ->setxattr() call, eCryptfs needs to copy the
inode attributes from the lower inode to the eCryptfs inode, as they
may have changed in the lower filesystem's ->setxattr() path.

One example is if an extended attribute containing a POSIX Access
Control List is being set. The new ACL may cause the lower filesystem to
modify the mode of the lower inode and the eCryptfs inode would need to
be updated to reflect the new mode.

https://launchpad.net/bugs/926292

Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Reported-by: Sebastien Bacher <seb128@ubuntu.com>
Cc: John Johansen <john.johansen@canonical.com>
Cc: <stable@vger.kernel.org>
2012-02-16 16:06:27 -06:00
Tyler Hicks
4a26620df4 eCryptfs: Improve statfs reporting
statfs() calls on eCryptfs files returned the wrong filesystem type and,
when using filename encryption, the wrong maximum filename length.

If mount-wide filename encryption is enabled, the cipher block size and
the lower filesystem's max filename length will determine the max
eCryptfs filename length. Pre-tested, known good lengths are used when
the lower filesystem's namelen is 255 and a cipher with 8 or 16 byte
block sizes is used. In other, less common cases, we fall back to a safe
rounded-down estimate when determining the eCryptfs namelen.

https://launchpad.net/bugs/885744

Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Reported-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: John Johansen <john.johansen@canonical.com>
2012-02-16 16:06:21 -06:00
Linus Torvalds
6d59d7a9f5 i387: don't ever touch TS_USEDFPU directly, use helper functions
This creates three helper functions that do the TS_USEDFPU accesses, and
makes everybody that used to do it by hand use those helpers instead.

In addition, there's a couple of helper functions for the "change both
CR0.TS and TS_USEDFPU at the same time" case, and the places that do
that together have been changed to use those.  That means that we have
fewer random places that open-code this situation.

The intent is partly to clarify the code without actually changing any
semantics yet (since we clearly still have some hard to reproduce bug in
this area), but also to make it much easier to use another approach
entirely to caching the CR0.TS bit for software accesses.

Right now we use a bit in the thread-info 'status' variable (this patch
does not change that), but we might want to make it a full field of its
own or even make it a per-cpu variable.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-02-16 13:33:12 -08:00
Linus Torvalds
b6c66418dc i387: move TS_USEDFPU clearing out of __save_init_fpu and into callers
Touching TS_USEDFPU without touching CR0.TS is confusing, so don't do
it.  By moving it into the callers, we always do the TS_USEDFPU next to
the CR0.TS accesses in the source code, and it's much easier to see how
the two go hand in hand.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-02-16 12:22:48 -08:00
Linus Torvalds
15d8791cae i387: fix x86-64 preemption-unsafe user stack save/restore
Commit 5b1cbac377 ("i387: make irq_fpu_usable() tests more robust")
added a sanity check to the #NM handler to verify that we never cause
the "Device Not Available" exception in kernel mode.

However, that check actually pinpointed a (fundamental) race where we do
cause that exception as part of the signal stack FPU state save/restore
code.

Because we use the floating point instructions themselves to save and
restore state directly from user mode, we cannot do that atomically with
testing the TS_USEDFPU bit: the user mode access itself may cause a page
fault, which causes a task switch, which saves and restores the FP/MMX
state from the kernel buffers.

This kind of "recursive" FP state save is fine per se, but it means that
when the signal stack save/restore gets restarted, it will now take the
'#NM' exception we originally tried to avoid.  With preemption this can
happen even without the page fault - but because of the user access, we
cannot just disable preemption around the save/restore instruction.

There are various ways to solve this, including using the
"enable/disable_page_fault()" helpers to not allow page faults at all
during the sequence, and fall back to copying things by hand without the
use of the native FP state save/restore instructions.

However, the simplest thing to do is to just allow the #NM from kernel
space, but fix the race in setting and clearing CR0.TS that this all
exposed: the TS bit changes and the TS_USEDFPU bit absolutely have to be
atomic wrt scheduling, so while the actual state save/restore can be
interrupted and restarted, the act of actually clearing/setting CR0.TS
and the TS_USEDFPU bit together must not.

Instead of just adding random "preempt_disable/enable()" calls to what
is already excessively ugly code, this introduces some helper functions
that mostly mirror the "kernel_fpu_begin/end()" functionality, just for
the user state instead.

Those helper functions should probably eventually replace the other
ad-hoc CR0.TS and TS_USEDFPU tests too, but I'll need to think about it
some more: the task switching functionality in particular needs to
expose the difference between the 'prev' and 'next' threads, while the
new helper functions intentionally were written to only work with
'current'.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-02-16 09:15:04 -08:00
Liu Bo
d9b0218f6c Btrfs: fix a bug on overcommit stuff
When overcommitting, we should check the sum of pinned space and
bytes for delayed item.

Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
2012-02-16 17:23:18 +01:00
Liu Bo
9d47c7671d Btrfs: kick out redundant stuff in convert_extent_bit
clear_state_bit will do merge_state for us, so kick out the redundant one.

Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
2012-02-16 17:23:17 +01:00
Liu Bo
0449314a9c Btrfs: skip states when they does not contain bits to clear
Clearing a range's bits is different with setting them, since we don't
need to touch them when states do not contain bits we want.

Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
2012-02-16 17:23:17 +01:00
Tsutomu Itoh
285190d99f Btrfs: check return value of lookup_extent_mapping() correctly
This patch corrects error checking of lookup_extent_mapping().

Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com>
2012-02-16 17:23:17 +01:00
Miao Xie
600a45e1d5 Btrfs: fix deadlock on page lock when doing auto-defragment
When I ran xfstests circularly on a auto-defragment btrfs, the deadlock
happened.

Steps to reproduce:
[tty0]
 # export MOUNT_OPTIONS="-o autodefrag"
 # export TEST_DEV=<partition1>
 # export TEST_DIR=<mountpoint1>
 # export SCRATCH_DEV=<partition2>
 # export SCRATCH_MNT=<mountpoint2>
 # while [ 1 ]
 > do
 > ./check 091 127 263
 > sleep 1
 > done
[tty1]
 # while [ 1 ]
 > do
 > echo 3 > /proc/sys/vm/drop_caches
 > done

Several hours later, the test processes will hang on, and the deadlock will
happen on page lock.

The reason is that:
  Auto defrag task		Flush thread			Test task
				btrfs_writepages()
				  add ordered extent
				  (including page 1, 2)
				  set page 1 writeback
				  set page 2 writeback
				endio_fn()
				  end page 2 writeback
								release page 2
lock page 1
alloc and lock page 2
page 2 is not uptodate
  btrfs_readpage()
    start ordered extent()
    btrfs_writepages()
      try  to lock page 1

so deadlock happens.

Fix this bug by unlocking the page which is in writeback, and re-locking it
after the writeback end.

Signed-off-by: Miao Xie <miax@cn.fujitsu.com>
2012-02-16 17:23:16 +01:00
Tsutomu Itoh
013bd4c336 Btrfs: fix return value check of extent_io_ops
This patch adds the check on the return value of extent_io_ops.

Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com>
2012-02-16 17:23:16 +01:00
Takashi Iwai
c14c95f62e ALSA: hda/realtek - Fix overflow of vol/sw check bitmap
The bitmap introduced in the commit [527e4d73: ALSA: hda/realtek - Fix
missing volume controls with ALC260] is too narrow for some codecs,
which may have more NIDs than 0x20, thus it may overflow the bitmap
array on them.

Just double the number to cover all and also add a sanity-check code
to be safer.

Cc: <stable@kernel.org> [v3.2+]
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2012-02-16 16:39:14 +01:00
Florian Albrechtskirchinger
12fc9d0923 btrfs: honor umask when creating subvol root
Set the subvol root inode permissions based on the current umask.
2012-02-16 16:35:41 +01:00
Inki Dae
53ef299f39 drm/exynos: added postclose to release resource.
Signed-off-by: Inki Dae <inki.dae@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2012-02-16 09:40:54 +00:00
Inki Dae
bc41eae2c8 drm/exynos: removed exynos_drm_fbdev_recreate function.
this function ins't needed anymore.

Signed-off-by: Inki Dae <inki.dae@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2012-02-16 09:40:52 +00:00
Inki Dae
c5614ae326 drm/exynos: fixed page flip issue.
with vblank_refcount = 1, there was the case that drm_vblank_put
is called by specific page flip function so this patch fixes the
issue.

Signed-off-by: Inki Dae <inki.dae@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2012-02-16 09:40:50 +00:00
Inki Dae
d081f56604 drm/exynos: added possible_clones setup function.
basically, all crtcs are possible to clone each other.

Signed-off-by: Inki Dae <inki.dae@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2012-02-16 09:40:47 +00:00
Joonyoung Shim
6f811502a4 drm/exynos: removed pageflip_event_list init code when closed.
if one process is terminated by ctrl-c while two processes are
using pageflip feature then for last pageflip event,
user can't get poll from kernel side so this patch fixes the problem.

Signed-off-by: Joonyoung Shim <jy0922.shim@samsung.com>
Signed-off-by: Inki Dae <inki.dae@samsung.com>
Signed-off-by: Kyoungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2012-02-16 09:40:44 +00:00
Joonyoung Shim
44a0e022b8 drm/exynos: changed priority of mixer layers.
Signed-off-by: Joonyoung Shim <jy0922.shim@samsung.com>
Signed-off-by: Inki Dae <inki.dae@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2012-02-16 09:40:43 +00:00
Masanari Iida
1109bf8bcb drm/exynos: Fix typo in exynos_mixer.c
Correct spelling "sucessful" to "successful" in
drivers/gpu/drm/exynos/exynos_mixer.c

Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Signed-off-by: Inki Dae <inki.dae@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2012-02-16 09:37:49 +00:00
Anton Blanchard
9a45a9407c powerpc/perf: power_pmu_start restores incorrect values, breaking frequency events
perf on POWER stopped working after commit e050e3f0a7 (perf: Fix
broken interrupt rate throttling). That patch exposed a bug in
the POWER perf_events code.

Since the PMCs count upwards and take an exception when the top bit
is set, we want to write 0x80000000 - left in power_pmu_start. We were
instead programming in left which effectively disables the counter
until we eventually hit 0x80000000. This could take seconds or longer.

With the patch applied I get the expected number of samples:

          SAMPLE events:       9948

Signed-off-by: Anton Blanchard <anton@samba.org>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: <stable@kernel.org>
2012-02-16 16:24:35 +11:00
majianpeng
64f8c13561 powerpc/adb: Use set_current_state()
Signed-off-by: majianpeng <majianpeng@gmail.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-02-16 16:15:12 +11:00