Commit Graph

45199 Commits

Author SHA1 Message Date
NeilBrown f5b9409973 All Arch: remove linkage for sys_nfsservctl system call
The nfsservctl system call is now gone, so we should remove all
linkage for it.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-08-26 15:09:58 -07:00
Linus Torvalds efe45ab1ee Merge branch 'tty-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty-2.6
* 'tty-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty-2.6:
  omap-serial: Allow IXON and IXOFF to be disabled.
  TTY: serial, document ignoring of uart->ops->startup error
  TTY: pty, fix pty counting
  8250: Fix race condition in serial8250_backup_timeout().
  serial/8250_pci: delete duplicate data definition
  8250_pci: add support for Rosewill RC-305 4x serial port card
  tty: Add "spi:" prefix for spi modalias
  atmel_serial: fix atmel_default_console_device
  serial: 8250_pnp: add Intermec CV60 touchscreen device
  drivers/serial/ucc_uart.c: Fix compiler warning
  pch_uart: Set PCIe bus number using probe parameter
  serial: samsung: Fix build error
2011-08-26 13:06:06 -07:00
Linus Torvalds 3ab47029d9 Merge branch 'driver-core-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6
* 'driver-core-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6:
  drivers:misc: ti-st: fix unexpected UART close
  drivers:misc: ti-st: free skb on firmware download
  drivers:misc: ti-st: wait for completion at fail
  drivers:misc: ti-st: reinit completion before send
  drivers:misc: ti-st: fail-safe on wrong pkt type
  drivers:misc: ti-st: reinit completion on ver read
  drivers:misc:ti-st: platform hooks for chip states
  drivers:misc: ti-st: avoid a misleading dbg msg
  base/devres.c: quiet sparse noise about context imbalance
  pti: add missing CONFIG_PCI dependency
  drivers/base/devtmpfs.c: correct annotation of `setup_done'
  driver core: fix kernel-doc warning in platform.c
  firmware: fix google/gsmi.c build warning
2011-08-26 13:05:09 -07:00
Dilan Lee cc7993f643 backlight: add a callback 'notify_after' for backlight control
We need a callback to do some things after pwm_enable, pwm_disable
and pwm_config.

Signed-off-by: Dilan Lee <dilee@nvidia.com>
Reviewed-by: Robert Morell <rmorell@nvidia.com>
Reviewed-by: Arun Murthy <arun.murthy@stericsson.com>
Cc: Richard Purdie <rpurdie@rpsys.net>
Cc: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-08-25 16:25:34 -07:00
Alexandre Bounine 284fb68d00 rapidio: fix use of non-compatible registers
Replace/remove use of RIO v.1.2 registers/bits that are not
forward-compatible with newer versions of RapidIO specification.

RapidIO specification v.1.3 removed Write Port CSR, Doorbell CSR,
Mailbox CSR and Mailbox and Doorbell bits of the PEF CAR.

Use of removed (since RIO v.1.3) register bits affects users of
currently available 1.3 and 2.x compliant devices who may use not so
recent kernel versions.

Removing checks for unsupported bits makes corresponding routines
compatible with all versions of RapidIO specification.  Therefore,
backporting makes stable kernel versions compliant with RIO v.1.3 and
later as well.

Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Cc: Kumar Gala <galak@kernel.crashing.org>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Li Yang <leoli@freescale.com>
Cc: Thomas Moll <thomas.moll@sysgo.com>
Cc: Chul Kim <chul.kim@idt.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-08-25 16:25:34 -07:00
Evgeniy Polyakov a801876638 MAINTAINERS: Evgeniy has moved
Signed-off-by: Evgeniy Polyakov <zbr@ioremap.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-08-25 16:25:33 -07:00
Josh Boyer e096d0c7e2 lockdep: Add helper function for dir vs file i_mutex annotation
Purely in-memory filesystems do not use the inode hash as the dcache
tells us if an entry already exists.  As a result, they do not call
unlock_new_inode, and thus directory inodes do not get put into a
different lockdep class for i_sem.

We need the different lockdep classes, because the locking order for
i_mutex is different for directory inodes and regular inodes.  Directory
inodes can do "readdir()", which takes i_mutex *before* possibly taking
mm->mmap_sem (due to a page fault while copying the directory entry to
user space).

In contrast, regular inodes can be mmap'ed, which takes mm->mmap_sem
before accessing i_mutex.

The two cases can never happen for the same inode, so no real deadlock
can occur, but without the different lockdep classes, lockdep cannot
understand that.  As a result, if CONFIG_DEBUG_LOCK_ALLOC is set, this
can lead to false positives from lockdep like below:

    find/645 is trying to acquire lock:
     (&mm->mmap_sem){++++++}, at: [<ffffffff81109514>] might_fault+0x5c/0xac

    but task is already holding lock:
     (&sb->s_type->i_mutex_key#15){+.+.+.}, at: [<ffffffff81149f34>]
    vfs_readdir+0x5b/0xb4

    which lock already depends on the new lock.

    the existing dependency chain (in reverse order) is:

    -> #1 (&sb->s_type->i_mutex_key#15){+.+.+.}:
          [<ffffffff8108ac26>] lock_acquire+0xbf/0x103
          [<ffffffff814db822>] __mutex_lock_common+0x4c/0x361
          [<ffffffff814dbc46>] mutex_lock_nested+0x40/0x45
          [<ffffffff811daa87>] hugetlbfs_file_mmap+0x82/0x110
          [<ffffffff81111557>] mmap_region+0x258/0x432
          [<ffffffff811119dd>] do_mmap_pgoff+0x2ac/0x306
          [<ffffffff81111b4f>] sys_mmap_pgoff+0x118/0x16a
          [<ffffffff8100c858>] sys_mmap+0x22/0x24
          [<ffffffff814e3ec2>] system_call_fastpath+0x16/0x1b

    -> #0 (&mm->mmap_sem){++++++}:
          [<ffffffff8108a4bc>] __lock_acquire+0xa1a/0xcf7
          [<ffffffff8108ac26>] lock_acquire+0xbf/0x103
          [<ffffffff81109541>] might_fault+0x89/0xac
          [<ffffffff81149cff>] filldir+0x6f/0xc7
          [<ffffffff811586ea>] dcache_readdir+0x67/0x205
          [<ffffffff81149f54>] vfs_readdir+0x7b/0xb4
          [<ffffffff8114a073>] sys_getdents+0x7e/0xd1
          [<ffffffff814e3ec2>] system_call_fastpath+0x16/0x1b

This patch moves the directory vs file lockdep annotation into a helper
function that can be called by in-memory filesystems and has hugetlbfs
call it.

Signed-off-by: Josh Boyer <jwboyer@redhat.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-08-25 10:50:18 -07:00
Linus Torvalds e33f2d238e Merge branch 'urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/writeback
* 'urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/writeback:
  squeeze max-pause area and drop pass-good area
2011-08-25 10:40:12 -07:00
Linus Torvalds f385b6974b Merge branch '3.1-rc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending
* '3.1-rc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (21 commits)
  target: Convert acl_node_lock to be IRQ-disabling
  target: Make locking in transport_deregister_session() IRQ safe
  tcm_fc: init/exit functions should not be protected by "#ifdef MODULE"
  target: Print subpage too for unhandled MODE SENSE pages
  iscsi-target: Fix iscsit_allocate_se_cmd_for_tmr failure path bugs
  iscsi-target: Implement iSCSI target IPv6 address printing.
  target: Fix task SGL chaining breakage with transport_allocate_data_tasks
  target: Fix task count > 1 handling breakage and use max_sector page alignment
  target: Add missing DATA_SG_IO transport_cmd_get_valid_sectors check
  target: Fix SYNCHRONIZE_CACHE zero LBA + range breakage
  target: Remove duplicate task completions in transport_emulate_control_cdb
  target: Fix WRITE_SAME usage with transport_get_size
  target: Add WRITE_SAME (10) parsing and refactor passthrough checks
  target: Fix write payload exception handling with ->new_cmd_map
  iscsi-target: forever loop bug in iscsit_attach_ooo_cmdsn()
  iscsi-target: remove duplicate return
  target: Convert target_core_rd.c to use use BUG_ON
  iscsi-target: Fix leak on failure in iscsi_copy_param_list()
  target: Use ERR_CAST inlined function
  target: Make standard INQUIRY return 'not connected' for tpg_virt_lun0
  ...
2011-08-25 10:30:51 -07:00
Andi Kleen be27425dcc Add a personality to report 2.6.x version numbers
I ran into a couple of programs which broke with the new Linux 3.0
version.  Some of those were binary only.  I tried to use LD_PRELOAD to
work around it, but it was quite difficult and in one case impossible
because of a mix of 32bit and 64bit executables.

For example, all kind of management software from HP doesnt work, unless
we pretend to run a 2.6 kernel.

  $ uname -a
  Linux svivoipvnx001 3.0.0-08107-g97cd98f #1062 SMP Fri Aug 12 18:11:45 CEST 2011 i686 i686 i386 GNU/Linux

  $ hpacucli ctrl all show

  Error: No controllers detected.

  $ rpm -qf /usr/sbin/hpacucli
  hpacucli-8.75-12.0

Another notable case is that Python now reports "linux3" from
sys.platform(); which in turn can break things that were checking
sys.platform() == "linux2":

  https://bugzilla.mozilla.org/show_bug.cgi?id=664564

It seems pretty clear to me though it's a bug in the apps that are using
'==' instead of .startswith(), but this allows us to unbreak broken
programs.

This patch adds a UNAME26 personality that makes the kernel report a
2.6.40+x version number instead.  The x is the x in 3.x.

I know this is somewhat ugly, but I didn't find a better workaround, and
compatibility to existing programs is important.

Some programs also read /proc/sys/kernel/osrelease.  This can be worked
around in user space with mount --bind (and a mount namespace)

To use:

  wget ftp://ftp.kernel.org/pub/linux/kernel/people/ak/uname26/uname26.c
  gcc -o uname26 uname26.c
  ./uname26 program

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-08-25 10:17:28 -07:00
Linus Torvalds 051732bcbe Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
  fuse: check size of FUSE_NOTIFY_INVAL_ENTRY message
  fuse: mark pages accessed when written to
  fuse: delete dead .write_begin and .write_end aops
  fuse: fix flock
  fuse: fix non-ANSI void function notation
2011-08-24 09:14:42 -07:00
Jiri Slaby 24d406a6bf TTY: pty, fix pty counting
tty_operations->remove is normally called like:
queue_release_one_tty
 ->tty_shutdown
   ->tty_driver_remove_tty
     ->tty_operations->remove

However tty_shutdown() is called from queue_release_one_tty() only if
tty_operations->shutdown is NULL. But for pty, it is not.
pty_unix98_shutdown() is used there as ->shutdown.

So tty_operations->remove of pty (i.e. pty_unix98_remove()) is never
called. This results in invalid pty_count. I.e. what can be seen in
/proc/sys/kernel/pty/nr.

I see this was already reported at:
  https://lkml.org/lkml/2009/11/5/370
But it was not fixed since then.

This patch is kind of a hackish way. The problem lies in ->install. We
allocate there another tty (so-called tty->link). So ->install is
called once, but ->remove twice, for both tty and tty->link. The fix
here is to count both tty and tty->link and divide the count by 2 for
user.

And to have ->remove called, let's make tty_driver_remove_tty() global
and call that from pty_unix98_shutdown() (tty_operations->shutdown).

While at it, let's document that when ->shutdown is defined,
tty_shutdown() is not called.

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Alan Cox <alan@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: stable <stable@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-08-23 10:10:38 -07:00
Pavan Savoy 0d7c5f2572 drivers:misc:ti-st: platform hooks for chip states
Certain platform specific or Host-WiLink Interface specific actions would be
required to be taken when the chip is being enabled and after the chip is
disabled such as configuration of the mux modes for the GPIO of host connected
to the nshutdown of the chip or relinquishing UART after the chip is disabled.

Similar actions can also be taken when the chip is in deep sleep or when the
chip is awake. Performance enhancements such as configuring the host to run
faster when chip is awake and slower when chip is asleep can also be made
here.

Signed-off-by: Pavan Savoy <pavan_savoy@ti.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-08-22 14:13:32 -07:00
Nicholas Bellinger 052605c6ca target: Make standard INQUIRY return 'not connected' for tpg_virt_lun0
This patch changes target_emulate_inquiry_std() to set the 'not connected'
(0x35) bit in standard INQUIRY response data when we are processing a
request to a virtual LUN=0 mapping from struct se_device *g_lun0_dev that
have been setup for us in transport_lookup_cmd_lun().

This addresses an issue where qla2xxx FC clients need to be able
to create demo-mode I_T FC Nexuses by default, but should not be
exposing the default set of TPG LUNs to all FC clients.  This includes
adding an new optional target_core_fabric_ops->tpg_check_demo_mode_login_only()
caller to allow demo_mode nexuses to skip the old default of bulding
a demo-mode MappedLUNs list via core_tpg_add_node_to_devs().

(roland: Add missing tpg_check_demo_mode_login_only check in core_dev_add_lun)

Reported-by: Roland Dreier <roland@purestorage.com>
Cc: Andrew Vasquez <andrew.vasquez@qlogic.com>
Signed-off-by: Nicholas Bellinger <nab@risingtidesystems.com>
2011-08-22 19:25:35 +00:00
Linus Torvalds 5ccc38740a Merge branch 'for-linus' of git://git.kernel.dk/linux-block
* 'for-linus' of git://git.kernel.dk/linux-block: (23 commits)
  Revert "cfq: Remove special treatment for metadata rqs."
  block: fix flush machinery for stacking drivers with differring flush flags
  block: improve rq_affinity placement
  blktrace: add FLUSH/FUA support
  Move some REQ flags to the common bio/request area
  allow blk_flush_policy to return REQ_FSEQ_DATA independent of *FLUSH
  xen/blkback: Make description more obvious.
  cfq-iosched: Add documentation about idling
  block: Make rq_affinity = 1 work as expected
  block: swim3: fix unterminated of_device_id table
  block/genhd.c: remove useless cast in diskstats_show()
  drivers/cdrom/cdrom.c: relax check on dvd manufacturer value
  drivers/block/drbd/drbd_nl.c: use bitmap_parse instead of __bitmap_parse
  bsg-lib: add module.h include
  cfq-iosched: Reduce linked group count upon group destruction
  blk-throttle: correctly determine sync bio
  loop: fix deadlock when sysfs and LOOP_CLR_FD race against each other
  loop: add BLK_DEV_LOOP_MIN_COUNT=%i to allow distros 0 pre-allocated loop devices
  loop: add management interface for on-demand device allocation
  loop: replace linked list of allocated devices with an idr index
  ...
2011-08-19 10:47:07 -07:00
Linus Torvalds 0c3bef6128 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6:
  PCI: OF: Don't crash when bridge parent is NULL.
  PCI: export pcie_bus_configure_settings symbol
  PCI: code and comments cleanup
  PCI: make cardbus-bridge resources optional
  PCI: make SRIOV resources optional
  PCI : ability to relocate assigned pci-resources
  PCI: honor child buses add_size in hot plug configuration
  PCI: Set PCI-E Max Payload Size on fabric
2011-08-19 10:02:37 -07:00
Wu Fengguang bb0822954a squeeze max-pause area and drop pass-good area
Revert the pass-good area introduced in ffd1f609ab ("writeback:
introduce max-pause and pass-good dirty limits") and make the max-pause
area smaller and safe.

This fixes ~30% performance regression in the ext3 data=writeback
fio_mmap_randwrite_64k/fio_mmap_randrw_64k test cases, where there are
12 JBOD disks, on each disk runs 8 concurrent tasks doing reads+writes.

Using deadline scheduler also has a regression, but not that big as CFQ,
so this suggests we have some write starvation.

The test logs show that

- the disks are sometimes under utilized

- global dirty pages sometimes rush high to the pass-good area for
  several hundred seconds, while in the mean time some bdi dirty pages
  drop to very low value (bdi_dirty << bdi_thresh).  Then suddenly the
  global dirty pages dropped under global dirty threshold and bdi_dirty
  rush very high (for example, 2 times higher than bdi_thresh). During
  which time balance_dirty_pages() is not called at all.

So the problems are

1) The random writes progress so slow that they break the assumption of
   the max-pause logic that "8 pages per 200ms is typically more than
   enough to curb heavy dirtiers".

2) The max-pause logic ignored task_bdi_thresh and thus opens the possibility
   for some bdi's to over dirty pages, leading to (bdi_dirty >> bdi_thresh)
   and then (bdi_thresh >> bdi_dirty) for others.

3) The higher max-pause/pass-good thresholds somehow leads to the bad
   swing of dirty pages.

The fix is to allow the task to slightly dirty over task_bdi_thresh, but
no way to exceed bdi_dirty and/or global dirty_thresh.

Tests show that it fixed the JBOD regression completely (both behavior
and performance), while still being able to cut down large pause times
in balance_dirty_pages() for single-disk cases.

Reported-by: Li Shaohua <shaohua.li@intel.com>
Tested-by: Li Shaohua <shaohua.li@intel.com>
Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
2011-08-19 22:42:07 +08:00
Linus Torvalds b4fd4ae6c6 Merge branch 'pm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
* 'pm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  PM / Domains: Fix build for CONFIG_PM_RUNTIME unset
2011-08-17 13:15:25 -07:00
Ian Campbell aa462abe8a mm: fix __page_to_pfn for a const struct page argument
This allows the cast in lowmem_page_address (introduced as a warning
fixup to 33dd4e0ec9 "mm: make some struct page's const") to be
removed.

Propagate const'ness to page_to_section() as well since it is required
by __page_to_pfn.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-08-17 13:00:20 -07:00
Ian Campbell f991879473 mm: make HASHED_PAGE_VIRTUAL page_address' struct page argument const.
Followup to 33dd4e0ec9 "mm: make some struct page's const" which missed the
HASHED_PAGE_VIRTUAL case.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-08-17 13:00:20 -07:00
Linus Torvalds 6cac952960 Merge branch 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  rtc: Limit RTC PIE frequency
  rtc: Fix hrtimer deadlock
  rtc: Handle errors correctly in rtc_irq_set_state()

Fixup trivial conflicts in drivers/rtc/interface.c due to slightly
trivially versions of the same patch coming in two different ways.
2011-08-17 10:28:33 -07:00
Linus Torvalds 950d0a10d1 Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  irq: Track the owner of irq descriptor
  irq: Always set IRQF_ONESHOT if no primary handler is specified
  genirq: Fix wrong bit operation
2011-08-17 10:23:50 -07:00
Jeff Moyer 4853abaae7 block: fix flush machinery for stacking drivers with differring flush flags
Commit ae1b153962, block: reimplement
FLUSH/FUA to support merge, introduced a performance regression when
running any sort of fsyncing workload using dm-multipath and certain
storage (in our case, an HP EVA).  The test I ran was fs_mark, and it
dropped from ~800 files/sec on ext4 to ~100 files/sec.  It turns out
that dm-multipath always advertised flush+fua support, and passed
commands on down the stack, where those flags used to get stripped off.
The above commit changed that behavior:

static inline struct request *__elv_next_request(struct request_queue *q)
{
        struct request *rq;

        while (1) {
-               while (!list_empty(&q->queue_head)) {
+               if (!list_empty(&q->queue_head)) {
                        rq = list_entry_rq(q->queue_head.next);
-                       if (!(rq->cmd_flags & (REQ_FLUSH | REQ_FUA)) ||
-                           (rq->cmd_flags & REQ_FLUSH_SEQ))
-                               return rq;
-                       rq = blk_do_flush(q, rq);
-                       if (rq)
-                               return rq;
+                       return rq;
                }

Note that previously, a command would come in here, have
REQ_FLUSH|REQ_FUA set, and then get handed off to blk_do_flush:

struct request *blk_do_flush(struct request_queue *q, struct request *rq)
{
        unsigned int fflags = q->flush_flags; /* may change, cache it */
        bool has_flush = fflags & REQ_FLUSH, has_fua = fflags & REQ_FUA;
        bool do_preflush = has_flush && (rq->cmd_flags & REQ_FLUSH);
        bool do_postflush = has_flush && !has_fua && (rq->cmd_flags &
        REQ_FUA);
        unsigned skip = 0;
...
        if (blk_rq_sectors(rq) && !do_preflush && !do_postflush) {
                rq->cmd_flags &= ~REQ_FLUSH;
		if (!has_fua)
			rq->cmd_flags &= ~REQ_FUA;
	        return rq;
	}

So, the flush machinery was bypassed in such cases (q->flush_flags == 0
&& rq->cmd_flags & (REQ_FLUSH|REQ_FUA)).

Now, however, we don't get into the flush machinery at all.  Instead,
__elv_next_request just hands a request with flush and fua bits set to
the scsi_request_fn, even if the underlying request_queue does not
support flush or fua.

The agreed upon approach is to fix the flush machinery to allow
stacking.  While this isn't used in practice (since there is only one
request-based dm target, and that target will now reflect the flush
flags of the underlying device), it does future-proof the solution, and
make it function as designed.

In order to make this work, I had to add a field to the struct request,
inside the flush structure (to store the original req->end_io).  Shaohua
had suggested overloading the union with rb_node and completion_data,
but the completion data is used by device mapper and can also be used by
other drivers.  So, I didn't see a way around the additional field.

I tested this patch on an HP EVA with both ext4 and xfs, and it recovers
the lost performance.  Comments and other testers, as always, are
appreciated.

Cheers,
Jeff

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-08-15 21:37:25 +02:00
Linus Torvalds 97c24d1d45 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/cjb/mmc
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/cjb/mmc:
  mmc: remove unused "ddr" parameter in struct mmc_ios
  mmc: dw_mmc: Fix DDR mode support.
  mmc: core: use defined R1_STATE_PRG macro for card status
  mmc: sdhci: use f_max instead of host->clock for timeouts
  mmc: sdhci: move timeout_clk calculation farther down
  mmc: sdhci: check host->clock before using it as a denominator
  mmc: Revert "mmc: sdhci: Fix SDHCI_QUIRK_TIMEOUT_USES_SDCLK"
  mmc: tmio: eliminate unused variable 'mmc' warning
  mmc: esdhc-imx: fix card interrupt loss on freescale eSDHC
  mmc: sdhci-s3c: Fix build for header change
  mmc: dw_mmc: Fix mask in IDMAC_SET_BUFFER1_SIZE macro
  mmc: cb710: fix possible pci_dev leak in cb710_pci_configure()
  mmc: core: Detect eMMC v4.5 ext_csd entries
  mmc: mmc_test: avoid stalled file in debugfs
  mmc: sdhci-s3c: add BROKEN_ADMA_ZEROLEN_DESC quirk
  mmc: sdhci: pxav3: controller needs 32 bit ADMA addressing
  mmc: sdhci: fix retuning timer wrongly deleted in sdhci_tasklet_finish
2011-08-14 12:28:15 -07:00
Rafael J. Wysocki 17f2ae7f67 PM / Domains: Fix build for CONFIG_PM_RUNTIME unset
Function genpd_queue_power_off_work() is not defined for
CONFIG_PM_RUNTIME, so pm_genpd_poweroff_unused() causes a build
error to happen in that case.  Fix the problem by making
pm_genpd_poweroff_unused() depend on CONFIG_PM_RUNTIME too.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2011-08-14 13:34:31 +02:00
Linus Torvalds 17987783e5 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6:
  ASoC: Fix compile warning in wm8750.c
  ASoC: omap: Update e-mail address of Jarkko Nikula
  ASoC: SAMSUNG: Add I2S0 internal dma driver
  ASoC: Terminate WM8750 SPI device ID table
  ASoC: Add missing break in WM8994 probe
  ALSA: snd-usb-caiaq: Correct offset fields of outbound iso_frame_desc
  ALSA: azt3328 - adjust error handling code to include debugging code
  ALSA: hda - Add CONFIG_SND_HDA_POWER_SAVE to stac_vrefout_set()
  ALSA: usb-audio - Add quirk for BOSS Micro BR-80
  ASoC: Fix typo in wm8750 spi_ids
  ASoC: Fix warning in Speyside WM8962
  ASoC: Fix SPI driver binding for WM8987
  ASoC: Fix binding of WM8750 on Jive
  ASoC: WM8903: Free IRQ on device removal
  ASoC: Tegra: wm8903 machine driver: Allow re-insertion of module
  ASoC: Tegra: tegra_pcm_deallocate_dma_buffer: Don't OOPS
2011-08-13 18:36:28 -07:00
Jaehoon Chung 7fd781e8f9 mmc: remove unused "ddr" parameter in struct mmc_ios
"mmc: dw_mmc: Fix DDR mode support" removed the last user.

Signed-off-by: Jaehoon Chung <jh80.chung@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Chris Ball <cjb@laptop.org>
2011-08-13 14:50:32 -04:00
Linus Torvalds 73e0881d31 Merge branch 'devicetree/merge' of git://git.secretlab.ca/git/linux-2.6
* 'devicetree/merge' of git://git.secretlab.ca/git/linux-2.6:
  dt: add empty of_get_property for non-dt
2011-08-12 21:59:09 -07:00
Linus Torvalds ce8a84ef1e Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (44 commits)
  e1000e: increase driver version number
  e1000e: alternate MAC address update
  e1000e: do not disable receiver on 82574/82583
  e1000e: alternate MAC address does not work on device id 0x1060
  PCnet: Fix section mismatch
  bnx2x: disable dcb on 578xx since not supported yet
  bnx2x: properly clean indirect addresses
  bnx2x: prevent race between undi_unload and load flows
  bnx2x: fix select_queue when FCoE is disabled
  bnx2x: init FCOE FP only once
  ipv4: some rt_iif -> rt_route_iif conversions
  net/bridge/netfilter/ebtables.c: use available error handling code
  net/netlabel/netlabel_kapi.c: add missing cleanup code
  net/irda: sh_sir: tidyup compile warning
  net/irda: sh_sir: add missing header
  net/irda: sh_irda: add missing header
  slcan: ldisc generated skbs are received in softirq context
  scm: Capture the full credentials of the scm sender
  tcp: initialize variable ecn_ok in syncookies path
  drivers/net/wireless/wl1251: add missing kfree
  ...
2011-08-12 06:43:53 -07:00
Jarkko Nikula 7ec41ee5ad ASoC: omap: Update e-mail address of Jarkko Nikula
My gmail account got disabled and I'm not going to reopen it.

Signed-off-by: Jarkko Nikula <jarkko.nikula@bitmer.com>
Acked-by: Liam Girdwood <lrg@ti.com>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
2011-08-12 11:45:10 +09:00
Vasiliy Kulikov 72fa59970f move RLIMIT_NPROC check from set_user() to do_execve_common()
The patch http://lkml.org/lkml/2003/7/13/226 introduced an RLIMIT_NPROC
check in set_user() to check for NPROC exceeding via setuid() and
similar functions.

Before the check there was a possibility to greatly exceed the allowed
number of processes by an unprivileged user if the program relied on
rlimit only.  But the check created new security threat: many poorly
written programs simply don't check setuid() return code and believe it
cannot fail if executed with root privileges.  So, the check is removed
in this patch because of too often privilege escalations related to
buggy programs.

The NPROC can still be enforced in the common code flow of daemons
spawning user processes.  Most of daemons do fork()+setuid()+execve().
The check introduced in execve() (1) enforces the same limit as in
setuid() and (2) doesn't create similar security issues.

Neil Brown suggested to track what specific process has exceeded the
limit by setting PF_NPROC_EXCEEDED process flag.  With the change only
this process would fail on execve(), and other processes' execve()
behaviour is not changed.

Solar Designer suggested to re-check whether NPROC limit is still
exceeded at the moment of execve().  If the process was sleeping for
days between set*uid() and execve(), and the NPROC counter step down
under the limit, the defered execve() failure because NPROC limit was
exceeded days ago would be unexpected.  If the limit is not exceeded
anymore, we clear the flag on successful calls to execve() and fork().

The flag is also cleared on successful calls to set_user() as the limit
was exceeded for the previous user, not the current one.

Similar check was introduced in -ow patches (without the process flag).

v3 - clear PF_NPROC_EXCEEDED on successful calls to set_user().

Reviewed-by: James Morris <jmorris@namei.org>
Signed-off-by: Vasiliy Kulikov <segoon@openwall.com>
Acked-by: NeilBrown <neilb@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-08-11 11:24:42 -07:00
Namhyung Kim c09c47caed blktrace: add FLUSH/FUA support
Add FLUSH/FUA support to blktrace. As FLUSH precedes WRITE and/or
FUA follows WRITE, use the same 'F' flag for both cases and
distinguish them by their (relative) position. The end results
look like (other flags might be shown also):

 - WRITE:            W
 - WRITE_FLUSH:      FW
 - WRITE_FUA:        WF
 - WRITE_FLUSH_FUA:  FWF

Note that we reuse TC_BARRIER due to lack of bit space of act_mask
so that the older versions of blktrace tools will report flush
requests as barriers from now on.

Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-08-11 10:36:05 +02:00
Matthew Wilcox 8e4bf84474 Move some REQ flags to the common bio/request area
REQ_SECURE, REQ_FLUSH and REQ_FUA may all be set on a bio as well as
on a request, so relocate them to the shared part of the enum.

Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-08-11 10:36:03 +02:00
Stephen Warren 89272b8c0d dt: add empty of_get_property for non-dt
The patch adds empty function of_get_property for non-dt build, so that
drivers migrating to dt can save some '#ifdef CONFIG_OF'.

This also fixes the current Tegra compile problem in linux-next.

Signed-off-by: Stephen Warren <swarren@nvidia.com>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
2011-08-09 11:27:16 -06:00
Linus Torvalds 6bb615bc39 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6:
  sound: pss - don't use the deprecated function check_region
  ALSA: timer - Add NULL-check for invalid slave timer
  ALSA: timer - Fix Oops at closing slave timer
  ASoC: Acknowledge WM8996 interrupts before acting on them
  ASoC: Rename WM8915 to WM8996
  ALSA: Fix dependency of CONFIG_SND_TEA575X
  ALSA: asihpi - use kzalloc()
  ALSA: snd-usb-caiaq: Fix keymap for RigKontrol3
  ALSA: snd-usb: Fix uninitialized variable usage
  ALSA: hda - Fix a complile warning in patch_via.c
  ALSA: hdspm - Fix uninitialized compile warnings
  ALSA: usb-audio - add quirk for Keith McMillen StringPort
  ALSA: snd-usb: operate on given mixer interface only
  ALSA: snd-usb: avoid dividing by zero on invalid input
  ALSA: snd-usb: Accept UAC2 FORMAT_TYPE descriptors with bLength > 6
  sound: oss/pas2: Remove CLOCK_TICK_RATE dependency from PAS16 driver
  ALSA: hda - Use auto-parser for ASUS UX50, Eee PC P901, S101 and P1005
  ALSA: hda - Fix digital-mic mono recording on ASUS Eee PC
  ASoC: sgtl5000: fix cache handling
  ASoC: Disable wm_hubs periodic DC servo update
2011-08-09 08:41:36 -07:00
Peter Zijlstra 5c723ba5b7 mm: Fix fixup_user_fault() for MMU=n
In commit 2efaca927f ("mm/futex: fix futex writes on archs with SW
tracking of dirty & young") we forgot about MMU=n.  This patch fixes
that.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: David Howells <dhowells@redhat.com>
Link: http://lkml.kernel.org/r/1311761831.24752.413.camel@twins
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-08-08 12:11:02 -07:00
Linus Torvalds 638a843909 cred: use 'const' in get_current_{user,groups}
Avoid annoying warnings from these functions ("discards qualifiers")
because they assign 'current_cred()' to a non-const pointer.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-08-08 11:33:23 -07:00
David Howells 27e4e43627 CRED: Restore const to current_cred()
Commit 3295514841 ("fix rcu annotations noise in cred.h") accidentally
dropped the const of current->cred inside current_cred() by the
insertion of a cast to deal with an RCU annotation loss warning from
sparce.

Use an appropriate RCU wrapper instead so as not to lose the const.

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-08-08 09:03:16 -07:00
Miklos Szeredi 37fb3a30b4 fuse: fix flock
Commit a9ff4f87 "fuse: support BSD locking semantics" overlooked a
number of issues with supporing flock locks over existing POSIX
locking infrastructure:

  - it's not backward compatible, passing flock(2) calls to userspace
    unconditionally (if userspace sets FUSE_POSIX_LOCKS)

  - it doesn't cater for the fact that flock locks are automatically
    unlocked on file release

  - it doesn't take into account the fact that flock exclusive locks
    (write locks) don't need an fd opened for write.

The last one invalidates the original premise of the patch that flock
locks can be emulated with POSIX locks.

This patch fixes the first two issues.  The last one needs to be fixed
in userspace if the filesystem assumed that a write lock will happen
only on a file operned for write (as in the case of the current fuse
library).

Reported-by: Sebastian Pipping <webmaster@hartwork.org>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2011-08-08 16:08:08 +02:00
Julian Anastasov 47670b767b ipv4: route non-local sources for raw socket
The raw sockets can provide source address for
routing but their privileges are not considered. We
can provide non-local source address, make sure the
FLOWI_FLAG_ANYSRC flag is set if socket has privileges
for this, i.e. based on hdrincl (IP_HDRINCL) and
transparent flags.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-07 22:52:32 -07:00
David S. Miller 6602a4baf4 net: Make userland include of netlink.h more sane.
Currently userland will barf when including linux/netlink.h unless it
precisely includes sys/socket.h first.  The issue is where the
definition of "sa_family_t" comes from.

We've been back and forth on how to fix this issue in the past, see:

http://thread.gmane.org/gmane.linux.debian.devel.bugs.general/622621
http://thread.gmane.org/gmane.linux.network/143380

Ben Hutchings suggested we take a hint from how we handle the
sockaddr_storage type.  First we define a "__kernel_sa_family_t"
to linux/socket.h that is always defined.

Then if __KERNEL__ is defined, we also define "sa_family_t" as
equal to "__kernel_sa_family_t".

Then in places like linux/netlink.h we use __kernel_sa_family_t
in user visible datastructures.

Reported-by: Michel Machado <michel@digirati.com.br>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-07 22:48:07 -07:00
Mark Brown a9ba615134 ASoC: Rename WM8915 to WM8996
For marketing reasons the part will be called WM8996. In order to avoid
user confusion rename the driver to reflect this.

Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Acked-by: Kukjin Kim <kgene.kim@samsung.com>
Acked-by: Liam Girdwood <lrg@ti.com>
2011-08-08 14:30:37 +09:00
Al Viro 3295514841 fix rcu annotations noise in cred.h
task->cred is declared as __rcu, and access to other tasks' ->cred is,
indeed, protected.  Access to current->cred does not need rcu_dereference()
at all, since only the task itself can change its ->cred.  sparse, of
course, has no way of knowing that...

Add force-cast in current_cred(), make current_fsuid() et.al. use it.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-08-07 13:42:35 -07:00
Linus Torvalds c2f340a69c Merge branch 'for-linus' of git://git.open-osd.org/linux-open-osd
* 'for-linus' of git://git.open-osd.org/linux-open-osd:
  ore: Make ore its own module
  exofs: Rename raid engine from exofs/ios.c => ore
  exofs: ios: Move to a per inode components & device-table
  exofs: Move exofs specific osd operations out of ios.c
  exofs: Add offset/length to exofs_get_io_state
  exofs: Fix truncate for the raid-groups case
  exofs: Small cleanup of exofs_fill_super
  exofs: BUG: Avoid sbi realloc
  exofs: Remove pnfs-osd private definitions
  nfs_xdr: Move nfs4_string definition out of #ifdef CONFIG_NFS_V4
2011-08-06 22:56:03 -07:00
Linus Torvalds 3ddcd0569c vfs: optimize inode cache access patterns
The inode structure layout is largely random, and some of the vfs paths
really do care.  The path lookup in particular is already quite D$
intensive, and profiles show that accessing the 'inode->i_op->xyz'
fields is quite costly.

We already optimized the dcache to not unnecessarily load the d_op
structure for members that are often NULL using the DCACHE_OP_xyz bits
in dentry->d_flags, and this does something very similar for the inode
ops that are used during pathname lookup.

It also re-orders the fields so that the fields accessed by 'stat' are
together at the beginning of the inode structure, and roughly in the
order accessed.

The effect of this seems to be in the 1-2% range for an empty kernel
"make -j" run (which is fairly kernel-intensive, mostly in filename
lookup), so it's visible.  The numbers are fairly noisy, though, and
likely depend a lot on exact microarchitecture.  So there's more tuning
to be done.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-08-06 22:53:23 -07:00
Linus Torvalds 830c0f0edc vfs: renumber DCACHE_xyz flags, remove some stale ones
Gcc tends to generate better code with small integers, including the
DCACHE_xyz flag tests - so move the common ones to be first in the list.
Also just remove the unused DCACHE_INOTIFY_PARENT_WATCHED and
DCACHE_AUTOFS_PENDING values, their users no longer exists in the source
tree.

And add a "unlikely()" to the DCACHE_OP_COMPARE test, since we want the
common case to be a nice straight-line fall-through.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-08-06 22:52:40 -07:00
Linus Torvalds 7cd4767e69 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
  net: Compute protocol sequence numbers and fragment IDs using MD5.
  crypto: Move md5_transform to lib/md5.c
2011-08-06 22:12:37 -07:00
Boaz Harrosh 8ff660ab85 exofs: Rename raid engine from exofs/ios.c => ore
ORE stands for "Objects Raid Engine"

This patch is a mechanical rename of everything that was in ios.c
and its API declaration to an ore.c and an osd_ore.h header. The ore
engine will later be used by the pnfs objects layout driver.

* File ios.c => ore.c

* Declaration of types and API are moved from exofs.h to a new
  osd_ore.h

* All used types are prefixed by ore_ from their exofs_ name.

* Shift includes from exofs.h to osd_ore.h so osd_ore.h is
  independent, include it from exofs.h.

Other than a pure rename there are no other changes. Next patch
will move the ore into it's own module and will export the API
to be used by exofs and later the layout driver

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
2011-08-06 19:36:18 -07:00
David S. Miller 6e5714eaf7 net: Compute protocol sequence numbers and fragment IDs using MD5.
Computers have become a lot faster since we compromised on the
partial MD4 hash which we use currently for performance reasons.

MD5 is a much safer choice, and is inline with both RFC1948 and
other ISS generators (OpenBSD, Solaris, etc.)

Furthermore, only having 24-bits of the sequence number be truly
unpredictable is a very serious limitation.  So the periodic
regeneration and 8-bit counter have been removed.  We compute and
use a full 32-bit sequence number.

For ipv6, DCCP was found to use a 32-bit truncated initial sequence
number (it needs 43-bits) and that is fixed here as well.

Reported-by: Dan Kaminsky <dan@doxpara.com>
Tested-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-06 18:33:19 -07:00
David S. Miller bc0b96b54a crypto: Move md5_transform to lib/md5.c
We are going to use this for TCP/IP sequence number and fragment ID
generation.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-06 18:32:45 -07:00