Commit Graph

561221 Commits

Author SHA1 Message Date
Jeff Layton 616c319683 nfs: ensure that attrcache is revalidated after a SETATTR
If we get no post-op attributes back from a SETATTR operation, then no
attributes will of course be updated during the call to
nfs_update_inode.

We know however that the attributes are invalid at that point, since we
just changed some of them. At the very least, the ctime will be bogus.
If we get no post-op attributes back on the call, mark the attrcache
invalid to reflect that fact.
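
The rule above can be sketched as a small Python model. This is only an illustration, not the NFS client's code: `Inode`, `Fattr`, `attr_cache_valid` and `setattr_update_inode` are hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Fattr:
    # Hypothetical stand-in for post-op attributes; an empty dict means
    # the server returned none.
    attrs: dict = field(default_factory=dict)


@dataclass
class Inode:
    attrs: dict = field(default_factory=dict)
    attr_cache_valid: bool = True


def setattr_update_inode(inode: Inode, fattr: Fattr) -> None:
    """Model of the step after SETATTR: refresh or invalidate the cache."""
    if fattr.attrs:
        inode.attrs.update(fattr.attrs)
        inode.attr_cache_valid = True
    else:
        # No post-op attributes came back. At least the ctime is stale,
        # so force a revalidation on the next access.
        inode.attr_cache_valid = False
```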

Reviewed-by: Steve French <steve.french@primarydata.com>
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-11-25 15:24:30 -05:00
Gabriele Paoloni 7c7a0e9453 ARM/PCI: Move align_resource function pointer to pci_host_bridge structure
Commit b3a72384fe ("ARM/PCI: Replace pci_sys_data->align_resource with
global function pointer") introduced an ARM-specific align_resource()
function pointer.  This is not portable to other arches and doesn't work
for platforms with two different PCIe host bridge controllers.

Move the function pointer to the pci_host_bridge structure so each host
bridge driver can specify its own align_resource() function.
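
A rough Python model of the idea, assuming a simplified hook signature of (start, size, align). `HostBridge`, `default_align`, `io_window_align` and `align_for` are hypothetical names for this sketch and do not mirror the kernel structures.

```python
from typing import Callable, Optional

# Signature assumed for illustration: (start, size, align) -> aligned start.
AlignFn = Callable[[int, int, int], int]


class HostBridge:
    """Hypothetical host bridge that carries its own alignment hook."""

    def __init__(self, name: str, align_resource: Optional[AlignFn] = None):
        self.name = name
        self.align_resource = align_resource


def default_align(start: int, size: int, align: int) -> int:
    # Round start up to the next multiple of align (a power of two).
    return (start + align - 1) & ~(align - 1)


def io_window_align(start: int, size: int, align: int) -> int:
    # Example of a controller-specific rule: at least 4 KiB alignment.
    return default_align(start, size, max(align, 0x1000))


def align_for(bridge: HostBridge, start: int, size: int, align: int) -> int:
    # Each bridge uses its own hook if it has one, else the default.
    fn = bridge.align_resource or default_align
    return fn(start, size, align)
```

With a single global pointer, two bridges with different rules could not coexist; with a per-bridge hook, each lookup goes through the bridge that owns the resource.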

Signed-off-by: Gabriele Paoloni <gabriele.paoloni@huawei.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
2015-11-25 13:23:38 -06:00
Linus Torvalds 9b81d512a4 Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull more block layer fixes from Jens Axboe:
 "I wasn't going to send off a new pull before next week, but the blk
  flush fix from Jan from the other day introduced a regression.  It's
  rare enough not to have hit during testing, since it requires both a
  device that rejects the first flush, and bad timing while it does
  that.  But since someone did hit it, let's get the revert into 4.4-rc3
  so we don't have a released rc with that known issue.

  Apart from that revert, three other fixes:

   - From Christoph, a fix for a missing unmap in NVMe request
     preparation.

   - An NVMe fix from Nishanth that fixes data corruption on powerpc.

   - Also from Christoph, fix a list_del() attempt on blk-mq that didn't
     have a matching list_add() at timer start"

* 'for-linus' of git://git.kernel.dk/linux-block:
  Revert "blk-flush: Queue through IO scheduler when flush not required"
  block: fix blk_abort_request for blk-mq drivers
  nvme: add missing unmaps in nvme_queue_rq
  NVMe: default to 4k device page size
2015-11-25 11:08:35 -08:00
Grygorii Strashko 918af9f941 ARM: OMAP4+: SMP: use lockless clkdm/pwrdm api in omap4_boot_secondary
OMAP CPU hotplug uses CPU1's clocks and power domains to wake CPU1 up
from low power states (or to turn CPU1 on). This code is also
part of system suspend (disable_nonboot_cpus()).
On the other hand, CPU1's clocks and power domains are also used by CPUIdle.
All of the above functionality is mutually exclusive and, therefore, the
lockless clkdm/pwrdm API can be used in omap4_boot_secondary().

This fixes the back-trace below on -RT, which is triggered by
pwrdm_lock/unlock():

BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:917
 in_atomic(): 1, irqs_disabled(): 0, pid: 118, name: sh
 9 locks held by sh/118:
  #0:  (sb_writers#4){.+.+.+}, at: [<c0144a6c>] vfs_write+0x13c/0x164
  #1:  (&of->mutex){+.+.+.}, at: [<c01b4c70>] kernfs_fop_write+0x48/0x19c
  #2:  (s_active#24){.+.+.+}, at: [<c01b4c78>] kernfs_fop_write+0x50/0x19c
  #3:  (device_hotplug_lock){+.+.+.}, at: [<c03cbff0>] lock_device_hotplug_sysfs+0xc/0x4c
  #4:  (&dev->mutex){......}, at: [<c03cd284>] device_online+0x14/0x88
  #5:  (cpu_add_remove_lock){+.+.+.}, at: [<c003af90>] cpu_up+0x50/0x1a0
  #6:  (cpu_hotplug.lock){++++++}, at: [<c003ae48>] cpu_hotplug_begin+0x0/0xc4
  #7:  (cpu_hotplug.lock#2){+.+.+.}, at: [<c003aec0>] cpu_hotplug_begin+0x78/0xc4
  #8:  (boot_lock){+.+...}, at: [<c002b254>] omap4_boot_secondary+0x1c/0x178
 Preemption disabled at:[<  (null)>]   (null)

 CPU: 0 PID: 118 Comm: sh Not tainted 4.1.12-rt11-01998-gb4a62c3-dirty #137
 Hardware name: Generic DRA74X (Flattened Device Tree)
 [<c0017574>] (unwind_backtrace) from [<c0013be8>] (show_stack+0x10/0x14)
 [<c0013be8>] (show_stack) from [<c05a8670>] (dump_stack+0x80/0x94)
 [<c05a8670>] (dump_stack) from [<c05ad158>] (rt_spin_lock+0x24/0x54)
 [<c05ad158>] (rt_spin_lock) from [<c0030dac>] (clkdm_wakeup+0x10/0x2c)
 [<c0030dac>] (clkdm_wakeup) from [<c002b2c0>] (omap4_boot_secondary+0x88/0x178)
 [<c002b2c0>] (omap4_boot_secondary) from [<c0015d00>] (__cpu_up+0xc4/0x164)
 [<c0015d00>] (__cpu_up) from [<c003b09c>] (cpu_up+0x15c/0x1a0)
 [<c003b09c>] (cpu_up) from [<c03cd2d4>] (device_online+0x64/0x88)
 [<c03cd2d4>] (device_online) from [<c03cd360>] (online_store+0x68/0x74)
 [<c03cd360>] (online_store) from [<c01b4ce0>] (kernfs_fop_write+0xb8/0x19c)
 [<c01b4ce0>] (kernfs_fop_write) from [<c0144124>] (__vfs_write+0x20/0xd8)
 [<c0144124>] (__vfs_write) from [<c01449c0>] (vfs_write+0x90/0x164)
 [<c01449c0>] (vfs_write) from [<c01451e4>] (SyS_write+0x44/0x9c)
 [<c01451e4>] (SyS_write) from [<c0010240>] (ret_fast_syscall+0x0/0x54)
 CPU1: smp_ops.cpu_die() returned, trying to resuscitate

Cc: Tero Kristo <t-kristo@ti.com>
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
2015-11-25 11:03:20 -08:00
Tony Lindgren 970259bff4 Merge branch '81xx' into omap-for-v4.4/fixes 2015-11-25 10:56:40 -08:00
Neil Armstrong 29f5b34ca1 arm: omap2+: add missing HWMOD_NO_IDLEST in 81xx hwmod data
Add the missing HWMOD_NO_IDLEST hwmod flag to entries that do not
have omap4 clkctrl values.
The emac0 hwmod flag fixes the davinci_emac driver probe,
since the return value of the pm_resume() call is now checked.

This solves the following boot errors:
[    0.121429] omap_hwmod: l4_ls: _wait_target_ready failed: -16
[    0.121441] omap_hwmod: l4_ls: cannot be enabled for reset (3)
[    0.124342] omap_hwmod: l4_hs: _wait_target_ready failed: -16
[    0.124352] omap_hwmod: l4_hs: cannot be enabled for reset (3)
[    1.967228] omap_hwmod: emac0: _wait_target_ready failed: -16

Cc: Brian Hutchinson <b.hutchman@gmail.com>
Signed-off-by: Neil Armstrong <narmstrong@baylibre.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
2015-11-25 10:54:22 -08:00
Jens Axboe dcd8376c36 Revert "blk-flush: Queue through IO scheduler when flush not required"
This reverts commit 1b2ff19e6a.

Jan writes:

--

Thanks for report! After some investigation I found out we allocate
elevator specific data in __get_request() only for non-flush requests. And
this is actually required since the flush machinery uses the space in
struct request for something else. Doh. So my patch is just wrong and not
easy to fix since at the time __get_request() is called we are not sure
whether the flush machinery will be used in the end. Jens, please revert
1b2ff19e6a. Thanks!

I'm somewhat surprised that you can reliably hit the race where flushing
gets disabled for the device just while the request is in flight. But I
guess during boot it makes some sense.

--

So let's just revert it; we can fix the queue run manually after the
fact. This race is rare enough that it didn't trigger in testing, since it
requires the specific disable-while-in-flight scenario to trigger.
2015-11-25 10:12:54 -07:00
Linus Torvalds 4cf193b4b2 Bug fixes for all architectures. Nothing really stands out.

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM fixes from Paolo Bonzini:
 "Bug fixes for all architectures.  Nothing really stands out"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (21 commits)
  KVM: nVMX: remove incorrect vpid check in nested invvpid emulation
  arm64: kvm: report original PAR_EL1 upon panic
  arm64: kvm: avoid %p in __kvm_hyp_panic
  KVM: arm/arm64: vgic: Trust the LR state for HW IRQs
  KVM: arm/arm64: arch_timer: Preserve physical dist. active state on LR.active
  KVM: arm/arm64: Fix preemptible timer active state crazyness
  arm64: KVM: Add workaround for Cortex-A57 erratum 834220
  arm64: KVM: Fix AArch32 to AArch64 register mapping
  ARM/arm64: KVM: test properly for a PTE's uncachedness
  KVM: s390: fix wrong lookup of VCPUs by array index
  KVM: s390: avoid memory overwrites on emergency signal injection
  KVM: Provide function for VCPU lookup by id
  KVM: s390: fix pfmf intercept handler
  KVM: s390: enable SIMD only when no VCPUs were created
  KVM: x86: request interrupt window when IRQ chip is split
  KVM: x86: set KVM_REQ_EVENT on local interrupt request from user space
  KVM: x86: split kvm_vcpu_ready_for_interrupt_injection out of dm_request_for_irq_injection
  KVM: x86: fix interrupt window handling in split IRQ chip case
  MIPS: KVM: Uninit VCPU in vcpu_create error path
  MIPS: KVM: Fix CACHE immediate offset sign extension
  ...
2015-11-25 09:01:49 -08:00
Alex Deucher 9c565e3386 drm/radeon: make some dpm errors debug only
"Could not force DPM to low", etc. is usually harmless and
just confuses users.

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2015-11-25 11:28:42 -05:00
Mark Rutland 3b12acf4c9 arm64: efi: correctly map runtime regions
The kernel may use a page granularity of 4K, 16K, or 64K depending on
configuration.

When mapping EFI runtime regions, we use memrange_efi_to_native to round
the physical base address of a region down to a kernel page boundary,
and round the size up to a kernel page boundary, adding the residue left
over from rounding down the physical base address. We do not round down
the virtual base address.

In __create_mapping we account for the offset of the virtual base from a
granule boundary, adding the residue to the size before rounding the
base down to said granule boundary.

Thus we account for the residue twice, and when the residue is non-zero
this causes __create_mapping to map an additional page at the end of the
region. Depending on the memory map, this page may be in a region we are
not intended/permitted to map, or may clash with a different region that
we wish to map. In typical cases, mapping the next item in the memory
map will overwrite the erroneously created entry, as we sort the memory
map in the stub.

As __create_mapping can cope with base addresses which are not page
aligned, we can instead rely on it to map the region appropriately, and
simplify efi_virtmap_init by removing the unnecessary code.
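
The double accounting can be checked with a small worked example. This is a sketch under stated assumptions: a 64K kernel granule, a region whose physical and virtual addresses share the same offset, and helper names (`efi_to_native`, `mapped_length`) that only model the rounding described above rather than reproduce the kernel functions.

```python
PAGE_SIZE = 64 * 1024      # assumed kernel granule for this example
EFI_PAGE_SIZE = 4 * 1024   # EFI describes regions in 4K pages


def page_down(x):
    return x & ~(PAGE_SIZE - 1)


def page_up(x):
    return (x + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1)


def efi_to_native(paddr, efi_pages):
    """Round the base down and the size up; the size absorbs the residue."""
    end = paddr + efi_pages * EFI_PAGE_SIZE
    return page_down(paddr), page_up(end) - page_down(paddr)


def mapped_length(virt, size):
    """Length covered when the mapper adds the residue of virt itself."""
    return page_up(size + (virt & (PAGE_SIZE - 1)))


paddr = virt = 0x10001000   # 4K past a 64K boundary
efi_pages = 4               # a 16K runtime region

_, rounded_size = efi_to_native(paddr, efi_pages)
# Old behaviour: pre-rounded size plus residue, so the residue counts twice.
old_length = mapped_length(virt, rounded_size)
# New behaviour: pass the raw size and let the mapper round once.
new_length = mapped_length(virt, efi_pages * EFI_PAGE_SIZE)
```

In this example the region fits in one 64K page, yet the pre-rounded path covers two pages while the raw-size path covers one.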

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Leif Lindholm <leif.lindholm@linaro.org>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2015-11-25 15:49:17 +00:00
Mark Rutland c03784ee8a arm64: mm: fix fault_info table xFSC decoding
We are missing descriptions for some valid xFSC values in the fault info
table (e.g. "TLB conflict abort"), and have erroneous descriptions for
reserved values (e.g. "asynchronous external abort", "debug event").

This patch adds the missing xFSC values, and removes erroneous decoding
of values reserved by the architecture, as described in ARM DDI 0487A.h.

At the same time, fix the unbalanced brackets for the synchronous
parity error strings in the table.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2015-11-25 15:49:16 +00:00
Arnd Bergmann fbc416ff86 arm64: fix building without CONFIG_UID16
As reported by Michal Simek, building an ARM64 kernel with CONFIG_UID16
disabled currently fails because the system call table still needs to
reference the individual function entry points that are provided by
kernel/sys_ni.c in this case, and the declarations are hidden inside
of #ifdef CONFIG_UID16:

arch/arm64/include/asm/unistd32.h:57:8: error: 'sys_lchown16' undeclared here (not in a function)
 __SYSCALL(__NR_lchown, sys_lchown16)

I believe this problem only exists on ARM64, because older architectures
tend to not need declarations when their system call table is built
in assembly code, while newer architectures tend to not need UID16
support. ARM64 only uses these system calls for compatibility with
32-bit ARM binaries.

This changes the CONFIG_UID16 check into CONFIG_HAVE_UID16, which is
set unconditionally on ARM64 with CONFIG_COMPAT, so we see the
declarations whenever we need them, but otherwise the behavior is
unchanged.

Fixes: af1839eb4b ("Kconfig: clean up the long arch list for the UID16 config option")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Will Deacon <will.deacon@arm.com>
Cc: stable@vger.kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2015-11-25 15:49:13 +00:00
Nicolas Pitre 4d2ec7e206 ARM: orion5x: Fix legacy get_irqnr_and_base
Commit 5be9fc23cd ("ARM: orion5x: fix legacy orion5x IRQ numbers") shifted
IRQ numbers by one but didn't update the get_irqnr_and_base macro
accordingly.  This macro is involved when CONFIG_MULTI_IRQ_HANDLER
is not defined.

[jac: 5d6bed2a9c went in to v4.2, but was backported to v3.18]

Signed-off-by: Nicolas Pitre <nico@linaro.org>
Fixes: 5be9fc23cd ("ARM: orion5x: fix legacy orion5x IRQ numbers")
Cc: <stable@vger.kernel.org> # v3.18+
Signed-off-by: Jason Cooper <jason@lakedaemon.net>
2015-11-25 15:01:00 +00:00
Nicolas Pitre c1c90728ef ARM: dove: Fix legacy get_irqnr_and_base
Commit 5d6bed2a9c ("ARM: dove: fix legacy dove IRQ numbers") shifted
IRQ numbers by one but didn't update the get_irqnr_and_base macro
accordingly.  This macro is involved when CONFIG_MULTI_IRQ_HANDLER
is not defined.

[jac: 5d6bed2a9c went in to v4.2, but was backported to v3.18]

Signed-off-by: Nicolas Pitre <nico@linaro.org>
Fixes: 5d6bed2a9c ("ARM: dove: fix legacy dove IRQ numbers")
Cc: <stable@vger.kernel.org> # v3.18+
Signed-off-by: Jason Cooper <jason@lakedaemon.net>
2015-11-25 14:59:12 +00:00
Haozhong Zhang b2467e744f KVM: nVMX: remove incorrect vpid check in nested invvpid emulation
This patch removes the vpid check when emulating nested invvpid
instruction of type all-contexts invalidation. The existing code is
incorrect because:
 (1) According to Intel SDM Vol 3, Section "INVVPID - Invalidate
     Translations Based on VPID", invvpid instruction does not check
     vpid in the invvpid descriptor when its type is all-contexts
     invalidation.
 (2) According to the same document, invvpid of type all-contexts
     invalidation does not require an active VMCS, so get_vmcs12()
     in the existing code may result in a NULL-pointer dereference.
     In practice, it can crash both KVM itself and L1 hypervisors
     that use invvpid (e.g. Xen).
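
The two points above amount to the following control flow, shown here as a minimal Python sketch. `emulate_invvpid` and its return strings are hypothetical; only the type value 2 for all-contexts invalidation is taken from the Intel SDM, and other types are deliberately left out of the sketch.

```python
ALL_CONTEXTS = 2  # INVVPID type for all-contexts invalidation


def emulate_invvpid(inv_type, descriptor_vpid, vmcs12):
    """Return the action a hypothetical emulator would take."""
    if inv_type == ALL_CONTEXTS:
        # Neither the VPID in the descriptor nor an active VMCS is
        # consulted, so vmcs12 may legitimately be None here.
        return "flush_all"
    # Other invalidation types are outside the scope of this sketch.
    return "unsupported"
```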

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-25 15:52:55 +01:00
Holger Hoffstätte dba72cb30b btrfs: fix balance range usage filters in 4.4-rc
There's a regression in 4.4-rc since commit bc3094673f
(btrfs: extend balance filter usage to take minimum and maximum) in that
existing (non-ranged) balance with -dusage=x no longer works; all chunks
are skipped.

After staring at the code for a while and wondering why a non-ranged
balance would even need min and max thresholds (..which then were not
set correctly, leading to the bug) I realized that the only problem
was the fact that the filter functions were named wrong, thanks to
patching copypasta. Simply renaming both functions lets the existing
btrfs-progs call balance with -dusage=x and now the non-ranged filter
function is invoked, properly using only a single chunk limit.
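
To illustrate the difference between the two filters, here is a simplified Python sketch. The function names and the exact boundary conditions are assumptions made for the example, and the (0, 0) thresholds below are an invented stand-in for "not set correctly", not the values the kernel actually saw.

```python
def usage_filter(chunk_used, chunk_length, limit_pct):
    """Non-ranged filter: select chunks used below a single limit."""
    return chunk_used < chunk_length * limit_pct // 100


def usage_range_filter(chunk_used, chunk_length, min_pct, max_pct):
    """Ranged filter: select chunks whose usage lies in [min, max)."""
    low = chunk_length * min_pct // 100
    high = chunk_length * max_pct // 100
    return low <= chunk_used < high


# A chunk that is 30% full, balanced with -dusage=50.
selected_by_single_limit = usage_filter(300, 1000, 50)
# The same request routed to the ranged filter with unset thresholds.
selected_by_unset_range = usage_range_filter(300, 1000, 0, 0)
```

Routing a single-limit request to the ranged filter leaves an empty range, so every chunk is skipped, which matches the symptom described above.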

Signed-off-by: Holger Hoffstätte <holger.hoffstaette@googlemail.com>
Fixes: bc3094673f ("btrfs: extend balance filter usage to take minimum and maximum")
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
2015-11-25 05:27:33 -08:00
Mark Fasheh 82bd101b52 btrfs: qgroup: account shared subtree during snapshot delete
Commit 0ed4792 ('btrfs: qgroup: Switch to new extent-oriented qgroup
mechanism.') removed our qgroup accounting during
btrfs_drop_snapshot(). Predictably, this results in qgroup numbers
going bad shortly after a snapshot is removed.

Fix this by adding a dirty extent record when we encounter extents during
our shared subtree walk. This effectively restores the functionality we had
with the original shared subtree walking code in 1152651 (btrfs: qgroup:
account shared subtrees during snapshot delete).

The idea with the original patch (and this one) is that shared subtrees can
get skipped during drop_snapshot. The shared subtree walk then allows us a
chance to visit those extents and add them to the qgroup work for later
processing. This ultimately makes the accounting for drop snapshot work.

The new qgroup code nicely handles all the other extents during the tree
walk via the ref dec/inc functions so we don't have to add actions beyond
what we had originally.

Signed-off-by: Mark Fasheh <mfasheh@suse.de>
Signed-off-by: Chris Mason <clm@fb.com>
2015-11-25 05:27:33 -08:00
Josef Bacik 2d9e977610 Btrfs: use btrfs_get_fs_root in resolve_indirect_ref
The backref code looks up the fs_root for which we're trying to resolve our
indirect refs. Unfortunately it uses btrfs_read_fs_root_no_name, which returns
-ENOENT if the ref is 0.  This isn't helpful for the qgroup code during snapshot
delete, as it won't be able to search down the snapshot we are deleting, which
will cause us to miss roots.  So use btrfs_get_fs_root and pass false for
check_ref so we can always get the root we're looking for.  Thanks,

Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.de>
Signed-off-by: Chris Mason <clm@fb.com>
2015-11-25 05:22:08 -08:00
Justin Maggard 967ef5131e btrfs: qgroup: fix quota disable during rescan
There's a race condition that leads to a NULL pointer dereference if you
disable quotas while a quota rescan is running.  To fix this, we just need
to wait for the quota rescan worker to actually exit before tearing down
the quota structures.
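
The ordering requirement can be modelled with ordinary threads. This is only a sketch: `QuotaState`, `rescan_worker` and `quota_disable` are hypothetical names, and a Python thread stands in for the rescan worker.

```python
import threading


class QuotaState:
    def __init__(self):
        self.root = {"qgroups": 0}     # structures torn down on disable
        self.stop = threading.Event()
        self.worker = None
        self.error = None


def rescan_worker(state):
    try:
        while not state.stop.is_set():
            state.root["qgroups"] += 1   # touches the quota structures
    except TypeError as exc:
        # Would be raised if root were torn down while we still run.
        state.error = exc


def quota_disable(state):
    state.stop.set()
    if state.worker is not None:
        state.worker.join()    # wait for the worker to actually exit
    state.root = None          # only now tear the structures down


state = QuotaState()
state.worker = threading.Thread(target=rescan_worker, args=(state,))
state.worker.start()
quota_disable(state)
```

Without the join, the teardown could run while the worker is still inside its loop, which is the analogue of the NULL pointer dereference.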

Signed-off-by: Justin Maggard <jmaggard@netgear.com>
Signed-off-by: Chris Mason <clm@fb.com>
2015-11-25 05:22:08 -08:00
Filipe Manana 036a9348dc Btrfs: fix race between cleaner kthread and space cache writeout
When a block group becomes unused and the cleaner kthread is currently
running, we can end up getting the current transaction aborted with error
-ENOENT when we try to commit the transaction, leading to the following
trace:

  [59779.258768] WARNING: CPU: 3 PID: 5990 at fs/btrfs/extent-tree.c:3740 btrfs_write_dirty_block_groups+0x17c/0x214 [btrfs]()
  [59779.272594] BTRFS: Transaction aborted (error -2)
  (...)
  [59779.291137] Call Trace:
  [59779.291621]  [<ffffffff812566f4>] dump_stack+0x4e/0x79
  [59779.292543]  [<ffffffff8104d0a6>] warn_slowpath_common+0x9f/0xb8
  [59779.293435]  [<ffffffffa04cb81f>] ? btrfs_write_dirty_block_groups+0x17c/0x214 [btrfs]
  [59779.295000]  [<ffffffff8104d107>] warn_slowpath_fmt+0x48/0x50
  [59779.296138]  [<ffffffffa04c2721>] ? write_one_cache_group.isra.32+0x77/0x82 [btrfs]
  [59779.297663]  [<ffffffffa04cb81f>] btrfs_write_dirty_block_groups+0x17c/0x214 [btrfs]
  [59779.299141]  [<ffffffffa0549b0d>] commit_cowonly_roots+0x1de/0x261 [btrfs]
  [59779.300359]  [<ffffffffa04dd5b6>] btrfs_commit_transaction+0x4c4/0x99c [btrfs]
  [59779.301805]  [<ffffffffa04b5df4>] btrfs_sync_fs+0x145/0x1ad [btrfs]
  [59779.302893]  [<ffffffff81196634>] sync_filesystem+0x7f/0x93
  (...)
  [59779.318186] ---[ end trace 577e2daff90da33a ]---

The following diagram illustrates a sequence of steps leading to this
problem:

       CPU 1                                             CPU 2

                           <at transaction N>

                                                        adds bg A to list
                                                        fs_info->unused_bgs

                                                        adds bg B to list
                                                        fs_info->unused_bgs

                           <transaction kthread
                            commits transaction N
                            and wakes up the
                            cleaner kthread>

  cleaner kthread
    delete_unused_bgs()

      sees bg A in list
      fs_info->unused_bgs

      btrfs_start_transaction()

                           <transaction N + 1 starts>

      deletes bg A

                                                        update_block_group(bg C)

                                                          --> adds bg C to list
                                                              fs_info->unused_bgs

      deletes bg B

      sees bg C in the list
      fs_info->unused_bgs

      btrfs_remove_chunk(bg C)
        btrfs_remove_block_group(bg C)

          --> checks if the block group
              is in a dirty list, and
              because it isn't now, it
              does nothing

          --> the block group item
              is deleted from the
              extent tree

                                                          --> adds bg C to list
                                                              transaction->dirty_bgs

                                                         some task calls
                                                         btrfs_commit_transaction(t N + 1)
                                                           commit_cowonly_roots()
                                                             btrfs_write_dirty_block_groups()
                                                               --> sees bg C in cur_trans->dirty_bgs
                                                               --> calls write_one_cache_group()
                                                                   which returns -ENOENT because
                                                                   it did not find the block group
                                                                   item in the extent tree
                                                               --> transaction aborted with -ENOENT
                                                                   because write_one_cache_group()
                                                                   returned that error

So fix this by adding a block group to the list of dirty block groups
before adding it to the list of unused block groups.
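
The effect of the ordering can be reproduced with a deterministic Python model of the diagram. It is a sketch, not btrfs code: `Fs`, `cleaner_pass`, `write_dirty_block_groups` and `simulate` are hypothetical, and the cleaner is assumed to run after every single list insertion as a worst case.

```python
ENOENT = 2


class Fs:
    def __init__(self):
        self.unused_bgs = []
        self.dirty_bgs = []
        self.extent_tree = {"C"}   # block group item for bg C exists


def cleaner_pass(fs):
    """Delete every block group currently on the unused list."""
    while fs.unused_bgs:
        bg = fs.unused_bgs.pop(0)
        if bg in fs.dirty_bgs:
            # Removal also takes the group off the dirty list.
            fs.dirty_bgs.remove(bg)
        fs.extent_tree.discard(bg)   # block group item is deleted


def write_dirty_block_groups(fs):
    for bg in fs.dirty_bgs:
        if bg not in fs.extent_tree:
            return -ENOENT           # the transaction would abort here
    return 0


def simulate(dirty_first):
    fs = Fs()
    if dirty_first:
        order = [fs.dirty_bgs, fs.unused_bgs]
    else:
        order = [fs.unused_bgs, fs.dirty_bgs]
    for lst in order:
        lst.append("C")
        cleaner_pass(fs)             # worst case: cleaner runs every step
    return write_dirty_block_groups(fs)
```

In this model `simulate(False)` ends with bg C on the dirty list but missing from the extent tree, while `simulate(True)` lets the cleaner see and clear the dirty entry before the commit.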

This happened on a stress test using fsstress plus concurrent calls to
fallocate 20G and truncate (releasing part of the space allocated with
fallocate).

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
2015-11-25 05:22:08 -08:00
Filipe Manana 758f2dfcf8 Btrfs: fix scrub preventing unused block groups from being deleted
Currently scrub can race with the cleaner kthread when the latter attempts
to delete an unused block group, and the result is that the cleaner
kthread is prevented from ever deleting the block group later - unless the
block group becomes used and unused again. The following diagram illustrates
that race:

              CPU 1                                 CPU 2

 cleaner kthread
   btrfs_delete_unused_bgs()

     gets block group X from
     fs_info->unused_bgs and
     removes it from that list

                                             scrub_enumerate_chunks()

                                               searches device tree using
                                               its commit root

                                               finds device extent for
                                               block group X

                                               gets block group X from the tree
                                               fs_info->block_group_cache_tree
                                               (via btrfs_lookup_block_group())

                                               sets bg X to RO

     sees the block group is
     already RO and therefore
     doesn't delete it nor adds
     it back to unused list

So fix this by making scrub add the block group again to the list of
unused block groups if the block group is still unused when it finished
scrubbing it and it hasn't been removed already.
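
A minimal Python sketch of that check, assuming a block group is represented as a dict with `used`, `removed` and `ro` fields. These fields and the function name are hypothetical and chosen only for the example.

```python
def finish_scrub_of_block_group(unused_bgs, bg):
    """After scrubbing: requeue the group for deletion if still unused."""
    bg["ro"] = False   # scrub drops the read-only state it had set
    if bg["used"] == 0 and not bg["removed"] and bg not in unused_bgs:
        # The cleaner skipped this group while it was RO, so put it
        # back on the list for a later pass.
        unused_bgs.append(bg)
```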

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
2015-11-25 05:22:08 -08:00
Filipe Manana 020d5b7366 Btrfs: fix race between scrub and block group deletion
Scrub can race with the cleaner kthread deleting block groups that are
unused (and with relocation too) leading to a failure with error -EINVAL
that gets returned to user space.

The following diagram illustrates how it happens:

              CPU 1                                 CPU 2

 cleaner kthread
   btrfs_delete_unused_bgs()

     gets block group X from
     fs_info->unused_bgs

     sets block group to RO

       btrfs_remove_chunk(bg X)

         deletes device extents

                                         scrub_enumerate_chunks()

                                           searches device tree using
                                           its commit root

                                           finds device extent for
                                           block group X

                                           gets block group X from the tree
                                           fs_info->block_group_cache_tree
                                           (via btrfs_lookup_block_group())

                                           sets bg X to RO (again)

          btrfs_remove_block_group(bg X)

            deletes block group from
            fs_info->block_group_cache_tree

            removes extent map from
            fs_info->mapping_tree

                                               scrub_chunk(offset X)

                                                 searches fs_info->mapping_tree
                                                 for extent map starting at
                                                 offset X

                                                    --> doesn't find any such
                                                        extent map
                                                    --> returns -EINVAL and scrub
                                                        errors out to userspace
                                                        with -EINVAL

Fix this by dealing with an extent map lookup failure as an indicator of
block group deletion.
Issue reproduced with fstest btrfs/071.
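
In outline, the lookup failure is handled as follows. This is a simplified Python sketch: the mapping tree is modelled as a dict keyed by chunk offset, and the `block_group_removed` flag is an assumption about how the deletion is detected.

```python
EINVAL = 22


def scrub_chunk(mapping_tree, chunk_offset, block_group_removed):
    em = mapping_tree.get(chunk_offset)
    if em is None:
        # A missing extent map is expected when the block group was
        # deleted concurrently; it is an error only otherwise.
        return 0 if block_group_removed else -EINVAL
    return 0   # the actual scrubbing of the chunk would happen here
```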

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
2015-11-25 05:19:51 -08:00
David Sterba 31388ab2ed btrfs: fix rcu warning during device replace
The test btrfs/011 triggers an RCU warning.
Reviewed-by: Anand Jain <anand.jain@oracle.com>

===============================
[ INFO: suspicious RCU usage. ]
4.4.0-rc1-default+ #286 Tainted: G        W
-------------------------------
fs/btrfs/volumes.c:1977 suspicious rcu_dereference_check() usage!

other info that might help us debug this:

rcu_scheduler_active = 1, debug_locks = 0
4 locks held by btrfs/28786:

0:  (&fs_info->dev_replace.lock_finishing_cancel_unmount){+.+...}, at: [<ffffffffa00bc785>] btrfs_dev_replace_finishing+0x45/0xa00 [btrfs]
1:  (uuid_mutex){+.+.+.}, at: [<ffffffffa00bc84f>] btrfs_dev_replace_finishing+0x10f/0xa00 [btrfs]
2:  (&fs_devs->device_list_mutex){+.+.+.}, at: [<ffffffffa00bc868>] btrfs_dev_replace_finishing+0x128/0xa00 [btrfs]
3:  (&fs_info->chunk_mutex){+.+...}, at: [<ffffffffa00bc87d>] btrfs_dev_replace_finishing+0x13d/0xa00 [btrfs]

stack backtrace:
CPU: 0 PID: 28786 Comm: btrfs Tainted: G        W       4.4.0-rc1-default+ #286
Hardware name: Intel Corporation SandyBridge Platform/To be filled by O.E.M., BIOS ASNBCPT1.86C.0031.B00.1006301607 06/30/2010
0000000000000001 ffff8800a07dfb48 ffffffff8141d47b 0000000000000001
0000000000000001 0000000000000000 ffff8801464a4f00 ffff8800a07dfb78
ffffffff810cd883 ffff880146eb9400 ffff8800a3698600 ffff8800a33fe220
Call Trace:
[<ffffffff8141d47b>] dump_stack+0x4f/0x74
[<ffffffff810cd883>] lockdep_rcu_suspicious+0x103/0x140
[<ffffffffa0071261>] btrfs_rm_dev_replace_remove_srcdev+0x111/0x130 [btrfs]
[<ffffffff810d354d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffff81449536>] ? __percpu_counter_sum+0x66/0x80
[<ffffffffa00bcc15>] btrfs_dev_replace_finishing+0x4d5/0xa00 [btrfs]
[<ffffffffa00bc96e>] ? btrfs_dev_replace_finishing+0x22e/0xa00 [btrfs]
[<ffffffffa00a8795>] ? btrfs_scrub_dev+0x415/0x6d0 [btrfs]
[<ffffffffa003ea69>] ? btrfs_start_transaction+0x9/0x20 [btrfs]
[<ffffffffa00bda79>] btrfs_dev_replace_start+0x339/0x590 [btrfs]
[<ffffffff81196aa5>] ? __might_fault+0x95/0xa0
[<ffffffffa0078638>] btrfs_ioctl_dev_replace+0x118/0x160 [btrfs]
[<ffffffff811409c6>] ? stack_trace_call+0x46/0x70
[<ffffffffa007c914>] ? btrfs_ioctl+0x24/0x1770 [btrfs]
[<ffffffffa007ce43>] btrfs_ioctl+0x553/0x1770 [btrfs]
[<ffffffff811409c6>] ? stack_trace_call+0x46/0x70
[<ffffffff811d6eb1>] ? do_vfs_ioctl+0x21/0x5a0
[<ffffffff811d6f1c>] do_vfs_ioctl+0x8c/0x5a0
[<ffffffff811e3336>] ? __fget_light+0x86/0xb0
[<ffffffff811e3369>] ? __fdget+0x9/0x20
[<ffffffff811d7451>] ? SyS_ioctl+0x21/0x80
[<ffffffff811d7483>] SyS_ioctl+0x53/0x80
[<ffffffff81b1efd7>] entry_SYSCALL_64_fastpath+0x12/0x6f

This is because of unprotected use of rcu_dereference in
btrfs_scratch_superblocks. We can't add rcu locks around the whole
function because we read the superblock.

The fix will use the rcu string buffer directly without the rcu locking.
This is safe as the device will not go away in the meantime. We're
holding the device list mutexes.

Restructuring the code to narrow down the rcu section turned out to be
impossible: we need to call filp_open (through update_dev_time) on the
buffer and this could call kmalloc/__might_sleep. We could call kstrdup
with GFP_ATOMIC but it's not absolutely necessary.

Fixes: 12b1c2637b (Btrfs: enhance btrfs_scratch_superblock to scratch all superblocks)
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
2015-11-25 05:19:51 -08:00
Zhaolei 76a8efa171 btrfs: Continue replace when set_block_ro failed
xfstests/011 failed on a node with a small filesystem.
It can be reproduced with the following script:
  # SCRATCH_MNT must be set beforehand (xfstests provides it).
  # DEV_LIST is a bash array so that [@] and [0] expand per device.
  DEV_LIST=(/dev/vdd /dev/vde)
  DEV_REPLACE="/dev/vdf"

  do_test()
  {
      local mkfs_opt="$1"
      local size="$2"

      dmesg -c >/dev/null
      umount "$SCRATCH_MNT" &>/dev/null

      # $mkfs_opt is left unquoted on purpose so it splits into options
      echo mkfs.btrfs -f $mkfs_opt "${DEV_LIST[*]}"
      mkfs.btrfs -f $mkfs_opt "${DEV_LIST[@]}" || return 1
      mount "${DEV_LIST[0]}" "$SCRATCH_MNT"

      echo -n "Writing big files"
      dd if=/dev/urandom of="$SCRATCH_MNT/t0" bs=1M count=1 >/dev/null 2>&1
      for ((i = 1; i <= size; i++)); do
          echo -n .
          /bin/cp "$SCRATCH_MNT/t0" "$SCRATCH_MNT/t$i" || return 1
      done
      echo

      echo Start replace
      btrfs replace start -Bf "${DEV_LIST[0]}" "$DEV_REPLACE" "$SCRATCH_MNT" || {
          dmesg
          return 1
      }
      return 0
  }

  # Set size to a value near the fs size;
  # for example, 1897 can trigger this bug on a 2.6G device.
  #
  do_test "-d raid1 -m raid1" 1897

The system reports the replace failure with the following warning in dmesg:
 [  134.710853] BTRFS: dev_replace from /dev/vdd (devid 1) to /dev/vdf started
 [  135.542390] BTRFS: btrfs_scrub_dev(/dev/vdd, 1, /dev/vdf) failed -28
 [  135.543505] ------------[ cut here ]------------
 [  135.544127] WARNING: CPU: 0 PID: 4080 at fs/btrfs/dev-replace.c:428 btrfs_dev_replace_start+0x398/0x440()
 [  135.545276] Modules linked in:
 [  135.545681] CPU: 0 PID: 4080 Comm: btrfs Not tainted 4.3.0 #256
 [  135.546439] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.2-0-g33fbe13 by qemu-project.org 04/01/2014
 [  135.547798]  ffffffff81c5bfcf ffff88003cbb3d28 ffffffff817fe7b5 0000000000000000
 [  135.548774]  ffff88003cbb3d60 ffffffff810a88f1 ffff88002b030000 00000000ffffffe4
 [  135.549774]  ffff88003c080000 ffff88003c082588 ffff88003c28ab60 ffff88003cbb3d70
 [  135.550758] Call Trace:
 [  135.551086]  [<ffffffff817fe7b5>] dump_stack+0x44/0x55
 [  135.551737]  [<ffffffff810a88f1>] warn_slowpath_common+0x81/0xc0
 [  135.552487]  [<ffffffff810a89e5>] warn_slowpath_null+0x15/0x20
 [  135.553211]  [<ffffffff81448c88>] btrfs_dev_replace_start+0x398/0x440
 [  135.554051]  [<ffffffff81412c3e>] btrfs_ioctl+0x1d2e/0x25c0
 [  135.554722]  [<ffffffff8114c7ba>] ? __audit_syscall_entry+0xaa/0xf0
 [  135.555506]  [<ffffffff8111ab36>] ? current_kernel_time64+0x56/0xa0
 [  135.556304]  [<ffffffff81201e3d>] do_vfs_ioctl+0x30d/0x580
 [  135.557009]  [<ffffffff8114c7ba>] ? __audit_syscall_entry+0xaa/0xf0
 [  135.557855]  [<ffffffff810011d1>] ? do_audit_syscall_entry+0x61/0x70
 [  135.558669]  [<ffffffff8120d1c1>] ? __fget_light+0x61/0x90
 [  135.559374]  [<ffffffff81202124>] SyS_ioctl+0x74/0x80
 [  135.559987]  [<ffffffff81809857>] entry_SYSCALL_64_fastpath+0x12/0x6f
 [  135.560842] ---[ end trace 2a5c1fc3205abbdd ]---

Reason:
 When a large amount of data is written to the fs, all of the free space
 is allocated to data chunks.
 Operations such as scrub need to call set_block_ro(), and when there is
 only one metadata chunk in the system (or all other metadata chunks
 are full), the function tries to allocate a new chunk
 and fails because there is no space left on the device.

Fix:
 A set_block_ro() failure for a metadata chunk is not a problem,
 because scrub_lock pauses commit_transaction at the same time and
 metadata is always COWed, so in-flight writepages will not
 write data to the same place as scrub/replace.
 Letting replace continue in this case is safe.

Tested by above script, and xfstests/011, plus 100 times xfstests/070.

Changelog v1->v2:
1: Add detail comments in source and commit-message.
2: Add dmesg detail into commit-message.
3: Limit return value of -ENOSPC to be passed.
All suggested by: Filipe Manana <fdmanana@gmail.com>

Suggested-by: Filipe Manana <fdmanana@gmail.com>
Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>
2015-11-25 05:19:51 -08:00
David Sterba da02c68989 btrfs: fix clashing number of the enhanced balance usage filter
I've accidentally picked an already used number for the enhanced usage
filter represented by BTRFS_BALANCE_ARGS_USAGE_RANGE, clashing with
BTRFS_BALANCE_ARGS_CONVERT. Introduced during the development phase,
no backward compatibility issues.

Reported-by: Holger Hoffstätte <holger.hoffstaette@googlemail.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Fixes: bc3094673f ("btrfs: extend balance filter usage to take minimum and maximum")
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
2015-11-25 05:19:50 -08:00
Filipe Manana 7fd01182d1 Btrfs: fix the number of transaction units needed to remove a block group
We were using only 1 transaction unit when attempting to delete an unused
block group but in reality we need 3 + N units, where N corresponds to the
number of stripes. We were accounting only for the addition of the orphan
item (for the block group's free space cache inode) but we were not
accounting that we need to delete one block group item from the extent
tree, one free space item from the tree of tree roots and N device extent
items from the device tree.
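
The accounting described above can be written down as a one-line
helper. This is only a sketch of the arithmetic; the function name is
hypothetical and is not the helper used in the patch.

```c
#include <assert.h>

/*
 * Metadata units needed to remove one unused block group, per the text:
 *   1 orphan item added
 * + 1 block group item deleted from the extent tree
 * + 1 free space item deleted from the tree of tree roots
 * + N device extent items deleted from the device tree (N = stripes)
 */
static unsigned int block_group_removal_units(unsigned int num_stripes)
{
    assert(num_stripes > 0);    /* a block group has at least one stripe */
    return 3 + num_stripes;
}
```

For a two-stripe block group this gives 5 units instead of the single
unit reserved before.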

While one unit is not enough, it worked most of the time because for each
single unit we are too pessimistic and assume an entire tree path, with
the highest possible height (8), needs to be COWed with eventual node
splits at every possible level in the tree, so there was usually enough
reserved space for removing all the items and adding the orphan item.

However, after adding the orphan item, writepages() can be called by the VM
subsystem against the btree inode when we are under memory pressure, which
causes writeback to start for the nodes we COWed before, this forces the
operation to remove the free space item to COW again some (or all of) the
same nodes (in the tree of tree roots). Even without writepages() being
called, we could fail with ENOSPC because these items are located in
multiple trees and one of them might have a higher height and require
node/leaf splits at many levels, exhausting all the reserved space before
removing all the items and adding the orphan.

In the kernel 4.0 release, commit 3d84be7991 ("Btrfs: fix BUG_ON in
btrfs_orphan_add() when delete unused block group"), we attempted to fix
a BUG_ON due to ENOSPC when trying to add the orphan item by making the
cleaner kthread reserve one transaction unit before attempting to remove
the block group, but this was not enough. We had a couple user reports
still hitting the same BUG_ON after 4.0, like Stefan Priebe's report on
a 4.2-rc6 kernel for example:

    http://www.spinics.net/lists/linux-btrfs/msg46070.html

So fix this by reserving all the necessary units of metadata.

Reported-by: Stefan Priebe <s.priebe@profihost.ag>
Fixes: 3d84be7991 ("Btrfs: fix BUG_ON in btrfs_orphan_add() when delete unused block group")
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
2015-11-25 05:19:50 -08:00
Filipe Manana 8eab77ff16 Btrfs: use global reserve when deleting unused block group after ENOSPC
It's possible to reach a state where the cleaner kthread isn't able to
start a transaction to delete an unused block group due to lack of enough
free metadata space and due to lack of unallocated device space to allocate
a new metadata block group as well. If this happens try to use space from
the global block group reserve just like we do for unlink operations, so
that we don't reach a permanent state where starting a transaction for
filesystem operations (file creation, renames, etc) keeps failing with
-ENOSPC. Such an unfortunate state was observed on a machine where over
a dozen unused data block groups existed and the cleaner kthread was
failing to delete them due to ENOSPC error when attempting to start a
transaction, and even running balance with a -dusage=0 filter failed with
ENOSPC as well. Also unmounting and mounting again the filesystem didn't
help. Allowing the cleaner kthread to use the global block reserve to
delete the unused data block groups fixed the problem.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
2015-11-25 05:19:50 -08:00
Dan Carpenter 89b6c8d1e4 Btrfs: tests: checking for NULL instead of IS_ERR()
btrfs_alloc_dummy_root() returns an error pointer on failure, it never
returns NULL.
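
Why a NULL check misses the failure can be shown with a userspace model
of the error-pointer convention. The helpers below (err_ptr, ptr_err,
is_err, alloc_dummy_root_model) are hypothetical stand-ins written for
this sketch, not the kernel's ERR_PTR()/IS_ERR() definitions.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095  /* errors are encoded in the top 4095 addresses */

/* Encode a negative errno value as a pointer. */
static void *err_ptr(long error)
{
    assert(error < 0 && error >= -MAX_ERRNO);
    return (void *)(intptr_t)error;
}

/* Decode the errno value from an error pointer. */
static long ptr_err(const void *ptr)
{
    return (long)(intptr_t)ptr;
}

/* True if the pointer lies in the range reserved for error codes. */
static int is_err(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)(-MAX_ERRNO);
}

static int dummy_root;

/* Model of an allocator that reports failure via an error pointer. */
static void *alloc_dummy_root_model(int fail)
{
    return fail ? err_ptr(-ENOMEM) : (void *)&dummy_root;
}
```

On failure the model returns a non-NULL pointer, so "if (!root)" never
fires while is_err(root) does.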

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
2015-11-25 05:19:50 -08:00
David Sterba 9dcbeed4d7 btrfs: fix signed overflows in btrfs_sync_file
The calculation of range length in btrfs_sync_file leads to signed
overflow. This was caught by PaX gcc SIZE_OVERFLOW plugin.

https://forums.grsecurity.net/viewtopic.php?f=1&t=4284

The fsync call passes 0 and LLONG_MAX, the range length does not fit to
loff_t and overflows, but the value is converted to u64 so it silently
works as expected.

The minimal fix is a typecast to u64; switching functions to take
(start, end) instead of (start, len) would be more intrusive.

Coccinelle script found that there's one more opencoded calculation of
the length.

<smpl>
@@
loff_t start, end;
@@
* end - start
</smpl>
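
The overflow and the cast can be illustrated in userspace. This is a
minimal sketch, assuming the length is computed as end - start + 1;
range_len is a hypothetical helper, with long long standing in for
loff_t and uint64_t for u64.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/*
 * Length of the inclusive range [start, end].
 * In signed arithmetic, LLONG_MAX - 0 + 1 overflows (undefined
 * behaviour); casting both operands first keeps the computation in
 * unsigned 64-bit arithmetic, where the result is representable.
 */
static uint64_t range_len(long long start, long long end)
{
    assert(start >= 0 && start <= end);
    return (uint64_t)end - (uint64_t)start + 1;
}

/* The whole-file case passed by fsync: 0 .. LLONG_MAX. */
static uint64_t whole_file_len(void)
{
    return range_len(0, LLONG_MAX);
}
```

whole_file_len() yields 2^63, which does not fit in a signed 64-bit
type but is a valid unsigned value.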

CC: stable@vger.kernel.org
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
2015-11-25 05:19:50 -08:00
Suzuki K. Poulose 7142392dca arm64: early_alloc: Fix check for allocation failure
In early_alloc we check if the memblock_alloc failed by checking
the virtual address of the result, which will never fail. This patch
fixes it to check the actual result for failure.
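
A userspace model of the problem, assuming that allocation failure is
reported as physical address 0 and that the virtual address is the
physical address plus a fixed offset. The names and the offset value
are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical base of the linear mapping, for illustration only. */
#define LINEAR_MAP_BASE 0xffff000000000000ULL

/* Model of phys-to-virt translation: a fixed offset is added. */
static uint64_t va_model(uint64_t phys)
{
    assert(phys <= UINT64_MAX - LINEAR_MAP_BASE);   /* no wraparound */
    return phys + LINEAR_MAP_BASE;
}

/* Buggy check: tests the translated address, which is never 0. */
static int alloc_failed_buggy(uint64_t phys)
{
    return va_model(phys) == 0;
}

/* Fixed check: tests the allocator's own return value. */
static int alloc_failed_fixed(uint64_t phys)
{
    return phys == 0;
}
```

With a failed allocation (phys == 0) the buggy check reports success,
while the fixed check reports the failure.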

Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Suzuki K. Poulose <suzuki.poulose@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2015-11-25 12:14:25 +00:00
Felipe Balbi 51c4cfef56 rtc: ds1307: fix kernel splat due to wakeup irq handling
Since commit 3fffd12839 ("i2c: allow specifying
separate wakeup interrupt in device tree") we have
automatic wakeup irq support for i2c devices. That
commit missed the fact that rtc-1307 had its own
wakeup irq handling and ended up introducing a
kernel splat for at least Beagle x15 boards.

Fix that by reverting original commit _and_ passing
correct interrupt names on DTS so i2c-core can
choose correct IRQ as wakeup.

Now that we have automatic wakeirq support, we can
revert the original commit which did it manually.

Fixes the following warning:

[   10.346582] WARNING: CPU: 1 PID: 263 at linux/drivers/base/power/wakeirq.c:43 dev_pm_attach_wake_irq+0xbc/0xd4()
[   10.359244] rtc-ds1307 2-006f: wake irq already initialized

Cc: Tony Lindgren <tony@atomide.com>
Cc: Nishanth Menon <nm@ti.com>
Signed-off-by: Felipe Balbi <balbi@ti.com>
Acked-by: Tony Lindgren <tony@atomide.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
2015-11-25 12:15:44 +01:00
Martin Peres ef0e9f5518 drm/nouveau/volt/pwm/gk104: fix an off-by-one resulting in the voltage not being set
Reported-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Martin Peres <martin.peres@free.fr>
2015-11-25 15:37:45 +10:00
Ben Skeggs f5e551873e drm/nouveau/nvif: allow userspace access to its own client object
Regression from "abi16: implement limited interoperability with
usif/nvif".

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2015-11-25 15:31:21 +10:00
Ben Skeggs 0d7fc24616 drm/nouveau/gr/gf100-: fix oops when calling zbc methods
Somehow missed these two when removing dodgy void casts during the
rework.

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2015-11-25 15:31:21 +10:00
Ben Skeggs 2fb2b3c6e4 drm/nouveau/gr/gf117-: assume no PPC if NV_PGRAPH_GPC_GPM_PD_PES_TPC_ID_MASK is zero
fdo#92761

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2015-11-25 15:31:21 +10:00
Ben Skeggs ccb7b6ba07 drm/nouveau/gr/gf117-: read NV_PGRAPH_GPC_GPM_PD_PES_TPC_ID_MASK from correct GPC
Each GPCCS unit was reading the mask from GPC0, which causes problems on
boards where some GPCs are missing PPCs.

Part of the fix for fdo#92761.

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2015-11-25 15:31:21 +10:00
Ben Skeggs 7028156a91 drm/nouveau/gr/gf100-: split out per-gpc address calculation macro
There's a few places where we need to access a GPC register from ucode,
but outside of the falcon's io address space.  To do this we need to
calculate the offset based on which GPC we're executing on.

This used to be done manually, but we've since found a "base" offset
that can be added by the hardware.  To use this, an extra bit needs to
be set in the register address, which is what this macro achieves.

There should be no functional change from this commit.

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2015-11-25 15:31:21 +10:00
Ben Skeggs 954329412e drm/nouveau/bios: return actual size of the buffer retrieved via _ROM
Fixes detection of a failed attempt at fetching the entire ROM image
in one-shot (a violation of the spec, that works a lot of the time).

Tested on a HP Zbook 15 G2.

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2015-11-25 15:31:21 +10:00
Ben Skeggs 950950327b drm/nouveau/instmem: protect instobj list with a spinlock
No locking is required for the traversal of this list, as it only
happens during suspend/resume where nothing else can be executing.

Fixes some of the issues noticed during parallel piglit runs.

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2015-11-25 15:31:21 +10:00
Ben Skeggs c294a052f8 drm/nouveau/pci: enable c800 magic for some unknown Samsung laptop
fdo#70354 - comment #88.

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2015-11-25 15:31:21 +10:00
Karol Herbst 269249e174 drm/nouveau/pci: enable c800 magic for Clevo P157SM
this is needed for my gpu

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2015-11-25 15:31:21 +10:00
David Howells 096fe9eaea KEYS: Fix handling of stored error in a negatively instantiated user key
If a user key gets negatively instantiated, an error code is cached in the
payload area.  A negatively instantiated key may then be positively
instantiated by updating it with valid data.  However, the ->update key
type method must be aware that the error code may be there.

The following may be used to trigger the bug in the user key type:

    keyctl request2 user user "" @u
    keyctl add user user "a" @u

which manifests itself as:

	BUG: unable to handle kernel paging request at 00000000ffffff8a
	IP: [<ffffffff810a376f>] __call_rcu.constprop.76+0x1f/0x280 kernel/rcu/tree.c:3046
	PGD 7cc30067 PUD 0
	Oops: 0002 [#1] SMP
	Modules linked in:
	CPU: 3 PID: 2644 Comm: a.out Not tainted 4.3.0+ #49
	Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
	task: ffff88003ddea700 ti: ffff88003dd88000 task.ti: ffff88003dd88000
	RIP: 0010:[<ffffffff810a376f>]  [<ffffffff810a376f>] __call_rcu.constprop.76+0x1f/0x280
	 [<ffffffff810a376f>] __call_rcu.constprop.76+0x1f/0x280 kernel/rcu/tree.c:3046
	RSP: 0018:ffff88003dd8bdb0  EFLAGS: 00010246
	RAX: 00000000ffffff82 RBX: 0000000000000000 RCX: 0000000000000001
	RDX: ffffffff81e3fe40 RSI: 0000000000000000 RDI: 00000000ffffff82
	RBP: ffff88003dd8bde0 R08: ffff88007d2d2da0 R09: 0000000000000000
	R10: 0000000000000000 R11: ffff88003e8073c0 R12: 00000000ffffff82
	R13: ffff88003dd8be68 R14: ffff88007d027600 R15: ffff88003ddea700
	FS:  0000000000b92880(0063) GS:ffff88007fd00000(0000) knlGS:0000000000000000
	CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
	CR2: 00000000ffffff8a CR3: 000000007cc5f000 CR4: 00000000000006e0
	Stack:
	 ffff88003dd8bdf0 ffffffff81160a8a 0000000000000000 00000000ffffff82
	 ffff88003dd8be68 ffff88007d027600 ffff88003dd8bdf0 ffffffff810a39e5
	 ffff88003dd8be20 ffffffff812a31ab ffff88007d027600 ffff88007d027620
	Call Trace:
	 [<ffffffff810a39e5>] kfree_call_rcu+0x15/0x20 kernel/rcu/tree.c:3136
	 [<ffffffff812a31ab>] user_update+0x8b/0xb0 security/keys/user_defined.c:129
	 [<     inline     >] __key_update security/keys/key.c:730
	 [<ffffffff8129e5c1>] key_create_or_update+0x291/0x440 security/keys/key.c:908
	 [<     inline     >] SYSC_add_key security/keys/keyctl.c:125
	 [<ffffffff8129fc21>] SyS_add_key+0x101/0x1e0 security/keys/keyctl.c:60
	 [<ffffffff8185f617>] entry_SYSCALL_64_fastpath+0x12/0x6a arch/x86/entry/entry_64.S:185

Note the error code (-ENOKEY, 0xffffff82) in RAX, RDI and R12.
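
The values in the dump can be checked by hand. The sketch below assumes
ENOKEY is 126, its value on Linux, and hard-codes it so the example
builds elsewhere; errno_as_address is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

#define ENOKEY_LINUX 126    /* value of ENOKEY on Linux, hard-coded here */

/* Zero-extended 32-bit view of a negative errno, as in the registers. */
static uint64_t errno_as_address(int err)
{
    assert(err < 0);
    return (uint64_t)(uint32_t)err;
}
```

errno_as_address(-ENOKEY_LINUX) is 0xffffff82, the value seen in the
registers, and adding 8 gives 0xffffff8a, the faulting address in CR2.
That is consistent with a write to a field at offset 8 of a structure
whose "pointer" is really the cached error code.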

A similar bug can be tripped by:

    keyctl request2 trusted user "" @u
    keyctl add trusted user "a" @u

This should also affect encrypted keys - but that has to be correctly
parameterised or it will fail with EINVAL before getting to the bit that
will crash.

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
2015-11-25 14:19:47 +11:00
Christoph Hellwig 55ce0da1da block: fix blk_abort_request for blk-mq drivers
We only added the request to the request list for the !blk-mq case,
so we should only delete it in that case as well.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-11-24 15:24:10 -07:00
Christoph Hellwig bf508e910b nvme: add missing unmaps in nvme_queue_rq
When we fail various metadata related operations in nvme_queue_rq we
need to unmap the data SGL.

Cc: stable@vger.kernel.org
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-11-24 15:24:05 -07:00
Nishanth Aravamudan c5c9f25b98 NVMe: default to 4k device page size
We received a bug report recently when DDW (64-bit direct DMA on Power)
is not enabled for NVMe devices. In that case, we fall back to 32-bit
DMA via the IOMMU, which is always done via 4K TCEs (Translation Control
Entries).

The NVMe device driver, though, assumes that the DMA alignment for the
PRP entries will match the device's page size, and that the DMA alignment
matches the kernel's page alignment. On Power, the IOMMU page size,
as mentioned above, can be 4K, while the device can have a page size of
8K, while the kernel has a page size of 64K. This eventually trips the
BUG_ON in nvme_setup_prps(), as we have a 'dma_len' that is a multiple
of 4K but not 8K (e.g., 0xF000).
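
The example length can be verified with a simple alignment test. This
is only a sketch of the arithmetic; is_multiple_of_page is a
hypothetical helper, not a function from the NVMe driver.

```c
#include <assert.h>
#include <stdint.h>

/* Return 1 if len is a whole number of pages of the given size. */
static int is_multiple_of_page(uint64_t len, uint64_t page_size)
{
    /* page_size must be a non-zero power of two for the mask to work */
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    return (len & (page_size - 1)) == 0;
}
```

0xF000 is 15 * 4K, so it passes the check for a 4K IOMMU page but
fails it for an 8K device page.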

In this particular case of page sizes, we clearly want to use the
IOMMU's page size in the driver. And generally, the NVMe driver in this
function should be using the IOMMU's page size for the default device
page size, rather than the kernel's page size. There is not currently an
API to obtain the IOMMU's page size across all architectures and in the
interest of a stop-gap fix to this functional issue, default the NVMe
device page size to 4K, with the intent of adding such an API and
implementation across all architectures in the next merge window.

With the functionally equivalent v3 of this patch, our hardware test
exerciser survives when using 32-bit DMA; without the patch, the kernel
will BUG within a few minutes.

Signed-off-by: Nishanth Aravamudan <nacc at linux.vnet.ibm.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-11-24 15:05:51 -07:00
Arnd Bergmann 9f55cf5654 PCI: hisi: Fix deferred probing
The hisi_pcie_probe() function is incorrectly marked as __init, as Kconfig
tells us:

  WARNING: drivers/pci/host/built-in.o(.data+0x7780): Section mismatch in reference from the variable hisi_pcie_driver to the function .init.text:hisi_pcie_probe()

If the probe for this device gets deferred past the point where __init
functions are removed, or the device is unbound and then reattached to the
driver, we branch into uninitialized memory, which is bad.

Remove the __init annotation from hisi_pcie_probe() and
hisi_add_pcie_port().

Fixes: 500a1d9a43 ("PCI: hisi: Add HiSilicon SoC Hip05 PCIe driver")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Hanjun Guo <hanjun.guo@linaro.org>
Acked-by: Zhou Wang <wangzhou1@hisilicon.com>
2015-11-24 15:38:07 -06:00
Linus Torvalds 6ffeba9607 Merge tag 'dm-4.4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm

Pull device mapper fixes from Mike Snitzer:
 "Two fixes for 4.4-rc1's DM ioctl changes that introduced the potential
  for infinite recursion on ioctl (with DM multipath).

  And four stable fixes:

   - A DM thin-provisioning fix to restore 'error_if_no_space' setting
     when a thin-pool is made writable again (after having been out of
     space).

   - A DM thin-provisioning fix to properly advertise discard support
     for thin volumes that are stacked on a thin-pool whose underlying
     data device doesn't support discards.

   - A DM ioctl fix to allow ctrl-c to break out of an ioctl retry loop
     when DM multipath is configured to 'queue_if_no_path'.

   - A DM crypt fix for a possible hang on dm-crypt device removal"

* tag 'dm-4.4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
  dm thin: fix regression in advertised discard limits
  dm crypt: fix a possible hang due to race condition on exit
  dm mpath: fix infinite recursion in ioctl when no paths and !queue_if_no_path
  dm: do not reuse dm_blk_ioctl block_device input as local variable
  dm: fix ioctl retry termination with signal
  dm thin: restore requested 'error_if_no_space' setting on OODS to WRITE transition
2015-11-24 12:53:11 -08:00
Stanimir Varbanov 5228e39e3f PCI: designware: Remove incorrect io_base assignment
"pp->io" is an I/O resource, e.g., "[io 0x0000-0xffff]"; "pp->io_base" is
the CPU physical address of a region where the host bridge converts CPU
memory accesses into PCI I/O transactions.

Corrupting pp->io_base by assigning pp->io->start to it breaks access to
the PCI I/O space, as reported by Kishon.

Remove the invalid assignment.

[bhelgaas: changelog]
Fixes: 0021d22b73 ("PCI: designware: Use of_pci_get_host_bridge_resources() to parse DT")
Reported-and-tested-by: Kishon Vijay Abraham I <kishon@ti.com>
Signed-off-by: Stanimir Varbanov <stanimir.varbanov@linaro.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
2015-11-24 14:06:41 -06:00
Eric Dumazet 81b1a832d7 pidns: fix NULL dereference in __task_pid_nr_ns()
I got a crash during a "perf top" session that was caused by a race in
__task_pid_nr_ns() :

pid_nr_ns() was inlined, but apparently the compiler chose to read
task->pids[type].pid twice, and the pid->level dereference crashed
because we got a NULL pointer at the second read :

    if (pid && ns->level <= pid->level) { // CRASH

Just use RCU API properly to solve this race, and not worry about "perf
top" crashing hosts :(

get_task_pid() can benefit from same fix.
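
The shape of the fix is to load the pointer exactly once and use only
that local copy. The userspace sketch below uses a C11 atomic load as a
stand-in for the kernel's RCU accessors. The struct and function names
are hypothetical and the lookup is simplified compared to the real
pid_nr_ns().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Simplified stand-ins for struct pid and struct pid_namespace. */
struct pid_model { unsigned int level; int nr; };
struct ns_model { unsigned int level; };

/*
 * Read the shared pointer once into a local. The NULL test and the
 * dereference then see the same value even if another thread clears
 * the slot in between.
 */
static int pid_nr_in_ns(struct pid_model *_Atomic *slot,
                        const struct ns_model *ns)
{
    struct pid_model *pid;

    assert(slot != NULL && ns != NULL);
    pid = atomic_load_explicit(slot, memory_order_acquire);
    if (pid && ns->level <= pid->level)
        return pid->nr;
    return 0;
}
```

Without the single load, the compiler is free to re-read the slot
between the NULL test and the dereference, which is the double read
described above.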

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-24 12:03:55 -08:00
Takashi Iwai 0c25ad8040 ALSA: hda - Fix noise on Gigabyte Z170X mobo
Gigabyte Z170X mobo with ALC1150 codec gets significant noises from
the analog loopback routes even if their inputs are all muted.
Simply kill the aamix for fixing it.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=108301
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2015-11-24 20:02:12 +01:00