Commit Graph

692256 Commits

Author SHA1 Message Date
Jan Kara 19ec8e4858 ocfs2: don't clear SGID when inheriting ACLs
When new directory 'DIR1' is created in a directory 'DIR0' with SGID bit
set, DIR1 is expected to have SGID bit set (and owning group equal to
the owning group of 'DIR0').  However when 'DIR0' also has some default
ACLs that 'DIR1' inherits, setting these ACLs will result in SGID bit on
'DIR1' to get cleared if user is not member of the owning group.

Fix the problem by moving posix_acl_update_mode() out of ocfs2_set_acl()
into ocfs2_iop_set_acl().  That way the function will not be called when
inheriting ACLs which is what we want as it prevents SGID bit clearing
and the mode has been properly set by posix_acl_create() anyway.  Also
posix_acl_chmod() that is calling ocfs2_set_acl() takes care of updating
mode itself.

Fixes: 073931017b ("posix_acl: Clear SGID bit when setting file permissions")
Link: http://lkml.kernel.org/r/20170801141252.19675-3-jack@suse.cz
Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Mark Fasheh <mfasheh@versity.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <jiangqi903@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-02 17:16:13 -07:00
Kan Liang 1ee1c3f5b5 mm: allow page_cache_get_speculative in interrupt context
Kernel panic when calling the IRQ-safe __get_user_pages_fast in NMI
handler.

The bug was introduced by commit 2947ba054a ("x86/mm/gup: Switch GUP
to the generic get_user_page_fast() implementation").

The original x86 __get_user_page_fast used plain get_page() or
page_ref_add().  However, the generic __get_user_page_fast uses
page_cache_get_speculative(), which has VM_BUG_ON(in_interrupt()).

There is no reason to prevent page_cache_get_speculative from using in
interrupt context.  According to the author, putting a BUG_ON there is
just because the code is not verifying correctness of interrupt races.
I did some tests in interrupt context.  There is no issue found.

Removing VM_BUG_ON(in_interrupt()) for page_cache_get_speculative().

Link: http://lkml.kernel.org/r/1501609146-59730-1-git-send-email-kan.liang@intel.com
Fixes: 2947ba054a ("x86/mm/gup: Switch GUP to the generic get_user_page_fast() implementation")
Signed-off-by: Kan Liang <kan.liang@intel.com>
Cc: Jens Axboe <axboe@fb.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Ying Huang <ying.huang@intel.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-02 17:16:13 -07:00
Mike Rapoport 5a18b64e3f userfaultfd: non-cooperative: flush event_wqh at release time
There may still be threads waiting on event_wqh at the time the
userfault file descriptor is closed.  Flush the events wait-queue to
prevent waiting threads from hanging.

Link: http://lkml.kernel.org/r/1501398127-30419-1-git-send-email-rppt@linux.vnet.ibm.com
Fixes: 9cd75c3cd4 ("userfaultfd: non-cooperative: add ability to report
non-PF events from uffd descriptor")
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Pavel Emelyanov <xemul@virtuozzo.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-02 17:16:13 -07:00
Kees Cook ade9f91b32 ipc: add missing container_of()s for randstruct
When building with the randstruct gcc plugin, the layout of the IPC
structs will be randomized, which requires any sub-structure accesses to
use container_of().  The proc display handlers were missing the needed
container_of()s since the iterator is passing in the top-level struct
kern_ipc_perm.

This would lead to crashes when running the "lsipc" program after the
system had IPC registered (e.g. after starting up Gnome):

  general protection fault: 0000 [#1] PREEMPT SMP
  ...
  RIP: 0010:shm_add_rss_swap.isra.1+0x13/0xa0
  ...
  Call Trace:
    sysvipc_shm_proc_show+0x5e/0x150
    sysvipc_proc_show+0x1a/0x30
    seq_read+0x2e9/0x3f0
  ...

Link: http://lkml.kernel.org/r/20170730205950.GA55841@beast
Fixes: 3859a271a0 ("randstruct: Mark various structs for randomization")
Signed-off-by: Kees Cook <keescook@chromium.org>
Reported-by: Dominik Brodowski <linux@dominikbrodowski.net>
Acked-by: Davidlohr Bueso <dave@stgolabs.net>
Acked-by: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-02 17:16:12 -07:00
Dima Zavin 89affbf5d9 cpuset: fix a deadlock due to incomplete patching of cpusets_enabled()
In codepaths that use the begin/retry interface for reading
mems_allowed_seq with irqs disabled, there exists a race condition that
stalls the patch process after only modifying a subset of the
static_branch call sites.

This problem manifested itself as a deadlock in the slub allocator,
inside get_any_partial.  The loop reads mems_allowed_seq value (via
read_mems_allowed_begin), performs the defrag operation, and then
verifies the consistency of mem_allowed via the read_mems_allowed_retry
and the cookie returned by xxx_begin.

The issue here is that both begin and retry first check if cpusets are
enabled via cpusets_enabled() static branch.  This branch can be
rewritted dynamically (via cpuset_inc) if a new cpuset is created.  The
x86 jump label code fully synchronizes across all CPUs for every entry
it rewrites.  If it rewrites only one of the callsites (specifically the
one in read_mems_allowed_retry) and then waits for the
smp_call_function(do_sync_core) to complete while a CPU is inside the
begin/retry section with IRQs off and the mems_allowed value is changed,
we can hang.

This is because begin() will always return 0 (since it wasn't patched
yet) while retry() will test the 0 against the actual value of the seq
counter.

The fix is to use two different static keys: one for begin
(pre_enable_key) and one for retry (enable_key).  In cpuset_inc(), we
first bump the pre_enable key to ensure that cpuset_mems_allowed_begin()
always return a valid seqcount if are enabling cpusets.  Similarly, when
disabling cpusets via cpuset_dec(), we first ensure that callers of
cpuset_mems_allowed_retry() will start ignoring the seqcount value
before we let cpuset_mems_allowed_begin() return 0.

The relevant stack traces of the two stuck threads:

  CPU: 1 PID: 1415 Comm: mkdir Tainted: G L  4.9.36-00104-g540c51286237 #4
  Hardware name: Default string Default string/Hardware, BIOS 4.29.1-20170526215256 05/26/2017
  task: ffff8817f9c28000 task.stack: ffffc9000ffa4000
  RIP: smp_call_function_many+0x1f9/0x260
  Call Trace:
    smp_call_function+0x3b/0x70
    on_each_cpu+0x2f/0x90
    text_poke_bp+0x87/0xd0
    arch_jump_label_transform+0x93/0x100
    __jump_label_update+0x77/0x90
    jump_label_update+0xaa/0xc0
    static_key_slow_inc+0x9e/0xb0
    cpuset_css_online+0x70/0x2e0
    online_css+0x2c/0xa0
    cgroup_apply_control_enable+0x27f/0x3d0
    cgroup_mkdir+0x2b7/0x420
    kernfs_iop_mkdir+0x5a/0x80
    vfs_mkdir+0xf6/0x1a0
    SyS_mkdir+0xb7/0xe0
    entry_SYSCALL_64_fastpath+0x18/0xad

  ...

  CPU: 2 PID: 1 Comm: init Tainted: G L  4.9.36-00104-g540c51286237 #4
  Hardware name: Default string Default string/Hardware, BIOS 4.29.1-20170526215256 05/26/2017
  task: ffff8818087c0000 task.stack: ffffc90000030000
  RIP: int3+0x39/0x70
  Call Trace:
    <#DB> ? ___slab_alloc+0x28b/0x5a0
    <EOE> ? copy_process.part.40+0xf7/0x1de0
    __slab_alloc.isra.80+0x54/0x90
    copy_process.part.40+0xf7/0x1de0
    copy_process.part.40+0xf7/0x1de0
    kmem_cache_alloc_node+0x8a/0x280
    copy_process.part.40+0xf7/0x1de0
    _do_fork+0xe7/0x6c0
    _raw_spin_unlock_irq+0x2d/0x60
    trace_hardirqs_on_caller+0x136/0x1d0
    entry_SYSCALL_64_fastpath+0x5/0xad
    do_syscall_64+0x27/0x350
    SyS_clone+0x19/0x20
    do_syscall_64+0x60/0x350
    entry_SYSCALL64_slow_path+0x25/0x25

Link: http://lkml.kernel.org/r/20170731040113.14197-1-dmitriyz@waymo.com
Fixes: 46e700abc4 ("mm, page_alloc: remove unnecessary taking of a seqlock when cpusets are disabled")
Signed-off-by: Dima Zavin <dmitriyz@waymo.com>
Reported-by: Cliff Spradlin <cspradlin@waymo.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Christopher Lameter <cl@linux.com>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-02 17:16:12 -07:00
Mike Rapoport 9d95aa4bad userfaultfd_zeropage: return -ENOSPC in case mm has gone
In the non-cooperative userfaultfd case, the process exit may race with
outstanding mcopy_atomic called by the uffd monitor.  Returning -ENOSPC
instead of -EINVAL when mm is already gone will allow uffd monitor to
distinguish this case from other error conditions.

Unfortunately I overlooked userfaultfd_zeropage when updating
userfaultd_copy().

Link: http://lkml.kernel.org/r/1501136819-21857-1-git-send-email-rppt@linux.vnet.ibm.com
Fixes: 96333187ab ("userfaultfd_copy: return -ENOSPC in case mm has gone")
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Pavel Emelyanov <xemul@virtuozzo.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-02 17:16:12 -07:00
Heiko Carstens 167d0f258f mm: take memory hotplug lock within numa_zonelist_order_handler()
Andre Wild reported the following warning:

  WARNING: CPU: 2 PID: 1205 at kernel/cpu.c:240 lockdep_assert_cpus_held+0x4c/0x60
  Modules linked in:
  CPU: 2 PID: 1205 Comm: bash Not tainted 4.13.0-rc2-00022-gfd2b2c57ec20 #10
  Hardware name: IBM 2964 N96 702 (z/VM 6.4.0)
  task: 00000000701d8100 task.stack: 0000000073594000
  Krnl PSW : 0704f00180000000 0000000000145e24 (lockdep_assert_cpus_held+0x4c/0x60)
  ...
  Call Trace:
   lockdep_assert_cpus_held+0x42/0x60)
   stop_machine_cpuslocked+0x62/0xf0
   build_all_zonelists+0x92/0x150
   numa_zonelist_order_handler+0x102/0x150
   proc_sys_call_handler.isra.12+0xda/0x118
   proc_sys_write+0x34/0x48
   __vfs_write+0x3c/0x178
   vfs_write+0xbc/0x1a0
   SyS_write+0x66/0xc0
   system_call+0xc4/0x2b0
   locks held by bash/1205:
   #0:  (sb_writers#4){.+.+.+}, at: vfs_write+0xa6/0x1a0
   #1:  (zl_order_mutex){+.+...}, at: numa_zonelist_order_handler+0x44/0x150
   #2:  (zonelists_mutex){+.+...}, at: numa_zonelist_order_handler+0xf4/0x150
  Last Breaking-Event-Address:
    lockdep_assert_cpus_held+0x48/0x60

This can be easily triggered with e.g.

    echo n > /proc/sys/vm/numa_zonelist_order

In commit 3f906ba236 ("mm/memory-hotplug: switch locking to a percpu
rwsem") memory hotplug locking was changed to fix a potential deadlock.

This also switched the stop_machine() invocation within
build_all_zonelists() to stop_machine_cpuslocked() which now expects
that online cpus are locked when being called.

This assumption is not true if build_all_zonelists() is being called
from numa_zonelist_order_handler().

In order to fix this simply add a mem_hotplug_begin()/mem_hotplug_done()
pair to numa_zonelist_order_handler().

Link: http://lkml.kernel.org/r/20170726111738.38768-1-heiko.carstens@de.ibm.com
Fixes: 3f906ba236 ("mm/memory-hotplug: switch locking to a percpu rwsem")
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Reported-by: Andre Wild <wild@linux.vnet.ibm.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-02 17:16:11 -07:00
Tetsuo Handa b0ba2d0faf mm/page_io.c: fix oops during block io poll in swapin path
When a thread is OOM-killed during swap_readpage() operation, an oops
occurs because end_swap_bio_read() is calling wake_up_process() based on
an assumption that the thread which called swap_readpage() is still
alive.

  Out of memory: Kill process 525 (polkitd) score 0 or sacrifice child
  Killed process 525 (polkitd) total-vm:528128kB, anon-rss:0kB, file-rss:4kB, shmem-rss:0kB
  oom_reaper: reaped process 525 (polkitd), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
  general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC
  Modules linked in: nf_conntrack_netbios_ns nf_conntrack_broadcast ip6t_rpfilter ipt_REJECT nf_reject_ipv4 ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_raw iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_raw ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter coretemp ppdev pcspkr vmw_balloon sg shpchp vmw_vmci parport_pc parport i2c_piix4 ip_tables xfs libcrc32c sd_mod sr_mod cdrom ata_generic pata_acpi vmwgfx ahci libahci drm_kms_helper ata_piix syscopyarea sysfillrect sysimgblt fb_sys_fops mptspi scsi_transport_spi ttm e1000 mptscsih drm mptbase i2c_core libata serio_raw
  CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.13.0-rc2-next-20170725 #129
  Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013
  task: ffffffffb7c16500 task.stack: ffffffffb7c00000
  RIP: 0010:__lock_acquire+0x151/0x12f0
  Call Trace:
   <IRQ>
   lock_acquire+0x59/0x80
   _raw_spin_lock_irqsave+0x3b/0x4f
   try_to_wake_up+0x3b/0x410
   wake_up_process+0x10/0x20
   end_swap_bio_read+0x6f/0xf0
   bio_endio+0x92/0xb0
   blk_update_request+0x88/0x270
   scsi_end_request+0x32/0x1c0
   scsi_io_completion+0x209/0x680
   scsi_finish_command+0xd4/0x120
   scsi_softirq_done+0x120/0x140
   __blk_mq_complete_request_remote+0xe/0x10
   flush_smp_call_function_queue+0x51/0x120
   generic_smp_call_function_single_interrupt+0xe/0x20
   smp_trace_call_function_single_interrupt+0x22/0x30
   smp_call_function_single_interrupt+0x9/0x10
   call_function_single_interrupt+0xa7/0xb0
   </IRQ>
  RIP: 0010:native_safe_halt+0x6/0x10
   default_idle+0xe/0x20
   arch_cpu_idle+0xa/0x10
   default_idle_call+0x1e/0x30
   do_idle+0x187/0x200
   cpu_startup_entry+0x6e/0x70
   rest_init+0xd0/0xe0
   start_kernel+0x456/0x477
   x86_64_start_reservations+0x24/0x26
   x86_64_start_kernel+0xf7/0x11a
   secondary_startup_64+0xa5/0xa5
  Code: c3 49 81 3f 20 9e 0b b8 41 bc 00 00 00 00 44 0f 45 e2 83 fe 01 0f 87 62 ff ff ff 89 f0 49 8b 44 c7 08 48 85 c0 0f 84 52 ff ff ff <f0> ff 80 98 01 00 00 8b 3d 5a 49 c4 01 45 8b b3 18 0c 00 00 85
  RIP: __lock_acquire+0x151/0x12f0 RSP: ffffa01f39e03c50
  ---[ end trace 6c441db499169b1e ]---
  Kernel panic - not syncing: Fatal exception in interrupt
  Kernel Offset: 0x36000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
  ---[ end Kernel panic - not syncing: Fatal exception in interrupt

Fix it by holding a reference to the thread.

[akpm@linux-foundation.org: add comment]
Fixes: 23955622ff ("swap: add block io poll in swapin path")
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reviewed-by: Shaohua Li <shli@fb.com>
Cc: Tim Chen <tim.c.chen@intel.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Jens Axboe <axboe@fb.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-02 17:16:11 -07:00
Minchan Kim 3189c82056 zram: do not free pool->size_class
Mike reported kernel goes oops with ltp:zram03 testcase.

  zram: Added device: zram0
  zram0: detected capacity change from 0 to 107374182400
  BUG: unable to handle kernel paging request at 0000306d61727a77
  IP: zs_map_object+0xb9/0x260
  PGD 0
  P4D 0
  Oops: 0000 [#1] SMP
  Dumping ftrace buffer:
     (ftrace buffer empty)
  Modules linked in: zram(E) xfs(E) libcrc32c(E) btrfs(E) xor(E) raid6_pq(E) loop(E) ebtable_filter(E) ebtables(E) ip6table_filter(E) ip6_tables(E) iptable_filter(E) ip_tables(E) x_tables(E) af_packet(E) br_netfilter(E) bridge(E) stp(E) llc(E) iscsi_ibft(E) iscsi_boot_sysfs(E) nls_iso8859_1(E) nls_cp437(E) vfat(E) fat(E) intel_powerclamp(E) coretemp(E) cdc_ether(E) kvm_intel(E) usbnet(E) mii(E) kvm(E) irqbypass(E) crct10dif_pclmul(E) crc32_pclmul(E) crc32c_intel(E) iTCO_wdt(E) ghash_clmulni_intel(E) bnx2(E) iTCO_vendor_support(E) pcbc(E) ioatdma(E) ipmi_ssif(E) aesni_intel(E) i5500_temp(E) i2c_i801(E) aes_x86_64(E) lpc_ich(E) shpchp(E) mfd_core(E) crypto_simd(E) i7core_edac(E) dca(E) glue_helper(E) cryptd(E) ipmi_si(E) button(E) acpi_cpufreq(E) ipmi_devintf(E) pcspkr(E) ipmi_msghandler(E)
   nfsd(E) auth_rpcgss(E) nfs_acl(E) lockd(E) grace(E) sunrpc(E) ext4(E) crc16(E) mbcache(E) jbd2(E) sd_mod(E) ata_generic(E) i2c_algo_bit(E) ata_piix(E) drm_kms_helper(E) ahci(E) syscopyarea(E) sysfillrect(E) libahci(E) sysimgblt(E) fb_sys_fops(E) uhci_hcd(E) ehci_pci(E) ttm(E) ehci_hcd(E) libata(E) drm(E) megaraid_sas(E) usbcore(E) sg(E) dm_multipath(E) dm_mod(E) scsi_dh_rdac(E) scsi_dh_emc(E) scsi_dh_alua(E) scsi_mod(E) efivarfs(E) autofs4(E) [last unloaded: zram]
  CPU: 6 PID: 12356 Comm: swapon Tainted: G            E   4.13.0.g87b2c3f-default #194
  Hardware name: IBM System x3550 M3 -[7944K3G]-/69Y5698     , BIOS -[D6E150AUS-1.10]- 12/15/2010
  task: ffff880158d2c4c0 task.stack: ffffc90001680000
  RIP: 0010:zs_map_object+0xb9/0x260
  Call Trace:
   zram_bvec_rw.isra.26+0xe8/0x780 [zram]
   zram_rw_page+0x6e/0xa0 [zram]
   bdev_read_page+0x81/0xb0
   do_mpage_readpage+0x51a/0x710
   mpage_readpages+0x122/0x1a0
   blkdev_readpages+0x1d/0x20
   __do_page_cache_readahead+0x1b2/0x270
   ondemand_readahead+0x180/0x2c0
   page_cache_sync_readahead+0x31/0x50
   generic_file_read_iter+0x7e7/0xaf0
   blkdev_read_iter+0x37/0x40
   __vfs_read+0xce/0x140
   vfs_read+0x9e/0x150
   SyS_read+0x46/0xa0
   entry_SYSCALL_64_fastpath+0x1a/0xa5
  Code: 81 e6 00 c0 3f 00 81 fe 00 00 16 00 0f 85 9f 01 00 00 0f b7 13 65 ff 05 5e 07 dc 7e 66 c1 ea 02 81 e2 ff 01 00 00 49 8b 54 d4 08 <8b> 4a 48 41 0f af ce 81 e1 ff 0f 00 00 41 89 c9 48 c7 c3 a0 70
  RIP: zs_map_object+0xb9/0x260 RSP: ffffc90001683988
  CR2: 0000306d61727a77

He bisected the problem is [1].

After commit cf8e0fedf0 ("mm/zsmalloc: simplify zs_max_alloc_size
handling"), zram doesn't use double pointer for pool->size_class any
more in zs_create_pool so counter function zs_destroy_pool don't need to
free it, either.

Otherwise, it does kfree wrong address and then, kernel goes Oops.

Link: http://lkml.kernel.org/r/20170725062650.GA12134@bbox
Fixes: cf8e0fedf0 ("mm/zsmalloc: simplify zs_max_alloc_size handling")
Signed-off-by: Minchan Kim <minchan@kernel.org>
Reported-by: Mike Galbraith <efault@gmx.de>
Tested-by: Mike Galbraith <efault@gmx.de>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Jerome Marchand <jmarchan@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-02 16:34:47 -07:00
Jonathan Corbet d16977f3a6 kthread: fix documentation build warning
The kerneldoc comment for kthread_create() had an incorrect argument
name, leading to a warning in the docs build.

Correct it, and make one more small step toward a warning-free build.

Link: http://lkml.kernel.org/r/20170724135916.7f486c6f@lwn.net
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Cc: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-02 16:34:47 -07:00
Arnd Bergmann e7701557bf kasan: avoid -Wmaybe-uninitialized warning
gcc-7 produces this warning:

  mm/kasan/report.c: In function 'kasan_report':
  mm/kasan/report.c:351:3: error: 'info.first_bad_addr' may be used uninitialized in this function [-Werror=maybe-uninitialized]
     print_shadow_for_address(info->first_bad_addr);
     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  mm/kasan/report.c:360:27: note: 'info.first_bad_addr' was declared here

The code seems fine as we only print info.first_bad_addr when there is a
shadow, and we always initialize it in that case, but this is relatively
hard for gcc to figure out after the latest rework.

Adding an intialization to the most likely value together with the other
struct members shuts up that warning.

Fixes: b235b9808664 ("kasan: unify report headers")
Link: https://patchwork.kernel.org/patch/9641417/
Link: http://lkml.kernel.org/r/20170725152739.4176967-1-arnd@arndb.de
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Suggested-by: Alexander Potapenko <glider@google.com>
Suggested-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-02 16:34:46 -07:00
Mike Rapoport b228237193 userfaultfd: non-cooperative: notify about unmap of destination during mremap
When mremap is called with MREMAP_FIXED it unmaps memory at the
destination address without notifying userfaultfd monitor.

If the destination were registered with userfaultfd, the monitor has no
way to distinguish between the old and new ranges and to properly relate
the page faults that would occur in the destination region.

Fixes: 897ab3e0c4 ("userfaultfd: non-cooperative: add event for memory unmaps")
Link: http://lkml.kernel.org/r/1500276876-3350-1-git-send-email-rppt@linux.vnet.ibm.com
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Acked-by: Pavel Emelyanov <xemul@virtuozzo.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-02 16:34:46 -07:00
Mel Gorman 3ea277194d mm, mprotect: flush TLB if potentially racing with a parallel reclaim leaving stale TLB entries
Nadav Amit identified a theoritical race between page reclaim and
mprotect due to TLB flushes being batched outside of the PTL being held.

He described the race as follows:

        CPU0                            CPU1
        ----                            ----
                                        user accesses memory using RW PTE
                                        [PTE now cached in TLB]
        try_to_unmap_one()
        ==> ptep_get_and_clear()
        ==> set_tlb_ubc_flush_pending()
                                        mprotect(addr, PROT_READ)
                                        ==> change_pte_range()
                                        ==> [ PTE non-present - no flush ]

                                        user writes using cached RW PTE
        ...

        try_to_unmap_flush()

The same type of race exists for reads when protecting for PROT_NONE and
also exists for operations that can leave an old TLB entry behind such
as munmap, mremap and madvise.

For some operations like mprotect, it's not necessarily a data integrity
issue but it is a correctness issue as there is a window where an
mprotect that limits access still allows access.  For munmap, it's
potentially a data integrity issue although the race is massive as an
munmap, mmap and return to userspace must all complete between the
window when reclaim drops the PTL and flushes the TLB.  However, it's
theoritically possible so handle this issue by flushing the mm if
reclaim is potentially currently batching TLB flushes.

Other instances where a flush is required for a present pte should be ok
as either the page lock is held preventing parallel reclaim or a page
reference count is elevated preventing a parallel free leading to
corruption.  In the case of page_mkclean there isn't an obvious path
that userspace could take advantage of without using the operations that
are guarded by this patch.  Other users such as gup as a race with
reclaim looks just at PTEs.  huge page variants should be ok as they
don't race with reclaim.  mincore only looks at PTEs.  userfault also
should be ok as if a parallel reclaim takes place, it will either fault
the page back in or read some of the data before the flush occurs
triggering a fault.

Note that a variant of this patch was acked by Andy Lutomirski but this
was for the x86 parts on top of his PCID work which didn't make the 4.13
merge window as expected.  His ack is dropped from this version and
there will be a follow-on patch on top of PCID that will include his
ack.

[akpm@linux-foundation.org: tweak comments]
[akpm@linux-foundation.org: fix spello]
Link: http://lkml.kernel.org/r/20170717155523.emckq2esjro6hf3z@suse.de
Reported-by: Nadav Amit <nadav.amit@gmail.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: <stable@vger.kernel.org>	[v4.4+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-02 16:34:46 -07:00
Kefeng Wang 27e37d84e5 pid: kill pidhash_size in pidhash_init()
After commit 3d375d7859 ("mm: update callers to use HASH_ZERO flag"),
drop unused pidhash_size in pidhash_init().

Link: http://lkml.kernel.org/r/1500389267-49222-1-git-send-email-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Reviewed-by: Pavel Tatashin <Pasha.Tatashin@Oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-02 16:34:46 -07:00
Daniel Jordan 2be7cfed99 mm/hugetlb.c: __get_user_pages ignores certain follow_hugetlb_page errors
Commit 9a291a7c94 ("mm/hugetlb: report -EHWPOISON not -EFAULT when
FOLL_HWPOISON is specified") causes __get_user_pages to ignore certain
errors from follow_hugetlb_page.  After such error, __get_user_pages
subsequently calls faultin_page on the same VMA and start address that
follow_hugetlb_page failed on instead of returning the error immediately
as it should.

In follow_hugetlb_page, when hugetlb_fault returns a value covered under
VM_FAULT_ERROR, follow_hugetlb_page returns it without setting nr_pages
to 0 as __get_user_pages expects in this case, which causes the
following to happen in __get_user_pages: the "while (nr_pages)" check
succeeds, we skip the "if (!vma..." check because we got a VMA the last
time around, we find no page with follow_page_mask, and we call
faultin_page, which calls hugetlb_fault for the second time.

This issue also slightly changes how __get_user_pages works.  Before, it
only returned error if it had made no progress (i = 0).  But now,
follow_hugetlb_page can clobber "i" with an error code since its new
return path doesn't check for progress.  So if "i" is nonzero before a
failing call to follow_hugetlb_page, that indication of progress is lost
and __get_user_pages can return error even if some pages were
successfully pinned.

To fix this, change follow_hugetlb_page so that it updates nr_pages,
allowing __get_user_pages to fail immediately and restoring the "error
only if no progress" behavior to __get_user_pages.

Tested that __get_user_pages returns when expected on error from
hugetlb_fault in follow_hugetlb_page.

Fixes: 9a291a7c94 ("mm/hugetlb: report -EHWPOISON not -EFAULT when FOLL_HWPOISON is specified")
Link: http://lkml.kernel.org/r/1500406795-58462-1-git-send-email-daniel.m.jordan@oracle.com
Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Acked-by: Punit Agrawal <punit.agrawal@arm.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Cc: James Morse <james.morse@arm.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: zhong jiang <zhongjiang@huawei.com>
Cc: <stable@vger.kernel.org>	[4.12.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-02 16:34:46 -07:00
Linus Torvalds 4d3f5d04d6 platform-drivers-x86 for v4.13-3
Fix two bugs under error or abnormal usage conditions. Correct a config
 dependency.
 
 dell-wmi:
  - Fix driver interface version query
 
 wmi:
  - Fix error handling in acpi_wmi_init()
 
 peaq-wmi:
  - select INPUT_POLLDEV
 -----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJZgfUFAAoJEKbMaAwKp364u/IIAKvumEHC/ULEY/18hUvw0/nm
 hJn479pcvVKW7huDG0PF6v6KstYgWJmY28ZZV0jnBeAlKwZA+mr5y4skbTNZRkaC
 GK2w3PeWywCiDjME9IKbrBFjnNIiFLplH4Jjotjf+1g0fTp7SE0s5CrttU8oXSne
 DKhlyo1JsN+NnamxtvJBWMXZAIYbw5qjfy9vnaTPaKFokVEYAeXfI4Jn2Ue9vw0q
 D4xgxrd/UL6/WL4erezb80jqVzsdCHhN5AKQ5PeDbTRmLDJK8VxjvYH7LPtjQS9T
 L4VU1rqUCRitNHu+Z3daqOySk3zPhZZxckJZPimBojPhKEK98qUFHZLpWCVpeXU=
 =ZTnI
 -----END PGP SIGNATURE-----

Merge tag 'platform-drivers-x86-v4.13-3' of git://git.infradead.org/linux-platform-drivers-x86

Pull x86 platform driver fixes from Darren Hart:
 "Fix two bugs under error or abnormal usage conditions. Correct a
  config dependency:

  dell-wmi:
   - Fix driver interface version query

  wmi:
   - Fix error handling in acpi_wmi_init()

  peaq-wmi:
   - select INPUT_POLLDEV"

* tag 'platform-drivers-x86-v4.13-3' of git://git.infradead.org/linux-platform-drivers-x86:
  platform/x86: dell-wmi: Fix driver interface version query
  platform/x86: wmi: Fix error handling in acpi_wmi_init()
  platform/x86: peaq-wmi: select INPUT_POLLDEV
2017-08-02 09:43:28 -07:00
Linus Torvalds 33611ba0fe SCSI fixes on 20170802
This seven is mostly minor build, Kconfig and error leg fixes.
 
 Signed-off-by: James E.J. Bottomley <jejb@linux.vnet.ibm.com>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABAgAGBQJZgeACAAoJEAVr7HOZEZN4vxwP/0GZc1EWDubS8DZZOr4X25WA
 jofQFCrWYf9jBcCA64pv7kF5wCbQk0MFXMvs2u5YHGMV8U4VyquNT2zEYm+R5ZD8
 C3sIdgZyxb7x+KkUJ5NxvDBIzAQuQcPEUPZspgi809veoATQ28qoMO1NkOJzbJGE
 nUs/6TZcpA5xGdqXtI3G6IsbvuAFO1hmI2aIUAj5UVKV+B9ILGsxWHPzYPErgtra
 afTJ8jO4vYW/wcyY7/DG/V8UAvU/0pe9NZqgt0Gn+XiSq5C9HnqS9+BWjU9nx98g
 EiPuvP6UBq7zGmjFQdUTwYuysbPDwyVfB1l7RJFVeBTPj6sINFrqTQPU8qwy96S8
 gxfObnfbEkx4zJmv4iP8HtAgRQMVsVN64ICD6oeS4JQWmXoz90S0vCLwrTQv36ll
 UHmBaSKMqPHTt+emEmup3rcOBU+4PIVq40jQjUyBCJXd/kg5GUz7Og++/NHX8JDZ
 qhVJ/VhkBacnDD9fZLQ2cMfcB/E5pUmEgx3MycZty8pXx0AkttWyB0lIddln2yiL
 qpZwbJ4nbhS7TO1/ScZlHEqp/rVY3aaIvLovWrhrmy4N1KBk41EmSxCokxkv+Kmd
 Twy35eHLDT9Hn0emT+y12Ul5y0cTxy1JMOcMf6LLXAHQsKgg2khm3JfHcXC7H4mO
 H/FpxwQT7z47yU3FF2mZ
 =/+Ot
 -----END PGP SIGNATURE-----

Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
 "These seven patches are mostly minor build, Kconfig and error leg
  fixes"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: qedi: Fix return code in qedi_ep_connect()
  scsi: lpfc: fix linking against modular NVMe support
  scsi: scsi_transport_fc: return -EBUSY for deleted vport
  scsi: libcxgbi: add check for valid cxgbi_task_data
  scsi: aic7xxx: fix firmware build with O=path
  scsi: megaraid_sas: fix memleak in megasas_alloc_cmdlist_fusion
  scsi: qedi: Add ISCSI_BOOT_SYSFS to Kconfig
2017-08-02 08:43:19 -07:00
Andy Lutomirski 51391caf99 platform/x86: dell-wmi: Fix driver interface version query
When I converted dell-wmi to the new bus infrastructure, I left the
call to dell_wmi_check_descriptor_buffer() in dell_wmi_init().  This
could cause two problems:

 - An error message when loading the driver on a system without
   dell-wmi.  We'd try to read the event descriptor even if the WMI
   GUID wasn't there.

 - A possible race if dell-wmi was loaded manually before wmi was
   fully initialized.

Fix it by moving the call to the probe function where it belongs.

Fixes: bff589be59 ("platform/x86: dell-wmi: Convert to the WMI bus infrastructure")
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Pali Rohár <pali.rohar@gmail.com>
Signed-off-by: Darren Hart (VMware) <dvhart@infradead.org>
2017-08-01 15:41:43 -07:00
Linus Torvalds 26c5cebfdb Merge branch 'parisc-4.13-4' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux
Pull parsic fixes from Helge Deller:

 - Our cache flushing code ran into a BUG in case context is not
   current. Fix it by flushing the whole cache in such rare situations
   (by Dave Anglin).

 - Fix a "sleeping function called from invalid context BUG" in our
   pdc_stable driver by rearranging our locks (by James Bottomley)

 - The thread and irq stacks require more than 16 KB since kernel 4.11.
   Increase both to 32 KB.

 - Define CONFIG_CPU_BIG_ENDIAN unconditionally on parisc to avoid wrong
   behaviour in qrwlock functions (by Babu Moger).

* 'parisc-4.13-4' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
  parisc: Define CONFIG_CPU_BIG_ENDIAN
  parisc: pdc_stable: Fix locking when creating sysfs links
  parisc: Increase thread and stack size to 32kb
  parisc: Handle vma's whose context is not current in flush_cache_range
2017-08-01 13:20:24 -07:00
Linus Torvalds bc78d646e7 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Handle notifier registry failures properly in tun/tap driver, from
    Tonghao Zhang.

 2) Fix bpf verifier handling of subtraction bounds and add a testcase
    for this, from Edward Cree.

 3) Increase reset timeout in ftgmac100 driver, from Ben Herrenschmidt.

 4) Fix use after free in prd_retire_rx_blk_timer_exired() in AF_PACKET,
    from Cong Wang.

 5) Fix SElinux regression due to recent UDP optimizations, from Paolo
    Abeni.

 6) We accidently increment IPSTATS_MIB_FRAGFAILS in the ipv6 code
    paths, fix from Stefano Brivio.

 7) Fix some mem leaks in dccp, from Xin Long.

 8) Adjust MDIO_BUS kconfig deps to avoid build errors, from Arnd
    Bergmann.

 9) Mac address length check and buffer size fixes from Cong Wang.

10) Don't leak sockets in ipv6 udp early demux, from Paolo Abeni.

11) Fix return value when copy_from_user() fails in
    bpf_prog_get_info_by_fd(), from Daniel Borkmann.

12) Handle PHY_HALTED properly in phy library state machine, from
    Florian Fainelli.

13) Fix OOPS in fib_sync_down_dev(), from Ido Schimmel.

14) Fix truesize calculation in virtio_net which led to performance
    regressions, from Michael S Tsirkin.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (76 commits)
  samples/bpf: fix bpf tunnel cleanup
  udp6: fix jumbogram reception
  ppp: Fix a scheduling-while-atomic bug in del_chan
  Revert "net: bcmgenet: Remove init parameter from bcmgenet_mii_config"
  virtio_net: fix truesize for mergeable buffers
  mv643xx_eth: fix of_irq_to_resource() error check
  MAINTAINERS: Add more files to the PHY LIBRARY section
  ipv4: fib: Fix NULL pointer deref during fib_sync_down_dev()
  net: phy: Correctly process PHY_HALTED in phy_stop_machine()
  sunhme: fix up GREG_STAT and GREG_IMASK register offsets
  bpf: fix bpf_prog_get_info_by_fd to dump correct xlated_prog_len
  tcp: avoid bogus gcc-7 array-bounds warning
  net: tc35815: fix spelling mistake: "Intterrupt" -> "Interrupt"
  bpf: don't indicate success when copy_from_user fails
  udp6: fix socket leak on early demux
  net: thunderx: Fix BGX transmit stall due to underflow
  Revert "vhost: cache used event for better performance"
  team: use a larger struct for mac address
  net: check dev->addr_len for dev_set_mac_address()
  phy: bcm-ns-usb3: fix MDIO_BUS dependency
  ...
2017-07-31 22:36:42 -07:00
William Tu cc75f8514d samples/bpf: fix bpf tunnel cleanup
test_tunnel_bpf.sh fails to remove the vxlan11 tunnel device, causing the
next geneve tunnelling test case fails.  In addition, the geneve reserved bit
in tcbpf2_kern.c should be zero, according to the RFC.

Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-31 22:02:47 -07:00
Paolo Abeni cb891fa6a1 udp6: fix jumbogram reception
Since commit 67a51780ae ("ipv6: udp: leverage scratch area
helpers") udp6_recvmsg() read the skb len from the scratch area,
to avoid a cache miss.
But the UDP6 rx path support RFC 2675 UDPv6 jumbograms, and their
length exceeds the 16 bits available in the scratch area. As a side
effect the length returned by recvmsg() is:
<ingress datagram len> % (1<<16)

This commit addresses the issue allocating one more bit in the
IP6CB flags field and setting it for incoming jumbograms.
Such field is still in the first cacheline, so at recvmsg()
time we can check it and fallback to access skb->len if
required, without a measurable overhead.

Fixes: 67a51780ae ("ipv6: udp: leverage scratch area helpers")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-31 22:01:21 -07:00
Gao Feng ddab82821f ppp: Fix a scheduling-while-atomic bug in del_chan
The PPTP set the pptp_sock_destruct as the sock's sk_destruct, it would
trigger this bug when __sk_free is invoked in atomic context, because of
the call path pptp_sock_destruct->del_chan->synchronize_rcu.

Now move the synchronize_rcu to pptp_release from del_chan. This is the
only one case which would free the sock and need the synchronize_rcu.

The following is the panic I met with kernel 3.3.8, but this issue should
exist in current kernel too according to the codes.

BUG: scheduling while atomic
__schedule_bug+0x5e/0x64
__schedule+0x55/0x580
? ppp_unregister_channel+0x1cd5/0x1de0 [ppp_generic]
? dev_hard_start_xmit+0x423/0x530
? sch_direct_xmit+0x73/0x170
__cond_resched+0x16/0x30
_cond_resched+0x22/0x30
wait_for_common+0x18/0x110
? call_rcu_bh+0x10/0x10
wait_for_completion+0x12/0x20
wait_rcu_gp+0x34/0x40
? wait_rcu_gp+0x40/0x40
synchronize_sched+0x1e/0x20
0xf8417298
0xf8417484
? sock_queue_rcv_skb+0x109/0x130
__sk_free+0x16/0x110
? udp_queue_rcv_skb+0x1f2/0x290
sk_free+0x16/0x20
__udp4_lib_rcv+0x3b8/0x650

Signed-off-by: Gao Feng <gfree.wind@vip.163.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-31 21:59:01 -07:00
Florian Fainelli 00d51094b8 Revert "net: bcmgenet: Remove init parameter from bcmgenet_mii_config"
This reverts commit 28b45910cc ("net: bcmgenet: Remove init parameter
from bcmgenet_mii_config") because in the process of moving from
dev_info() to dev_info_once() we essentially lost the helpful printed
messages once the second instance of the driver is loaded.
dev_info_once() does not actually print the message once per device
instance, but once period.

Fixes: 28b45910cc ("net: bcmgenet: Remove init parameter from bcmgenet_mii_config")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Doug Berger <opendmb@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-31 18:03:36 -07:00
Michael S. Tsirkin 1daa8790d0 virtio_net: fix truesize for mergeable buffers
Seth Forshee noticed a performance degradation with some workloads.
This turns out to be due to packet drops.  Euan Kemp noticed that this
is because we drop all packets where length exceeds the truesize, but
for some packets we add in extra memory without updating the truesize.
This in turn was kept around unchanged from ab7db91705 ("virtio-net:
auto-tune mergeable rx buffer size for improved performance").  That
commit had an internal reason not to account for the extra space: not
enough bits to do it.  No longer true so let's account for the allocated
length exactly.

Many thanks to Seth Forshee for the report and bisecting and Euan Kemp
for debugging the issue.

Fixes: 680557cf79 ("virtio_net: rework mergeable buffer handling")
Reported-by: Euan Kemp <euan.kemp@coreos.com>
Tested-by: Euan Kemp <euan.kemp@coreos.com>
Reported-by: Seth Forshee <seth.forshee@canonical.com>
Tested-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-31 18:02:46 -07:00
Sergei Shtylyov cfbcb61f6c mv643xx_eth: fix of_irq_to_resource() error check
of_irq_to_resource() has recently been  fixed to return negative error #'s
along with 0 in case of failure,  however the Marvell MV643xx Ethernet
driver still only regards 0  as invalid IRQ -- fix it up.

Fixes: 7a4228bbff ("of: irq: use of_irq_get() in of_irq_to_resource()")
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-31 17:56:47 -07:00
Florian Fainelli 13332db54f MAINTAINERS: Add more files to the PHY LIBRARY section
Include missing files that are provided by, used, or directly maintained
within the PHY LIBRARY, this include uapi header, header files used by
Device Tree code etc.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-31 17:52:47 -07:00
Ido Schimmel 71ed7ee35a ipv4: fib: Fix NULL pointer deref during fib_sync_down_dev()
Michał reported a NULL pointer deref during fib_sync_down_dev() when
unregistering a netdevice. The problem is that we don't check for
'in_dev' being NULL, which can happen in very specific cases.

Usually routes are flushed upon NETDEV_DOWN sent in either the netdev or
the inetaddr notification chains. However, if an interface isn't
configured with any IP address, then it's possible for host routes to be
flushed following NETDEV_UNREGISTER, after NULLing dev->ip_ptr in
inetdev_destroy().

To reproduce:
$ ip link add type dummy
$ ip route add local 1.1.1.0/24 dev dummy0
$ ip link del dev dummy0

Fix this by checking for the presence of 'in_dev' before referencing it.

Fixes: 982acb9756 ("ipv4: fib: Notify about nexthop status changes")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reported-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Tested-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-31 17:51:11 -07:00
Florian Fainelli 7ad813f208 net: phy: Correctly process PHY_HALTED in phy_stop_machine()
Marc reported that he was not getting the PHY library adjust_link()
callback function to run when calling phy_stop() + phy_disconnect()
which does not indeed happen because we set the state machine to
PHY_HALTED but we don't get to run it to process this state past that
point.

Fix this with a synchronous call to phy_state_machine() in order to have
the state machine actually act on PHY_HALTED, set the PHY device's link
down, turn the network device's carrier off and finally call the
adjust_link() function.

Reported-by: Marc Gonzalez <marc_gonzalez@sigmadesigns.com>
Fixes: a390d1f379 ("phylib: convert state_queue work to delayed_work")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Marc Gonzalez <marc_gonzalez@sigmadesigns.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-31 17:27:10 -07:00
Mark Cave-Ayland 96a734bae0 sunhme: fix up GREG_STAT and GREG_IMASK register offsets
Update the values to match those from the STP2002QFP documentation.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-31 16:23:05 -07:00
Linus Torvalds 2e7ca2064c Merge branch 'for-4.13-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup fixes from Tejun Heo:
 "Several cgroup bug fixes.

   - cgroup core was calling a migration callback on empty migrations,
     which could make cpuset crash.

   - There was a very subtle bug where the controller interface files
     aren't created directly when cgroup2 is mounted. Because later
     operations create them, this bug didn't get noticed earlier.

   - Failed writes to cgroup.subtree_control were incorrectly returning
     zero"

* 'for-4.13-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cgroup: fix error return value from cgroup_subtree_control()
  cgroup: create dfl_root files on subsys registration
  cgroup: don't call migration methods if there are no tasks to migrate
2017-07-31 14:03:05 -07:00
Linus Torvalds ff2620f778 Merge branch 'for-4.13-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq
Pull workqueue fixes from Tejun Heo:
 "Two notable fixes.

   - While adding NUMA affinity support to unbound workqueues, the
     assumption that an unbound workqueue with max_active == 1 is
     ordered was broken.

     The plan was to use explicit alloc_ordered_workqueue() for those
     cases. Unfortunately, I forgot to update the documentation properly
     and we grew a handful of use cases which depend on that assumption.

     While we want to convert them to alloc_ordered_workqueue(), we
     don't really lose anything by enforcing ordered execution on
     unbound max_active == 1 workqueues and it doesn't make sense to
     risk subtle bugs. Restore the assumption.

   - Workqueue assumes that CPU <-> NUMA node mapping remains static.

     This is a general assumption - we don't have any synchronization
     mechanism around CPU <-> node mapping. Unfortunately, powerpc may
     change the mapping dynamically leading to crashes. Michael added a
     workaround so that we at least don't crash while powerpc hotplug
     code gets updated"

* 'for-4.13-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
  workqueue: Work around edge cases for calc of pool's cpumask
  workqueue: implicit ordered attribute should be overridable
  workqueue: restore WQ_UNBOUND/max_active==1 to be ordered
2017-07-31 13:37:28 -07:00
Linus Torvalds 3dcc4c7d42 Merge branch 'for-4.13-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata
Pull libata fixes from Tejun Heo:
 "Dan found a really old bug where libata hotplug code wasn't sanitizing
  index value from userland and may end up indexing with a negative
  number. It is scary but fortunately can only be triggered by root.

  Other than that, minor fixes"

* 'for-4.13-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata:
  libata: fix a couple of doc build warnings
  libata: array underflow in ata_find_dev()
  ata: sata_rcar: add gen[23] fallback compatibility strings
  libata: remove unused rc in ata_eh_handle_port_resume
  libata: Cleanup ata_read_log_page()
  ata: fix gemini Kconfig dependencies
2017-07-31 13:33:21 -07:00
Babu Moger 74ad3d28af parisc: Define CONFIG_CPU_BIG_ENDIAN
While working on enabling queued rwlock on SPARC, found this following
code in include/asm-generic/qrwlock.h which uses CONFIG_CPU_BIG_ENDIAN
to clear a byte.

static inline u8 *__qrwlock_write_byte(struct qrwlock *lock)
 {
	return (u8 *)lock + 3 * IS_BUILTIN(CONFIG_CPU_BIG_ENDIAN);
 }

Problem is many of the fixed big endian architectures don't define
CPU_BIG_ENDIAN and clears the wrong byte.

Define CPU_BIG_ENDIAN for parisc architecture to fix it.

Signed-off-by: Babu Moger <babu.moger@oracle.com>
Signed-off-by: Helge Deller <deller@gmx.de>
2017-07-31 17:51:27 +02:00
Jonathan Corbet 2f60e1ab2f libata: fix a couple of doc build warnings
The kerneldoc comments for a couple of functions in drivers/ata/libata-eh.c
had fallen behind the current implementation, resulting in these doc build
warnings:

  ./drivers/ata/libata-eh.c:1449: warning: No description found for parameter 'link'
  ./drivers/ata/libata-eh.c:1449: warning: Excess function parameter 'ap' description in 'ata_eh_done'
  ./drivers/ata/libata-eh.c:1590: warning: No description found for parameter 'qc'
  ./drivers/ata/libata-eh.c:1590: warning: Excess function parameter 'dev' description in 'ata_eh_request_sense'

Update the comments and make the warnings go away.

Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Tejun Heo <tj@kernel.org>
2017-07-31 08:03:06 -07:00
James Bottomley 93964fd4ea parisc: pdc_stable: Fix locking when creating sysfs links
There's no need to take the write lock when creating sysfs links.

This patch fixes the following BUG:
 BUG: sleeping function called from invalid context at mm/slab.h:416
 in_atomic(): 1, irqs_disabled(): 0, pid: 1, name: swapper/0
 CPU: 2 PID: 1 Comm: swapper/0 Not tainted 4.13.0-rc2-00110-g0b5477d9dabd #111
 Backtrace:
 [<0000000040217ac8>] show_stack+0x20/0x38
 [<00000000406fbbb0>] dump_stack+0xb0/0x128
 [<0000000040274090>] ___might_sleep+0x180/0x1b8
 [<0000000040274144>] __might_sleep+0x7c/0xe8
 [<0000000040373874>] kmem_cache_alloc+0x14c/0x1e0
 [<0000000040419514>] __kernfs_new_node+0x84/0x1b8
 [<000000004041b09c>] kernfs_new_node+0x3c/0x78
 [<000000004041e040>] kernfs_create_link+0x40/0xd8
 [<000000004041f320>] sysfs_do_create_link_sd.isra.0+0xb0/0x130
 [<000000004041f3d4>] sysfs_create_link+0x34/0x58
 [<000000004011b4a4>] pdc_stable_init+0x2c4/0x458
 [<0000000040200250>] do_one_initcall+0x70/0x1d8
 [<0000000040101644>] kernel_init_freeable+0x27c/0x390
 [<000000004020be44>] kernel_init+0x24/0x1c0

Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Reported-by: Meelis Roos <mroos@linux.ee>
Signed-off-by: Helge Deller <deller@gmx.de>
2017-07-31 16:43:13 +02:00
Helge Deller 8f8201dfed parisc: Increase thread and stack size to 32kb
Since kernel 4.11 the thread and irq stacks on parisc randomly overflow
the default size of 16k. The reason why stack usage suddenly grew is yet
unknown.

Signed-off-by: Helge Deller <deller@gmx.de>
Cc: stable@vger.kernel.org # 4.11+
Signed-off-by: Helge Deller <deller@gmx.de>
2017-07-31 08:41:26 +02:00
John David Anglin 13d57093c1 parisc: Handle vma's whose context is not current in flush_cache_range
In testing James' patch to drivers/parisc/pdc_stable.c, I hit the BUG
statement in flush_cache_range() during a system shutdown:

kernel BUG at arch/parisc/kernel/cache.c:595!
CPU: 2 PID: 6532 Comm: kworker/2:0 Not tainted 4.13.0-rc2+ #1
Workqueue: events free_ioctx

 IAOQ[0]: flush_cache_range+0x144/0x148
 IAOQ[1]: flush_cache_page+0x0/0x1a8
 RP(r2): flush_cache_range+0xec/0x148
Backtrace:
 [<00000000402910ac>] unmap_page_range+0x84/0x880
 [<00000000402918f4>] unmap_single_vma+0x4c/0x60
 [<0000000040291a18>] zap_page_range_single+0x110/0x160
 [<0000000040291c34>] unmap_mapping_range+0x174/0x1a8
 [<000000004026ccd8>] truncate_pagecache+0x50/0xa8
 [<000000004026cd84>] truncate_setsize+0x54/0x70
 [<000000004033d534>] put_aio_ring_file+0x44/0xb0
 [<000000004033d5d8>] aio_free_ring+0x38/0x140
 [<000000004033d714>] free_ioctx+0x34/0xa8
 [<00000000401b0028>] process_one_work+0x1b8/0x4d0
 [<00000000401b04f4>] worker_thread+0x1b4/0x648
 [<00000000401b9128>] kthread+0x1b0/0x208
 [<0000000040150020>] end_fault_vector+0x20/0x28
 [<0000000040639518>] nf_ip_reroute+0x50/0xa8
 [<0000000040638ed0>] nf_ip_route+0x10/0x78
 [<0000000040638c90>] xfrm4_mode_tunnel_input+0x180/0x1f8

CPU: 2 PID: 6532 Comm: kworker/2:0 Not tainted 4.13.0-rc2+ #1
Workqueue: events free_ioctx
Backtrace:
 [<0000000040163bf0>] show_stack+0x20/0x38
 [<0000000040688480>] dump_stack+0xa8/0x120
 [<0000000040163dc4>] die_if_kernel+0x19c/0x2b0
 [<0000000040164d0c>] handle_interruption+0xa24/0xa48

This patch modifies flush_cache_range() to handle non current contexts.
In as much as this occurs infrequently, the simplest approach is to
flush the entire cache when this happens.

Signed-off-by: John David Anglin <dave.anglin@bell.net>
Cc: stable@vger.kernel.org # 4.9+
Signed-off-by: Helge Deller <deller@gmx.de>
2017-07-31 08:22:33 +02:00
Linus Torvalds 16f73eb02d Linux 4.13-rc3 2017-07-30 12:40:36 -07:00
Linus Torvalds f137e0b0c5 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
 "A small set of x86 fixes:

   - prevent the kernel from using the EFI reboot method when EFI is
     disabled.

   - two patches addressing clang issues"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/boot: Disable the address-of-packed-member compiler warning
  x86/efi: Fix reboot_mode when EFI runtime services are disabled
  x86/boot: #undef memcpy() et al in string.c
2017-07-30 12:19:35 -07:00
Linus Torvalds e4776b8ccb Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Thomas Gleixner:
 "Two patches addressing build warnings caused by inconsistent kernel
  doc comments"

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/wait: Clean up some documentation warnings
  sched/core: Fix some documentation build warnings
2017-07-30 11:54:08 -07:00
Linus Torvalds dbc52a8030 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Thomas Gleixner:
 "A couple of fixes for performance counters and kprobes:

   - a series of small patches which make the uncore performance
     counters on Skylake server systems work correctly

   - add a missing instruction slot release to the failure path of
     kprobes"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  kprobes/x86: Release insn_slot in failure path
  perf/x86/intel/uncore: Fix missing marker for skx_uncore_cha_extra_regs
  perf/x86/intel/uncore: Fix SKX CHA event extra regs
  perf/x86/intel/uncore: Remove invalid Skylake server CHA filter field
  perf/x86/intel/uncore: Fix Skylake server CHA LLC_LOOKUP event umask
  perf/x86/intel/uncore: Fix Skylake server PCU PMU event format
  perf/x86/intel/uncore: Fix Skylake UPI PMU event masks
2017-07-30 11:52:15 -07:00
Linus Torvalds 06efc7df37 Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fix from Thomas Gleixner:
 "Fix for a regression caused by the conversion of x86 to the generic
  hotplug code.

  Instead of doing a plain single line revert, this adds a pile of
  comments so the semantics of the force argument are clear"

* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  genirq/cpuhotplug: Revert "Set force affinity flag on hotplug migration"
2017-07-30 11:27:33 -07:00
Daniel Borkmann 9975a54b3c bpf: fix bpf_prog_get_info_by_fd to dump correct xlated_prog_len
bpf_prog_size(prog->len) is not the correct length we want to dump
back to user space. The code in bpf_prog_get_info_by_fd() uses this
to copy prog->insnsi to user space, but bpf_prog_size(prog->len) also
includes the size of struct bpf_prog itself plus program instructions
and is usually used either in context of accounting or for bpf_prog_alloc()
et al, thus we copy out of bounds in bpf_prog_get_info_by_fd()
potentially. Use the correct bpf_prog_insn_size() instead.

Fixes: 1e27097690 ("bpf: Add BPF_OBJ_GET_INFO_BY_FD")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-29 23:29:41 -07:00
Arnd Bergmann efe967cdec tcp: avoid bogus gcc-7 array-bounds warning
When using CONFIG_UBSAN_SANITIZE_ALL, the TCP code produces a
false-positive warning:

net/ipv4/tcp_output.c: In function 'tcp_connect':
net/ipv4/tcp_output.c:2207:40: error: array subscript is below array bounds [-Werror=array-bounds]
   tp->chrono_stat[tp->chrono_type - 1] += now - tp->chrono_start;
                                        ^~
net/ipv4/tcp_output.c:2207:40: error: array subscript is below array bounds [-Werror=array-bounds]
   tp->chrono_stat[tp->chrono_type - 1] += now - tp->chrono_start;
   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~

I have opened a gcc bug for this, but distros have already shipped
compilers with this problem, and it's not clear yet whether there is
a way for gcc to avoid the warning. As the problem is related to the
bitfield access, this introduces a temporary variable to store the old
enum value.

I did not notice this warning earlier, since UBSAN is disabled when
building with COMPILE_TEST, and that was always turned on in both
allmodconfig and randconfig tests.

Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81601
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-29 23:26:29 -07:00
David S. Miller 4084e01db9 wireless-drivers fixes for 4.13
Two fixes for for brcmfmac, the crash was reported by two people
 already so it's a high priority fix.
 
 brcmfmac
 
 * fix a crash in skb headroom handling in v4.13-rc1
 * fix a memory leak due to a merge error in v4.6
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJZe0L2AAoJEG4XJFUm622bCiYIAKXCyEV2CCfNgloFcPmElvPR
 HvmYDzxQeJVEjCYjzYLpaBK6rfJtMuBhmHfSrCTSd6YulWOUoUy6IRMi7bgoZe2C
 5cccqZl3uU8fG34ib6jKvp6ofx5DX3yFMQQa0toY27VQTkY46AT6yx9UMn4Mi+bQ
 p0skV/1gwf0IPGfZQkpSemhosmtEaNLNMiqAJlOQrsi2b9YYcBYTW1svnr/yB0UM
 7pl0+zUVE+Ul/eJHb82JeJaRrNYLRZ7KD5bMlV9OoDO/Rlu3933fNE/+jWtuuTVG
 VnNLyhEI1CR767CppOhraDVkGvINg+EewCouWQ9ZCUWxTcVuFxAew+QkI6foS2A=
 =uMUa
 -----END PGP SIGNATURE-----

Merge tag 'wireless-drivers-for-davem-2017-07-28' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers

Kalle Valo says:

====================
wireless-drivers fixes for 4.13

Two fixes for for brcmfmac, the crash was reported by two people
already so it's a high priority fix.

brcmfmac

* fix a crash in skb headroom handling in v4.13-rc1
* fix a memory leak due to a merge error in v4.6
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-29 15:30:08 -07:00
Colin Ian King b103ec73b2 net: tc35815: fix spelling mistake: "Intterrupt" -> "Interrupt"
Trivial fix to spelling mistake in printk message

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-29 15:22:08 -07:00
Daniel Borkmann 89b096898a bpf: don't indicate success when copy_from_user fails
err in bpf_prog_get_info_by_fd() still holds 0 at that time from prior
check_uarg_tail_zero() check. Explicitly return -EFAULT instead, so
user space can be notified of buggy behavior.

Fixes: 1e27097690 ("bpf: Add BPF_OBJ_GET_INFO_BY_FD")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-29 14:28:54 -07:00
Paolo Abeni c9f2c1ae12 udp6: fix socket leak on early demux
When an early demuxed packet reaches __udp6_lib_lookup_skb(), the
sk reference is retrieved and used, but the relevant reference
count is leaked and the socket destructor is never called.
Beyond leaking the sk memory, if there are pending UDP packets
in the receive queue, even the related accounted memory is leaked.

In the long run, this will cause persistent forward allocation errors
and no UDP skbs (both ipv4 and ipv6) will be able to reach the
user-space.

Fix this by explicitly accessing the early demux reference before
the lookup, and properly decreasing the socket reference count
after usage.

Also drop the skb_steal_sock() in __udp6_lib_lookup_skb(), and
the now obsoleted comment about "socket cache".

The newly added code is derived from the current ipv4 code for the
similar path.

v1 -> v2:
  fixed the __udp6_lib_rcv() return code for resubmission,
  as suggested by Eric

Reported-by: Sam Edwards <CFSworks@gmail.com>
Reported-by: Marc Haber <mh+netdev@zugschlus.de>
Fixes: 5425077d73 ("net: ipv6: Add early demux handler for UDP unicast")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-29 14:19:03 -07:00
Sunil Goutham 500268e9f2 net: thunderx: Fix BGX transmit stall due to underflow
For SGMII/RGMII/QSGMII interfaces when physical link goes down
while traffic is high is resulting in underflow condition being set
on that specific BGX's LMAC. Which assets a backpresure and VNIC stops
transmitting packets.

This is due to BGX being disabled in link status change callback while
packet is in transit. This patch fixes this issue by not disabling BGX
but instead just disables packet Rx and Tx.

Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-29 14:17:07 -07:00