Commit Graph

737163 Commits

Author SHA1 Message Date
Leon Romanovsky 0c81ffc60d RDMA/ucma: Don't allow join attempts for unsupported AF family
Users can provide garbage while calling to ucma_join_ip_multicast(),
it will indirectly cause to rdma_addr_size() return 0, making the
call to ucma_process_join(), which had the right checks, but it is
better to check the input as early as possible.

The following crash from syzkaller revealed it.

kernel BUG at lib/string.c:1052!
invalid opcode: 0000 [#1] SMP KASAN Dumping ftrace buffer:
   (ftrace buffer empty)
Modules linked in:
CPU: 0 PID: 4113 Comm: syz-executor0 Not tainted 4.16.0-rc5+ #261
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:fortify_panic+0x13/0x20 lib/string.c:1051
RSP: 0018:ffff8801ca81f8f0 EFLAGS: 00010286
RAX: 0000000000000022 RBX: 1ffff10039503f23 RCX: 0000000000000000
RDX: 0000000000000022 RSI: 1ffff10039503ed3 RDI: ffffed0039503f12
RBP: ffff8801ca81f8f0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000006 R11: 0000000000000000 R12: ffff8801ca81f998
R13: ffff8801ca81f938 R14: ffff8801ca81fa58 R15: 000000000000fa00
FS:  0000000000000000(0000) GS:ffff8801db200000(0063) knlGS:000000000a12a900
CS:  0010 DS: 002b ES: 002b CR0: 0000000080050033
CR2: 0000000008138024 CR3: 00000001cbb58004 CR4: 00000000001606f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 memcpy include/linux/string.h:344 [inline]
 ucma_join_ip_multicast+0x36b/0x3b0 drivers/infiniband/core/ucma.c:1421
 ucma_write+0x2d6/0x3d0 drivers/infiniband/core/ucma.c:1633
 __vfs_write+0xef/0x970 fs/read_write.c:480
 vfs_write+0x189/0x510 fs/read_write.c:544
 SYSC_write fs/read_write.c:589 [inline]
 SyS_write+0xef/0x220 fs/read_write.c:581
 do_syscall_32_irqs_on arch/x86/entry/common.c:330 [inline]
 do_fast_syscall_32+0x3ec/0xf9f arch/x86/entry/common.c:392
 entry_SYSENTER_compat+0x70/0x7f arch/x86/entry/entry_64_compat.S:139
RIP: 0023:0xf7f9ec99
RSP: 002b:00000000ff8172cc EFLAGS: 00000282 ORIG_RAX: 0000000000000004
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000020000100
RDX: 0000000000000063 RSI: 0000000000000000 RDI: 0000000000000000
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
Code: 08 5b 41 5c 41 5d 41 5e 41 5f 5d c3 0f 0b 48 89 df e8 42 2c e3 fb eb de
55 48 89 fe 48 c7 c7 80 75 98 86 48 89 e5 e8 85 95 94 fb <0f> 0b 90 90 90 90
90 90 90 90 90 90 90 55 48 89 e5 41 57 41 56
RIP: fortify_panic+0x13/0x20 lib/string.c:1051 RSP: ffff8801ca81f8f0

Fixes: 5bc2b7b397 ("RDMA/ucma: Allow user space to specify AF_IB when joining multicast")
Reported-by: <syzbot+2287ac532caa81900a4e@syzkaller.appspotmail.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-03-14 16:31:11 -04:00
Leon Romanovsky 7688f2c3bb RDMA/ucma: Fix access to non-initialized CM_ID object
The attempt to join multicast group without ensuring that CMA device
exists will lead to the following crash reported by syzkaller.

[   64.076794] BUG: KASAN: null-ptr-deref in rdma_join_multicast+0x26e/0x12c0
[   64.076797] Read of size 8 at addr 00000000000000b0 by task join/691
[   64.076797]
[   64.076800] CPU: 1 PID: 691 Comm: join Not tainted 4.16.0-rc1-00219-gb97853b65b93 #23
[   64.076802] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.0-0-g63451fca13-prebuilt.qemu-proj4
[   64.076803] Call Trace:
[   64.076809]  dump_stack+0x5c/0x77
[   64.076817]  kasan_report+0x163/0x380
[   64.085859]  ? rdma_join_multicast+0x26e/0x12c0
[   64.086634]  rdma_join_multicast+0x26e/0x12c0
[   64.087370]  ? rdma_disconnect+0xf0/0xf0
[   64.088579]  ? __radix_tree_replace+0xc3/0x110
[   64.089132]  ? node_tag_clear+0x81/0xb0
[   64.089606]  ? idr_alloc_u32+0x12e/0x1a0
[   64.090517]  ? __fprop_inc_percpu_max+0x150/0x150
[   64.091768]  ? tracing_record_taskinfo+0x10/0xc0
[   64.092340]  ? idr_alloc+0x76/0xc0
[   64.092951]  ? idr_alloc_u32+0x1a0/0x1a0
[   64.093632]  ? ucma_process_join+0x23d/0x460
[   64.094510]  ucma_process_join+0x23d/0x460
[   64.095199]  ? ucma_migrate_id+0x440/0x440
[   64.095696]  ? futex_wake+0x10b/0x2a0
[   64.096159]  ucma_join_multicast+0x88/0xe0
[   64.096660]  ? ucma_process_join+0x460/0x460
[   64.097540]  ? _copy_from_user+0x5e/0x90
[   64.098017]  ucma_write+0x174/0x1f0
[   64.098640]  ? ucma_resolve_route+0xf0/0xf0
[   64.099343]  ? rb_erase_cached+0x6c7/0x7f0
[   64.099839]  __vfs_write+0xc4/0x350
[   64.100622]  ? perf_syscall_enter+0xe4/0x5f0
[   64.101335]  ? kernel_read+0xa0/0xa0
[   64.103525]  ? perf_sched_cb_inc+0xc0/0xc0
[   64.105510]  ? syscall_exit_register+0x2a0/0x2a0
[   64.107359]  ? __switch_to+0x351/0x640
[   64.109285]  ? fsnotify+0x899/0x8f0
[   64.111610]  ? fsnotify_unmount_inodes+0x170/0x170
[   64.113876]  ? __fsnotify_update_child_dentry_flags+0x30/0x30
[   64.115813]  ? ring_buffer_record_is_on+0xd/0x20
[   64.117824]  ? __fget+0xa8/0xf0
[   64.119869]  vfs_write+0xf7/0x280
[   64.122001]  SyS_write+0xa1/0x120
[   64.124213]  ? SyS_read+0x120/0x120
[   64.126644]  ? SyS_read+0x120/0x120
[   64.128563]  do_syscall_64+0xeb/0x250
[   64.130732]  entry_SYSCALL_64_after_hwframe+0x21/0x86
[   64.132984] RIP: 0033:0x7f5c994ade99
[   64.135699] RSP: 002b:00007f5c99b97d98 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[   64.138740] RAX: ffffffffffffffda RBX: 00000000200001e4 RCX: 00007f5c994ade99
[   64.141056] RDX: 00000000000000a0 RSI: 00000000200001c0 RDI: 0000000000000015
[   64.143536] RBP: 00007f5c99b97ec0 R08: 0000000000000000 R09: 0000000000000000
[   64.146017] R10: 0000000000000000 R11: 0000000000000246 R12: 00007f5c99b97fc0
[   64.148608] R13: 0000000000000000 R14: 00007fff660e1c40 R15: 00007f5c99b989c0
[   64.151060]
[   64.153703] Disabling lock debugging due to kernel taint
[   64.156032] BUG: unable to handle kernel NULL pointer dereference at 00000000000000b0
[   64.159066] IP: rdma_join_multicast+0x26e/0x12c0
[   64.161451] PGD 80000001d0298067 P4D 80000001d0298067 PUD 1dea39067 PMD 0
[   64.164442] Oops: 0000 [#1] SMP KASAN PTI
[   64.166817] CPU: 1 PID: 691 Comm: join Tainted: G    B 4.16.0-rc1-00219-gb97853b65b93 #23
[   64.170004] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.0-0-g63451fca13-prebuilt.qemu-proj4
[   64.174985] RIP: 0010:rdma_join_multicast+0x26e/0x12c0
[   64.177246] RSP: 0018:ffff8801c8207860 EFLAGS: 00010282
[   64.179901] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffff94789522
[   64.183344] RDX: 1ffffffff2d50fa5 RSI: 0000000000000297 RDI: 0000000000000297
[   64.186237] RBP: ffff8801c8207a50 R08: 0000000000000000 R09: ffffed0039040ea7
[   64.189328] R10: 0000000000000001 R11: ffffed0039040ea6 R12: 0000000000000000
[   64.192634] R13: 0000000000000000 R14: ffff8801e2022800 R15: ffff8801d4ac2400
[   64.196105] FS:  00007f5c99b98700(0000) GS:ffff8801e5d00000(0000) knlGS:0000000000000000
[   64.199211] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   64.202046] CR2: 00000000000000b0 CR3: 00000001d1c48004 CR4: 00000000003606a0
[   64.205032] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   64.208221] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   64.211554] Call Trace:
[   64.213464]  ? rdma_disconnect+0xf0/0xf0
[   64.216124]  ? __radix_tree_replace+0xc3/0x110
[   64.219337]  ? node_tag_clear+0x81/0xb0
[   64.222140]  ? idr_alloc_u32+0x12e/0x1a0
[   64.224422]  ? __fprop_inc_percpu_max+0x150/0x150
[   64.226588]  ? tracing_record_taskinfo+0x10/0xc0
[   64.229763]  ? idr_alloc+0x76/0xc0
[   64.232186]  ? idr_alloc_u32+0x1a0/0x1a0
[   64.234505]  ? ucma_process_join+0x23d/0x460
[   64.237024]  ucma_process_join+0x23d/0x460
[   64.240076]  ? ucma_migrate_id+0x440/0x440
[   64.243284]  ? futex_wake+0x10b/0x2a0
[   64.245302]  ucma_join_multicast+0x88/0xe0
[   64.247783]  ? ucma_process_join+0x460/0x460
[   64.250841]  ? _copy_from_user+0x5e/0x90
[   64.253878]  ucma_write+0x174/0x1f0
[   64.257008]  ? ucma_resolve_route+0xf0/0xf0
[   64.259877]  ? rb_erase_cached+0x6c7/0x7f0
[   64.262746]  __vfs_write+0xc4/0x350
[   64.265537]  ? perf_syscall_enter+0xe4/0x5f0
[   64.267792]  ? kernel_read+0xa0/0xa0
[   64.270358]  ? perf_sched_cb_inc+0xc0/0xc0
[   64.272575]  ? syscall_exit_register+0x2a0/0x2a0
[   64.275367]  ? __switch_to+0x351/0x640
[   64.277700]  ? fsnotify+0x899/0x8f0
[   64.280530]  ? fsnotify_unmount_inodes+0x170/0x170
[   64.283156]  ? __fsnotify_update_child_dentry_flags+0x30/0x30
[   64.286182]  ? ring_buffer_record_is_on+0xd/0x20
[   64.288749]  ? __fget+0xa8/0xf0
[   64.291136]  vfs_write+0xf7/0x280
[   64.292972]  SyS_write+0xa1/0x120
[   64.294965]  ? SyS_read+0x120/0x120
[   64.297474]  ? SyS_read+0x120/0x120
[   64.299751]  do_syscall_64+0xeb/0x250
[   64.301826]  entry_SYSCALL_64_after_hwframe+0x21/0x86
[   64.304352] RIP: 0033:0x7f5c994ade99
[   64.306711] RSP: 002b:00007f5c99b97d98 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[   64.309577] RAX: ffffffffffffffda RBX: 00000000200001e4 RCX: 00007f5c994ade99
[   64.312334] RDX: 00000000000000a0 RSI: 00000000200001c0 RDI: 0000000000000015
[   64.315783] RBP: 00007f5c99b97ec0 R08: 0000000000000000 R09: 0000000000000000
[   64.318365] R10: 0000000000000000 R11: 0000000000000246 R12: 00007f5c99b97fc0
[   64.320980] R13: 0000000000000000 R14: 00007fff660e1c40 R15: 00007f5c99b989c0
[   64.323515] Code: e8 e8 79 08 ff 4c 89 ff 45 0f b6 a7 b8 01 00 00 e8 68 7c 08 ff 49 8b 1f 4d 89 e5 49 c1 e4 04 48 8
[   64.330753] RIP: rdma_join_multicast+0x26e/0x12c0 RSP: ffff8801c8207860
[   64.332979] CR2: 00000000000000b0
[   64.335550] ---[ end trace 0c00c17a408849c1 ]---

Reported-by: <syzbot+e6aba77967bd72cbc9d6@syzkaller.appspotmail.com>
Fixes: c8f6a362bf ("RDMA/cma: Add multicast communication support")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-03-14 16:22:55 -04:00
Tatyana Nikolova 9dea9a2ff6 RDMA/core: Do not use invalid destination in determining port reuse
cma_port_is_unique() allows local port reuse if the quad (source
address and port, destination address and port) for this connection
is unique. However, if the destination info is zero or unspecified, it
can't make a correct decision but still allows port reuse. For example,
sometimes rdma_bind_addr() is called with unspecified destination and
reusing the port can lead to creating a connection with a duplicate quad,
after the destination is resolved. The issue manifests when MPI scale-up
tests hang after the duplicate quad is used.

Set the destination address family and add checks for zero destination
address and port to prevent source port reuse based on invalid destination.

Fixes: 19b752a19d ("IB/cma: Allow port reuse for rdma_id")
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-03-14 15:57:05 -04:00
Leon Romanovsky f3f134f526 RDMA/mlx5: Fix crash while accessing garbage pointer and freed memory
The failure in rereg_mr flow caused to set garbage value (error value)
into mr->umem pointer. This pointer is accessed at the release stage
and it causes to the following crash.

There is not enough to simply change umem to point to NULL, because the
MR struct is needed to be accessed during MR deregistration phase, so
delay kfree too.

[    6.237617] BUG: unable to handle kernel NULL pointer dereference a 0000000000000228
[    6.238756] IP: ib_dereg_mr+0xd/0x30
[    6.239264] PGD 80000000167eb067 P4D 80000000167eb067 PUD 167f9067 PMD 0
[    6.240320] Oops: 0000 [#1] SMP PTI
[    6.240782] CPU: 0 PID: 367 Comm: dereg Not tainted 4.16.0-rc1-00029-gc198fafe0453 #183
[    6.242120] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
[    6.244504] RIP: 0010:ib_dereg_mr+0xd/0x30
[    6.245253] RSP: 0018:ffffaf5d001d7d68 EFLAGS: 00010246
[    6.246100] RAX: 0000000000000000 RBX: ffff95d4172daf00 RCX: 0000000000000000
[    6.247414] RDX: 00000000ffffffff RSI: 0000000000000001 RDI: ffff95d41a317600
[    6.248591] RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000000
[    6.249810] R10: ffff95d417033c10 R11: 0000000000000000 R12: ffff95d4172c3a80
[    6.251121] R13: ffff95d4172c3720 R14: ffff95d4172c3a98 R15: 00000000ffffffff
[    6.252437] FS:  0000000000000000(0000) GS:ffff95d41fc00000(0000) knlGS:0000000000000000
[    6.253887] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    6.254814] CR2: 0000000000000228 CR3: 00000000172b4000 CR4: 00000000000006b0
[    6.255943] Call Trace:
[    6.256368]  remove_commit_idr_uobject+0x1b/0x80
[    6.257118]  uverbs_cleanup_ucontext+0xe4/0x190
[    6.257855]  ib_uverbs_cleanup_ucontext.constprop.14+0x19/0x40
[    6.258857]  ib_uverbs_close+0x2a/0x100
[    6.259494]  __fput+0xca/0x1c0
[    6.259938]  task_work_run+0x84/0xa0
[    6.260519]  do_exit+0x312/0xb40
[    6.261023]  ? __do_page_fault+0x24d/0x490
[    6.261707]  do_group_exit+0x3a/0xa0
[    6.262267]  SyS_exit_group+0x10/0x10
[    6.262802]  do_syscall_64+0x75/0x180
[    6.263391]  entry_SYSCALL_64_after_hwframe+0x21/0x86
[    6.264253] RIP: 0033:0x7f1b39c49488
[    6.264827] RSP: 002b:00007ffe2de05b68 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
[    6.266049] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f1b39c49488
[    6.267187] RDX: 0000000000000000 RSI: 000000000000003c RDI: 0000000000000000
[    6.268377] RBP: 00007f1b39f258e0 R08: 00000000000000e7 R09: ffffffffffffff98
[    6.269640] R10: 00007f1b3a147260 R11: 0000000000000246 R12: 00007f1b39f258e0
[    6.270783] R13: 00007f1b39f2ac20 R14: 0000000000000000 R15: 0000000000000000
[    6.271943] Code: 74 07 31 d2 e9 25 d8 6c 00 b8 da ff ff ff c3 0f 1f
44 00 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 8b 07 53 48 8b
5f 08 <48> 8b 80 28 02 00 00 e8 f7 d7 6c 00 85 c0 75 04 3e ff 4b 18 5b
[    6.274927] RIP: ib_dereg_mr+0xd/0x30 RSP: ffffaf5d001d7d68
[    6.275760] CR2: 0000000000000228
[    6.276200] ---[ end trace a35641f1c474bd20 ]---

Fixes: e126ba97db ("mlx5: Add driver for Mellanox Connect-IB adapters")
Cc: syzkaller <syzkaller@googlegroups.com>
Cc: <stable@vger.kernel.org>
Reported-by: Noa Osherovich <noaos@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-03-14 15:37:53 -04:00
Boris Pismenny c2b37f7648 IB/mlx5: Fix integer overflows in mlx5_ib_create_srq
This patch validates user provided input to prevent integer overflow due
to integer manipulation in the mlx5_ib_create_srq function.

Cc: syzkaller <syzkaller@googlegroups.com>
Fixes: e126ba97db ("mlx5: Add driver for Mellanox Connect-IB adapters")
Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-03-13 16:31:21 -04:00
Boris Pismenny 2c292dbb39 IB/mlx5: Fix out-of-bounds read in create_raw_packet_qp_rq
Add a check for the length of the qpin structure to prevent out-of-bounds reads

BUG: KASAN: slab-out-of-bounds in create_raw_packet_qp+0x114c/0x15e2
Read of size 8192 at addr ffff880066b99290 by task syz-executor3/549

CPU: 3 PID: 549 Comm: syz-executor3 Not tainted 4.15.0-rc2+ #27 Hardware
name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
Call Trace:
 dump_stack+0x8d/0xd4
 print_address_description+0x73/0x290
 kasan_report+0x25c/0x370
 ? create_raw_packet_qp+0x114c/0x15e2
 memcpy+0x1f/0x50
 create_raw_packet_qp+0x114c/0x15e2
 ? create_raw_packet_qp_tis.isra.28+0x13d/0x13d
 ? lock_acquire+0x370/0x370
 create_qp_common+0x2245/0x3b50
 ? destroy_qp_user.isra.47+0x100/0x100
 ? kasan_kmalloc+0x13d/0x170
 ? sched_clock_cpu+0x18/0x180
 ? fs_reclaim_acquire.part.15+0x5/0x30
 ? __lock_acquire+0xa11/0x1da0
 ? sched_clock_cpu+0x18/0x180
 ? kmem_cache_alloc_trace+0x17e/0x310
 ? mlx5_ib_create_qp+0x30e/0x17b0
 mlx5_ib_create_qp+0x33d/0x17b0
 ? sched_clock_cpu+0x18/0x180
 ? create_qp_common+0x3b50/0x3b50
 ? lock_acquire+0x370/0x370
 ? __radix_tree_lookup+0x180/0x220
 ? uverbs_try_lock_object+0x68/0xc0
 ? rdma_lookup_get_uobject+0x114/0x240
 create_qp.isra.5+0xce4/0x1e20
 ? ib_uverbs_ex_create_cq_cb+0xa0/0xa0
 ? copy_ah_attr_from_uverbs.isra.2+0xa00/0xa00
 ? ib_uverbs_cq_event_handler+0x160/0x160
 ? __might_fault+0x17c/0x1c0
 ib_uverbs_create_qp+0x21b/0x2a0
 ? ib_uverbs_destroy_cq+0x2e0/0x2e0
 ib_uverbs_write+0x55a/0xad0
 ? ib_uverbs_destroy_cq+0x2e0/0x2e0
 ? ib_uverbs_destroy_cq+0x2e0/0x2e0
 ? ib_uverbs_open+0x760/0x760
 ? futex_wake+0x147/0x410
 ? check_prev_add+0x1680/0x1680
 ? do_futex+0x3d3/0xa60
 ? sched_clock_cpu+0x18/0x180
 __vfs_write+0xf7/0x5c0
 ? ib_uverbs_open+0x760/0x760
 ? kernel_read+0x110/0x110
 ? lock_acquire+0x370/0x370
 ? __fget+0x264/0x3b0
 vfs_write+0x18a/0x460
 SyS_write+0xc7/0x1a0
 ? SyS_read+0x1a0/0x1a0
 ? trace_hardirqs_on_thunk+0x1a/0x1c
 entry_SYSCALL_64_fastpath+0x18/0x85
RIP: 0033:0x4477b9
RSP: 002b:00007f1822cadc18 EFLAGS: 00000292 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 00000000004477b9
RDX: 0000000000000070 RSI: 000000002000a000 RDI: 0000000000000005
RBP: 0000000000708000 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000292 R12: 00000000ffffffff
R13: 0000000000005d70 R14: 00000000006e6e30 R15: 0000000020010ff0

Allocated by task 549:
 __kmalloc+0x15e/0x340
 kvmalloc_node+0xa1/0xd0
 create_user_qp.isra.46+0xd42/0x1610
 create_qp_common+0x2e63/0x3b50
 mlx5_ib_create_qp+0x33d/0x17b0
 create_qp.isra.5+0xce4/0x1e20
 ib_uverbs_create_qp+0x21b/0x2a0
 ib_uverbs_write+0x55a/0xad0
 __vfs_write+0xf7/0x5c0
 vfs_write+0x18a/0x460
 SyS_write+0xc7/0x1a0
 entry_SYSCALL_64_fastpath+0x18/0x85

Freed by task 368:
 kfree+0xeb/0x2f0
 kernfs_fop_release+0x140/0x180
 __fput+0x266/0x700
 task_work_run+0x104/0x180
 exit_to_usermode_loop+0xf7/0x110
 syscall_return_slowpath+0x298/0x370
 entry_SYSCALL_64_fastpath+0x83/0x85

The buggy address belongs to the object at ffff880066b99180  which
belongs to the cache kmalloc-512 of size 512 The buggy address is
located 272 bytes inside of  512-byte region [ffff880066b99180,
ffff880066b99380) The buggy address belongs to the page:
page:000000006040eedd count:1 mapcount:0 mapping:          (null)
index:0x0 compound_mapcount: 0
flags: 0x4000000000008100(slab|head)
raw: 4000000000008100 0000000000000000 0000000000000000 0000000180190019
raw: ffffea00019a7500 0000000b0000000b ffff88006c403080 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 ffff880066b99180: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 ffff880066b99200: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>ffff880066b99280: 00 00 fc fc fc fc fc fc fc fc fc fc fc fc fc fc
                         ^
 ffff880066b99300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
 ffff880066b99380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc

Cc: syzkaller <syzkaller@googlegroups.com>
Fixes: 0fb2ed66a1 ("IB/mlx5: Add create and destroy functionality for Raw Packet QP")
Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-03-13 16:30:21 -04:00
Leon Romanovsky 28e9091e31 RDMA/mlx5: Fix integer overflow while resizing CQ
The user can provide very large cqe_size which will cause to integer
overflow as it can be seen in the following UBSAN warning:

=======================================================================
UBSAN: Undefined behaviour in drivers/infiniband/hw/mlx5/cq.c:1192:53
signed integer overflow:
64870 * 65536 cannot be represented in type 'int'
CPU: 0 PID: 267 Comm: syzkaller605279 Not tainted 4.15.0+ #90 Hardware
name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
Call Trace:
 dump_stack+0xde/0x164
 ? dma_virt_map_sg+0x22c/0x22c
 ubsan_epilogue+0xe/0x81
 handle_overflow+0x1f3/0x251
 ? __ubsan_handle_negate_overflow+0x19b/0x19b
 ? lock_acquire+0x440/0x440
 mlx5_ib_resize_cq+0x17e7/0x1e40
 ? cyc2ns_read_end+0x10/0x10
 ? native_read_msr_safe+0x6c/0x9b
 ? cyc2ns_read_end+0x10/0x10
 ? mlx5_ib_modify_cq+0x220/0x220
 ? sched_clock_cpu+0x18/0x200
 ? lookup_get_idr_uobject+0x200/0x200
 ? rdma_lookup_get_uobject+0x145/0x2f0
 ib_uverbs_resize_cq+0x207/0x3e0
 ? ib_uverbs_ex_create_cq+0x250/0x250
 ib_uverbs_write+0x7f9/0xef0
 ? cyc2ns_read_end+0x10/0x10
 ? print_irqtrace_events+0x280/0x280
 ? ib_uverbs_ex_create_cq+0x250/0x250
 ? uverbs_devnode+0x110/0x110
 ? sched_clock_cpu+0x18/0x200
 ? do_raw_spin_trylock+0x100/0x100
 ? __lru_cache_add+0x16e/0x290
 __vfs_write+0x10d/0x700
 ? uverbs_devnode+0x110/0x110
 ? kernel_read+0x170/0x170
 ? sched_clock_cpu+0x18/0x200
 ? security_file_permission+0x93/0x260
 vfs_write+0x1b0/0x550
 SyS_write+0xc7/0x1a0
 ? SyS_read+0x1a0/0x1a0
 ? trace_hardirqs_on_thunk+0x1a/0x1c
 entry_SYSCALL_64_fastpath+0x1e/0x8b
RIP: 0033:0x433549
RSP: 002b:00007ffe63bd1ea8 EFLAGS: 00000217
=======================================================================

Cc: syzkaller <syzkaller@googlegroups.com>
Cc: <stable@vger.kernel.org> # 3.13
Fixes: bde51583f4 ("IB/mlx5: Add support for resize CQ")
Reported-by: Noa Osherovich <noaos@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-03-09 18:10:48 -05:00
Doug Ledford 212a0cbc56 Revert "RDMA/mlx5: Fix integer overflow while resizing CQ"
The original commit of this patch has a munged log message that is
missing several of the tags the original author intended to be on the
patch.  This was due to patchworks misinterpreting a cut-n-paste
separator line as an end of message line and munging the mbox that was
used to import the patch:

https://patchwork.kernel.org/patch/10264089/

The original patch will be reapplied with a fixed commit message so the
proper tags are applied.

This reverts commit aa0de36a40.

Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-03-09 18:07:46 -05:00
Leon Romanovsky a5880b8443 RDMA/ucma: Check that user doesn't overflow QP state
The QP state is limited and declared in enum ib_qp_state,
but ucma user was able to supply any possible (u32) value.

Reported-by: syzbot+0df1ab766f8924b1edba@syzkaller.appspotmail.com
Fixes: 7521663857 ("RDMA/cma: Export rdma cm interface to userspace")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-03-07 15:25:40 -05:00
Leon Romanovsky aa0de36a40 RDMA/mlx5: Fix integer overflow while resizing CQ
The user can provide very large cqe_size which will cause to integer
overflow as it can be seen in the following UBSAN warning:

Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-03-07 15:23:43 -05:00
Leon Romanovsky 6a21dfc0d0 RDMA/ucma: Limit possible option size
Users of ucma are supposed to provide size of option level,
in most paths it is supposed to be equal to u8 or u16, but
it is not the case for the IB path record, where it can be
multiple of struct ib_path_rec_data.

This patch takes simplest possible approach and prevents providing
values more than possible to allocate.

Reported-by: syzbot+a38b0e9f694c379ca7ce@syzkaller.appspotmail.com
Fixes: 7ce86409ad ("RDMA/ucma: Allow user space to set service type")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-03-07 15:18:03 -05:00
Parav Pandit bb7f8f199c IB/core: Fix possible crash to access NULL netdev
resolved_dev returned might be NULL as ifindex is transient number.
Ignoring NULL check of resolved_dev might crash the kernel.
Therefore perform NULL check before accessing resolved_dev.

Additionally rdma_resolve_ip_route() invokes addr_resolve() which
performs check and address translation for loopback ifindex.
Therefore, checking it again in rdma_resolve_ip_route() is not helpful.
Therefore, the code is simplified to avoid IFF_LOOPBACK check.

Fixes: 200298326b ("IB/core: Validate route when we init ah")
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-03-07 15:15:40 -05:00
Selvin Xavier 942c9b6ca8 RDMA/bnxt_re: Avoid Hard lockup during error CQE processing
Hitting the following hardlockup due to a race condition in
error CQE processing.

[26146.879798] bnxt_en 0000:04:00.0: QPLIB: FP: CQ Processed Req
[26146.886346] bnxt_en 0000:04:00.0: QPLIB: wr_id[1251] = 0x0 with status 0xa
[26156.350935] NMI watchdog: Watchdog detected hard LOCKUP on cpu 4
[26156.357470] Modules linked in: nfsd auth_rpcgss nfs_acl lockd grace
[26156.447957] CPU: 4 PID: 3413 Comm: kworker/4:1H Kdump: loaded
[26156.457994] Hardware name: Dell Inc. PowerEdge R430/0CN7X8,
[26156.466390] Workqueue: ib-comp-wq ib_cq_poll_work [ib_core]
[26156.472639] Call Trace:
[26156.475379]  <NMI>  [<ffffffff98d0d722>] dump_stack+0x19/0x1b
[26156.481833]  [<ffffffff9873f775>] watchdog_overflow_callback+0x135/0x140
[26156.489341]  [<ffffffff9877f237>] __perf_event_overflow+0x57/0x100
[26156.496256]  [<ffffffff98787c24>] perf_event_overflow+0x14/0x20
[26156.502887]  [<ffffffff9860a580>] intel_pmu_handle_irq+0x220/0x510
[26156.509813]  [<ffffffff98d16031>] perf_event_nmi_handler+0x31/0x50
[26156.516738]  [<ffffffff98d1790c>] nmi_handle.isra.0+0x8c/0x150
[26156.523273]  [<ffffffff98d17be8>] do_nmi+0x218/0x460
[26156.528834]  [<ffffffff98d16d79>] end_repeat_nmi+0x1e/0x7e
[26156.534980]  [<ffffffff987089c0>] ? native_queued_spin_lock_slowpath+0x1d0/0x200
[26156.543268]  [<ffffffff987089c0>] ? native_queued_spin_lock_slowpath+0x1d0/0x200
[26156.551556]  [<ffffffff987089c0>] ? native_queued_spin_lock_slowpath+0x1d0/0x200
[26156.559842]  <EOE>  [<ffffffff98d083e4>] queued_spin_lock_slowpath+0xb/0xf
[26156.567555]  [<ffffffff98d15690>] _raw_spin_lock+0x20/0x30
[26156.573696]  [<ffffffffc08381a1>] bnxt_qplib_lock_buddy_cq+0x31/0x40 [bnxt_re]
[26156.581789]  [<ffffffffc083bbaa>] bnxt_qplib_poll_cq+0x43a/0xf10 [bnxt_re]
[26156.589493]  [<ffffffffc083239b>] bnxt_re_poll_cq+0x9b/0x760 [bnxt_re]

The issue happens if RQ poll_cq or SQ poll_cq or Async error event tries to
put the error QP in flush list. Since SQ and RQ of each error qp are added
to two different flush list, we need to protect it using locks of
corresponding CQs. Difference in order of acquiring the lock in
SQ poll_cq and RQ poll_cq can cause a hard lockup.

Revisits the locking strategy and removes the usage of qplib_cq.hwq.lock.
Instead of this lock, introduces qplib_cq.flush_lock to handle
addition/deletion of QPs in flush list. Also, always invoke the flush_lock
in order (SQ CQ lock first and then RQ CQ lock) to avoid any potential
deadlock.

Other than the poll_cq context, the movement of QP to/from flush list can
be done in modify_qp context or from an async error event from HW.
Synchronize these operations using the bnxt_re verbs layer CQ locks.
To achieve this, adds a call back to the HW abstraction layer(qplib) to
bnxt_re ib_verbs layer in case of async error event. Also, removes the
buddy cq functions as it is no longer required.

Signed-off-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-06 20:08:39 -07:00
Max Gurtovoy d3b9e8ad42 RDMA/core: Reduce poll batch for direct cq polling
Fix warning limit for kernel stack consumption:

drivers/infiniband/core/cq.c: In function 'ib_process_cq_direct':
drivers/infiniband/core/cq.c:78:1: error: the frame size of 1032 bytes
is larger than 1024 bytes [-Werror=frame-larger-than=]

Using smaller ib_wc array on the stack brings us comfortably below that
limit again.

Fixes: 246d8b184c ("IB/cq: Don't force IB_POLL_DIRECT poll context for ib_process_cq_direct")
Reported-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Sergey Gorenko <sergeygo@mellanox.com>
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-06 20:08:39 -07:00
Dan Carpenter 5d414b178e IB/mlx5: Fix an error code in __mlx5_ib_modify_qp()
"err" is either zero or possibly uninitialized here.  It should be
-EINVAL.

Fixes: 427c1e7bcd ("{IB, net}/mlx5: Move the modify QP operation table to mlx5_ib")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-06 20:08:39 -07:00
Mark Bloch 210b1f7807 IB/mlx5: When not in dual port RoCE mode, use provided port as native
The series that introduced dual port RoCE mode assumed that we don't have
a dual port HCA that use the mlx5 driver, this is not the case for
Connect-IB HCAs. This reasoning led to assigning 1 as the native port
index which causes issue when the second port is used.

For example query_pkey() when called on the second port will return values
of the first port. Make sure that we assign the right port index as the
native port index.

Fixes: 32f69e4be2 ("{net, IB}/mlx5: Manage port association for multiport RoCE")
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-06 20:08:38 -07:00
Jack M a18177925c IB/mlx4: Include GID type when deleting GIDs from HW table under RoCE
The commit cited below added a gid_type field (RoCEv1 or RoCEv2)
to GID properties.

When adding GIDs, this gid_type field was copied over to the
hardware gid table. However, when deleting GIDs, the gid_type field
was not copied over to the hardware gid table.

As a result, when running RoCEv2, all RoCEv2 gids in the
hardware gid table were set to type RoCEv1 when any gid was deleted.

This problem would persist until the next gid was added (which would again
restore the gid_type field for all the gids in the hardware gid table).

Fix this by copying over the gid_type field to the hardware gid table
when deleting gids, so that the gid_type of all remaining gids is
preserved when a gid is deleted.

Fixes: b699a859d1 ("IB/mlx4: Add gid_type to GID properties")
Reviewed-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-06 20:08:38 -07:00
Jack Morgenstein 0077416a3d IB/mlx4: Fix corruption of RoCEv2 IPv4 GIDs
When using IPv4 addresses in RoCEv2, the GID format for the mapped
IPv4 address should be: ::ffff:<4-byte IPv4 address>.

In the cited commit, IPv4 mapped IPV6 addresses had the 3 upper dwords
zeroed out by memset, which resulted in deleting the ffff field.

However, since procedure ipv6_addr_v4mapped() already verifies that the
gid has format ::ffff:<ipv4 address>, no change is needed for the gid,
and the memset can simply be removed.

Fixes: 7e57b85c44 ("IB/mlx4: Add support for setting RoCEv2 gids in hardware")
Reviewed-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-06 20:08:38 -07:00
Kalderon, Michal 551e1c67b4 RDMA/qedr: Fix iWARP write and send with immediate
iWARP does not support RDMA WRITE or SEND with immediate data.
Driver should check this before submitting to FW and return an
immediate error

Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-06 19:57:37 -07:00
Kalderon, Michal e3fd112cbf RDMA/qedr: Fix kernel panic when running fio over NFSoRDMA
Race in qedr_poll_cq, lastest_cqe wasn't protected by lock,
leading to a case where two context's accessing poll_cq at
the same time lead to one of them having a pointer to an old
latest_cqe and reading an invalid cqe element

Signed-off-by: Amit Radzi <Amit.Radzi@cavium.com>
Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-06 19:57:37 -07:00
Kalderon, Michal ea0ed47803 RDMA/qedr: Fix iWARP connect with port mapper
Fix iWARP connect and listen to use the mapped port for
ipv4 and ipv6. Without this fixed, running on a server
that has iwpmd enabled will not use the correct port

Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-06 19:57:37 -07:00
Kalderon, Michal 11052696fd RDMA/qedr: Fix ipv6 destination address resolution
The wrong parameter was passed to dst_neigh_lookup

Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-06 19:57:37 -07:00
Muneendra Kumar M 4cd482c12b IB/core : Add null pointer check in addr_resolve
dev_get_by_index is being called in addr_resolve
function which returns NULL and NULL pointer access
leads to kernel crash.

Following call trace is observed while running
rdma_lat test application

[  146.173149] BUG: unable to handle kernel NULL pointer dereference
at 00000000000004a0
[  146.173198] IP: addr_resolve+0x9e/0x3e0 [ib_core]
[  146.173221] PGD 0 P4D 0
[  146.173869] Oops: 0000 [#1] SMP PTI
[  146.182859] CPU: 8 PID: 127 Comm: kworker/8:1 Tainted: G  O 4.15.0-rc6+ #18
[  146.183758] Hardware name: LENOVO System x3650 M5: -[8871AC1]-/01KN179,
 BIOS-[TCE132H-2.50]- 10/11/2017
[  146.184691] Workqueue: ib_cm cm_work_handler [ib_cm]
[  146.185632] RIP: 0010:addr_resolve+0x9e/0x3e0 [ib_core]
[  146.186584] RSP: 0018:ffffc9000362faa0 EFLAGS: 00010246
[  146.187521] RAX: 000000000000001b RBX: ffffc9000362fc08 RCX:
0000000000000006
[  146.188472] RDX: 0000000000000000 RSI: 0000000000000096 RDI
: ffff88087fc16990
[  146.189427] RBP: ffffc9000362fb18 R08: 00000000ffffff9d R09:
00000000000004ac
[  146.190392] R10: 00000000000001e7 R11: 0000000000000001 R12:
ffff88086af2e090
[  146.191361] R13: 0000000000000000 R14: 0000000000000001 R15:
00000000ffffff9d
[  146.192327] FS:  0000000000000000(0000) GS:ffff88087fc00000(0000)
knlGS:0000000000000000
[  146.193301] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  146.194274] CR2: 00000000000004a0 CR3: 000000000220a002 CR4:
00000000003606e0
[  146.195258] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[  146.196256] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
0000000000000400
[  146.197231] Call Trace:
[  146.198209]  ? rdma_addr_register_client+0x30/0x30 [ib_core]
[  146.199199]  rdma_resolve_ip+0x1af/0x280 [ib_core]
[  146.200196]  rdma_addr_find_l2_eth_by_grh+0x154/0x2b0 [ib_core]

The below patch adds the missing NULL pointer check
returned by dev_get_by_index before accessing the netdev to
avoid kernel crash.

We observed the below crash when we try to do the below test.

 server                       client
 ---------                    ---------
 |1.1.1.1|<----rxe-channel--->|1.1.1.2|
 ---------                    ---------

On server: rdma_lat -c -n 2 -s 1024
On client:rdma_lat 1.1.1.1 -c -n 2 -s 1024

Fixes: 200298326b ("IB/core: Validate route when we init ah")
Signed-off-by: Muneendra <muneendra.kumar@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-02-28 12:10:33 -07:00
Selvin Xavier 497158aa5f RDMA/bnxt_re: Fix the ib_reg failure cleanup
Release the netdev references in the cleanup path.  Invokes the cleanup
routines if bnxt_re_ib_reg fails.

Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-02-28 12:10:33 -07:00
Devesh Sharma c354dff00d RDMA/bnxt_re: Fix incorrect DB offset calculation
To support host systems with non 4K page size, l2_db_size shall be
calculated with 4096 instead of PAGE_SIZE. Also, supply the host page size
to FW during initialization.

Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-02-28 12:10:32 -07:00
Devesh Sharma a45bc17b36 RDMA/bnxt_re: Unconditionly fence non wire memory operations
HW requires an unconditonal fence for all non-wire memory operations
through SQ. This guarantees the completions of these memory operations.

Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-02-28 12:10:32 -07:00
Parav Pandit 2fb4f4eadd IB/core: Fix missing RDMA cgroups release in case of failure to register device
During IB device registration process, if query_device() fails or if
ib_core fails to registers sysfs entries, rdma cgroup cleanup is
skipped.

Cc: <stable@vger.kernel.org> # v4.2+
Fixes: 4be3a4fa51 ("IB/core: Fix kernel crash during fail to initialize device")
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-02-28 12:10:32 -07:00
Moni Shoua 65389322b2 IB/mlx: Set slid to zero in Ethernet completion struct
IB spec says that a lid should be ignored when link layer is Ethernet,
for example when building or parsing a CM request message (CA17-34).
However, since ib_lid_be16() and ib_lid_cpu16()  validates the slid,
not only when link layer is IB, we set the slid to zero to prevent
false warnings in the kernel log.

Fixes: 62ede77799 ("Add OPA extended LID support")
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-02-28 12:10:32 -07:00
Daniel Jurgens aba4621346 {net, IB}/mlx5: Raise fatal IB event when sys error occurs
All other mlx5_events report the port number as 1 based, which is how FW
reports it in the port event EQE. Reporting 0 for this event causes
mlx5_ib to not raise a fatal event notification to registered clients
due to a seemingly invalid port.

All switch cases in mlx5_ib_event that go through the port check are
supposed to set the port now, so just do it once at variable
declaration.

Fixes: 89d44f0a6c73("net/mlx5_core: Add pci error handlers to mlx5_core driver")
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-02-28 12:10:32 -07:00
Noa Osherovich e7b169f344 IB/mlx5: Avoid passing an invalid QP type to firmware
During QP creation, the mlx5 driver translates the QP type to an
internal value which is passed on to FW. There was no check to make
sure that the translated value is valid, and -EINVAL was coerced into
the mailbox command.

Current firmware refuses this as an invalid QP type, but future/past
firmware may do something else.

Fixes: 09a7d9eca1 ('{net,IB}/mlx5: QP/XRCD commands via mlx5 ifc')
Reviewed-by: Ilya Lesokhin <ilyal@mellanox.com>
Signed-off-by: Noa Osherovich <noaos@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-02-28 12:10:32 -07:00
Sergey Gorenko da343b6d90 IB/mlx5: Fix incorrect size of klms in the memory region
The value of mr->ndescs greater than mr->max_descs is set in the
function mlx5_ib_sg_to_klms() if sg_nents is greater than
mr->max_descs. This is an invalid value and it causes the
following error when registering mr:

mlx5_0:dump_cqe:276:(pid 193): dump error cqe
00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00000030: 00 00 00 00 0f 00 78 06 25 00 00 8b 08 1e 8f d3

Cc: <stable@vger.kernel.org> # 4.5
Fixes: b005d31647 ("mlx5: Add arbitrary sg list support")
Signed-off-by: Sergey Gorenko <sergeygo@mellanox.com>
Tested-by: Laurence Oberman <loberman@redhat.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-02-28 12:10:32 -07:00
Leon Romanovsky f45765872e RDMA/uverbs: Fix kernel panic while using XRC_TGT QP type
Attempt to modify XRC_TGT QP type from the user space (ibv_xsrq_pingpong
invocation) will trigger the following kernel panic. It is caused by the
fact that such QPs missed uobject initialization.

[   17.408845] BUG: unable to handle kernel NULL pointer dereference at 0000000000000048
[   17.412645] IP: rdma_lookup_put_uobject+0x9/0x50
[   17.416567] PGD 0 P4D 0
[   17.419262] Oops: 0000 [#1] SMP PTI
[   17.422915] CPU: 0 PID: 455 Comm: ibv_xsrq_pingpo Not tainted 4.16.0-rc1+ #86
[   17.424765] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
[   17.427399] RIP: 0010:rdma_lookup_put_uobject+0x9/0x50
[   17.428445] RSP: 0018:ffffb8c7401e7c90 EFLAGS: 00010246
[   17.429543] RAX: 0000000000000000 RBX: ffffb8c7401e7cf8 RCX: 0000000000000000
[   17.432426] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000000000000
[   17.437448] RBP: 0000000000000000 R08: 00000000000218f0 R09: ffffffff8ebc4cac
[   17.440223] R10: fffff6038052cd80 R11: ffff967694b36400 R12: ffff96769391f800
[   17.442184] R13: ffffb8c7401e7cd8 R14: 0000000000000000 R15: ffff967699f60000
[   17.443971] FS:  00007fc29207d700(0000) GS:ffff96769fc00000(0000) knlGS:0000000000000000
[   17.446623] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   17.448059] CR2: 0000000000000048 CR3: 000000001397a000 CR4: 00000000000006b0
[   17.449677] Call Trace:
[   17.450247]  modify_qp.isra.20+0x219/0x2f0
[   17.451151]  ib_uverbs_modify_qp+0x90/0xe0
[   17.452126]  ib_uverbs_write+0x1d2/0x3c0
[   17.453897]  ? __handle_mm_fault+0x93c/0xe40
[   17.454938]  __vfs_write+0x36/0x180
[   17.455875]  vfs_write+0xad/0x1e0
[   17.456766]  SyS_write+0x52/0xc0
[   17.457632]  do_syscall_64+0x75/0x180
[   17.458631]  entry_SYSCALL_64_after_hwframe+0x21/0x86
[   17.460004] RIP: 0033:0x7fc29198f5a0
[   17.460982] RSP: 002b:00007ffccc71f018 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[   17.463043] RAX: ffffffffffffffda RBX: 0000000000000078 RCX: 00007fc29198f5a0
[   17.464581] RDX: 0000000000000078 RSI: 00007ffccc71f050 RDI: 0000000000000003
[   17.466148] RBP: 0000000000000000 R08: 0000000000000078 R09: 00007ffccc71f050
[   17.467750] R10: 000055b6cf87c248 R11: 0000000000000246 R12: 00007ffccc71f300
[   17.469541] R13: 000055b6cf8733a0 R14: 0000000000000000 R15: 0000000000000000
[   17.471151] Code: 00 00 0f 1f 44 00 00 48 8b 47 48 48 8b 00 48 8b 40 10 e9 0b 8b 68 00 90 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 53 89 f5 <48> 8b 47 48 48 89 fb 40 0f b6 f6 48 8b 00 48 8b 40 20 e8 e0 8a
[   17.475185] RIP: rdma_lookup_put_uobject+0x9/0x50 RSP: ffffb8c7401e7c90
[   17.476841] CR2: 0000000000000048
[   17.477764] ---[ end trace 1dbcc5354071a712 ]---
[   17.478880] Kernel panic - not syncing: Fatal exception
[   17.480277] Kernel Offset: 0xd000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)

Fixes: 2f08ee363f ("RDMA/restrack: don't use uaccess_kernel()")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-02-21 13:52:19 -05:00
Selvin Xavier 7374fbd9e1 RDMA/bnxt_re: Avoid system hang during device un-reg
BNXT_RE_FLAG_TASK_IN_PROG doesn't handle multiple work
requests posted together. Track schedule of multiple
workqueue items by maintaining a per device counter
and proceed with IB dereg only if this counter is zero.
flush_workqueue is no longer required from
NETDEV_UNREGISTER path.

Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-02-20 11:59:47 -05:00
Selvin Xavier dcdaba0806 RDMA/bnxt_re: Fix system crash during load/unload
During driver unload, the driver proceeds with cleanup
without waiting for the scheduled events. So the device
pointers get freed up and driver crashes when the events
are scheduled later.

Flush the bnxt_re_task work queue before starting
device removal.

Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-02-20 11:57:21 -05:00
Selvin Xavier 3b921e3bc4 RDMA/bnxt_re: Synchronize destroy_qp with poll_cq
Avoid system crash when destroy_qp is invoked while
the driver is processing the poll_cq. Synchronize these
functions using the cq_lock.

Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-02-20 11:57:21 -05:00
Devesh Sharma 6b4521f517 RDMA/bnxt_re: Unpin SQ and RQ memory if QP create fails
Driver leaves the QP memory pinned if QP create command
fails from the FW. Avoids this scenario by adding a proper
exit path if the FW command fails.

Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-02-20 11:57:21 -05:00
Devesh Sharma 7ff662b761 RDMA/bnxt_re: Disable atomic capability on bnxt_re adapters
More testing needs to be done before enabling this feature.
Disabling the feature temporarily

Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-02-20 11:57:21 -05:00
Steve Wise 2f08ee363f RDMA/restrack: don't use uaccess_kernel()
uaccess_kernel() isn't sufficient to determine if an rdma resource is
user-mode or not.  For example, resources allocated in the add_one()
function of an ib_client get falsely labeled as user mode, when they
are kernel mode allocations.  EG: mad qps.

The result is that these qps are skipped over during a nldev query
because of an erroneous namespace mismatch.

So now we determine if the resource is user-mode by looking at the object
struct's uobject or similar pointer to know if it was allocated for user
mode applications.

Fixes: 02d8883f52 ("RDMA/restrack: Add general infrastructure to track RDMA resources")
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-02-16 10:18:11 -07:00
Leon Romanovsky 2188558621 RDMA/verbs: Check existence of function prior to accessing it
Update all the flows to ensure that function pointer exists prior
to accessing it.

This is much safer than checking the uverbs_ex_mask variable, especially
since we know that test isn't working properly and will be removed
in -next.

This prevents a user triggereable oops.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-02-16 09:18:55 -07:00
Adit Ranadive 1f5a6c47aa RDMA/vmw_pvrdma: Fix usage of user response structures in ABI file
This ensures that we return the right structures back to userspace.
Otherwise, it looks like the reserved fields in the response structures
in userspace might have uninitialized data in them.

Fixes: 8b10ba783c ("RDMA/vmw_pvrdma: Add shared receive queue support")
Fixes: 29c8d9eba5 ("IB: Add vmw_pvrdma driver")
Suggested-by: Jason Gunthorpe <jgg@mellanox.com>
Reviewed-by: Bryan Tan <bryantan@vmware.com>
Reviewed-by: Aditya Sarwade <asarwade@vmware.com>
Reviewed-by: Jorgen Hansen <jhansen@vmware.com>
Signed-off-by: Adit Ranadive <aditr@vmware.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-02-15 15:31:28 -07:00
Leon Romanovsky 5d4c05c3ee RDMA/uverbs: Sanitize user entered port numbers prior to access it
==================================================================
BUG: KASAN: use-after-free in copy_ah_attr_from_uverbs+0x6f2/0x8c0
Read of size 4 at addr ffff88006476a198 by task syzkaller697701/265

CPU: 0 PID: 265 Comm: syzkaller697701 Not tainted 4.15.0+ #90
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
Call Trace:
 dump_stack+0xde/0x164
 ? dma_virt_map_sg+0x22c/0x22c
 ? show_regs_print_info+0x17/0x17
 ? lock_contended+0x11a0/0x11a0
 print_address_description+0x83/0x3e0
 kasan_report+0x18c/0x4b0
 ? copy_ah_attr_from_uverbs+0x6f2/0x8c0
 ? copy_ah_attr_from_uverbs+0x6f2/0x8c0
 ? lookup_get_idr_uobject+0x120/0x200
 ? copy_ah_attr_from_uverbs+0x6f2/0x8c0
 copy_ah_attr_from_uverbs+0x6f2/0x8c0
 ? modify_qp+0xd0e/0x1350
 modify_qp+0xd0e/0x1350
 ib_uverbs_modify_qp+0xf9/0x170
 ? ib_uverbs_query_qp+0xa70/0xa70
 ib_uverbs_write+0x7f9/0xef0
 ? attach_entity_load_avg+0x8b0/0x8b0
 ? ib_uverbs_query_qp+0xa70/0xa70
 ? uverbs_devnode+0x110/0x110
 ? cyc2ns_read_end+0x10/0x10
 ? print_irqtrace_events+0x280/0x280
 ? sched_clock_cpu+0x18/0x200
 ? _raw_spin_unlock_irq+0x29/0x40
 ? _raw_spin_unlock_irq+0x29/0x40
 ? _raw_spin_unlock_irq+0x29/0x40
 ? time_hardirqs_on+0x27/0x670
 __vfs_write+0x10d/0x700
 ? uverbs_devnode+0x110/0x110
 ? kernel_read+0x170/0x170
 ? _raw_spin_unlock_irq+0x29/0x40
 ? finish_task_switch+0x1bd/0x7a0
 ? finish_task_switch+0x194/0x7a0
 ? prandom_u32_state+0xe/0x180
 ? rcu_read_unlock+0x80/0x80
 ? security_file_permission+0x93/0x260
 vfs_write+0x1b0/0x550
 SyS_write+0xc7/0x1a0
 ? SyS_read+0x1a0/0x1a0
 ? trace_hardirqs_on_thunk+0x1a/0x1c
 entry_SYSCALL_64_fastpath+0x1e/0x8b
RIP: 0033:0x433c29
RSP: 002b:00007ffcf2be82a8 EFLAGS: 00000217

Allocated by task 62:
 kasan_kmalloc+0xa0/0xd0
 kmem_cache_alloc+0x141/0x480
 dup_fd+0x101/0xcc0
 copy_process.part.62+0x166f/0x4390
 _do_fork+0x1cb/0xe90
 kernel_thread+0x34/0x40
 call_usermodehelper_exec_work+0x112/0x260
 process_one_work+0x929/0x1aa0
 worker_thread+0x5c6/0x12a0
 kthread+0x346/0x510
 ret_from_fork+0x3a/0x50

Freed by task 259:
 kasan_slab_free+0x71/0xc0
 kmem_cache_free+0xf3/0x4c0
 put_files_struct+0x225/0x2c0
 exit_files+0x88/0xc0
 do_exit+0x67c/0x1520
 do_group_exit+0xe8/0x380
 SyS_exit_group+0x1e/0x20
 entry_SYSCALL_64_fastpath+0x1e/0x8b

The buggy address belongs to the object at ffff88006476a000
 which belongs to the cache files_cache of size 832
The buggy address is located 408 bytes inside of
 832-byte region [ffff88006476a000, ffff88006476a340)
The buggy address belongs to the page:
page:ffffea000191da80 count:1 mapcount:0 mapping:          (null) index:0x0 compound_mapcount: 0
flags: 0x4000000000008100(slab|head)
raw: 4000000000008100 0000000000000000 0000000000000000 0000000100080008
raw: 0000000000000000 0000000100000001 ffff88006bcf7a80 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 ffff88006476a080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff88006476a100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88006476a180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                            ^
 ffff88006476a200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff88006476a280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================

Cc: syzkaller <syzkaller@googlegroups.com>
Cc: <stable@vger.kernel.org> # 4.11
Fixes: 44c58487d5 ("IB/core: Define 'ib' and 'roce' rdma_ah_attr types")
Reported-by: Noa Osherovich <noaos@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-02-15 15:31:27 -07:00
Leon Romanovsky 1ff5325c3c RDMA/uverbs: Fix circular locking dependency
Avoid circular locking dependency by calling
to uobj_alloc_commit() outside of xrcd_tree_mutex lock.

======================================================
WARNING: possible circular locking dependency detected
4.15.0+ #87 Not tainted
------------------------------------------------------
syzkaller401056/269 is trying to acquire lock:
 (&uverbs_dev->xrcd_tree_mutex){+.+.}, at: [<000000006c12d2cd>] uverbs_free_xrcd+0xd2/0x360

but task is already holding lock:
 (&ucontext->uobjects_lock){+.+.}, at: [<00000000da010f09>] uverbs_cleanup_ucontext+0x168/0x730

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #1 (&ucontext->uobjects_lock){+.+.}:
       __mutex_lock+0x111/0x1720
       rdma_alloc_commit_uobject+0x22c/0x600
       ib_uverbs_open_xrcd+0x61a/0xdd0
       ib_uverbs_write+0x7f9/0xef0
       __vfs_write+0x10d/0x700
       vfs_write+0x1b0/0x550
       SyS_write+0xc7/0x1a0
       entry_SYSCALL_64_fastpath+0x1e/0x8b

-> #0 (&uverbs_dev->xrcd_tree_mutex){+.+.}:
       lock_acquire+0x19d/0x440
       __mutex_lock+0x111/0x1720
       uverbs_free_xrcd+0xd2/0x360
       remove_commit_idr_uobject+0x6d/0x110
       uverbs_cleanup_ucontext+0x2f0/0x730
       ib_uverbs_cleanup_ucontext.constprop.3+0x52/0x120
       ib_uverbs_close+0xf2/0x570
       __fput+0x2cd/0x8d0
       task_work_run+0xec/0x1d0
       do_exit+0x6a1/0x1520
       do_group_exit+0xe8/0x380
       SyS_exit_group+0x1e/0x20
       entry_SYSCALL_64_fastpath+0x1e/0x8b

other info that might help us debug this:

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&ucontext->uobjects_lock);
                               lock(&uverbs_dev->xrcd_tree_mutex);
                               lock(&ucontext->uobjects_lock);
  lock(&uverbs_dev->xrcd_tree_mutex);

 *** DEADLOCK ***

3 locks held by syzkaller401056/269:
 #0:  (&file->cleanup_mutex){+.+.}, at: [<00000000c9f0c252>] ib_uverbs_close+0xac/0x570
 #1:  (&ucontext->cleanup_rwsem){++++}, at: [<00000000b6994d49>] uverbs_cleanup_ucontext+0xf6/0x730
 #2:  (&ucontext->uobjects_lock){+.+.}, at: [<00000000da010f09>] uverbs_cleanup_ucontext+0x168/0x730

stack backtrace:
CPU: 0 PID: 269 Comm: syzkaller401056 Not tainted 4.15.0+ #87
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
Call Trace:
 dump_stack+0xde/0x164
 ? dma_virt_map_sg+0x22c/0x22c
 ? uverbs_cleanup_ucontext+0x168/0x730
 ? console_unlock+0x502/0xbd0
 print_circular_bug.isra.24+0x35e/0x396
 ? print_circular_bug_header+0x12e/0x12e
 ? find_usage_backwards+0x30/0x30
 ? entry_SYSCALL_64_fastpath+0x1e/0x8b
 validate_chain.isra.28+0x25d1/0x40c0
 ? check_usage+0xb70/0xb70
 ? graph_lock+0x160/0x160
 ? find_usage_backwards+0x30/0x30
 ? cyc2ns_read_end+0x10/0x10
 ? print_irqtrace_events+0x280/0x280
 ? __lock_acquire+0x93d/0x1630
 __lock_acquire+0x93d/0x1630
 lock_acquire+0x19d/0x440
 ? uverbs_free_xrcd+0xd2/0x360
 __mutex_lock+0x111/0x1720
 ? uverbs_free_xrcd+0xd2/0x360
 ? uverbs_free_xrcd+0xd2/0x360
 ? __mutex_lock+0x828/0x1720
 ? mutex_lock_io_nested+0x1550/0x1550
 ? uverbs_cleanup_ucontext+0x168/0x730
 ? __lock_acquire+0x9a9/0x1630
 ? mutex_lock_io_nested+0x1550/0x1550
 ? uverbs_cleanup_ucontext+0xf6/0x730
 ? lock_contended+0x11a0/0x11a0
 ? uverbs_free_xrcd+0xd2/0x360
 uverbs_free_xrcd+0xd2/0x360
 remove_commit_idr_uobject+0x6d/0x110
 uverbs_cleanup_ucontext+0x2f0/0x730
 ? sched_clock_cpu+0x18/0x200
 ? uverbs_close_fd+0x1c0/0x1c0
 ib_uverbs_cleanup_ucontext.constprop.3+0x52/0x120
 ib_uverbs_close+0xf2/0x570
 ? ib_uverbs_remove_one+0xb50/0xb50
 ? ib_uverbs_remove_one+0xb50/0xb50
 __fput+0x2cd/0x8d0
 task_work_run+0xec/0x1d0
 do_exit+0x6a1/0x1520
 ? fsnotify_first_mark+0x220/0x220
 ? exit_notify+0x9f0/0x9f0
 ? entry_SYSCALL_64_fastpath+0x5/0x8b
 ? entry_SYSCALL_64_fastpath+0x5/0x8b
 ? trace_hardirqs_on_thunk+0x1a/0x1c
 ? time_hardirqs_on+0x27/0x670
 ? time_hardirqs_off+0x27/0x490
 ? syscall_return_slowpath+0x6c/0x460
 ? entry_SYSCALL_64_fastpath+0x5/0x8b
 do_group_exit+0xe8/0x380
 SyS_exit_group+0x1e/0x20
 entry_SYSCALL_64_fastpath+0x1e/0x8b
RIP: 0033:0x431ce9

Cc: syzkaller <syzkaller@googlegroups.com>
Cc: <stable@vger.kernel.org> # 4.11
Fixes: fd3c7904db ("IB/core: Change idr objects to use the new schema")
Reported-by: Noa Osherovich <noaos@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-02-15 15:31:27 -07:00
Leon Romanovsky 5c2e1c4f92 RDMA/uverbs: Fix bad unlock balance in ib_uverbs_close_xrcd
There is no matching lock for this mutex. Git history suggests this is
just a missed remnant from an earlier version of the function before
this locking was moved into uverbs_free_xrcd.

Originally this lock was protecting the xrcd_table_delete()

=====================================
WARNING: bad unlock balance detected!
4.15.0+ #87 Not tainted
-------------------------------------
syzkaller223405/269 is trying to release lock (&uverbs_dev->xrcd_tree_mutex) at:
[<00000000b8703372>] ib_uverbs_close_xrcd+0x195/0x1f0
but there are no more locks to release!

other info that might help us debug this:
1 lock held by syzkaller223405/269:
 #0:  (&uverbs_dev->disassociate_srcu){....}, at: [<000000005af3b960>] ib_uverbs_write+0x265/0xef0

stack backtrace:
CPU: 0 PID: 269 Comm: syzkaller223405 Not tainted 4.15.0+ #87
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
Call Trace:
 dump_stack+0xde/0x164
 ? dma_virt_map_sg+0x22c/0x22c
 ? ib_uverbs_write+0x265/0xef0
 ? console_unlock+0x502/0xbd0
 ? ib_uverbs_close_xrcd+0x195/0x1f0
 print_unlock_imbalance_bug+0x131/0x160
 lock_release+0x59d/0x1100
 ? ib_uverbs_close_xrcd+0x195/0x1f0
 ? lock_acquire+0x440/0x440
 ? lock_acquire+0x440/0x440
 __mutex_unlock_slowpath+0x88/0x670
 ? wait_for_completion+0x4c0/0x4c0
 ? rdma_lookup_get_uobject+0x145/0x2f0
 ib_uverbs_close_xrcd+0x195/0x1f0
 ? ib_uverbs_open_xrcd+0xdd0/0xdd0
 ib_uverbs_write+0x7f9/0xef0
 ? cyc2ns_read_end+0x10/0x10
 ? ib_uverbs_open_xrcd+0xdd0/0xdd0
 ? uverbs_devnode+0x110/0x110
 ? cyc2ns_read_end+0x10/0x10
 ? cyc2ns_read_end+0x10/0x10
 ? sched_clock_cpu+0x18/0x200
 __vfs_write+0x10d/0x700
 ? uverbs_devnode+0x110/0x110
 ? kernel_read+0x170/0x170
 ? __fget+0x358/0x5d0
 ? security_file_permission+0x93/0x260
 vfs_write+0x1b0/0x550
 SyS_write+0xc7/0x1a0
 ? SyS_read+0x1a0/0x1a0
 ? trace_hardirqs_on_thunk+0x1a/0x1c
 entry_SYSCALL_64_fastpath+0x1e/0x8b
RIP: 0033:0x4335c9

Cc: syzkaller <syzkaller@googlegroups.com>
Cc: <stable@vger.kernel.org> # 4.11
Fixes: fd3c7904db ("IB/core: Change idr objects to use the new schema")
Reported-by: Noa Osherovich <noaos@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-02-15 15:31:27 -07:00
Leon Romanovsky 0cba0efcc7 RDMA/restrack: Increment CQ restrack object before committing
Once the uobj is committed it is immediately possible another thread
could destroy it, which worst case, can result in a use-after-free
of the restrack objects.

Cc: syzkaller <syzkaller@googlegroups.com>
Fixes: 08f294a152 ("RDMA/core: Add resource tracking for create and destroy CQs")
Reported-by: Noa Osherovich <noaos@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-02-15 15:31:26 -07:00
Leon Romanovsky 3f802b162d RDMA/uverbs: Protect from command mask overflow
The command number is not bounds checked against the command mask before it
is shifted, resulting in an ubsan hit. This does not cause malfunction since
the command number is eventually bounds checked, but we can make this ubsan
clean by moving the bounds check to before the mask check.

================================================================================
UBSAN: Undefined behaviour in
drivers/infiniband/core/uverbs_main.c:647:21
shift exponent 207 is too large for 64-bit type 'long long unsigned int'
CPU: 0 PID: 446 Comm: syz-executor3 Not tainted 4.15.0-rc2+ #61
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
Call Trace:
dump_stack+0xde/0x164
? dma_virt_map_sg+0x22c/0x22c
ubsan_epilogue+0xe/0x81
__ubsan_handle_shift_out_of_bounds+0x293/0x2f7
? debug_check_no_locks_freed+0x340/0x340
? __ubsan_handle_load_invalid_value+0x19b/0x19b
? lock_acquire+0x440/0x440
? lock_acquire+0x19d/0x440
? __might_fault+0xf4/0x240
? ib_uverbs_write+0x68d/0xe20
ib_uverbs_write+0x68d/0xe20
? __lock_acquire+0xcf7/0x3940
? uverbs_devnode+0x110/0x110
? cyc2ns_read_end+0x10/0x10
? sched_clock_cpu+0x18/0x200
? sched_clock_cpu+0x18/0x200
__vfs_write+0x10d/0x700
? uverbs_devnode+0x110/0x110
? kernel_read+0x170/0x170
? __fget+0x35b/0x5d0
? security_file_permission+0x93/0x260
vfs_write+0x1b0/0x550
SyS_write+0xc7/0x1a0
? SyS_read+0x1a0/0x1a0
? trace_hardirqs_on_thunk+0x1a/0x1c
entry_SYSCALL_64_fastpath+0x18/0x85
RIP: 0033:0x448e29
RSP: 002b:00007f033f567c58 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 00007f033f5686bc RCX: 0000000000448e29
RDX: 0000000000000060 RSI: 0000000020001000 RDI: 0000000000000012
RBP: 000000000070bea0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
R13: 00000000000056a0 R14: 00000000006e8740 R15: 0000000000000000
================================================================================

Cc: syzkaller <syzkaller@googlegroups.com>
Cc: <stable@vger.kernel.org> # 4.5
Fixes: 2dbd5186a3 ("IB/core: IB/core: Allow legacy verbs through extended interfaces")
Reported-by: Noa Osherovich <noaos@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-02-15 15:31:26 -07:00
Jason Gunthorpe ec6f8401c4 IB/uverbs: Fix unbalanced unlock on error path for rdma_explicit_destroy
If remove_commit fails then the lock is left locked while the uobj still
exists. Eventually the kernel will deadlock.

lockdep detects this and says:

 test/4221 is leaving the kernel with locks still held!
 1 lock held by test/4221:
  #0:  (&ucontext->cleanup_rwsem){.+.+}, at: [<000000001e5c7523>] rdma_explicit_destroy+0x37/0x120 [ib_uverbs]

Fixes: 4da70da23e ("IB/core: Explicitly destroy an object while keeping uobject")
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-02-15 15:31:26 -07:00
Jason Gunthorpe 104f268d43 IB/uverbs: Improve lockdep_check
This is really being used as an assert that the expected usecnt
is being held and implicitly that the usecnt is valid. Rename it to
assert_uverbs_usecnt and tighten the checks to only accept valid
values of usecnt (eg 0 and < -1 are invalid).

The tigher checkes make the assertion cover more cases and is more
likely to find bugs via syzkaller/etc.

Fixes: 3832125624 ("IB/core: Add support for idr types")
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-02-15 14:59:47 -07:00
Leon Romanovsky 6623e3e3cd RDMA/uverbs: Protect from races between lookup and destroy of uobjects
The race is between lookup_get_idr_uobject and
uverbs_idr_remove_uobj -> uverbs_uobject_put.

We deliberately do not call sychronize_rcu after the idr_remove in
uverbs_idr_remove_uobj for performance reasons, instead we call
kfree_rcu() during uverbs_uobject_put.

However, this means we can obtain pointers to uobj's that have
already been released and must protect against krefing them
using kref_get_unless_zero.

==================================================================
BUG: KASAN: use-after-free in copy_ah_attr_from_uverbs.isra.2+0x860/0xa00
Read of size 4 at addr ffff88005fda1ac8 by task syz-executor2/441

CPU: 1 PID: 441 Comm: syz-executor2 Not tainted 4.15.0-rc2+ #56
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
Call Trace:
dump_stack+0x8d/0xd4
print_address_description+0x73/0x290
kasan_report+0x25c/0x370
? copy_ah_attr_from_uverbs.isra.2+0x860/0xa00
copy_ah_attr_from_uverbs.isra.2+0x860/0xa00
? uverbs_try_lock_object+0x68/0xc0
? modify_qp.isra.7+0xdc4/0x10e0
modify_qp.isra.7+0xdc4/0x10e0
ib_uverbs_modify_qp+0xfe/0x170
? ib_uverbs_query_qp+0x970/0x970
? __lock_acquire+0xa11/0x1da0
ib_uverbs_write+0x55a/0xad0
? ib_uverbs_query_qp+0x970/0x970
? ib_uverbs_query_qp+0x970/0x970
? ib_uverbs_open+0x760/0x760
? futex_wake+0x147/0x410
? sched_clock_cpu+0x18/0x180
? check_prev_add+0x1680/0x1680
? do_futex+0x3b6/0xa30
? sched_clock_cpu+0x18/0x180
__vfs_write+0xf7/0x5c0
? ib_uverbs_open+0x760/0x760
? kernel_read+0x110/0x110
? lock_acquire+0x370/0x370
? __fget+0x264/0x3b0
vfs_write+0x18a/0x460
SyS_write+0xc7/0x1a0
? SyS_read+0x1a0/0x1a0
? trace_hardirqs_on_thunk+0x1a/0x1c
entry_SYSCALL_64_fastpath+0x18/0x85
RIP: 0033:0x448e29
RSP: 002b:00007f443fee0c58 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 00007f443fee16bc RCX: 0000000000448e29
RDX: 0000000000000078 RSI: 00000000209f8000 RDI: 0000000000000012
RBP: 000000000070bea0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
R13: 0000000000008e98 R14: 00000000006ebf38 R15: 0000000000000000

Allocated by task 1:
kmem_cache_alloc_trace+0x16c/0x2f0
mlx5_alloc_cmd_msg+0x12e/0x670
cmd_exec+0x419/0x1810
mlx5_cmd_exec+0x40/0x70
mlx5_core_mad_ifc+0x187/0x220
mlx5_MAD_IFC+0xd7/0x1b0
mlx5_query_mad_ifc_gids+0x1f3/0x650
mlx5_ib_query_gid+0xa4/0xc0
ib_query_gid+0x152/0x1a0
ib_query_port+0x21e/0x290
mlx5_port_immutable+0x30f/0x490
ib_register_device+0x5dd/0x1130
mlx5_ib_add+0x3e7/0x700
mlx5_add_device+0x124/0x510
mlx5_register_interface+0x11f/0x1c0
mlx5_ib_init+0x56/0x61
do_one_initcall+0xa3/0x250
kernel_init_freeable+0x309/0x3b8
kernel_init+0x14/0x180
ret_from_fork+0x24/0x30

Freed by task 1:
kfree+0xeb/0x2f0
mlx5_free_cmd_msg+0xcd/0x140
cmd_exec+0xeba/0x1810
mlx5_cmd_exec+0x40/0x70
mlx5_core_mad_ifc+0x187/0x220
mlx5_MAD_IFC+0xd7/0x1b0
mlx5_query_mad_ifc_gids+0x1f3/0x650
mlx5_ib_query_gid+0xa4/0xc0
ib_query_gid+0x152/0x1a0
ib_query_port+0x21e/0x290
mlx5_port_immutable+0x30f/0x490
ib_register_device+0x5dd/0x1130
mlx5_ib_add+0x3e7/0x700
mlx5_add_device+0x124/0x510
mlx5_register_interface+0x11f/0x1c0
mlx5_ib_init+0x56/0x61
do_one_initcall+0xa3/0x250
kernel_init_freeable+0x309/0x3b8
kernel_init+0x14/0x180
ret_from_fork+0x24/0x30

The buggy address belongs to the object at ffff88005fda1ab0
which belongs to the cache kmalloc-32 of size 32
The buggy address is located 24 bytes inside of
32-byte region [ffff88005fda1ab0, ffff88005fda1ad0)
The buggy address belongs to the page:
page:00000000d5655c19 count:1 mapcount:0 mapping: (null)
index:0xffff88005fda1fc0
flags: 0x4000000000000100(slab)
raw: 4000000000000100 0000000000000000 ffff88005fda1fc0 0000000180550008
raw: ffffea00017f6780 0000000400000004 ffff88006c803980 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff88005fda1980: fc fc fb fb fb fb fc fc fb fb fb fb fc fc fb fb
ffff88005fda1a00: fb fb fc fc fb fb fb fb fc fc 00 00 00 00 fc fc
ffff88005fda1a80: fb fb fb fb fc fc fb fb fb fb fc fc fb fb fb fb
ffff88005fda1b00: fc fc 00 00 00 00 fc fc fb fb fb fb fc fc fb fb
ffff88005fda1b80: fb fb fc fc fb fb fb fb fc fc fb fb fb fb fc fc
==================================================================@

Cc: syzkaller <syzkaller@googlegroups.com>
Cc: <stable@vger.kernel.org> # 4.11
Fixes: 3832125624 ("IB/core: Add support for idr types")
Reported-by: Noa Osherovich <noaos@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-02-15 14:59:47 -07:00
Jason Gunthorpe d9dc7a3500 IB/uverbs: Hold the uobj write lock after allocate
This clarifies the design intention that time between allocate and
commit has the uobj exclusive to the caller. We already guarantee
this by delaying publishing the uobj pointer via idr_insert,
fd_install, list_add, etc.

Additionally holding the usecnt lock during this period provides
extra clarity and more protection against future mistakes.

Fixes: 3832125624 ("IB/core: Add support for idr types")
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-02-15 14:59:46 -07:00
Matan Barak 4d39a959bc IB/uverbs: Fix possible oops with duplicate ioctl attributes
If the same attribute is listed twice by the user in the ioctl attribute
list then error unwind can cause the kernel to deref garbage.

This happens when an object with WRITE access is sent twice. The second
parse properly fails but corrupts the state required for the error unwind
it triggers.

Fixing this by making duplicates in the attribute list invalid. This is
not something we need to support.

The ioctl interface is currently recommended to be disabled in kConfig.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-02-15 14:59:46 -07:00