Commit Graph

240 Commits

Author SHA1 Message Date
Michael S. Tsirkin
a39ee449f9 vhost: validate vhost_get_vq_desc return value
vhost fails to validate negative error code
from vhost_get_vq_desc causing
a crash: we are using -EFAULT which is 0xfffffff2
as vector size, which exceeds the allocated size.

The code in question was introduced in commit
8dd014adfe
    vhost-net: mergeable buffers support

CVE-2014-0055

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-28 16:10:35 -04:00
Michael S. Tsirkin
d8316f3991 vhost: fix total length when packets are too short
When mergeable buffers are disabled, and the
incoming packet is too large for the rx buffer,
get_rx_bufs returns success.

This was intentional in order for make recvmsg
truncate the packet and then handle_rx would
detect err != sock_len and drop it.

Unfortunately we pass the original sock_len to
recvmsg - which means we use parts of iov not fully
validated.

Fix this up by detecting this overrun and doing packet drop
immediately.

CVE-2014-0077

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-28 16:07:31 -04:00
Linus Torvalds
702256e604 Merge git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending
Pull SCSI target fixes from Nicholas Bellinger:
 "The bulk of the series are bugfixes for qla2xxx target NPIV support
  that went in for v3.14-rc1.  Also included are a few DIF related
  fixes, a qla2xxx fix (Cc'ed to stable) from Greg W., and vhost/scsi
  protocol version related fix from Venkatesh.

  Also just a heads up that a series to address a number of issues with
  iser-target active I/O reset/shutdown is still being tested, and will
  be included in a separate -rc6 PULL request"

* git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending:
  vhost/scsi: Check LUN structure byte 0 is set to 1, per spec
  qla2xxx: Fix kernel panic on selective retransmission request
  Target/sbc: Don't use sg as iterator in sbc_verify_read
  target: Add DIF sense codes in transport_generic_request_failure
  target/sbc: Fix sbc_dif_copy_prot addr offset bug
  tcm_qla2xxx: Fix NAA formatted name for NPIV WWPNs
  tcm_qla2xxx: Perform configfs depend/undepend for base_tpg
  tcm_qla2xxx: Add NPIV specific enable/disable attribute logic
  qla2xxx: Check + fail when npiv_vports_inuse exists in shutdown
  qla2xxx: Fix qlt_lport_register base_vha callback race
2014-03-01 21:33:09 -06:00
Venkatesh Srinivas
7fe412d07d vhost/scsi: Check LUN structure byte 0 is set to 1, per spec
The virtio spec requires byte 0 of the virtio-scsi LUN structure
to be '1'.

Signed-off-by: Venkatesh Srinivas <venkateshs@google.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-02-24 16:19:43 -08:00
Michael S. Tsirkin
b0c057ca7e vhost: fix a theoretical race in device cleanup
vhost_zerocopy_callback accesses VQ right after it drops a ubuf
reference.  In theory, this could race with device removal which waits
on the ubuf kref, and crash on use after free.

Do all accesses within rcu read side critical section, and synchronize
on release.

Since callbacks are always invoked from bh, synchronize_rcu_bh seems
enough and will help release complete a bit faster.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-13 18:47:30 -05:00
Michael S. Tsirkin
0ad8b480d6 vhost: fix ref cnt checking deadlock
vhost checked the counter within the refcnt before decrementing.  It
really wanted to know that it is the one that has the last reference, as
a way to batch freeing resources a bit more efficiently.

Note: we only let refcount go to 0 on device release.

This works well but we now access the ref counter twice so there's a
race: all users might see a high count and decide to defer freeing
resources.
In the end no one initiates freeing resources until the last reference
is gone (which is on VM shotdown so might happen after a looooong time).

Let's do what we probably should have done straight away:
switch from kref to plain atomic, documenting the
semantics, return the refcount value atomically after decrement,
then use that to avoid the deadlock.

Reported-by: Qin Chuanyu <qinchuanyu@huawei.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-13 18:47:30 -05:00
Linus Torvalds
4e13c5d021 Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending
Pull SCSI target updates from Nicholas Bellinger:
 "The highlights this round include:

  - add support for SCSI Referrals (Hannes)
  - add support for T10 DIF into target core (nab + mkp)
  - add support for T10 DIF emulation in FILEIO + RAMDISK backends (Sagi + nab)
  - add support for T10 DIF -> bio_integrity passthrough in IBLOCK backend (nab)
  - prep changes to iser-target for >= v3.15 T10 DIF support (Sagi)
  - add support for qla2xxx N_Port ID Virtualization - NPIV (Saurav + Quinn)
  - allow percpu_ida_alloc() to receive task state bitmask (Kent)
  - fix >= v3.12 iscsi-target session reset hung task regression (nab)
  - fix >= v3.13 percpu_ref se_lun->lun_ref_active race (nab)
  - fix a long-standing network portal creation race (Andy)"

* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (51 commits)
  target: Fix percpu_ref_put race in transport_lun_remove_cmd
  target/iscsi: Fix network portal creation race
  target: Report bad sector in sense data for DIF errors
  iscsi-target: Convert gfp_t parameter to task state bitmask
  iscsi-target: Fix connection reset hang with percpu_ida_alloc
  percpu_ida: Make percpu_ida_alloc + callers accept task state bitmask
  iscsi-target: Pre-allocate more tags to avoid ack starvation
  qla2xxx: Configure NPIV fc_vport via tcm_qla2xxx_npiv_make_lport
  qla2xxx: Enhancements to enable NPIV support for QLOGIC ISPs with TCM/LIO.
  qla2xxx: Fix scsi_host leak on qlt_lport_register callback failure
  IB/isert: pass scatterlist instead of cmd to fast_reg_mr routine
  IB/isert: Move fastreg descriptor creation to a function
  IB/isert: Avoid frwr notation, user fastreg
  IB/isert: seperate connection protection domains and dma MRs
  tcm_loop: Enable DIF/DIX modes in SCSI host LLD
  target/rd: Add DIF protection into rd_execute_rw
  target/rd: Add support for protection SGL setup + release
  target/rd: Refactor rd_build_device_space + rd_release_device_space
  target/file: Add DIF protection support to fd_execute_rw
  target/file: Add DIF protection init/format support
  ...
2014-01-31 15:31:23 -08:00
Kent Overstreet
6f6b5d1ec5 percpu_ida: Make percpu_ida_alloc + callers accept task state bitmask
This patch changes percpu_ida_alloc() + callers to accept task state
bitmask for prepare_to_wait() for code like target/iscsi that needs
it for interruptible sleep, that is provided in a subsequent patch.

It now expects TASK_UNINTERRUPTIBLE when the caller is able to sleep
waiting for a new tag, or TASK_RUNNING when the caller cannot sleep,
and is forced to return a negative value when no tags are available.

v2 changes:
  - Include blk-mq + tcm_fc + vhost/scsi + target/iscsi changes
  - Drop signal_pending_state() call
v3 changes:
  - Only call prepare_to_wait() + finish_wait() when != TASK_RUNNING
    (PeterZ)

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Cc: <stable@vger.kernel.org> #3.12+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-01-23 20:17:18 +00:00
Nicholas Bellinger
def2b339b4 target: Add protection SGLs to target_submit_cmd_map_sgls
This patch adds support to target_submit_cmd_map_sgls() for
accepting 'sgl_prot' + 'sgl_prot_count' parameters for
DIF protection information.

Note the passed parameters are stored at se_cmd->t_prot_sg
and se_cmd->t_prot_nents respectively.

Also, update tcm_loop and vhost-scsi fabrics usage of
target_submit_cmd_map_sgls() to take into account the
new parameters.

Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Sagi Grimberg <sagig@mellanox.com>
Cc: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-01-18 09:58:09 +00:00
Zhi Yong Wu
59566b6e8c vhost: remove the dead branch
Since vhost_dev_init() forever return 0, some branches are never run,
therefore need to be removed.

Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-06 15:22:05 -05:00
Linus Torvalds
b0e3636f65 Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending
Pull SCSI target updates from Nicholas Bellinger:
 "Things have been quiet this round with mostly bugfixes, percpu
  conversions, and other minor iscsi-target conformance testing changes.

  The highlights include:

   - Add demo_mode_discovery attribute for iscsi-target (Thomas)
   - Convert tcm_fc(FCoE) to use percpu-ida pre-allocation
   - Add send completion interrupt coalescing for ib_isert
   - Convert target-core to use percpu-refcounting for se_lun
   - Fix mutex_trylock usage bug in iscsit_increment_maxcmdsn
   - tcm_loop updates (Hannes)
   - target-core ALUA cleanups + prep for v3.14 SCSI Referrals support (Hannes)

  v3.14 is currently shaping to be a busy development cycle in target
  land, with initial support for T10 Referrals and T10 DIF currently on
  the roadmap"

* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (40 commits)
  iscsi-target: chap auth shouldn't match username with trailing garbage
  iscsi-target: fix extract_param to handle buffer length corner case
  iscsi-target: Expose default_erl as TPG attribute
  target_core_configfs: split up ALUA supported states
  target_core_alua: Make supported states configurable
  target_core_alua: Store supported ALUA states
  target_core_alua: Rename ALUA_ACCESS_STATE_OPTIMIZED
  target_core_alua: spellcheck
  target core: rename (ex,im)plict -> (ex,im)plicit
  percpu-refcount: Add percpu-refcount.o to obj-y
  iscsi-target: Do not reject non-immediate CmdSNs exceeding MaxCmdSN
  iscsi-target: Convert iscsi_session statistics to atomic_long_t
  target: Convert se_device statistics to atomic_long_t
  target: Fix delayed Task Aborted Status (TAS) handling bug
  iscsi-target: Reject unsupported multi PDU text command sequence
  ib_isert: Avoid duplicate iscsit_increment_maxcmdsn call
  iscsi-target: Fix mutex_trylock usage in iscsit_increment_maxcmdsn
  target: Core does not need blkdev.h
  target: Pass through I/O topology for block backstores
  iser-target: Avoid using FRMR for single dma entry requests
  ...
2013-11-22 10:52:03 -08:00
Nicholas Bellinger
60a01f558a vhost/scsi: Fix incorrect usage of get_user_pages_fast write parameter
This patch addresses a long-standing bug where the get_user_pages_fast()
write parameter used for setting the underlying page table entry permission
bits was incorrectly set to write=1 for data_direction=DMA_TO_DEVICE, and
passed into get_user_pages_fast() via vhost_scsi_map_iov_to_sgl().

However, this parameter is intended to signal WRITEs to pinned userspace
PTEs for the virtio-scsi DMA_FROM_DEVICE -> READ payload case, and *not*
for the virtio-scsi DMA_TO_DEVICE -> WRITE payload case.

This bug would manifest itself as random process segmentation faults on
KVM host after repeated vhost starts + stops and/or with lots of vhost
endpoints + LUNs.

Cc: Stefan Hajnoczi <stefanha@redhat.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Asias He <asias@redhat.com>
Cc: <stable@vger.kernel.org> # 3.6+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-10-25 11:03:34 -07:00
Andy Grover
d80e224dd5 target: Remove TF_CIT_TMPL macro
Remove a lingering macro that just hid a dereference.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andy Grover <agrover@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-10-16 13:35:02 -07:00
Nicholas Bellinger
4a47d3a1ff vhost/scsi: Use GFP_ATOMIC with percpu_ida_alloc for obtaining tag
Fix GFP_KERNEL -> GFP_ATOMIC usage of percpu_ida_alloc() within
vhost_scsi_get_tag(), as this code is expected to be called directly
from interrupt context.

v2 changes:

  - Handle possible tag < 0 failure with GFP_ATOMIC

Acked-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Asias He <asias@redhat.com>
Cc: Kent Overstreet <kmo@daterainc.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-10-01 21:27:31 -07:00
Michael S. Tsirkin
d3d665a654 vhost-scsi: whitespace tweak
Remove space at start of line that sneaked in.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2013-09-17 22:56:09 +03:00
Michael S. Tsirkin
595cb75498 vhost/scsi: use vmalloc for order-10 allocation
As vhost scsi device struct is large, if the device is
created on a busy system, kzalloc() might fail, so this patch does a
fallback to vzalloc().

As vmalloc() adds overhead on data-path, add __GFP_REPEAT
to kzalloc() flags to do this fallback only when really needed.

Reviewed-by: Asias He <asias@redhat.com>
Reported-by: Dan Aloni <alonid@stratoscale.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2013-09-17 22:55:46 +03:00
Qin Chuanyu
ac9fde2474 vhost: wake up worker outside spin_lock
the wake_up_process func is included by spin_lock/unlock in
vhost_work_queue,
but it could be done outside the spin_lock.
I have test it with kernel 3.0.27 and guest suse11-sp2 using iperf,
the num as below.
                  original                 modified
thread_num  tp(Gbps)   vhost(%)  |  tp(Gbps)     vhost(%)
1           9.59        28.82    |   9.59        27.49
8           9.61        32.92    |   9.62        26.77
64          9.58        46.48    |   9.55        38.99
256         9.6         63.7     |   9.6         52.59

Signed-off-by: Chuanyu Qin <qinchuanyu@huawei.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2013-09-17 09:21:32 +03:00
Linus Torvalds
48efe453e6 Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending
Pull SCSI target updates from Nicholas Bellinger:
 "Lots of activity again this round for I/O performance optimizations
  (per-cpu IDA pre-allocation for vhost + iscsi/target), and the
  addition of new fabric independent features to target-core
  (COMPARE_AND_WRITE + EXTENDED_COPY).

  The main highlights include:

   - Support for iscsi-target login multiplexing across individual
     network portals
   - Generic Per-cpu IDA logic (kent + akpm + clameter)
   - Conversion of vhost to use per-cpu IDA pre-allocation for
     descriptors, SGLs and userspace page pointer list
   - Conversion of iscsi-target + iser-target to use per-cpu IDA
     pre-allocation for descriptors
   - Add support for generic COMPARE_AND_WRITE (AtomicTestandSet)
     emulation for virtual backend drivers
   - Add support for generic EXTENDED_COPY (CopyOffload) emulation for
     virtual backend drivers.
   - Add support for fast memory registration mode to iser-target (Vu)

  The patches to add COMPARE_AND_WRITE and EXTENDED_COPY support are of
  particular significance, which make us the first and only open source
  target to support the full set of VAAI primitives.

  Currently Linux clients are lacking upstream support to actually
  utilize these primitives.  However, with server side support now in
  place for folks like MKP + ZAB working on the client, this logic once
  reserved for the highest end of storage arrays, can now be run in VMs
  on their laptops"

* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (50 commits)
  target/iscsi: Bump versions to v4.1.0
  target: Update copyright ownership/year information to 2013
  iscsi-target: Bump default TCP listen backlog to 256
  target: Fix >= v3.9+ regression in PR APTPL + ALUA metadata write-out
  iscsi-target; Bump default CmdSN Depth to 64
  iscsi-target: Remove unnecessary wait_for_completion in iscsi_get_thread_set
  iscsi-target: Add thread_set->ts_activate_sem + use common deallocate
  iscsi-target: Fix race with thread_pre_handler flush_signals + ISCSI_THREAD_SET_DIE
  target: remove unused including <linux/version.h>
  iser-target: introduce fast memory registration mode (FRWR)
  iser-target: generalize rdma memory registration and cleanup
  iser-target: move rdma wr processing to a shared function
  target: Enable global EXTENDED_COPY setup/release
  target: Add Third Party Copy (3PC) bit in INQUIRY response
  target: Enable EXTENDED_COPY setup in spc_parse_cdb
  target: Add support for EXTENDED_COPY copy offload emulation
  target: Avoid non-existent tg_pt_gp_mem in target_alua_state_check
  target: Add global device list for EXTENDED_COPY
  target: Make helpers non static for EXTENDED_COPY command setup
  target: Make spc_parse_naa_6h_vendor_specific non static
  ...
2013-09-12 16:11:45 -07:00
Nicholas Bellinger
4c76251e8e target: Update copyright ownership/year information to 2013
Update copyright ownership/year information for target-core,
loopback, iscsi-target, tcm_qla2xx, vhost and iser-target.

Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-09-10 20:23:36 -07:00
Nicholas Bellinger
3aee26b4ae vhost/scsi: Add pre-allocation for tv_cmd SGL + upages memory
This patch adds support for pre-allocation of per tv_cmd descriptor
scatterlist + user-space page pointer memory using se_sess->sess_cmd_map
within tcm_vhost_make_nexus() code.

This includes sanity checks within vhost_scsi_map_to_sgl()
to reject I/O that exceeds these initial hardcoded values, and
the necessary cleanup in tcm_vhost_make_nexus() failure path +
tcm_vhost_drop_nexus().

v3 changes:
  - Rebase to v3.11-rc5 code

Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Asias He <asias@redhat.com>
Cc: Kent Overstreet <kmo@daterainc.com>
Reviewed-by: Asias He <asias@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-09-09 14:29:19 -07:00
Nicholas Bellinger
4824d3bfb9 vhost/scsi: Convert to per-cpu ida_alloc + ida_free command map
This patch changes vhost/scsi to use transport_init_session_tags()
pre-allocation logic for per-cpu session tag pooling with internal
ida_alloc() + ida_free() calls based upon the saved se_cmd->map_tag id.

FIXME: Make transport_init_session_tags() number of tags setup
configurable per vring client setting via configfs

v5 changes:
 - Convert to percpu_ida.h include

v3 changes:
 - Update to percpu-ida usage
 - Rebase to v3.11-rc5 code

Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Asias He <asias@redhat.com>
Cc: Kent Overstreet <kmo@daterainc.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-09-09 14:29:18 -07:00
Jason Wang
f7c6be404d vhost_net: correctly limit the max pending buffers
As Michael point out, We used to limit the max pending DMAs to get better cache
utilization. But it was not done correctly since it was one done when there's no
new buffers submitted from guest. Guest can easily exceeds the limitation by
keeping sending packets.

So this patch moves the check into main loop. Tests shows about 5%-10%
improvement on per cpu throughput for guest tx.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-03 22:46:58 -04:00
Jason Wang
19c73b3e08 vhost_net: poll vhost queue after marking DMA is done
We used to poll vhost queue before making DMA is done, this is racy if vhost
thread were waked up before marking DMA is done which can result the signal to
be missed. Fix this by always polling the vhost thread before DMA is done.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-03 22:46:57 -04:00
Jason Wang
ce21a02913 vhost_net: determine whether or not to use zerocopy at one time
Currently, even if the packet length is smaller than VHOST_GOODCOPY_LEN, if
upend_idx != done_idx we still set zcopy_used to true and rollback this choice
later. This could be avoided by determining zerocopy once by checking all
conditions at one time before.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-03 22:46:57 -04:00
Jason Wang
c49e4e573b vhost: switch to use vhost_add_used_n()
Let vhost_add_used() to use vhost_add_used_n() to reduce the code
duplication. To avoid the overhead brought by __copy_to_user(). We will use
put_user() when one used need to be added.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-03 22:46:57 -04:00
Jason Wang
c92112aed3 vhost_net: use vhost_add_used_and_signal_n() in vhost_zerocopy_signal_used()
We tend to batch the used adding and signaling in vhost_zerocopy_callback()
which may result more than 100 used buffers to be updated in
vhost_zerocopy_signal_used() in some cases. So switch to use
vhost_add_used_and_signal_n() to avoid multiple calls to
vhost_add_used_and_signal(). Which means much less times of used index
updating and memory barriers.

2% performance improvement were seen on netperf TCP_RR test.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-03 22:46:57 -04:00
Jason Wang
094afe7d55 vhost_net: make vhost_zerocopy_signal_used() return void
None of its caller use its return value, so let it return void.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-03 22:46:57 -04:00
Asias He
35596b2796 vhost: Include linux/uio.h instead of linux/socket.h
memcpy_fromiovec is moved from net/core/iovec.c to lib/iovec.c.
linux/uio.h provides the declaration for memcpy_fromiovec.

Include linux/uio.h instead of inux/socket.h for it.

Signed-off-by: Asias He <asias@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-20 15:05:04 -07:00
Linus Torvalds
a030cbc350 vhost: more fixes for 3.11
This includes some fixes for vhost net and scsi drivers.
 The test module has already been reworked to avoid rcu
 usage, but the necessary core changes are missing,
 we fixed this.
 Unlikely to affect any real-world users, but it's
 early in the cycle so, let's merge them.
 
 Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.13 (GNU/Linux)
 
 iQEcBAABAgAGBQJR3qu5AAoJECgfDbjSjVRp66UIAKZFjSnjWBhQbqI+LkrrUJ+H
 Zv4xnEgTU+5k7zLpArqhDABbWMPVUSHm91ecjS/OX9WWsXVYG2jAT+PK6F3PrexK
 BpUA3RDX8a5drXleidhwGf2dHmsBhTTwZyn31h/pvKAiM27kVjFOS0S9Rhnpu/0x
 UI/z/s//CZ/kNKbCvYVqu0+KjbrwDiRD1N56hGDLqwsVoFIn1sP9CBIg/PHCqebK
 GnPSidTFdRu4yvFvhwJqQ3RgfpCxbgBc619knWr8XX4PVoH25NdpLZ4yHrR58Ms0
 Hb2IJ5q9bOGf0hbLFKOeHO29uVyOtTGGq0UG7NnfS1aToIT/Zw2kYICV5kBlSvs=
 =XGuu
 -----END PGP SIGNATURE-----

Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost

Pull vhost fixes from Michael Tsirkin:
 "vhost: more fixes for 3.11

  This includes some fixes for vhost net and scsi drivers.

  The test module has already been reworked to avoid rcu usage, but the
  necessary core changes are missing, we fixed this.

  Unlikely to affect any real-world users, but it's early in the cycle
  so, let's merge them"

(It was earlier when Michael originally sent the email, but it somehot
got missed in the flood, so here it is after -rc2)

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
  vhost: Remove custom vhost rcu usage
  vhost-scsi: Always access vq->private_data under vq mutex
  vhost-net: Always access vq->private_data under vq mutex
2013-07-23 14:38:20 -07:00
Linus Torvalds
6d2fa9e141 Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending
Pull SCSI target updates from Nicholas Bellinger:
 "Lots of activity this round on performance improvements in target-core
  while benchmarking the prototype scsi-mq initiator code with
  vhost-scsi fabric ports, along with a number of iscsi/iser-target
  improvements and hardening fixes for exception path cases post v3.10
  merge.

  The highlights include:

   - Make persistent reservations APTPL buffer allocated on-demand, and
     drop per t10_reservation buffer.  (grover)
   - Make virtual LUN=0 a NULLIO device, and skip allocation of NULLIO
     device pages (grover)
   - Add transport_cmd_check_stop write_pending bit to avoid extra
     access of ->t_state_lock is WRITE I/O submission fast-path.  (nab)
   - Drop unnecessary CMD_T_DEV_ACTIVE check from
     transport_lun_remove_cmd to avoid extra access of ->t_state_lock in
     release fast-path.  (nab)
   - Avoid extra t_state_lock access in __target_execute_cmd fast-path
     (nab)
   - Drop unnecessary vhost-scsi wait_for_tasks=true usage +
     ->t_state_lock access in release fast-path.  (nab)
   - Convert vhost-scsi to use modern se_cmd->cmd_kref
     TARGET_SCF_ACK_KREF usage (nab)
   - Add tracepoints for SCSI commands being processed (roland)
   - Refactoring of iscsi-target handling of ISCSI_OP_NOOP +
     ISCSI_OP_TEXT to be transport independent (nab)
   - Add iscsi-target SendTargets=$IQN support for in-band discovery
     (nab)
   - Add iser-target support for in-band discovery (nab + Or)
   - Add iscsi-target demo-mode TPG authentication context support (nab)
   - Fix isert_put_reject payload buffer post (nab)
   - Fix iscsit_add_reject* usage for iser (nab)
   - Fix iscsit_sequence_cmd reject handling for iser (nab)
   - Fix ISCSI_OP_SCSI_TMFUNC handling for iser (nab)
   - Fix session reset bug with RDMA_CM_EVENT_DISCONNECTED (nab)

  The last five iscsi/iser-target items are CC'ed to stable, as they do
  address issues present in v3.10 code.  They are certainly larger than
  I'd like for stable patch set, but are important to ensure proper
  REJECT exception handling in iser-target for 3.10.y"

* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (51 commits)
  iser-target: Ignore non TEXT + LOGOUT opcodes for discovery
  target: make queue_tm_rsp() return void
  target: remove unused codes from enum tcm_tmrsp_table
  iscsi-target: kstrtou* configfs attribute parameter cleanups
  iscsi-target: Fix tfc_tpg_auth_cit configfs length overflow
  iscsi-target: Fix tfc_tpg_nacl_auth_cit configfs length overflow
  iser-target: Add support for ISCSI_OP_TEXT opcode + payload handling
  iser-target: Rename sense_buf_[dma,len] to pdu_[dma,len]
  iser-target: Add vendor_err debug output
  target: Add (obsolete) checking for PMI/LBA fields in READ CAPACITY(10)
  target: Return correct sense data for IO past the end of a device
  target: Add tracepoints for SCSI commands being processed
  iser-target: Fix session reset bug with RDMA_CM_EVENT_DISCONNECTED
  iscsi-target: Fix ISCSI_OP_SCSI_TMFUNC handling for iser
  iscsi-target: Fix iscsit_sequence_cmd reject handling for iser
  iscsi-target: Fix iscsit_add_reject* usage for iser
  iser-target: Fix isert_put_reject payload buffer post
  iscsi-target: missing kfree() on error path
  iscsi-target: Drop left-over iscsi_conn->bad_hdr
  target: Make core_scsi3_update_and_write_aptpl return sense_reason_t
  ...
2013-07-11 12:57:19 -07:00
Asias He
22fa90c7fb vhost: Remove custom vhost rcu usage
Now, vq->private_data is always accessed under vq mutex. No need to play
the vhost rcu trick.

Signed-off-by: Asias He <asias@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2013-07-11 15:38:40 +03:00
Asias He
e7802212ea vhost-scsi: Always access vq->private_data under vq mutex
Signed-off-by: Asias He <asias@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2013-07-11 15:37:28 +03:00
Asias He
2e26af79b7 vhost-net: Always access vq->private_data under vq mutex
Signed-off-by: Asias He <asias@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2013-07-11 15:36:45 +03:00
Joern Engel
b79fafac70 target: make queue_tm_rsp() return void
The return value wasn't checked by any of the callers.  Assuming this is
correct behaviour, we can simplify some code by not bothering to
generate it.

nab: Add srpt_queue_data_in() + srpt_queue_tm_rsp() nops around
     srpt_queue_response() void return

Signed-off-by: Joern Engel <joern@logfs.org>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-07-07 18:36:53 -07:00
Michael S. Tsirkin
09a34c8404 vhost/test: update test after vhost cleanups
We changed the type of dev.vqs field, update test
module accordingly.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2013-07-07 18:02:25 +03:00
Asias He
0a1febf7ba vhost: Make local function static
$ make C=1 M=drivers/vhost

drivers/vhost/net.c:168:5: warning: symbol 'vhost_net_set_ubuf_info' was not declared. Should it be static?
drivers/vhost/net.c:194:6: warning: symbol 'vhost_net_vq_reset' was not declared. Should it be static?
drivers/vhost/scsi.c:219:6: warning: symbol 'tcm_vhost_done_inflight' was not declared. Should it be static?

Signed-off-by: Asias He <asias@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2013-07-07 17:34:07 +03:00
Asias He
6ac1afbf61 vhost: Make vhost a separate module
Currently, vhost-net and vhost-scsi are sharing the vhost core code.
However, vhost-scsi shares the code by including the vhost.c file
directly.

Making vhost a separate module makes it is easier to share code with
other vhost devices.

Signed-off-by: Asias He <asias@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2013-07-07 17:33:44 +03:00
Asias He
3c63f66a0d vhost-scsi: Rename struct tcm_vhost_cmd *tv_cmd to *cmd
This way, we use cmd for struct tcm_vhost_cmd and evt for struct
tcm_vhost_cmd.

Signed-off-by: Asias He <asias@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2013-07-07 14:38:58 +03:00
Asias He
9871831283 vhost-scsi: Rename struct tcm_vhost_tpg *tv_tpg to *tpg
Signed-off-by: Asias He <asias@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2013-07-07 14:38:57 +03:00
Asias He
683bd967dd vhost-scsi: Make func indention more consistent
Signed-off-by: Asias He <asias@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2013-07-07 14:38:55 +03:00
Asias He
c7289312fe vhost-scsi: Rename struct vhost_scsi *s to *vs
vs is used everywhere, make the naming more consistent.

Signed-off-by: Asias He <asias@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2013-07-07 14:38:53 +03:00
Asias He
deeacef086 vhost-scsi: Remove unnecessary forward struct vhost_scsi declaration
It was needed when struct tcm_vhost_tpg is in tcm_vhost.h

Signed-off-by: Asias He <asias@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2013-07-07 14:38:50 +03:00
Asias He
6d5e6aa860 vhost: Simplify dev->vqs[i] access
Signed-off-by: Asias He <asias@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2013-07-07 14:38:26 +03:00
Michael S. Tsirkin
c38e39c378 vhost-net: fix use-after-free in vhost_net_flush
vhost_net_ubuf_put_and_wait has a confusing name:
it will actually also free it's argument.
Thus since commit 1280c27f8e
    "vhost-net: flush outstanding DMAs on memory change"
vhost_net_flush tries to use the argument after passing it
to vhost_net_ubuf_put_and_wait, this results
in use after free.
To fix, don't free the argument in vhost_net_ubuf_put_and_wait,
add an new API for callers that want to free ubufs.

Acked-by: Asias He <asias@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2013-07-07 14:06:22 +03:00
Nicholas Bellinger
084ed45b38 vhost/scsi: Convert to se_cmd->cmd_kref TARGET_SCF_ACK_KREF usage
This patch coverts vhost/scsi to se_cmd->cmd_kref TARGET_SCF_ACK_KREF
usage, instead of assuming that vhost_scsi_free_cmd() is always called
before TCM processing is completed in the response fast path.

This includes adding vhost_scsi_check_stop_free() -> target_put_sess_cmd()
to perform the second se_cmd->cmd_kref put, and moving vhost_scsi_free_cmd()
resource release into tcm_vhost_release_cmd() that is invoked once the last
se_cmd->cmd_kref put occurs.

Cc: Christoph Hellwig <hch@lst.de>
Cc: Roland Dreier <roland@kernel.org>
Cc: Kent Overstreet <koverstreet@google.com>
Cc: Asias He <asias@redhat.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Or Gerlitz <ogerlitz@mellanox.com>
Cc: Moussa Ba <moussaba@micron.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-06-20 14:10:47 -07:00
Nicholas Bellinger
6c131d0c58 vhost/scsi: Drop unnecessary wait_for_tasks=true usage with transport_generic_free_cmd
This patch changes vhost_scsi_free_cmd() to call transport_generic_free_cmd()
with wait_for_tasks=false in order to avoid the extra se_cmd->t_state_lock
access for the wait_for_tasks=true case.

This is unnecessary because vhost_scsi_free_cmd() is only ever called by
vhost_scsi_complete_cmd_work() after TCM completion handoff, and by
vhost_scsi_handle_vq() exception code before TCM submission handoff, so
there is never a case where se_cmd is still active from TCM's perspective
when transport_generic_free_cmd() is called.

Cc: Christoph Hellwig <hch@lst.de>
Cc: Roland Dreier <roland@kernel.org>
Cc: Kent Overstreet <koverstreet@google.com>
Cc: Asias He <asias@redhat.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Or Gerlitz <ogerlitz@mellanox.com>
Cc: Moussa Ba <moussaba@micron.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-06-20 14:10:46 -07:00
Michael S. Tsirkin
288cfe78c8 vhost: fix ubuf_info cleanup
vhost_net_clear_ubuf_info didn't clear ubuf_info
after kfree, this could trigger double free.
Fix this and simplify this code to make it more robust: make sure
ubuf info is always freed through vhost_net_clear_ubuf_info.

Reported-by: Tommi Rantala <tt.rantala@gmail.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-11 02:46:21 -07:00
Michael S. Tsirkin
05c0535194 vhost: check owner before we overwrite ubuf_info
If device has an owner, we shouldn't touch ubuf_info
since it might be in use.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-11 02:46:21 -07:00
Jason Wang
4364d5f96e vhost_net: clear msg.control for non-zerocopy case during tx
When we decide not use zero-copy, msg.control should be set to NULL otherwise
macvtap/tap may set zerocopy callbacks which may decrease the kref of ubufs
wrongly.

Bug were introduced by commit cedb9bdce0
(vhost-net: skip head management if no outstanding).

This solves the following warnings:

WARNING: at include/linux/kref.h:47 handle_tx+0x477/0x4b0 [vhost_net]()
Modules linked in: vhost_net macvtap macvlan tun nfsd exportfs bridge stp llc openvswitch kvm_amd kvm bnx2 megaraid_sas [last unloaded: tun]
CPU: 5 PID: 8670 Comm: vhost-8668 Not tainted 3.10.0-rc2+ #1566
Hardware name: Dell Inc. PowerEdge R715/00XHKG, BIOS 1.5.2 04/19/2011
ffffffffa0198323 ffff88007c9ebd08 ffffffff81796b73 ffff88007c9ebd48
ffffffff8103d66b 000000007b773e20 ffff8800779f0000 ffff8800779f43f0
ffff8800779f8418 000000000000015c 0000000000000062 ffff88007c9ebd58
Call Trace:
[<ffffffff81796b73>] dump_stack+0x19/0x1e
[<ffffffff8103d66b>] warn_slowpath_common+0x6b/0xa0
[<ffffffff8103d6b5>] warn_slowpath_null+0x15/0x20
[<ffffffffa0197627>] handle_tx+0x477/0x4b0 [vhost_net]
[<ffffffffa0197690>] handle_tx_kick+0x10/0x20 [vhost_net]
[<ffffffffa019541e>] vhost_worker+0xfe/0x1a0 [vhost_net]
[<ffffffffa0195320>] ? vhost_attach_cgroups_work+0x30/0x30 [vhost_net]
[<ffffffffa0195320>] ? vhost_attach_cgroups_work+0x30/0x30 [vhost_net]
[<ffffffff81061f46>] kthread+0xc6/0xd0
[<ffffffff81061e80>] ? kthread_freezable_should_stop+0x70/0x70
[<ffffffff817a1aec>] ret_from_fork+0x7c/0xb0
[<ffffffff81061e80>] ? kthread_freezable_should_stop+0x70/0x70

Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-10 14:31:45 -07:00
Dave Jones
f558a845c3 Add missing module license tag to vring helpers.
[  624.286653] vringh: module license 'unspecified' taints kernel.

Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2013-05-08 10:49:03 +09:30