This impacts both BMDMA and AHCI HBA interfaces for IDE.
Currently, we confuse the difference between a PRDT having
"0 bytes" and a PRDT having "0 complete sectors."
When we receive an incomplete sector, inconsistent error checking
leads to an infinite loop wherein the call succeeds, but it
didn't give us enough bytes -- leading us to re-call the
DMA chain over and over again. This leads to, in the BMDMA case,
leaked memory for short PRDTs, and infinite loops and resource
usage in the AHCI case.
The .prepare_buf() callback is reworked to return the number of
bytes that it successfully prepared. 0 is a valid, non-error
answer that means the table was empty and described no bytes.
-1 indicates an error.
Our current implementation uses the io_buffer in IDEState to
ultimately describe the size of a prepared scatter-gather list.
Even though the AHCI PRDT/SGList can be as large as 256GiB, the
AHCI command header limits transactions to just 4GiB. ATA8-ACS3,
however, defines the largest transaction to be an LBA48 command
that transfers 65,536 sectors. With a 512 byte sector size, this
is just 32MiB.
Since our current state structures use the int type to describe
the size of the buffer, and this state is migrated as int32, we
are limited to describing 2GiB buffer sizes unless we change the
migration protocol.
For this reason, this patch begins to unify the assertions in the
IDE pathways that the scatter-gather list provided by either the
AHCI PRDT or the PCI BMDMA PRDs can only describe, at a maximum,
2GiB. This should be resilient enough unless we need a sector
size that exceeds 32KiB.
Further, the likelihood of any guest operating system actually
attempting to transfer this much data in a single operation is
very slim.
To this end, the IDEState variables have been updated to more
explicitly clarify our maximum supported size. Callers to the
prepare_buf callback have been reworked to understand the new
return code, and all versions of the prepare_buf callback have
been adjusted accordingly.
Lastly, the ahci_populate_sglist helper, relied upon by the
AHCI implementation of .prepare_buf() as well as the PCI
implementation of the callback have had overflow assertions
added to help make clear the reasonings behind the various
type changes.
[Added %d -> %"PRId64" fix John sent because off_pos changed from int to
int64_t.
--Stefan]
Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 1414785819-26209-4-git-send-email-jsnow@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Device models should access their block backends only through the
block-backend.h API. Convert them, and drop direct includes of
inappropriate headers.
Just four uses of BlockDriverState are left:
* The Xen paravirtual block device backend (xen_disk.c) opens images
itself when set up via xenbus, bypassing blockdev.c. I figure it
should go through qmp_blockdev_add() instead.
* Device model "usb-storage" prompts for keys. No other device model
does, and this one probably shouldn't do it, either.
* ide_issue_trim_cb() uses bdrv_aio_discard() instead of
blk_aio_discard() because it fishes its backend out of a BlockAIOCB,
which has only the BlockDriverState.
* PC87312State has an unused BlockDriverState[] member.
The next two commits take care of the latter two.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
I'll use it with block backends shortly, and the name is going to fit
badly there. It's a block layer thing anyway, not just a block driver
thing.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
They are not used by AHCI, and should not be even available there.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: John Snow <jsnow@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
It is now called only after the set_inactive callback. Put the two together.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: John Snow <jsnow@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Drop the unused return value and make the callback optional.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: John Snow <jsnow@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Drop the unused return value and make the callback optional.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: John Snow <jsnow@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Drop the unused return value and make the callback optional.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: John Snow <jsnow@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
After previous Peter patch, they are redundant. This way we don't
assign them except when needed. Once there, there were lots of case
where the ".fields" indentation was wrong:
.fields = (VMStateField []) {
and
.fields = (VMStateField []) {
Change all the combinations to:
.fields = (VMStateField[]){
The biggest problem (appart from aesthetics) was that checkpatch complained
when we copy&pasted the code from one place to another.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Acked-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
This fixes a warning from the static code analysis (smatch).
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
The device is supposed to reset the Bus Master IDE Active bit in the
status register when 0 is written to the Start/Stop Bus Master bit in
the command register.
In the common cases this happens automatically because bdrv_drain_all()
flushes the requests, but with a large PRDT it could remain set.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Many of these should be cleaned up with proper qdev-/QOM-ification.
Right now there are many catch-all headers in include/hw/ARCH depending
on cpu.h, and this makes it necessary to compile these files per-target.
However, fixing this does not belong in these patches.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The Bus Master IDE Active bit (BM_STATUS_DMAING) is not only set when
the request is still in flight, but also when it has completed and the
size of the physical memory regions in the PRDT was larger than the
transfer size.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
target_phys_addr_t is unwieldly, violates the C standard (_t suffixes are
reserved) and its purpose doesn't match the name (most target_phys_addr_t
addresses are not target specific). Replace it with a finger-friendly,
standards conformant hwaddr.
Outstanding patchsets can be fixed up with the command
git rebase -i --exec 'find -name "*.[ch]"
| xargs s/target_phys_addr_t/hwaddr/g' origin
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Do this while we are touching this part of the code, before introducing
more uses of "int is_read".
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Back when I made patches introducing dma_addr_t and various PCI DMA
wrapper functions, I made a mistake. The bmdma_addr_{read,write} functions
need to take target_phys_addr_t not dma_addr_t, since they are assigned
to MemoryRegionOps callbacks.
This patch corrects my error.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
This patch removes some unnecessary casts in the PCI IDE device,
introduced by commit 552908fef5
'PCI IDE: Use PCI DMA stub functions'.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Many places in QEMU call qemu_aio_flush() to complete all pending
asynchronous I/O. Most of these places actually want to drain all block
requests but there is no block layer API to do so.
This patch introduces the bdrv_drain_all() API to wait for requests
across all BlockDriverStates to complete. As a bonus we perform checks
after qemu_aio_wait() to ensure that requests really have finished.
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This updates the PCI IDE device emulation to use the explicit PCI DMA
wrapper to initialize its scatter/gathjer structure. This means this
driver should not need further changes when the sglist interface is
extended to support IOMMUs.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Today, when notifying a VM state change with vm_state_notify(),
we pass a VMSTOP macro as the 'reason' argument. This is not ideal
because the VMSTOP macros tell why qemu stopped and not exactly
what the current VM state is.
One example to demonstrate this problem is that vm_start() calls
vm_state_notify() with reason=0, which turns out to be VMSTOP_USER.
This commit fixes that by replacing the VMSTOP macros with a proper
state type called RunState.
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Including it in device models is unclean, including it without a
reason adds insult to injury.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Clearing the error status flag was missing for restarting flushes. Now that the
error status is separate from the BM status register, we can simply set it to 0
after restarting the request. This ensures that we never forget to clear a bit.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Add support for TRIM sub function of the data set management command,
and wire it up to the qemu discard infrastructure.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Replace the is_read flag with a dma_cmd flag to allow the dma and
restart logic to handler other commands like TRIM.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
When adding the werror=stop mode, some flags were added to s->status
which are used to determine what kind of operation should be restarted
when the VM is continued.
Unfortunately, it turns out that s->status is in fact a device register
and as such is visible to the guest (some of the abused bits are even
writable for the guest).
For migration we keep on using the old VMState field (renamed to
migration_compat_status) if the status register doesn't use any of the
previously abused bits. If it does, we use a subsection with a clean copy of
the status register.
The error status is always sent in a subsection if there is any error. It can't
use the old field because errors happen even without PCI.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
These printfs aren't really debug messages, but clearly indicate a bug if they
ever become effective. Noone uses DEBUG_IDE, let's re-enable the check
unconditionally and make it an assertion instead of printfs in the device
emulation.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
With bm == NULL, other code in the same function would crash.
This bug was reported by cppcheck:
hw/ide/pci.c:280: error: Possible null pointer dereference: bm
Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Stefan Weil <weil@mail.berlios.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Factor the DMA I/O path that is duplicated between read and write
commands, into common helpers using the s->is_read flag added for
the macio ATA controller.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The ATA core is currently heavily intertwined with BMDMA code. Let's loosen
that a bit, so we can happily replace the DMA backend with different
implementations.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Whenever SSBM is reset in the command register all state information is lost.
Restarting DMA means that current_addr must be reset to the base address of the
PRD table. The OS is not required to change the base address register before
starting a DMA operation, it can reuse the value it wrote for an earlier
request.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
You can only start a DMA transfer if it's not running yet, and you can only
cancel it if it's running.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
The reason for not actually canceling the I/O is because with
virtualization and lots of VM running, a guest fs may mistake a
overload of the host, as an IDE timeout. So rather than canceling the
I/O, it's safer to wait I/O completion and simulate that the I/O has
completed just before the io cancellation was requested by the
guest. This way if ntfs or an app writes data without checking for
-EIO retval, and it thinks the write has succeeded, it's less likely
to run into troubles. Similar issues for reads.
Furthermore because the DMA operation is splitted into many synchronous
aio_read/write if there's more than one entry in the SG table, without this
patch the DMA would be cancelled in the middle, something we've no idea if it
happens on real hardware too or not. Overall this seems a great risk for zero
gain.
This approach is sure safer than previous code given we can't pretend all guest
fs code out there to check for errors and reply the DMA if it was completed
partially, given a timeout would never materialize on a real harddisk unless
there are defective blocks (and defective blocks are practically only an issue
for reads never for writes in any recent hardware as writing to blocks is the
way to fix them) or the harddisk breaks as a whole.
Signed-off-by: Izik Eidus <ieidus@redhat.com>
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
It reintroduces
Revert "ide save/restore pio/atapi cmd transfer fields and io buffer"
but using subsections. Added bonus is the addition of ide_dummy_transfer_stop
to transfer_end_table, that was missing.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
If migration takes place between write of the bmdma address register and
write of the command register (to initiate DMA), the destination will
not properly start the DMA op, hanging the guest:
ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
ata1.00: cmd c8/00:16:41:00:00/00:00:00:00:00/e0 tag 0 dma 11264 in
res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata1.00: status: { DRDY }
Fix by sending current transfer information in the migration data.
We need to update ide version to 4 for this to work. As we don't
have subsectios, we need to chain the update increase until
vmstate_ide_pci (quintela)
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
This patch splits cmd646 specific code from pci.c.
This patch splits piix4 specific code from pci.c.
And compile new piix.o and cmd646.o when they are needed.
The only change that is not code movemet is removal of cmd646 specific parts
in bmdma_readb/writeb for piix.
Patchworks-ID: 35301
Signed-off-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
We already have a PCIDevice at that point
Patchworks-ID: 35296
Signed-off-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>