Older versions (anything but the latest) of Linux usbfs + libusb(x),
will submit larger (bulk) transfers split into multiple 16k submissions,
which means that rather then all tds getting linked into the queue in
one atomic operarion they get linked in a bunch at a time, which could
cause problems if:
1) We scan the queue while libusb is in the middle of submitting a split
bulk transfer
2) While this bulk transfer is pending we migrate to another host.
The problem is that after 2, the new host will rescan the queue and
combine the packets in one large transfer, where as 1) has caused the
original host to see them as 2 transfers. This patch fixes this by stopping
combinging if we detect a 16k transfer with its int_req flag set.
This should not adversely effect performance for other cases as:
1) Linux never sets the interrupt flag on packets other then the last
2) Windows does set the in_req flag on each td, but will submit large
transfers in 20k tds thus never triggering the check
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Currently we only do pipelining for output endpoints, since to properly
support short-not-ok semantics we can only have one outstanding input
packet. Since the ehci and uhci controllers have a limited per td packet
size guests will split large input transfers to into multiple packets,
and since we don't pipeline these, this comes with a serious performance
penalty.
This patch adds helper functions to (re-)combine packets which belong to 1
transfer at the guest device-driver level into 1 large transger. This can be
used by (redirection) usb-devices to enable pipelining for input endpoints.
This patch will combine packets together until a transfer terminating packet
is encountered. A terminating packet is a packet which meets one or more of
the following conditions:
1) The packet size is *not* a multiple of the endpoint max packet size
2) The packet does *not* have its short-not-ok flag set
3) The packet has its interrupt-on-complete flag set
The short-not-ok flag of the combined packet is that of the terminating packet.
Multiple combined packets may be submitted to the device, if the combined
packets do not have their short-not-ok flag set, enabling true pipelining.
If a combined packet does have its short-not-ok flag set the queue will
wait with submitting further packets to the device until that packet has
completed.
Once enabled in the usb-redir and ehci code, this improves the speed (MB/s)
of a Linux guest reading from a USB mass storage device by a factor of
1.2 - 1.5.
And the main reason why I started working on this, when reading from a pl2303
USB<->serial converter, it combines the previous 4 packets submitted per
device-driver level read into 1 big read, reducing the number of packets / sec
by a factor 4, and it allows to have multiple reads outstanding. This allows
for much better latency tolerance without the pl2303's internal buffer
overflowing (which was happening at 115200 bps, without serial flow control).
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
My recent uhci cleanup series has introduced a regression, where
qemu sometimes crashes on a device disconnect. The problem is that
the uhci code never checked for a device not / no longer existing, instead
it was relying on usb_handle_packet accepting a NULL device.
But since we now pass usb_handle_packet q->ep->dev, rather then just
a local dev variable, we crash as q->ep == NULL due to the device no longer
existing.
This patch fixes this. Note that this patch also improves over
the old behavior were we would:
1) create a queue for the device
2) create an async for the packet
3) have usb_handle_packet fail
4) destroy the async
5) wait for the queue to be idle for 32 frames
6) destroy the queue
Which was rather sub-optimal.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Kills the ugly "switch (device_id) { ... }" struct and makes it easier
to figure what the differences between the uhci variants are.
Need our own DeviceClass struct for that so we can allocate some space
to store UHCIInfo.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Guard against re-definition of EHCI_DEBUG. Allows for turning on of debug info
from configure (using --qemu-extra-cflags="-DEHCI_DEBUG=1") rather than source
code hacking.
Signed-off-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Seperate the PCI stuff from the EHCI components. Extracted the PCIDevice
out into a new wrapper struct to make EHCIState non-PCI-specific. Seperated
tho non PCI init component out into a seperate "common" init function.
Signed-off-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Pull the DMAContext for the PCI DMA out at device init time and put it into
the device state. Use dma_memory_read/write() instead of pci specific versions.
Signed-off-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
The capabilities register and operational register offsets can vary from one
EHCI implementation to the next. Parameterise accordingly.
Signed-off-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Factor out the code which checks whenever a usb device is attached
to the port in question. No functional change.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Rename the function for xhci_port_* naming scheme, also drop
the xhci parameter as port carries a pointer to xhci anyway.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Add {get,set}_field macros (simliar to ehci) to read and update
some bits of a word. Put them into use for updating pls (port
link state) values. Also add a enum for pls values.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
If the guest uses a TLBWI instruction for upgrading permissions, we
don't need to flush the extra TLBs. This improve boot time performance
by about 10%.
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Use the deposit op instead of and hardcoded bit field insertion. It
allows the host to emit the corresponding instruction if available.
At the same time remove the (lsb > msb) test. The MIPS64R2 instruction
set manual says "Because of the instruction format, lsb can never be
greater than msb, so there is no UNPREDICATABLE case for this
instruction."
(Bug reported as LP:1071149.)
Cc: Никита Канунников <n.kanunnikov@sbtcom.ru>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
The result of a division by 0, or a division of INT_MIN by -1 in the
signed case, is unpredictable. Just replace 0 by 1 in that case so that
it doesn't trigger a floating point exception on the host.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Avoid the branches in movn/movz implementation and replace them with
movcond. Also update a wrong command.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Store conditional operations only need local temps in user mode. Fix
the code to use temp local only in user mode, this spares two memory
stores in system mode.
At the same time remove a wrong a wrong copied & pasted comment,
store operations don't have a register destination.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Load/store from helpers should be avoided as they are quite
inefficient. Rewrite unaligned loads instructions using TCG and
aligned loads. The number of actual loads operations to implement
an unaligned load instruction is reduced from up to 8 to 1.
Note: As we can't rely on shift by 32 or 64 undefined behaviour,
the code loads already shift by one constants.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
load/store microMIPS helpers are reinventing the wheel. Call do_lw,
do_ll, do_sw and do_sl instead of using a macro calling the cpu_*
load/store functions.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Load/store operations use macros for historical reasons. Now that there
is no point in keeping them, replace them by direct calls to qemu_ld/st.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Rework *raise_exception*() functions so that they can be called from
other helpers, passing the return address as an argument.
Use do_raise_exception() function in update_fcr31() to correctly restore
the CPU state after an FPU exception.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
softfloat already has a few constants defined, use them instead of
redefining them in target-mips.
Rename FLOAT_SNAN32 and FLOAT_SNAN64 to FP_TO_INT32_OVERFLOW and
FP_TO_INT64_OVERFLOW as even if they have the same value, they are
technically different (and defined differently in the MIPS ISA).
Remove the unused constants.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Instead of accessing the flags from the floating point control
register after updating it, read the softfloat flags.
This is just code cleanup and should not change the behaviour.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
For each FPU instruction that can trigger an FPU exception, to call
call update_fcr31() after.
Remove the manual NaN assignment in case of float to float operation, as
softfloat is already taking care of that. However for float to int
operation, the value has to be changed to the MIPS one. In the cvtpw_ps
case, the two registers have to be handled separately to guarantee
a correct final value in both registers.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Instead of clearing the softfloat exception flags before each floating
point instruction, reset them to 0 in update_fcr31() when an exception
is detected.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Use the new softfloat floatXX_muladd() functions to implement the madd,
msub, nmadd and nmsub instructions. At the same time replace the name of
the helpers by the name of the instruction, as the only reason for the
previous names was to keep the macros simple.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Add a pickNaNMulAdd function for MIPS, implementing NaN propagation
rules for MIPS fused multiply-add instructions.
Cc: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
When the CPU state after a possible retranslation is going to be handled
through code retranslation, we don't need to save the CPU state before.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
When the CPU state is restored through retranslation after an exception,
btarget should also be restored.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Commit 9c43b68de6 do not correctly check
for dead outputs when they need to be synced to memory in case of
half-dead operations.
Fix that by applying the same pattern than for the default case.
Tested-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Mark helper functions that raise exceptions, but otherwise do not
change TCG register state, with TCG_CALL_NO_WG.
Signed-off-by: Richard Henderson <rth@twiddle.net>
As the block layer may decide to flush bottom-halfs while the machine is
still initializing (e.g. to read geometry data from the disk), our
postponed open event may be processed before the last frontend
registered with a muxed chardev.
Until the semantics of BHs have been clarified, use an expired timer to
achieve the same effect (suggested by Paolo Bonzini). This requires to
perform the alarm timer initialization earlier as otherwise timer
subsystem can be used before being ready.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>