Commit Graph

76 Commits

Author SHA1 Message Date
Jason Wang
1001cf45a7 e1000: indicate dropped packets in HW counters
The e1000 emulation silently discards RX packets if there's
insufficient space in the ring buffer. This leads to errors
on higher-level protocols in the guest, with no indication
about the error cause.

This patch increments the "Missed Packets Count" (MPC) and
"Receive No Buffers Count" (RNBC) HW counters in this case.
As the emulation has no FIFO for buffering packets that can't
immediately be pushed to the guest, these two registers are
practically equivalent (see 10.2.7.4, 10.2.7.33 in
https://www.intel.com/content/www/us/en/embedded/products/networking/82574l-gbe-controller-datasheet.html).

On a Linux guest, the register content  will be reflected in
the "rx_missed_errors" and "rx_no_buffer_count" stats from
"ethtool -S", and in the "missed" stat from "ip -s -s link show",
giving at least some hint about the error cause inside the guest.

If the cause is known, problems like this can often be avoided
easily, by increasing the number of RX descriptors in the guest
e1000 driver (e.g under Linux, "e1000.RxDescriptors=1024").

The patch also adds a qemu trace message for this condition.

Signed-off-by: Martin Wilck <mwilck@suse.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2018-10-19 11:15:04 +08:00
Dr. David Alan Gilbert
ff214d427e e1000: Choose which set of props to migrate
When we're using the subsection we migrate both
the 'props' and 'tso_props' data; when we're not using
the subsection (to migrate to 2.11 or old machine types) we've
got to choose what to migrate in the main structure.

If we're using the subsection migrate 'props' in the main structure.
If we're not using the subsection then migrate the last one
that changed, which gives behaviour similar to the old behaviour.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2018-04-10 11:30:03 +08:00
Dr. David Alan Gilbert
5935448478 e1000: Migrate props via a temporary structure
Swing the tx.props out via a temporary structure, so in future patches
we can select what we're going to send.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2018-04-10 11:30:03 +08:00
Dr. David Alan Gilbert
46f2a9ec54 e1000: wire new subsection to property
Wire the new subsection from the previous commit to a property
so we can turn it off easily.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2018-04-10 11:30:03 +08:00
Dr. David Alan Gilbert
3c4053c52c e1000: Dupe offload data on reading old stream
Old QEMUs only had one set of offload data;  when we only receive
one lot, dupe the received data - that should give us about the
same bug level as the old version.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2018-04-10 11:30:03 +08:00
Dr. David Alan Gilbert
4ae4bf5bb1 e1000: Convert v3 fields to subsection
A bunch of new TSO fields were introduced by d62644b4 and this bumped
the VMState version; however it's easier for those trying to keep
backwards migration compatibility if these fields are added in a
subsection instead.

Move the new fields to a subsection.

Since this was added after 2.11, this change will only affect
compatbility with 2.12-rc0.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2018-04-10 11:29:35 +08:00
Thomas Huth
b20219b645 hw/net: Remove unnecessary header includes
Headers like "hw/loader.h" and "qemu/sockets.h" are not needed in
the hw/net/*.c files. And Some other headers are included via other
headers already, so we can drop them, too.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2018-03-05 10:30:16 +08:00
Ed Swierk via Qemu-devel
d62644b46a e1000: Separate TSO and non-TSO contexts, fixing UDP TX corruption
The device is supposed to maintain two distinct contexts for transmit
offloads: one has parameters for both segmentation and checksum
offload, the other only for checksum offload. The guest driver can
send two context descriptors, one for each context (the TSE flag
specifies which). Then the guest can refer to one or the other context
in subsequent transmit data descriptors, depending on what offloads it
wants applied to each packet.

Currently the e1000 device stores just one context, and misinterprets
the TSE flags in the context and data descriptors. This is often okay:
Linux happens to send a fresh context descriptor before every data
descriptor, so forgetting the other context doesn't matter. Windows
does rely on separate contexts for TSO vs. non-TSO packets, but for
mostly-TCP traffic the two contexts have identical TCP-specific
offload parameters so confusing them doesn't matter.

One case where this confusion matters is when a Windows guest sets up
a TSO context for TCP and a non-TSO context for UDP, and then
transmits both TCP and UDP traffic in parallel. The e1000 device
sometimes ends up using TCP-specific parameters while doing checksum
offload on a UDP datagram: it writes the checksum to offset 16 (the
correct location for a TCP checksum), stomping on two bytes of UDP
data, and leaving the wrong value in the actual UDP checksum field at
offset 6. (Even worse, the host network stack may then recompute the
UDP checksum, "correcting" it to match the corrupt data before sending
it out a physical interface.)

Correct this by tracking the TSO context independently of the non-TSO
context, and selecting the appropriate context based on the TSE flag
in each transmit data descriptor.

Signed-off-by: Ed Swierk <eswierk@skyportsystems.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2017-12-22 09:53:50 +08:00
Ed Swierk via Qemu-devel
7d08c73e7b e1000, e1000e: Move per-packet TX offload flags out of context state
sum_needed and cptse flags are received from the guest within each
transmit data descriptor. They are not part of the offload context;
instead, they determine how to apply a previously received context to
the packet being transmitted:

- If cptse is set, perform both segmentation and checksum offload
  using the parameters in the TSO context; otherwise just do checksum
  offload. (Currently the e1000 device incorrectly stores only one
  context, which will be fixed in a subsequent patch.)

- Depending on the bits set in sum_needed, possibly perform L4
  checksum offload and/or IP checksum offload, using the parameters in
  the appropriate context.

Move these flags out of struct e1000x_txd_props, which is otherwise
dedicated to storing values from a context descriptor, and into the
per-packet TX struct.

Signed-off-by: Ed Swierk <eswierk@skyportsystems.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2017-12-22 09:53:23 +08:00
Ed Swierk
0dacea92d2 net: Transmit zero UDP checksum as 0xFFFF
The checksum algorithm used by IPv4, TCP and UDP allows a zero value
to be represented by either 0x0000 and 0xFFFF. But per RFC 768, a zero
UDP checksum must be transmitted as 0xFFFF because 0x0000 is a special
value meaning no checksum.

Substitute 0xFFFF whenever a checksum is computed as zero when
modifying a UDP datagram header. Doing this on IPv4 and TCP checksums
is unnecessary but legal. Add a wrapper for net_checksum_finish() that
makes the substitution.

(We can't just change net_checksum_finish(), as that function is also
used by receivers to verify checksums, and in that case the expected
value is always 0x0000.)

Signed-off-by: Ed Swierk <eswierk@skyportsystems.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2017-11-20 11:08:00 +08:00
Eduardo Habkost
fd3b02c889 pci: Add INTERFACE_CONVENTIONAL_PCI_DEVICE to Conventional PCI devices
Add INTERFACE_CONVENTIONAL_PCI_DEVICE to all direct subtypes of
TYPE_PCI_DEVICE, except:

1) The ones that already have INTERFACE_PCIE_DEVICE set:

* base-xhci
* e1000e
* nvme
* pvscsi
* vfio-pci
* virtio-pci
* vmxnet3

2) base-pci-bridge

Not all PCI bridges are Conventional PCI devices, so
INTERFACE_CONVENTIONAL_PCI_DEVICE is added only to the subtypes
that are actually Conventional PCI:

* dec-21154-p2p-bridge
* i82801b11-bridge
* pbm-bridge
* pci-bridge

The direct subtypes of base-pci-bridge not touched by this patch
are:

* xilinx-pcie-root: Already marked as PCIe-only.
* pcie-pci-bridge: Already marked as PCIe-only.
* pcie-port: all non-abstract subtypes of pcie-port are already
  marked as PCIe-only devices.

3) megasas-base

Not all megasas devices are Conventional PCI devices, so the
interface names are added to the subclasses registered by
megasas_register_types(), according to information in the
megasas_devices[] array.

"megasas-gen2" already implements INTERFACE_PCIE_DEVICE, so add
INTERFACE_CONVENTIONAL_PCI_DEVICE only to "megasas".

Acked-by: Alberto Garcia <berto@igalia.com>
Acked-by: John Snow <jsnow@redhat.com>
Acked-by: Anthony PERARD <anthony.perard@citrix.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Marcel Apfelbaum <marcel@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2017-10-15 05:54:43 +03:00
Dr. David Alan Gilbert
44b1ff319c migration: pre_save return int
Modify the pre_save method on VMStateDescription to return an int
rather than void so that it potentially can fail.

Changed zillions of devices to make them return 0; the only
case I've made it return non-0 is hw/intc/s390_flic_kvm.c that already
had an error_report/return case.

Note: If you add an error exit in your pre_save you must emit
an error_report to say why.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20170925112917.21340-2-dgilbert@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2017-09-27 11:35:59 +01:00
Kamil Rytarowski
757704f1b7 e1000: Rename the SEC symbol to SEQEC
SunOS defines SEC in <sys/time.h> as 1 (commonly used time symbols).

This fixes build on SmartOS (Joyent).

Patch cherry-picked from pkgsrc by jperkin (Joyent).

Signed-off-by: Kamil Rytarowski <n54@gmx.com>
Reviewed-by: Dmitry Fleytman <dmitry@daynix.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2017-09-08 08:17:37 +08:00
Jason Wang
b4053c6483 e1000: disable debug by default
Disable debug output by default, the information were not needed for
release.

Cc: Peter Maydell <peter.maydell@linaro.org>
Cc: Stefan Hajnoczi <stefanha@gmail.com>
Cc: Leonid Bloch <leonid.bloch@ravellosystems.com>
Cc: Dmitry Fleytman <dmitry.fleytman@ravellosystems.com>
Cc: qemu-stable@nongnu.org
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2017-03-31 08:48:13 +08:00
Eric Blake
f394b2e20d qapi: Change Netdev into a flat union
This is a mostly-mechanical conversion that creates a new flat
union 'Netdev' QAPI type that covers all the branches of the
former 'NetClientOptions' simple union, where the branches are
now listed in a new 'NetClientDriver' enum rather than generated
from the simple union.  The existence of a flat union has no
change to the command line syntax accepted for new code, and
will make it possible for a future patch to switch the QMP
command to parse a boxed union for no change to valid QMP; but
it does have some ripple effect on the C code when dealing with
the new types.

While making the conversion, note that the 'NetLegacy' type
remains unchanged: it applies only to legacy command line options,
and will not be ported to QMP, so it should remain a wrapper
around a simple union; to avoid confusion, the type named
'NetClientOptions' is now gone, and we introduce 'NetLegacyOptions'
in its place.  Then, in the C code, we convert from NetLegacy to
Netdev as soon as possible, so that the bulk of the net stack
only has to deal with one QAPI type, not two.  Note that since
the old legacy code always rejected 'hubport', we can just omit
that branch from the new 'NetLegacyOptions' simple union.

Based on an idea originally by Zoltán Kővágó <DirtY.iCE.hu@gmail.com>:
Message-Id: <01a527fbf1a5de880091f98cf011616a78adeeee.1441627176.git.DirtY.iCE.hu@gmail.com>
although the sed script in that patch no longer applies due to
other changes in the tree since then, and I also did some manual
cleanups (such as fixing whitespace to keep checkpatch happy).

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1468468228-27827-13-git-send-email-eblake@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
[Fixup from Eric squashed in]
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2016-07-19 20:18:02 +02:00
Peter Maydell
14e60aaece hw/net/e1000: Don't use *_to_cpup()
Don't use *_to_cpup() to do byte-swapped loads; instead use
ld*_p() which correctly handle misaligned accesses.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Acked-by: Jason Wang <jasowang@redhat.com>
Acked-by: Dmitry Fleytman <dmitry@daynix.com <mailto:dmitry@daynix.com>>
Message-id: 1466097446-981-6-git-send-email-peter.maydell@linaro.org
2016-06-27 16:39:56 +01:00
Sameeh Jubran
b92233b329 e1000: Removing unnecessary if statement
Since mit_delay can never be 0 this if statement is
superfluous.

Signed-off-by: Sameeh Jubran <sameeh@daynix.com>
Reviewed-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2016-06-07 18:19:23 +03:00
Dmitry Fleytman
093454e21d e1000: Move out code that will be reused in e1000e
Code that will be shared moved to a separate files.

Signed-off-by: Dmitry Fleytman <dmitry.fleytman@ravellosystems.com>
Signed-off-by: Leonid Bloch <leonid.bloch@ravellosystems.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-06-02 10:42:29 +08:00
Sameeh Jubran
8e0f7dd251 Revert "e1000: fix hang of win2k12 shutdown with flood ping"
This reverts commit 9596ef7c7b.

This workaround in order to fix endless interrupts is no
longer needed because it was superseded by the previous patch
(e1000: Fixing interrupt pace).

Signed-off-by: Sameeh Jubran <sameeh@daynix.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-03-30 08:57:42 +08:00
Sameeh Jubran
74004e8ce4 e1000: Fixing interrupts pace.
This patch introduces an upper bound for number of interrupts
per second. Without this bound an interrupt storm can occur as
it has been observed on Windows 10 when disabling the device.

According to the SPEC - Intel PCI/PCI-X Family of Gigabit
Ethernet Controllers Software Developer's Manual, section
13.4.18 - the Ethernet controller guarantees a maximum
observable interrupt rate of 7813 interrupts/sec. If there is
no upper bound this could lead to an interrupt storm by e1000
(when mit_delay < 500) causing interrupts to fire at a very high
pace.
Thus if mit_delay < 500 then the delay should be set to the
minimum delay possible which is 500. This can be calculated
easily as follows:

Interval = 10^9 / (7813 * 256) = 500.

Signed-off-by: Sameeh Jubran <sameeh@daynix.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-03-30 08:57:36 +08:00
Laszlo Ersek
dd793a7488 e1000: eliminate infinite loops on out-of-bounds transfer start
The start_xmit() and e1000_receive_iov() functions implement DMA transfers
iterating over a set of descriptors that the guest's e1000 driver
prepares:

- the TDLEN and RDLEN registers store the total size of the descriptor
  area,

- while the TDH and RDH registers store the offset (in whole tx / rx
  descriptors) into the area where the transfer is supposed to start.

Each time a descriptor is processed, the TDH and RDH register is bumped
(as appropriate for the transfer direction).

QEMU already contains logic to deal with bogus transfers submitted by the
guest:

- Normally, the transmit case wants to increase TDH from its initial value
  to TDT. (TDT is allowed to be numerically smaller than the initial TDH
  value; wrapping at or above TDLEN bytes to zero is normal.) The failsafe
  that QEMU currently has here is a check against reaching the original
  TDH value again -- a complete wraparound, which should never happen.

- In the receive case RDH is increased from its initial value until
  "total_size" bytes have been received; preferably in a single step, or
  in "s->rxbuf_size" byte steps, if the latter is smaller. However, null
  RX descriptors are skipped without receiving data, while RDH is
  incremented just the same. QEMU tries to prevent an infinite loop
  (processing only null RX descriptors) by detecting whether RDH assumes
  its original value during the loop. (Again, wrapping from RDLEN to 0 is
  normal.)

What both directions miss is that the guest could program TDLEN and RDLEN
so low, and the initial TDH and RDH so high, that these registers will
immediately be truncated to zero, and then never reassume their initial
values in the loop -- a full wraparound will never occur.

The condition that expresses this is:

  xdh_start >= s->mac_reg[XDLEN] / sizeof(desc)

i.e., TDH or RDH start out after the last whole rx or tx descriptor that
fits into the TDLEN or RDLEN sized area.

This condition could be checked before we enter the loops, but
pci_dma_read() / pci_dma_write() knows how to fill in buffers safely for
bogus DMA addresses, so we just extend the existing failsafes with the
above condition.

This is CVE-2016-1981.

Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Petr Matousek <pmatouse@redhat.com>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: Prasad Pandit <ppandit@redhat.com>
Cc: Michael Roth <mdroth@linux.vnet.ibm.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: qemu-stable@nongnu.org
RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=1296044
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-02-04 14:13:11 +08:00
Peter Maydell
e8d4046559 hw/net: Clean up includes
Clean up includes so that osdep.h is included first and headers
which it implies are not included manually.

This commit was created with scripts/clean-includes.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 1453832250-766-19-git-send-email-peter.maydell@linaro.org
2016-01-29 15:07:23 +00:00
Peter Maydell
84942979de -----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
 
 iQEcBAABAgAGBQJWZZJPAAoJEO8Ells5jWIRmp0H/26aFXVEgZykkUVNbqq05r7w
 AI7podQlFOAESJHqZtR8FMaH8TAZ5GhphP4pn0PsWp54VjwcYZbdoME+dhZ4Elyc
 WDanRHIweLv/zVg6+M8oHhw5GMaxtFLoLWrf0oanbUW9IZZmmM3COz/Y31hSVrR2
 EzEJi1VZZhpMj3ibeOJns4MrugYrne8MtOdvusE/Uw2rJBTiStnWw1eTk8RmkNcg
 5un1mQZxFU2AcNzmWdmWJmjY0rCnR3HhtTdZOwjM6uZGIJ9hbsItGzqiGadBfozI
 fUtIa2HZahioe0VIzoB0snXnAuhV1jA0Uy18i04dPvgQOmiVSRjQNE2/lwQflyE=
 =Pad3
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/jasowang/tags/net-pull-request' into staging

# gpg: Signature made Mon 07 Dec 2015 14:06:07 GMT using RSA key ID 398D6211
# gpg: Good signature from "Jason Wang (Jason Wang on RedHat) <jasowang@redhat.com>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg:          It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 215D 46F4 8246 689E C77F  3562 EF04 965B 398D 6211

* remotes/jasowang/tags/net-pull-request:
  lan9118: log and ignore access to invalid registers, rather than aborting
  lan9118: fix emulation of MAC address loaded bit in E2P_CMD register
  vmxnet3: silence warning
  pcnet: fix rx buffer overflow(CVE-2015-7512)
  net: pcnet: add check to validate receive data size(CVE-2015-7504)
  e1000: fix hang of win2k12 shutdown with flood ping

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-12-07 14:18:31 +00:00
Denis V. Lunev
9596ef7c7b e1000: fix hang of win2k12 shutdown with flood ping
e1000 driver in Win2k12 is really well rotten. It 100% hangs on shutdown
of UP VM under flood ping. The guest checks card state and reinjects
itself interrupt in a loop. This is fatal for UP machine.

There is no good way to fix this misbehavior but to kludge it. The
emulation has interrupt throttling register aka ITR which limits
interrupt rate and allows the guest to proceed this phase.
There is no problem with this kludge for Linux guests - it adjust the
value of it itself.

On the other hand according to the initial research in
    commit e9845f0985
    Author: Vincenzo Maffione <v.maffione@gmail.com>
    Date:   Fri Aug 2 18:30:52 2013 +0200

    e1000: add interrupt mitigation support

    ...

    Interrupt mitigation boosts performance when the guest suffers from
    an high interrupt rate (i.e. receiving short UDP packets at high packet
    rate). For some numerical results see the following link
    http://info.iet.unipi.it/~luigi/papers/20130520-rizzo-vm.pdf

this should also boost performance a bit.

See https://bugzilla.redhat.com/show_bug.cgi?id=874406 for additional
details.

Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Vincenzo Maffione <v.maffione@gmail.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2015-12-07 21:43:43 +08:00
Leonid Bloch
ba63ec8594 e1000: Introducing backward compatibility command line parameter
This follows the previous patches, where support for migrating the
entire MAC registers' array, and some new MAC registers were introduced.

This patch introduces the e1000-specific boolean parameter
"extra_mac_registers", which is on by default. Setting it to off will
enable migration to older versions of QEMU, but will disable the read
and write access to the new registers, that were introduced since adding
the ability to migrate the entire MAC array.

Example for usage to enable backward compatibility and to disable the
new MAC registers:

    qemu-system-x86_64 -device e1000,extra_mac_registers=off,... ...

As mentioned above, the default value is "on".

Signed-off-by: Leonid Bloch <leonid.bloch@ravellosystems.com>
Signed-off-by: Dmitry Fleytman <dmitry.fleytman@ravellosystems.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2015-11-12 15:26:54 +08:00
Leonid Bloch
3b27430177 e1000: Implementing various counters
This implements the following Statistic registers (various counters)
according to Intel's specs:

TSCTC  GOTCL  GOTCH  GORCL  GORCH  MPRC   BPRC   RUC    ROC
BPTC   MPTC   PTC... PRC...

PLEASE NOTE: these registers will not be active, nor will migrate, until
a compatibility flag will be set (in the next patch in this series).

Signed-off-by: Leonid Bloch <leonid.bloch@ravellosystems.com>
Signed-off-by: Dmitry Fleytman <dmitry.fleytman@ravellosystems.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2015-11-12 15:26:54 +08:00
Leonid Bloch
4aeea330f0 e1000: Fixing the packet address filtering procedure
Previously, if promiscuous unicast was enabled, a packet was received
straight away, even if it was a multicast or a broadcast packet. This
patch fixes that behavior, while making the filtering procedure a bit
more human-readable.

Signed-off-by: Leonid Bloch <leonid.bloch@ravellosystems.com>
Signed-off-by: Dmitry Fleytman <dmitry.fleytman@ravellosystems.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2015-11-12 15:26:54 +08:00
Leonid Bloch
45e9376471 e1000: Fixing the received/transmitted octets' counters
Previously, these 64-bit registers did not stick at their maximal
values when (and if) they reached them, as they should do, according to
the specs.

This patch introduces a function that takes care of such registers,
avoiding code duplication, making the relevant parts more compatible
with the QEMU coding style, while ensuring that in the unlikely case
of reaching the maximal value, the counter will stick there, as it
supposed to.

Signed-off-by: Leonid Bloch <leonid.bloch@ravellosystems.com>
Signed-off-by: Dmitry Fleytman <dmitry.fleytman@ravellosystems.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2015-11-12 15:26:53 +08:00
Leonid Bloch
1f67f92c4f e1000: Fixing the received/transmitted packets' counters
According to Intel's specs, these counters (as the other Statistic
registers) stick at 0xffffffff when this maximal value is reached.
Previously, they would reset after the max. value.

Signed-off-by: Leonid Bloch <leonid.bloch@ravellosystems.com>
Signed-off-by: Dmitry Fleytman <dmitry.fleytman@ravellosystems.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2015-11-12 15:26:53 +08:00
Leonid Bloch
72ea771c97 e1000: Trivial implementation of various MAC registers
These registers appear in Intel's specs, but were not implemented.
These registers are now implemented trivially, i.e. they are initiated
with zero values, and if they are RW, they can be written or read by the
driver, or read only if they are R (essentially retaining their zero
values). For these registers no other procedures are performed.

For the trivially implemented Diagnostic registers, a debug warning is
produced on read/write attempts.

PLEASE NOTE: these registers will not be active, nor will migrate, until
a compatibility flag will be set (in a later patch in this series).

The registers implemented here are:

Transmit:
RW: AIT

Management:
RW: WUC     WUS     IPAV    IP6AT*  IP4AT*  FFLT*   WUPM*   FFMT*   FFVT*

Diagnostic:
RW: RDFH    RDFT    RDFHS   RDFTS   RDFPC   PBM*    TDFH    TDFT    TDFHS
    TDFTS   TDFPC

Statistic:
RW: FCRUC
R:  RNBC    TSCTFC  MGTPRC  MGTPDC  MGTPTC  RFC     RJC     SCC     ECOL
    LATECOL MCC     COLC    DC      TNCRS   SEC     CEXTERR RLEC    XONRXC
    XONTXC  XOFFRXC XOFFTXC

Signed-off-by: Leonid Bloch <leonid.bloch@ravellosystems.com>
Signed-off-by: Dmitry Fleytman <dmitry.fleytman@ravellosystems.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2015-11-12 15:26:53 +08:00
Leonid Bloch
bc0f0674f0 e1000: Introduced an array to control the access to the MAC registers
The array of uint8_t's which is introduced here, contains access metadata
about the MAC registers: if a register is accessible, but partly implemented,
or if a register requires a certain compatibility flag in order to be
accessed. Currently, 6 hypothetical flags are supported (3 exist for e1000
so far) but in the future, if more than 6 flags will be needed, the datatype
of this array can simply be swapped for a larger one.

This patch is intended to solve the following current problems:

1) In a scenario of migration between different versions of QEMU, which
differ by the MAC registers implemented in them, some registers need not to
be active if a compatibility flag is set, in order to preserve the machine's
state perfectly for the older version. Checking this for each register
individually, would create a lot of clutter in the code.

2) Some registers are (or may be) only partly implemented (e.g.
placeholders that allow reading and writing, but lack other functions).
In such cases it is better to print a debug warning on read/write attempts.
As above, dealing with this functionality on a per-register level, would
require longer and more messy code.

Signed-off-by: Leonid Bloch <leonid.bloch@ravellosystems.com>
Signed-off-by: Dmitry Fleytman <dmitry.fleytman@ravellosystems.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2015-11-12 15:26:52 +08:00
Leonid Bloch
9e11773417 e1000: Add support for migrating the entire MAC registers' array
This patch makes the migration of the entire array of MAC registers
possible during live migration. The entire array is just 128 KB long, so
practically no penalty should be felt when transmitting it, additionally
to the previously transmitted individual registers. The advantage here is
eliminating the need to introduce new vmstate subsections in the future,
when additional MAC registers will be implemented.

Backward compatibility is preserved by introducing a e1000-specific
boolean parameter (in a later patch), which will be on by default.
Setting it to off would enable migration to older versions of QEMU.

Additionally, this parameter will be used to control the access to the
extra MAC registers in the future.

Signed-off-by: Leonid Bloch <leonid.bloch@ravellosystems.com>
Signed-off-by: Dmitry Fleytman <dmitry.fleytman@ravellosystems.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2015-11-12 15:26:52 +08:00
Leonid Bloch
20f3e86362 e1000: Cosmetic and alignment fixes
This fixes some alignment and cosmetic issues. The changes are made
in order that the following patches in this series will look like
integral parts of the code surrounding them, while conforming to the
coding style. Although some changes in unrelated areas are also made.

Signed-off-by: Leonid Bloch <leonid.bloch@ravellosystems.com>
Signed-off-by: Dmitry Fleytman <dmitry.fleytman@ravellosystems.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2015-11-12 13:48:53 +08:00
Jason Wang
8304402033 e1000: use alias for default model
Instead of duplicating the "e1000-82540em" device model as "e1000",
make the latter an alias for the former.

Cc: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com
Reviewed-by: Markus Armbruster <armbru@redhat.com>
2015-10-12 13:19:29 +08:00
P J P
b947ac2bf2 e1000: Avoid infinite loop in processing transmit descriptor (CVE-2015-6815)
While processing transmit descriptors, it could lead to an infinite
loop if 'bytes' was to become zero; Add a check to avoid it.

[The guest can force 'bytes' to 0 by setting the hdr_len and mss
descriptor fields to 0.
--Stefan]

Signed-off-by: P J P <pjp@fedoraproject.org>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Message-id: 1441383666-6590-1-git-send-email-stefanha@redhat.com
2015-09-15 12:51:02 +01:00
Stefan Hajnoczi
5df6a1855b e1000: flush packets when link comes up
e1000_can_receive() checks the link up status register bit.  If the bit
is clear, packets will be queued and the peer may disable receive to
avoid wasting CPU reading packets that cannot be delivered.  The queue
must be flushed once the link comes back up again.

This patch fixes broken e1000 receive with Mac OS X Snow Leopard guests
and tap networking.  Flushing the queue invokes the async send callback,
which re-enables tap fd read.

Reported-by: Jonathan Liu <net147@gmail.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 1435223885-12745-1-git-send-email-stefanha@redhat.com
2015-07-07 13:10:26 +01:00
Juan Quintela
5cd8cadae8 migration: Use normal VMStateDescriptions for Subsections
We create optional sections with this patch.  But we already have
optional subsections.  Instead of having two mechanism that do the
same, we can just generalize it.

For subsections we just change:

- Add a needed function to VMStateDescription
- Remove VMStateSubsection (after removal of the needed function
  it is just a VMStateDescription)
- Adjust the whole tree, moving the needed function to the corresponding
  VMStateDescription

Signed-off-by: Juan Quintela <quintela@redhat.com>
2015-06-12 06:53:57 +02:00
Shannon Zhao
4e60a250d3 hw/net/e1000: fix integer endianness
It's detected by coverity.In is_vlan_packet s->mac_reg[VET] is
unsigned int but is dereferenced as a narrower unsigned short.
This may lead to unexpected results depending on machine
endianness.

Signed-off-by: Shannon Zhao <zhaoshenglong@huawei.com>
Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org>
Message-id: 1426224119-8352-1-git-send-email-zhaoshenglong@huawei.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2015-03-27 10:23:50 +00:00
Markus Armbruster
9af21dbee1 pci: Trivial device model conversions to realize
Convert the device models where initialization obviously can't fail.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Gonglei <arei.gonglei@huawei.com>
2015-02-26 12:42:16 +01:00
Paolo Bonzini
57407ea44c net: remove all cleanup methods from NIC NetClientInfos
All NICs have a cleanup function that, in most cases, zeroes the pointer
to the NICState.  In some cases, it frees data belonging to the NIC.

However, this function is never called except when exiting from QEMU.
It is not necessary to NULL pointers and free data here; the right place
to do that would be in the device's unrealize function, after calling
qemu_del_nic.  Zeroing the NIC multiple times is also wrong for multiqueue
devices.

This cleanup function gets in the way of making the NetClientStates for
the NIC hold an object_ref reference to the object, so get rid of it.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2015-01-12 10:16:23 +00:00
Michael S. Tsirkin
20302e71a5 e1000: defer packets until BM enabled
Some guests seem to set BM for e1000 after
enabling RX.
If packets arrive in the window, device is wedged.
Probably works by luck on real hardware, work around
this by making can_receive depend on BM.

Tested-by: Gabriel Somlo <somlo@cmu.edu>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-12-19 13:17:06 +00:00
Gonglei
d749e10c4f bootindex: move calling add_boot_device_patch to bootindex setter function
On this way, we can assure the new bootindex take effect
during vm rebooting.

Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Reviewed-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2014-10-15 10:46:01 +02:00
Gonglei
5df3bf623d e1000: add bootindex to qom property
Add a qom property with the same name 'bootindex',
when we remove it form qdev property, things will
continue to work just fine, and we can use qom features
which are not supported by qdev property.

Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Reviewed-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2014-10-15 09:52:54 +02:00
Peter Maydell
0e4a773705 SCSI changes that enable sending vendor-specific commands via virtio-scsi.
Memory changes for QOMification and automatic tracking of MR lifetime.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABAgAGBQJT8et9AAoJEBvWZb6bTYbyIJAQAI3AlLSe27xWoUGfQUgWH30z
 Rt/pShHz3BJMfQpD79JfTH8u6uBpkQmKtflerNT7FhXN9ULDzNq+b/jRtke8nkuy
 ctCt05FhhK00rfWpUoRue4XiCuvbizBU7MK0DI3yCyNdXQyYnFvgnvsJtlqox8Zh
 J5HZcBJEmdCiWBxq7UPk0qBitp4PqNoy7jlD/Ex3m7fJN5WK2cyspQIT9zmhehVn
 B8Nwp+RitDDbXbwm0r18col5rFr/6Nj6+dW1gr+7sVJDLNsmJEqC2l3Kgk0wbPkG
 Uqwbih29me9PC9/L1VLGHY0ApKDQ8JGE0GrYgEg162hbhoxEHkjjoHMhDUfV6Pj8
 NkqcjjWl11UUhgkNqrGafayXbBVnOiEglxy8uXCeq14y9Xd/gjK9Fz6MQvRSOjms
 PFmaKknhdmpxh0DuZmTix7WBmKim8zOiCE0/vrAPvwx5L+d1bn5xh6yQvtVjBMpU
 Sru3Mhdm9bL9dUDBgOM/G6WCxSTVLBlExOblcYkQh03MfabD7bfplcrKYPXt5ull
 Y8YLjqkoIfoy5t0ErvtlpdBJjeEz99JXU+wLQ6NYHnzwzTV+oUtSaEph14mAFOcY
 XkFKdoPDI9PnyEfvy4193du8z/dSbhu7sWgHWbTCQyrcaNnSaVhlH43NUC+p23YN
 8vfEsVLd1X7MFkDBUmWp
 =M+/m
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging

SCSI changes that enable sending vendor-specific commands via virtio-scsi.

Memory changes for QOMification and automatic tracking of MR lifetime.

# gpg: Signature made Mon 18 Aug 2014 13:03:09 BST using RSA key ID 9B4D86F2
# gpg: Good signature from "Paolo Bonzini <pbonzini@redhat.com>"
# gpg:                 aka "Paolo Bonzini <bonzini@gnu.org>"

* remotes/bonzini/tags/for-upstream:
  mtree: remove write-only field
  memory: Use canonical path component as the name
  memory: Use memory_region_name for name access
  memory: constify memory_region_name
  exec: Abstract away ref to memory region names
  loader: Abstract away ref to memory region names
  tpm_tis: remove instance_finalize callback
  memory: remove memory_region_destroy
  memory: convert memory_region_destroy to object_unparent
  ioport: split deletion and destruction
  nic: do not destroy memory regions in cleanup functions
  vga: do not dynamically allocate chain4_alias
  sysbus: remove unused function sysbus_del_io
  qom: object: move unparenting to the child property's release callback
  qom: object: delete properties before calling instance_finalize
  virtio-scsi: implement parse_cdb
  scsi-block, scsi-generic: implement parse_cdb
  scsi-block: extract scsi_block_is_passthrough
  scsi-bus: introduce parse_cdb in SCSIDeviceClass and SCSIBusInfo
  scsi-bus: prepare scsi_req_new for introduction of parse_cdb

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2014-08-19 13:00:57 +01:00
Paolo Bonzini
469b046ead memory: remove memory_region_destroy
The function is empty after the previous patch, so remove it.

Reviewed-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-08-18 12:06:21 +02:00
Gabriel L. Somlo
9616c29045 e1000: use symbolic constants to init phy ctrl & status registers
Signed-off-by: Gabriel Somlo <somlo@cmu.edu>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2014-08-14 13:22:25 +02:00
Gabriel L. Somlo
1195fed9e6 e1000: correctly handle phy_ctrl reserved & self-clearing bits
Make phyreg_writeops responsible for actually writing their
respective phy registers, rather than rely on set_mdic() to
do it on their behalf.

The only current instance of phyreg_writeops is set_phy_ctrl();
modify it to write the register on its own, while also correctly
handling reserved and self-clearing bits.

have_autoneg() does not need to check for MII_CR_RESTART_AUTO_NEG,
since the only time the flag comes into play is during set_phy_ctrl(),
and, following this patch, never actually gets written to the phy
control register.

Signed-off-by: Gabriel Somlo <somlo@cmu.edu>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2014-08-14 13:22:25 +02:00
Gabriel L. Somlo
d7a4155265 e1000: factor out checking for auto-negotiation availability
Also fix minor indentation issues in the surrounding code.

Suggested-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Gabriel Somlo <somlo@cmu.edu>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2014-06-23 17:38:00 +03:00
Gabriel L. Somlo
d52aec9545 e1000: move e1000_autoneg_timer() to after set_ics()
Enable calling set_ics() from within e1000_autoneg_timer() without
the need for a forward declaration.

This patch contains no functional changes.

Signed-off-by: Gabriel Somlo <somlo@cmu.edu>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2014-06-23 17:38:00 +03:00
Gabriel L. Somlo
39bb8ee737 e1000: signal guest on successful link auto-negotiation
Generate a link status change interrupt once link auto-netotiation
is successfully completed. This does not affect Linux and Windows
(XP and 7 tested) in any way, but is needed by the stock OS X driver
(AppleIntel8254XEthernet.kext), which would otherwise fail to notice
the link status change event.

Signed-off-by: Gabriel Somlo <somlo@cmu.edu>
Reviewed-by: Alexander Graf <agraf@suse.de>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2014-06-23 17:38:00 +03:00