qemu-e2k

Author	SHA1	Message	Date
Peter Maydell	ca3e40e233	vhost, pc, virtio features, fixes, cleanups New features: VT-d support for devices behind a bridge vhost-user migration support Signed-off-by: Michael S. Tsirkin <mst@redhat.com> -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJWKMrnAAoJECgfDbjSjVRpVL0H/iRc31o00QE4nWBRpxUpf8WJ V5RWE8qKkDgBha5bS5Nt4vs8K4jkkHGXCbmygMidWph96hUPK8/yHy1A/wmpBibB 5hVSPDK8onavNGJwpaWDrkhd9OhKAaKOuu49T6+VWJGZY/uX5ayqmcN934y0NPUa 4EhH5tyxPpYOYeW9i/VOMQ374gCJcpzYBMug4NJZRyFpfz/b2mzAQtoqw3EsPtB0 vpVJ+fKiCyG39HFKQJW7cL12yBeXOoyhjfDxpumLqwLWMfmde+vJwTFx6wbechgV aU3jIdvUX8wHCNYaB937NsMaDALoGNqUjbpKnf+xD1w7xr9pwTzdyrGH3rpGLEE= =+G1+ -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging vhost, pc, virtio features, fixes, cleanups New features: VT-d support for devices behind a bridge vhost-user migration support Signed-off-by: Michael S. Tsirkin <mst@redhat.com> # gpg: Signature made Thu 22 Oct 2015 12:39:19 BST using RSA key ID D28D5469 # gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>" # gpg: aka "Michael S. Tsirkin <mst@redhat.com>" * remotes/mst/tags/for_upstream: (37 commits) hw/isa/lpc_ich9: inject the SMI on the VCPU that is writing to APM_CNT i386: keep cpu_model field in MachineState uptodate vhost: set the correct queue index in case of migration with multiqueue piix: fix resource leak reported by Coverity seccomp: add memfd_create to whitelist vhost-user-test: check ownership during migration vhost-user-test: add live-migration test vhost-user-test: learn to tweak various qemu arguments vhost-user-test: wrap server in TestServer struct vhost-user-test: remove useless static check vhost-user-test: move wait_for_fds() out vhost: add migration block if memfd failed vhost-user: use an enum helper for features mask vhost user: add rarp sending after live migration for legacy guest vhost user: add support of live migration net: add trace_vhost_user_event vhost-user: document migration log vhost: use a function for each call vhost-user: add a migration blocker vhost-user: send log shm fd along with log_base ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2015-10-22 12:41:44 +01:00
Thibaut Collet	3e866365e1	vhost user: add rarp sending after live migration for legacy guest A new vhost user message is added to allow QEMU to ask to vhost user backend to broadcast a fake RARP after live migration for guest without GUEST_ANNOUNCE capability. This new message is sent only if the backend supports the new VHOST_USER_PROTOCOL_F_RARP protocol feature. The payload of this new message is the MAC address of the guest (not known by the backend). The MAC address is copied in the first 6 bytes of a u64 to avoid to create a new payload message type. This new message has no equivalent ioctl so a new callback is added in the userOps structure to send the request. Upon reception of this new message the vhost user backend must generate and broadcast a fake RARP request to notify the migration is terminated. Signed-off-by: Thibaut Collet <thibaut.collet@6wind.com> [Rebased and fixed checkpatch errors - Marc-André] Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Tested-by: Thibaut Collet <thibaut.collet@6wind.com>	2015-10-22 14:34:49 +03:00
Marc-André Lureau	c62b91e580	vhost-user: document migration log Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Tested-by: Thibaut Collet <thibaut.collet@6wind.com>	2015-10-22 14:34:49 +03:00
Cornelia Huck	22de58fe15	virtio: add some migration doc Try to cover the basics of virtio migration. Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com> Reviewed-by: Greg Kurz <gkurz@linux.vnet.ibm.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Jason Wang <jasowang@redhat.com>	2015-10-22 14:34:48 +03:00
Kevin O'Connor	2cc06a8843	fw_cfg: Define a static signature to be returned on DMA port reads Return a static signature ("QEMU CFG") if the guest does a read to the DMA address io register. Signed-off-by: Kevin O'Connor <kevin@koconnor.net> Reviewed-by: Laszlo Ersek <lersek@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>	2015-10-19 15:26:54 +02:00
Marc Marí	c9eae1d4b9	fw_cfg DMA interface documentation Add fw_cfg DMA interface specification in the documentation. Based on Gerd Hoffman's initial implementation. Signed-off-by: Marc Marí <markmb@redhat.com> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Laszlo Ersek <lersek@redhat.com> Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>	2015-10-19 15:26:53 +02:00
Gabriel L. Somlo	57c3d238a5	fw_cfg: document fw_cfg_modify_iXX() update functions Document the behavior of fw_cfg_modify_iXX() for leak-less updating of integer-type blobs. Currently only fw_cfg_modify_i16() is coded, but 32- and 64-bit versions may be added later if necessary.. Signed-off-by: Gabriel Somlo <somlo@cmu.edu> Reviewed-by: Laszlo Ersek <lersek@redhat.com> Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>	2015-10-19 15:26:53 +02:00
Gabriel L. Somlo	6407d76eb4	fw_cfg: insert string blobs via qemu cmdline Allow users to provide custom fw_cfg blobs with ascii string payloads specified directly on the qemu command line. Suggested-by: Jordan Justen <jordan.l.justen@intel.com> Suggested-by: Laszlo Ersek <lersek@redhat.com> Signed-off-by: Gabriel Somlo <somlo@cmu.edu> Message-id: 1443544141-26568-1-git-send-email-somlo@cmu.edu Reviewd-by: Laszlo Ersek <lersek@redhat.com> Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>	2015-10-19 15:26:53 +02:00
Peter Maydell	526d5809a0	* KVM page size fix for PPC * Support for Linux 4.4's new Hyper-V features * Eliminate g_slice from areas I maintain * checkpatch fix * Peter's cpu_reload_memory_map() cleanups * More changes to MAINTAINERS * Require Python 2.6 * chardev creation fixes * PCI requester id for ARM KVM * cleanups and doc fixes * Allow customization of the Hyper-V vendor id -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQEcBAABCAAGBQJWJKYWAAoJEL/70l94x66D2yYH/Rw06gj9FFVEhfNODmJozCsK zRqRREo+VMo/lIGUSwzI+OCX+yUoivxnsJXchqunK0udPuQ5vZ+mVGyKedg8/SU+ uqXzXMK7QgJK/w7qNA1n0OacNYSosZz9MpOwPgzSLPRda8FbtVKqPBOugSEs+Ymg APtiumz3DGWXUmt+vqRdgdiAvoGkefPODjjPjfSQFukg205KR88tf/b9oN8Z+kDW LtGqG9dUNS/60ulLNQdFInn3x5WpuGky5kk57f47QHpInNcN4/CH0BiguvYNkA9A aFFEWj5RsK7xkhcwSw6JIaSoWoTdrQVd4mB6+WTZN4tfGIIaoDeI6fp2MFmVpZU= =9Tf9 -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging * KVM page size fix for PPC * Support for Linux 4.4's new Hyper-V features * Eliminate g_slice from areas I maintain * checkpatch fix * Peter's cpu_reload_memory_map() cleanups * More changes to MAINTAINERS * Require Python 2.6 * chardev creation fixes * PCI requester id for ARM KVM * cleanups and doc fixes * Allow customization of the Hyper-V vendor id # gpg: Signature made Mon 19 Oct 2015 09:13:10 BST using RSA key ID 78C7AE83 # gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" # gpg: aka "Paolo Bonzini <pbonzini@redhat.com>" * remotes/bonzini/tags/for-upstream: (49 commits) kvm: Allow the Hyper-V vendor ID to be specified kvm: Move x86-specific functions into target-i386/kvm.c kvm: Pass PCI device pointer to MSI routing functions hw/pci: Introduce pci_requester_id() kvm: Make KVM_CAP_SIGNAL_MSI globally available doc/rcu: fix g_free_rcu() usage example qemu-char: cleanup after completed conversion to cd->create qemu-char: convert ringbuf backend to data-driven creation qemu-char: convert vc backend to data-driven creation qemu-char: convert spice backend to data-driven creation qemu-char: convert console backend to data-driven creation qemu-char: convert stdio backend to data-driven creation qemu-char: convert testdev backend to data-driven creation qemu-char: convert braille backend to data-driven creation qemu-char: convert msmouse backend to data-driven creation qemu-char: convert mux backend to data-driven creation qemu-char: convert null backend to data-driven creation qemu-char: convert pty backend to data-driven creation qemu-char: convert UDP backend to data-driven creation qemu-char: convert socket backend to data-driven creation ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2015-10-19 10:52:39 +01:00
Sergey Fedorov	9bda456e41	doc/rcu: fix g_free_rcu() usage example The first argument of g_free_rcu() is a pointer to a structure. But foo_reclaim is used as a function name in the previous example along with &foo as a pointer to the structure being reclaimed. Make the example consistent with the previous one. Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com> Message-Id: <1444837604-13712-1-git-send-email-serge.fdrv@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-10-19 10:13:07 +02:00
Eric Blake	f8b7f1a8ea	qapi: Consistent generated code: prefer visitor 'v' We had some pointless differences in the generated code for visit, command marshalling, and events; unifying them makes it easier for future patches to consolidate to common helper functions. This is one patch of a series to clean up these differences. This patch names the local visitor variable 'v' rather than 'm'. Related objects, such as 'QapiDeallocVisitor', are also named by their initials instead of an unrelated leading m. No change in semantics to the generated code. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <1443565276-4535-12-git-send-email-eblake@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com>	2015-10-12 18:46:49 +02:00
Eric Blake	2a0f50e8d9	qapi: Consistent generated code: prefer error 'err' We had some pointless differences in the generated code for visit, command marshalling, and events; unifying them makes it easier for future patches to consolidate to common helper functions. This is one patch of a series to clean up these differences. This patch consistently names the local error variable 'err' rather than 'local_err'. No change in semantics to the generated code. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <1443565276-4535-11-git-send-email-eblake@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com>	2015-10-12 18:46:49 +02:00
Markus Armbruster	9b89b6a287	docs: Move files from docs/qmp/ to docs/ Giving QMP its own subdirectory in docs/ is hardly worthwhile when we have just four files, and one of them isn't even in the subdirectory. Move the files from docs/qmp/ to docs/, renaming docs/qmp/README to docs/qmp-intro. Update MAINTAINERS. The new pattern also captures the fourth file docs/writing-qmp-commands.txt. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <1443111117-29831-2-git-send-email-armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Luiz Capitulino <lcapitulino@redhat.com>	2015-10-12 18:44:53 +02:00
Lin Ma	2e4ccbbc64	docs: update the usage example of "dtrace" backend in tracing.txt The usage example of dtrace is quite ancient, We have tracetool.py with different parameters instead of the original tracetool shell script for a long time, So update the old information. Signed-off-by: Lin Ma <lma@suse.com> Message-id: 1441954730-17341-1-git-send-email-lma@suse.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-10-09 10:14:05 +01:00
Peter Maydell	9e071429e6	* First batch of MAINTAINERS updates * IOAPIC fixes (to pass kvm-unit-tests with -machine kernel_irqchip=off) * NBD API upgrades from Daniel * strtosz fixes from Marc-André * improved support for readonly=on on scsi-generic devices * new "info ioapic" and "info lapic" monitor commands * Peter Crosthwaite's ELF_MACHINE cleanups * docs patches from Thomas and Daniel -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQEcBAABCAAGBQJWBSAEAAoJEL/70l94x66DeL4H/21YR4GWCqo30f+W5kx24ZNo by8H2kdZmWKRr/La1JlAReki9GCP1U8Q0cYC8V885gHLKcahWS/75UKwNbw0OSyg 2jj4uREc645TTFAvV5kQ+uAw9F/dchvkXylrVgOoUPipfmYibXY8JLu9AcVnZi6H X5Rvpqo4Uhp2cbRG7rYWrwgpNL+VZmKc8LDdqdlXrkjjanhuAYO2E9NBKaE+xJQQ FHcpkV92iSZFEZ0CB535BTIdNdDM/ae6bw1As27EF10YBTfneCQNazSeh13pLO2n lHit2GZr2VeTSBrPkPsItToY/Gw38duVZK4QM5/wSkHBzyeUJY0ltQrf53veYfk= =uc+I -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging * First batch of MAINTAINERS updates * IOAPIC fixes (to pass kvm-unit-tests with -machine kernel_irqchip=off) * NBD API upgrades from Daniel * strtosz fixes from Marc-André * improved support for readonly=on on scsi-generic devices * new "info ioapic" and "info lapic" monitor commands * Peter Crosthwaite's ELF_MACHINE cleanups * docs patches from Thomas and Daniel # gpg: Signature made Fri 25 Sep 2015 11:20:52 BST using RSA key ID 78C7AE83 # gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" # gpg: aka "Paolo Bonzini <pbonzini@redhat.com>" * remotes/bonzini/tags/for-upstream: (52 commits) doc: Refresh URLs in the qemu-tech documentation docs: describe the QEMU build system structure / design typedef: add typedef for QemuOpts i386: interrupt poll processing i386: partial revert of interrupt poll fix ppc: Rename ELF_MACHINE to be PPC specific i386: Rename ELF_MACHINE to be x86 specific alpha: Remove ELF_MACHINE from cpu.h mips: Remove ELF_MACHINE from cpu.h sparc: Remove ELF_MACHINE from cpu.h s390: Remove ELF_MACHINE from cpu.h sh4: Remove ELF_MACHINE from cpu.h xtensa: Remove ELF_MACHINE from cpu.h tricore: Remove ELF_MACHINE from cpu.h or32: Remove ELF_MACHINE from cpu.h lm32: Remove ELF_MACHINE from cpu.h unicore: Remove ELF_MACHINE from cpu.h moxie: Remove ELF_MACHINE from cpu.h cris: Remove ELF_MACHINE from cpu.h m68k: Remove ELF_MACHINE from cpu.h ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2015-09-25 21:52:30 +01:00
Peter Maydell	cdf9818242	virtio,pc features, fixes New features: vhost-user multiqueue support virtio-ccw virtio 1 support Signed-off-by: Michael S. Tsirkin <mst@redhat.com> -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJWBOxjAAoJECgfDbjSjVRpao8H/1hV55WvPXyEHB9ian+JPVEb pYFUcKGRO/bWMbXkqWnIBzNPrViPNQHot3zrOcoXtgnBGcuniiteGcAtqj4WEkgb WSa22AI1QrEPfHIkhR3sYdJAsqte/RppnFKLSDDi9TwKOGUho47OnkzJWfB+vuup 7YM/r8YDCkckdvsvfsCwW4Fbjxv7oKSokFkkdV/NwNDocNvRSBS9iAXsQYFdS7tm 8DIkWK63HQDY9in+fYkk8zoaXK7oZMyi3vHd2g4W0t0mGznxj9dxomrJrMo/4GWZ ZrnlB9R1QxpOCtoDtozelxkCnLJhEVjd8xYkGPg+xzYjrxl9aHIWjSNGhf5Q9QY= =5IBX -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging virtio,pc features, fixes New features: vhost-user multiqueue support virtio-ccw virtio 1 support Signed-off-by: Michael S. Tsirkin <mst@redhat.com> # gpg: Signature made Fri 25 Sep 2015 07:40:35 BST using RSA key ID D28D5469 # gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>" # gpg: aka "Michael S. Tsirkin <mst@redhat.com>" * remotes/mst/tags/for_upstream: MAINTAINERS: add more devices to the PCI section MAINTAINERS: add more devices to the PC section vhost-user: add a new message to disable/enable a specific virt queue. vhost-user: add multiple queue support vhost: introduce vhost_backend_get_vq_index method vhost-user: add VHOST_USER_GET_QUEUE_NUM message vhost: rename VHOST_RESET_OWNER to VHOST_RESET_DEVICE vhost-user: add protocol feature negotiation vhost-user: use VHOST_USER_XXX macro for switch statement virtio-ccw: enable virtio-1 virtio-ccw: feature bits > 31 handling virtio-ccw: support ring size changes virtio: ring sizes vs. reset pc: Introduce pc-*-2.5 machine classes q35: Move options common to all classes to pc_i440fx_machine_options() q35: Move options common to all classes to pc_q35_machine_options() virtio-net: unbreak self announcement and guest offloads after migration virtio: right size for virtio_queue_get_avail_size Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2015-09-25 16:40:05 +01:00
Daniel P. Berrange	717171bd20	docs: describe the QEMU build system structure / design Developers who are new to QEMU, or have a background familiarity with GNU autotools, can have trouble getting their head around the home-grown QEMU build system. This document attempts to explain the structure / design of the configure script and the various Makefile pieces that live across the source tree. Signed-off-by: Daniel P. Berrange <berrange@redhat.com> Message-Id: <1443102098-13642-1-git-send-email-berrange@redhat.com> Acked-by: Laszlo Ersek <lersek@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-09-25 12:20:18 +02:00
Paolo Bonzini	7c9b2bf677	qemu-thread: add a fast path to the Win32 QemuEvent QemuEvents are used heavily by call_rcu. We do not want them to be slow, but the current implementation does a kernel call on every invocation of qemu_event_* and won't cut it. So, wrap a Win32 manual-reset event with a fast userspace path. The states and transitions are the same as for the futex and mutex/condvar implementations, but the slow path is different of course. The idea is to reset the Win32 event lazily, as part of a test-reset-test-wait sequence. Such a sequence is, indeed, how QemuEvents are used by RCU and other subsystems! The patch includes a formal model of the algorithm. Tested-by: Stefan Weil <sw@weilnetz.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Stefan Weil <sw@weilnetz.de>	2015-09-24 20:52:28 +02:00
Changchun Ouyang	7263a0ad78	vhost-user: add a new message to disable/enable a specific virt queue. Add a new message, VHOST_USER_SET_VRING_ENABLE, to enable or disable a specific virt queue, which is similar to attach/detach queue for tap device. virtio driver on guest doesn't have to use max virt queue pair, it could enable any number of virt queue ranging from 1 to max virt queue pair. Signed-off-by: Changchun Ouyang <changchun.ouyang@intel.com> Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Reviewed-by: Jason Wang <jasowang@redhat.com> Tested-by: Marcel Apfelbaum <marcel@redhat.com>	2015-09-24 16:27:53 +03:00
Changchun Ouyang	b931bfbf04	vhost-user: add multiple queue support This patch is initially based a patch from Nikolay Nikolaev. This patch adds vhost-user multiple queue support, by creating a nc and vhost_net pair for each queue. Qemu exits if find that the backend can't support the number of requested queues (by providing queues=# option). The max number is queried by a new message, VHOST_USER_GET_QUEUE_NUM, and is sent only when protocol feature VHOST_USER_PROTOCOL_F_MQ is present first. The max queue check is done at vhost-user initiation stage. We initiate one queue first, which, in the meantime, also gets the max_queues the backend supports. In older version, it was reported that some messages are sent more times than necessary. Here we came an agreement with Michael that we could categorize vhost user messages to 2 types: non-vring specific messages, which should be sent only once, and vring specific messages, which should be sent per queue. Here I introduced a helper function vhost_user_one_time_request(), which lists following messages as non-vring specific messages: VHOST_USER_SET_OWNER VHOST_USER_RESET_DEVICE VHOST_USER_SET_MEM_TABLE VHOST_USER_GET_QUEUE_NUM For above messages, we simply ignore them when they are not sent the first time. Signed-off-by: Nikolay Nikolaev <n.nikolaev@virtualopensystems.com> Signed-off-by: Changchun Ouyang <changchun.ouyang@intel.com> Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Reviewed-by: Jason Wang <jasowang@redhat.com> Tested-by: Marcel Apfelbaum <marcel@redhat.com>	2015-09-24 16:27:53 +03:00
Yuanhan Liu	e2051e9e00	vhost-user: add VHOST_USER_GET_QUEUE_NUM message This is for querying how many queues the backend supports if it has mq support(when VHOST_USER_PROTOCOL_F_MQ flag is set from the quried protocol features). vhost_net_get_max_queues() is the interface to export that value, and to tell if the backend supports # of queues user requested, which is done in the following patch. Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Tested-by: Marcel Apfelbaum <marcel@redhat.com>	2015-09-24 16:27:52 +03:00
Yuanhan Liu	d1f8b30ec8	vhost: rename VHOST_RESET_OWNER to VHOST_RESET_DEVICE Quote from Michael: We really should rename VHOST_RESET_OWNER to VHOST_RESET_DEVICE. Suggested-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Reviewed-by: Marcel Apfelbaum <marcel@redhat.com> Tested-by: Marcel Apfelbaum <marcel@redhat.com>	2015-09-24 16:27:52 +03:00
Michael S. Tsirkin	dcb10c000c	vhost-user: add protocol feature negotiation Support a separate bitmask for vhost-user protocol features, and messages to get/set protocol features. Invoke them at init. No features are defined yet. [ leverage vhost_user_call for request handling -- Yuanhan Liu ] Signed-off-by: Michael S. Tsirkin <address@hidden> Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Reviewed-by: Marcel Apfelbaum <marcel@redhat.com> Tested-by: Marcel Apfelbaum <marcel@redhat.com>	2015-09-24 16:27:52 +03:00
Marc-André Lureau	7b02f5447c	libcacard: use the standalone project libcacard is now a standalone project hosted with the Spice project (see the 2.5.0 release announcement), remove it from qemu tree. Use the library if found during configure or if --enable-smartcard. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Michael Tokarev <mjt@tls.msk.ru> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Tested-by: Paolo Bonzini <pbonzini@redhat.com>	2015-09-23 23:34:17 +02:00
Bharata B Rao	03d196b7c5	spapr: Support ibm,dynamic-reconfiguration-memory Parse ibm,architecture.vec table obtained from the guest and enable memory node configuration via ibm,dynamic-reconfiguration-memory if guest supports it. This is in preparation to support memory hotplug for sPAPR guests. This changes the way memory node configuration is done. Currently all memory nodes are built upfront. But after this patch, only memory@0 node for RMA is built upfront. Guest kernel boots with just that and rest of the memory nodes (via memory@XXX or ibm,dynamic-reconfiguration-memory) are built when guest does ibm,client-architecture-support call. Note: This patch needs a SLOF enhancement which is already part of SLOF binary in QEMU. Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2015-09-23 10:51:10 +10:00
Markus Armbruster	1a9a507b2e	qapi-introspect: Hide type names To eliminate the temptation for clients to look up types by name (which are not ABI), replace all type names by meaningless strings. Reduces output of query-schema by 13 out of 85KiB. As a debugging aid, provide option -u to suppress the hiding. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <1442401589-24189-27-git-send-email-armbru@redhat.com>	2015-09-21 09:56:49 +02:00
Markus Armbruster	39a1815816	qapi: New QMP command query-qmp-schema for QMP introspection qapi/introspect.json defines the introspection schema. It's designed for QMP introspection, but should do for similar uses, such as QGA. The introspection schema does not reflect all the rules and restrictions that apply to QAPI schemata. A valid QAPI schema has an introspection value conforming to the introspection schema, but the converse is not true. Introspection lowers away a number of schema details, and makes implicit things explicit: * The built-in types are declared with their JSON type. All integer types are mapped to 'int', because how many bits we use internally is an implementation detail. It could be pressed into external interface service as very approximate range information, but that's a bad idea. If we need range information, we better do it properly. * Implicit type definitions are made explicit, and given auto-generated names: - Array types, named by appending "List" to the name of their element type, like in generated C. - The enumeration types implicitly defined by simple union types, named by appending "Kind" to the name of their simple union type, like in generated C. - Types that don't occur in generated C. Their names start with ':' so they don't clash with the user's names. * All type references are by name. * The struct and union types are generalized into an object type. * Base types are flattened. * Commands take a single argument and return a single result. Dictionary argument or list result is an implicit type definition. The empty object type is used when a command takes no arguments or produces no results. The argument is always of object type, but the introspection schema doesn't reflect that. The 'gen': false directive is omitted as implementation detail. The 'success-response' directive is omitted as well for now, even though it's not an implementation detail, because it's not used by QMP. * Events carry a single data value. Implicit type definition and empty object type use, just like for commands. The value is of object type, but the introspection schema doesn't reflect that. * Types not used by commands or events are omitted. Indirect use counts as use. * Optional members have a default, which can only be null right now Instead of a mandatory "optional" flag, we have an optional default. No default means mandatory, default null means optional without default value. Non-null is available for optional with default (possible future extension). * Clients should not look up types by name, because type names are not ABI. Look up the command or event you're interested in, then follow the references. TODO Should we hide the type names to eliminate the temptation? New generator scripts/qapi-introspect.py computes an introspection value for its input, and generates a C variable holding it. It can generate awfully long lines. Marked TODO. A new test-qmp-input-visitor test case feeds its result for both tests/qapi-schema/qapi-schema-test.json and qapi-schema.json to a QmpInputVisitor to verify it actually conforms to the schema. New QMP command query-qmp-schema takes its return value from that variable. Its reply is some 85KiBytes for me right now. If this turns out to be too much, we have a couple of options: * We can use shorter names in the JSON. Not the QMP style. * Optionally return the sub-schema for commands and events given as arguments. Right now qmp_query_schema() sends the string literal computed by qmp-introspect.py. To compute sub-schema at run time, we'd have to duplicate parts of qapi-introspect.py in C. Unattractive. * Let clients cache the output of query-qmp-schema. It changes only on QEMU upgrades, i.e. rarely. Provide a command query-qmp-schema-hash. Clients can have a cache indexed by hash, and re-query the schema only when they don't have it cached. Even simpler: put the hash in the QMP greeting. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2015-09-21 09:56:49 +02:00
Markus Armbruster	2d21291ae6	qapi: Pseudo-type '**' is now unused, drop it 'gen': false needs to stay for now, because netdev_add is still using it. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Daniel P. Berrange <berrange@redhat.com> Message-Id: <1442401589-24189-25-git-send-email-armbru@redhat.com>	2015-09-21 09:56:49 +02:00
Markus Armbruster	b8a98326d5	qapi-schema: Fix up misleading specification of netdev_add It doesn't take a 'props' argument, let alone one in the format "NAME=VALUE,..." The bogus arguments specification doesn't matter due to 'gen': false. Clean it up to be incomplete rather than wrong, and document the incompleteness. While there, improve netdev_add usage example in the manual: add a device option to show how it's done. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Daniel P. Berrange <berrange@redhat.com> Message-Id: <1442401589-24189-24-git-send-email-armbru@redhat.com>	2015-09-21 09:56:49 +02:00
Markus Armbruster	28770e057f	qapi: Introduce a first class 'any' type It's first class, because unlike '', it actually works, i.e. doesn't require 'gen': false. '' will go away next. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Daniel P. Berrange <berrange@redhat.com>	2015-09-21 09:56:49 +02:00
Markus Armbruster	f133f2db1e	qapi: Improve built-in type documentation Clarify how they map to JSON. Add how they map to C. Fix the reference to StringInputVisitor. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Daniel P. Berrange <berrange@redhat.com> Message-Id: <1442401589-24189-20-git-send-email-armbru@redhat.com>	2015-09-21 09:56:49 +02:00
Markus Armbruster	56d92b003a	qapi-commands: De-duplicate output marshaling functions gen_marshal_output() uses its parameter name only for name of the generated function. Name it after the type being marshaled instead of its caller, and drop duplicates. Saves 7 copies of qmp_marshal_output_int() in qemu-ga, and one copy of qmp_marshal_output_str() in qemu-system-*. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Daniel P. Berrange <berrange@redhat.com> Message-Id: <1442401589-24189-19-git-send-email-armbru@redhat.com>	2015-09-21 09:56:48 +02:00
Markus Armbruster	7fad30f06e	qapi: Rename qmp_marshal_input_FOO() to qmp_marshal_FOO() These functions marshal both input and output. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Daniel P. Berrange <berrange@redhat.com> Message-Id: <1442401589-24189-17-git-send-email-armbru@redhat.com>	2015-09-21 09:56:48 +02:00
Markus Armbruster	e98859a9b9	qapi: Clean up after recent conversions to QAPISchemaVisitor Generate just 'FOO' instead of 'struct FOO' when possible. Drop helper functions that are now unused. Make pep8 and pylint reasonably happy. Rename generate_FOO() functions to gen_FOO() for consistency. Use more consistent and sensible variable names. Consistently use c_ for mapping keys when their value is a C identifier or type. Simplify gen_enum() and gen_visit_union() Consistently use single quotes for C text string literals. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <1442401589-24189-14-git-send-email-armbru@redhat.com> Reviewed-by: Daniel P. Berrange <berrange@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2015-09-21 09:56:48 +02:00
Markus Armbruster	efd2eaa6c2	qapi: De-duplicate enum code generation Duplicated in commit `21cd70d`. Yes, we can't import qapi-types, but that's no excuse. Move the helpers from qapi-types.py to qapi.py, and replace the duplicates in qapi-event.py. The generated event enumeration type's lookup table becomes const-correct (see commit `2e4450f`), and uses explicit indexes instead of relying on order (see commit `912ae9c`). Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <1442401589-24189-10-git-send-email-armbru@redhat.com> Reviewed-by: Daniel P. Berrange <berrange@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2015-09-21 09:56:47 +02:00
Markus Armbruster	2b162ccbe8	qapi-types: Convert to QAPISchemaVisitor, fixing flat unions Fixes flat unions to get the base's base members. Test case is from commit `2fc0043`, in qapi-schema-test.json: { 'union': 'UserDefFlatUnion', 'base': 'UserDefUnionBase', 'discriminator': 'enum1', 'data': { 'value1' : 'UserDefA', 'value2' : 'UserDefB', 'value3' : 'UserDefB' } } { 'struct': 'UserDefUnionBase', 'base': 'UserDefZero', 'data': { 'string': 'str', 'enum1': 'EnumOne' } } { 'struct': 'UserDefZero', 'data': { 'integer': 'int' } } Patch's effect on UserDefFlatUnion: struct UserDefFlatUnion { /* Members inherited from UserDefUnionBase: / + int64_t integer; char string; EnumOne enum1; /* Own members: / union { / union tag is @enum1 / void data; UserDefA value1; UserDefB value2; UserDefB *value3; }; }; Flat union visitors remain broken. They'll be fixed next. Code is generated in a different order now, but that doesn't matter. The two guards QAPI_TYPES_BUILTIN_STRUCT_DECL and QAPI_TYPES_BUILTIN_CLEANUP_DECL are replaced by just QAPI_TYPES_BUILTIN. Two ugly special cases for simple unions now stand out like sore thumbs: 1. The type tag is named 'type' everywhere, except in generated C, where it's 'kind'. 2. QAPISchema lowers simple unions to semantically equivalent flat unions. However, the C generated for a simple unions differs from the C generated for its equivalent flat union, and we therefore need special code to preserve that pointless difference for now. Mark both TODO. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Daniel P. Berrange <berrange@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2015-09-21 09:53:16 +02:00
Daniel P. Berrange	351d36e454	qapi: allow override of default enum prefix naming The camel_to_upper() method applies some heuristics to turn a mixed case type name into an all-uppercase name. This is used for example, to generate enum constant name prefixes. The heuristics don't also generate a satisfactory name though. eg { 'enum': 'QCryptoTLSCredsEndpoint', 'data': ['client', 'server']} Results in Q_CRYPTOTLS_CREDS_ENDPOINT_CLIENT. This has an undesirable _ after the initial Q and is missing an _ between the CRYPTO & TLS strings. Rather than try to add more and more heuristics to try to cope with this, simply allow the QAPI schema to specify the desired enum constant prefix explicitly. eg { 'enum': 'QCryptoTLSCredsEndpoint', 'prefix': 'QCRYPTO_TLS_CREDS_ENDPOINT', 'data': ['client', 'server']} Now gives the QCRYPTO_TLS_CREDS_ENDPOINT_CLIENT name. Signed-off-by: Daniel P. Berrange <berrange@redhat.com>	2015-09-15 10:59:28 +01:00
Veres Lajos	67cc32ebfd	typofixes - v4 Signed-off-by: Veres Lajos <vlajos@gmail.com> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2015-09-11 10:45:43 +03:00
Daniel P. Berrange	b6af097528	maint: remove / fix many doubled words Many source files have doubled words (eg "the the", "to to", and so on). Most of these can simply be removed, but a couple were actual mis-spellings (eg "to to" instead of "to do"). There was even one triple word score "to to to" :-) Signed-off-by: Daniel P. Berrange <berrange@redhat.com> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2015-09-11 10:21:38 +03:00
Alberto Garcia	7f65ce834a	docs: document how to configure the qcow2 L2/refcount caches QEMU has options to configure the size of the L2 and refcount caches for the qcow2 format. However, choosing the right sizes for a particular disk image is not a straightforward operation since the ratio between the cache size and the allocated disk space is not obvious and depends on the size of the cluster and the refcount entries. This document attempts to give an overview of both caches and how to configure their sizes. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 55de928e139b1ba3f3d40fe9c6c88f30b1f36410.1438690126.git.berto@igalia.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2015-09-04 21:00:32 +02:00
Markus Armbruster	94a3f0af38	docs/qapi-code-gen.txt: Fix QAPI schema examples Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2015-09-04 15:47:16 +02:00
Markus Armbruster	3a864e7c52	qapi: Generated code cleanup Clean up white-space, brace placement, and superfluous #ifdef QAPI_TYPES_BUILTIN_CLEANUP_DEF. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2015-09-04 15:47:16 +02:00
Markus Armbruster	3f99144cd9	qapi-commands: Drop useless initialization In generated command handlers, the assignment to retval dominates its only use. Therefore, its initialization is useless. Drop it. Suggested-by: Eric Blake <eblake@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2015-09-04 15:47:15 +02:00
Markus Armbruster	9b090d42ae	qapi: Command returning anonymous type doesn't work, outlaw Reproducer: with { 'command': 'user_def_cmd4', 'returns': { 'a': 'int' } } added to qapi-schema-test.json, qapi-commands.py dies when it tries to generate the command handler function Traceback (most recent call last): File "/work/armbru/qemu/scripts/qapi-commands.py", line 359, in <module> ret = generate_command_decl(cmd['command'], arglist, ret_type) + "\n" File "/work/armbru/qemu/scripts/qapi-commands.py", line 29, in generate_command_decl ret_type=c_type(ret_type), name=c_name(name), File "/work/armbru/qemu/scripts/qapi.py", line 927, in c_type assert isinstance(value, str) and value != "" AssertionError because the return type doesn't exist. Simply outlaw this usage, and drop or dumb down test cases accordingly. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2015-09-04 15:47:15 +02:00
Markus Armbruster	315932b5ed	qapi: Fix to reject union command and event arguments A command's or event's 'data' must be a struct type, given either as a dictionary, or as struct type name. Commit `dd883c6` tightened the checking there, but not enough: we still accept 'union'. Fix to reject it. We may want to support union types there, but we'll have to extend qapi-commands.py and qapi-events.py for it. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2015-09-04 15:47:15 +02:00
Markus Armbruster	016a335bd8	qapi-event: Clean up how name of enum QAPIEvent is made Use c_name() instead of ad hoc code. Doesn't upcase the -p prefix, which is an improvement in my book. Unbreaks prefix containing '.', but other funny characters remain broken. To be fixed next. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2015-09-04 15:47:13 +02:00
Markus Armbruster	4247f83900	qapi: Clarify docs on including the same file multiple times It's idempotent. While there, update examples to current code. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2015-09-04 15:47:13 +02:00
Paolo Bonzini	05e514b1d4	AioContext: optimize clearing the EventNotifier It is pretty rare for aio_notify to actually set the EventNotifier. It can happen with worker threads such as thread-pool.c's, but otherwise it should never be set thanks to the ctx->notify_me optimization. The previous patch, unfortunately, added an unconditional call to event_notifier_test_and_clear; now add a userspace fast path that avoids the call. Note that it is not possible to do the same with event_notifier_set; it would break, as proved (again) by the included formal model. This patch survived over 3000 reboots on aarch64 KVM. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Tested-by: Richard W.M. Jones <rjones@redhat.com> Message-id: 1437487673-23740-7-git-send-email-pbonzini@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-07-22 12:41:40 +01:00
Paolo Bonzini	21a03d17f2	AioContext: fix broken placement of event_notifier_test_and_clear event_notifier_test_and_clear must be called before processing events. Otherwise, an aio_poll could "eat" the notification before the main I/O thread invokes ppoll(). The main I/O thread then never wakes up. This is an example of what could happen: i/o thread vcpu thread worker thread --------------------------------------------------------------------- lock_iothread notify_me = 1 ... unlock_iothread bh->scheduled = 1 event_notifier_set lock_iothread notify_me = 3 ppoll notify_me = 1 aio_dispatch aio_bh_poll thread_pool_completion_bh bh->scheduled = 1 event_notifier_set node->io_read(node->opaque) event_notifier_test_and_clear ppoll * hang * "Tracing" with qemu_clock_get_ns shows pretty much the same behavior as in the previous bug, so there are no new tricks here---just stare more at the code until it is apparent. One could also use a formal model, of course. The included one shows this with three processes: notifier corresponds to a QEMU thread pool worker, temporary_waiter to a VCPU thread that invokes aio_poll(), waiter to the main I/O thread. I would be happy to say that the formal model found the bug for me, but actually I wrote it after the fact. This patch is a bit of a big hammer. The next one optimizes it, with help (this time for real rather than a posteriori :)) from another, similar formal model. Reported-by: Richard W. M. Jones <rjones@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Tested-by: Richard W.M. Jones <rjones@redhat.com> Message-id: 1437487673-23740-6-git-send-email-pbonzini@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-07-22 12:41:40 +01:00
Paolo Bonzini	eabc977973	AioContext: fix broken ctx->dispatching optimization This patch rewrites the ctx->dispatching optimization, which was the cause of some mysterious hangs that could be reproduced on aarch64 KVM only. The hangs were indirectly caused by aio_poll() and in particular by flash memory updates's call to blk_write(), which invokes aio_poll(). Fun stuff: they had an extremely short race window, so much that adding all kind of tracing to either the kernel or QEMU made it go away (a single printf made it half as reproducible). On the plus side, the failure mode (a hang until the next keypress) made it very easy to examine the state of the process with a debugger. And there was a very nice reproducer from Laszlo, which failed pretty often (more than half of the time) on any version of QEMU with a non-debug kernel; it also failed fast, while still in the firmware. So, it could have been worse. For some unknown reason they happened only with virtio-scsi, but that's not important. It's more interesting that they disappeared with io=native, making thread-pool.c a likely suspect for where the bug arose. thread-pool.c is also one of the few places which use bottom halves across threads, by the way. I hope that no other similar bugs exist, but just in case :) I am going to describe how the successful debugging went... Since the likely culprit was the ctx->dispatching optimization, which mostly affects bottom halves, the first observation was that there are two qemu_bh_schedule() invocations in the thread pool: the one in the aio worker and the one in thread_pool_completion_bh. The latter always causes the optimization to trigger, the former may or may not. In order to restrict the possibilities, I introduced new functions qemu_bh_schedule_slow() and qemu_bh_schedule_fast(): /* qemu_bh_schedule_slow: / ctx = bh->ctx; bh->idle = 0; if (atomic_xchg(&bh->scheduled, 1) == 0) { event_notifier_set(&ctx->notifier); } / qemu_bh_schedule_fast: / ctx = bh->ctx; bh->idle = 0; assert(ctx->dispatching); atomic_xchg(&bh->scheduled, 1); Notice how the atomic_xchg is still in qemu_bh_schedule_slow(). This was already debated a few months ago, so I assumed it to be correct. In retrospect this was a very good idea, as you'll see later. Changing thread_pool_completion_bh() to qemu_bh_schedule_fast() didn't trigger the assertion (as expected). Changing the worker's invocation to qemu_bh_schedule_slow() didn't hide the bug (another assumption which luckily held). This already limited heavily the amount of interaction between the threads, hinting that the problematic events must have triggered around thread_pool_completion_bh(). As mentioned early, invoking a debugger to examine the state of a hung process was pretty easy; the iothread was always waiting on a poll(..., -1) system call. Infinite timeouts are much rarer on x86, and this could be the reason why the bug was never observed there. With the buggy sequence more or less resolved to an interaction between thread_pool_completion_bh() and poll(..., -1), my "tracing" strategy was to just add a few qemu_clock_get_ns(QEMU_CLOCK_REALTIME) calls, hoping that the ordering of aio_ctx_prepare(), aio_ctx_dispatch, poll() and qemu_bh_schedule_fast() would provide some hint. The output was: (gdb) p last_prepare $3 = 103885451 (gdb) p last_dispatch $4 = 103876492 (gdb) p last_poll $5 = 115909333 (gdb) p last_schedule $6 = 115925212 Notice how the last call to qemu_poll_ns() came after aio_ctx_dispatch(). This makes little sense unless there is an aio_poll() call involved, and indeed with a slightly different instrumentation you can see that there is one: (gdb) p last_prepare $3 = 107569679 (gdb) p last_dispatch $4 = 107561600 (gdb) p last_aio_poll $5 = 110671400 (gdb) p last_schedule $6 = 110698917 So the scenario becomes clearer: iothread VCPU thread -------------------------------------------------------------------------- aio_ctx_prepare aio_ctx_check qemu_poll_ns(timeout=-1) aio_poll aio_dispatch thread_pool_completion_bh qemu_bh_schedule() At this point bh->scheduled = 1 and the iothread has not been woken up. The solution must be close, but this alone should not be a problem, because the bottom half is only rescheduled to account for rare situations (see commit `3c80ca1`, thread-pool: avoid deadlock in nested aio_poll() calls, 2014-07-15). Introducing a third thread---a thread pool worker thread, which also does qemu_bh_schedule()---does bring out the problematic case. The third thread must be awakened after* the callback is complete and thread_pool_completion_bh has redone the whole loop, explaining the short race window. And then this is what happens: thread pool worker -------------------------------------------------------------------------- <I/O completes> qemu_bh_schedule() Tada, bh->scheduled is already 1, so qemu_bh_schedule() does nothing and the iothread is never woken up. This is where the bh->scheduled optimization comes into play---it is correct, but removing it would have masked the bug. So, what is the bug? Well, the question asked by the ctx->dispatching optimization ("is any active aio_poll dispatching?") was wrong. The right question to ask instead is "is any active aio_poll not dispatching", i.e. in the prepare or poll phases? In that case, the aio_poll is sleeping or might go to sleep anytime soon, and the EventNotifier must be invoked to wake it up. In any other case (including if there is no active aio_poll at all!) we can just wait for the next prepare phase to pick up the event (e.g. a bottom half); the prepare phase will avoid the blocking and service the bottom half. Expressing the invariant with a logic formula, the broken one looked like: !(exists(thread): in_dispatching(thread)) => !optimize or equivalently: !(exists(thread): in_aio_poll(thread) && in_dispatching(thread)) => !optimize In the correct one, the negation is in a slightly different place: (exists(thread): in_aio_poll(thread) && !in_dispatching(thread)) => !optimize or equivalently: (exists(thread): in_prepare_or_poll(thread)) => !optimize Even if the difference boils down to moving an exclamation mark :) the implementation is quite different. However, I think the new one is simpler to understand. In the old implementation, the "exists" was implemented with a boolean value. This didn't really support well the case of multiple concurrent event loops, but I thought that this was okay: aio_poll holds the AioContext lock so there cannot be concurrent aio_poll invocations, and I was just considering nested event loops. However, aio_poll _could_ indeed be concurrent with the GSource. This is why I came up with the wrong invariant. In the new implementation, "exists" is computed simply by counting how many threads are in the prepare or poll phases. There are some interesting points to consider, but the gist of the idea remains: 1) AioContext can be used through GSource as well; as mentioned in the patch, bit 0 of the counter is reserved for the GSource. 2) the counter need not be updated for a non-blocking aio_poll, because it won't sleep forever anyway. This is just a matter of checking the "blocking" variable. This requires some changes to the win32 implementation, but is otherwise not too complicated. 3) as mentioned above, the new implementation will not call aio_notify when there is no active aio_poll at all. The tests have to be adjusted for this change. The calls to aio_notify in async.c are fine; they only want to kick aio_poll out of a blocking wait, but need not do anything if aio_poll is not running. 4) nested aio_poll: these just work with the new implementation; when a nested event loop is invoked, the outer event loop is never in the prepare or poll phases. The outer event loop thus has already decremented the counter. Reported-by: Richard W. M. Jones <rjones@redhat.com> Reported-by: Laszlo Ersek <lersek@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Tested-by: Richard W.M. Jones <rjones@redhat.com> Message-id: 1437487673-23740-5-git-send-email-pbonzini@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-07-22 12:41:40 +01:00

1 2 3 4 5 ...

338 Commits