qemu-e2k

Commit Graph

Author	SHA1	Message	Date
Cédric Le Goater	71255c48e7	aspeed/smc: Add default reset values This simplifies the reset handler and has the benefit to remove some "bad" use of the segments array as an identifier of the controller model. Signed-off-by: Cédric Le Goater <clg@kaod.org>	2021-10-12 08:20:08 +02:00
Cédric Le Goater	f75b533117	aspeed/smc: QOMify AspeedSMCFlash AspeedSMCFlash is a small structure representing the AHB memory window through which the contents of a flash device can be accessed with MMIOs. Introduce an AspeedSMCFlash SysBusDevice model and attach the associated memory region to the newly instantiated objects. Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Cédric Le Goater <clg@kaod.org>	2021-10-12 08:20:08 +02:00
Cédric Le Goater	10f915e4ca	aspeed/smc: Rename AspeedSMCFlash 'id' to 'cs' 'cs' is a more appropriate name to index SPI flash devices. Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Cédric Le Goater <clg@kaod.org>	2021-10-12 08:20:08 +02:00
Cédric Le Goater	6bb55e7967	aspeed/smc: Remove the 'size' attribute from AspeedSMCFlash AspeedSMCFlash::size is only used to compute the initial size of the boot_rom region. Not very useful, so directly call memory_region_size() instead. Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Cédric Le Goater <clg@kaod.org>	2021-10-12 08:20:08 +02:00
Cédric Le Goater	a7d78beff4	aspeed/smc: Remove the 'flash' attribute from AspeedSMCFlash There is no need to keep a reference of the flash qdev in the AspeedSMCFlash state: the SPI bus takes ownership and will release its resources. Remove AspeedSMCFlash::flash. Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Cédric Le Goater <clg@kaod.org>	2021-10-12 08:20:08 +02:00
Cédric Le Goater	30b6852ce4	aspeed/smc: Drop AspeedSMCController structure The characteristics of the Aspeed controllers are described in a AspeedSMCController structure which is redundant with the AspeedSMCClass. Move all attributes under the class and adapt the code to use class attributes instead. This is a large change but it is functionally equivalent. Signed-off-by: Cédric Le Goater <clg@kaod.org>	2021-10-12 08:20:08 +02:00
Mark Cave-Ayland	c7a2f7ba0c	macfb: add vertical blank interrupt The MacOS driver expects a 60.15Hz vertical blank interrupt to be generated by the framebuffer which in turn schedules the mouse driver via the Vertical Retrace Manager. Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> Reviewed-by: Laurent Vivier <laurent@vivier.eu> Message-Id: <20211007221253.29024-13-mark.cave-ayland@ilande.co.uk> Signed-off-by: Laurent Vivier <laurent@vivier.eu>	2021-10-08 13:31:03 +02:00
Mark Cave-Ayland	df8abbbadf	macfb: add common monitor modes supported by the MacOS toolbox ROM The monitor modes table is found by experimenting with the Monitors Control Panel in MacOS and analysing the reads/writes. From this it can be found that the mode is controlled by writes to the DAFB_MODE_CTRL1 and DAFB_MODE_CTRL2 registers. Implement the first block of DAFB registers as a register array including the existing sense register, the newly discovered control registers above, and also the DAFB_MODE_VADDR1 and DAFB_MODE_VADDR2 registers which are used by NetBSD to determine the current video mode. These experiments also show that the offset of the start of video RAM and the stride can change depending upon the monitor mode, so update macfb_draw_graphic() and both the BI_MAC_VADDR and BI_MAC_VROW bootinfo for the q800 machine accordingly. Finally update macfb_common_realize() so that only the resolution and depth supported by the display type can be specified on the command line, and add an error hint showing the list of supported resolutions and depths if the user tries to specify an invalid display mode. Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> Reviewed-by: Laurent Vivier <laurent@vivier.eu> Message-Id: <20211007221253.29024-10-mark.cave-ayland@ilande.co.uk> Signed-off-by: Laurent Vivier <laurent@vivier.eu>	2021-10-08 13:31:03 +02:00
Mark Cave-Ayland	4317c51861	macfb: add qdev property to specify display type Since the available resolutions and colour depths are determined by the attached display type, add a qdev property to allow the display type to be specified. The main resolutions of interest are high resolution 1152x870 with 8-bit colour and SVGA resolution up to 800x600 with 24-bit colour so update the q800 machine to allow high resolution mode if specified and otherwise fall back to SVGA. Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Laurent Vivier <laurent@vivier.eu> Message-Id: <20211007221253.29024-9-mark.cave-ayland@ilande.co.uk> Signed-off-by: Laurent Vivier <laurent@vivier.eu>	2021-10-08 13:31:03 +02:00
Mark Cave-Ayland	e6108b9636	macfb: implement mode sense to allow display type to be detected The MacOS toolbox ROM uses the monitor sense to detect the display type and then offer a fixed set of resolutions and colour depths accordingly. Implement the monitor sense using information found in Apple Technical Note HW26: "Macintosh Quadra Built-In Video" along with some local experiments. Since the default configuration is 640 x 480 with 8-bit colour then hardcode the sense register to return MACFB_DISPLAY_VGA for now. Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> Reviewed-by: Laurent Vivier <laurent@vivier.eu> Message-Id: <20211007221253.29024-8-mark.cave-ayland@ilande.co.uk> Signed-off-by: Laurent Vivier <laurent@vivier.eu>	2021-10-08 13:31:03 +02:00
Richard Henderson	14f12119aa	mirror: Handle errors after READY cancel v2: add small fix by Stefano, Hanna's series fixed -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEi5wmzbL9FHyIDoahVh8kwfGfefsFAmFfEVMACgkQVh8kwfGf eftAVA//WLtOaiVYPSjEl5EK80kry39VknZkQyeYUzyV7JNr/FRMlbJaIF2sOjH5 KPRpfwBiuijOc8R0s34HY0BpyweRd1rbypHblZkO7EO4XwHx1FLF5kNHF6yV7wPL c9W564sZpc6Z96wSMgC4Is9QHJ6JbO4TJNJsG8v/PHEqGQV/yCYkgBox4loJckww uSAZ7l63IWA8uPSq/rOu34bREKN9s0kHkvFq0JNWk2HtOBLDiRDUYmbSfdjfT4jz np7ojKiffcAJED9JA28Zo2Y+MSId+FyoO4lbt+deMNzIHboy2oVlHouoHHprr61x dIO7Qt1IoMk5IBIfkPRYkReMwxxSVKuIJcWm8Qqtkcg2X0g5ayNUmHwpBMd50h2z XPjrr0YdOixhxMHoBnqlkPlWU0Y/B+YJIQ+mjqp+vRNkk94NoXhsXnCod1ajkgWO zjc/dztew7HvNStJaMM0rnEjanLhzFZKtlMO4WwZHQp2yZG2AINkPStswo2f3AmL FI+2By/UhFKm3BEemf0wYWDPWrPHU+BOiu16KjSKeS0GA9t7GXBUDRxNYPhUheXJ eJKIpNsGbseNxKrAbLyRhAB75Fa/ReZqqybmEcLyal/ball3R/cNF3gaMHeX0o1n HTGIAF5JOAXNGApS5TilkXPZ7jHFOVPh/Fi6/16/08tcgxjVfro= =TVTu -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/vsementsov/tags/pull-jobs-2021-10-07-v2' into staging mirror: Handle errors after READY cancel v2: add small fix by Stefano, Hanna's series fixed # gpg: Signature made Thu 07 Oct 2021 08:25:07 AM PDT # gpg: using RSA key 8B9C26CDB2FD147C880E86A1561F24C1F19F79FB # gpg: Good signature from "Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>" [unknown] # gpg: WARNING: This key is not certified with a trusted signature! # gpg: There is no indication that the signature belongs to the owner. # Primary key fingerprint: 8B9C 26CD B2FD 147C 880E 86A1 561F 24C1 F19F 79FB * remotes/vsementsov/tags/pull-jobs-2021-10-07-v2: iotests: Add mirror-ready-cancel-error test mirror: Do not clear .cancelled mirror: Stop active mirroring after force-cancel mirror: Check job_is_cancelled() earlier mirror: Use job_is_cancelled() job: Add job_cancel_requested() job: Do not soft-cancel after a job is done jobs: Give Job.force_cancel more meaning job: @force parameter for job_cancel_sync() job: Force-cancel jobs in a failed transaction mirror: Drop s->synced mirror: Keep s->synced on error job: Context changes in job_completed_txn_abort() block/aio_task: assert `max_busy_tasks` is greater than 0 block/backup: avoid integer overflow of `max-workers` Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2021-10-07 10:26:35 -07:00
Hanna Reitz	08b83bff2a	job: Add job_cancel_requested() Most callers of job_is_cancelled() actually want to know whether the job is on its way to immediate termination. For example, we refuse to pause jobs that are cancelled; but this only makes sense for jobs that are really actually cancelled. A mirror job that is cancelled during READY with force=false should absolutely be allowed to pause. This "cancellation" (which is actually a kind of completion) may take an indefinite amount of time, and so should behave like any job during normal operation. For example, with on-target-error=stop, the job should stop on write errors. (In contrast, force-cancelled jobs should not get write errors, as they should just terminate and not do further I/O.) Therefore, redefine job_is_cancelled() to only return true for jobs that are force-cancelled (which as of HEAD^ means any job that interprets the cancellation request as a request for immediate termination), and add job_cancel_requested() as the general variant, which returns true for any jobs which have been requested to be cancelled, whether it be immediately or after an arbitrarily long completion phase. Finally, here is a justification for how different job_is_cancelled() invocations are treated by this patch: - block/mirror.c (mirror_run()): - The first invocation is a while loop that should loop until the job has been cancelled or scheduled for completion. What kind of cancel does not matter, only the fact that the job is supposed to end. - The second invocation wants to know whether the job has been soft-cancelled. Calling job_cancel_requested() is a bit too broad, but if the job were force-cancelled, we should leave the main loop as soon as possible anyway, so this should not matter here. - The last two invocations already check force_cancel, so they should continue to use job_is_cancelled(). - block/backup.c, block/commit.c, block/stream.c, anything in tests/: These jobs know only force-cancel, so there is no difference between job_is_cancelled() and job_cancel_requested(). We can continue using job_is_cancelled(). - job.c: - job_pause_point(), job_yield(), job_sleep_ns(): Only force-cancelled jobs should be prevented from being paused. Continue using job_is_cancelled(). - job_update_rc(), job_finalize_single(), job_finish_sync(): These functions are all called after the job has left its main loop. The mirror job (the only job that can be soft-cancelled) will clear .cancelled before leaving the main loop if it has been soft-cancelled. Therefore, these functions will observe .cancelled to be true only if the job has been force-cancelled. We can continue to use job_is_cancelled(). (Furthermore, conceptually, a soft-cancelled mirror job should not report to have been cancelled. It should report completion (see also the block-job-cancel QAPI documentation). Therefore, it makes sense for these functions not to distinguish between a soft-cancelled mirror job and a job that has completed as normal.) - job_completed_txn_abort(): All jobs other than @job have been force-cancelled. job_is_cancelled() must be true for them. Regarding @job itself: job_completed_txn_abort() is mostly called when the job's return value is not 0. A soft-cancelled mirror has a return value of 0, and so will not end up here then. However, job_cancel() invokes job_completed_txn_abort() if the job has been deferred to the main loop, which is mostly the case for completed jobs (which skip the assertion), but not for sure. To be safe, use job_cancel_requested() in this assertion. - job_complete(): This is function eventually invoked by the user (through qmp_block_job_complete() or qmp_job_complete(), or job_complete_sync(), which comes from qemu-img). The intention here is to prevent a user from invoking job-complete after the job has been cancelled. This should also apply to soft cancelling: After a mirror job has been soft-cancelled, the user should not be able to decide otherwise and have it complete as normal (i.e. pivoting to the target). - job_cancel(): Both functions are equivalent (see comment there), but we want to use job_is_cancelled(), because this shows that we call job_completed_txn_abort() only for force-cancelled jobs. (As explained for job_update_rc(), soft-cancelled jobs should be treated as if they have completed as normal.) Buglink: https://gitlab.com/qemu-project/qemu/-/issues/462 Signed-off-by: Hanna Reitz <hreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20211006151940.214590-9-hreitz@redhat.com> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>	2021-10-07 10:42:40 +02:00
Hanna Reitz	73895f3838	jobs: Give Job.force_cancel more meaning We largely have two cancel modes for jobs: First, there is actual cancelling. The job is terminated as soon as possible, without trying to reach a consistent result. Second, we have mirror in the READY state. Technically, the job is not really cancelled, but it just is a different completion mode. The job can still run for an indefinite amount of time while it tries to reach a consistent result. We want to be able to clearly distinguish which cancel mode a job is in (when it has been cancelled). We can use Job.force_cancel for this, but right now it only reflects cancel requests from the user with force=true, but clearly, jobs that do not even distinguish between force=false and force=true are effectively always force-cancelled. So this patch has Job.force_cancel signify whether the job will terminate as soon as possible (force_cancel=true) or whether it will effectively remain running despite being "cancelled" (force_cancel=false). To this end, we let jobs that provide JobDriver.cancel() tell the generic job code whether they will terminate as soon as possible or not, and for jobs that do not provide that method we assume they will. Signed-off-by: Hanna Reitz <hreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20211006151940.214590-7-hreitz@redhat.com> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>	2021-10-07 10:42:34 +02:00
Hanna Reitz	4cfb3f0562	job: @force parameter for job_cancel_sync() Callers should be able to specify whether they want job_cancel_sync() to force-cancel the job or not. In fact, almost all invocations do not care about consistency of the result and just want the job to terminate as soon as possible, so they should pass force=true. The replication block driver is the exception, specifically the active commit job it runs. As for job_cancel_sync_all(), all callers want it to force-cancel all jobs, because that is the point of it: To cancel all remaining jobs as quickly as possible (generally on process termination). So make it invoke job_cancel_sync() with force=true. This changes some iotest outputs, because quitting qemu while a mirror job is active will now lead to it being cancelled instead of completed, which is what we want. (Cancelling a READY mirror job with force=false may take an indefinite amount of time, which we do not want when quitting. If users want consistent results, they must have all jobs be done before they quit qemu.) Buglink: https://gitlab.com/qemu-project/qemu/-/issues/462 Signed-off-by: Hanna Reitz <hreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20211006151940.214590-6-hreitz@redhat.com> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>	2021-10-07 10:42:09 +02:00
Philippe Mathieu-Daudé	31ca70b5ff	hw/char/mchp_pfsoc_mmuart: QOM'ify PolarFire MMUART - Embed SerialMM in MchpPfSoCMMUartState and QOM-initialize it - Alias SERIAL_MM 'chardev' property on MCHP_PFSOC_UART - Forward SerialMM sysbus IRQ in mchp_pfsoc_mmuart_realize() - Add DeviceReset() method - Add vmstate structure for migration - Register device in 'input' category - Keep mchp_pfsoc_mmuart_create() behavior Note, serial_mm_init() calls qdev_set_legacy_instance_id(). This call is only needed for backwards-compatibility of incoming migration data with old versions of QEMU which implemented migration of devices with hand-rolled code. Since this device didn't previously handle migration at all, then it doesn't need to set the legacy instance ID. Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Bin Meng <bin.meng@windriver.com> Tested-by: Bin Meng <bin.meng@windriver.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Message-id: 20210925133407.1259392-4-f4bug@amsat.org Signed-off-by: Alistair Francis <alistair.francis@wdc.com>	2021-10-07 08:41:33 +10:00
Philippe Mathieu-Daudé	24ce762df7	hw/char/mchp_pfsoc_mmuart: Use a MemoryRegion container Our device have 2 different I/O regions: - a 16550 UART mapped for 32-bit accesses - 13 extra registers Instead of mapping each region on the main bus, introduce a container, map the 2 devices regions on the container, and map the container on the main bus. Before: (qemu) info mtree ... 0000000020100000-000000002010001f (prio 0, i/o): serial 0000000020100020-000000002010101f (prio 0, i/o): mchp.pfsoc.mmuart 0000000020102000-000000002010201f (prio 0, i/o): serial 0000000020102020-000000002010301f (prio 0, i/o): mchp.pfsoc.mmuart 0000000020104000-000000002010401f (prio 0, i/o): serial 0000000020104020-000000002010501f (prio 0, i/o): mchp.pfsoc.mmuart 0000000020106000-000000002010601f (prio 0, i/o): serial 0000000020106020-000000002010701f (prio 0, i/o): mchp.pfsoc.mmuart After: (qemu) info mtree ... 0000000020100000-0000000020100fff (prio 0, i/o): mchp.pfsoc.mmuart 0000000020100000-000000002010001f (prio 0, i/o): serial 0000000020100020-0000000020100fff (prio 0, i/o): mchp.pfsoc.mmuart.regs 0000000020102000-0000000020102fff (prio 0, i/o): mchp.pfsoc.mmuart 0000000020102000-000000002010201f (prio 0, i/o): serial 0000000020102020-0000000020102fff (prio 0, i/o): mchp.pfsoc.mmuart.regs 0000000020104000-0000000020104fff (prio 0, i/o): mchp.pfsoc.mmuart 0000000020104000-000000002010401f (prio 0, i/o): serial 0000000020104020-0000000020104fff (prio 0, i/o): mchp.pfsoc.mmuart.regs 0000000020106000-0000000020106fff (prio 0, i/o): mchp.pfsoc.mmuart 0000000020106000-000000002010601f (prio 0, i/o): serial 0000000020106020-0000000020106fff (prio 0, i/o): mchp.pfsoc.mmuart.regs Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Bin Meng <bin.meng@windriver.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Tested-by: Bin Meng <bin.meng@windriver.com> Message-id: 20210925133407.1259392-3-f4bug@amsat.org Signed-off-by: Alistair Francis <alistair.francis@wdc.com>	2021-10-07 08:41:33 +10:00
Philippe Mathieu-Daudé	284a66a8f6	hw/char/mchp_pfsoc_mmuart: Simplify MCHP_PFSOC_MMUART_REG definition The current MCHP_PFSOC_MMUART_REG_SIZE definition represent the size occupied by all the registers. However all registers are 32-bit wide, and the MemoryRegionOps handlers are restricted to 32-bit: static const MemoryRegionOps mchp_pfsoc_mmuart_ops = { .read = mchp_pfsoc_mmuart_read, .write = mchp_pfsoc_mmuart_write, .impl = { .min_access_size = 4, .max_access_size = 4, }, Avoid being triskaidekaphobic, simplify by using the number of registers. Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Bin Meng <bin.meng@windriver.com> Tested-by: Bin Meng <bin.meng@windriver.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Message-id: 20210925133407.1259392-2-f4bug@amsat.org Signed-off-by: Alistair Francis <alistair.francis@wdc.com>	2021-10-07 08:41:33 +10:00
Richard Henderson	6723ff639c	More fixes for fedora-i386-cross Add dup_const_tl Expand MemOp MO_SIZE Move MemOpIdx out of tcg.h Vector support for tcg/s390x -----BEGIN PGP SIGNATURE----- iQFRBAABCgA7FiEEekgeeIaLTbaoWgXAZN846K9+IV8FAmFdvPUdHHJpY2hhcmQu aGVuZGVyc29uQGxpbmFyby5vcmcACgkQZN846K9+IV8gcwf/T+J5dCgGisLnjOlh mG16tZczILfmpw76Yne0zwJ8T2WFohkyOegBnfRZHzUM/0JZlMbvNMUKSJd4WKhB fpzHKEeTVo7OlW5i6eo1HqQYcbEKzBMEBLEoDWeyRt3k3hpTcjNuD6tC3CaZoCvs gf9UcYgsp3htRPsoOhmarjv5Wded7N1BDQa0W7amlT2rLPO4L2UILfDXiWmapkcp 0kgiKaI1Criua3BNA1+oGPQQQPVSi1MQmiwX/IW/6fExpC65xLBMI3DIyr1ejFPX rrIyx49dQuUHCrPi9jcZ9eT3z8h1PhmAuREv1/VaDl2BROnXUGCanWZoLEd+YWFH R6mXnQ== =cs7J -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/rth/tags/pull-tcg-20211006' into staging More fixes for fedora-i386-cross Add dup_const_tl Expand MemOp MO_SIZE Move MemOpIdx out of tcg.h Vector support for tcg/s390x # gpg: Signature made Wed 06 Oct 2021 08:12:53 AM PDT # gpg: using RSA key 7A481E78868B4DB6A85A05C064DF38E8AF7E215F # gpg: issuer "richard.henderson@linaro.org" # gpg: Good signature from "Richard Henderson <richard.henderson@linaro.org>" [ultimate] * remotes/rth/tags/pull-tcg-20211006: (28 commits) tcg/s390x: Implement TCG_TARGET_HAS_cmpsel_vec tcg/s390x: Implement TCG_TARGET_HAS_bitsel_vec tcg/s390x: Implement TCG_TARGET_HAS_sat_vec tcg/s390x: Implement TCG_TARGET_HAS_minmax_vec tcg/s390x: Implement vector shift operations tcg/s390x: Implement TCG_TARGET_HAS_mul_vec tcg/s390x: Implement andc, orc, abs, neg, not vector operations tcg/s390x: Implement minimal vector operations tcg/s390x: Implement tcg_out_dup_vec tcg/s390x: Implement tcg_out_mov for vector types tcg/s390x: Implement tcg_out_ld/st for vector types tcg/s390x: Add host vector framework tcg/s390x: Merge TCG_AREG0 and TCG_REG_CALL_STACK into TCGReg tcg/s390x: Change FACILITY representation tcg/s390x: Rename from tcg/s390 tcg: Expand usadd/ussub with umin/umax hw/core/cpu: Re-sort the non-pointers to the end of CPUClass trace: Split guest_mem_before plugins: Reorg arguments to qemu_plugin_vcpu_mem_cb accel/tcg: Pass MemOpIdx to atomic_trace__post ... Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2021-10-06 08:50:23 -07:00
Paolo Bonzini	cc07162953	block: introduce max_hw_iov for use in scsi-generic Linux limits the size of iovecs to 1024 (UIO_MAXIOV in the kernel sources, IOV_MAX in POSIX). Because of this, on some host adapters requests with many iovecs are rejected with -EINVAL by the io_submit() or readv()/writev() system calls. In fact, the same limit applies to SG_IO as well. To fix both the EINVAL and the possible performance issues from using fewer iovecs than allowed by Linux (some HBAs have max_segments as low as 128), introduce a separate entry in BlockLimits to hold the max_segments value from sysfs. This new limit is used only for SG_IO and clamped to bs->bl.max_iov anyway, just like max_hw_transfer is clamped to bs->bl.max_transfer. Reported-by: Halil Pasic <pasic@linux.ibm.com> Cc: Hanna Reitz <hreitz@redhat.com> Cc: Kevin Wolf <kwolf@redhat.com> Cc: qemu-block@nongnu.org Cc: qemu-stable@nongnu.org Fixes: `18473467d5` ("file-posix: try BLKSECTGET on block devices too, do not round to power of 2", 2021-06-25) Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20210923130436.1187591-1-pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-10-06 10:25:55 +02:00
Vladimir Sementsov-Ogievskiy	621d17378a	block: implement bdrv_new_open_driver_opts() Add version of bdrv_new_open_driver() that supports QDict options. We'll use it in further commit. Simply add one more argument to bdrv_new_open_driver() is worse, as there are too many invocations of bdrv_new_open_driver() to update then. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Suggested-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20210920115538.264372-2-vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-10-06 10:25:55 +02:00
Emanuele Giuseppe Esposito	a6297e1ade	include/block.h: remove outdated comment There are a couple of errors in bdrv_drained_begin header comment: - block_job_pause does not exist anymore, it has been replaced with job_pause in `b15de82867` - job_pause is automatically invoked as a .drained_begin callback (child_job_drained_begin) by the child_job BdrvChildClass struct in blockjob.c. So no additional pause should be required. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Message-Id: <20210903113800.59970-1-eesposit@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-10-06 10:25:55 +02:00
Richard Henderson	dc29f4746f	hw/core/cpu: Re-sort the non-pointers to the end of CPUClass Despite the comment, the members were not kept at the end. Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2021-10-05 16:53:17 -07:00
Richard Henderson	37aff08726	plugins: Reorg arguments to qemu_plugin_vcpu_mem_cb Use the MemOpIdx directly, rather than the rearrangement of the same bits currently done by the trace infrastructure. Pass in enum qemu_plugin_mem_rw so that we are able to treat read-modify-write operations as a single operation. Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2021-10-05 16:53:17 -07:00
Richard Henderson	abe2e23eb7	tcg: Split out MemOpIdx to exec/memopidx.h Move this code from tcg/tcg.h to its own header. Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2021-10-05 16:53:17 -07:00
Richard Henderson	9002ffcb72	tcg: Rename TCGMemOpIdx to MemOpIdx We're about to move this out of tcg.h, so rename it as we did when moving MemOp. Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2021-10-05 16:53:17 -07:00
Richard Henderson	4b473e0c60	tcg: Expand MO_SIZE to 3 bits We have lacked expressive support for memory sizes larger than 64-bits for a while. Fixing that requires adjustment to several points where we used this for array indexing, and two places that develop -Wswitch warnings after the change. Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2021-10-05 16:53:17 -07:00
Philipp Tomsich	db637f270b	tcg: add dup_const_tl wrapper dup_const always generates a uint64_t, which may exceed the size of a target_long (generating warnings with recent-enough compilers). To ensure that we can use dup_const both for 64bit and 32bit targets, this adds dup_const_tl, which either maps back to dup_const (for 64bit targets) or provides a similar implementation using 32bit constants. Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Philipp Tomsich <philipp.tomsich@vrull.eu> Message-Id: <20211003214243.3813425-1-philipp.tomsich@vrull.eu> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2021-10-05 16:53:17 -07:00
Ani Sinha	0e780da76a	hw/i386/acpi: fix conflicting IO address range for acpi pci hotplug in q35 Change `caf108bc58` ("hw/i386/acpi-build: Add ACPI PCI hot-plug methods to Q35") selects an IO address range for acpi based PCI hotplug for q35 arbitrarily. It starts at address 0x0cc4 and ends at 0x0cdb. At the time when the patch was written but the final version of the patch was not yet pushed upstream, this address range was free and did not conflict with any other IO address ranges. However, with the following change, this address range was no longer conflict free as in this change, the IO address range (value of ACPI_PCIHP_SIZE) was incremented by four bytes: `b32bd763a1` ("pci: introduce acpi-index property for PCI device") This can be seen from the output of QMP command 'info mtree' : 0000000000000600-0000000000000603 (prio 0, i/o): acpi-evt 0000000000000604-0000000000000605 (prio 0, i/o): acpi-cnt 0000000000000608-000000000000060b (prio 0, i/o): acpi-tmr 0000000000000620-000000000000062f (prio 0, i/o): acpi-gpe0 0000000000000630-0000000000000637 (prio 0, i/o): acpi-smi 0000000000000cc4-0000000000000cdb (prio 0, i/o): acpi-pci-hotplug 0000000000000cd8-0000000000000ce3 (prio 0, i/o): acpi-cpu-hotplug It shows that there is a region of conflict between IO regions of acpi pci hotplug and acpi cpu hotplug. Unfortunately, the change `caf108bc58` did not update the IO address range appropriately before it was pushed upstream to accommodate the increased length of the IO address space introduced in change `b32bd763a1`. Due to this bug, windows guests complain 'This device cannot find enough free resources it can use' in the device manager panel for extended IO buses. This issue also breaks the correct functioning of pci hotplug as the following shows that the IO space for pci hotplug has been truncated: (qemu) info mtree -f FlatView #0 AS "I/O", root: io Root memory region: io 0000000000000cc4-0000000000000cd7 (prio 0, i/o): acpi-pci-hotplug 0000000000000cd8-0000000000000cf7 (prio 0, i/o): acpi-cpu-hotplug Therefore, in this fix, we adjust the IO address range for the acpi pci hotplug so that it does not conflict with cpu hotplug and there is no truncation of IO spaces. The starting IO address of PCI hotplug region has been decremented by four bytes in order to accommodate four byte increment in the IO address space introduced by change `b32bd763a1` ("pci: introduce acpi-index property for PCI device") After fixing, the following are the corrected IO ranges: 0000000000000600-0000000000000603 (prio 0, i/o): acpi-evt 0000000000000604-0000000000000605 (prio 0, i/o): acpi-cnt 0000000000000608-000000000000060b (prio 0, i/o): acpi-tmr 0000000000000620-000000000000062f (prio 0, i/o): acpi-gpe0 0000000000000630-0000000000000637 (prio 0, i/o): acpi-smi 0000000000000cc0-0000000000000cd7 (prio 0, i/o): acpi-pci-hotplug 0000000000000cd8-0000000000000ce3 (prio 0, i/o): acpi-cpu-hotplug This change has been tested using a Windows Server 2019 guest VM. Windows no longer complains after this change. Fixes: `caf108bc58` ("hw/i386/acpi-build: Add ACPI PCI hot-plug methods to Q35") Resolves: https://gitlab.com/qemu-project/qemu/-/issues/561 Signed-off-by: Ani Sinha <ani@anisinha.ca> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Julia Suvorova <jusual@redhat.com> Message-Id: <20210916132838.3469580-3-ani@anisinha.ca> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2021-10-05 17:30:57 -04:00
Igor Mammedov	a8a5768786	acpi: AcpiGenericAddress no longer used to map/access fields of MMIO, drop packed attribute Signed-off-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Message-Id: <20210924122802.1455362-36-imammedo@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2021-10-05 17:30:57 -04:00
Igor Mammedov	538c2ecf1a	acpi: remove no longer used build_header() Signed-off-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Message-Id: <20210924122802.1455362-35-imammedo@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2021-10-05 17:30:57 -04:00
Igor Mammedov	cf68410bc9	acpi: build_facs: use build_append_int_noprefix() API to compose table Drop usage of packed structures and explicit endian conversions when building table and use endian agnostic build_append_int_noprefix() API to build it. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Message-Id: <20210924122802.1455362-34-imammedo@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2021-10-05 17:30:57 -04:00
Igor Mammedov	41041e5708	acpi: arm/virt: build_gtdt: use acpi_table_begin()/acpi_table_end() instead of build_header() it replaces error-prone pointer arithmetic for build_header() API, with 2 calls to start and finish table creation, which hides offsets magic from API user. while at it, replace packed structure with endian agnostic build_append_FOO() API. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Message-Id: <20210924122802.1455362-33-imammedo@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2021-10-05 17:30:57 -04:00
Igor Mammedov	a86d86ac0a	acpi: arm/virt: build_spcr: use acpi_table_begin()/acpi_table_end() instead of build_header() it replaces error-prone pointer arithmetic for build_header() API, with 2 calls to start and finish table creation, which hides offsets magic from API user. while at it, replace packed structure with endian agnostic build_append_FOO() API. PS: Spec is Microsoft hosted, however 1.02 is no where to be found (MS lists only the current revision) and the current revision is 1.07, so bring comments in line with 1.07 as this is the only available spec. There is no content change between originally implemented 1.02 (using QEMU code as reference) and 1.07. The only change is renaming 'Reserved2' field to 'Language', with the same 0 value. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Message-Id: <20210924122802.1455362-32-imammedo@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2021-10-05 17:30:57 -04:00
Igor Mammedov	271cbb2f2b	acpi: arm/virt: convert build_iort() to endian agnostic build_append_FOO() API Drop usage of packed structures and explicit endian conversions when building IORT table use endian agnostic build_append_int_noprefix() API to build it. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <20210924122802.1455362-30-imammedo@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com>	2021-10-05 17:30:57 -04:00
Igor Mammedov	3548494e49	acpi: arm: virt: build_iort: use acpi_table_begin()/acpi_table_end() instead of build_header() it replaces error-prone pointer arithmetic for build_header() API, with 2 calls to start and finish table creation, which hides offsets magic from API user. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Message-Id: <20210924122802.1455362-29-imammedo@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2021-10-05 17:30:57 -04:00
Igor Mammedov	37f33084ed	acpi: arm/virt: madt: use build_append_int_noprefix() API to compose MADT table Drop usage of packed structures and explicit endian conversions when building MADT table for arm/x86 and use endian agnostic build_append_int_noprefix() API to build it. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Message-Id: <20210924122802.1455362-26-imammedo@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2021-10-05 17:30:57 -04:00
Igor Mammedov	dd092b9c60	acpi: x86: madt: use build_append_int_noprefix() API to compose MADT table Drop usage of packed structures and explicit endian conversions when building MADT table for arm/x86 and use endian agnostic build_append_int_noprefix() API to build it. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Message-Id: <20210924122802.1455362-25-imammedo@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2021-10-05 17:30:57 -04:00
Igor Mammedov	d0aa026a49	acpi: x86: set enabled when composing _MAT entries Instead of composing disabled _MAT entry and then later on patching it to enabled for hotpluggbale CPUs in DSDT, set it to enabled at the time _MAT entry is built. It will allow to drop usage of packed structures in following patches when build_madt() is switched to use build_append_int_noprefix() API. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Message-Id: <20210924122802.1455362-24-imammedo@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2021-10-05 17:30:57 -04:00
Igor Mammedov	b10e7f4f8f	acpi: x86: remove dead code Signed-off-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Message-Id: <20210924122802.1455362-23-imammedo@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2021-10-05 17:30:57 -04:00
Igor Mammedov	99a7545f92	acpi: madt: arm/x86: use acpi_table_begin()/acpi_table_end() instead of build_header() it replaces error-prone pointer arithmetic for build_header() API, with 2 calls to start and finish table creation, which hides offsets magic from API user. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Message-Id: <20210924122802.1455362-22-imammedo@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2021-10-05 17:30:57 -04:00
Igor Mammedov	91a6b97569	acpi: build_dmar_q35: use acpi_table_begin()/acpi_table_end() instead of build_header() it replaces error-prone pointer arithmetic for build_header() API, with 2 calls to start and finish table creation, which hides offsets magic from API user. While at it switch to build_append_int_noprefix() to build table entries tables. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Message-Id: <20210924122802.1455362-19-imammedo@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2021-10-05 17:30:57 -04:00
Igor Mammedov	e5b6d55a6e	acpi: use build_append_int_noprefix() API to compose SRAT table Drop usage of packed structures and explicit endian conversions when building SRAT tables for arm/x86 and use endian agnostic build_append_int_noprefix() API to build it. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Message-Id: <20210924122802.1455362-18-imammedo@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2021-10-05 17:30:57 -04:00
Igor Mammedov	255bf20f2e	acpi: arm/x86: build_srat: use acpi_table_begin()/acpi_table_end() instead of build_header() it replaces error-prone pointer arithmetic for build_header() API, with 2 calls to start and finish table creation, which hides offsets magic from API user. While at it switch to build_append_int_noprefix() to build table entries (which also removes some manual offset calculations) Signed-off-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Message-Id: <20210924122802.1455362-17-imammedo@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2021-10-05 17:30:57 -04:00
Igor Mammedov	57cb8cfbf2	acpi: build_tpm_tcpa: use acpi_table_begin()/acpi_table_end() instead of build_header() it replaces error-prone pointer arithmetic for build_header() API, with 2 calls to start and finish table creation, which hides offsets magic from API user. While at it switch to build_append_int_noprefix() to build table entries (which also removes some manual offset calculations). Signed-off-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Message-Id: <20210924122802.1455362-16-imammedo@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2021-10-05 17:30:57 -04:00
Igor Mammedov	43dde1705c	acpi: build_hpet: use acpi_table_begin()/acpi_table_end() instead of build_header() it replaces error-prone pointer arithmetic for build_header() API, with 2 calls to start and finish table creation, which hides offsets magic from API user. while at it convert build_hpet() to endian agnostic build_append_FOO() API Signed-off-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <20210924122802.1455362-15-imammedo@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2021-10-05 17:30:57 -04:00
Igor Mammedov	f497b7cae1	acpi: build_xsdt: use acpi_table_begin()/acpi_table_end() instead of build_header() it replaces error-prone pointer arithmetic for build_header() API, with 2 calls to start and finish table creation, which hides offsets magic from API user. While at it switch to build_append_int_noprefix() to build entries to other tables (which also removes some manual offset calculations). Signed-off-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Message-Id: <20210924122802.1455362-4-imammedo@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2021-10-05 17:30:57 -04:00
Igor Mammedov	ea298e83a7	acpi: build_rsdt: use acpi_table_begin()/acpi_table_end() instead of build_header() it replaces error-prone pointer arithmetic for build_header() API, with 2 calls to start and finish table creation, which hides offests magic from API user. While at it switch to build_append_int_noprefix() to build entries to other tables (which also removes some manual offset calculations). Signed-off-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Message-Id: <20210924122802.1455362-3-imammedo@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2021-10-05 17:30:57 -04:00
Igor Mammedov	c151fd8710	acpi: add helper routines to initialize ACPI tables Patch introduces acpi_table_begin()/ acpi_table_end() API that hides pointer/offset arithmetic from user as opposed to build_header(), to prevent errors caused by it [1]. acpi_table_begin(): initializes table header and keeps track of table data/offsets acpi_table_end(): sets actual table length and tells bios loader where table is for the later initialization on guest side. 1) commits `bb9feea431` x86: acpi: use offset instead of pointer when using build_header() `4d027afeb3` Virt: ACPI: fix qemu assert due to re-assigned table data address Signed-off-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Message-Id: <20210924122802.1455362-2-imammedo@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Tested-by: Stefan Berger <stefanb@linux.ibm.com> Tested-by: Yanan Wang <wangyanan55@huawei.com>	2021-10-05 17:30:57 -04:00
Stefano Garzarella	46ce017167	vhost-vsock: handle common features in vhost-vsock-common virtio-vsock features, like VIRTIO_VSOCK_F_SEQPACKET, can be handled by vhost-vsock-common parent class. In this way, we can reuse the same code for all virtio-vsock backends (i.e. vhost-vsock, vhost-user-vsock). Let's move `seqpacket` property to vhost-vsock-common class, add vhost_vsock_common_get_features() used by children, and disable `seqpacket` for vhost-user-vsock device for machine types < 6.2. The behavior of vhost-vsock device doesn't change; vhost-user-vsock device now supports `seqpacket` property. Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Message-Id: <20210921161642.206461-3-sgarzare@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2021-10-05 17:30:57 -04:00
Stefano Garzarella	d6a9378f47	vhost-vsock: fix migration issue when seqpacket is supported Commit `1e08fd0a46` ("vhost-vsock: SOCK_SEQPACKET feature bit support") enabled the SEQPACKET feature bit. This commit is released with QEMU 6.1, so if we try to migrate a VM where the host kernel supports SEQPACKET but machine type version is less than 6.1, we get the following errors: Features 0x130000002 unsupported. Allowed features: 0x179000000 Failed to load virtio-vhost_vsock:virtio error while loading state for instance 0x0 of device '0000:00:05.0/virtio-vhost_vsock' load of migration failed: Operation not permitted Let's disable the feature bit for machine types < 6.1. We add a new OnOffAuto property for this, called `seqpacket`. When it is `auto` (default), QEMU behaves as before, trying to enable the feature, when it is `on` QEMU will fail if the backend (vhost-vsock kernel module) doesn't support it. Fixes: `1e08fd0a46` ("vhost-vsock: SOCK_SEQPACKET feature bit support") Cc: qemu-stable@nongnu.org Reported-by: Jiang Wang <jiang.wang@bytedance.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Message-Id: <20210921161642.206461-2-sgarzare@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2021-10-05 17:30:57 -04:00
Yanan Wang	2b52619994	machine: Move smp_prefer_sockets to struct SMPCompatProps Now we have a common structure SMPCompatProps used to store information about SMP compatibility stuff, so we can also move smp_prefer_sockets there for cleaner code. No functional change intended. Signed-off-by: Yanan Wang <wangyanan55@huawei.com> Acked-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Andrew Jones <drjones@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Message-Id: <20210929025816.21076-15-wangyanan55@huawei.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-10-01 15:29:15 +02:00
Yanan Wang	7687b2b3ed	machine: Remove smp_parse callback from MachineClass Now we have a generic smp parser for all arches, and there will not be any other arch specific ones, so let's remove the callback from MachineClass and call the parser directly. Signed-off-by: Yanan Wang <wangyanan55@huawei.com> Reviewed-by: Andrew Jones <drjones@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Message-Id: <20210929025816.21076-14-wangyanan55@huawei.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-10-01 15:28:19 +02:00
Yanan Wang	e4a97a893b	machine: Make smp_parse generic enough for all arches Currently the only difference between smp_parse and pc_smp_parse is the support of dies parameter and the related error reporting. With some arch compat variables like "bool dies_supported", we can make smp_parse generic enough for all arches and the PC specific one can be removed. Making smp_parse() generic enough can reduce code duplication and ease the code maintenance, and also allows extending the topology with more arch specific members (e.g., clusters) in the future. Suggested-by: Andrew Jones <drjones@redhat.com> Suggested-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Yanan Wang <wangyanan55@huawei.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Message-Id: <20210929025816.21076-13-wangyanan55@huawei.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-10-01 15:28:19 +02:00
Yanan Wang	003f230e37	machine: Tweak the order of topology members in struct CpuTopology Now that all the possible topology parameters are integrated in struct CpuTopology, tweak the order of topology members to be "cpus/sockets/ dies/cores/threads/maxcpus" for readability and consistency. We also tweak the comment by adding explanation of dies parameter. Signed-off-by: Yanan Wang <wangyanan55@huawei.com> Reviewed-by: Andrew Jones <drjones@redhat.com> Reviewed-by: Pankaj Gupta <pankaj.gupta@ionos.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20210929025816.21076-12-wangyanan55@huawei.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-10-01 15:28:19 +02:00
Yanan Wang	4a0af2930a	machine: Prefer cores over sockets in smp parsing since 6.2 In the real SMP hardware topology world, it's much more likely that we have high cores-per-socket counts and few sockets totally. While the current preference of sockets over cores in smp parsing results in a virtual cpu topology with low cores-per-sockets counts and a large number of sockets, which is just contrary to the real world. Given that it is better to make the virtual cpu topology be more reflective of the real world and also for the sake of compatibility, we start to prefer cores over sockets over threads in smp parsing since machine type 6.2 for different arches. In this patch, a boolean "smp_prefer_sockets" is added, and we only enable the old preference on older machines and enable the new one since type 6.2 for all arches by using the machine compat mechanism. Suggested-by: Daniel P. Berrange <berrange@redhat.com> Signed-off-by: Yanan Wang <wangyanan55@huawei.com> Acked-by: David Gibson <david@gibson.dropbear.id.au> Acked-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Andrew Jones <drjones@redhat.com> Reviewed-by: Pankaj Gupta <pankaj.gupta@ionos.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Message-Id: <20210929025816.21076-10-wangyanan55@huawei.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-10-01 15:28:16 +02:00
Peter Maydell	bb4aa8f59e	target-arm queue: * allwinner-h3: Switch to SMC as PSCI conduit * arm: tcg: Adhere to SMCCC 1.3 section 5.2 * xlnx-zcu102, xlnx-versal-virt: Support BBRAM and eFUSE devices * gdbstub related code cleanups * Don't put FPEXC and FPSID in org.gnu.gdb.arm.vfp XML * Use _init vs _new convention in bus creation function names * sabrelite: Connect SPI flash CS line to GPIO3_19 -----BEGIN PGP SIGNATURE----- iQJNBAABCAA3FiEE4aXFk81BneKOgxXPPCUl7RQ2DN4FAmFV05gZHHBldGVyLm1h eWRlbGxAbGluYXJvLm9yZwAKCRA8JSXtFDYM3kVoD/9rlpi81v6U2zPmW5s/lFB8 m7eqtVpP2T1UwwGPw5jXZ4qAyDyCDXJxtW8B2ePxjXfrFT5f59hy9myBFqDebjNC Xwdwafc17lkUm0CrIEwhMhGYiXs6yak1YcGqEPZ3ceWt67kVByXGj89mLepogCHn LvcjQGC3PuDvDHWnOKdOBhxTu+rvQSDRXpVCuBAd3eJBn9jxG10cdaCr3/Z7VFA/ bnc9bSU8qJ0hYCswHHld9R2Rk9zYDQmrtMpygN6pviCd5qWGEOh8b5vszmrSHYo9 tn0bSp9d9k2wBXrPR5Ux3L0IMRBp7N88tSDw2QyatDltltjsCKw+ZMxjKHh0mxnr N1QF1FteIFliu5GQeMiEWPP87rVZ31quWZUIln6XYo9+aXus8jd88vxdpND1v767 np/q6BW0g+Tuu2T+QRe5V8VBQJzgEAKT7AwCVHC+5Flyq8fWFcFdPp1dygWXdzW3 Yhmq2JwwMq/3MjZY10aymohrvFPAQSx2bGGDS9yi8m5seaJvHjJW5fZQUVapy0vw andiIFNC9KxeQ23AZM0oKkW/d5EckKIkagfiq5+71QhvtbJrXbz+fs7UxHN0IVeX 7px+ih0xJcz3uVxZtZ/kvpBMMe3WEMd9r2tZOhbJ9K8RlCcB11y1AnZaBs/2fess +DzTJOkZGu1oDP4IAAqGBg== =eAN3 -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/pmaydell/tags/pull-target-arm-20210930' into staging target-arm queue: * allwinner-h3: Switch to SMC as PSCI conduit * arm: tcg: Adhere to SMCCC 1.3 section 5.2 * xlnx-zcu102, xlnx-versal-virt: Support BBRAM and eFUSE devices * gdbstub related code cleanups * Don't put FPEXC and FPSID in org.gnu.gdb.arm.vfp XML * Use _init vs _new convention in bus creation function names * sabrelite: Connect SPI flash CS line to GPIO3_19 # gpg: Signature made Thu 30 Sep 2021 16:11:20 BST # gpg: using RSA key E1A5C593CD419DE28E8315CF3C2525ED14360CDE # gpg: issuer "peter.maydell@linaro.org" # gpg: Good signature from "Peter Maydell <peter.maydell@linaro.org>" [ultimate] # gpg: aka "Peter Maydell <pmaydell@gmail.com>" [ultimate] # gpg: aka "Peter Maydell <pmaydell@chiark.greenend.org.uk>" [ultimate] # Primary key fingerprint: E1A5 C593 CD41 9DE2 8E83 15CF 3C25 25ED 1436 0CDE * remotes/pmaydell/tags/pull-target-arm-20210930: (22 commits) hw/arm: sabrelite: Connect SPI flash CS line to GPIO3_19 ide: Rename ide_bus_new() to ide_bus_init() qbus: Rename qbus_create() to qbus_new() qbus: Rename qbus_create_inplace() to qbus_init() pci: Rename pci_root_bus_new_inplace() to pci_root_bus_init() ipack: Rename ipack_bus_new_inplace() to ipack_bus_init() scsi: Replace scsi_bus_new() with scsi_bus_init(), scsi_bus_init_named() target/arm: Don't put FPEXC and FPSID in org.gnu.gdb.arm.vfp XML target/arm: Move gdbstub related code out of helper.c target/arm: Fix coding style issues in gdbstub code in helper.c configs: Don't include 32-bit-only GDB XML in aarch64 linux configs docs/system/arm: xlnx-versal-virt: BBRAM and eFUSE Usage hw/arm: xlnx-zcu102: Add Xilinx eFUSE device hw/arm: xlnx-zcu102: Add Xilinx BBRAM device hw/arm: xlnx-versal-virt: Add Xilinx eFUSE device hw/arm: xlnx-versal-virt: Add Xilinx BBRAM device hw/nvram: Introduce Xilinx battery-backed ram hw/nvram: Introduce Xilinx ZynqMP eFuse device hw/nvram: Introduce Xilinx Versal eFuse device hw/nvram: Introduce Xilinx eFuse QOM ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2021-09-30 21:16:54 +01:00
Peter Maydell	0021c4765a	* SGX implementation for x86 * Miscellaneous bugfixes * Fix dependencies from ROMs to qtests -----BEGIN PGP SIGNATURE----- iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmFVu/sUHHBib256aW5p QHJlZGhhdC5jb20ACgkQv/vSX3jHroNFUgf+OexjKqJw4qzbDdQrxWqw3upoFblk y4OrmrhCyCKDwPghnjHUEVGHnNKqKpCLoIvtvFZ7xX/qezpMtZxVUliOVNQGmioR MZU/DbdlvVL/t8yKjfz1ljshk55hnSJ7rAv8LBA+B3uNzyJ+LZU9+Kbvmei5oyex nenCtXnoVNBJMvTBE/KfJbp0UasEb1OTvPBa0Y7mHyDub28FDPKr9WZbloCLUtE+ uXwbZ34VRDsxbLnXh+BJ+ljOQLdsJErAkiPKTnW1/3W8Ti7PzOzvLpbSIVdBv/9A U1qOEm48BjCrG/tFJvTUm0ZM7AHmqYfvmwpenDpL0FhReohMdUa3pycQ9g== =Hicy -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/bonzini-gitlab/tags/for-upstream' into staging * SGX implementation for x86 * Miscellaneous bugfixes * Fix dependencies from ROMs to qtests # gpg: Signature made Thu 30 Sep 2021 14:30:35 BST # gpg: using RSA key F13338574B662389866C7682BFFBD25F78C7AE83 # gpg: issuer "pbonzini@redhat.com" # gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" [full] # gpg: aka "Paolo Bonzini <pbonzini@redhat.com>" [full] # Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4 E2F7 7E15 100C CD36 69B1 # Subkey fingerprint: F133 3857 4B66 2389 866C 7682 BFFB D25F 78C7 AE83 * remotes/bonzini-gitlab/tags/for-upstream: (33 commits) meson_options.txt: Switch the default value for the vnc option to 'auto' build-sys: add HAVE_IPPROTO_MPTCP memory: Add tracepoint for dirty sync memory: Name all the memory listeners target/i386: Fix memory leak in sev_read_file_base64() tests: qtest: bios-tables-test depends on the unpacked edk2 ROMs meson: unpack edk2 firmware even if --disable-blobs target/i386: Add the query-sgx-capabilities QMP command target/i386: Add HMP and QMP interfaces for SGX docs/system: Add SGX documentation to the system manual sgx-epc: Add the fill_device_info() callback support i440fx: Add support for SGX EPC q35: Add support for SGX EPC i386: acpi: Add SGX EPC entry to ACPI tables i386/pc: Add e820 entry for SGX EPC section(s) hw/i386/pc: Account for SGX EPC sections when calculating device memory hw/i386/fw_cfg: Set SGX bits in feature control fw_cfg accordingly Adjust min CPUID level to 0x12 when SGX is enabled i386: Propagate SGX CPUID sub-leafs to KVM i386: kvm: Add support for exposing PROVISIONKEY to guest ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2021-09-30 17:38:31 +01:00
Peter Maydell	fce8f7735f	ppc patch queue for 2021-09-30 Here's the next batch of ppc related patches for qemu-6.2. Highlights are: * Fixes for several TCG math instructions from the El Dorado Institute * A number of improvements to the powernv machine type * Support for a new DEVICE_UNPLUG_GUEST_ERROR QAPI event from Daniel Barboza * Support for the new FORM2 PAPR NUMA representation. This allows more specific NUMA distances, as well as asymmetric configurations * Fix for 64-bit decrementer (used on MicroWatt CPUs) * Assorted fixes and cleanups * A number of updates to MAINTAINERS Note that the DEVICE_UNPLUG_GUEST_ERROR stuff includes changes to files outside my normal area, but has suitable Acks. The MAINTAINERS updates are mostly about marking minor platforms unmaintained / orphaned, and moving some pieces away from myself and Greg. As we move onto other projects, we're going to need to drop more of the ppc maintainership, though we're hoping we can avoid too abrupt a change. -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEdfRlhq5hpmzETofcbDjKyiDZs5IFAmFVTlEACgkQbDjKyiDZ s5IATRAAi3qB5riIXQEO41SoG5EjwJ9+2VQmagMkgwyNTQp5SyrqiWpTJ7Cq8IMF inEgV8vHv40GFJLdXXjTvdsRkAxh1siVZR7okmbd144ZdwL/ntJ4FFiGG5BjFgf2 DajbDW/RgNDsWkTFbnhKVDNTzFbxHgeGf/g+KOPmQ5MtPPJk/Og2jJVOVE+CCY+Q FsF7xe9ehffrGP4ObkwKM8pNMWEaJ8KP+Z1ZV5WQwQSHPZD4x1qb6aaH8JQWv7Fl pqep5nlUv0KNsRPFj4hGhzjB9JUMMnWtSEuw5GNL0bQAj1v6tPNESRMw6fInAR/A JlvG6iYba2ZKZ7+SGipwYyVzBPnE7ur9i2E5MaANTSQF1zgsSwcYT6J5dYSCZmNH XrEZYYm1MyCTem8PNKrP44QRQAQz+vfHb8hUkwzNsZC7DLLsZ3JQ2AWcAvWFOKRe /s3WyODeXa3gZMCBFbvCWVAPiKNNFGQMsa2/bkF7FpvNGl99xL3o6MDskA2dSdTK 0oreMT9OKdDS2W2LpDCasq6HHVsoJ+hc+TkWCEshEEPb7jUAIOnKqhHh/aqlXVf7 J3r68FFbNbBoPR0SgdLc4teBmUKDJIGpZT8gaJ7Vbx6IiHXH21ZzOHKuTjbJZWWN u1vcjBBqvbQmrwhCyi6Fe5gGM63GcB/ulizM1rNyqIpTUuzyx9I= =shYQ -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/dg-gitlab/tags/ppc-for-6.2-20210930' into staging ppc patch queue for 2021-09-30 Here's the next batch of ppc related patches for qemu-6.2. Highlights are: * Fixes for several TCG math instructions from the El Dorado Institute * A number of improvements to the powernv machine type * Support for a new DEVICE_UNPLUG_GUEST_ERROR QAPI event from Daniel Barboza * Support for the new FORM2 PAPR NUMA representation. This allows more specific NUMA distances, as well as asymmetric configurations * Fix for 64-bit decrementer (used on MicroWatt CPUs) * Assorted fixes and cleanups * A number of updates to MAINTAINERS Note that the DEVICE_UNPLUG_GUEST_ERROR stuff includes changes to files outside my normal area, but has suitable Acks. The MAINTAINERS updates are mostly about marking minor platforms unmaintained / orphaned, and moving some pieces away from myself and Greg. As we move onto other projects, we're going to need to drop more of the ppc maintainership, though we're hoping we can avoid too abrupt a change. # gpg: Signature made Thu 30 Sep 2021 06:42:41 BST # gpg: using RSA key 75F46586AE61A66CC44E87DC6C38CACA20D9B392 # gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>" [full] # gpg: aka "David Gibson (Red Hat) <dgibson@redhat.com>" [full] # gpg: aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>" [full] # gpg: aka "David Gibson (kernel.org) <dwg@kernel.org>" [unknown] # Primary key fingerprint: 75F4 6586 AE61 A66C C44E 87DC 6C38 CACA 20D9 B392 * remotes/dg-gitlab/tags/ppc-for-6.2-20210930: (44 commits) MAINTAINERS: Demote sPAPR from "Supported" to "Maintained" MAINTAINERS: Add information for OpenPIC MAINTAINERS: Remove David & Greg as reviewers/co-maintainers of powernv MAINTAINERS: Orphan obscure ppc platforms MAINTAINERS: Remove David & Greg as reviewers for a number of boards MAINTAINERS: Remove machine specific files from ppc TCG CPUs entry spapr/xive: Fix kvm_xive_source_reset trace event spapr_numa.c: fixes in spapr_numa_FORM2_write_rtas_tables() hw/intc: openpic: Clean up the styles hw/intc: openpic: Drop Raven related codes hw/intc: openpic: Correct the reset value of IPIDR for FSL chipset target/ppc: Fix 64-bit decrementer target/ppc: Convert debug to trace events (decrementer and IRQ) spapr_numa.c: handle auto NUMA node with no distance info spapr_numa.c: FORM2 NUMA affinity support spapr: move FORM1 verifications to post CAS spapr_numa.c: rename numa_assoc_array to FORM1_assoc_array spapr_numa.c: parametrize FORM1 macros spapr_numa.c: scrap 'legacy_numa' concept spapr_numa.c: split FORM1 code into helpers ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2021-09-30 16:13:04 +01:00
Peter Xu	142518bda5	memory: Name all the memory listeners Provide a name field for all the memory listeners. It can be used to identify which memory listener is which. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Message-Id: <20210817013553.30584-2-peterx@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-09-30 15:30:24 +02:00
Yang Zhong	0205c4fa1e	target/i386: Add the query-sgx-capabilities QMP command Libvirt can use query-sgx-capabilities to get the host sgx capabilities to decide how to allocate SGX EPC size to VM. Signed-off-by: Yang Zhong <yang.zhong@intel.com> Message-Id: <20210910102258.46648-3-yang.zhong@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-09-30 15:30:24 +02:00
Yang Zhong	57d874c4c7	target/i386: Add HMP and QMP interfaces for SGX The QMP and HMP interfaces can be used by monitor or QMP tools to retrieve the SGX information from VM side when SGX is enabled on Intel platform. Signed-off-by: Yang Zhong <yang.zhong@intel.com> Message-Id: <20210910102258.46648-2-yang.zhong@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-09-30 15:30:24 +02:00
Sean Christopherson	0cf4ce00d2	hw/i386/pc: Account for SGX EPC sections when calculating device memory Add helpers to detect if SGX EPC exists above 4g, and if so, where SGX EPC above 4g ends. Use the helpers to adjust the device memory range if SGX EPC exists above 4g. For multiple virtual EPC sections, we just put them together physically contiguous for the simplicity because we don't support EPC NUMA affinity now. Once the SGX EPC NUMA support in the kernel SGX driver, we will support this in the future. Note that SGX EPC is currently hardcoded to reside above 4g. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Yang Zhong <yang.zhong@intel.com> Message-Id: <20210719112136.57018-18-yang.zhong@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-09-30 15:30:24 +02:00
Sean Christopherson	1dec2e1f19	i386: Update SGX CPUID info according to hardware/KVM/user input Expose SGX to the guest if and only if KVM is enabled and supports virtualization of SGX. While the majority of ENCLS can be emulated to some degree, because SGX uses a hardware-based root of trust, the attestation aspects of SGX cannot be emulated in software, i.e. ultimately emulation will fail as software cannot generate a valid quote/report. The complexity of partially emulating SGX in Qemu far outweighs the value added, e.g. an SGX specific simulator for userspace applications can emulate SGX for development and testing purposes. Note, access to the PROVISIONKEY is not yet advertised to the guest as KVM blocks access to the PROVISIONKEY by default and requires userspace to provide additional credentials (via ioctl()) to expose PROVISIONKEY. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Yang Zhong <yang.zhong@intel.com> Message-Id: <20210719112136.57018-13-yang.zhong@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-09-30 14:50:20 +02:00
Sean Christopherson	dfce81f1b9	vl: Add sgx compound properties to expose SGX EPC sections to guest Because SGX EPC is enumerated through CPUID, EPC "devices" need to be realized prior to realizing the vCPUs themselves, i.e. long before generic devices are parsed and realized. From a virtualization perspective, the CPUID aspect also means that EPC sections cannot be hotplugged without paravirtualizing the guest kernel (hardware does not support hotplugging as EPC sections must be locked down during pre-boot to provide EPC's security properties). So even though EPC sections could be realized through the generic -devices command, they need to be created much earlier for them to actually be usable by the guest. Place all EPC sections in a contiguous block, somewhat arbitrarily starting after RAM above 4g. Ensuring EPC is in a contiguous region simplifies calculations, e.g. device memory base, PCI hole, etc..., allows dynamic calculation of the total EPC size, e.g. exposing EPC to guests does not require -maxmem, and last but not least allows all of EPC to be enumerated in a single ACPI entry, which is expected by some kernels, e.g. Windows 7 and 8. The new compound properties command for sgx like below: ...... -object memory-backend-epc,id=mem1,size=28M,prealloc=on \ -object memory-backend-epc,id=mem2,size=10M \ -M sgx-epc.0.memdev=mem1,sgx-epc.1.memdev=mem2 Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Yang Zhong <yang.zhong@intel.com> Message-Id: <20210719112136.57018-6-yang.zhong@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-09-30 14:50:20 +02:00
Sean Christopherson	80509c5557	i386: Add 'sgx-epc' device to expose EPC sections to guest SGX EPC is enumerated through CPUID, i.e. EPC "devices" need to be realized prior to realizing the vCPUs themselves, which occurs long before generic devices are parsed and realized. Because of this, do not allow 'sgx-epc' devices to be instantiated after vCPUS have been created. The 'sgx-epc' device is essentially a placholder at this time, it will be fully implemented in a future patch along with a dedicated command to create 'sgx-epc' devices. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Yang Zhong <yang.zhong@intel.com> Message-Id: <20210719112136.57018-5-yang.zhong@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-09-30 14:50:20 +02:00
Sean Christopherson	c6c0232000	hostmem: Add hostmem-epc as a backend for SGX EPC EPC (Enclave Page Cahe) is a specialized type of memory used by Intel SGX (Software Guard Extensions). The SDM desribes EPC as: The Enclave Page Cache (EPC) is the secure storage used to store enclave pages when they are a part of an executing enclave. For an EPC page, hardware performs additional access control checks to restrict access to the page. After the current page access checks and translations are performed, the hardware checks that the EPC page is accessible to the program currently executing. Generally an EPC page is only accessed by the owner of the executing enclave or an instruction which is setting up an EPC page. Because of its unique requirements, Linux manages EPC separately from normal memory. Similar to memfd, the device /dev/sgx_vepc can be opened to obtain a file descriptor which can in turn be used to mmap() EPC memory. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Yang Zhong <yang.zhong@intel.com> Message-Id: <20210719112136.57018-3-yang.zhong@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-09-30 14:50:19 +02:00
Sean Christopherson	56918a126a	memory: Add RAM_PROTECTED flag to skip IOMMU mappings Add a new RAMBlock flag to denote "protected" memory, i.e. memory that looks and acts like RAM but is inaccessible via normal mechanisms, including DMA. Use the flag to skip protected memory regions when mapping RAM for DMA in VFIO. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Yang Zhong <yang.zhong@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-09-30 14:50:19 +02:00
Peter Maydell	82c74ac42e	ide: Rename ide_bus_new() to ide_bus_init() The function ide_bus_new() does an in-place initialization. Rename it to ide_bus_init() to follow our _init vs _new convention. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Corey Minyard <cminyard@mvista.com> Reviewed-by: John Snow <jsnow@redhat.com> Acked-by: John Snow <jsnow@redhat.com> (Feel free to merge.) Message-id: 20210923121153.23754-7-peter.maydell@linaro.org	2021-09-30 13:44:13 +01:00
Peter Maydell	9388d1701e	qbus: Rename qbus_create() to qbus_new() Rename the "allocate and return" qbus creation function to qbus_new(), to bring it into line with our _init vs _new convention. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Corey Minyard <cminyard@mvista.com> Message-id: 20210923121153.23754-6-peter.maydell@linaro.org	2021-09-30 13:44:08 +01:00
Peter Maydell	d637e1dc6d	qbus: Rename qbus_create_inplace() to qbus_init() Rename qbus_create_inplace() to qbus_init(); this is more in line with our usual naming convention for functions that in-place initialize objects. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Message-id: 20210923121153.23754-5-peter.maydell@linaro.org	2021-09-30 13:42:10 +01:00
Peter Maydell	8d4cdf01f8	pci: Rename pci_root_bus_new_inplace() to pci_root_bus_init() Rename the pci_root_bus_new_inplace() function to pci_root_bus_init(); this brings the bus type in to line with a "_init for in-place init, _new for allocate-and-return" convention. To do this we need to rename the implementation-internal function that was using the pci_root_bus_init() name to pci_root_bus_internal_init(). Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Message-id: 20210923121153.23754-4-peter.maydell@linaro.org	2021-09-30 13:42:10 +01:00
Peter Maydell	43417c0c27	ipack: Rename ipack_bus_new_inplace() to ipack_bus_init() Rename ipack_bus_new_inplace() to ipack_bus_init(), to bring it in to line with a "_init for in-place init, _new for allocate-and-return" convention. Drop the 'name' argument, because the only caller does not pass in a name. If a future caller does need to specify the bus name, we should create an ipack_bus_init_named() function at that point. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Message-id: 20210923121153.23754-3-peter.maydell@linaro.org	2021-09-30 13:42:10 +01:00
Peter Maydell	739e95f574	scsi: Replace scsi_bus_new() with scsi_bus_init(), scsi_bus_init_named() The function scsi_bus_new() creates a new SCSI bus; callers can either pass in a name argument to specify the name of the new bus, or they can pass in NULL to allow the bus to be given an automatically generated unique name. Almost all callers want to use the autogenerated name; the only exception is the virtio-scsi device. Taking a name argument that should almost always be NULL is an easy-to-misuse API design -- it encourages callers to think perhaps they should pass in some standard name like "scsi" or "scsi-bus". We don't do this anywhere for SCSI, but we do (incorrectly) do it for other bus types such as i2c. The function name also implies that it will return a newly allocated object, when it in fact does in-place allocation. We more commonly name such functions foo_init(), with foo_new() being the allocate-and-return variant. Replace all the scsi_bus_new() callsites with either: * scsi_bus_init() for the usual case where the caller wants an autogenerated bus name * scsi_bus_init_named() for the rare case where the caller needs to specify the bus name and document that for the _named() version it's then the caller's responsibility to think about uniqueness of bus names. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 20210923121153.23754-2-peter.maydell@linaro.org	2021-09-30 13:42:10 +01:00
Tong Ho	db1264df32	hw/arm: xlnx-zcu102: Add Xilinx eFUSE device Connect the support for ZynqMP eFUSE one-time field-programmable bit array. The command argument: -drive if=pflash,index=3,... Can be used to optionally connect the bit array to a backend storage, such that field-programmed values in one invocation can be made available to next invocation. The backend storage must be a seekable binary file, and its size must be 768 bytes or larger. A file with all binary 0's is a 'blank'. Signed-off-by: Tong Ho <tong.ho@xilinx.com> Message-id: 20210917052400.1249094-9-tong.ho@xilinx.com Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2021-09-30 13:42:10 +01:00
Tong Ho	7e47e15c8b	hw/arm: xlnx-zcu102: Add Xilinx BBRAM device Connect the support for Xilinx ZynqMP Battery-Backed RAM (BBRAM) The command argument: -drive if=pflash,index=2,... Can be used to optionally connect the bbram to a backend storage, such that field-programmed values in one invocation can be made available to next invocation. The backend storage must be a seekable binary file, and its size must be 36 bytes or larger. A file with all binary 0's is a 'blank'. Signed-off-by: Tong Ho <tong.ho@xilinx.com> Message-id: 20210917052400.1249094-8-tong.ho@xilinx.com Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2021-09-30 13:42:10 +01:00
Tong Ho	5f4910ff12	hw/arm: xlnx-versal-virt: Add Xilinx eFUSE device Connect the support for Versal eFUSE one-time field-programmable bit array. The command argument: -drive if=pflash,index=1,... Can be used to optionally connect the bit array to a backend storage, such that field-programmed values in one invocation can be made available to next invocation. The backend storage must be a seekable binary file, and its size must be 3072 bytes or larger. A file with all binary 0's is a 'blank'. Signed-off-by: Tong Ho <tong.ho@xilinx.com> Message-id: 20210917052400.1249094-7-tong.ho@xilinx.com Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2021-09-30 13:42:10 +01:00
Tong Ho	393185bc9d	hw/arm: xlnx-versal-virt: Add Xilinx BBRAM device Connect the support for Versal Battery-Backed RAM (BBRAM) The command argument: -drive if=pflash,index=0,... Can be used to optionally connect the bbram to a backend storage, such that field-programmed values in one invocation can be made available to next invocation. The backend storage must be a seekable binary file, and its size must be 36 bytes or larger. A file with all binary 0's is a 'blank'. Signed-off-by: Tong Ho <tong.ho@xilinx.com> Message-id: 20210917052400.1249094-6-tong.ho@xilinx.com Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2021-09-30 13:42:10 +01:00
Tong Ho	461a6a6f19	hw/nvram: Introduce Xilinx battery-backed ram This device is present in Versal and ZynqMP product families to store a 256-bit encryption key. Co-authored-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com> Co-authored-by: Sai Pavan Boddu <sai.pavan.boddu@xilinx.com> Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com> Signed-off-by: Sai Pavan Boddu <sai.pavan.boddu@xilinx.com> Signed-off-by: Tong Ho <tong.ho@xilinx.com> Message-id: 20210917052400.1249094-5-tong.ho@xilinx.com Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2021-09-30 13:42:10 +01:00
Tong Ho	67fa02f89f	hw/nvram: Introduce Xilinx ZynqMP eFuse device This implements the Xilinx ZynqMP eFuse, an one-time field-programmable non-volatile storage device. There is only one such device in the Xilinx ZynqMP product family. Co-authored-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com> Co-authored-by: Sai Pavan Boddu <sai.pavan.boddu@xilinx.com> Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com> Signed-off-by: Sai Pavan Boddu <sai.pavan.boddu@xilinx.com> Signed-off-by: Tong Ho <tong.ho@xilinx.com> Message-id: 20210917052400.1249094-4-tong.ho@xilinx.com Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2021-09-30 13:42:10 +01:00
Tong Ho	9e4aa1fafe	hw/nvram: Introduce Xilinx Versal eFuse device This implements the Xilinx Versal eFuse, an one-time field-programmable non-volatile storage device. There is only one such device in the Xilinx Versal product family. This device has two separate mmio interfaces, a controller and a flatten readback. The controller provides interfaces for field-programming, configuration, control, and status. The flatten readback is a cache to provide a byte-accessible read-only interface to efficiently read efuse array. Co-authored-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com> Co-authored-by: Sai Pavan Boddu <sai.pavan.boddu@xilinx.com> Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com> Signed-off-by: Sai Pavan Boddu <sai.pavan.boddu@xilinx.com> Signed-off-by: Tong Ho <tong.ho@xilinx.com> Message-id: 20210917052400.1249094-3-tong.ho@xilinx.com Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2021-09-30 13:42:10 +01:00
Tong Ho	68fbcc344e	hw/nvram: Introduce Xilinx eFuse QOM This introduces the QOM for Xilinx eFuse, an one-time field-programmable storage bit array. The actual mmio interface to the array varies by device families and will be provided in different change-sets. Co-authored-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com> Co-authored-by: Sai Pavan Boddu <sai.pavan.boddu@xilinx.com> Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com> Signed-off-by: Sai Pavan Boddu <sai.pavan.boddu@xilinx.com> Signed-off-by: Tong Ho <tong.ho@xilinx.com> Message-id: 20210917052400.1249094-2-tong.ho@xilinx.com Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2021-09-30 13:42:09 +01:00
Peter Maydell	98850d84f7	nbd patches for 2021-09-27 - Vladimir Sementsov-Ogievskiy: Rework coroutines of qemu NBD client to improve reconnect support - Eric Blake: Relax server in regards to NBD_OPT_LIST_META_CONTEXT - Vladimir Sementsov-Ogievskiy: Plumb up 64-bit bulk-zeroing support in block layer, in preparation for future NBD spec extensions - Nir Soffer: Default to writeback cache in qemu-nbd -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEccLMIrHEYCkn0vOqp6FrSiUnQ2oFAmFU1a4ACgkQp6FrSiUn Q2obEggAq4KgnBILBip6TsFc9p7yzVpK9qWzri0vhJHOI7p+dE5Bxqt8sLBEdMQP OiV5WE/ZRHogGULXXJChBmUTK2/z0U57AXVvbCpKWgr48kSj818dk8uhuUjBeWaM 5Qc8PsH+/Rij5WRlVmTePu7QMwXp1h3+gkw3fmLlHRkvzO4MmOBn1lrOqnCxSGbo xnFvfdeplNiexmpImdO6QSaHfDsmnUqOFpBPZUsKXfdHOiRqg2I3eU1ibv8eQH7B oTtBxWI0KHc6kWaJUWqZbEe4ChJTHvAsVdZxmB+il1diIl46lq+s/Zn7nT+0HXTD pS/Fws9rrGKS/7PnGeFOfskVVCRnIw== =VGqR -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/ericb/tags/pull-nbd-2021-09-27-v2' into staging nbd patches for 2021-09-27 - Vladimir Sementsov-Ogievskiy: Rework coroutines of qemu NBD client to improve reconnect support - Eric Blake: Relax server in regards to NBD_OPT_LIST_META_CONTEXT - Vladimir Sementsov-Ogievskiy: Plumb up 64-bit bulk-zeroing support in block layer, in preparation for future NBD spec extensions - Nir Soffer: Default to writeback cache in qemu-nbd # gpg: Signature made Wed 29 Sep 2021 22:07:58 BST # gpg: using RSA key 71C2CC22B1C4602927D2F3AAA7A16B4A2527436A # gpg: Good signature from "Eric Blake <eblake@redhat.com>" [full] # gpg: aka "Eric Blake (Free Software Programmer) <ebb9@byu.net>" [full] # gpg: aka "[jpeg image of size 6874]" [full] # Primary key fingerprint: 71C2 CC22 B1C4 6029 27D2 F3AA A7A1 6B4A 2527 436A * remotes/ericb/tags/pull-nbd-2021-09-27-v2: block/nbd: check that received handle is valid block/nbd: drop connection_co block/nbd: refactor nbd_recv_coroutines_wake_all() block/nbd: move nbd_recv_coroutines_wake_all() up block/nbd: nbd_channel_error() shutdown channel unconditionally nbd/client-connection: nbd_co_establish_connection(): fix non set errp nbd/server: Allow LIST_META_CONTEXT without STRUCTURED_REPLY block/io: allow 64bit discard requests block: use int64_t instead of int in driver discard handlers block: make BlockLimits::max_pdiscard 64bit block/io: allow 64bit write-zeroes requests block: use int64_t instead of int in driver write_zeroes handlers block: make BlockLimits::max_pwrite_zeroes 64bit block: use int64_t instead of uint64_t in copy_range driver handlers block: use int64_t instead of uint64_t in driver write handlers block: use int64_t instead of uint64_t in driver read handlers qcow2: check request on vmstate save/load path block/io: bring request check to bdrv_co_(read,write)v_vmstate qemu-nbd: Change default cache mode to writeback Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2021-09-30 11:28:31 +01:00
Bin Meng	06caae8af0	hw/intc: openpic: Clean up the styles Correct the multi-line comment format. No functional changes. Signed-off-by: Bin Meng <bin.meng@windriver.com> Message-Id: <20210918032653.646370-3-bin.meng@windriver.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-09-30 12:26:06 +10:00
Bin Meng	86229b68a2	hw/intc: openpic: Drop Raven related codes There is no machine that uses Motorola MCP750 (aka Raven) model. Drop the related codes. While we are here, drop the mentioning of Intel GW80314 I/O companion chip in the comments as it has been obsolete for years, and correct a typo too. Signed-off-by: Bin Meng <bin.meng@windriver.com> Message-Id: <20210918032653.646370-2-bin.meng@windriver.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-09-30 12:26:06 +10:00
Daniel Henrique Barboza	e0eb84d4f5	spapr_numa.c: FORM2 NUMA affinity support The main feature of FORM2 affinity support is the separation of NUMA distances from ibm,associativity information. This allows for a more flexible and straightforward NUMA distance assignment without relying on complex associations between several levels of NUMA via ibm,associativity matches. Another feature is its extensibility. This base support contains the facilities for NUMA distance assignment, but in the future more facilities will be added for latency, performance, bandwidth and so on. This patch implements the base FORM2 affinity support as follows: - the use of FORM2 associativity is indicated by using bit 2 of byte 5 of ibm,architecture-vec-5. A FORM2 aware guest can choose to use FORM1 or FORM2 affinity. Setting both forms will default to FORM2. We're not advertising FORM2 for pseries-6.1 and older machine versions to prevent guest visible changes in those; - ibm,associativity-reference-points has a new semantic. Instead of being used to calculate distances via NUMA levels, it's now used to indicate the primary domain index in the ibm,associativity domain of each resource. In our case it's set to {0x4}, matching the position where we already place logical_domain_id; - two new RTAS DT artifacts are introduced: ibm,numa-lookup-index-table and ibm,numa-distance-table. The index table is used to list all the NUMA logical domains of the platform, in ascending order, and allows for spartial NUMA configurations (although QEMU ATM doesn't support that). ibm,numa-distance-table is an array that contains all the distances from the first NUMA node to all other nodes, then the second NUMA node distances to all other nodes and so on; - get_max_dist_ref_points(), get_numa_assoc_size() and get_associativity() now checks for OV5_FORM2_AFFINITY and returns FORM2 values if the guest selected FORM2 affinity during CAS. Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Message-Id: <20210920174947.556324-7-danielhb413@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-09-30 12:26:06 +10:00
Daniel Henrique Barboza	5dab5abe62	spapr: move FORM1 verifications to post CAS FORM2 NUMA affinity is prepared to deal with empty (memory/cpu less) NUMA nodes. This is used by the DAX KMEM driver to locate a PAPR SCM device that has a different latency than the original NUMA node from the regular memory. FORM2 is also able to deal with asymmetric NUMA distances gracefully, something that our FORM1 implementation doesn't do. Move these FORM1 verifications to a new function and wait until after CAS, when we're sure that we're sticking with FORM1, to enforce them. Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Message-Id: <20210920174947.556324-6-danielhb413@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-09-30 12:26:06 +10:00
Daniel Henrique Barboza	a165ac67c3	spapr_numa.c: rename numa_assoc_array to FORM1_assoc_array Introducing a new NUMA affinity, FORM2, requires a new mechanism to switch between affinity modes after CAS. Also, we want FORM2 data structures and functions to be completely separated from the existing FORM1 code, allowing us to avoid adding new code that inherits the existing complexity of FORM1. The idea of switching values used by the write_dt() functions in spapr_numa.c was already introduced in the previous patch, and the same approach will be used when dealing with the FORM1 and FORM2 arrays. We can accomplish that by that by renaming the existing numa_assoc_array to FORM1_assoc_array, which now is used exclusively to handle FORM1 affinity data. A new helper get_associativity() is then introduced to be used by the write_dt() functions to retrieve the current ibm,associativity array of a given node, after considering affinity selection that might have been done during CAS. All code that was using numa_assoc_array now needs to retrieve the array by calling this function. This will allow for an easier plug of FORM2 data later on. Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Message-Id: <20210920174947.556324-5-danielhb413@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-09-30 12:26:06 +10:00
Daniel Henrique Barboza	3a6e4ce684	spapr_numa.c: parametrize FORM1 macros The next preliminary step to introduce NUMA FORM2 affinity is to make the existing code independent of FORM1 macros and values, i.e. MAX_DISTANCE_REF_POINTS, NUMA_ASSOC_SIZE and VCPU_ASSOC_SIZE. This patch accomplishes that by doing the following: - move the NUMA related macros from spapr.h to spapr_numa.c where they are used. spapr.h gets instead a 'NUMA_NODES_MAX_NUM' macro that is used to refer to the maximum number of NUMA nodes, including GPU nodes, that the machine can support; - MAX_DISTANCE_REF_POINTS and NUMA_ASSOC_SIZE are renamed to FORM1_DIST_REF_POINTS and FORM1_NUMA_ASSOC_SIZE. These FORM1 specific macros are used in FORM1 init functions; - code that uses MAX_DISTANCE_REF_POINTS now retrieves the max_dist_ref_points value using get_max_dist_ref_points(). NUMA_ASSOC_SIZE is replaced by get_numa_assoc_size() and VCPU_ASSOC_SIZE is replaced by get_vcpu_assoc_size(). These functions are used by the generic device tree functions and h_home_node_associativity() and will allow them to switch between FORM1 and FORM2 without changing their core logic. Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Message-Id: <20210920174947.556324-4-danielhb413@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-09-30 12:26:06 +10:00
Vladimir Sementsov-Ogievskiy	0c8022876f	block: use int64_t instead of int in driver discard handlers We are generally moving to int64_t for both offset and bytes parameters on all io paths. Main motivation is realization of 64-bit write_zeroes operation for fast zeroing large disk chunks, up to the whole disk. We chose signed type, to be consistent with off_t (which is signed) and with possibility for signed return type (where negative value means error). So, convert driver discard handlers bytes parameter to int64_t. The only caller of all updated function is bdrv_co_pdiscard in block/io.c. It is already prepared to work with 64bit requests, but pass at most max(bs->bl.max_pdiscard, INT_MAX) to the driver. Let's look at all updated functions: blkdebug: all calculations are still OK, thanks to bdrv_check_qiov_request(). both rule_check and bdrv_co_pdiscard are 64bit blklogwrites: pass to blk_loc_writes_co_log which is 64bit blkreplay, copy-on-read, filter-compress: pass to bdrv_co_pdiscard, OK copy-before-write: pass to bdrv_co_pdiscard which is 64bit and to cbw_do_copy_before_write which is 64bit file-posix: one handler calls raw_account_discard() is 64bit and both handlers calls raw_do_pdiscard(). Update raw_do_pdiscard, which pass to RawPosixAIOData::aio_nbytes, which is 64bit (and calls raw_account_discard()) gluster: somehow, third argument of glfs_discard_async is size_t. Let's set max_pdiscard accordingly. iscsi: iscsi_allocmap_set_invalid is 64bit, !is_byte_request_lun_aligned is 64bit. list.num is uint32_t. Let's clarify max_pdiscard and pdiscard_alignment. mirror_top: pass to bdrv_mirror_top_do_write() which is 64bit nbd: protocol limitation. max_pdiscard is alredy set strict enough, keep it as is for now. nvme: buf.nlb is uint32_t and we do shift. So, add corresponding limits to nvme_refresh_limits(). preallocate: pass to bdrv_co_pdiscard() which is 64bit. rbd: pass to qemu_rbd_start_co() which is 64bit. qcow2: calculations are still OK, thanks to bdrv_check_qiov_request(), qcow2_cluster_discard() is 64bit. raw-format: raw_adjust_offset() is 64bit, bdrv_co_pdiscard too. throttle: pass to bdrv_co_pdiscard() which is 64bit and to throttle_group_co_io_limits_intercept() which is 64bit as well. test-block-iothread: bytes argument is unused Great! Now all drivers are prepared to handle 64bit discard requests, or else have explicit max_pdiscard limits. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210903102807.27127-11-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2021-09-29 13:46:32 -05:00
Vladimir Sementsov-Ogievskiy	39af49c0d7	block: make BlockLimits::max_pdiscard 64bit We are going to support 64 bit discard requests. Now update the limit variable. It's absolutely safe. The variable is set in some drivers, and used in bdrv_co_pdiscard(). Update also max_pdiscard variable in bdrv_co_pdiscard(), so that bdrv_co_pdiscard() is now prepared for 64bit requests. The remaining logic including num, offset and bytes variables is already supporting 64bit requests. So the only thing that prevents 64 bit requests is limiting max_pdiscard variable to INT_MAX in bdrv_co_pdiscard(). We'll drop this limitation after updating all block drivers. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20210903102807.27127-10-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2021-09-29 13:46:32 -05:00
Vladimir Sementsov-Ogievskiy	f34b2bcf8c	block: use int64_t instead of int in driver write_zeroes handlers We are generally moving to int64_t for both offset and bytes parameters on all io paths. Main motivation is realization of 64-bit write_zeroes operation for fast zeroing large disk chunks, up to the whole disk. We chose signed type, to be consistent with off_t (which is signed) and with possibility for signed return type (where negative value means error). So, convert driver write_zeroes handlers bytes parameter to int64_t. The only caller of all updated function is bdrv_co_do_pwrite_zeroes(). bdrv_co_do_pwrite_zeroes() itself is of course OK with widening of callee parameter type. Also, bdrv_co_do_pwrite_zeroes()'s max_write_zeroes is limited to INT_MAX. So, updated functions all are safe, they will not get "bytes" larger than before. Still, let's look through all updated functions, and add assertions to the ones which are actually unprepared to values larger than INT_MAX. For these drivers also set explicit max_pwrite_zeroes limit. Let's go: blkdebug: calculations can't overflow, thanks to bdrv_check_qiov_request() in generic layer. rule_check() and bdrv_co_pwrite_zeroes() both have 64bit argument. blklogwrites: pass to blk_log_writes_co_log() with 64bit argument. blkreplay, copy-on-read, filter-compress: pass to bdrv_co_pwrite_zeroes() which is OK copy-before-write: Calls cbw_do_copy_before_write() and bdrv_co_pwrite_zeroes, both have 64bit argument. file-posix: both handler calls raw_do_pwrite_zeroes, which is updated. In raw_do_pwrite_zeroes() calculations are OK due to bdrv_check_qiov_request(), bytes go to RawPosixAIOData::aio_nbytes which is uint64_t. Check also where that uint64_t gets handed: handle_aiocb_write_zeroes_block() passes a uint64_t[2] to ioctl(BLKZEROOUT), handle_aiocb_write_zeroes() calls do_fallocate() which takes off_t (and we compile to always have 64-bit off_t), as does handle_aiocb_write_zeroes_unmap. All look safe. gluster: bytes go to GlusterAIOCB::size which is int64_t and to glfs_zerofill_async works with off_t. iscsi: Aha, here we deal with iscsi_writesame16_task() that has uint32_t num_blocks argument and iscsi_writesame16_task() has uint16_t argument. Make comments, add assertions and clarify max_pwrite_zeroes calculation. iscsi_allocmap_() functions already has int64_t argument is_byte_request_lun_aligned is simple to update, do it. mirror_top: pass to bdrv_mirror_top_do_write which has uint64_t argument nbd: Aha, here we have protocol limitation, and NBDRequest::len is uint32_t. max_pwrite_zeroes is cleanly set to 32bit value, so we are OK for now. nvme: Again, protocol limitation. And no inherent limit for write-zeroes at all. But from code that calculates cdw12 it's obvious that we do have limit and alignment. Let's clarify it. Also, obviously the code is not prepared to handle bytes=0. Let's handle this case too. trace events already 64bit preallocate: pass to handle_write() and bdrv_co_pwrite_zeroes(), both 64bit. rbd: pass to qemu_rbd_start_co() which is 64bit. qcow2: offset + bytes and alignment still works good (thanks to bdrv_check_qiov_request()), so tail calculation is OK qcow2_subcluster_zeroize() has 64bit argument, should be OK trace events updated qed: qed_co_request wants int nb_sectors. Also in code we have size_t used for request length which may be 32bit. So, let's just keep INT_MAX as a limit (aligning it down to pwrite_zeroes_alignment) and don't care. raw-format: Is OK. raw_adjust_offset and bdrv_co_pwrite_zeroes are both 64bit. throttle: Both throttle_group_co_io_limits_intercept() and bdrv_co_pwrite_zeroes() are 64bit. vmdk: pass to vmdk_pwritev which is 64bit quorum: pass to quorum_co_pwritev() which is 64bit Hooray! At this point all block drivers are prepared to support 64bit write-zero requests, or have explicitly set max_pwrite_zeroes. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210903102807.27127-8-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> [eblake: use <= rather than < in assertions relying on max_pwrite_zeroes] Signed-off-by: Eric Blake <eblake@redhat.com>	2021-09-29 13:46:32 -05:00
Vladimir Sementsov-Ogievskiy	d544f5d3b1	block: make BlockLimits::max_pwrite_zeroes 64bit We are going to support 64 bit write-zeroes requests. Now update the limit variable. It's absolutely safe. The variable is set in some drivers, and used in bdrv_co_do_pwrite_zeroes(). Update also max_write_zeroes variable in bdrv_co_do_pwrite_zeroes(), so that bdrv_co_do_pwrite_zeroes() is now prepared to 64bit requests. The remaining logic including num, offset and bytes variables is already supporting 64bit requests. So the only thing that prevents 64 bit requests is limiting max_write_zeroes variable to INT_MAX in bdrv_co_do_pwrite_zeroes(). We'll drop this limitation after updating all block drivers. Ah, we also have bdrv_check_request32() in bdrv_co_pwritev_part(). It will be modified to do bdrv_check_request() for write-zeroes path. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210903102807.27127-7-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2021-09-29 13:46:32 -05:00
Vladimir Sementsov-Ogievskiy	485350497b	block: use int64_t instead of uint64_t in copy_range driver handlers We are generally moving to int64_t for both offset and bytes parameters on all io paths. Main motivation is realization of 64-bit write_zeroes operation for fast zeroing large disk chunks, up to the whole disk. We chose signed type, to be consistent with off_t (which is signed) and with possibility for signed return type (where negative value means error). So, convert driver copy_range handlers parameters which are already 64bit to signed type. Now let's consider all callers. Simple git grep '\->bdrv_co_copy_range' shows the only caller: bdrv_co_copy_range_internal(), which does bdrv_check_request32(), so everything is OK. Still, the functions may be called directly, not only by drv->... Let's check: git grep '\.bdrv_co_copy_range_$from\\|to$\s*=' \| \ awk '{print $4}' \| sed 's/,//' \| sed 's/&//' \| sort \| uniq \| \ while read func; do git grep "$func(" \| \ grep -v "$func(BlockDriverState"; done shows no more callers. So, we are done. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20210903102807.27127-6-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2021-09-29 13:46:32 -05:00
Vladimir Sementsov-Ogievskiy	e75abedab7	block: use int64_t instead of uint64_t in driver write handlers We are generally moving to int64_t for both offset and bytes parameters on all io paths. Main motivation is realization of 64-bit write_zeroes operation for fast zeroing large disk chunks, up to the whole disk. We chose signed type, to be consistent with off_t (which is signed) and with possibility for signed return type (where negative value means error). So, convert driver write handlers parameters which are already 64bit to signed type. While being here, convert also flags parameter to be BdrvRequestFlags. Now let's consider all callers. Simple git grep '\->bdrv_$aio\\|co$_pwritev$_part$\?' shows that's there three callers of driver function: bdrv_driver_pwritev() and bdrv_driver_pwritev_compressed() in block/io.c, both pass int64_t, checked by bdrv_check_qiov_request() to be non-negative. qcow2_save_vmstate() does bdrv_check_qiov_request(). Still, the functions may be called directly, not only by drv->... Let's check: git grep '\.bdrv_$aio\\|co$_pwritev$_part$\?\s*=' \| \ awk '{print $4}' \| sed 's/,//' \| sed 's/&//' \| sort \| uniq \| \ while read func; do git grep "$func(" \| \ grep -v "$func(BlockDriverState"; done shows several callers: qcow2: qcow2_co_truncate() write at most up to @offset, which is checked in generic qcow2_co_truncate() by bdrv_check_request(). qcow2_co_pwritev_compressed_task() pass the request (or part of the request) that already went through normal write path, so it should be OK qcow: qcow_co_pwritev_compressed() pass int64_t, it's updated by this patch quorum: quorum_co_pwrite_zeroes() pass int64_t and int - OK throttle: throttle_co_pwritev_compressed() pass int64_t, it's updated by this patch vmdk: vmdk_co_pwritev_compressed() pass int64_t, it's updated by this patch Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210903102807.27127-5-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2021-09-29 13:46:31 -05:00
Vladimir Sementsov-Ogievskiy	f7ef38dd13	block: use int64_t instead of uint64_t in driver read handlers We are generally moving to int64_t for both offset and bytes parameters on all io paths. Main motivation is realization of 64-bit write_zeroes operation for fast zeroing large disk chunks, up to the whole disk. We chose signed type, to be consistent with off_t (which is signed) and with possibility for signed return type (where negative value means error). So, convert driver read handlers parameters which are already 64bit to signed type. While being here, convert also flags parameter to be BdrvRequestFlags. Now let's consider all callers. Simple git grep '\->bdrv_$aio\\|co$_preadv$_part$\?' shows that's there three callers of driver function: bdrv_driver_preadv() in block/io.c, passes int64_t, checked by bdrv_check_qiov_request() to be non-negative. qcow2_load_vmstate() does bdrv_check_qiov_request(). do_perform_cow_read() has uint64_t argument. And a lot of things in qcow2 driver are uint64_t, so converting it is big job. But we must not work with requests that don't satisfy bdrv_check_qiov_request(), so let's just assert it here. Still, the functions may be called directly, not only by drv->... Let's check: git grep '\.bdrv_$aio\\|co$_preadv$_part$\?\s*=' \| \ awk '{print $4}' \| sed 's/,//' \| sed 's/&//' \| sort \| uniq \| \ while read func; do git grep "$func(" \| \ grep -v "$func(BlockDriverState"; done The only one such caller: QEMUIOVector qiov = QEMU_IOVEC_INIT_BUF(qiov, &data, 1); ... ret = bdrv_replace_test_co_preadv(bs, 0, 1, &qiov, 0); in tests/unit/test-bdrv-drain.c, and it's OK obviously. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210903102807.27127-4-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> [eblake: fix typos] Signed-off-by: Eric Blake <eblake@redhat.com>	2021-09-29 13:46:31 -05:00
Vladimir Sementsov-Ogievskiy	558902cc3d	qcow2: check request on vmstate save/load path We modify the request by adding an offset to vmstate. Let's check the modified request. It will help us to safely move .bdrv_co_preadv_part and .bdrv_co_pwritev_part to int64_t type of offset and bytes. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20210903102807.27127-3-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2021-09-29 13:46:31 -05:00
Cédric Le Goater	92612f1550	ppc/pnv: Rename "id" to "quad-id" in PnvQuad This to avoid possible conflicts with the "id" property of QOM objects. Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20210901094153.227671-9-clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-09-29 19:37:38 +10:00
Cédric Le Goater	daf115cf9a	ppc/xive: Export xive_tctx_word2() helper Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20210901094153.227671-8-clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-09-29 19:37:38 +10:00
Cédric Le Goater	89d2468d96	ppc/xive: Export priority_to_ipb() helper Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20210901094153.227671-7-clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-09-29 19:37:38 +10:00
Luis Pires	d03bba0bfb	host-utils: introduce uabs64() Introduce uabs64(), a function that returns the absolute value of a 64-bit int as an unsigned value. This avoids the undefined behavior for common abs implementations, where abs of the most negative value is undefined. Signed-off-by: Luis Pires <luis.pires@eldorado.org.br> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com> Message-Id: <20210910112624.72748-4-luis.pires@eldorado.org.br> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-09-29 19:37:38 +10:00

1 2 3 4 5 ...

12332 Commits