CPUID leaf 12_0_EBX is an Intel-defined feature bits leaf enumerating
the platform's SGX extended capabilities. Currently there is a single
capabilitiy:
- EXINFO: record information about #PFs and #GPs in the enclave's SSA
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Yang Zhong <yang.zhong@intel.com>
Message-Id: <20210719112136.57018-9-yang.zhong@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
CPUID leaf 12_0_EAX is an Intel-defined feature bits leaf enumerating
the CPU's SGX capabilities, e.g. supported SGX instruction sets.
Currently there are four enumerated capabilities:
- SGX1 instruction set, i.e. "base" SGX
- SGX2 instruction set for dynamic EPC management
- ENCLV instruction set for VMM oversubscription of EPC
- ENCLS-C instruction set for thread safe variants of ENCLS
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Yang Zhong <yang.zhong@intel.com>
Message-Id: <20210719112136.57018-8-yang.zhong@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Add CPUID defines for SGX and SGX Launch Control (LC), as well as
defines for their associated FEATURE_CONTROL MSR bits. Define the
Launch Enclave Public Key Hash MSRs (LE Hash MSRs), which exist
when SGX LC is present (in CPUID), and are writable when SGX LC is
enabled (in FEATURE_CONTROL).
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Yang Zhong <yang.zhong@intel.com>
Message-Id: <20210719112136.57018-7-yang.zhong@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Because SGX EPC is enumerated through CPUID, EPC "devices" need to be
realized prior to realizing the vCPUs themselves, i.e. long before
generic devices are parsed and realized. From a virtualization
perspective, the CPUID aspect also means that EPC sections cannot be
hotplugged without paravirtualizing the guest kernel (hardware does
not support hotplugging as EPC sections must be locked down during
pre-boot to provide EPC's security properties).
So even though EPC sections could be realized through the generic
-devices command, they need to be created much earlier for them to
actually be usable by the guest. Place all EPC sections in a
contiguous block, somewhat arbitrarily starting after RAM above 4g.
Ensuring EPC is in a contiguous region simplifies calculations, e.g.
device memory base, PCI hole, etc..., allows dynamic calculation of the
total EPC size, e.g. exposing EPC to guests does not require -maxmem,
and last but not least allows all of EPC to be enumerated in a single
ACPI entry, which is expected by some kernels, e.g. Windows 7 and 8.
The new compound properties command for sgx like below:
......
-object memory-backend-epc,id=mem1,size=28M,prealloc=on \
-object memory-backend-epc,id=mem2,size=10M \
-M sgx-epc.0.memdev=mem1,sgx-epc.1.memdev=mem2
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Yang Zhong <yang.zhong@intel.com>
Message-Id: <20210719112136.57018-6-yang.zhong@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
SGX EPC is enumerated through CPUID, i.e. EPC "devices" need to be
realized prior to realizing the vCPUs themselves, which occurs long
before generic devices are parsed and realized. Because of this,
do not allow 'sgx-epc' devices to be instantiated after vCPUS have
been created.
The 'sgx-epc' device is essentially a placholder at this time, it will
be fully implemented in a future patch along with a dedicated command
to create 'sgx-epc' devices.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Yang Zhong <yang.zhong@intel.com>
Message-Id: <20210719112136.57018-5-yang.zhong@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Add the new 'memory-backend-epc' user creatable QOM object in
the ObjectOptions to support SGX since v6.1, or the sgx backend
object cannot bootup.
Signed-off-by: Yang Zhong <yang.zhong@intel.com>
Message-Id: <20210719112136.57018-4-yang.zhong@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
EPC (Enclave Page Cahe) is a specialized type of memory used by Intel
SGX (Software Guard Extensions). The SDM desribes EPC as:
The Enclave Page Cache (EPC) is the secure storage used to store
enclave pages when they are a part of an executing enclave. For an
EPC page, hardware performs additional access control checks to
restrict access to the page. After the current page access checks
and translations are performed, the hardware checks that the EPC
page is accessible to the program currently executing. Generally an
EPC page is only accessed by the owner of the executing enclave or
an instruction which is setting up an EPC page.
Because of its unique requirements, Linux manages EPC separately from
normal memory. Similar to memfd, the device /dev/sgx_vepc can be
opened to obtain a file descriptor which can in turn be used to mmap()
EPC memory.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Yang Zhong <yang.zhong@intel.com>
Message-Id: <20210719112136.57018-3-yang.zhong@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Add new CONFIG_SGX for sgx support in the Qemu, and the Kconfig
default enable sgx in the i386 platform.
Signed-off-by: Yang Zhong <yang.zhong@intel.com>
Message-Id: <20210719112136.57018-32-yang.zhong@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Add a new RAMBlock flag to denote "protected" memory, i.e. memory that
looks and acts like RAM but is inaccessible via normal mechanisms,
including DMA. Use the flag to skip protected memory regions when
mapping RAM for DMA in VFIO.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Yang Zhong <yang.zhong@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The Linux spi-imx driver does not work on QEMU. The reason is that the
state of m25p80 loops in STATE_READING_DATA state after receiving
RDSR command, the new command is ignored. Before sending a new command,
CS line should be pulled high to make the state of m25p80 back to IDLE.
Currently the SPI flash CS line is connected to the SPI controller, but
on the real board, it's connected to GPIO3_19. This matches the ecspi1
device node in the board dts.
ecspi1 node in imx6qdl-sabrelite.dtsi:
&ecspi1 {
cs-gpios = <&gpio3 19 GPIO_ACTIVE_LOW>;
pinctrl-names = "default";
pinctrl-0 = <&pinctrl_ecspi1>;
status = "okay";
flash: m25p80@0 {
compatible = "sst,sst25vf016b", "jedec,spi-nor";
spi-max-frequency = <20000000>;
reg = <0>;
};
};
Should connect the SSI_GPIO_CS to GPIO3_19 when adding a spi-nor to
spi1 on sabrelite machine.
Verified this patch on Linux v5.14.
Logs:
# echo "01234567899876543210" > test
# mtd_debug erase /dev/mtd0 0x0 0x1000
Erased 4096 bytes from address 0x00000000 in flash
# mtd_debug write /dev/mtdblock0 0x0 20 test
Copied 20 bytes from test to address 0x00000000 in flash
# mtd_debug read /dev/mtdblock0 0x0 20 test_out
Copied 20 bytes from address 0x00000000 in flash to test_out
# cat test_out
01234567899876543210#
Signed-off-by: Xuzhou Cheng <xuzhou.cheng@windriver.com>
Reported-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Bin Meng <bin.meng@windriver.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20210927142825.491-1-xchengl.cn@gmail.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
The function ide_bus_new() does an in-place initialization. Rename
it to ide_bus_init() to follow our _init vs _new convention.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Corey Minyard <cminyard@mvista.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Acked-by: John Snow <jsnow@redhat.com> (Feel free to merge.)
Message-id: 20210923121153.23754-7-peter.maydell@linaro.org
Rename the "allocate and return" qbus creation function to
qbus_new(), to bring it into line with our _init vs _new convention.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Corey Minyard <cminyard@mvista.com>
Message-id: 20210923121153.23754-6-peter.maydell@linaro.org
Rename qbus_create_inplace() to qbus_init(); this is more in line
with our usual naming convention for functions that in-place
initialize objects.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Message-id: 20210923121153.23754-5-peter.maydell@linaro.org
Rename the pci_root_bus_new_inplace() function to
pci_root_bus_init(); this brings the bus type in to line with a
"_init for in-place init, _new for allocate-and-return" convention.
To do this we need to rename the implementation-internal function
that was using the pci_root_bus_init() name to
pci_root_bus_internal_init().
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Message-id: 20210923121153.23754-4-peter.maydell@linaro.org
Rename ipack_bus_new_inplace() to ipack_bus_init(), to bring it in to
line with a "_init for in-place init, _new for allocate-and-return"
convention. Drop the 'name' argument, because the only caller does
not pass in a name. If a future caller does need to specify the bus
name, we should create an ipack_bus_init_named() function at that
point.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Message-id: 20210923121153.23754-3-peter.maydell@linaro.org
The function scsi_bus_new() creates a new SCSI bus; callers can
either pass in a name argument to specify the name of the new bus, or
they can pass in NULL to allow the bus to be given an automatically
generated unique name. Almost all callers want to use the
autogenerated name; the only exception is the virtio-scsi device.
Taking a name argument that should almost always be NULL is an
easy-to-misuse API design -- it encourages callers to think perhaps
they should pass in some standard name like "scsi" or "scsi-bus". We
don't do this anywhere for SCSI, but we do (incorrectly) do it for
other bus types such as i2c.
The function name also implies that it will return a newly allocated
object, when it in fact does in-place allocation. We more commonly
name such functions foo_init(), with foo_new() being the
allocate-and-return variant.
Replace all the scsi_bus_new() callsites with either:
* scsi_bus_init() for the usual case where the caller wants
an autogenerated bus name
* scsi_bus_init_named() for the rare case where the caller
needs to specify the bus name
and document that for the _named() version it's then the caller's
responsibility to think about uniqueness of bus names.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 20210923121153.23754-2-peter.maydell@linaro.org
Currently we send VFP XML which includes D0..D15 or D0..D31, plus
FPSID, FPSCR and FPEXC. The upstream GDB tolerates this, but its
definition of this XML feature does not include FPSID or FPEXC. In
particular, for M-profile cores there are no FPSID or FPEXC
registers, so advertising those is wrong.
Move FPSID and FPEXC into their own bit of XML which we only send for
A and R profile cores. This brings our definition of the XML
org.gnu.gdb.arm.vfp feature into line with GDB's own (at least for
non-Neon cores...) and means we don't claim to have FPSID and FPEXC
on M-profile.
(It seems unlikely to me that any gdbstub users really care about
being able to look at FPEXC and FPSID; but we've supplied them to gdb
for a decade and it's not hard to keep doing so.)
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210921162901.17508-5-peter.maydell@linaro.org
Currently helper.c includes some code which is part of the arm
target's gdbstub support. This code has a better home: in gdbstub.c
and gdbstub64.c. Move it there.
Because aarch64_fpu_gdb_get_reg() and aarch64_fpu_gdb_set_reg() move
into gdbstub64.c, this means that they're now compiled only for
TARGET_AARCH64 rather than always. That is the only case when they
would ever be used, but it does mean that the ifdef in
arm_cpu_register_gdb_regs_for_features() needs to be adjusted to
match.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210921162901.17508-4-peter.maydell@linaro.org
We're going to move this code to a different file; fix the coding
style first so checkpatch doesn't complain. This includes deleting
the spurious 'break' statements after returns in the
vfp_gdb_get_reg() function.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210921162901.17508-3-peter.maydell@linaro.org
The aarch64-linux QEMU usermode binaries can never run 32-bit
code, so they do not need to include the GDB XML for it.
(arm_cpu_register_gdb_regs_for_features() will not use these
XML files if the CPU has ARM_FEATURE_AARCH64, so we will not
advertise to gdb that we have them.)
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210921162901.17508-2-peter.maydell@linaro.org
Add BBRAM and eFUSE usage to the Xilinx Versal Virt board
document.
Signed-off-by: Tong Ho <tong.ho@xilinx.com>
Message-id: 20210917052400.1249094-10-tong.ho@xilinx.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Connect the support for ZynqMP eFUSE one-time field-programmable
bit array.
The command argument:
-drive if=pflash,index=3,...
Can be used to optionally connect the bit array to a
backend storage, such that field-programmed values
in one invocation can be made available to next
invocation.
The backend storage must be a seekable binary file, and
its size must be 768 bytes or larger. A file with all
binary 0's is a 'blank'.
Signed-off-by: Tong Ho <tong.ho@xilinx.com>
Message-id: 20210917052400.1249094-9-tong.ho@xilinx.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Connect the support for Xilinx ZynqMP Battery-Backed RAM (BBRAM)
The command argument:
-drive if=pflash,index=2,...
Can be used to optionally connect the bbram to a backend
storage, such that field-programmed values in one
invocation can be made available to next invocation.
The backend storage must be a seekable binary file, and
its size must be 36 bytes or larger. A file with all
binary 0's is a 'blank'.
Signed-off-by: Tong Ho <tong.ho@xilinx.com>
Message-id: 20210917052400.1249094-8-tong.ho@xilinx.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Connect the support for Versal eFUSE one-time field-programmable
bit array.
The command argument:
-drive if=pflash,index=1,...
Can be used to optionally connect the bit array to a
backend storage, such that field-programmed values
in one invocation can be made available to next
invocation.
The backend storage must be a seekable binary file, and
its size must be 3072 bytes or larger. A file with all
binary 0's is a 'blank'.
Signed-off-by: Tong Ho <tong.ho@xilinx.com>
Message-id: 20210917052400.1249094-7-tong.ho@xilinx.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Connect the support for Versal Battery-Backed RAM (BBRAM)
The command argument:
-drive if=pflash,index=0,...
Can be used to optionally connect the bbram to a backend
storage, such that field-programmed values in one
invocation can be made available to next invocation.
The backend storage must be a seekable binary file, and
its size must be 36 bytes or larger. A file with all
binary 0's is a 'blank'.
Signed-off-by: Tong Ho <tong.ho@xilinx.com>
Message-id: 20210917052400.1249094-6-tong.ho@xilinx.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
This device is present in Versal and ZynqMP product
families to store a 256-bit encryption key.
Co-authored-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Co-authored-by: Sai Pavan Boddu <sai.pavan.boddu@xilinx.com>
Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Signed-off-by: Sai Pavan Boddu <sai.pavan.boddu@xilinx.com>
Signed-off-by: Tong Ho <tong.ho@xilinx.com>
Message-id: 20210917052400.1249094-5-tong.ho@xilinx.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
This implements the Xilinx ZynqMP eFuse, an one-time
field-programmable non-volatile storage device. There is
only one such device in the Xilinx ZynqMP product family.
Co-authored-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Co-authored-by: Sai Pavan Boddu <sai.pavan.boddu@xilinx.com>
Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Signed-off-by: Sai Pavan Boddu <sai.pavan.boddu@xilinx.com>
Signed-off-by: Tong Ho <tong.ho@xilinx.com>
Message-id: 20210917052400.1249094-4-tong.ho@xilinx.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
This implements the Xilinx Versal eFuse, an one-time
field-programmable non-volatile storage device. There is
only one such device in the Xilinx Versal product family.
This device has two separate mmio interfaces, a controller
and a flatten readback.
The controller provides interfaces for field-programming,
configuration, control, and status.
The flatten readback is a cache to provide a byte-accessible
read-only interface to efficiently read efuse array.
Co-authored-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Co-authored-by: Sai Pavan Boddu <sai.pavan.boddu@xilinx.com>
Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Signed-off-by: Sai Pavan Boddu <sai.pavan.boddu@xilinx.com>
Signed-off-by: Tong Ho <tong.ho@xilinx.com>
Message-id: 20210917052400.1249094-3-tong.ho@xilinx.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
This introduces the QOM for Xilinx eFuse, an one-time
field-programmable storage bit array.
The actual mmio interface to the array varies by device
families and will be provided in different change-sets.
Co-authored-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Co-authored-by: Sai Pavan Boddu <sai.pavan.boddu@xilinx.com>
Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Signed-off-by: Sai Pavan Boddu <sai.pavan.boddu@xilinx.com>
Signed-off-by: Tong Ho <tong.ho@xilinx.com>
Message-id: 20210917052400.1249094-2-tong.ho@xilinx.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
The SMCCC 1.3 spec section 5.2 says
The Unknown SMC Function Identifier is a sign-extended value of (-1)
that is returned in the R0, W0 or X0 registers. An implementation must
return this error code when it receives:
* An SMC or HVC call with an unknown Function Identifier
* An SMC or HVC call for a removed Function Identifier
* An SMC64/HVC64 call from AArch32 state
To comply with these statements, let's always return -1 when we encounter
an unknown HVC or SMC call.
Signed-off-by: Alexander Graf <agraf@csgraf.de>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
The Allwinner H3 SoC uses Cortex-A7 cores which support virtualization.
However, today we are configuring QEMU to use HVC as PSCI conduit.
That means HVC calls get trapped into QEMU instead of the guest's own
emulated CPU and thus break the guest's ability to execute virtualization.
Fix this by moving to SMC as conduit, freeing up HYP completely to the VM.
Signed-off-by: Alexander Graf <agraf@csgraf.de>
Message-id: 20210920203931.66527-1-agraf@csgraf.de
Fixes: 740dafc0ba ("hw/arm: add Allwinner H3 System-on-Chip")
Reviewed-by: Niek Linnenbank <nieklinnenbank@gmail.com>
Tested-by: Niek Linnenbank <nieklinnenbank@gmail.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
- Vladimir Sementsov-Ogievskiy: Rework coroutines of qemu NBD client
to improve reconnect support
- Eric Blake: Relax server in regards to NBD_OPT_LIST_META_CONTEXT
- Vladimir Sementsov-Ogievskiy: Plumb up 64-bit bulk-zeroing support
in block layer, in preparation for future NBD spec extensions
- Nir Soffer: Default to writeback cache in qemu-nbd
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEccLMIrHEYCkn0vOqp6FrSiUnQ2oFAmFU1a4ACgkQp6FrSiUn
Q2obEggAq4KgnBILBip6TsFc9p7yzVpK9qWzri0vhJHOI7p+dE5Bxqt8sLBEdMQP
OiV5WE/ZRHogGULXXJChBmUTK2/z0U57AXVvbCpKWgr48kSj818dk8uhuUjBeWaM
5Qc8PsH+/Rij5WRlVmTePu7QMwXp1h3+gkw3fmLlHRkvzO4MmOBn1lrOqnCxSGbo
xnFvfdeplNiexmpImdO6QSaHfDsmnUqOFpBPZUsKXfdHOiRqg2I3eU1ibv8eQH7B
oTtBxWI0KHc6kWaJUWqZbEe4ChJTHvAsVdZxmB+il1diIl46lq+s/Zn7nT+0HXTD
pS/Fws9rrGKS/7PnGeFOfskVVCRnIw==
=VGqR
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/ericb/tags/pull-nbd-2021-09-27-v2' into staging
nbd patches for 2021-09-27
- Vladimir Sementsov-Ogievskiy: Rework coroutines of qemu NBD client
to improve reconnect support
- Eric Blake: Relax server in regards to NBD_OPT_LIST_META_CONTEXT
- Vladimir Sementsov-Ogievskiy: Plumb up 64-bit bulk-zeroing support
in block layer, in preparation for future NBD spec extensions
- Nir Soffer: Default to writeback cache in qemu-nbd
# gpg: Signature made Wed 29 Sep 2021 22:07:58 BST
# gpg: using RSA key 71C2CC22B1C4602927D2F3AAA7A16B4A2527436A
# gpg: Good signature from "Eric Blake <eblake@redhat.com>" [full]
# gpg: aka "Eric Blake (Free Software Programmer) <ebb9@byu.net>" [full]
# gpg: aka "[jpeg image of size 6874]" [full]
# Primary key fingerprint: 71C2 CC22 B1C4 6029 27D2 F3AA A7A1 6B4A 2527 436A
* remotes/ericb/tags/pull-nbd-2021-09-27-v2:
block/nbd: check that received handle is valid
block/nbd: drop connection_co
block/nbd: refactor nbd_recv_coroutines_wake_all()
block/nbd: move nbd_recv_coroutines_wake_all() up
block/nbd: nbd_channel_error() shutdown channel unconditionally
nbd/client-connection: nbd_co_establish_connection(): fix non set errp
nbd/server: Allow LIST_META_CONTEXT without STRUCTURED_REPLY
block/io: allow 64bit discard requests
block: use int64_t instead of int in driver discard handlers
block: make BlockLimits::max_pdiscard 64bit
block/io: allow 64bit write-zeroes requests
block: use int64_t instead of int in driver write_zeroes handlers
block: make BlockLimits::max_pwrite_zeroes 64bit
block: use int64_t instead of uint64_t in copy_range driver handlers
block: use int64_t instead of uint64_t in driver write handlers
block: use int64_t instead of uint64_t in driver read handlers
qcow2: check request on vmstate save/load path
block/io: bring request check to bdrv_co_(read,write)v_vmstate
qemu-nbd: Change default cache mode to writeback
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
qemu/KVM on Power is no longer my primary job responsibility, nor Greg
Kurz'. I still have some time for upstream maintenance, but it's no longer
accurate to say that I'm paid to do so. Therefore, reduce sPAPR (the
"pseries" machine type) from Supported to Maintained.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
The OpenPIC interrupt controller was once the de facto standard on ppc
machines. In qemu it's now only used on some Macintosh and the
Freescale e500 machine. It has no listed maintainer, and as far as I
know, no-one who's really familiar with it any more.
Since I'm moving away from the area, I no longer have capacity to do even
minimal maintenance of it under the auspices of the ppc targets in general.
Therefore, this patch lists the main part of openpic, and marks it as
"Odd Fixes" to be looked after by Mark Cave-Ayland who handles the
Macintosh targets. The openpic_kvm variant is only used on e500, so
add it to the files for that machine type (itself already Orphaned).
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
With our interests moving to other areas, Greg and myself no longer have
capacity to be regular reviewers of code for the powernv machine type, let
alone co-maintainers. Additionally, not being IBM employees, we don't have
easy access to the hardware information we'd need for good review.
Therefore, remove our names as reviewers and/or co-maintainers of the
powernv machine type, and the related XIVE interrupt controller.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Acked-by: Cédric Le Goater <clg@kaod.org>
There are a nunber of old embedded ppc machine types which have been little
changed and in "Odd Fixes" state for a long time. With both myself and
Greg Kurz moving toward other areas, we no longer have the capacity to
keep reviewing and maintaining even the rare patches that come in for those
platforms.
Therefore, remove our names as reviewers and mark these platforms as
orphaned.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Greg and I are moving towards other areas and no longer have capacity to
act as regular reviewers for several of the secondary ppc machine types.
So, remove ourselves as reviewers for Macintosh, PReP, sam460ex and
pegasos2 in MAINTAINERS.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Acked-by: BALATON Zoltan <balaton@eik.bme.hu>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Currently the PowerPC TCG CPUs entry in MAINTAINERS lists all of hw/ppc/
and include/hw/ppc. Nearly all the files in those places are related to
specific ppc machine types, rather than to the actual CPUs however. Those
machine types list their own files separately, often overlapping with this.
For greater clarity, remove these misleading entries from the TCG CPUs
stanza, leaving just hw/ppc/ppc.c and hw/ppc/ppc_booke.c which are the only
ones common to a wide range of PPC TCG cpus each.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The trace event was placed in the wrong routine. Move it under
kvmppc_xive_source_reset_one().
Fixes: 4e960974d4 ("xive: Add trace events")
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20210922070205.1235943-1-clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This patch has a handful of modifications for the recent added
FORM2 support:
- to not allocate more than the necessary size in 'distance_table'.
At this moment the array is oversized due to allocating uint32_t for
all elements, when most of them fits in an uint8_t. Fix it by
changing the array to uint8_t and allocating the exact size;
- use stl_be_p() to store the uint32_t at the start of 'distance_table';
- use sizeof(uint32_t) to skip the uint32_t length when populating the
distances;
- use the NUMA_DISTANCE_MIN macro from sysemu/numa.h to avoid hardcoding
the local distance value.
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20210922122852.130054-2-danielhb413@gmail.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Correct the multi-line comment format. No functional changes.
Signed-off-by: Bin Meng <bin.meng@windriver.com>
Message-Id: <20210918032653.646370-3-bin.meng@windriver.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
There is no machine that uses Motorola MCP750 (aka Raven) model.
Drop the related codes.
While we are here, drop the mentioning of Intel GW80314 I/O
companion chip in the comments as it has been obsolete for years,
and correct a typo too.
Signed-off-by: Bin Meng <bin.meng@windriver.com>
Message-Id: <20210918032653.646370-2-bin.meng@windriver.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The reset value of IPIDR should be zero for Freescale chipset, per
the following 2 manuals I checked:
- P2020RM (https://www.nxp.com/webapp/Download?colCode=P2020RM)
- P4080RM (https://www.nxp.com/webapp/Download?colCode=P4080RM)
Currently it is set to 1, which leaves the IPI enabled on core 0
after power-on reset. Such may cause unexpected interrupt to be
delivered to core 0 if the IPI is triggered from core 0 to other
cores later.
Fixes: ffd5e9fe02 ("openpic: Reset IRQ source private members")
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/584
Signed-off-by: Bin Meng <bin.meng@windriver.com>
Message-Id: <20210918032653.646370-1-bin.meng@windriver.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The current way the mask is built can overflow with a 64-bit decrementer.
Use sextract64() to extract the signed values and remove the logic to
handle negative values which has become useless.
Cc: Luis Fernando Fujita Pires <luis.pires@eldorado.org.br>
Fixes: a8dafa5251 ("target/ppc: Implement large decrementer support for TCG")
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20210920061203.989563-5-clg@kaod.org>
Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
numa_complete_configuration() in hw/core/numa.c always adds a NUMA node
for the pSeries machine if none was specified, but without node distance
information for the single node created.
NUMA FORM1 affinity code didn't rely on numa_state information to do its
job, but FORM2 does. As is now, this is the result of a pSeries guest
with NUMA FORM2 affinity when no NUMA nodes is specified:
$ numactl -H
available: 1 nodes (0)
node 0 cpus: 0
node 0 size: 16222 MB
node 0 free: 15681 MB
No distance information available.
This can be amended in spapr_numa_FORM2_write_rtas_tables(). We're
enforcing that the local distance (the distance to the node to itself) is
always 10. This allows for the proper creation of the NUMA distance tables,
fixing the output of 'numactl -H' in the guest:
$ numactl -H
available: 1 nodes (0)
node 0 cpus: 0
node 0 size: 16222 MB
node 0 free: 15685 MB
node distances:
node 0
0: 10
CC: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20210920174947.556324-8-danielhb413@gmail.com>
Acked-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The main feature of FORM2 affinity support is the separation of NUMA
distances from ibm,associativity information. This allows for a more
flexible and straightforward NUMA distance assignment without relying on
complex associations between several levels of NUMA via
ibm,associativity matches. Another feature is its extensibility. This base
support contains the facilities for NUMA distance assignment, but in the
future more facilities will be added for latency, performance, bandwidth
and so on.
This patch implements the base FORM2 affinity support as follows:
- the use of FORM2 associativity is indicated by using bit 2 of byte 5
of ibm,architecture-vec-5. A FORM2 aware guest can choose to use FORM1
or FORM2 affinity. Setting both forms will default to FORM2. We're not
advertising FORM2 for pseries-6.1 and older machine versions to prevent
guest visible changes in those;
- ibm,associativity-reference-points has a new semantic. Instead of
being used to calculate distances via NUMA levels, it's now used to
indicate the primary domain index in the ibm,associativity domain of
each resource. In our case it's set to {0x4}, matching the position
where we already place logical_domain_id;
- two new RTAS DT artifacts are introduced: ibm,numa-lookup-index-table
and ibm,numa-distance-table. The index table is used to list all the
NUMA logical domains of the platform, in ascending order, and allows for
spartial NUMA configurations (although QEMU ATM doesn't support that).
ibm,numa-distance-table is an array that contains all the distances from
the first NUMA node to all other nodes, then the second NUMA node
distances to all other nodes and so on;
- get_max_dist_ref_points(), get_numa_assoc_size() and get_associativity()
now checks for OV5_FORM2_AFFINITY and returns FORM2 values if the guest
selected FORM2 affinity during CAS.
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20210920174947.556324-7-danielhb413@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
FORM2 NUMA affinity is prepared to deal with empty (memory/cpu less)
NUMA nodes. This is used by the DAX KMEM driver to locate a PAPR SCM
device that has a different latency than the original NUMA node from the
regular memory. FORM2 is also able to deal with asymmetric NUMA
distances gracefully, something that our FORM1 implementation doesn't
do.
Move these FORM1 verifications to a new function and wait until after
CAS, when we're sure that we're sticking with FORM1, to enforce them.
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20210920174947.556324-6-danielhb413@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Introducing a new NUMA affinity, FORM2, requires a new mechanism to
switch between affinity modes after CAS. Also, we want FORM2 data
structures and functions to be completely separated from the existing
FORM1 code, allowing us to avoid adding new code that inherits the
existing complexity of FORM1.
The idea of switching values used by the write_dt() functions in
spapr_numa.c was already introduced in the previous patch, and
the same approach will be used when dealing with the FORM1 and FORM2
arrays.
We can accomplish that by that by renaming the existing numa_assoc_array
to FORM1_assoc_array, which now is used exclusively to handle FORM1 affinity
data. A new helper get_associativity() is then introduced to be used by the
write_dt() functions to retrieve the current ibm,associativity array of
a given node, after considering affinity selection that might have been
done during CAS. All code that was using numa_assoc_array now needs to
retrieve the array by calling this function.
This will allow for an easier plug of FORM2 data later on.
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20210920174947.556324-5-danielhb413@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The next preliminary step to introduce NUMA FORM2 affinity is to make
the existing code independent of FORM1 macros and values, i.e.
MAX_DISTANCE_REF_POINTS, NUMA_ASSOC_SIZE and VCPU_ASSOC_SIZE. This patch
accomplishes that by doing the following:
- move the NUMA related macros from spapr.h to spapr_numa.c where they
are used. spapr.h gets instead a 'NUMA_NODES_MAX_NUM' macro that is used
to refer to the maximum number of NUMA nodes, including GPU nodes, that
the machine can support;
- MAX_DISTANCE_REF_POINTS and NUMA_ASSOC_SIZE are renamed to
FORM1_DIST_REF_POINTS and FORM1_NUMA_ASSOC_SIZE. These FORM1 specific
macros are used in FORM1 init functions;
- code that uses MAX_DISTANCE_REF_POINTS now retrieves the
max_dist_ref_points value using get_max_dist_ref_points().
NUMA_ASSOC_SIZE is replaced by get_numa_assoc_size() and VCPU_ASSOC_SIZE
is replaced by get_vcpu_assoc_size(). These functions are used by the
generic device tree functions and h_home_node_associativity() and will
allow them to switch between FORM1 and FORM2 without changing their core
logic.
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20210920174947.556324-4-danielhb413@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>