Commit Graph

93722 Commits

Author SHA1 Message Date
Nicholas Piggin 120f738a46 spapr: implement nested-hv capability for the virtual hypervisor
This implements the Nested KVM HV hcall API for spapr under TCG.

The L2 is switched in when the H_ENTER_NESTED hcall is made, and the
L1 is switched back in returned from the hcall when a HV exception
is sent to the vhyp. Register state is copied in and out according to
the nested KVM HV hcall API specification.

The hdecr timer is started when the L2 is switched in, and it provides
the HDEC / 0x980 return to L1.

The MMU re-uses the bare metal radix 2-level page table walker by
using the get_pate method to point the MMU to the nested partition
table entry. MMU faults due to partition scope errors raise HV
exceptions and accordingly are routed back to the L1.

The MMU does not tag translations for the L1 (direct) vs L2 (nested)
guests, so the TLB is flushed on any L1<->L2 transition (hcall entry
and exit).

Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
[ clg: checkpatch fixes ]
Message-Id: <20220216102545.1808018-10-npiggin@gmail.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
2022-02-18 08:34:14 +01:00
Nicholas Piggin 7cebc5db2e target/ppc: Introduce a vhyp framework for nested HV support
Introduce virtual hypervisor methods that can support a "Nested KVM HV"
implementation using the bare metal 2-level radix MMU, and using HV
exceptions to return from H_ENTER_NESTED (rather than cause interrupts).

HV exceptions can now be raised in the TCG spapr machine when running a
nested KVM HV guest. The main ones are the lev==1 syscall, the hdecr,
hdsi and hisi, hv fu, and hv emu, and h_virt external interrupts.

HV exceptions are intercepted in the exception handler code and instead
of causing interrupts in the guest and switching the machine to HV mode,
they go to the vhyp where it may exit the H_ENTER_NESTED hcall with the
interrupt vector numer as return value as required by the hcall API.

Address translation is provided by the 2-level page table walker that is
implemented for the bare metal radix MMU. The partition scope page table
is pointed to the L1's partition scope by the get_pate vhc method.

Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20220216102545.1808018-9-npiggin@gmail.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
2022-02-18 08:34:14 +01:00
Nicholas Piggin 3680e99461 target/ppc: Add powerpc_reset_excp_state helper
This moves the logic to reset the QEMU exception state into its own
function.

Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[ clg: checkpatch fixes ]
Message-Id: <20220216102545.1808018-8-npiggin@gmail.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
2022-02-18 08:34:14 +01:00
Nicholas Piggin 4c6cf6b295 target/ppc: add helper for books vhyp hypercall handler
The virtual hypervisor currently always intercepts and handles
hypercalls but with a future change this will not always be the case.

Add a helper for the test so the logic is abstracted from the mechanism.

Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Message-Id: <20220216102545.1808018-7-npiggin@gmail.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
2022-02-18 08:34:14 +01:00
Nicholas Piggin f32d4ab41c target/ppc: make vhyp get_pate method take lpid and return success
In prepartion for implementing a full partition table option for
vhyp, update the get_pate method to take an lpid and return a
success/fail indicator.

The spapr implementation currently just asserts lpid is always 0
and always return success.

Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[ clg: checkpatch fixes ]
Message-Id: <20220216102545.1808018-6-npiggin@gmail.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
2022-02-18 08:34:14 +01:00
Nicholas Piggin 4dce0bde30 target/ppc: add vhyp addressing mode helper for radix MMU
The radix on vhyp MMU uses a single-level radix table walk, with the
partition scope mapping provided by the flat QEMU machine memory.

A subsequent change will use the two-level radix walk on vhyp in some
situations, so provide a helper which can abstract that logic.

Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Message-Id: <20220216102545.1808018-5-npiggin@gmail.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
2022-02-18 08:34:14 +01:00
Nicholas Piggin 93aeb70210 ppc: allow the hdecr timer to be created/destroyed
Machines which don't emulate the HDEC facility are able to use the
timer for something else. Provide functions to start and stop the
hdecr timer.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[ clg: checkpatch fixes ]
Message-Id: <20220216102545.1808018-4-npiggin@gmail.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
2022-02-18 08:34:14 +01:00
Nicholas Piggin 5ff40b0124 spapr: prevent hdec timer being set up under virtual hypervisor
The spapr virtual hypervisor does not require the hdecr timer.
Remove it.

Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Message-Id: <20220216102545.1808018-3-npiggin@gmail.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
2022-02-18 08:34:14 +01:00
Nicholas Piggin 4ffcef2a88 target/ppc: raise HV interrupts for partition table entry problems
Invalid or missing partition table entry exceptions should cause HV
interrupts. HDSISR is set to bad MMU config, which is consistent with
the ISA and experimentally matches what POWER9 generates.

Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[ clg: checkpatch fixes ]
Message-Id: <20220216102545.1808018-2-npiggin@gmail.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
2022-02-18 08:34:14 +01:00
Shivaprasad G Bhat 8601b4f11d spapr: nvdimm: Introduce spapr-nvdimm device
If the device backend is not persistent memory for the nvdimm, there is
need for explicit IO flushes on the backend to ensure persistence.

On SPAPR, the issue is addressed by adding a new hcall to request for
an explicit flush from the guest when the backend is not pmem. So, the
approach here is to convey when the hcall flush is required in a device
tree property. The guest once it knows the device backend is not pmem,
makes the hcall whenever flush is required.

To set the device tree property, a new PAPR specific device type inheriting
the nvdimm device is implemented. When the backend doesn't have pmem=on
the device tree property "ibm,hcall-flush-required" is set, and the guest
makes hcall H_SCM_FLUSH requesting for an explicit flush. The new device
has boolean property pmem-override which when "on" advertises the device
tree property even when pmem=on for the backend. The flush function
invokes the fdatasync or pmem_persist() based on the type of backend.

The vmstate structures are made part of the spapr-nvdimm device object.
The patch attempts to keep the migration compatibility between source and
destination while rejecting the incompatibles ones with failures.

Signed-off-by: Shivaprasad G Bhat <sbhat@linux.ibm.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <164396256092.109112.17933240273840803354.stgit@ltczzess4.aus.stglabs.ibm.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
2022-02-18 08:34:14 +01:00
Shivaprasad G Bhat b5513584a0 spapr: nvdimm: Implement H_SCM_FLUSH hcall
The patch adds support for the SCM flush hcall for the nvdimm devices.
To be available for exploitation by guest through the next patch. The
hcall is applicable only for new SPAPR specific device class which is
also introduced in this patch.

The hcall expects the semantics such that the flush to return with
H_LONG_BUSY_ORDER_10_MSEC when the operation is expected to take longer
time along with a continue_token. The hcall to be called again by providing
the continue_token to get the status. So, all fresh requests are put into
a 'pending' list and flush worker is submitted to the thread pool. The
thread pool completion callbacks move the requests to 'completed' list,
which are cleaned up after collecting the return status for the guest
in subsequent hcall from the guest.

The semantics makes it necessary to preserve the continue_tokens and
their return status across migrations. So, the completed flush states
are forwarded to the destination and the pending ones are restarted
at the destination in post_load. The necessary nvdimm flush specific
vmstate structures are also introduced in this patch which are to be
saved in the new SPAPR specific nvdimm device to be introduced in the
following patch.

Signed-off-by: Shivaprasad G Bhat <sbhat@linux.ibm.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <164396254862.109112.16675611182159105748.stgit@ltczzess4.aus.stglabs.ibm.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
2022-02-18 08:34:14 +01:00
Shivaprasad G Bhat 3e35960bf1 nvdimm: Add realize, unrealize callbacks to NVDIMMDevice class
A new subclass inheriting NVDIMMDevice is going to be introduced in
subsequent patches. The new subclass uses the realize and unrealize
callbacks. Add them on NVDIMMClass to appropriately call them as part
of plug-unplug.

Signed-off-by: Shivaprasad G Bhat <sbhat@linux.ibm.com>
Acked-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <164396253158.109112.1926755104259023743.stgit@ltczzess4.aus.stglabs.ibm.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
2022-02-18 08:34:13 +01:00
Greg Kurz 45b04ef48d virtiofsd: Add basic support for FUSE_SYNCFS request
Honor the expected behavior of syncfs() to synchronously flush all data
and metadata to disk on linux systems.

If virtiofsd is started with '-o announce_submounts', the client is
expected to send a FUSE_SYNCFS request for each individual submount.
In this case, we just create a new file descriptor on the submount
inode with lo_inode_open(), call syncfs() on it and close it. The
intermediary file is needed because O_PATH descriptors aren't
backed by an actual file and syncfs() would fail with EBADF.

If virtiofsd is started without '-o announce_submounts' or if the
client doesn't have the FUSE_CAP_SUBMOUNTS capability, the client
only sends a single FUSE_SYNCFS request for the root inode. The
server would thus need to track submounts internally and call
syncfs() on each of them. This will be implemented later.

Note that syncfs() might suffer from a time penalty if the submounts
are being hammered by some unrelated workload on the host. The only
solution to prevent that is to avoid shared mounts.

Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <20220215181529.164070-2-groug@kaod.org>
Reviewed-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2022-02-17 17:22:26 +00:00
Vivek Goyal 963061dc11 virtiofsd: Add an option to enable/disable security label
Provide an option "-o security_label/no_security_label" to enable/disable
security label functionality. By default these are turned off.

If enabled, server will indicate to client that it is capable of handling
one security label during file creation. Typically this is expected to
be a SELinux label. File server will set this label on the file. It will
try to set it atomically wherever possible. But its not possible in
all the cases.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Message-Id: <20220208204813.682906-11-vgoyal@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2022-02-17 17:22:26 +00:00
Vivek Goyal a675c9a600 virtiofsd: Create new file using O_TMPFILE and set security context
If guest and host policies can't work with each other, then guest security
context (selinux label) needs to be set into an xattr. Say remap guest
security.selinux xattr to trusted.virtiofs.security.selinux.

That means setting "fscreate" is not going to help as that's ony useful
for security.selinux xattr on host.

So we need another method which is atomic. Use O_TMPFILE to create new
file, set xattr and then linkat() to proper place.

But this works only for regular files. So dir, symlinks will continue
to be non-atomic.

Also if host filesystem does not support O_TMPFILE, we fallback to
non-atomic behavior.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Message-Id: <20220208204813.682906-10-vgoyal@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2022-02-17 17:22:26 +00:00
Vivek Goyal 0c3f81e131 virtiofsd: Create new file with security context
This patch adds support for creating new file with security context
as sent by client. It basically takes three paths.

- If no security context enabled, then it continues to create files without
  security context.

- If security context is enabled and but security.selinux has not been
  remapped, then it uses /proc/thread-self/attr/fscreate knob to set
  security context and then create the file. This will make sure that
  newly created file gets the security context as set in "fscreate" and
  this is atomic w.r.t file creation.

  This is useful and host and guest SELinux policies don't conflict and
  can work with each other. In that case, guest security.selinux xattr
  is not remapped and it is passthrough as "security.selinux" xattr
  on host.

- If security context is enabled but security.selinux xattr has been
  remapped to something else, then it first creates the file and then
  uses setxattr() to set the remapped xattr with the security context.
  This is a non-atomic operation w.r.t file creation.

  This mode will be most versatile and allow host and guest to have their
  own separate SELinux xattrs and have their own separate SELinux policies.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Message-Id: <20220208204813.682906-9-vgoyal@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2022-02-17 17:22:26 +00:00
Vivek Goyal cb282e556a virtiofsd: Add helpers to work with /proc/self/task/tid/attr/fscreate
Soon we will be able to create and also set security context on the file
atomically using /proc/self/task/tid/attr/fscreate knob. If this knob
is available on the system, first set the knob with the desired context
and then create the file. It will be created with the context set in
fscreate. This works basically for SELinux and its per thread.

This patch just introduces the helper functions. Subsequent patches will
make use of these helpers.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Message-Id: <20220208204813.682906-8-vgoyal@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
  dgilbert: Manually merged gettid syscall number fixup from Vivek
2022-02-17 17:22:26 +00:00
Vivek Goyal 81489726ad virtiofsd: Move core file creation code in separate function
Move core file creation bits in a separate function. Soon this is going
to get more complex as file creation need to set security context also.
And there will be multiple modes of file creation in next patch.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Message-Id: <20220208204813.682906-7-vgoyal@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2022-02-17 17:22:26 +00:00
Vivek Goyal 36cfab870e virtiofsd, fuse_lowlevel.c: Add capability to parse security context
Add capability to enable and parse security context as sent by client
and put into fuse_req. Filesystems now can get security context from
request and set it on files during creation.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Message-Id: <20220208204813.682906-6-vgoyal@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2022-02-17 17:22:26 +00:00
Vivek Goyal 4c7c393c7b virtiofsd: Extend size of fuse_conn_info->capable and ->want fields
->capable keeps track of what capabilities kernel supports and ->wants keep
track of what capabilities filesytem wants.

Right now these fields are 32bit in size. But now fuse has run out of
bits and capabilities can now have bit number which are higher than 31.

That means 32 bit fields are not suffcient anymore. Increase size to 64
bit so that we can add newer capabilities and still be able to use existing
code to check and set the capabilities.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Message-Id: <20220208204813.682906-5-vgoyal@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2022-02-17 17:22:26 +00:00
Vivek Goyal 776dc4b165 virtiofsd: Parse extended "struct fuse_init_in"
Add some code to parse extended "struct fuse_init_in". And use a local
variable "flag" to represent 64 bit flags. This will make it easier
to add more features without having to worry about two 32bit flags (->flags
and ->flags2) in "fuse_struct_in".

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Message-Id: <20220208204813.682906-4-vgoyal@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
  dgilbert: Fixed up long line
2022-02-17 17:22:19 +00:00
Vivek Goyal ef17dd6a8e linux-headers: Update headers to v5.17-rc1
Update headers to 5.17-rc1. I need latest fuse changes.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Message-Id: <20220208204813.682906-3-vgoyal@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2022-02-17 17:21:45 +00:00
Vivek Goyal a086d54c6f virtiofsd: Fix breakage due to fuse_init_in size change
Kernel version 5.17 has increased the size of "struct fuse_init_in" struct.
Previously this struct was 16 bytes and now it has been extended to
64 bytes in size.

Once qemu headers are updated to latest, it will expect to receive 64 byte
size struct (for protocol version major 7 and minor > 6). But if guest is
booting older kernel (older than 5.17), then it still sends older
fuse_init_in of size 16 bytes. And do_init() fails. It is expecting
64 byte struct. And this results in mount of virtiofs failing.

Fix this by parsing 16 bytes only for now. Separate patches will be
posted which will parse rest of the bytes and enable new functionality.
Right now we don't support any of the new functionality, so we don't
lose anything by not parsing bytes beyond 16.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Message-Id: <20220208204813.682906-2-vgoyal@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2022-02-17 17:21:43 +00:00
Vitaly Chikunov e64e27d5cb 9pfs: Fix segfault in do_readdir_many caused by struct dirent overread
`struct dirent' returned from readdir(3) could be shorter (or longer)
than `sizeof(struct dirent)', thus memcpy of sizeof length will overread
into unallocated page causing SIGSEGV. Example stack trace:

 #0  0x00005555559ebeed v9fs_co_readdir_many (/usr/bin/qemu-system-x86_64 + 0x497eed)
 #1  0x00005555559ec2e9 v9fs_readdir (/usr/bin/qemu-system-x86_64 + 0x4982e9)
 #2  0x0000555555eb7983 coroutine_trampoline (/usr/bin/qemu-system-x86_64 + 0x963983)
 #3  0x00007ffff73e0be0 n/a (n/a + 0x0)

While fixing this, provide a helper for any future `struct dirent' cloning.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/841
Cc: qemu-stable@nongnu.org
Co-authored-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
Reviewed-by: Dmitry V. Levin <ldv@altlinux.org>
Signed-off-by: Vitaly Chikunov <vt@altlinux.org>
Tested-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
Reviewed-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
Acked-by: Greg Kurz <groug@kaod.org>
Tested-by: Vitaly Chikunov <vt@altlinux.org>
Message-Id: <20220216181821.3481527-1-vt@altlinux.org>
[C.S. - Fix typo in source comment. ]
Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
2022-02-17 16:57:58 +01:00
Greg Kurz 494fbbd3ed tests/9pfs: Use g_autofree and g_autoptr where possible
It is recommended to use g_autofree or g_autoptr as it reduces
the odds of introducing memory leaks in future changes.

Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <20220201151508.190035-3-groug@kaod.org>
Reviewed-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
2022-02-17 16:57:58 +01:00
Greg Kurz ba6112e40c tests/9pfs: Fix leak of local_test_path
local_test_path is allocated in virtio_9p_create_local_test_dir() to hold the path
of the temporary directory. It should be freed in virtio_9p_remove_local_test_dir()
when the temporary directory is removed. Clarify the lifecycle of local_test_path
while here.

Based-on: <f6602123c6f7d0d593466231b04fba087817abbd.1642879848.git.qemu_oss@crudebyte.com>
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <20220201151508.190035-2-groug@kaod.org>
Reviewed-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
2022-02-17 16:57:58 +01:00
Christian Schoenebeck 68c66a96c8 tests/9pfs: fix mkdir() being called twice
The 9p test cases use mkdtemp() to create a temporary directory for
running the 'local' 9p tests with real files/dirs. Unlike mktemp()
which only generates a unique file name, mkdtemp() also creates the
directory, therefore the subsequent mkdir() was wrong and caused
errors on some systems.

Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
Fixes: 136b7af2 (tests/9pfs: fix test dir for parallel tests)
Reported-by: Daniel P. Berrangé <berrange@redhat.com>
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/832
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Greg Kurz <Greg Kurz <groug@kaod.org>
Message-Id: <f6602123c6f7d0d593466231b04fba087817abbd.1642879848.git.qemu_oss@crudebyte.com>
2022-02-17 16:57:58 +01:00
Christian Schoenebeck 65ceee0ae5 tests/9pfs: use g_autofree where possible
Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Message-Id: <E1mn1fA-0005qZ-TM@lizzy.crudebyte.com>
2022-02-17 16:57:57 +01:00
Sebastian Hasler 41af4459ac virtiofsd: Do not support blocking flock
With the current implementation, blocking flock can lead to
deadlock. Thus, it's better to return EOPNOTSUPP if a user attempts
to perform a blocking flock request.

Signed-off-by: Sebastian Hasler <sebastian.hasler@stuvus.uni-stuttgart.de>
Message-Id: <20220113153249.710216-1-sebastian.hasler@stuvus.uni-stuttgart.de>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Vivek Goyal <vgoyal@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
2022-02-16 17:29:31 +00:00
Peter Maydell c13b8e9973 Fourth RISC-V PR for QEMU 7.0
* Remove old Ibex PLIC header file
  * Allow writing 8 bytes with generic loader
  * Fixes for RV128
  * Refactor RISC-V CPU configs
  * Initial support for XVentanaCondOps custom extension
  * Fix for vill field in vtype
  * Fix trap cause for RV32 HS-mode CSR access from RV64 HS-mode
  * Support for svnapot, svinval and svpbmt extensions
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEE9sSsRtSTSGjTuM6PIeENKd+XcFQFAmIMmLQACgkQIeENKd+X
 cFTDuQf7Ba9LS+bPHShVBbbZSJEeaFbrNOBKbDT2pBVpJ02yj/yfEiwNp1sD1ich
 8Ud6QHUzBVZ1+yYXfZfrMZPD1B+Yq++9hAKLxlFQjS5E6e3WTUQTvkDqoABG03Wu
 4QPqVcoPGVBvnhO4kuMoDidItljTUGEOGG1m/kc3eXsQ/e9A1PgDWsZMYkLDKkxE
 DOCzgyCNKeWB4Wq6IRTUqmXNMl6WjMKO8qouQUGdlROfQ9HFAfELwIBoCRgQ7LCT
 KiOM04ts5Fv5qjhFH/e4+9zxCiS9YX56qJooNHgNFqr1EOzGl1XzMu77KkT8LhC4
 vJKX+9v4VO8u9fybDEOyELRXHgkEdQ==
 =R9lk
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/alistair/tags/pull-riscv-to-apply-20220216' into staging

Fourth RISC-V PR for QEMU 7.0

 * Remove old Ibex PLIC header file
 * Allow writing 8 bytes with generic loader
 * Fixes for RV128
 * Refactor RISC-V CPU configs
 * Initial support for XVentanaCondOps custom extension
 * Fix for vill field in vtype
 * Fix trap cause for RV32 HS-mode CSR access from RV64 HS-mode
 * Support for svnapot, svinval and svpbmt extensions

# gpg: Signature made Wed 16 Feb 2022 06:24:52 GMT
# gpg:                using RSA key F6C4AC46D4934868D3B8CE8F21E10D29DF977054
# gpg: Good signature from "Alistair Francis <alistair@alistair23.me>" [full]
# Primary key fingerprint: F6C4 AC46 D493 4868 D3B8  CE8F 21E1 0D29 DF97 7054

* remotes/alistair/tags/pull-riscv-to-apply-20220216: (35 commits)
  docs/system: riscv: Update description of CPU
  target/riscv: add support for svpbmt extension
  target/riscv: add support for svinval extension
  target/riscv: add support for svnapot extension
  target/riscv: add PTE_A/PTE_D/PTE_U bits check for inner PTE
  target/riscv: Ignore reserved bits in PTE for RV64
  hw/intc: Add RISC-V AIA APLIC device emulation
  target/riscv: Allow users to force enable AIA CSRs in HART
  hw/riscv: virt: Use AIA INTC compatible string when available
  target/riscv: Implement AIA IMSIC interface CSRs
  target/riscv: Implement AIA xiselect and xireg CSRs
  target/riscv: Implement AIA mtopi, stopi, and vstopi CSRs
  target/riscv: Implement AIA interrupt filtering CSRs
  target/riscv: Implement AIA hvictl and hviprioX CSRs
  target/riscv: Implement AIA CSRs for 64 local interrupts on RV32
  target/riscv: Implement AIA local interrupt priorities
  target/riscv: Allow AIA device emulation to set ireg rmw callback
  target/riscv: Add defines for AIA CSRs
  target/riscv: Add AIA cpu feature
  target/riscv: Allow setting CPU feature from machine/device emulation
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2022-02-16 09:57:11 +00:00
Yu Li 7035b8420f docs/system: riscv: Update description of CPU
Since the hypervisor extension been non experimental and enabled for
default CPU, the previous command is no longer available and the
option `x-h=true` or `h=true` is also no longer required.

Signed-off-by: Yu Li <liyu.yukiteru@bytedance.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-Id: <9040401e-8f87-ef4a-d840-6703f08d068c@bytedance.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
2022-02-16 12:25:52 +10:00
Weiwei Li bbce8ba8e6 target/riscv: add support for svpbmt extension
- add PTE_PBMT bits: It uses two PTE bits, but otherwise has no effect on QEMU, since QEMU is sequentially consistent and doesn't model PMAs currently
- add PTE_PBMT bit check for inner PTE

Signed-off-by: Weiwei Li <liweiwei@iscas.ac.cn>
Signed-off-by: Junqiang Wang <wangjunqiang@iscas.ac.cn>
Reviewed-by: Anup Patel <anup@brainfault.org>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-Id: <20220204022658.18097-6-liweiwei@iscas.ac.cn>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
2022-02-16 12:25:52 +10:00
Weiwei Li c5d77ddd8e target/riscv: add support for svinval extension
- sinval.vma, hinval.vvma and hinval.gvma do the same as sfence.vma, hfence.vvma and hfence.gvma except extension check
- do nothing other than extension check for sfence.w.inval and sfence.inval.ir

Signed-off-by: Weiwei Li <liweiwei@iscas.ac.cn>
Signed-off-by: Junqiang Wang <wangjunqiang@iscas.ac.cn>
Reviewed-by: Anup Patel <anup@brainfault.org>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-Id: <20220204022658.18097-5-liweiwei@iscas.ac.cn>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
2022-02-16 12:25:52 +10:00
Weiwei Li 2bacb22446 target/riscv: add support for svnapot extension
- add PTE_N bit
- add PTE_N bit check for inner PTE
- update address translation to support 64KiB continuous region (napot_bits = 4)

Signed-off-by: Weiwei Li <liweiwei@iscas.ac.cn>
Signed-off-by: Junqiang Wang <wangjunqiang@iscas.ac.cn>
Reviewed-by: Anup Patel <anup@brainfault.org>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-Id: <20220204022658.18097-4-liweiwei@iscas.ac.cn>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
2022-02-16 12:25:52 +10:00
Weiwei Li b6ecc63c56 target/riscv: add PTE_A/PTE_D/PTE_U bits check for inner PTE
For non-leaf PTEs, the D, A, and U bits are reserved for future standard use.

Signed-off-by: Weiwei Li <liweiwei@iscas.ac.cn>
Signed-off-by: Junqiang Wang <wangjunqiang@iscas.ac.cn>
Reviewed-by: Anup Patel <anup@brainfault.org>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-Id: <20220204022658.18097-3-liweiwei@iscas.ac.cn>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
2022-02-16 12:25:52 +10:00
Guo Ren 05e6ca5e15 target/riscv: Ignore reserved bits in PTE for RV64
Highest bits of PTE has been used for svpbmt, ref: [1], [2], so we
need to ignore them. They cannot be a part of ppn.

1: The RISC-V Instruction Set Manual, Volume II: Privileged Architecture
   4.4 Sv39: Page-Based 39-bit Virtual-Memory System
   4.5 Sv48: Page-Based 48-bit Virtual-Memory System

2: https://github.com/riscv/virtual-memory/blob/main/specs/663-Svpbmt-diff.pdf

Signed-off-by: Guo Ren <ren_guo@c-sky.com>
Reviewed-by: Liu Zhiwei <zhiwei_liu@c-sky.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Cc: Bin Meng <bmeng.cn@gmail.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-Id: <20220204022658.18097-2-liweiwei@iscas.ac.cn>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
2022-02-16 12:25:52 +10:00
Anup Patel e8f79343cf hw/intc: Add RISC-V AIA APLIC device emulation
The RISC-V AIA (Advanced Interrupt Architecture) defines a new
interrupt controller for wired interrupts called APLIC (Advanced
Platform Level Interrupt Controller). The APLIC is capabable of
forwarding wired interupts to RISC-V HARTs directly or as MSIs
(Message Signaled Interupts).

This patch adds device emulation for RISC-V AIA APLIC.

Signed-off-by: Anup Patel <anup.patel@wdc.com>
Signed-off-by: Anup Patel <anup@brainfault.org>
Reviewed-by: Frank Chang <frank.chang@sifive.com>
Message-id: 20220204174700.534953-19-anup@brainfault.org
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
2022-02-16 12:24:19 +10:00
Anup Patel 91870b510a target/riscv: Allow users to force enable AIA CSRs in HART
We add "x-aia" command-line option for RISC-V HART using which
allows users to force enable CPU AIA CSRs without changing the
interrupt controller available in RISC-V machine.

Signed-off-by: Anup Patel <anup.patel@wdc.com>
Signed-off-by: Anup Patel <anup@brainfault.org>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Frank Chang <frank.chang@sifive.com>
Message-id: 20220204174700.534953-18-anup@brainfault.org
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
2022-02-16 12:24:19 +10:00
Anup Patel d207863cd3 hw/riscv: virt: Use AIA INTC compatible string when available
We should use the AIA INTC compatible string in the CPU INTC
DT nodes when the CPUs support AIA feature. This will allow
Linux INTC driver to use AIA local interrupt CSRs.

Signed-off-by: Anup Patel <anup.patel@wdc.com>
Signed-off-by: Anup Patel <anup@brainfault.org>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Frank Chang <frank.chang@sifive.com>
Message-id: 20220204174700.534953-17-anup@brainfault.org
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
2022-02-16 12:24:19 +10:00
Anup Patel ac4b0302b0 target/riscv: Implement AIA IMSIC interface CSRs
The AIA specification defines IMSIC interface CSRs for easy access
to the per-HART IMSIC registers without using indirect xiselect and
xireg CSRs. This patch implements the AIA IMSIC interface CSRs.

Signed-off-by: Anup Patel <anup.patel@wdc.com>
Signed-off-by: Anup Patel <anup@brainfault.org>
Reviewed-by: Frank Chang <frank.chang@sifive.com>
Message-id: 20220204174700.534953-16-anup@brainfault.org
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
2022-02-16 12:24:19 +10:00
Anup Patel d1ceff405a target/riscv: Implement AIA xiselect and xireg CSRs
The AIA specification defines [m|s|vs]iselect and [m|s|vs]ireg CSRs
which allow indirect access to interrupt priority arrays and per-HART
IMSIC registers. This patch implements AIA xiselect and xireg CSRs.

Signed-off-by: Anup Patel <anup.patel@wdc.com>
Signed-off-by: Anup Patel <anup@brainfault.org>
Reviewed-by: Frank Chang <frank.chang@sifive.com>
Message-id: 20220204174700.534953-15-anup@brainfault.org
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
2022-02-16 12:24:19 +10:00
Anup Patel c7de92b4e8 target/riscv: Implement AIA mtopi, stopi, and vstopi CSRs
The AIA specification introduces new [m|s|vs]topi CSRs for
reporting pending local IRQ number and associated IRQ priority.

Signed-off-by: Anup Patel <anup.patel@wdc.com>
Signed-off-by: Anup Patel <anup@brainfault.org>
Reviewed-by: Frank Chang <frank.chang@sifive.com>
Message-id: 20220204174700.534953-14-anup@brainfault.org
[ Changed by AF:
 - Fixup indentation
]
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
2022-02-16 12:24:19 +10:00
Anup Patel d0237b4df0 target/riscv: Implement AIA interrupt filtering CSRs
The AIA specificaiton adds interrupt filtering support for M-mode
and HS-mode. Using AIA interrupt filtering M-mode and H-mode can
take local interrupt 13 or above and selectively inject same local
interrupt to lower privilege modes.

At the moment, we don't have any local interrupts above 12 so we
add dummy implementation (i.e. read zero and ignore write) of AIA
interrupt filtering CSRs.

Signed-off-by: Anup Patel <anup.patel@wdc.com>
Signed-off-by: Anup Patel <anup@brainfault.org>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Frank Chang <frank.chang@sifive.com>
Message-id: 20220204174700.534953-13-anup@brainfault.org
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
2022-02-16 12:24:19 +10:00
Anup Patel 2b60239879 target/riscv: Implement AIA hvictl and hviprioX CSRs
The AIA hvictl and hviprioX CSRs allow hypervisor to control
interrupts visible at VS-level. This patch implements AIA hvictl
and hviprioX CSRs.

Signed-off-by: Anup Patel <anup.patel@wdc.com>
Signed-off-by: Anup Patel <anup@brainfault.org>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Frank Chang <frank.chang@sifive.com>
Message-id: 20220204174700.534953-12-anup@brainfault.org
[ Changes by AF:
 - Fix possible unintilised variable error in rmw_sie()
]
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
2022-02-16 12:24:19 +10:00
Anup Patel d028ac7512 target/riscv: Implement AIA CSRs for 64 local interrupts on RV32
The AIA specification adds new CSRs for RV32 so that RISC-V hart can
support 64 local interrupts on both RV32 and RV64.

Signed-off-by: Anup Patel <anup.patel@wdc.com>
Signed-off-by: Anup Patel <anup@brainfault.org>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Frank Chang <frank.chang@sifive.com>
Message-id: 20220204174700.534953-11-anup@brainfault.org
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
2022-02-16 12:24:19 +10:00
Anup Patel 43dc93af36 target/riscv: Implement AIA local interrupt priorities
The AIA spec defines programmable 8-bit priority for each local interrupt
at M-level, S-level and VS-level so we extend local interrupt processing
to consider AIA interrupt priorities. The AIA CSRs which help software
configure local interrupt priorities will be added by subsequent patches.

Signed-off-by: Anup Patel <anup.patel@wdc.com>
Signed-off-by: Anup Patel <anup@brainfault.org>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-id: 20220204174700.534953-10-anup@brainfault.org
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
2022-02-16 12:24:19 +10:00
Anup Patel 69077dd687 target/riscv: Allow AIA device emulation to set ireg rmw callback
The AIA device emulation (such as AIA IMSIC) should be able to set
(or provide) AIA ireg read-modify-write callback for each privilege
level of a RISC-V HART.

Signed-off-by: Anup Patel <anup.patel@wdc.com>
Signed-off-by: Anup Patel <anup@brainfault.org>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Frank Chang <frank.chang@sifive.com>
Message-id: 20220204174700.534953-9-anup@brainfault.org
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
2022-02-16 12:24:18 +10:00
Anup Patel aa7508bbc6 target/riscv: Add defines for AIA CSRs
The RISC-V AIA specification extends RISC-V local interrupts and
introduces new CSRs. This patch adds defines for the new AIA CSRs.

Signed-off-by: Anup Patel <anup.patel@wdc.com>
Signed-off-by: Anup Patel <anup@brainfault.org>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Frank Chang <frank.chang@sifive.com>
Message-id: 20220204174700.534953-8-anup@brainfault.org
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
2022-02-16 12:24:18 +10:00
Anup Patel 32b0ada038 target/riscv: Add AIA cpu feature
We define a CPU feature for AIA CSR support in RISC-V CPUs which
can be set by machine/device emulation. The RISC-V CSR emulation
will also check this feature for emulating AIA CSRs.

Signed-off-by: Anup Patel <anup.patel@wdc.com>
Signed-off-by: Anup Patel <anup@brainfault.org>
Reviewed-by: Bin Meng <bmeng.cn@gmail.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Frank Chang <frank.chang@sifive.com>
Message-id: 20220204174700.534953-7-anup@brainfault.org
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
2022-02-16 12:24:18 +10:00
Anup Patel f87adf23fa target/riscv: Allow setting CPU feature from machine/device emulation
The machine or device emulation should be able to force set certain
CPU features because:
1) We can have certain CPU features which are in-general optional
   but implemented by RISC-V CPUs on the machine.
2) We can have devices which require a certain CPU feature. For example,
   AIA IMSIC devices expect AIA CSRs implemented by RISC-V CPUs.

Signed-off-by: Anup Patel <anup.patel@wdc.com>
Signed-off-by: Anup Patel <anup@brainfault.org>
Reviewed-by: Bin Meng <bmeng.cn@gmail.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Frank Chang <frank.chang@sifive.com>
Message-id: 20220204174700.534953-6-anup@brainfault.org
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
2022-02-16 12:24:18 +10:00