s390-bios: Support booting from real dasd device
Allows guest to boot from a vfio configured real dasd device.

Signed-off-by: Jason J. Herne <jjherne@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Message-Id: <1554388475-18329-16-git-send-email-jjherne@linux.ibm.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
This commit is contained in:
parent
69333c36dc
commit
efa47d36da
@@ -1181,6 +1181,7 @@ S: Supported
F: hw/s390x/ipl.*
F: pc-bios/s390-ccw/
F: pc-bios/s390-ccw.img
F: docs/devel/s390-dasd-ipl.txt
T: git https://github.com/borntraeger/qemu.git s390-next
L: qemu-s390x@nongnu.org

133
docs/devel/s390-dasd-ipl.txt
Normal file
@@ -0,0 +1,133 @@
*****************************
***** s390 hardware IPL *****
*****************************

The s390 hardware IPL process consists of the following steps.

1. A READ IPL ccw is constructed in memory location 0x0.
    This ccw, by definition, reads the IPL1 record which is located on the disk
    at cylinder 0 track 0 record 1. Note that the chain flag is on in this ccw
    so when it is complete another ccw will be fetched and executed from memory
    location 0x08.

2. Execute the Read IPL ccw at 0x00, thereby reading IPL1 data into 0x00.
    IPL1 data is 24 bytes in length and consists of the following pieces of
    information: [psw][read ccw][tic ccw]. When the machine executes the Read
    IPL ccw it reads the 24 bytes of IPL1 into memory starting at
    location 0x0. Then the ccw program at 0x08 which consists of a read
    ccw and a tic ccw is automatically executed because of the chain flag from
    the original READ IPL ccw. The read ccw will read the IPL2 data into memory
    and the TIC (Transfer In Channel) will transfer control to the channel
    program contained in the IPL2 data. The TIC channel command is the
    equivalent of a branch/jump/goto instruction for channel programs.
    NOTE: The ccws in IPL1 are defined by the architecture to be format 0.

3. Execute IPL2.
    The TIC ccw instruction at the end of the IPL1 channel program will begin
    the execution of the IPL2 channel program. IPL2 is stage-2 of the boot
    process and will contain a larger channel program than IPL1. The point of
    IPL2 is to find and load either the operating system or a small program that
    loads the operating system from disk. At the end of this step all or some of
    the real operating system is loaded into memory and we are ready to hand
    control over to the guest operating system. At this point the guest
    operating system is entirely responsible for loading any more data it might
    need to function. NOTE: The IPL2 channel program might read data into memory
    location 0 thereby overwriting the IPL1 psw and channel program. This is ok
    as long as the data placed in location 0 contains a psw whose instruction
    address points to the guest operating system code to execute at the end of
    the IPL/boot process.
    NOTE: The ccws in IPL2 are defined by the architecture to be format 0.

4. Start executing the guest operating system.
    The psw that was loaded into memory location 0 as part of the ipl process
    should contain the needed flags for the operating system we have loaded. The
    psw's instruction address will point to the location in memory where we want
    to start executing the operating system. This psw is loaded (via LPSW
    instruction) causing control to be passed to the operating system code.

In a non-virtualized environment this process, handled entirely by the hardware,
is kicked off by the user initiating a "Load" procedure from the hardware
management console. This "Load" procedure crafts a special "Read IPL" ccw in
memory location 0x0 that reads IPL1. It then executes this ccw thereby kicking
off the reading of IPL1 data. Since the channel program from IPL1 will be
written immediately after the special "Read IPL" ccw, the IPL1 channel program
will be executed immediately (the special read ccw has the chaining bit turned
on). The TIC at the end of the IPL1 channel program will cause the IPL2 channel
program to be executed automatically. After this sequence completes the "Load"
procedure then loads the psw from 0x0.
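
As a rough illustration of the [psw][read ccw][tic ccw] layout described above,
the following host-side sketch decodes a 24-byte IPL1 record held in a plain
byte buffer. The helper and macro names (ccw0_cmd, ccw0_cda, ccw0_chained,
ccw0_count, ipl1_tic_target, CMD_DASD_READ, CMD_TIC) are hypothetical and are
not part of the s390-ccw bios. The command codes 0x06 and 0x08 are assumed to
match the bios constants CCW_CMD_DASD_READ and CCW_CMD_TIC, and the ccw fields
are decoded byte by byte so the sketch does not depend on host endianness.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define IPL1_SIZE      24   /* psw (8) + read ccw (8) + tic ccw (8) */
#define CCW0_FLAG_CC   0x40 /* command-chaining flag of a format-0 ccw */
#define CMD_DASD_READ  0x06 /* assumption: value of CCW_CMD_DASD_READ */
#define CMD_TIC        0x08 /* assumption: value of CCW_CMD_TIC */

/*
 * Format-0 ccw layout (big endian, 8 bytes):
 * command code (1), data address (3), flags (1), unused (1), count (2).
 */
static uint8_t ccw0_cmd(const uint8_t *ccw)
{
    return ccw[0];
}

static uint32_t ccw0_cda(const uint8_t *ccw)
{
    return ((uint32_t)ccw[1] << 16) | ((uint32_t)ccw[2] << 8) | ccw[3];
}

static bool ccw0_chained(const uint8_t *ccw)
{
    return (ccw[4] & CCW0_FLAG_CC) != 0;
}

static uint16_t ccw0_count(const uint8_t *ccw)
{
    return (uint16_t)(((unsigned)ccw[6] << 8) | ccw[7]);
}

/*
 * Check that an IPL1 record holds a read ccw followed by a tic ccw and
 * return the tic target, i.e. the start of the IPL2 channel program.
 * Returns 0 if the record does not look like valid IPL1 data.
 */
static uint32_t ipl1_tic_target(const uint8_t *ipl1)
{
    const uint8_t *read_ccw;
    const uint8_t *tic_ccw;

    assert(ipl1 != NULL);
    read_ccw = ipl1 + 8;   /* the psw occupies bytes 0..7 */
    tic_ccw = ipl1 + 16;

    if (ccw0_cmd(read_ccw) != CMD_DASD_READ || ccw0_cmd(tic_ccw) != CMD_TIC) {
        return 0;
    }
    return ccw0_cda(tic_ccw);
}
```

Calling ipl1_tic_target() on the 24 bytes found at address 0 after the Read IPL
ccw has completed would yield the address where the IPL2 channel program is
expected to start.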

**********************************************************
***** How this all pertains to QEMU (and the kernel) *****
**********************************************************

In theory we should merely have to do the following to IPL/boot a guest
operating system from a DASD device:

1. Place a "Read IPL" ccw into memory location 0x0 with chaining bit on.
2. Execute channel program at 0x0.
3. LPSW 0x0.

However, our emulation of the machine's channel program logic within the kernel
is missing one key feature that is required for this process to work:
non-prefetch of ccw data.

When we start a channel program we pass the channel subsystem parameters via an
ORB (Operation Request Block). One of those parameters is a prefetch bit. If the
bit is on then the vfio-ccw kernel driver is allowed to read the entire channel
program from guest memory before it starts executing it. This means that any
channel commands that read additional channel commands will not work as expected
because the newly read commands will only exist in guest memory and NOT within
the kernel's channel subsystem memory. The kernel vfio-ccw driver currently
requires this bit to be on for all channel programs. This is a problem because
the IPL process consists of transferring control from the "Read IPL" ccw
immediately to the IPL1 channel program that was read by "Read IPL".

Not being able to turn off prefetch will also prevent the TIC at the end of the
IPL1 channel program from transferring control to the IPL2 channel program.

Lastly, in some cases (the zipl bootloader for example) the IPL2 program also
transfers control to another channel program segment immediately after reading
it from the disk. So we need to be able to handle this case.
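
The prefetch problem described above can be made concrete with a deliberately
simplified toy model. Everything in it (ToyCcw, the TOY_* opcodes, toy_run,
toy_setup, toy_run_split) is hypothetical and only loosely mimics a channel
program; it is neither the vfio-ccw implementation nor real ccw encoding. The
sketch shows that a program which reads further commands and then jumps to them
works when commands are fetched on demand, fails when the whole program is
snapshotted up front, and works again when it is split into two separate starts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Toy opcodes; these are NOT real channel command codes. */
enum { TOY_END, TOY_READ, TOY_TIC, TOY_MARK };

typedef struct {
    uint8_t op;   /* one of the TOY_* opcodes */
    uint8_t arg;  /* TOY_READ: destination slot, TOY_TIC: target slot */
} ToyCcw;

#define TOY_SLOTS 8

/* The toy "disk" holds one two-command program segment. */
static const ToyCcw toy_disk[2] = { { TOY_MARK, 0 }, { TOY_END, 0 } };

/*
 * Run a toy program starting at slot `start`. With prefetch, commands are
 * fetched from a snapshot taken before execution, so commands that TOY_READ
 * stores into guest memory are never seen by this run.
 * Returns the number of TOY_MARK commands that were executed.
 */
static int toy_run(ToyCcw *guest, unsigned start, bool prefetch)
{
    ToyCcw snapshot[TOY_SLOTS];
    const ToyCcw *fetch = guest;
    unsigned pc = start;
    int marks = 0;

    assert(start < TOY_SLOTS);
    if (prefetch) {
        memcpy(snapshot, guest, sizeof(snapshot));
        fetch = snapshot;
    }

    while (pc < TOY_SLOTS) {
        ToyCcw cur = fetch[pc];

        switch (cur.op) {
        case TOY_READ:   /* copy the disk segment into guest memory */
            assert(cur.arg + 2 <= TOY_SLOTS);
            memcpy(&guest[cur.arg], toy_disk, sizeof(toy_disk));
            pc++;
            break;
        case TOY_TIC:    /* branch within the (possibly stale) program */
            pc = cur.arg;
            break;
        case TOY_MARK:
            marks++;
            pc++;
            break;
        default:         /* TOY_END */
            return marks;
        }
    }
    return marks;
}

/* Slot 0 reads new commands into slot 4, slot 1 jumps to them. */
static void toy_setup(ToyCcw *guest)
{
    memset(guest, 0, TOY_SLOTS * sizeof(*guest)); /* every slot is TOY_END */
    guest[0] = (ToyCcw){ TOY_READ, 4 };
    guest[1] = (ToyCcw){ TOY_TIC, 4 };
}

/* Workaround: stop after the read, then start again at the TIC target. */
static int toy_run_split(ToyCcw *guest)
{
    unsigned target = guest[1].arg;
    int marks;

    guest[1].op = TOY_END;
    marks = toy_run(guest, 0, true);
    marks += toy_run(guest, target, true);
    return marks;
}
```

In this model, toy_run() with prefetch disabled reaches the TOY_MARK command
that was read from the toy disk, the same program with prefetch enabled does
not, and toy_run_split() reaches it again even though every individual start
still prefetches.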

**************************
***** What QEMU does *****
**************************

Since we are forced to live with prefetch we cannot use the very simple IPL
procedure we defined in the preceding section. So we compensate by doing the
following.

1. Place "Read IPL" ccw into memory location 0x0, but turn off chaining bit.
2. Execute "Read IPL" at 0x0.

    So now IPL1's psw is at 0x0 and IPL1's channel program is at 0x08.

3. Write a custom channel program that will seek to the IPL2 record and then
    execute the READ and TIC ccws from IPL1. Normally the seek is not required
    because after reading the IPL1 record the disk is automatically positioned
    to read the very next record which will be IPL2. But since we are not reading
    both IPL1 and IPL2 as part of the same channel program we must manually set
    the position.

4. Grab the target address of the TIC instruction from the IPL1 channel program.
    This address is where the IPL2 channel program starts.

    Now IPL2 is loaded into memory somewhere, and we know the address.

5. Execute the IPL2 channel program at the address obtained in step #4.

    Because this channel program can be dynamic, we must use a special algorithm
    that detects a READ immediately followed by a TIC and breaks the ccw chain
    by turning off the chain bit in the READ ccw. When control is returned from
    the kernel/hardware to the QEMU bios code we immediately issue another start
    subchannel to execute the remaining TIC instruction. This causes the entire
    channel program (starting from the TIC) and all needed data to be refetched
    thereby stepping around the limitation that would otherwise prevent this
    channel program from executing properly.

    Now the operating system code is loaded somewhere in guest memory and the psw
    in memory location 0x0 will point to entry code for the guest operating
    system.

6. LPSW 0x0.
    LPSW transfers control to the guest operating system and we're done.
@@ -10,7 +10,7 @@ $(call set-vpath, $(SRC_PATH)/pc-bios/s390-ccw)
.PHONY : all clean build-all

OBJECTS = start.o main.o bootmap.o jump2ipl.o sclp.o menu.o \
	  virtio.o virtio-scsi.o virtio-blkdev.o libc.o cio.o
	  virtio.o virtio-scsi.o virtio-blkdev.o libc.o cio.o dasd-ipl.o

QEMU_CFLAGS := $(filter -W%, $(QEMU_CFLAGS))
QEMU_CFLAGS += -ffreestanding -fno-delete-null-pointer-checks -msoft-float
235
pc-bios/s390-ccw/dasd-ipl.c
Normal file
@@ -0,0 +1,235 @@
/*
 * S390 IPL (boot) from a real DASD device via vfio framework.
 *
 * Copyright (c) 2019 Jason J. Herne <jjherne@us.ibm.com>
 *
 * This work is licensed under the terms of the GNU GPL, version 2 or (at
 * your option) any later version. See the COPYING file in the top-level
 * directory.
 */

#include "libc.h"
#include "s390-ccw.h"
#include "s390-arch.h"
#include "dasd-ipl.h"
#include "helper.h"

static char prefix_page[PAGE_SIZE * 2]
            __attribute__((__aligned__(PAGE_SIZE * 2)));

static void enable_prefixing(void)
{
    memcpy(&prefix_page, lowcore, 4096);
    set_prefix(ptr2u32(&prefix_page));
}

static void disable_prefixing(void)
{
    set_prefix(0);
    /* Copy io interrupt info back to low core */
    memcpy((void *)&lowcore->subchannel_id, prefix_page + 0xB8, 12);
}

static bool is_read_tic_ccw_chain(Ccw0 *ccw)
{
    Ccw0 *next_ccw = ccw + 1;

    return ((ccw->cmd_code == CCW_CMD_DASD_READ ||
            ccw->cmd_code == CCW_CMD_DASD_READ_MT) &&
            ccw->chain && next_ccw->cmd_code == CCW_CMD_TIC);
}

static bool dynamic_cp_fixup(uint32_t ccw_addr, uint32_t *next_cpa)
{
    Ccw0 *cur_ccw = (Ccw0 *)(uint64_t)ccw_addr;
    Ccw0 *tic_ccw;

    while (true) {
        /* Skip over inline TIC (it might not have the chain bit on) */
        if (cur_ccw->cmd_code == CCW_CMD_TIC &&
            cur_ccw->cda == ptr2u32(cur_ccw) - 8) {
            cur_ccw += 1;
            continue;
        }

        if (!cur_ccw->chain) {
            break;
        }
        if (is_read_tic_ccw_chain(cur_ccw)) {
            /*
             * Breaking a chain of CCWs may alter the semantics or even the
             * validity of a channel program. The heuristic implemented below
             * seems to work well in practice for the channel programs
             * generated by zipl.
             */
            tic_ccw = cur_ccw + 1;
            *next_cpa = tic_ccw->cda;
            cur_ccw->chain = 0;
            return true;
        }
        cur_ccw += 1;
    }
    return false;
}

static int run_dynamic_ccw_program(SubChannelId schid, uint16_t cutype,
                                   uint32_t cpa)
{
    bool has_next;
    uint32_t next_cpa = 0;
    int rc;

    do {
        has_next = dynamic_cp_fixup(cpa, &next_cpa);

        print_int("executing ccw chain at ", cpa);
        enable_prefixing();
        rc = do_cio(schid, cutype, cpa, CCW_FMT0);
        disable_prefixing();

        if (rc) {
            break;
        }
        cpa = next_cpa;
    } while (has_next);

    return rc;
}

static void make_readipl(void)
{
    Ccw0 *ccwIplRead = (Ccw0 *)0x00;

    /* Create Read IPL ccw at address 0 */
    ccwIplRead->cmd_code = CCW_CMD_READ_IPL;
    ccwIplRead->cda = 0x00; /* Read into address 0x00 in main memory */
    ccwIplRead->chain = 0; /* Chain flag */
    ccwIplRead->count = 0x18; /* Read 0x18 bytes of data */
}

static void run_readipl(SubChannelId schid, uint16_t cutype)
{
    if (do_cio(schid, cutype, 0x00, CCW_FMT0)) {
        panic("dasd-ipl: Failed to run Read IPL channel program\n");
    }
}

/*
 * The architecture states that IPL1 data should consist of a psw followed by
 * format-0 READ and TIC CCWs. Let's sanity check.
 */
static void check_ipl1(void)
{
    Ccw0 *ccwread = (Ccw0 *)0x08;
    Ccw0 *ccwtic = (Ccw0 *)0x10;

    if (ccwread->cmd_code != CCW_CMD_DASD_READ ||
        ccwtic->cmd_code != CCW_CMD_TIC) {
        panic("dasd-ipl: IPL1 data invalid. Is this disk really bootable?\n");
    }
}

static void check_ipl2(uint32_t ipl2_addr)
{
    Ccw0 *ccw = u32toptr(ipl2_addr);

    if (ipl2_addr == 0x00) {
        panic("IPL2 address invalid. Is this disk really bootable?\n");
    }
    if (ccw->cmd_code == 0x00) {
        panic("IPL2 ccw data invalid. Is this disk really bootable?\n");
    }
}

static uint32_t read_ipl2_addr(void)
{
    Ccw0 *ccwtic = (Ccw0 *)0x10;

    return ccwtic->cda;
}

static void ipl1_fixup(void)
{
    Ccw0 *ccwSeek = (Ccw0 *) 0x08;
    Ccw0 *ccwSearchID = (Ccw0 *) 0x10;
    Ccw0 *ccwSearchTic = (Ccw0 *) 0x18;
    Ccw0 *ccwRead = (Ccw0 *) 0x20;
    CcwSeekData *seekData = (CcwSeekData *) 0x30;
    CcwSearchIdData *searchData = (CcwSearchIdData *) 0x38;

    /* move IPL1 CCWs to make room for CCWs needed to locate record 2 */
    memcpy(ccwRead, (void *)0x08, 16);

    /* Disable chaining so we don't TIC to IPL2 channel program */
    ccwRead->chain = 0x00;

    ccwSeek->cmd_code = CCW_CMD_DASD_SEEK;
    ccwSeek->cda = ptr2u32(seekData);
    ccwSeek->chain = 1;
    ccwSeek->count = sizeof(*seekData);
    seekData->reserved = 0x00;
    seekData->cyl = 0x00;
    seekData->head = 0x00;

    ccwSearchID->cmd_code = CCW_CMD_DASD_SEARCH_ID_EQ;
    ccwSearchID->cda = ptr2u32(searchData);
    ccwSearchID->chain = 1;
    ccwSearchID->count = sizeof(*searchData);
    searchData->cyl = 0;
    searchData->head = 0;
    searchData->record = 2;

    /* Go back to Search CCW if correct record not yet found */
    ccwSearchTic->cmd_code = CCW_CMD_TIC;
    ccwSearchTic->cda = ptr2u32(ccwSearchID);
}

static void run_ipl1(SubChannelId schid, uint16_t cutype)
{
    uint32_t startAddr = 0x08;

    if (do_cio(schid, cutype, startAddr, CCW_FMT0)) {
        panic("dasd-ipl: Failed to run IPL1 channel program\n");
    }
}

static void run_ipl2(SubChannelId schid, uint16_t cutype, uint32_t addr)
{
    if (run_dynamic_ccw_program(schid, cutype, addr)) {
        panic("dasd-ipl: Failed to run IPL2 channel program\n");
    }
}

/*
 * Limitations in vfio-ccw support complicate the IPL process. Details can
 * be found in docs/devel/s390-dasd-ipl.txt
 */
void dasd_ipl(SubChannelId schid, uint16_t cutype)
{
    PSWLegacy *pswl = (PSWLegacy *) 0x00;
    uint32_t ipl2_addr;

    /* Construct Read IPL CCW and run it to read IPL1 from boot disk */
    make_readipl();
    run_readipl(schid, cutype);
    ipl2_addr = read_ipl2_addr();
    check_ipl1();

    /*
     * Fixup IPL1 channel program to account for vfio-ccw limitations, then run
     * it to read IPL2 channel program from boot disk.
     */
    ipl1_fixup();
    run_ipl1(schid, cutype);
    check_ipl2(ipl2_addr);

    /*
     * Run IPL2 channel program to read operating system code from boot disk
     */
    run_ipl2(schid, cutype, ipl2_addr);

    /* Transfer control to the guest operating system */
    pswl->mask |= PSW_MASK_EAMODE; /* Force z-mode */
    pswl->addr |= PSW_MASK_BAMODE; /* ... */
    jump_to_low_kernel();
}
16
pc-bios/s390-ccw/dasd-ipl.h
Normal file
@@ -0,0 +1,16 @@
/*
 * S390 IPL (boot) from a real DASD device via vfio framework.
 *
 * Copyright (c) 2019 Jason J. Herne <jjherne@us.ibm.com>
 *
 * This work is licensed under the terms of the GNU GPL, version 2 or (at
 * your option) any later version. See the COPYING file in the top-level
 * directory.
 */

#ifndef DASD_IPL_H
#define DASD_IPL_H

void dasd_ipl(SubChannelId schid, uint16_t cutype);

#endif /* DASD_IPL_H */
@@ -13,6 +13,7 @@
#include "s390-ccw.h"
#include "cio.h"
#include "virtio.h"
#include "dasd-ipl.h"

char stack[PAGE_SIZE * 8] __attribute__((__aligned__(PAGE_SIZE)));
static SubChannelId blk_schid = { .one = 1 };
@@ -209,6 +210,10 @@ int main(void)

    cutype = cu_type(blk_schid);
    switch (cutype) {
    case CU_TYPE_DASD_3990:
    case CU_TYPE_DASD_2107:
        dasd_ipl(blk_schid, cutype); /* no return */
        break;
    case CU_TYPE_VIRTIO:
        virtio_setup();
        zipl_load(); /* no return */
@@ -87,4 +87,17 @@ typedef struct LowCore {

extern LowCore const *lowcore;

static inline void set_prefix(uint32_t address)
{
    asm volatile("spx %0" : : "m" (address) : "memory");
}

static inline uint32_t store_prefix(void)
{
    uint32_t address;

    asm volatile("stpx %0" : "=m" (address));
    return address;
}

#endif