qemu-e2k/hw/intc
David Woodhouse 54ad31fb0a hw/intc/ioapic: Update KVM routes before redelivering IRQ, on RTE update
A Linux guest will perform IRQ migration after the IRQ has happened,
updating the RTE to point to the new destination CPU and then unmasking
the interrupt.

However, when the guest updates the RTE, ioapic_mem_write() calls
ioapic_service(), which redelivers the pending level interrupt via
kvm_set_irq(), *before* calling ioapic_update_kvm_routes() which sets
the new target CPU.

Thus, the IRQ which is supposed to go to the new target CPU is instead
misdelivered to the previous target. An example where the guest kernel
is attempting to migrate from CPU#2 to CPU#0 shows:

xenstore_read tx 0 path control/platform-feature-xs_reset_watches
ioapic_set_irq vector: 11 level: 1
ioapic_set_remote_irr set remote irr for pin 11
ioapic_service: trigger KVM IRQ 11
[    0.523627] The affinity mask was 0-3 and the handler is on 2
ioapic_mem_write ioapic mem write addr 0x0 regsel: 0x27 size 0x4 val 0x26
ioapic_update_kvm_routes: update KVM route for IRQ 11: fee02000 8021
ioapic_mem_write ioapic mem write addr 0x10 regsel: 0x26 size 0x4 val 0x18021
xenstore_reset_watches
ioapic_set_irq vector: 11 level: 1
ioapic_mem_read ioapic mem read addr 0x10 regsel: 0x26 size 0x4 retval 0x1c021
[    0.524569] ioapic_ack_level IRQ 11 moveit = 1
ioapic_eoi_broadcast EOI broadcast for vector 33
ioapic_clear_remote_irr clear remote irr for pin 11 vector 33
ioapic_mem_write ioapic mem write addr 0x0 regsel: 0x26 size 0x4 val 0x26
ioapic_mem_read ioapic mem read addr 0x10 regsel: 0x26 size 0x4 retval 0x18021
[    0.525235] ioapic_finish_move IRQ 11 calls irq_move_masked_irq()
[    0.526147] irq_do_set_affinity for IRQ 11, 0
[    0.526732] ioapic_set_affinity for IRQ 11, 0
[    0.527330] ioapic_setup_msg_from_msi for IRQ11 target 0
ioapic_mem_write ioapic mem write addr 0x0 regsel: 0x26 size 0x4 val 0x27
ioapic_mem_write ioapic mem write addr 0x10 regsel: 0x27 size 0x4 val 0x0
ioapic_mem_write ioapic mem write addr 0x0 regsel: 0x27 size 0x4 val 0x26
ioapic_mem_write ioapic mem write addr 0x10 regsel: 0x26 size 0x4 val 0x18021
[    0.527623] ioapic_set_affinity returns 0
[    0.527623] ioapic_finish_move IRQ 11 calls unmask_ioapic_irq()
ioapic_mem_write ioapic mem write addr 0x0 regsel: 0x26 size 0x4 val 0x26
ioapic_mem_write ioapic mem write addr 0x10 regsel: 0x26 size 0x4 val 0x8021
ioapic_set_remote_irr set remote irr for pin 11
ioapic_service: trigger KVM IRQ 11
ioapic_update_kvm_routes: update KVM route for IRQ 11: fee00000 8021
[    0.529571] The affinity mask was 0 and the handler is on 2
[    xenstore_watch path memory/target token FFFFFFFF92847D40

There are no other code paths in ioapic_mem_write() which need the KVM
IRQ routing table to be updated, so just shift the call from the end
of the function to happen right before the call to ioapic_service()
and thus deliver the re-enabled IRQ to the right place.

Alternative fixes might have been just to remove the part in
ioapic_service() which delivers the IRQ via kvm_set_irq() because
surely delivering as MSI ought to work just fine anyway in all cases?
That code lacks a comment justifying its existence.

Or maybe in the specific case shown in the above log, it would have
sufficed for ioapic_update_kvm_routes() to update the route *even*
when the IRQ is masked. It's not like it's actually going to get
triggered unless QEMU deliberately does so, anyway? But that only
works because the target CPU happens to be in the high word of the
RTE; if something in the *low* word (vector, perhaps) was changed
at the same time as the unmask, we'd still trigger with stale data.

Fixes: 15eafc2e60 "kvm: x86: add support for KVM_CAP_SPLIT_IRQCHIP"
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Reviewed-by: Peter Xu <peterx@redhat.com>
Message-Id: <20230308111952.2728440-2-dwmw2@infradead.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-03-15 11:52:25 +01:00
..
allwinner-a10-pic.c
apic_common.c hw/intc: Extract the IRQ counting functions into a separate file 2023-01-13 16:22:57 +01:00
apic.c hw: Move ioapic*.h to intc/ 2023-02-27 22:29:01 +01:00
arm_gic_common.c hw/intc: Convert TYPE_ARM_GIC_COMMON to 3-phase reset 2022-12-15 11:18:20 +00:00
arm_gic_kvm.c hw/intc: Convert TYPE_ARM_GIC_KVM to 3-phase reset 2022-12-15 11:18:20 +00:00
arm_gic.c hw/intc: add implementation of GICD_IIDR to Arm GIC 2022-11-21 11:45:13 +00:00
arm_gicv2m.c
arm_gicv3_common.c hw/intc: Convert TYPE_ARM_GICV3_COMMON to 3-phase reset 2022-12-15 11:18:20 +00:00
arm_gicv3_cpuif_common.c
arm_gicv3_cpuif.c target/arm: Mark up sysregs for HFGRTR bits 36..63 2023-02-03 12:59:23 +00:00
arm_gicv3_dist.c bulk: Rename TARGET_FMT_plx -> HWADDR_FMT_plx 2023-01-18 11:14:34 +01:00
arm_gicv3_its_common.c hw/intc: Convert TYPE_ARM_GICV3_ITS_COMMON to 3-phase reset 2022-12-15 11:18:20 +00:00
arm_gicv3_its_kvm.c hw/intc: Convert TYPE_KVM_ARM_ITS to 3-phase reset 2022-12-15 11:18:20 +00:00
arm_gicv3_its.c bulk: Rename TARGET_FMT_plx -> HWADDR_FMT_plx 2023-01-18 11:14:34 +01:00
arm_gicv3_kvm.c hw/intc: Convert TYPE_KVM_ARM_GICV3 to 3-phase reset 2022-12-15 11:18:20 +00:00
arm_gicv3_redist.c bulk: Rename TARGET_FMT_plx -> HWADDR_FMT_plx 2023-01-18 11:14:34 +01:00
arm_gicv3.c
armv7m_nvic.c hw/intc/armv7m_nvic: Use QOM cast CPU() macro 2023-02-27 13:27:05 +00:00
aspeed_vic.c
bcm2835_ic.c
bcm2836_control.c
etraxfs_pic.c
exynos4210_combiner.c bulk: Rename TARGET_FMT_plx -> HWADDR_FMT_plx 2023-01-18 11:14:34 +01:00
exynos4210_gic.c
gic_internal.h
gicv3_internal.h
goldfish_pic.c
grlib_irqmp.c
heathrow_pic.c
i8259_common.c hw/intc/i8259: Implement legacy LTIM Edge/Level Bank Select 2023-03-08 00:37:48 +01:00
i8259.c hw/intc/i8259: Implement legacy LTIM Edge/Level Bank Select 2023-03-08 00:37:48 +01:00
imx_avic.c
imx_gpcv2.c
intc.c
ioapic_common.c hw: Move ioapic*.h to intc/ 2023-02-27 22:29:01 +01:00
ioapic_internal.h hw: Move ioapic*.h to intc/ 2023-02-27 22:29:01 +01:00
ioapic.c hw/intc/ioapic: Update KVM routes before redelivering IRQ, on RTE update 2023-03-15 11:52:25 +01:00
Kconfig hw/intc: Select MSI_NONBROKEN in RISC-V AIA interrupt controllers 2023-01-06 10:42:55 +10:00
kvm_irqcount.c hw/intc: Extract the IRQ counting functions into a separate file 2023-01-13 16:22:57 +01:00
loongarch_extioi.c hw/intc: Fix LoongArch extioi coreisr accessing 2022-11-04 17:07:40 +08:00
loongarch_ipi.c
loongarch_pch_msi.c hw/intc/loongarch_pch_msi: add irq number property 2023-01-06 10:54:20 +08:00
loongarch_pch_pic.c hw/intc/loongarch_pch: Change default irq number of pch irq controller 2023-01-06 14:12:43 +08:00
loongson_liointc.c
m68k_irqc.c
meson.build hw/intc: Mark more interrupt-controller files as target independent 2023-01-16 17:56:59 +01:00
mips_gic.c hw/mips: Declare all length properties as unsigned 2023-03-08 00:37:48 +01:00
nios2_vic.c
omap_intc.c hw/intc/omap_intc: Use CamelCase for TYPE_OMAP_INTC type name 2023-01-12 17:15:09 +00:00
ompic.c
openpic_kvm.c
openpic.c
pl190.c
pnv_xive2_regs.h
pnv_xive2.c include/hw/ppc: Split pnv_chip.h off pnv.h 2023-01-20 07:25:10 +01:00
pnv_xive_regs.h
pnv_xive.c include/hw/ppc: Split pnv_chip.h off pnv.h 2023-01-20 07:25:10 +01:00
ppc-uic.c
realview_gic.c
riscv_aclint.c hw: intc: Use cpu_by_arch_id to fetch CPU state 2023-03-05 15:33:40 -08:00
riscv_aplic.c hw: intc: Use cpu_by_arch_id to fetch CPU state 2023-03-05 15:33:40 -08:00
riscv_imsic.c hw: intc: Use cpu_by_arch_id to fetch CPU state 2023-03-05 15:33:40 -08:00
rx_icu.c
s390_flic_kvm.c
s390_flic.c
sh_intc.c
sifive_plic.c hw/intc: sifive_plic: Fix the pending register range check 2023-01-06 10:42:55 +10:00
slavio_intctl.c
spapr_xive_kvm.c
spapr_xive.c
trace-events hw/intc: Extract the IRQ counting functions into a separate file 2023-01-13 16:22:57 +01:00
trace.h
vgic_common.h
xics_kvm.c
xics_pnv.c
xics_spapr.c
xics.c hw/intc/xics: Convert TYPE_ICS to 3-phase reset 2022-12-16 15:59:07 +00:00
xilinx_intc.c hw/intc/xilinx_intc: Use 'XpsIntc' typedef instead of 'struct xlx_pic' 2023-01-12 17:15:09 +00:00
xive2.c
xive.c
xlnx-pmu-iomod-intc.c
xlnx-zynqmp-ipi.c