44 Commits

Author SHA1 Message Date
Dexuan Cui
948373b3ed PCI: hv: Only queue new work items in hv_pci_devices_present() if necessary
If there is pending work in hv_pci_devices_present() we just need to add
the new dr entry into the dr_list. Add a check to detect pending work
items and update the code to skip queuing work if pending work items
are detected.

Signed-off-by: Dexuan Cui <decui@microsoft.com>
[lorenzo.pieralisi@arm.com: updated commit log]
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Acked-by: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Jack Morgenstein <jackm@mellanox.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: K. Y. Srinivasan <kys@microsoft.com>
2018-03-16 18:19:03 +00:00
Dexuan Cui
fca288c015 PCI: hv: Remove the bogus test in hv_eject_device_work()
When the kernel is executing hv_eject_device_work(), the hpdev->state value
must be hv_pcichild_ejecting; any other value would be a bug. Therefore,
replace the bogus check with an explicit WARN_ON() that fires if the
condition does not hold.

Signed-off-by: Dexuan Cui <decui@microsoft.com>
[lorenzo.pieralisi@arm.com: updated commit log]
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Acked-by: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Jack Morgenstein <jackm@mellanox.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: K. Y. Srinivasan <kys@microsoft.com>
2018-03-16 18:19:02 +00:00
Dexuan Cui
df3f2159f4 PCI: hv: Fix a comment typo in _hv_pcifront_read_config()
The comment in _hv_pcifront_read_config() contains a typo; fix it.

No functional change.

Signed-off-by: Dexuan Cui <decui@microsoft.com>
[lorenzo.pieralisi@arm.com: changed commit log]
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Acked-by: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: K. Y. Srinivasan <kys@microsoft.com>
2018-03-16 18:19:02 +00:00
Dexuan Cui
de0aa7b2f9 PCI: hv: Fix 2 hang issues in hv_compose_msi_msg()
1. With the patch "x86/vector/msi: Switch to global reservation mode",
v4.15 and newer kernels always hang in a 1-vCPU Hyper-V VM
with SR-IOV. This is because when we reach hv_compose_msi_msg() via
request_irq() -> request_threaded_irq() -> __setup_irq() -> irq_startup()
-> __irq_startup() -> irq_domain_activate_irq() -> ... ->
msi_domain_activate() -> ... -> hv_compose_msi_msg(), local irq is
disabled in __setup_irq().

Note: when we reach hv_compose_msi_msg() by another code path:
pci_enable_msix_range() -> ... -> irq_domain_activate_irq() -> ... ->
hv_compose_msi_msg(), local irq is not disabled.

hv_compose_msi_msg() depends on an interrupt from the host.
With interrupts disabled, a UP VM always hangs in the busy loop in
the function, because the interrupt callback hv_pci_onchannelcallback()
cannot be called.

The only option is to work around this by polling the channel. This
is ugly, but we don't have any other choice.

2. If the host is ejecting the VF device before we reach
hv_compose_msi_msg(), in a UP VM, we can hang in hv_compose_msi_msg()
forever, because at this time the host doesn't respond to the
CREATE_INTERRUPT request. This issue has existed since the first day the
pci-hyperv driver appeared in the kernel.

Luckily, this can also be worked around by polling the channel
for the PCI_EJECT message and hpdev->state, and by checking the
PCI vendor ID.

Note: the above 2 issues can also occur in an SMP VM, if
"hbus->hdev->channel->target_cpu == smp_processor_id()" is true.

Fixes: 4900be8360 ("x86/vector/msi: Switch to global reservation mode")
Tested-by: Adrian Suhov <v-adsuho@microsoft.com>
Tested-by: Chris Valean <v-chvale@microsoft.com>
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Acked-by: Haiyang Zhang <haiyangz@microsoft.com>
Cc: <stable@vger.kernel.org>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Jack Morgenstein <jackm@mellanox.com>
2018-03-16 18:19:01 +00:00
Dexuan Cui
021ad274d7 PCI: hv: Serialize the present and eject work items
When we hot-remove the device, we first receive a PCI_EJECT message and
then receive a PCI_BUS_RELATIONS message with bus_rel->device_count == 0.

The first message is offloaded to hv_eject_device_work(), and the second
is offloaded to pci_devices_present_work(). Because system_wq can run them
concurrently, both paths can execute list_del(&hpdev->list_entry) at the
same time, causing a general protection fault.

The patch eliminates the race condition.

Since access to present/eject work items is serialized, we do not need the
hbus->enum_sem anymore, so remove it.

Fixes: 4daace0d8c ("PCI: hv: Add paravirtual PCI front-end for Microsoft Hyper-V VMs")
Link: https://lkml.kernel.org/r/KL1P15301MB00064DA6B4D221123B5241CFBFD70@KL1P15301MB0006.APCP153.PROD.OUTLOOK.COM
Tested-by: Adrian Suhov <v-adsuho@microsoft.com>
Tested-by: Chris Valean <v-chvale@microsoft.com>
Signed-off-by: Dexuan Cui <decui@microsoft.com>
[lorenzo.pieralisi@arm.com: squashed semaphore removal patch]
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Acked-by: Haiyang Zhang <haiyangz@microsoft.com>
Cc: <stable@vger.kernel.org> # v4.6+
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Jack Morgenstein <jackm@mellanox.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: K. Y. Srinivasan <kys@microsoft.com>
2018-03-16 18:18:50 +00:00
Linus Torvalds
105cf3c8c6 pci-v4.16-changes
Merge tag 'pci-v4.16-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull PCI updates from Bjorn Helgaas:

 - skip AER driver error recovery callbacks for correctable errors
   reported via ACPI APEI, as we already do for errors reported via the
   native path (Tyler Baicar)

 - fix DPC shared interrupt handling (Alex Williamson)

 - print full DPC interrupt number (Keith Busch)

 - enable DPC only if AER is available (Keith Busch)

 - simplify DPC code (Bjorn Helgaas)

 - calculate ASPM L1 substate parameter instead of hardcoding it (Bjorn
   Helgaas)

 - enable Latency Tolerance Reporting for ASPM L1 substates (Bjorn
   Helgaas)

 - move ASPM internal interfaces out of public header (Bjorn Helgaas)

 - allow hot-removal of VGA devices (Mika Westerberg)

 - speed up unplug and shutdown by assuming Thunderbolt controllers
   don't support Command Completed events (Lukas Wunner)

 - add AtomicOps support for GPU and Infiniband drivers (Felix Kuehling,
   Jay Cornwall)

 - expose "ari_enabled" in sysfs to help NIC naming (Stuart Hayes)

 - clean up PCI DMA interface usage (Christoph Hellwig)

 - remove PCI pool API (replaced with DMA pool) (Romain Perier)

 - deprecate pci_get_bus_and_slot(), which assumed PCI domain 0 (Sinan
   Kaya)

 - move DT PCI code from drivers/of/ to drivers/pci/ (Rob Herring)

 - add PCI-specific wrappers for dev_info(), etc (Frederick Lawler)

 - remove warnings on sysfs mmap failure (Bjorn Helgaas)

 - quiet ROM validation messages (Alex Deucher)

 - remove redundant memory alloc failure messages (Markus Elfring)

 - fill in types for compile-time VGA and other I/O port resources
   (Bjorn Helgaas)

 - make "pci=pcie_scan_all" work for Root Ports as well as Downstream
   Ports to help AmigaOne X1000 (Bjorn Helgaas)

 - add SPDX tags to all PCI files (Bjorn Helgaas)

 - quirk Marvell 9128 DMA aliases (Alex Williamson)

 - quirk broken INTx disable on Ceton InfiniTV4 (Bjorn Helgaas)

 - fix CONFIG_PCI=n build by adding dummy pci_irqd_intx_xlate() (Niklas
   Cassel)

 - use DMA API to get MSI address for DesignWare IP (Niklas Cassel)

 - fix endpoint-mode DMA mask configuration (Kishon Vijay Abraham I)

 - fix ARTPEC-6 incorrect IS_ERR() usage (Wei Yongjun)

 - add support for ARTPEC-7 SoC (Niklas Cassel)

 - add endpoint-mode support for ARTPEC (Niklas Cassel)

 - add Cadence PCIe host and endpoint controller driver (Cyrille
   Pitchen)

 - handle multiple INTx status bits being set in dra7xx (Vignesh R)

 - translate dra7xx hwirq range to fix INTD handling (Vignesh R)

 - remove deprecated Exynos PHY initialization code (Jaehoon Chung)

 - fix MSI erratum workaround for HiSilicon Hip06/Hip07 (Dongdong Liu)

 - fix NULL pointer dereference in iProc BCMA driver (Ray Jui)

 - fix Keystone interrupt-controller-node lookup (Johan Hovold)

 - constify qcom driver structures (Julia Lawall)

 - rework Tegra config space mapping to increase space available for
   endpoints (Vidya Sagar)

 - simplify Tegra driver by using bus->sysdata (Manikanta Maddireddy)

 - remove PCI_REASSIGN_ALL_BUS usage on Tegra (Manikanta Maddireddy)

 - add support for Global Fabric Manager Server (GFMS) event to
   Microsemi Switchtec switch driver (Logan Gunthorpe)

 - add IDs for Switchtec PSX 24xG3 and PSX 48xG3 (Kelvin Cao)

* tag 'pci-v4.16-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (140 commits)
  PCI: cadence: Add EndPoint Controller driver for Cadence PCIe controller
  dt-bindings: PCI: cadence: Add DT bindings for Cadence PCIe endpoint controller
  PCI: endpoint: Fix EPF device name to support multi-function devices
  PCI: endpoint: Add the function number as argument to EPC ops
  PCI: cadence: Add host driver for Cadence PCIe controller
  dt-bindings: PCI: cadence: Add DT bindings for Cadence PCIe host controller
  PCI: Add vendor ID for Cadence
  PCI: Add generic function to probe PCI host controllers
  PCI: generic: fix missing call of pci_free_resource_list()
  PCI: OF: Add generic function to parse and allocate PCI resources
  PCI: Regroup all PCI related entries into drivers/pci/Makefile
  PCI/DPC: Reformat DPC register definitions
  PCI/DPC: Add and use DPC Status register field definitions
  PCI/DPC: Squash dpc_rp_pio_get_info() into dpc_process_rp_pio_error()
  PCI/DPC: Remove unnecessary RP PIO register structs
  PCI/DPC: Push dpc->rp_pio_status assignment into dpc_rp_pio_get_info()
  PCI/DPC: Squash dpc_rp_pio_print_error() into dpc_rp_pio_get_info()
  PCI/DPC: Make RP PIO log size check more generic
  PCI/DPC: Rename local "status" to "dpc_status"
  PCI/DPC: Squash dpc_rp_pio_print_tlp_header() into dpc_rp_pio_print_error()
  ...
2018-02-06 09:59:40 -08:00
Bjorn Helgaas
8cfab3cf63 PCI: Add SPDX GPL-2.0 to replace GPL v2 boilerplate
Add SPDX GPL-2.0 to all PCI files that specified the GPL version 2 license.

Remove the boilerplate GPL version 2 language, relying on the assertion in
b24413180f ("License cleanup: add SPDX GPL-2.0 license identifier to
files with no license") that the SPDX identifier may be used instead of the
full boilerplate text.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-28 15:48:29 -06:00
Linus Torvalds
f39d7d78b7 Merge branch 'x86/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
 "A couple of fixlets for x86:

   - Fix the ESPFIX double fault handling for 5-level pagetables

   - Fix the commandline parsing for 'apic=' on 32bit systems and update
     documentation

   - Make zombie stack traces reliable

   - Fix kexec with stack canary

   - Fix the delivery mode for APICs which was missed when the x86
     vector management was converted to single target delivery. Caused a
     regression due to the broken hardware which ignores affinity
     settings in lowest prio delivery mode.

   - Unbreak modules when AMD memory encryption is enabled

   - Remove an unused parameter of prepare_switch_to"

* 'x86/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/apic: Switch all APICs to Fixed delivery mode
  x86/apic: Update the 'apic=' description of setting APIC driver
  x86/apic: Avoid wrong warning when parsing 'apic=' in X86-32 case
  x86-32: Fix kexec with stack canary (CONFIG_CC_STACKPROTECTOR)
  x86: Remove unused parameter of prepare_switch_to
  x86/stacktrace: Make zombie stack traces reliable
  x86/mm: Unbreak modules that use the DMA API
  x86/build: Make isoimage work on Debian
  x86/espfix/64: Fix espfix double-fault handling on 5-level systems
2017-12-31 13:13:56 -08:00
Thomas Gleixner
a31e58e129 x86/apic: Switch all APICs to Fixed delivery mode
Some of the APIC incarnations are operating in lowest priority delivery
mode. This worked as long as the vector management code allocated the same
vector on all possible CPUs for each interrupt.

Lowest priority delivery mode does not necessarily respect the affinity
setting and may redirect to some other online CPU. This was documented
somewhere in the old code, but the conversion to single target delivery
failed to update the delivery mode of the affected APIC drivers, which
results in spurious interrupts on some of the affected CPU/Chipset
combinations.

Switch the APIC drivers over to Fixed delivery mode and remove all
leftovers of lowest priority delivery mode.

Switching to Fixed delivery mode is not a problem on these CPUs because the
kernel already uses Fixed delivery mode for IPIs. The reason for this is
that the SDM explicitly forbids lowest prio mode for IPIs. The reason is
obvious: If the irq routing does not honor destination targets in lowest
prio mode then an IPI targeted at CPU1 might end up on CPU0, which would be
a fatal problem in many cases.

As a consequence of this change, the apic::irq_delivery_mode field is now
pointless, but this needs to be cleaned up in a separate patch.

Fixes: fdba46ffb4 ("x86/apic: Get rid of multi CPU affinity")
Reported-by: vcaputo@pengaru.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: vcaputo@pengaru.com
Cc: Pavel Machek <pavel@ucw.cz>
Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1712281140440.1688@nanos
2017-12-29 14:20:48 +01:00
Dexuan Cui
79aa801e89 PCI: hv: Use effective affinity mask
The effective_affinity_mask is always set when an interrupt is assigned in
__assign_irq_vector() -> apic->cpu_mask_to_apicid(), e.g. for struct apic
apic_physflat: -> default_cpu_mask_to_apicid() ->
irq_data_update_effective_affinity(), but it looks like d->common->affinity
remains all-1's until user space or the kernel changes it later.

In the early allocation/initialization phase of an IRQ, we should use the
effective_affinity_mask, otherwise Hyper-V may not deliver the interrupt to
the expected CPU.  Without the patch, if we assign 7 Mellanox ConnectX-3
VFs to a 32-vCPU VM, one of the VFs may fail to receive interrupts.

Tested-by: Adrian Suhov <v-adsuho@microsoft.com>
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Jake Oshins <jakeo@microsoft.com>
Cc: stable@vger.kernel.org
Cc: Jork Loeser <jloeser@microsoft.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: K. Y. Srinivasan <kys@microsoft.com>
2017-11-07 18:06:39 -06:00
Linus Torvalds
0d519f2d1e pci-v4.14-changes
Merge tag 'pci-v4.14-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull PCI updates from Bjorn Helgaas:

 - add enhanced Downstream Port Containment support, which prints more
   details about Root Port Programmed I/O errors (Dongdong Liu)

 - add Layerscape ls1088a and ls2088a support (Hou Zhiqiang)

 - add MediaTek MT2712 and MT7622 support (Ryder Lee)

 - add MediaTek MT2712 and MT7622 MSI support (Honghui Zhang)

 - add Qualcomm IPQ8074 support (Varadarajan Narayanan)

 - add R-Car r8a7743/5 device tree support (Biju Das)

 - add Rockchip per-lane PHY support for better power management (Shawn
   Lin)

 - fix IRQ mapping for hot-added devices by replacing the
   pci_fixup_irqs() boot-time design with a host bridge hook called at
   probe-time (Lorenzo Pieralisi, Matthew Minter)

 - fix race when enabling two devices that results in upstream bridge
   not being enabled correctly (Srinath Mannam)

 - fix pciehp power fault infinite loop (Keith Busch)

 - fix SHPC bridge MSI hotplug events by enabling bus mastering
   (Aleksandr Bezzubikov)

 - fix a VFIO issue by correcting PCIe capability sizes (Alex
   Williamson)

 - fix an INTD issue on Xilinx and possibly other drivers by unifying
   INTx IRQ domain support (Paul Burton)

 - avoid IOMMU stalls by marking AMD Stoney GPU ATS as broken (Joerg
   Roedel)

 - allow APM X-Gene device assignment to guests by adding an ACS quirk
   (Feng Kan)

 - fix driver crashes by disabling Extended Tags on Broadcom HT2100
   (Extended Tags support is required for PCIe Receivers but not
   Requesters, and we now enable them by default when Requesters support
   them) (Sinan Kaya)

 - fix MSIs for devices that use phantom RIDs for DMA by assuming MSIs
   use the real Requester ID (not a phantom RID) (Robin Murphy)

 - prevent assignment of Intel VMD children to guests (which may be
   supported eventually, but isn't yet) by not associating an IOMMU with
   them (Jon Derrick)

 - fix Intel VMD suspend/resume by releasing IRQs on suspend (Scott
   Bauer)

 - fix a Function-Level Reset issue with Intel 750 NVMe by waiting
   longer (up to 60sec instead of 1sec) for device to become ready
   (Sinan Kaya)

 - fix a Function-Level Reset issue on iProc Stingray by working around
   hardware defects in the CRS implementation (Oza Pawandeep)

 - fix an issue with Intel NVMe P3700 after an iProc reset by adding a
   delay during shutdown (Oza Pawandeep)

 - fix a Microsoft Hyper-V lockdep issue by polling instead of blocking
   in compose_msi_msg() (Stephen Hemminger)

 - fix a wireless LAN driver timeout by clearing DesignWare MSI
   interrupt status after it is handled, not before (Faiz Abbas)

 - fix DesignWare ATU enable checking (Jisheng Zhang)

 - reduce Layerscape dependencies on the bootloader by doing more
   initialization in the driver (Hou Zhiqiang)

 - improve Intel VMD performance allowing allocation of more IRQ vectors
   than present CPUs (Keith Busch)

 - improve endpoint framework support for initial DMA mask, different
   BAR sizes, configurable page sizes, MSI, test driver, etc (Kishon
   Vijay Abraham I, Stan Drozd)

 - rework CRS support to add periodic messages while we poll during
   enumeration and after Function-Level Reset and prepare for possible
   other uses of CRS (Sinan Kaya)

 - clean up Root Port AER handling by removing unnecessary code and
   moving error handler methods to struct pcie_port_service_driver
   (Christoph Hellwig)

 - clean up error handling paths in various drivers (Bjorn Andersson,
   Fabio Estevam, Gustavo A. R. Silva, Harunobu Kurokawa, Jeffy Chen,
   Lorenzo Pieralisi, Sergei Shtylyov)

 - clean up SR-IOV resource handling by disabling VF decoding before
   updating the corresponding resource structs (Gavin Shan)

 - clean up DesignWare-based drivers by unifying quirks to update Class
   Code and Interrupt Pin and related handling of write-protected
   registers (Hou Zhiqiang)

 - clean up by adding empty generic pcibios_align_resource() and
   pcibios_fixup_bus() and removing empty arch-specific implementations
   (Palmer Dabbelt)

 - request exclusive reset control for several drivers to allow cleanup
   elsewhere (Philipp Zabel)

 - constify various structures (Arvind Yadav, Bhumika Goyal)

 - convert from full_name() to %pOF (Rob Herring)

 - remove unused variables from iProc, HiSi, Altera, Keystone (Shawn
   Lin)

* tag 'pci-v4.14-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (170 commits)
  PCI: xgene: Clean up whitespace
  PCI: xgene: Define XGENE_PCI_EXP_CAP and use generic PCI_EXP_RTCTL offset
  PCI: xgene: Fix platform_get_irq() error handling
  PCI: xilinx-nwl: Fix platform_get_irq() error handling
  PCI: rockchip: Fix platform_get_irq() error handling
  PCI: altera: Fix platform_get_irq() error handling
  PCI: spear13xx: Fix platform_get_irq() error handling
  PCI: artpec6: Fix platform_get_irq() error handling
  PCI: armada8k: Fix platform_get_irq() error handling
  PCI: dra7xx: Fix platform_get_irq() error handling
  PCI: exynos: Fix platform_get_irq() error handling
  PCI: iproc: Clean up whitespace
  PCI: iproc: Rename PCI_EXP_CAP to IPROC_PCI_EXP_CAP
  PCI: iproc: Add 500ms delay during device shutdown
  PCI: Fix typos and whitespace errors
  PCI: Remove unused "res" variable from pci_resource_io()
  PCI: Correct kernel-doc of pci_vpd_srdt_size(), pci_vpd_srdt_tag()
  PCI/AER: Reformat AER register definitions
  iommu/vt-d: Prevent VMD child devices from being remapping targets
  x86/PCI: Use is_vmd() rather than relying on the domain number
  ...
2017-09-08 15:47:43 -07:00
Vitaly Kuznetsov
7415aea607 hyper-v: Globalize vp_index
To support implementing remote TLB flushing on Hyper-V with a hypercall,
we need to make vp_index available outside of the vmbus module. Rename and
globalize it.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Reviewed-by: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Jork Loeser <Jork.Loeser@microsoft.com>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Simon Xiao <sixiao@microsoft.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: devel@linuxdriverproject.org
Link: http://lkml.kernel.org/r/20170802160921.21791-7-vkuznets@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-10 16:50:23 +02:00
Stephen Hemminger
80bfeeb9dd PCI: hv: Do not sleep in compose_msi_msg()
The setup of MSI with the Hyper-V host was sleeping with locks held.  This
error is reported when doing SR-IOV hotplug with a kernel built with lockdep:

    BUG: sleeping function called from invalid context at kernel/sched/completion.c:93
    in_atomic(): 1, irqs_disabled(): 1, pid: 1405, name: ip
    3 locks held by ip/1405:
   #0:  (rtnl_mutex){+.+.+.}, at: [<ffffffff976b10bb>] rtnetlink_rcv+0x1b/0x40
   #1:  (&desc->request_mutex){+.+...}, at: [<ffffffff970ddd33>] __setup_irq+0xb3/0x720
   #2:  (&irq_desc_lock_class){-.-...}, at: [<ffffffff970ddd65>] __setup_irq+0xe5/0x720
   irq event stamp: 3476
   hardirqs last  enabled at (3475): [<ffffffff971b3005>] get_page_from_freelist+0x225/0xc90
   hardirqs last disabled at (3476): [<ffffffff978024e7>] _raw_spin_lock_irqsave+0x27/0x90
   softirqs last  enabled at (2446): [<ffffffffc05ef0b0>] ixgbevf_configure+0x380/0x7c0 [ixgbevf]
   softirqs last disabled at (2444): [<ffffffffc05ef08d>] ixgbevf_configure+0x35d/0x7c0 [ixgbevf]

The workaround is to poll for host response instead of blocking on
completion.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2017-08-03 18:15:44 -05:00
Jork Loeser
7dcf90e9e0 PCI: hv: Use vPCI protocol version 1.2
Update the Hyper-V vPCI driver to use the Server-2016 version of the vPCI
protocol, fixing MSI creation and retargeting issues.

Signed-off-by: Jork Loeser <jloeser@microsoft.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Acked-by: K. Y. Srinivasan <kys@microsoft.com>
2017-07-02 18:43:09 -05:00
Jork Loeser
b1db7e7e1d PCI: hv: Add vPCI version protocol negotiation
Hyper-V vPCI offers different protocol versions.  Add the infra for
negotiating the one to use.

Signed-off-by: Jork Loeser <jloeser@microsoft.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Acked-by: K. Y. Srinivasan <kys@microsoft.com>
2017-07-02 18:43:09 -05:00
Jork Loeser
02c3764c77 PCI: hv: Temporary own CPU-number-to-vCPU-number infra
To ease the parallel effort to centralize CPU-number-to-vCPU-number
conversion, temporarily stand up our own file-local version,
hv_tmp_cpu_nr_to_vp_nr().  Once those changes have merged, this workaround
can be removed and the calls replaced with hv_cpu_number_to_vp_number().

Signed-off-by: Jork Loeser <jloeser@microsoft.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Acked-by: K. Y. Srinivasan <kys@microsoft.com>
2017-07-02 18:43:09 -05:00
Jork Loeser
be66b67365 PCI: hv: Use page allocation for hbus structure
The hv_pcibus_device structure contains an in-memory hypercall argument
that must not cross a page boundary.  Allocate the structure as a page to
ensure that.
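
A user-space sketch of why a page-sized, page-aligned allocation guarantees this. The 4096-byte page size, struct sketch_hbus and both helpers are assumptions for illustration, and C11 aligned_alloc() stands in for the kernel's page allocator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_PAGE_SIZE 4096u /* assumed page size */

/* Hypothetical stand-in for a bus structure with an embedded hypercall argument. */
struct sketch_hbus {
	char other_state[1000];
	struct { uint64_t words[8]; } hypercall_arg;
};

/* The scheme only works if the whole structure fits in one page. */
static_assert(sizeof(struct sketch_hbus) <= SKETCH_PAGE_SIZE,
	      "structure exceeds one page");

/* Returns 1 if the byte range [p, p + len) spans more than one page. */
static int crosses_page(const void *p, size_t len)
{
	uintptr_t first = (uintptr_t)p / SKETCH_PAGE_SIZE;
	uintptr_t last = ((uintptr_t)p + len - 1) / SKETCH_PAGE_SIZE;

	return first != last;
}

/* Give the structure a page-aligned page of its own; the caller frees it. */
static struct sketch_hbus *alloc_hbus_page(void)
{
	return aligned_alloc(SKETCH_PAGE_SIZE, SKETCH_PAGE_SIZE);
}
```

Because the structure starts at a page boundary and is no larger than a page, no member can straddle two pages, whatever its offset inside the structure.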

Signed-off-by: Jork Loeser <jloeser@microsoft.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Acked-by: K. Y. Srinivasan <kys@microsoft.com>
2017-07-02 18:43:08 -05:00
Jork Loeser
691ac1dc58 PCI: hv: Fix comment formatting and use proper integer fields
Fix comment formatting and use proper integer fields.

Signed-off-by: Jork Loeser <jloeser@microsoft.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Acked-by: K. Y. Srinivasan <kys@microsoft.com>
2017-07-02 18:43:08 -05:00
Elena Reshetova
24196f0c7d PCI: hv: Convert hv_pci_dev.refs from atomic_t to refcount_t
The refcount_t type and its corresponding API should be used instead of
atomic_t when the variable is used as a reference counter.  This helps avoid
accidental reference counter overflows that might lead to use-after-free
situations.
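
The difference can be illustrated with a small saturating counter. This is a simplified, non-atomic sketch with hypothetical names (struct sketch_refcount and its helpers), not the kernel's refcount_t implementation: it only shows the idea of pinning the counter at its maximum instead of letting it wrap to zero.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical saturating reference counter (single-threaded sketch). */
struct sketch_refcount {
	unsigned int refs;
};

/* Take a reference, but stay at UINT_MAX instead of wrapping to 0. */
static void sketch_ref_inc(struct sketch_refcount *r)
{
	if (r->refs != UINT_MAX)
		r->refs++;
}

/*
 * Drop a reference; returns 1 when the last reference is gone.
 * A saturated counter never reaches 0, so the object is leaked
 * rather than freed while references may still exist.
 */
static int sketch_ref_dec_and_test(struct sketch_refcount *r)
{
	assert(r->refs != 0);
	if (r->refs == UINT_MAX)
		return 0;
	return --r->refs == 0;
}
```

A plain unsigned counter at UINT_MAX wraps to 0 on the next increment, which would make a later decrement free an object that is still in use; the saturating version trades that for a leak.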

Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David Windsor <dwindsor@gmail.com>
Reviewed-by: Stephen Hemminger <sthemmin@microsoft.com>
2017-04-18 09:02:48 -05:00
K. Y. Srinivasan
59c58ceeea PCI: hv: Allocate interrupt descriptors with GFP_ATOMIC
The memory allocation here needs to be non-blocking.  Fix the issue.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Long Li <longli@microsoft.com>
Cc: <stable@vger.kernel.org>
2017-04-04 14:00:01 -05:00
K. Y. Srinivasan
433fcf6b7b PCI: hv: Specify CPU_AFFINITY_ALL for MSI affinity when >= 32 CPUs
When we have 32 or more CPUs in the affinity mask, we should use a special
constant to specify that to the host. Fix this issue.
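
A sketch of the rule, under assumptions: the host is taken to receive a 64-bit mask with one bit per virtual processor, and SKETCH_CPU_AFFINITY_ALL (all bits set) is a stand-in for the special constant. build_cpu_mask() is a hypothetical helper, not a driver function.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_CPU_AFFINITY_ALL UINT64_MAX /* assumed "all CPUs" sentinel */

/* Build the mask sent to the host from a list of virtual-processor indices. */
static uint64_t build_cpu_mask(const unsigned int *vp_index, unsigned int count)
{
	uint64_t mask = 0;
	unsigned int i;

	/* With 32 or more CPUs, send the special constant instead of a bitmap. */
	if (count >= 32)
		return SKETCH_CPU_AFFINITY_ALL;

	for (i = 0; i < count; i++) {
		assert(vp_index[i] < 64);
		mask |= UINT64_C(1) << vp_index[i];
	}
	return mask;
}
```

For example, the indices {0, 3} produce the mask 0x9, while any list of 32 or more entries produces the sentinel.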

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Long Li <longli@microsoft.com>
Cc: <stable@vger.kernel.org>
2017-04-04 13:58:20 -05:00
Long Li
414428c5da PCI: hv: Lock PCI bus on device eject
A PCI_EJECT message can arrive at the same time we are calling
pci_scan_child_bus() in the workqueue for the previous PCI_BUS_RELATIONS
message or in create_root_hv_pci_bus().  In this case we could potentially
modify the bus from multiple places.

Properly lock the bus access.

Thanks Dexuan Cui <decui@microsoft.com> for pointing out the race condition
in create_root_hv_pci_bus().

Reported-by: Xiaofeng Wang <xiaofwan@redhat.com>
Signed-off-by: Long Li <longli@microsoft.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: K. Y. Srinivasan <kys@microsoft.com>
2017-03-24 09:54:56 -05:00
Long Li
d3a78d8bf7 PCI: hv: Properly handle PCI bus remove
hv_pci_devices_present() is called in hv_pci_remove() when we remove a PCI
device from the host, e.g., by disabling SR-IOV on a device.  In
hv_pci_remove(), the bus is already removed before the call, so we don't
need to rescan the bus in the workqueue scheduled from
hv_pci_devices_present().

By introducing bus state hv_pcibus_removed, we can avoid this situation.

Reported-by: Xiaofeng Wang <xiaofwan@redhat.com>
Signed-off-by: Long Li <longli@microsoft.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: K. Y. Srinivasan <kys@microsoft.com>
2017-03-24 09:50:25 -05:00
Haiyang Zhang
4a9b0933bd PCI: hv: Use device serial number as PCI domain
Use the device serial number as the PCI domain.  The serial numbers start
with 1 and are unique within a VM.  So names that include the domain number,
such as VF NIC names, can be shorter than the previous names, which were
based on part of the bus UUID.  The new names will also stay the same for
VMs created with a copied VHD and the same number of devices.

Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
2017-02-17 13:53:29 -06:00
Dexuan Cui
60e2e2fbaf PCI: hv: Fix wslot_to_devfn() to fix warnings on device removal
The devfn of 00:02.0 is 0x10.  devfn_to_wslot(0x10) == 0x2, and
wslot_to_devfn(0x2) should be 0x10, while it's 0x2 in the current code.
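
The arithmetic can be checked with a standalone sketch. It assumes the Windows slot value keeps the device number in bits 0-4 and the function number in bits 5-7, which is the layout implied by devfn_to_wslot(0x10) == 0x2, and it defines the PCI_DEVFN/PCI_SLOT/PCI_FUNC macros locally with their usual Linux meaning. It is an illustration of the intended mapping, not the driver's code.

```c
#include <assert.h>
#include <stdint.h>

/* Local copies of the usual Linux devfn helpers (5-bit slot, 3-bit function). */
#define PCI_DEVFN(slot, func) ((((slot) & 0x1f) << 3) | ((func) & 0x07))
#define PCI_SLOT(devfn)       (((devfn) >> 3) & 0x1f)
#define PCI_FUNC(devfn)       ((devfn) & 0x07)

/* Assumed Windows slot layout: device in bits 0-4, function in bits 5-7. */
static uint32_t devfn_to_wslot(int devfn)
{
	assert(devfn >= 0 && devfn <= 0xff);
	return (uint32_t)(PCI_SLOT(devfn) | (PCI_FUNC(devfn) << 5));
}

/* Inverse mapping: unpack both fields, then rebuild a Linux devfn. */
static int wslot_to_devfn(uint32_t wslot)
{
	uint32_t dev = wslot & 0x1f;
	uint32_t func = (wslot >> 5) & 0x07;

	return (int)PCI_DEVFN(dev, func);
}
```

Returning the wslot value unchanged, as the buggy version effectively did, only matches the correct devfn for device 0, function 0; for 00:02.0 it yields 0x2 instead of 0x10.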

Due to this, hv_eject_device_work() -> pci_get_domain_bus_and_slot()
returns NULL and pci_stop_and_remove_bus_device() is not called.

Later, when the real device driver's .remove() is invoked by
hv_pci_remove() -> pci_stop_root_bus(), some warnings can be seen
because the VM has already lost access to the underlying device at that
time.

Signed-off-by: Jake Oshins <jakeo@microsoft.com>
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Haiyang Zhang <haiyangz@microsoft.com>
CC: stable@vger.kernel.org
CC: K. Y. Srinivasan <kys@microsoft.com>
CC: Stephen Hemminger <sthemmin@microsoft.com>
2017-02-10 15:18:46 -06:00
Long Li
0de8ce3ee8 PCI: hv: Allocate physically contiguous hypercall params buffer
hv_do_hypercall() assumes that we pass a segment from a physically
contiguous buffer.  A buffer allocated on the stack may not work if
CONFIG_VMAP_STACK=y is set.

Use kmalloc() to allocate this buffer.

Reported-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: Long Li <longli@microsoft.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: K. Y. Srinivasan <kys@microsoft.com>
2016-11-29 17:22:43 -06:00
Dexuan Cui
e74d2ebdda PCI: hv: Delete the device earlier from hbus->children for hot-remove
After we send a PCI_EJECTION_COMPLETE message to the host, the host will
immediately send us a PCI_BUS_RELATIONS message with
relations->device_count == 0, so pci_devices_present_work(), running on
another thread, can find the being-ejected device, set
hpdev->reported_missing to true, and run list_move_tail()/list_del() for
the device.  This races with hv_eject_device_work() -> list_del().

Move the list_del() in hv_eject_device_work() to an earlier place, i.e.,
before we send PCI_EJECTION_COMPLETE, so later the
pci_devices_present_work() can't see the device.

Signed-off-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Jake Oshins <jakeo@microsoft.com>
Acked-by: K. Y. Srinivasan <kys@microsoft.com>
CC: Haiyang Zhang <haiyangz@microsoft.com>
CC: Vitaly Kuznetsov <vkuznets@redhat.com>
2016-11-16 16:46:44 -06:00
Dexuan Cui
17978524a6 PCI: hv: Fix hv_pci_remove() for hot-remove
1. We don't really need such a big on-stack buffer when sending the
teardown_packet: vmbus_sendpacket() here only uses sizeof(struct
pci_message).

2. In the hot-remove case (PCI_EJECT), after we send PCI_EJECTION_COMPLETE
to the host, the host will send a RESCIND_CHANNEL message to us and the
host won't access the per-channel ringbuffer any longer, so we needn't send
PCI_RESOURCES_RELEASED/PCI_BUS_D0EXIT to the host, and we shouldn't expect
the host's completion message of PCI_BUS_D0EXIT, which will never come.

3. We should send PCI_BUS_D0EXIT after hv_send_resources_released().

Signed-off-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Jake Oshins <jakeo@microsoft.com>
Acked-by: K. Y. Srinivasan <kys@microsoft.com>
CC: Haiyang Zhang <haiyangz@microsoft.com>
CC: Vitaly Kuznetsov <vkuznets@redhat.com>
2016-11-16 16:45:32 -06:00
Dexuan Cui
8286e96d95 PCI: hv: Use the correct buffer size in new_pcichild_device()
We don't really need such a big on-stack buffer.  vmbus_sendpacket() here
only uses sizeof(struct pci_child_message).

Signed-off-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Jake Oshins <jakeo@microsoft.com>
2016-11-16 16:43:58 -06:00
Tobias Klauser
542ccf4551 PCI: hv: Make unnecessarily global IRQ masking functions static
Make hv_irq_mask() and hv_irq_unmask() static as they are only used in
pci-hyperv.c.

This fixes a sparse warning.

Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: K. Y. Srinivasan <kys@microsoft.com>
2016-10-31 13:22:42 -05:00
Dexuan Cui
a5b45b7b95 PCI: hv: Handle hv_pci_generic_compl() error case
'completion_status' is used in some places, e.g.,
hv_pci_protocol_negotiation(), so we should make sure it's initialized in
the error case too, though the error is unlikely here.

[bhelgaas: fix changelog typo and nearby whitespace]
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: KY Srinivasan <kys@microsoft.com>
CC: Jake Oshins <jakeo@microsoft.com>
CC: Haiyang Zhang <haiyangz@microsoft.com>
CC: Vitaly Kuznetsov <vkuznets@redhat.com>
2016-09-06 12:23:30 -05:00
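The idea in a5b45b7b95 (never leave the status uninitialized, even when the response is too short to contain one) can be sketched in user space as follows. The struct names, the `done` flag and the -1 failure value are hypothetical stand-ins, not the driver's actual types.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the driver's response and completion types. */
struct response {
	uint32_t type;
	int32_t status;
};

struct completion_ctx {
	int done;
	int32_t completion_status;
};

/* Copy the status only if the packet is long enough to contain it. */
static void generic_completion(struct completion_ctx *ctx,
			       const struct response *resp, size_t resp_size)
{
	if (resp_size >= offsetof(struct response, status) + sizeof(resp->status))
		ctx->completion_status = resp->status;
	else
		ctx->completion_status = -1;	/* short packet: report failure */
	ctx->done = 1;
}
```

A caller that later inspects `completion_status` then sees either the host's status or a definite failure value, never stale memory.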
Dexuan Cui
665e2245eb PCI: hv: Handle vmbus_sendpacket() failure in hv_compose_msi_msg()
Handle vmbus_sendpacket() failure in hv_compose_msi_msg().

I happened to find this when reading the code.  I didn't hit a real issue,
however.

Signed-off-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: KY Srinivasan <kys@microsoft.com>
CC: Jake Oshins <jakeo@microsoft.com>
CC: Haiyang Zhang <haiyangz@microsoft.com>
CC: Vitaly Kuznetsov <vkuznets@redhat.com>
2016-09-06 12:21:57 -05:00
Dexuan Cui
617ceb62ea PCI: hv: Remove the unused 'wrk' in struct hv_pcibus_device
Remove the unused 'wrk' member in struct hv_pcibus_device.

Signed-off-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: KY Srinivasan <kys@microsoft.com>
CC: Jake Oshins <jakeo@microsoft.com>
CC: Haiyang Zhang <haiyangz@microsoft.com>
CC: Vitaly Kuznetsov <vkuznets@redhat.com>
2016-09-06 12:21:23 -05:00
Dexuan Cui
7d0f8eec97 PCI: hv: Use pci_function_description[0] in struct definitions
The 2 structs can use a zero-length array here, because dynamic memory of
the correct size is allocated in hv_pci_devices_present() and we don't need
this extra element.

No functional change.

Signed-off-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: KY Srinivasan <kys@microsoft.com>
CC: Jake Oshins <jakeo@microsoft.com>
CC: Haiyang Zhang <haiyangz@microsoft.com>
CC: Vitaly Kuznetsov <vkuznets@redhat.com>
2016-09-06 12:20:44 -05:00
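The allocation pattern behind 7d0f8eec97 can be illustrated with a user-space sketch. It uses the C99 flexible array member `[]` rather than the GNU `[0]` form named in the subject, and the struct and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-function descriptor. */
struct func_desc {
	uint32_t win_slot;
};

/* Header followed by a variable number of descriptors. */
struct bus_relations {
	uint32_t device_count;
	struct func_desc func[];	/* C99 flexible array member */
};

/* Allocate exactly enough room for 'count' descriptors, with no spare element. */
static struct bus_relations *alloc_relations(uint32_t count)
{
	size_t size = offsetof(struct bus_relations, func) +
		      (size_t)count * sizeof(struct func_desc);
	struct bus_relations *rel = calloc(1, size);

	if (rel)
		rel->device_count = count;
	return rel;
}
```

Because the trailing array contributes no element of its own, the size computation is the header offset plus `count` descriptors.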
Dexuan Cui
0c6045d8c0 PCI: hv: Use zero-length array in struct pci_packet
Use zero-length array in struct pci_packet and rename struct pci_message's
field "message_type" to "type".  This makes the code more readable.

No functionality change.

Signed-off-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: KY Srinivasan <kys@microsoft.com>
CC: Jake Oshins <jakeo@microsoft.com>
CC: Haiyang Zhang <haiyangz@microsoft.com>
CC: Vitaly Kuznetsov <vkuznets@redhat.com>
2016-09-06 12:15:46 -05:00
Wei Yongjun
4f1cb01a78 PCI: hv: Use list_move_tail() instead of list_del() + list_add_tail()
Use list_move_tail() instead of list_del() + list_add_tail().  No
functional change intended.

Signed-off-by: Wei Yongjun <weiyj.lk@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2016-08-22 14:35:38 -05:00
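The equivalence that 4f1cb01a78 relies on can be seen in a minimal user-space re-implementation of a circular doubly linked list. The helpers below imitate the semantics of the kernel's list API but are simplified. `list_init()` is a hypothetical name; the kernel's initializer is INIT_LIST_HEAD().

```c
/* Minimal circular doubly linked list, modelled on the kernel's list_head. */
struct list_head {
	struct list_head *next, *prev;
};

static void list_init(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

/* Unlink an entry from whatever list it is currently on. */
static void list_del(struct list_head *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
}

/* Insert an entry just before 'head', i.e. at the tail of the list. */
static void list_add_tail(struct list_head *e, struct list_head *head)
{
	e->prev = head->prev;
	e->next = head;
	head->prev->next = e;
	head->prev = e;
}

/* One call that expresses "remove, then append to another list". */
static void list_move_tail(struct list_head *e, struct list_head *head)
{
	list_del(e);
	list_add_tail(e, head);
}
```

The single call states the intent directly and leaves no window in which a reader of the code might wonder whether the entry is meant to stay unlinked.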
Cathy Avery
0c6e617f65 PCI: hv: Fix interrupt cleanup path
SR-IOV disabled from the host causes a memory leak.  pci-hyperv usually
first receives a PCI_EJECT notification and then proceeds to delete the
hpdev list entry in hv_eject_device_work().  Later in hv_msi_free() since
the device is no longer on the device list hpdev is NULL and hv_msi_free
returns without freeing int_desc as part of hv_int_desc_free().

Signed-off-by: Cathy Avery <cavery@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Jake Oshins <jakeo@microsoft.com>
2016-07-25 12:33:36 -05:00
Vitaly Kuznetsov
837d741ea2 PCI: hv: Handle all pending messages in hv_pci_onchannelcallback()
When we have an interrupt from the host we have a bit set in event page
indicating there are messages for the particular channel.  We need to read
them all as we won't get signaled for what was on the queue before we
cleared the bit in vmbus_on_event().  This applies to all Hyper-V drivers
and the pass-through driver should do the same.

I did not hit any bugs; the issue was found by code inspection.  We don't
have many events going through hv_pci_onchannelcallback(), which explains
why nobody reported the issue before.

While at it, fix the handling of non-zero vmbus_recvpacket_raw() return
values by dropping out.  If the return value is not zero, it is wrong to
inspect buffer or bytes_recvd as these may contain invalid data.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Jake Oshins <jakeo@microsoft.com>
2016-06-17 12:45:30 -05:00
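The drain-until-empty pattern described in 837d741ea2 can be sketched in user space. The `struct channel` type and `channel_recv()` are hypothetical mocks standing in for a VMBus channel and its receive call; only the loop structure is the point.

```c
#include <stddef.h>

/* Hypothetical in-memory channel standing in for a VMBus ring buffer. */
struct channel {
	const int *msgs;
	size_t count;
	size_t next;
	int fail_at;	/* index at which recv reports an error, or -1 */
};

/* Returns 0 on success; *len is 0 when the channel is empty. */
static int channel_recv(struct channel *ch, int *out, size_t *len)
{
	if (ch->fail_at >= 0 && ch->next == (size_t)ch->fail_at)
		return -1;
	if (ch->next == ch->count) {
		*len = 0;
		return 0;
	}
	*out = ch->msgs[ch->next++];
	*len = sizeof(*out);
	return 0;
}

/* Handle every queued message per callback, not just the first one. */
static size_t on_channel_callback(struct channel *ch)
{
	size_t handled = 0;
	size_t len;
	int msg;

	for (;;) {
		if (channel_recv(ch, &msg, &len) != 0)
			break;		/* error: msg and len are not valid */
		if (len == 0)
			break;		/* queue drained */
		handled++;		/* a real handler would dispatch msg here */
	}
	return handled;
}
```

The error check comes before any look at `len` or `msg`, matching the message's point that the outputs must not be inspected after a non-zero return.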
Vitaly Kuznetsov
60fcdac813 PCI: hv: Don't leak buffer in hv_pci_onchannelcallback()
We don't free the buffer on several code paths in
hv_pci_onchannelcallback().  Put kfree() at the end of the function to fix
the issue.  Direct { kfree(); return; } can now be replaced with a simple
'break'.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Jake Oshins <jakeo@microsoft.com>
2016-06-17 12:45:30 -05:00
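The single-exit cleanup that 60fcdac813 describes can be sketched with malloc()/free() in user space. The function name and the 0xff "bad packet" marker are hypothetical; the point is that every path out of the loop reaches the one free() call.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical packet walker: returns 0 on success, -1 on a malformed
 * packet or allocation failure.  pkt_size must be greater than zero.
 */
static int process_packets(const unsigned char *in, size_t n_packets,
			   size_t pkt_size)
{
	unsigned char *buffer = malloc(pkt_size);
	int ret = 0;
	size_t i;

	if (!buffer)
		return -1;

	for (i = 0; i < n_packets; i++) {
		memcpy(buffer, in + i * pkt_size, pkt_size);
		if (buffer[0] == 0xff) {	/* hypothetical "bad packet" marker */
			ret = -1;
			break;			/* no early return, so no leak */
		}
	}

	free(buffer);	/* single exit point releases the buffer on every path */
	return ret;
}
```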
Linus Torvalds
5af2344013 Char / Misc driver update for 4.7-rc1
Here's the big char and misc driver update for 4.7-rc1.
 
 Lots of different tiny driver subsystems have updates here with new
 drivers and functionality.  Details in the shortlog.
 
 All have been in linux-next with no reported issues for a while.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Merge tag 'char-misc-4.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc

Pull char / misc driver updates from Greg KH:
 "Here's the big char and misc driver update for 4.7-rc1.

  Lots of different tiny driver subsystems have updates here with new
  drivers and functionality.  Details in the shortlog.

  All have been in linux-next with no reported issues for a while"

* tag 'char-misc-4.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (125 commits)
  mcb: Delete num_cells variable which is not required
  mcb: Fixed bar number assignment for the gdd
  mcb: Replace ioremap and request_region with the devm version
  mcb: Implement bus->dev.release callback
  mcb: export bus information via sysfs
  mcb: Correctly initialize the bus's device
  mei: bus: call mei_cl_read_start under device lock
  coresight: etb10: adjust read pointer only when needed
  coresight: configuring ETF in FIFO mode when acting as link
  coresight: tmc: implementing TMC-ETF AUX space API
  coresight: moving struct cs_buffers to header file
  coresight: tmc: keep track of memory width
  coresight: tmc: make sysFS and Perf mode mutually exclusive
  coresight: tmc: dump system memory content only when needed
  coresight: tmc: adding mode of operation for link/sinks
  coresight: tmc: getting rid of multiple read access
  coresight: tmc: allocating memory when needed
  coresight: tmc: making prepare/unprepare functions generic
  coresight: tmc: splitting driver in ETB/ETF and ETR components
  coresight: tmc: cleaning up header file
  ...
2016-05-20 21:20:31 -07:00
Vitaly Kuznetsov
bdd74440d9 PCI: hv: Add explicit barriers to config space access
I'm trying to pass-through Broadcom BCM5720 NIC (Dell device 1f5b) on a
Dell R720 server.  Everything works fine when the target VM has only one
CPU, but SMP guests reboot when the NIC driver accesses PCI config space
with hv_pcifront_read_config()/hv_pcifront_write_config().  The reboot
appears to be induced by the hypervisor and no crash is observed.  Windows
event logs are not helpful at all ('Virtual machine ... has quit
unexpectedly').  The particular access point is always different and
putting debug between them (printk/mdelay/...) moves the issue further
away.  The server model affects the issue as well: on Dell R420 I'm able to
pass-through BCM5720 NIC to SMP guests without issues.

While I'm obviously failing to reveal the essence of the issue I was able
to come up with a (possible) solution: if explicit barriers are added to
hv_pcifront_read_config()/hv_pcifront_write_config() the issue goes away.
The essential minimum is rmb() at the end of _hv_pcifront_read_config() and
wmb() at the end of _hv_pcifront_write_config() but I'm not confident it
will be sufficient for all hardware.  I suggest the following barriers:

1) wmb()/mb() between choosing the function and writing to its space.
2) mb() before releasing the spinlock in both _hv_pcifront_read_config()/
   _hv_pcifront_write_config() to ensure that consecutive reads/writes to
   the space won't get re-ordered as drivers may count on that.

Config space access is not supposed to be performance-critical so these
explicit barriers should not cause any slowdown.

[bhelgaas: use Linux "barriers" terminology]
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Jake Oshins <jakeo@microsoft.com>
2016-05-04 17:03:41 -05:00
Vitaly Kuznetsov
deb22e5c84 PCI: hv: Report resources release after stopping the bus
A kernel hang is observed when the pci-hyperv module is removed with device
drivers still attached.  E.g., when I do 'rmmod pci_hyperv' with a
passed-through BCM5720 device (tg3 module) I see the following:

 NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [rmmod:2104]
 ...
 Call Trace:
  [<ffffffffa0641487>] tg3_read_mem+0x87/0x100 [tg3]
  [<ffffffffa063f000>] ? 0xffffffffa063f000
  [<ffffffffa0644375>] tg3_poll_fw+0x85/0x150 [tg3]
  [<ffffffffa0649877>] tg3_chip_reset+0x357/0x8c0 [tg3]
  [<ffffffffa064ca8b>] tg3_halt+0x3b/0x190 [tg3]
  [<ffffffffa0657611>] tg3_stop+0x171/0x230 [tg3]
  ...
  [<ffffffffa064c550>] tg3_remove_one+0x90/0x140 [tg3]
  [<ffffffff813bee59>] pci_device_remove+0x39/0xc0
  [<ffffffff814a3201>] __device_release_driver+0xa1/0x160
  [<ffffffff814a32e3>] device_release_driver+0x23/0x30
  [<ffffffff813b794a>] pci_stop_bus_device+0x8a/0xa0
  [<ffffffff813b7ab6>] pci_stop_root_bus+0x36/0x60
  [<ffffffffa02c3f38>] hv_pci_remove+0x238/0x260 [pci_hyperv]

The problem seems to be that we report the release of local resources
before stopping the bus and removing devices from it, while device drivers
may still try to use these resources on shutdown.  Move the resource
release report after pci_stop_root_bus().

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Jake Oshins <jakeo@microsoft.com>
2016-05-02 15:40:42 -05:00
Jake Oshins
696ca5e82c drivers:hv: Use new vmbus_mmio_free() from client drivers.
This patch modifies all the callers of vmbus_mmio_allocate()
to call vmbus_mmio_free() instead of release_mem_region().

Signed-off-by: Jake Oshins <jakeo@microsoft.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-04-30 14:01:37 -07:00
Jake Oshins
4daace0d8c PCI: hv: Add paravirtual PCI front-end for Microsoft Hyper-V VMs
Add a new driver which exposes a root PCI bus whenever a PCI Express device
is passed through to a guest VM under Hyper-V.  The device can be single-
or multi-function.  The interrupts for the devices are managed by an IRQ
domain, implemented within the driver.

[bhelgaas: fold in race condition fix (http://lkml.kernel.org/r/1456340196-13717-1-git-send-email-jakeo@microsoft.com)]
Signed-off-by: Jake Oshins <jakeo@microsoft.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2016-02-16 16:56:12 -06:00