Commit Graph

9 Commits

Author SHA1 Message Date
Luck, Tony 3482fb5e0c ACPI, APEI, EINJ: Changes to the ACPI/APEI/EINJ debugfs interface
When I added support for ACPI5 I made the assumption that
injected processor errors would just need to know the APICID,
memory errors just the address and mask, and PCIe errors just the
segment/bus/device/function. So I had the code check the type of injection
and multiplex the "param1" value appropriately.

This was not a good assumption :-(

There are injection scenarios where we need to specify more than one of
these items. E.g. injecting a cache error we need to specify an APICID
of the cpu that owns the cache, and also an address (so that we can trip
the error by accessing the address).

Add a "flags" file to give the user direct access to specify which items
are valid in the ACPI SET_ERROR_TYPE_WITH_ADDRESS structure. Also add
new files param3 and param4 to hold all these values.

For backwards compatability with old injection scripts we maintain the
old behaviour if flags remains set at zero (or is reset to 0).

Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2013-12-17 16:04:22 -08:00
Chen Gong ace3647afb ACPI/APEI: Update einj documentation for param1/param2
To ensure EINJ working well when injecting errors via EINJ
table, add some restrictions: param1 must be a valid physical
RAM address and param2 must specify page granularity or
narrower.

Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2013-06-06 15:28:11 -07:00
Chen Gong 6ef19ab7fa Update documentation for parameter *notrigger* in einj.txt
Add description of parameter notrigger in the einj.txt.
One can utilize this new parameter to do some SRAR injection
test. Pay attention, the operation is highly depended on the
BIOS implementation. If no proper BIOS supports it, even if
enabling this parameter, expected result will not happen.

v2:
  Update the documentation suggested by Tony

Suggested-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2012-03-30 03:30:19 -04:00
Tony Luck c130bd6f82 acpi/apei/einj: Add extensions to EINJ from rev 5.0 of acpi spec
ACPI 5.0 provides extensions to the EINJ mechanism to specify the
target for the error injection - by APICID for cpu related errors,
by address for memory related errors, and by segment/bus/device/function
for PCIe related errors. Also extensions for vendor specific error
injections.

Tested-by: Chen Gong <gong.chen@linux.intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2012-01-18 01:14:17 -05:00
Huang Ying c3e6088e10 ACPI, APEI, EINJ Param support is disabled by default
EINJ parameter support is only usable for some specific BIOS.
Originally, it is expected to have no harm for BIOS does not support
it.  But now, we found it will cause issue (memory overwriting) for
some BIOS.  So param support is disabled by default and only enabled
when newly added module parameter named "param_extension" is
explicitly specified.

Signed-off-by: Huang Ying <ying.huang@intel.com>
Cc: Matthew Garrett <mjg@redhat.com>
Acked-by: Don Zickus <dzickus@redhat.com>
Acked-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2011-08-03 11:15:59 -04:00
Huang Ying c413d76820 ACPI, APEI, Add PCIe AER error information printing support
The AER error information printing support is implemented in
drivers/pci/pcie/aer/aer_print.c.  So some string constants, functions
and macros definitions can be re-used without being exported.

The original PCIe AER error information printing function is not
re-used directly because the overall format is quite different.  And
changing the original printing format may make some original users'
scripts broken.

Signed-off-by: Huang Ying <ying.huang@intel.com>
CC: Jesse Barnes <jbarnes@virtuousgeek.org>
CC: Zhang Yanmin <yanmin.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2011-03-21 22:59:08 -04:00
Huang Ying f59c55d04b ACPI, APEI, Add APEI generic error status printing support
In APEI, Hardware error information reported by firmware to Linux
kernel is in the data structure of APEI generic error status (struct
acpi_hes_generic_status).  While now printk is used by Linux kernel to
report hardware error information to user space.

So, this patch adds printing support for the data structure, so that
the corresponding hardware error information can be reported to user
space via printk.

PCIe AER information printing is not implemented yet.  Will refactor the
original PCIe AER information printing code to avoid code duplicating.

The output format is as follow:

<error record> :=
APEI generic hardware error status
severity: <integer>, <severity string>
section: <integer>, severity: <integer>, <severity string>
flags: <integer>
<section flags strings>
fru_id: <uuid string>
fru_text: <string>
section_type: <section type string>
<section data>

<severity string>* := recoverable | fatal | corrected | info

<section flags strings># :=
[primary][, containment warning][, reset][, threshold exceeded]\
[, resource not accessible][, latent error]

<section type string> := generic processor error | memory error | \
PCIe error | unknown, <uuid string>

<section data> :=
<generic processor section data> | <memory section data> | \
<pcie section data> | <null>

<generic processor section data> :=
[processor_type: <integer>, <proc type string>]
[processor_isa: <integer>, <proc isa string>]
[error_type: <integer>
<proc error type strings>]
[operation: <integer>, <proc operation string>]
[flags: <integer>
<proc flags strings>]
[level: <integer>]
[version_info: <integer>]
[processor_id: <integer>]
[target_address: <integer>]
[requestor_id: <integer>]
[responder_id: <integer>]
[IP: <integer>]

<proc type string>* := IA32/X64 | IA64

<proc isa string>* := IA32 | IA64 | X64

<processor error type strings># :=
[cache error][, TLB error][, bus error][, micro-architectural error]

<proc operation string>* := unknown or generic | data read | data write | \
instruction execution

<proc flags strings># :=
[restartable][, precise IP][, overflow][, corrected]

<memory section data> :=
[error_status: <integer>]
[physical_address: <integer>]
[physical_address_mask: <integer>]
[node: <integer>]
[card: <integer>]
[module: <integer>]
[bank: <integer>]
[device: <integer>]
[row: <integer>]
[column: <integer>]
[bit_position: <integer>]
[requestor_id: <integer>]
[responder_id: <integer>]
[target_id: <integer>]
[error_type: <integer>, <mem error type string>]

<mem error type string>* :=
unknown | no error | single-bit ECC | multi-bit ECC | \
single-symbol chipkill ECC | multi-symbol chipkill ECC | master abort | \
target abort | parity error | watchdog timeout | invalid address | \
mirror Broken | memory sparing | scrub corrected error | \
scrub uncorrected error

<pcie section data> :=
[port_type: <integer>, <pcie port type string>]
[version: <integer>.<integer>]
[command: <integer>, status: <integer>]
[device_id: <integer>:<integer>:<integer>.<integer>
slot: <integer>
secondary_bus: <integer>
vendor_id: <integer>, device_id: <integer>
class_code: <integer>]
[serial number: <integer>, <integer>]
[bridge: secondary_status: <integer>, control: <integer>]

<pcie port type string>* := PCIe end point | legacy PCI end point | \
unknown | unknown | root port | upstream switch port | \
downstream switch port | PCIe to PCI/PCI-X bridge | \
PCI/PCI-X to PCIe bridge | root complex integrated endpoint device | \
root complex event collector

Where, [] designate corresponding content is optional

All <field string> description with * has the following format:

field: <integer>, <field string>

Where value of <integer> should be the position of "string" in <field
string> description. Otherwise, <field string> will be "unknown".

All <field strings> description with # has the following format:

field: <integer>
<field strings>

Where each string in <fields strings> corresponding to one set bit of
<integer>. The bit position is the position of "string" in <field
strings> description.

For more detailed explanation of every field, please refer to UEFI
specification version 2.3 or later, section Appendix N: Common
Platform Error Record.

Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2010-12-13 23:42:12 -05:00
Huang Ying 6e320ec1d9 ACPI, APEI, EINJ injection parameters support
Some hardware error injection needs parameters, for example, it is
useful to specify memory address and memory address mask for memory
errors.

Some BIOSes allow parameters to be specified via an unpublished
extension. This patch adds support to it. The parameters will be
ignored on machines without necessary BIOS support.

Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2010-05-19 22:42:08 -04:00
Huang Ying ea8c071cad ACPI, APEI, Document for APEI
Add document for APEI, including kernel parameters and EINJ debug file
sytem interface.

Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2010-05-19 22:39:49 -04:00