From d0117a0e2780f7803fe55d543ab119416d7582e6 Mon Sep 17 00:00:00 2001 From: Len Brown Date: Sat, 25 Feb 2017 18:18:22 -0500 Subject: [PATCH 01/69] x86: msr-index.h: define EPB mid-points These are currently open-coded into intel_pstate.c Signed-off-by: Len Brown --- arch/x86/include/asm/msr-index.h | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h index 710273c617b8..a92d9bd154f6 100644 --- a/arch/x86/include/asm/msr-index.h +++ b/arch/x86/include/asm/msr-index.h @@ -462,9 +462,11 @@ #define MSR_MISC_PWR_MGMT 0x000001aa #define MSR_IA32_ENERGY_PERF_BIAS 0x000001b0 -#define ENERGY_PERF_BIAS_PERFORMANCE 0 -#define ENERGY_PERF_BIAS_NORMAL 6 -#define ENERGY_PERF_BIAS_POWERSAVE 15 +#define ENERGY_PERF_BIAS_PERFORMANCE 0 +#define ENERGY_PERF_BIAS_BALANCE_PERFORMANCE 4 +#define ENERGY_PERF_BIAS_NORMAL 6 +#define ENERGY_PERF_BIAS_BALANCE_POWERSAVE 8 +#define ENERGY_PERF_BIAS_POWERSAVE 15 #define MSR_IA32_PACKAGE_THERM_STATUS 0x000001b1 From 8d84e906f5db80540510e448226f2718a686eb2a Mon Sep 17 00:00:00 2001 From: Len Brown Date: Sat, 25 Feb 2017 11:56:29 -0500 Subject: [PATCH 02/69] x86: msr-index.h: define HWP.EPP values The Hardware Performance State request MSR has a field to express the "Energy Performance Preference" (HWP.EPP). Decode that field so the definition may be shared by by the intel_pstate driver and any utilities that decode the same register. Signed-off-by: Len Brown --- arch/x86/include/asm/msr-index.h | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h index a92d9bd154f6..50c0c3204a92 100644 --- a/arch/x86/include/asm/msr-index.h +++ b/arch/x86/include/asm/msr-index.h @@ -239,6 +239,10 @@ #define HWP_MAX_PERF(x) ((x & 0xff) << 8) #define HWP_DESIRED_PERF(x) ((x & 0xff) << 16) #define HWP_ENERGY_PERF_PREFERENCE(x) ((x & 0xff) << 24) +#define HWP_EPP_PERFORMANCE 0x00 +#define HWP_EPP_BALANCE_PERFORMANCE 0x80 +#define HWP_EPP_BALANCE_POWERSAVE 0xC0 +#define HWP_EPP_POWERSAVE 0xFF #define HWP_ACTIVITY_WINDOW(x) ((x & 0xff3) << 32) #define HWP_PACKAGE_CONTROL(x) ((x & 0x1) << 42) From 2fc49cb0b508947bf048ecb0f5710169e62ce68e Mon Sep 17 00:00:00 2001 From: Len Brown Date: Sat, 29 Apr 2017 00:11:46 -0400 Subject: [PATCH 03/69] x86: msr-index.h: fix shifts to ULL results in HWP macros. x = 1 ulong_long = x << 32; results in: warning: left shift count >= width of type x = 8 ulong_long = x << 24; results in a sign extended ulong_long Cast x to unsigned long long in these macros to prevent these errors. Signed-off-by: Len Brown --- arch/x86/include/asm/msr-index.h | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h index 50c0c3204a92..6da2b30781ff 100644 --- a/arch/x86/include/asm/msr-index.h +++ b/arch/x86/include/asm/msr-index.h @@ -238,13 +238,13 @@ #define HWP_MIN_PERF(x) (x & 0xff) #define HWP_MAX_PERF(x) ((x & 0xff) << 8) #define HWP_DESIRED_PERF(x) ((x & 0xff) << 16) -#define HWP_ENERGY_PERF_PREFERENCE(x) ((x & 0xff) << 24) +#define HWP_ENERGY_PERF_PREFERENCE(x) (((unsigned long long) x & 0xff) << 24) #define HWP_EPP_PERFORMANCE 0x00 #define HWP_EPP_BALANCE_PERFORMANCE 0x80 #define HWP_EPP_BALANCE_POWERSAVE 0xC0 #define HWP_EPP_POWERSAVE 0xFF -#define HWP_ACTIVITY_WINDOW(x) ((x & 0xff3) << 32) -#define HWP_PACKAGE_CONTROL(x) ((x & 0x1) << 42) +#define HWP_ACTIVITY_WINDOW(x) ((unsigned long long)(x & 0xff3) << 32) +#define HWP_PACKAGE_CONTROL(x) ((unsigned long long)(x & 0x1) << 42) /* IA32_HWP_STATUS */ #define HWP_GUARANTEED_CHANGE(x) (x & 0x1) From 4beec1d7519691b4b6c6b764e75b4e694a09c5f7 Mon Sep 17 00:00:00 2001 From: Len Brown Date: Thu, 10 Dec 2015 01:28:19 -0500 Subject: [PATCH 04/69] tools/power x86_energy_perf_policy: support HWP.EPP x86_energy_perf_policy(8) was created as an example of how the user, or upper-level OS, can manage MSR_IA32_ENERGY_PERF_BIAS (EPB). Hardware consults EPB when it makes internal decisions balancing energy-saving vs performance. For example, should HW quickly or slowly transition into and out of power-saving idles states? Should HW quickly or slowly ramp frequency up or down in response to demand in the turbo-frequency range? Depending on the processor, EPB may have package, core, or CPU thread scope. As such, the only general policy is to write the same value to EPB on every CPU in the system. Recent platforms add support for Hardware Performance States (HWP). HWP effectively extends hardware frequency control from the opportunistic turbo-frequency range to control the entire range of available processor frequencies. Just as turbo-mode used EPB, HWP can use EPB to help decicde how quickly to ramp frequency and voltage up and down in response to changing demand. Indeed, BDX and BDX-DE, the first processors to support HWP, use EPB for this purpose. Starting in SKL, HWP no longer looks to EPB for influence. Instead, it looks in a new MSR specifically for this purpose: IA32_HWP_REQUEST.Energy_Performance_Preference (HWP.EPP). HWP.EPP is like EPB, except that it is specific to HWP-mode frequency selection. Also, HWP.EPP is defined to have per CPU-thread scope. Starting in SKX, IA32_HWP_REQUEST is augmented by IA32_HWP_REQUEST_PKG -- which has the same function, but is defined to have package-wide scope. A new bit in IA32_HWP_REQUEST determines if it over-rides the IA32_HWP_REQUEST_PKG or not. Note that HWP-mode can be enabled in several ways. The "in-band" method is for HWP to be exposed in CPUID, and for the Linux intel_pstate driver to recognized that, and thus enable HWP. In this case, starting in Linux 4.10, intel_pstate exports cpufreq sysfs attribute "energy_performance_preference" which can be used to manage HWP.EPP. This interface can be used to set HWP.EPP to these values: 0 performance 128 balance_performance (default) 192 balance_power 255 power Here, x86_energy_performance_policy is updated to use idential strings and values as intel_pstate. But HWP-mode may also be enabled by firmware before the OS boots, and the OS may not be aware of HWP. In this case, intel_pstate is not available to provide sysfs attributes, and x86_energy_perf_policy or a similar utility is invaluable for managing HWP.EPP, for this utility works the same, no matter if cpufreq is enabled or not. Signed-off-by: Len Brown --- .../power/x86/x86_energy_perf_policy/Makefile | 27 +- .../x86_energy_perf_policy.8 | 243 ++- .../x86_energy_perf_policy.c | 1580 ++++++++++++++--- 3 files changed, 1539 insertions(+), 311 deletions(-) diff --git a/tools/power/x86/x86_energy_perf_policy/Makefile b/tools/power/x86/x86_energy_perf_policy/Makefile index 971c9ffdcb50..a711eec0c895 100644 --- a/tools/power/x86/x86_energy_perf_policy/Makefile +++ b/tools/power/x86/x86_energy_perf_policy/Makefile @@ -1,10 +1,27 @@ -DESTDIR ?= +CC = $(CROSS_COMPILE)gcc +BUILD_OUTPUT := $(CURDIR) +PREFIX := /usr +DESTDIR := + +ifeq ("$(origin O)", "command line") + BUILD_OUTPUT := $(O) +endif x86_energy_perf_policy : x86_energy_perf_policy.c +CFLAGS += -Wall +CFLAGS += -DMSRHEADER='"../../../../arch/x86/include/asm/msr-index.h"' +%: %.c + @mkdir -p $(BUILD_OUTPUT) + $(CC) $(CFLAGS) $< -o $(BUILD_OUTPUT)/$@ + +.PHONY : clean clean : - rm -f x86_energy_perf_policy + @rm -f $(BUILD_OUTPUT)/x86_energy_perf_policy + +install : x86_energy_perf_policy + install -d $(DESTDIR)$(PREFIX)/bin + install $(BUILD_OUTPUT)/x86_energy_perf_policy $(DESTDIR)$(PREFIX)/bin/x86_energy_perf_policy + install -d $(DESTDIR)$(PREFIX)/share/man/man8 + install x86_energy_perf_policy.8 $(DESTDIR)$(PREFIX)/share/man/man8 -install : - install x86_energy_perf_policy ${DESTDIR}/usr/bin/ - install x86_energy_perf_policy.8 ${DESTDIR}/usr/share/man/man8/ diff --git a/tools/power/x86/x86_energy_perf_policy/x86_energy_perf_policy.8 b/tools/power/x86/x86_energy_perf_policy/x86_energy_perf_policy.8 index 8eaaad648cdb..17db1c3af4d0 100644 --- a/tools/power/x86/x86_energy_perf_policy/x86_energy_perf_policy.8 +++ b/tools/power/x86/x86_energy_perf_policy/x86_energy_perf_policy.8 @@ -1,104 +1,213 @@ -.\" This page Copyright (C) 2010 Len Brown +.\" This page Copyright (C) 2010 - 2015 Len Brown .\" Distributed under the GPL, Copyleft 1994. .TH X86_ENERGY_PERF_POLICY 8 .SH NAME -x86_energy_perf_policy \- read or write MSR_IA32_ENERGY_PERF_BIAS +x86_energy_perf_policy \- Manage Energy vs. Performance Policy via x86 Model Specific Registers .SH SYNOPSIS -.ft B .B x86_energy_perf_policy -.RB [ "\-c cpu" ] -.RB [ "\-v" ] -.RB "\-r" +.RB "[ options ] [ scope ] [field \ value]" .br -.B x86_energy_perf_policy -.RB [ "\-c cpu" ] -.RB [ "\-v" ] -.RB 'performance' +.RB "scope: \-\-cpu\ cpu-list | \-\-pkg\ pkg-list" .br -.B x86_energy_perf_policy -.RB [ "\-c cpu" ] -.RB [ "\-v" ] -.RB 'normal' +.RB "cpu-list, pkg-list: # | #,# | #-# | all" .br -.B x86_energy_perf_policy -.RB [ "\-c cpu" ] -.RB [ "\-v" ] -.RB 'powersave' +.RB "field: \-\-all | \-\-epb | \-\-hwp-epp | \-\-hwp-min | \-\-hwp-max | \-\-hwp-desired" .br -.B x86_energy_perf_policy -.RB [ "\-c cpu" ] -.RB [ "\-v" ] -.RB n +.RB "other: (\-\-force | \-\-hwp-enable | \-\-turbo-enable) value)" .br +.RB "value: # | default | performance | balance-performance | balance-power | power" .SH DESCRIPTION \fBx86_energy_perf_policy\fP -allows software to convey -its policy for the relative importance of performance -versus energy savings to the processor. +displays and updates energy-performance policy settings specific to +Intel Architecture Processors. Settings are accessed via Model Specific Register (MSR) +updates, no matter if the Linux cpufreq sub-system is enabled or not. -The processor uses this information in model-specific ways -when it must select trade-offs between performance and -energy efficiency. +Policy in MSR_IA32_ENERGY_PERF_BIAS (EPB) +may affect a wide range of hardware decisions, +such as how aggressively the hardware enters and exits CPU idle states (C-states) +and Processor Performance States (P-states). +This policy hint does not replace explicit OS C-state and P-state selection. +Rather, it tells the hardware how aggressively to implement those selections. +Further, it allows the OS to influence energy/performance trade-offs where there +is no software interface, such as in the opportunistic "turbo-mode" P-state range. +Note that MSR_IA32_ENERGY_PERF_BIAS is defined per CPU, +but some implementations +share a single MSR among all CPUs in each processor package. +On those systems, a write to EPB on one processor will +be visible, and will have an effect, on all CPUs +in the same processor package. -This policy hint does not supersede Processor Performance states -(P-states) or CPU Idle power states (C-states), but allows -software to have influence where it would otherwise be unable -to express a preference. +Hardware P-States (HWP) are effectively an expansion of hardware +P-state control from the opportunistic turbo-mode P-state range +to include the entire range of available P-states. +On Broadwell Xeon, the initial HWP implementation, EBP influenced HWP. +That influence was removed in subsequent generations, +where it was moved to the +Energy_Performance_Preference (EPP) field in +a pair of dedicated MSRs -- MSR_IA32_HWP_REQUEST and MSR_IA32_HWP_REQUEST_PKG. -For example, this setting may tell the hardware how -aggressively or conservatively to control frequency -in the "turbo range" above the explicitly OS-controlled -P-state frequency range. It may also tell the hardware -how aggressively is should enter the OS requested C-states. +EPP is the most commonly managed knob in HWP mode, +but MSR_IA32_HWP_REQUEST also allows the user to specify +minimum-frequency for Quality-of-Service, +and maximum-frequency for power-capping. +MSR_IA32_HWP_REQUEST is defined per-CPU. -Support for this feature is indicated by CPUID.06H.ECX.bit3 -per the Intel Architectures Software Developer's Manual. +MSR_IA32_HWP_REQUEST_PKG has the same capability as MSR_IA32_HWP_REQUEST, +but it can simultaneously set the default policy for all CPUs within a package. +A bit in per-CPU MSR_IA32_HWP_REQUEST indicates whether it is +over-ruled-by or exempt-from MSR_IA32_HWP_REQUEST_PKG. -.SS Options -\fB-c\fP limits operation to a single CPU. -The default is to operate on all CPUs. -Note that MSR_IA32_ENERGY_PERF_BIAS is defined per -logical processor, but that the initial implementations -of the MSR were shared among all processors in each package. +MSR_HWP_CAPABILITIES shows the default values for the fields +in MSR_IA32_HWP_REQUEST. It is displayed when no values +are being written. + +.SS SCOPE OPTIONS .PP -\fB-v\fP increases verbosity. By default -x86_energy_perf_policy is silent. +\fB-c, --cpu\fP Operate on the MSR_IA32_HWP_REQUEST for each CPU in a CPU-list. +The CPU-list may be comma-separated CPU numbers, with dash for range +or the string "all". Eg. '--cpu 1,4,6-8' or '--cpu all'. +When --cpu is used, \fB--hwp-use-pkg\fP is available, which specifies whether the per-cpu +MSR_IA32_HWP_REQUEST should be over-ruled by MSR_IA32_HWP_REQUEST_PKG (1), +or exempt from MSR_IA32_HWP_REQUEST_PKG (0). + +\fB-p, --pkg\fP Operate on the MSR_IA32_HWP_REQUEST_PKG for each package in the package-list. +The list is a string of individual package numbers separated +by commas, and or ranges of package numbers separated by a dash, +or the string "all". +For example '--pkg 1,3' or '--pkg all' + +.SS VALUE OPTIONS .PP -\fB-r\fP is for "read-only" mode - the unchanged state -is read and displayed. -.PP -.I performance -Set a policy where performance is paramount. -The processor will be unwilling to sacrifice any performance -for the sake of energy saving. This is the hardware default. -.PP -.I normal +.I normal | default Set a policy with a normal balance between performance and energy efficiency. The processor will tolerate minor performance compromise for potentially significant energy savings. -This reasonable default for most desktops and servers. +This is a reasonable default for most desktops and servers. +"default" is a synonym for "normal". .PP -.I powersave +.I performance +Set a policy for maximum performance, +accepting no performance sacrifice for the benefit of energy efficiency. +.PP +.I balance-performance +Set a policy with a high priority on performance, +but allowing some performance loss to benefit energy efficiency. +.PP +.I balance-power +Set a policy where the performance and power are balanced. +This is the default. +.PP +.I power Set a policy where the processor can accept -a measurable performance hit to maximize energy efficiency. -.PP -.I n -Set MSR_IA32_ENERGY_PERF_BIAS to the specified number. -The range of valid numbers is 0-15, where 0 is maximum -performance and 15 is maximum energy efficiency. +a measurable performance impact to maximize energy efficiency. +.PP +The following table shows the mapping from the value strings above to actual MSR values. +This mapping is defined in the Linux-kernel header, msr-index.h. + +.nf +VALUE STRING EPB EPP +performance 0 0 +balance-performance 4 128 +normal, default 6 128 +balance-power 8 192 +power 15 255 +.fi +.PP +For MSR_IA32_HWP_REQUEST performance fields +(--hwp-min, --hwp-max, --hwp-desired), the value option +is in units of 100 MHz, Eg. 12 signifies 1200 MHz. + +.SS FIELD OPTIONS +\fB-a, --all value-string\fP Sets all EPB and EPP and HWP limit fields to the value associated with +the value-string. In addition, enables turbo-mode and HWP-mode, if they were previous disabled. +Thus "--all normal" will set a system without cpufreq into a well known configuration. +.PP +\fB-B, --epb\fP set EPB per-core or per-package. +See value strings in the table above. +.PP +\fB-d, --debug\fP debug increases verbosity. By default +x86_energy_perf_policy is silent for updates, +and verbose for read-only mode. +.PP +\fB-P, --hwp-epp\fP set HWP.EPP per-core or per-package. +See value strings in the table above. +.PP +\fB-m, --hwp-min\fP request HWP to not go below the specified core/bus ratio. +The "default" is the value found in IA32_HWP_CAPABILITIES.min. +.PP +\fB-M, --hwp-max\fP request HWP not exceed a the specified core/bus ratio. +The "default" is the value found in IA32_HWP_CAPABILITIES.max. +.PP +\fB-D, --hwp-desired\fP request HWP 'desired' frequency. +The "normal" setting is 0, which +corresponds to 'full autonomous' HWP control. +Non-zero performance values request a specific performance +level on this processor, specified in multiples of 100 MHz. +.PP +\fB-w, --hwp-window\fP specify integer number of microsec +in the sliding window that HWP uses to maintain average frequency. +This parameter is meaningful only when the "desired" field above is non-zero. +Default is 0, allowing the HW to choose. +.SH OTHER OPTIONS +.PP +\fB-f, --force\fP writes the specified values without bounds checking. +.PP +\fB-U, --hwp-use-pkg\fP (0 | 1), when used in conjunction with --cpu, +indicates whether the per-CPU MSR_IA32_HWP_REQUEST should be overruled (1) +or exempt (0) from per-Package MSR_IA32_HWP_REQUEST_PKG settings. +The default is exempt. +.PP +\fB-H, --hwp-enable\fP enable HardWare-P-state (HWP) mode. Once enabled, system RESET is required to disable HWP mode. +.PP +\fB-t, --turbo-enable\fP enable (1) or disable (0) turbo mode. +.PP +\fB-v, --version\fP print version and exit. +.PP +If no request to change policy is made, +the default behavior is to read +and display the current system state, +including the default capabilities. +.SH WARNING +.PP +This utility writes directly to Model Specific Registers. +There is no locking or coordination should this utility +be used to modify HWP limit fields at the same time that +intel_pstate's sysfs attributes access the same MSRs. +.PP +Note that --hwp-desired and --hwp-window are considered experimental. +Future versions of Linux reserve the right to access these +fields internally -- potentially conflicting with user-space access. +.SH EXAMPLE +.nf +# sudo x86_energy_perf_policy +cpu0: EPB 6 +cpu0: HWP_REQ: min 6 max 35 des 0 epp 128 window 0x0 (0*10^0us) use_pkg 0 +cpu0: HWP_CAP: low 1 eff 8 guar 27 high 35 +cpu1: EPB 6 +cpu1: HWP_REQ: min 6 max 35 des 0 epp 128 window 0x0 (0*10^0us) use_pkg 0 +cpu1: HWP_CAP: low 1 eff 8 guar 27 high 35 +cpu2: EPB 6 +cpu2: HWP_REQ: min 6 max 35 des 0 epp 128 window 0x0 (0*10^0us) use_pkg 0 +cpu2: HWP_CAP: low 1 eff 8 guar 27 high 35 +cpu3: EPB 6 +cpu3: HWP_REQ: min 6 max 35 des 0 epp 128 window 0x0 (0*10^0us) use_pkg 0 +cpu3: HWP_CAP: low 1 eff 8 guar 27 high 35 +.fi .SH NOTES -.B "x86_energy_perf_policy " +.B "x86_energy_perf_policy" runs only as root. .SH FILES .ta .nf /dev/cpu/*/msr .fi - .SH "SEE ALSO" +.nf msr(4) +Intel(R) 64 and IA-32 Architectures Software Developer's Manual +.fi .PP .SH AUTHORS .nf -Written by Len Brown +Len Brown diff --git a/tools/power/x86/x86_energy_perf_policy/x86_energy_perf_policy.c b/tools/power/x86/x86_energy_perf_policy/x86_energy_perf_policy.c index 40b3e5482f8a..65bbe627a425 100644 --- a/tools/power/x86/x86_energy_perf_policy/x86_energy_perf_policy.c +++ b/tools/power/x86/x86_energy_perf_policy/x86_energy_perf_policy.c @@ -3,322 +3,1424 @@ * policy preference bias on recent X86 processors. */ /* - * Copyright (c) 2010, Intel Corporation. + * Copyright (c) 2010 - 2017 Intel Corporation. * Len Brown * - * This program is free software; you can redistribute it and/or modify it - * under the terms and conditions of the GNU General Public License, - * version 2, as published by the Free Software Foundation. - * - * This program is distributed in the hope it will be useful, but WITHOUT - * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or - * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for - * more details. - * - * You should have received a copy of the GNU General Public License along with - * this program; if not, write to the Free Software Foundation, Inc., - * 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA. + * This program is released under GPL v2 */ +#define _GNU_SOURCE +#include MSRHEADER #include #include #include +#include #include #include +#include +#include #include #include #include +#include #include #include +#include +#include -unsigned int verbose; /* set with -v */ -unsigned int read_only; /* set with -r */ +#define OPTARG_NORMAL (INT_MAX - 1) +#define OPTARG_POWER (INT_MAX - 2) +#define OPTARG_BALANCE_POWER (INT_MAX - 3) +#define OPTARG_BALANCE_PERFORMANCE (INT_MAX - 4) +#define OPTARG_PERFORMANCE (INT_MAX - 5) + +struct msr_hwp_cap { + unsigned char highest; + unsigned char guaranteed; + unsigned char efficient; + unsigned char lowest; +}; + +struct msr_hwp_request { + unsigned char hwp_min; + unsigned char hwp_max; + unsigned char hwp_desired; + unsigned char hwp_epp; + unsigned int hwp_window; + unsigned char hwp_use_pkg; +} req_update; + +unsigned int debug; +unsigned int verbose; +unsigned int force; char *progname; -unsigned long long new_bias; -int cpu = -1; +int base_cpu; +unsigned char update_epb; +unsigned long long new_epb; +unsigned char turbo_is_enabled; +unsigned char update_turbo; +unsigned char turbo_update_value; +unsigned char update_hwp_epp; +unsigned char update_hwp_min; +unsigned char update_hwp_max; +unsigned char update_hwp_desired; +unsigned char update_hwp_window; +unsigned char update_hwp_use_pkg; +unsigned char update_hwp_enable; +#define hwp_update_enabled() (update_hwp_enable | update_hwp_epp | update_hwp_max | update_hwp_min | update_hwp_desired | update_hwp_window | update_hwp_use_pkg) +int max_cpu_num; +int max_pkg_num; +#define MAX_PACKAGES 64 +unsigned int first_cpu_in_pkg[MAX_PACKAGES]; +unsigned long long pkg_present_set; +unsigned long long pkg_selected_set; +cpu_set_t *cpu_present_set; +cpu_set_t *cpu_selected_set; +int genuine_intel; + +size_t cpu_setsize; + +char *proc_stat = "/proc/stat"; + +unsigned int has_epb; /* MSR_IA32_ENERGY_PERF_BIAS */ +unsigned int has_hwp; /* IA32_PM_ENABLE, IA32_HWP_CAPABILITIES */ + /* IA32_HWP_REQUEST, IA32_HWP_STATUS */ +unsigned int has_hwp_notify; /* IA32_HWP_INTERRUPT */ +unsigned int has_hwp_activity_window; /* IA32_HWP_REQUEST[bits 41:32] */ +unsigned int has_hwp_epp; /* IA32_HWP_REQUEST[bits 31:24] */ +unsigned int has_hwp_request_pkg; /* IA32_HWP_REQUEST_PKG */ + +unsigned int bdx_highest_ratio; /* - * Usage: - * - * -c cpu: limit action to a single CPU (default is all CPUs) - * -v: verbose output (can invoke more than once) - * -r: read-only, don't change any settings - * - * performance - * Performance is paramount. - * Unwilling to sacrifice any performance - * for the sake of energy saving. (hardware default) - * - * normal - * Can tolerate minor performance compromise - * for potentially significant energy savings. - * (reasonable default for most desktops and servers) - * - * powersave - * Can tolerate significant performance hit - * to maximize energy savings. - * - * n - * a numerical value to write to the underlying MSR. + * maintain compatibility with original implementation, but don't document it: */ void usage(void) { - printf("%s: [-c cpu] [-v] " - "(-r | 'performance' | 'normal' | 'powersave' | n)\n", - progname); + fprintf(stderr, "%s [options] [scope][field value]\n", progname); + fprintf(stderr, "scope: --cpu cpu-list [--hwp-use-pkg #] | --pkg pkg-list\n"); + fprintf(stderr, "field: --all | --epb | --hwp-epp | --hwp-min | --hwp-max | --hwp-desired\n"); + fprintf(stderr, "other: --hwp-enable | --turbo-enable (0 | 1) | --help | --force\n"); + fprintf(stderr, + "value: ( # | \"normal\" | \"performance\" | \"balance-performance\" | \"balance-power\"| \"power\")\n"); + fprintf(stderr, "--hwp-window usec\n"); + + fprintf(stderr, "Specify only Energy Performance BIAS (legacy usage):\n"); + fprintf(stderr, "%s: [-c cpu] [-v] (-r | policy-value )\n", progname); + exit(1); } -#define MSR_IA32_ENERGY_PERF_BIAS 0x000001b0 +/* + * If bdx_highest_ratio is set, + * then we must translate between MSR format and simple ratio + * used on the cmdline. + */ +int ratio_2_msr_perf(int ratio) +{ + int msr_perf; -#define BIAS_PERFORMANCE 0 -#define BIAS_BALANCE 6 -#define BIAS_POWERSAVE 15 + if (!bdx_highest_ratio) + return ratio; + + msr_perf = ratio * 255 / bdx_highest_ratio; + + if (debug) + fprintf(stderr, "%d = ratio_to_msr_perf(%d)\n", msr_perf, ratio); + + return msr_perf; +} +int msr_perf_2_ratio(int msr_perf) +{ + int ratio; + double d; + + if (!bdx_highest_ratio) + return msr_perf; + + d = (double)msr_perf * (double) bdx_highest_ratio / 255.0; + d = d + 0.5; /* round */ + ratio = (int)d; + + if (debug) + fprintf(stderr, "%d = msr_perf_ratio(%d) {%f}\n", ratio, msr_perf, d); + + return ratio; +} +int parse_cmdline_epb(int i) +{ + if (!has_epb) + errx(1, "EPB not enabled on this platform"); + + update_epb = 1; + + switch (i) { + case OPTARG_POWER: + return ENERGY_PERF_BIAS_POWERSAVE; + case OPTARG_BALANCE_POWER: + return ENERGY_PERF_BIAS_BALANCE_POWERSAVE; + case OPTARG_NORMAL: + return ENERGY_PERF_BIAS_NORMAL; + case OPTARG_BALANCE_PERFORMANCE: + return ENERGY_PERF_BIAS_BALANCE_PERFORMANCE; + case OPTARG_PERFORMANCE: + return ENERGY_PERF_BIAS_PERFORMANCE; + } + if (i < 0 || i > ENERGY_PERF_BIAS_POWERSAVE) + errx(1, "--epb must be from 0 to 15"); + return i; +} + +#define HWP_CAP_LOWEST 0 +#define HWP_CAP_HIGHEST 255 + +/* + * "performance" changes hwp_min to cap.highest + * All others leave it at cap.lowest + */ +int parse_cmdline_hwp_min(int i) +{ + update_hwp_min = 1; + + switch (i) { + case OPTARG_POWER: + case OPTARG_BALANCE_POWER: + case OPTARG_NORMAL: + case OPTARG_BALANCE_PERFORMANCE: + return HWP_CAP_LOWEST; + case OPTARG_PERFORMANCE: + return HWP_CAP_HIGHEST; + } + return i; +} +/* + * "power" changes hwp_max to cap.lowest + * All others leave it at cap.highest + */ +int parse_cmdline_hwp_max(int i) +{ + update_hwp_max = 1; + + switch (i) { + case OPTARG_POWER: + return HWP_CAP_LOWEST; + case OPTARG_NORMAL: + case OPTARG_BALANCE_POWER: + case OPTARG_BALANCE_PERFORMANCE: + case OPTARG_PERFORMANCE: + return HWP_CAP_HIGHEST; + } + return i; +} +/* + * for --hwp-des, all strings leave it in autonomous mode + * If you want to change it, you need to explicitly pick a value + */ +int parse_cmdline_hwp_desired(int i) +{ + update_hwp_desired = 1; + + switch (i) { + case OPTARG_POWER: + case OPTARG_BALANCE_POWER: + case OPTARG_BALANCE_PERFORMANCE: + case OPTARG_NORMAL: + case OPTARG_PERFORMANCE: + return 0; /* autonomous */ + } + return i; +} + +int parse_cmdline_hwp_window(int i) +{ + unsigned int exponent; + + update_hwp_window = 1; + + switch (i) { + case OPTARG_POWER: + case OPTARG_BALANCE_POWER: + case OPTARG_NORMAL: + case OPTARG_BALANCE_PERFORMANCE: + case OPTARG_PERFORMANCE: + return 0; + } + if (i < 0 || i > 1270000000) { + fprintf(stderr, "--hwp-window: 0 for auto; 1 - 1270000000 usec for window duration\n"); + usage(); + } + for (exponent = 0; ; ++exponent) { + if (debug) + printf("%d 10^%d\n", i, exponent); + + if (i <= 127) + break; + + i = i / 10; + } + if (debug) + fprintf(stderr, "%d*10^%d: 0x%x\n", i, exponent, (exponent << 7) | i); + + return (exponent << 7) | i; +} +int parse_cmdline_hwp_epp(int i) +{ + update_hwp_epp = 1; + + switch (i) { + case OPTARG_POWER: + return HWP_EPP_POWERSAVE; + case OPTARG_BALANCE_POWER: + return HWP_EPP_BALANCE_POWERSAVE; + case OPTARG_NORMAL: + case OPTARG_BALANCE_PERFORMANCE: + return HWP_EPP_BALANCE_PERFORMANCE; + case OPTARG_PERFORMANCE: + return HWP_EPP_PERFORMANCE; + } + if (i < 0 || i > 0xff) { + fprintf(stderr, "--hwp-epp must be from 0 to 0xff\n"); + usage(); + } + return i; +} +int parse_cmdline_turbo(int i) +{ + update_turbo = 1; + + switch (i) { + case OPTARG_POWER: + return 0; + case OPTARG_NORMAL: + case OPTARG_BALANCE_POWER: + case OPTARG_BALANCE_PERFORMANCE: + case OPTARG_PERFORMANCE: + return 1; + } + if (i < 0 || i > 1) { + fprintf(stderr, "--turbo-enable: 1 to enable, 0 to disable\n"); + usage(); + } + return i; +} + +int parse_optarg_string(char *s) +{ + int i; + char *endptr; + + if (!strncmp(s, "default", 7)) + return OPTARG_NORMAL; + + if (!strncmp(s, "normal", 6)) + return OPTARG_NORMAL; + + if (!strncmp(s, "power", 9)) + return OPTARG_POWER; + + if (!strncmp(s, "balance-power", 17)) + return OPTARG_BALANCE_POWER; + + if (!strncmp(s, "balance-performance", 19)) + return OPTARG_BALANCE_PERFORMANCE; + + if (!strncmp(s, "performance", 11)) + return OPTARG_PERFORMANCE; + + i = strtol(s, &endptr, 0); + if (s == endptr) { + fprintf(stderr, "no digits in \"%s\"\n", s); + usage(); + } + if (i == LONG_MIN || i == LONG_MAX) + errx(-1, "%s", s); + + if (i > 0xFF) + errx(-1, "%d (0x%x) must be < 256", i, i); + + if (i < 0) + errx(-1, "%d (0x%x) must be >= 0", i, i); + return i; +} + +void parse_cmdline_all(char *s) +{ + force++; + update_hwp_enable = 1; + req_update.hwp_min = parse_cmdline_hwp_min(parse_optarg_string(s)); + req_update.hwp_max = parse_cmdline_hwp_max(parse_optarg_string(s)); + req_update.hwp_epp = parse_cmdline_hwp_epp(parse_optarg_string(s)); + if (has_epb) + new_epb = parse_cmdline_epb(parse_optarg_string(s)); + turbo_update_value = parse_cmdline_turbo(parse_optarg_string(s)); + req_update.hwp_desired = parse_cmdline_hwp_desired(parse_optarg_string(s)); + req_update.hwp_window = parse_cmdline_hwp_window(parse_optarg_string(s)); +} + +void validate_cpu_selected_set(void) +{ + int cpu; + + if (CPU_COUNT_S(cpu_setsize, cpu_selected_set) == 0) + errx(0, "no CPUs requested"); + + for (cpu = 0; cpu <= max_cpu_num; ++cpu) { + if (CPU_ISSET_S(cpu, cpu_setsize, cpu_selected_set)) + if (!CPU_ISSET_S(cpu, cpu_setsize, cpu_present_set)) + errx(1, "Requested cpu% is not present", cpu); + } +} + +void parse_cmdline_cpu(char *s) +{ + char *startp, *endp; + int cpu = 0; + + if (pkg_selected_set) { + usage(); + errx(1, "--cpu | --pkg"); + } + cpu_selected_set = CPU_ALLOC((max_cpu_num + 1)); + if (cpu_selected_set == NULL) + err(1, "cpu_selected_set"); + CPU_ZERO_S(cpu_setsize, cpu_selected_set); + + for (startp = s; startp && *startp;) { + + if (*startp == ',') { + startp++; + continue; + } + + if (*startp == '-') { + int end_cpu; + + startp++; + end_cpu = strtol(startp, &endp, 10); + if (startp == endp) + continue; + + while (cpu <= end_cpu) { + if (cpu > max_cpu_num) + errx(1, "Requested cpu%d exceeds max cpu%d", cpu, max_cpu_num); + CPU_SET_S(cpu, cpu_setsize, cpu_selected_set); + cpu++; + } + startp = endp; + continue; + } + + if (strncmp(startp, "all", 3) == 0) { + for (cpu = 0; cpu <= max_cpu_num; cpu += 1) { + if (CPU_ISSET_S(cpu, cpu_setsize, cpu_present_set)) + CPU_SET_S(cpu, cpu_setsize, cpu_selected_set); + } + startp += 3; + if (*startp == 0) + break; + } + /* "--cpu even" is not documented */ + if (strncmp(startp, "even", 4) == 0) { + for (cpu = 0; cpu <= max_cpu_num; cpu += 2) { + if (CPU_ISSET_S(cpu, cpu_setsize, cpu_present_set)) + CPU_SET_S(cpu, cpu_setsize, cpu_selected_set); + } + startp += 4; + if (*startp == 0) + break; + } + + /* "--cpu odd" is not documented */ + if (strncmp(startp, "odd", 3) == 0) { + for (cpu = 1; cpu <= max_cpu_num; cpu += 2) { + if (CPU_ISSET_S(cpu, cpu_setsize, cpu_present_set)) + CPU_SET_S(cpu, cpu_setsize, cpu_selected_set); + } + startp += 3; + if (*startp == 0) + break; + } + + cpu = strtol(startp, &endp, 10); + if (startp == endp) + errx(1, "--cpu cpu-set: confused by '%s'", startp); + if (cpu > max_cpu_num) + errx(1, "Requested cpu%d exceeds max cpu%d", cpu, max_cpu_num); + CPU_SET_S(cpu, cpu_setsize, cpu_selected_set); + startp = endp; + } + + validate_cpu_selected_set(); + +} + +void parse_cmdline_pkg(char *s) +{ + char *startp, *endp; + int pkg = 0; + + if (cpu_selected_set) { + usage(); + errx(1, "--pkg | --cpu"); + } + pkg_selected_set = 0; + + for (startp = s; startp && *startp;) { + + if (*startp == ',') { + startp++; + continue; + } + + if (*startp == '-') { + int end_pkg; + + startp++; + end_pkg = strtol(startp, &endp, 10); + if (startp == endp) + continue; + + while (pkg <= end_pkg) { + if (pkg > max_pkg_num) + errx(1, "Requested pkg%d exceeds max pkg%d", pkg, max_pkg_num); + pkg_selected_set |= 1 << pkg; + pkg++; + } + startp = endp; + continue; + } + + if (strncmp(startp, "all", 3) == 0) { + pkg_selected_set = pkg_present_set; + return; + } + + pkg = strtol(startp, &endp, 10); + if (pkg > max_pkg_num) + errx(1, "Requested pkg%d Exceeds max pkg%d", pkg, max_pkg_num); + pkg_selected_set |= 1 << pkg; + startp = endp; + } +} + +void for_packages(unsigned long long pkg_set, int (func)(int)) +{ + int pkg_num; + + for (pkg_num = 0; pkg_num <= max_pkg_num; ++pkg_num) { + if (pkg_set & (1UL << pkg_num)) + func(pkg_num); + } +} + +void print_version(void) +{ + printf("x86_energy_perf_policy 17.05.11 (C) Len Brown \n"); +} void cmdline(int argc, char **argv) { int opt; + int option_index = 0; + + static struct option long_options[] = { + {"all", required_argument, 0, 'a'}, + {"cpu", required_argument, 0, 'c'}, + {"pkg", required_argument, 0, 'p'}, + {"debug", no_argument, 0, 'd'}, + {"hwp-desired", required_argument, 0, 'D'}, + {"epb", required_argument, 0, 'B'}, + {"force", no_argument, 0, 'f'}, + {"hwp-enable", no_argument, 0, 'e'}, + {"help", no_argument, 0, 'h'}, + {"hwp-epp", required_argument, 0, 'P'}, + {"hwp-min", required_argument, 0, 'm'}, + {"hwp-max", required_argument, 0, 'M'}, + {"read", no_argument, 0, 'r'}, + {"turbo-enable", required_argument, 0, 't'}, + {"hwp-use-pkg", required_argument, 0, 'u'}, + {"version", no_argument, 0, 'v'}, + {"hwp-window", required_argument, 0, 'w'}, + {0, 0, 0, 0 } + }; progname = argv[0]; - while ((opt = getopt(argc, argv, "+rvc:")) != -1) { + while ((opt = getopt_long_only(argc, argv, "+a:c:dD:E:e:f:m:M:rt:u:vw", + long_options, &option_index)) != -1) { switch (opt) { + case 'a': + parse_cmdline_all(optarg); + break; + case 'B': + new_epb = parse_cmdline_epb(parse_optarg_string(optarg)); + break; case 'c': - cpu = atoi(optarg); + parse_cmdline_cpu(optarg); + break; + case 'e': + update_hwp_enable = 1; + break; + case 'h': + usage(); + break; + case 'd': + debug++; + verbose++; + break; + case 'f': + force++; + break; + case 'D': + req_update.hwp_desired = parse_cmdline_hwp_desired(parse_optarg_string(optarg)); + break; + case 'm': + req_update.hwp_min = parse_cmdline_hwp_min(parse_optarg_string(optarg)); + break; + case 'M': + req_update.hwp_max = parse_cmdline_hwp_max(parse_optarg_string(optarg)); + break; + case 'p': + parse_cmdline_pkg(optarg); + break; + case 'P': + req_update.hwp_epp = parse_cmdline_hwp_epp(parse_optarg_string(optarg)); break; case 'r': - read_only = 1; + /* v1 used -r to specify read-only mode, now the default */ + break; + case 't': + turbo_update_value = parse_cmdline_turbo(parse_optarg_string(optarg)); + break; + case 'u': + update_hwp_use_pkg++; + if (atoi(optarg) == 0) + req_update.hwp_use_pkg = 0; + else + req_update.hwp_use_pkg = 1; break; case 'v': - verbose++; + print_version(); + exit(0); + break; + case 'w': + req_update.hwp_window = parse_cmdline_hwp_window(parse_optarg_string(optarg)); break; default: usage(); } } - /* if -r, then should be no additional optind */ - if (read_only && (argc > optind)) - usage(); - /* - * if no -r , then must be one additional optind + * v1 allowed "performance"|"normal"|"power" with no policy specifier + * to update BIAS. Continue to support that, even though no longer documented. */ - if (!read_only) { + if (argc == optind + 1) + new_epb = parse_cmdline_epb(parse_optarg_string(argv[optind])); - if (argc != optind + 1) { - printf("must supply -r or policy param\n"); - usage(); - } - - if (!strcmp("performance", argv[optind])) { - new_bias = BIAS_PERFORMANCE; - } else if (!strcmp("normal", argv[optind])) { - new_bias = BIAS_BALANCE; - } else if (!strcmp("powersave", argv[optind])) { - new_bias = BIAS_POWERSAVE; - } else { - char *endptr; - - new_bias = strtoull(argv[optind], &endptr, 0); - if (endptr == argv[optind] || - new_bias > BIAS_POWERSAVE) { - fprintf(stderr, "invalid value: %s\n", - argv[optind]); - usage(); - } - } + if (argc > optind + 1) { + fprintf(stderr, "stray parameter '%s'\n", argv[optind + 1]); + usage(); } } + +int get_msr(int cpu, int offset, unsigned long long *msr) +{ + int retval; + char pathname[32]; + int fd; + + sprintf(pathname, "/dev/cpu/%d/msr", cpu); + fd = open(pathname, O_RDONLY); + if (fd < 0) + err(-1, "%s open failed, try chown or chmod +r /dev/cpu/*/msr, or run as root", pathname); + + retval = pread(fd, msr, sizeof(*msr), offset); + if (retval != sizeof(*msr)) + err(-1, "%s offset 0x%llx read failed", pathname, (unsigned long long)offset); + + if (debug > 1) + fprintf(stderr, "get_msr(cpu%d, 0x%X, 0x%llX)\n", cpu, offset, *msr); + + close(fd); + return 0; +} + +int put_msr(int cpu, int offset, unsigned long long new_msr) +{ + char pathname[32]; + int retval; + int fd; + + sprintf(pathname, "/dev/cpu/%d/msr", cpu); + fd = open(pathname, O_RDWR); + if (fd < 0) + err(-1, "%s open failed, try chown or chmod +r /dev/cpu/*/msr, or run as root", pathname); + + retval = pwrite(fd, &new_msr, sizeof(new_msr), offset); + if (retval != sizeof(new_msr)) + err(-2, "pwrite(cpu%d, offset 0x%x, 0x%llx) = %d", cpu, offset, new_msr, retval); + + close(fd); + + if (debug > 1) + fprintf(stderr, "put_msr(cpu%d, 0x%X, 0x%llX)\n", cpu, offset, new_msr); + + return 0; +} + +void print_hwp_cap(int cpu, struct msr_hwp_cap *cap, char *str) +{ + if (cpu != -1) + printf("cpu%d: ", cpu); + + printf("HWP_CAP: low %d eff %d guar %d high %d\n", + cap->lowest, cap->efficient, cap->guaranteed, cap->highest); +} +void read_hwp_cap(int cpu, struct msr_hwp_cap *cap, unsigned int msr_offset) +{ + unsigned long long msr; + + get_msr(cpu, msr_offset, &msr); + + cap->highest = msr_perf_2_ratio(HWP_HIGHEST_PERF(msr)); + cap->guaranteed = msr_perf_2_ratio(HWP_GUARANTEED_PERF(msr)); + cap->efficient = msr_perf_2_ratio(HWP_MOSTEFFICIENT_PERF(msr)); + cap->lowest = msr_perf_2_ratio(HWP_LOWEST_PERF(msr)); +} + +void print_hwp_request(int cpu, struct msr_hwp_request *h, char *str) +{ + if (cpu != -1) + printf("cpu%d: ", cpu); + + if (str) + printf("%s", str); + + printf("HWP_REQ: min %d max %d des %d epp %d window 0x%x (%d*10^%dus) use_pkg %d\n", + h->hwp_min, h->hwp_max, h->hwp_desired, h->hwp_epp, + h->hwp_window, h->hwp_window & 0x7F, (h->hwp_window >> 7) & 0x7, h->hwp_use_pkg); +} +void print_hwp_request_pkg(int pkg, struct msr_hwp_request *h, char *str) +{ + printf("pkg%d: ", pkg); + + if (str) + printf("%s", str); + + printf("HWP_REQ_PKG: min %d max %d des %d epp %d window 0x%x (%d*10^%dus)\n", + h->hwp_min, h->hwp_max, h->hwp_desired, h->hwp_epp, + h->hwp_window, h->hwp_window & 0x7F, (h->hwp_window >> 7) & 0x7); +} +void read_hwp_request(int cpu, struct msr_hwp_request *hwp_req, unsigned int msr_offset) +{ + unsigned long long msr; + + get_msr(cpu, msr_offset, &msr); + + hwp_req->hwp_min = msr_perf_2_ratio((((msr) >> 0) & 0xff)); + hwp_req->hwp_max = msr_perf_2_ratio((((msr) >> 8) & 0xff)); + hwp_req->hwp_desired = msr_perf_2_ratio((((msr) >> 16) & 0xff)); + hwp_req->hwp_epp = (((msr) >> 24) & 0xff); + hwp_req->hwp_window = (((msr) >> 32) & 0x3ff); + hwp_req->hwp_use_pkg = (((msr) >> 42) & 0x1); +} + +void write_hwp_request(int cpu, struct msr_hwp_request *hwp_req, unsigned int msr_offset) +{ + unsigned long long msr = 0; + + if (debug > 1) + printf("cpu%d: requesting min %d max %d des %d epp %d window 0x%0x use_pkg %d\n", + cpu, hwp_req->hwp_min, hwp_req->hwp_max, + hwp_req->hwp_desired, hwp_req->hwp_epp, + hwp_req->hwp_window, hwp_req->hwp_use_pkg); + + msr |= HWP_MIN_PERF(ratio_2_msr_perf(hwp_req->hwp_min)); + msr |= HWP_MAX_PERF(ratio_2_msr_perf(hwp_req->hwp_max)); + msr |= HWP_DESIRED_PERF(ratio_2_msr_perf(hwp_req->hwp_desired)); + msr |= HWP_ENERGY_PERF_PREFERENCE(hwp_req->hwp_epp); + msr |= HWP_ACTIVITY_WINDOW(hwp_req->hwp_window); + msr |= HWP_PACKAGE_CONTROL(hwp_req->hwp_use_pkg); + + put_msr(cpu, msr_offset, msr); +} + +int print_cpu_msrs(int cpu) +{ + unsigned long long msr; + struct msr_hwp_request req; + struct msr_hwp_cap cap; + + if (has_epb) { + get_msr(cpu, MSR_IA32_ENERGY_PERF_BIAS, &msr); + + printf("cpu%d: EPB %u\n", cpu, (unsigned int) msr); + } + + if (!has_hwp) + return 0; + + read_hwp_request(cpu, &req, MSR_HWP_REQUEST); + print_hwp_request(cpu, &req, ""); + + read_hwp_cap(cpu, &cap, MSR_HWP_CAPABILITIES); + print_hwp_cap(cpu, &cap, ""); + + return 0; +} + +int print_pkg_msrs(int pkg) +{ + struct msr_hwp_request req; + unsigned long long msr; + + if (!has_hwp) + return 0; + + read_hwp_request(first_cpu_in_pkg[pkg], &req, MSR_HWP_REQUEST_PKG); + print_hwp_request_pkg(pkg, &req, ""); + + if (has_hwp_notify) { + get_msr(first_cpu_in_pkg[pkg], MSR_HWP_INTERRUPT, &msr); + fprintf(stderr, + "pkg%d: MSR_HWP_INTERRUPT: 0x%08llx (Excursion_Min-%sabled, Guaranteed_Perf_Change-%sabled)\n", + pkg, msr, + ((msr) & 0x2) ? "EN" : "Dis", + ((msr) & 0x1) ? "EN" : "Dis"); + } + get_msr(first_cpu_in_pkg[pkg], MSR_HWP_STATUS, &msr); + fprintf(stderr, + "pkg%d: MSR_HWP_STATUS: 0x%08llx (%sExcursion_Min, %sGuaranteed_Perf_Change)\n", + pkg, msr, + ((msr) & 0x4) ? "" : "No-", + ((msr) & 0x1) ? "" : "No-"); + + return 0; +} + /* - * validate_cpuid() - * returns on success, quietly exits on failure (make verbose with -v) + * Assumption: All HWP systems have 100 MHz bus clock */ -void validate_cpuid(void) +int ratio_2_sysfs_khz(int ratio) +{ + int bclk_khz = 100 * 1000; /* 100,000 KHz = 100 MHz */ + + return ratio * bclk_khz; +} +/* + * If HWP is enabled and cpufreq sysfs attribtes are present, + * then update sysfs, so that it will not become + * stale when we write to MSRs. + * (intel_pstate's max_perf_pct and min_perf_pct will follow cpufreq, + * so we don't have to touch that.) + */ +void update_cpufreq_scaling_freq(int is_max, int cpu, unsigned int ratio) +{ + char pathname[64]; + FILE *fp; + int retval; + int khz; + + sprintf(pathname, "/sys/devices/system/cpu/cpu%d/cpufreq/scaling_%s_freq", + cpu, is_max ? "max" : "min"); + + fp = fopen(pathname, "w"); + if (!fp) { + if (debug) + perror(pathname); + return; + } + + khz = ratio_2_sysfs_khz(ratio); + retval = fprintf(fp, "%d", khz); + if (retval < 0) + if (debug) + perror("fprintf"); + if (debug) + printf("echo %d > %s\n", khz, pathname); + + fclose(fp); +} + +/* + * We update all sysfs before updating any MSRs because of + * bugs in cpufreq/intel_pstate where the sysfs writes + * for a CPU may change the min/max values on other CPUS. + */ + +int update_sysfs(int cpu) +{ + if (!has_hwp) + return 0; + + if (!hwp_update_enabled()) + return 0; + + if (access("/sys/devices/system/cpu/cpu0/cpufreq", F_OK)) + return 0; + + if (update_hwp_min) + update_cpufreq_scaling_freq(0, cpu, req_update.hwp_min); + + if (update_hwp_max) + update_cpufreq_scaling_freq(1, cpu, req_update.hwp_max); + + return 0; +} + +int verify_hwp_req_self_consistency(int cpu, struct msr_hwp_request *req) +{ + /* fail if min > max requested */ + if (req->hwp_min > req->hwp_max) { + errx(1, "cpu%d: requested hwp-min %d > hwp_max %d", + cpu, req->hwp_min, req->hwp_max); + } + + /* fail if desired > max requestd */ + if (req->hwp_desired && (req->hwp_desired > req->hwp_max)) { + errx(1, "cpu%d: requested hwp-desired %d > hwp_max %d", + cpu, req->hwp_desired, req->hwp_max); + } + /* fail if desired < min requestd */ + if (req->hwp_desired && (req->hwp_desired < req->hwp_min)) { + errx(1, "cpu%d: requested hwp-desired %d < requested hwp_min %d", + cpu, req->hwp_desired, req->hwp_min); + } + + return 0; +} + +int check_hwp_request_v_hwp_capabilities(int cpu, struct msr_hwp_request *req, struct msr_hwp_cap *cap) +{ + if (update_hwp_max) { + if (req->hwp_max > cap->highest) + errx(1, "cpu%d: requested max %d > capabilities highest %d, use --force?", + cpu, req->hwp_max, cap->highest); + if (req->hwp_max < cap->lowest) + errx(1, "cpu%d: requested max %d < capabilities lowest %d, use --force?", + cpu, req->hwp_max, cap->lowest); + } + + if (update_hwp_min) { + if (req->hwp_min > cap->highest) + errx(1, "cpu%d: requested min %d > capabilities highest %d, use --force?", + cpu, req->hwp_min, cap->highest); + if (req->hwp_min < cap->lowest) + errx(1, "cpu%d: requested min %d < capabilities lowest %d, use --force?", + cpu, req->hwp_min, cap->lowest); + } + + if (update_hwp_min && update_hwp_max && (req->hwp_min > req->hwp_max)) + errx(1, "cpu%d: requested min %d > requested max %d", + cpu, req->hwp_min, req->hwp_max); + + if (update_hwp_desired && req->hwp_desired) { + if (req->hwp_desired > req->hwp_max) + errx(1, "cpu%d: requested desired %d > requested max %d, use --force?", + cpu, req->hwp_desired, req->hwp_max); + if (req->hwp_desired < req->hwp_min) + errx(1, "cpu%d: requested desired %d < requested min %d, use --force?", + cpu, req->hwp_desired, req->hwp_min); + if (req->hwp_desired < cap->lowest) + errx(1, "cpu%d: requested desired %d < capabilities lowest %d, use --force?", + cpu, req->hwp_desired, cap->lowest); + if (req->hwp_desired > cap->highest) + errx(1, "cpu%d: requested desired %d > capabilities highest %d, use --force?", + cpu, req->hwp_desired, cap->highest); + } + + return 0; +} + +int update_hwp_request(int cpu) +{ + struct msr_hwp_request req; + struct msr_hwp_cap cap; + + int msr_offset = MSR_HWP_REQUEST; + + read_hwp_request(cpu, &req, msr_offset); + if (debug) + print_hwp_request(cpu, &req, "old: "); + + if (update_hwp_min) + req.hwp_min = req_update.hwp_min; + + if (update_hwp_max) + req.hwp_max = req_update.hwp_max; + + if (update_hwp_desired) + req.hwp_desired = req_update.hwp_desired; + + if (update_hwp_window) + req.hwp_window = req_update.hwp_window; + + if (update_hwp_epp) + req.hwp_epp = req_update.hwp_epp; + + req.hwp_use_pkg = req_update.hwp_use_pkg; + + read_hwp_cap(cpu, &cap, MSR_HWP_CAPABILITIES); + if (debug) + print_hwp_cap(cpu, &cap, ""); + + if (!force) + check_hwp_request_v_hwp_capabilities(cpu, &req, &cap); + + verify_hwp_req_self_consistency(cpu, &req); + + write_hwp_request(cpu, &req, msr_offset); + + if (debug) { + read_hwp_request(cpu, &req, msr_offset); + print_hwp_request(cpu, &req, "new: "); + } + return 0; +} +int update_hwp_request_pkg(int pkg) +{ + struct msr_hwp_request req; + struct msr_hwp_cap cap; + int cpu = first_cpu_in_pkg[pkg]; + + int msr_offset = MSR_HWP_REQUEST_PKG; + + read_hwp_request(cpu, &req, msr_offset); + if (debug) + print_hwp_request_pkg(pkg, &req, "old: "); + + if (update_hwp_min) + req.hwp_min = req_update.hwp_min; + + if (update_hwp_max) + req.hwp_max = req_update.hwp_max; + + if (update_hwp_desired) + req.hwp_desired = req_update.hwp_desired; + + if (update_hwp_window) + req.hwp_window = req_update.hwp_window; + + if (update_hwp_epp) + req.hwp_epp = req_update.hwp_epp; + + read_hwp_cap(cpu, &cap, MSR_HWP_CAPABILITIES); + if (debug) + print_hwp_cap(cpu, &cap, ""); + + if (!force) + check_hwp_request_v_hwp_capabilities(cpu, &req, &cap); + + verify_hwp_req_self_consistency(cpu, &req); + + write_hwp_request(cpu, &req, msr_offset); + + if (debug) { + read_hwp_request(cpu, &req, msr_offset); + print_hwp_request_pkg(pkg, &req, "new: "); + } + return 0; +} + +int enable_hwp_on_cpu(int cpu) +{ + unsigned long long msr; + + get_msr(cpu, MSR_PM_ENABLE, &msr); + put_msr(cpu, MSR_PM_ENABLE, 1); + + if (verbose) + printf("cpu%d: MSR_PM_ENABLE old: %d new: %d\n", cpu, (unsigned int) msr, 1); + + return 0; +} + +int update_cpu_msrs(int cpu) +{ + unsigned long long msr; + + + if (update_epb) { + get_msr(cpu, MSR_IA32_ENERGY_PERF_BIAS, &msr); + put_msr(cpu, MSR_IA32_ENERGY_PERF_BIAS, new_epb); + + if (verbose) + printf("cpu%d: ENERGY_PERF_BIAS old: %d new: %d\n", + cpu, (unsigned int) msr, (unsigned int) new_epb); + } + + if (update_turbo) { + int turbo_is_present_and_disabled; + + get_msr(cpu, MSR_IA32_MISC_ENABLE, &msr); + + turbo_is_present_and_disabled = ((msr & MSR_IA32_MISC_ENABLE_TURBO_DISABLE) != 0); + + if (turbo_update_value == 1) { + if (turbo_is_present_and_disabled) { + msr &= ~MSR_IA32_MISC_ENABLE_TURBO_DISABLE; + put_msr(cpu, MSR_IA32_MISC_ENABLE, msr); + if (verbose) + printf("cpu%d: turbo ENABLE\n", cpu); + } + } else { + /* + * if "turbo_is_enabled" were known to be describe this cpu + * then we could use it here to skip redundant disable requests. + * but cpu may be in a different package, so we always write. + */ + msr |= MSR_IA32_MISC_ENABLE_TURBO_DISABLE; + put_msr(cpu, MSR_IA32_MISC_ENABLE, msr); + if (verbose) + printf("cpu%d: turbo DISABLE\n", cpu); + } + } + + if (!has_hwp) + return 0; + + if (!hwp_update_enabled()) + return 0; + + update_hwp_request(cpu); + return 0; +} + +/* + * Open a file, and exit on failure + */ +FILE *fopen_or_die(const char *path, const char *mode) +{ + FILE *filep = fopen(path, "r"); + + if (!filep) + err(1, "%s: open failed", path); + return filep; +} + +unsigned int get_pkg_num(int cpu) +{ + FILE *fp; + char pathname[128]; + unsigned int pkg; + int retval; + + sprintf(pathname, "/sys/devices/system/cpu/cpu%d/topology/physical_package_id", cpu); + + fp = fopen_or_die(pathname, "r"); + retval = fscanf(fp, "%d\n", &pkg); + if (retval != 1) + errx(1, "%s: failed to parse", pathname); + return pkg; +} + +int set_max_cpu_pkg_num(int cpu) +{ + unsigned int pkg; + + if (max_cpu_num < cpu) + max_cpu_num = cpu; + + pkg = get_pkg_num(cpu); + + if (pkg >= MAX_PACKAGES) + errx(1, "cpu%d: %d >= MAX_PACKAGES (%d)", cpu, pkg, MAX_PACKAGES); + + if (pkg > max_pkg_num) + max_pkg_num = pkg; + + if ((pkg_present_set & (1ULL << pkg)) == 0) { + pkg_present_set |= (1ULL << pkg); + first_cpu_in_pkg[pkg] = cpu; + } + + return 0; +} +int mark_cpu_present(int cpu) +{ + CPU_SET_S(cpu, cpu_setsize, cpu_present_set); + return 0; +} + +/* + * run func(cpu) on every cpu in /proc/stat + * return max_cpu number + */ +int for_all_proc_cpus(int (func)(int)) +{ + FILE *fp; + int cpu_num; + int retval; + + fp = fopen_or_die(proc_stat, "r"); + + retval = fscanf(fp, "cpu %*d %*d %*d %*d %*d %*d %*d %*d %*d %*d\n"); + if (retval != 0) + err(1, "%s: failed to parse format", proc_stat); + + while (1) { + retval = fscanf(fp, "cpu%u %*d %*d %*d %*d %*d %*d %*d %*d %*d %*d\n", &cpu_num); + if (retval != 1) + break; + + retval = func(cpu_num); + if (retval) { + fclose(fp); + return retval; + } + } + fclose(fp); + return 0; +} + +void for_all_cpus_in_set(size_t set_size, cpu_set_t *cpu_set, int (func)(int)) +{ + int cpu_num; + + for (cpu_num = 0; cpu_num <= max_cpu_num; ++cpu_num) + if (CPU_ISSET_S(cpu_num, set_size, cpu_set)) + func(cpu_num); +} + +void init_data_structures(void) +{ + for_all_proc_cpus(set_max_cpu_pkg_num); + + cpu_setsize = CPU_ALLOC_SIZE((max_cpu_num + 1)); + + cpu_present_set = CPU_ALLOC((max_cpu_num + 1)); + if (cpu_present_set == NULL) + err(3, "CPU_ALLOC"); + CPU_ZERO_S(cpu_setsize, cpu_present_set); + for_all_proc_cpus(mark_cpu_present); +} + +/* clear has_hwp if it is not enable (or being enabled) */ + +void verify_hwp_is_enabled(void) +{ + unsigned long long msr; + + if (!has_hwp) /* set in early_cpuid() */ + return; + + /* MSR_PM_ENABLE[1] == 1 if HWP is enabled and MSRs visible */ + get_msr(base_cpu, MSR_PM_ENABLE, &msr); + if ((msr & 1) == 0) { + fprintf(stderr, "HWP can be enabled using '--hwp-enable'\n"); + has_hwp = 0; + return; + } +} + +int req_update_bounds_check(void) +{ + if (!hwp_update_enabled()) + return 0; + + /* fail if min > max requested */ + if ((update_hwp_max && update_hwp_min) && + (req_update.hwp_min > req_update.hwp_max)) { + printf("hwp-min %d > hwp_max %d\n", req_update.hwp_min, req_update.hwp_max); + return -EINVAL; + } + + /* fail if desired > max requestd */ + if (req_update.hwp_desired && update_hwp_max && + (req_update.hwp_desired > req_update.hwp_max)) { + printf("hwp-desired cannot be greater than hwp_max\n"); + return -EINVAL; + } + /* fail if desired < min requestd */ + if (req_update.hwp_desired && update_hwp_min && + (req_update.hwp_desired < req_update.hwp_min)) { + printf("hwp-desired cannot be less than hwp_min\n"); + return -EINVAL; + } + + return 0; +} + +void set_base_cpu(void) +{ + base_cpu = sched_getcpu(); + if (base_cpu < 0) + err(-ENODEV, "No valid cpus found"); +} + + +void probe_dev_msr(void) +{ + struct stat sb; + char pathname[32]; + + sprintf(pathname, "/dev/cpu/%d/msr", base_cpu); + if (stat(pathname, &sb)) + if (system("/sbin/modprobe msr > /dev/null 2>&1")) + err(-5, "no /dev/cpu/0/msr, Try \"# modprobe msr\" "); +} +/* + * early_cpuid() + * initialize turbo_is_enabled, has_hwp, has_epb + * before cmdline is parsed + */ +void early_cpuid(void) +{ + unsigned int eax, ebx, ecx, edx, max_level; + unsigned int fms, family, model; + + __get_cpuid(0, &max_level, &ebx, &ecx, &edx); + + if (max_level < 6) + errx(1, "Processor not supported\n"); + + __get_cpuid(1, &fms, &ebx, &ecx, &edx); + family = (fms >> 8) & 0xf; + model = (fms >> 4) & 0xf; + if (family == 6 || family == 0xf) + model += ((fms >> 16) & 0xf) << 4; + + if (model == 0x4F) { + unsigned long long msr; + + get_msr(base_cpu, MSR_TURBO_RATIO_LIMIT, &msr); + + bdx_highest_ratio = msr & 0xFF; + } + + __get_cpuid(0x6, &eax, &ebx, &ecx, &edx); + turbo_is_enabled = (eax >> 1) & 1; + has_hwp = (eax >> 7) & 1; + has_epb = (ecx >> 3) & 1; +} + +/* + * parse_cpuid() + * set + * has_hwp, has_hwp_notify, has_hwp_activity_window, has_hwp_epp, has_hwp_request_pkg, has_epb + */ +void parse_cpuid(void) { unsigned int eax, ebx, ecx, edx, max_level; unsigned int fms, family, model, stepping; eax = ebx = ecx = edx = 0; - asm("cpuid" : "=a" (max_level), "=b" (ebx), "=c" (ecx), - "=d" (edx) : "a" (0)); + __get_cpuid(0, &max_level, &ebx, &ecx, &edx); - if (ebx != 0x756e6547 || edx != 0x49656e69 || ecx != 0x6c65746e) { - if (verbose) - fprintf(stderr, "%.4s%.4s%.4s != GenuineIntel", - (char *)&ebx, (char *)&edx, (char *)&ecx); - exit(1); - } + if (ebx == 0x756e6547 && edx == 0x49656e69 && ecx == 0x6c65746e) + genuine_intel = 1; - asm("cpuid" : "=a" (fms), "=c" (ecx), "=d" (edx) : "a" (1) : "ebx"); + if (debug) + fprintf(stderr, "CPUID(0): %.4s%.4s%.4s ", + (char *)&ebx, (char *)&edx, (char *)&ecx); + + __get_cpuid(1, &fms, &ebx, &ecx, &edx); family = (fms >> 8) & 0xf; model = (fms >> 4) & 0xf; stepping = fms & 0xf; if (family == 6 || family == 0xf) model += ((fms >> 16) & 0xf) << 4; - if (verbose > 1) - printf("CPUID %d levels family:model:stepping " - "0x%x:%x:%x (%d:%d:%d)\n", max_level, - family, model, stepping, family, model, stepping); - - if (!(edx & (1 << 5))) { - if (verbose) - printf("CPUID: no MSR\n"); - exit(1); + if (debug) { + fprintf(stderr, "%d CPUID levels; family:model:stepping 0x%x:%x:%x (%d:%d:%d)\n", + max_level, family, model, stepping, family, model, stepping); + fprintf(stderr, "CPUID(1): %s %s %s %s %s %s %s %s\n", + ecx & (1 << 0) ? "SSE3" : "-", + ecx & (1 << 3) ? "MONITOR" : "-", + ecx & (1 << 7) ? "EIST" : "-", + ecx & (1 << 8) ? "TM2" : "-", + edx & (1 << 4) ? "TSC" : "-", + edx & (1 << 5) ? "MSR" : "-", + edx & (1 << 22) ? "ACPI-TM" : "-", + edx & (1 << 29) ? "TM" : "-"); } - /* - * Support for MSR_IA32_ENERGY_PERF_BIAS - * is indicated by CPUID.06H.ECX.bit3 - */ - asm("cpuid" : "=a" (eax), "=b" (ebx), "=c" (ecx), "=d" (edx) : "a" (6)); - if (verbose) - printf("CPUID.06H.ECX: 0x%x\n", ecx); - if (!(ecx & (1 << 3))) { - if (verbose) - printf("CPUID: No MSR_IA32_ENERGY_PERF_BIAS\n"); - exit(1); - } + if (!(edx & (1 << 5))) + errx(1, "CPUID: no MSR"); + + + __get_cpuid(0x6, &eax, &ebx, &ecx, &edx); + /* turbo_is_enabled already set */ + /* has_hwp already set */ + has_hwp_notify = eax & (1 << 8); + has_hwp_activity_window = eax & (1 << 9); + has_hwp_epp = eax & (1 << 10); + has_hwp_request_pkg = eax & (1 << 11); + + if (!has_hwp_request_pkg && update_hwp_use_pkg) + errx(1, "--hwp-use-pkg is not available on this hardware"); + + /* has_epb already set */ + + if (debug) + fprintf(stderr, + "CPUID(6): %sTURBO, %sHWP, %sHWPnotify, %sHWPwindow, %sHWPepp, %sHWPpkg, %sEPB\n", + turbo_is_enabled ? "" : "No-", + has_hwp ? "" : "No-", + has_hwp_notify ? "" : "No-", + has_hwp_activity_window ? "" : "No-", + has_hwp_epp ? "" : "No-", + has_hwp_request_pkg ? "" : "No-", + has_epb ? "" : "No-"); + return; /* success */ } -unsigned long long get_msr(int cpu, int offset) -{ - unsigned long long msr; - char msr_path[32]; - int retval; - int fd; - - sprintf(msr_path, "/dev/cpu/%d/msr", cpu); - fd = open(msr_path, O_RDONLY); - if (fd < 0) { - printf("Try \"# modprobe msr\"\n"); - perror(msr_path); - exit(1); - } - - retval = pread(fd, &msr, sizeof msr, offset); - - if (retval != sizeof msr) { - printf("pread cpu%d 0x%x = %d\n", cpu, offset, retval); - exit(-2); - } - close(fd); - return msr; -} - -unsigned long long put_msr(int cpu, unsigned long long new_msr, int offset) -{ - unsigned long long old_msr; - char msr_path[32]; - int retval; - int fd; - - sprintf(msr_path, "/dev/cpu/%d/msr", cpu); - fd = open(msr_path, O_RDWR); - if (fd < 0) { - perror(msr_path); - exit(1); - } - - retval = pread(fd, &old_msr, sizeof old_msr, offset); - if (retval != sizeof old_msr) { - perror("pwrite"); - printf("pread cpu%d 0x%x = %d\n", cpu, offset, retval); - exit(-2); - } - - retval = pwrite(fd, &new_msr, sizeof new_msr, offset); - if (retval != sizeof new_msr) { - perror("pwrite"); - printf("pwrite cpu%d 0x%x = %d\n", cpu, offset, retval); - exit(-2); - } - - close(fd); - - return old_msr; -} - -void print_msr(int cpu) -{ - printf("cpu%d: 0x%016llx\n", - cpu, get_msr(cpu, MSR_IA32_ENERGY_PERF_BIAS)); -} - -void update_msr(int cpu) -{ - unsigned long long previous_msr; - - previous_msr = put_msr(cpu, new_bias, MSR_IA32_ENERGY_PERF_BIAS); - - if (verbose) - printf("cpu%d msr0x%x 0x%016llx -> 0x%016llx\n", - cpu, MSR_IA32_ENERGY_PERF_BIAS, previous_msr, new_bias); - - return; -} - -char *proc_stat = "/proc/stat"; -/* - * run func() on every cpu in /dev/cpu - */ -void for_every_cpu(void (func)(int)) -{ - FILE *fp; - int retval; - - fp = fopen(proc_stat, "r"); - if (fp == NULL) { - perror(proc_stat); - exit(1); - } - - retval = fscanf(fp, "cpu %*d %*d %*d %*d %*d %*d %*d %*d %*d %*d\n"); - if (retval != 0) { - perror("/proc/stat format"); - exit(1); - } - - while (1) { - int cpu; - - retval = fscanf(fp, - "cpu%u %*d %*d %*d %*d %*d %*d %*d %*d %*d %*d\n", - &cpu); - if (retval != 1) - break; - - func(cpu); - } - fclose(fp); -} - int main(int argc, char **argv) { + set_base_cpu(); + probe_dev_msr(); + init_data_structures(); + + early_cpuid(); /* initial cpuid parse before cmdline */ + cmdline(argc, argv); - if (verbose > 1) - printf("x86_energy_perf_policy Nov 24, 2010" - " - Len Brown \n"); - if (verbose > 1 && !read_only) - printf("new_bias %lld\n", new_bias); + if (debug) + print_version(); - validate_cpuid(); + parse_cpuid(); - if (cpu != -1) { - if (read_only) - print_msr(cpu); - else - update_msr(cpu); - } else { - if (read_only) - for_every_cpu(print_msr); - else - for_every_cpu(update_msr); + /* If CPU-set and PKG-set are not initialized, default to all CPUs */ + if ((cpu_selected_set == 0) && (pkg_selected_set == 0)) + cpu_selected_set = cpu_present_set; + + /* + * If HWP is being enabled, do it now, so that subsequent operations + * that access HWP registers can work. + */ + if (update_hwp_enable) + for_all_cpus_in_set(cpu_setsize, cpu_selected_set, enable_hwp_on_cpu); + + /* If HWP present, but disabled, warn and ignore from here forward */ + verify_hwp_is_enabled(); + + if (req_update_bounds_check()) + return -EINVAL; + + /* display information only, no updates to settings */ + if (!update_epb && !update_turbo && !hwp_update_enabled()) { + if (cpu_selected_set) + for_all_cpus_in_set(cpu_setsize, cpu_selected_set, print_cpu_msrs); + + if (has_hwp_request_pkg) { + if (pkg_selected_set == 0) + pkg_selected_set = pkg_present_set; + + for_packages(pkg_selected_set, print_pkg_msrs); + } + + return 0; } + /* update CPU set */ + if (cpu_selected_set) { + for_all_cpus_in_set(cpu_setsize, cpu_selected_set, update_sysfs); + for_all_cpus_in_set(cpu_setsize, cpu_selected_set, update_cpu_msrs); + } else if (pkg_selected_set) + for_packages(pkg_selected_set, update_hwp_request_pkg); + return 0; } From 3cedbc5a6d7f7c5539e139f89ec9f6e1ed668418 Mon Sep 17 00:00:00 2001 From: Len Brown Date: Mon, 1 May 2017 23:06:08 -0400 Subject: [PATCH 05/69] intel_pstate: use updated msr-index.h HWP.EPP values intel_pstate exports sysfs attributes for setting and observing HWP.EPP. These attributes use strings to describe 4 operating states, and inside the driver, these strings are mapped to numerical register values. The authorative mapping between the strings and numerical HWP.EPP values are now globally defined in msr-index.h, replacing the out-dated mapping that were open-coded into intel_pstate.c new old string --- --- ------ 0 0 performance 128 64 balance_performance 192 128 balance_power 255 192 power Note that the HW and BIOS default value on most system is 128, which intel_pstate will now call "balance_performance" while it used to call it "balance_power". Signed-off-by: Len Brown --- drivers/cpufreq/intel_pstate.c | 34 +++++++++++++++------------------- 1 file changed, 15 insertions(+), 19 deletions(-) diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c index 50bd6d987fc3..ab8ebaeb3621 100644 --- a/drivers/cpufreq/intel_pstate.c +++ b/drivers/cpufreq/intel_pstate.c @@ -716,6 +716,12 @@ static const char * const energy_perf_strings[] = { "power", NULL }; +static const unsigned int epp_values[] = { + HWP_EPP_PERFORMANCE, + HWP_EPP_BALANCE_PERFORMANCE, + HWP_EPP_BALANCE_POWERSAVE, + HWP_EPP_POWERSAVE +}; static int intel_pstate_get_energy_pref_index(struct cpudata *cpu_data) { @@ -727,17 +733,14 @@ static int intel_pstate_get_energy_pref_index(struct cpudata *cpu_data) return epp; if (static_cpu_has(X86_FEATURE_HWP_EPP)) { - /* - * Range: - * 0x00-0x3F : Performance - * 0x40-0x7F : Balance performance - * 0x80-0xBF : Balance power - * 0xC0-0xFF : Power - * The EPP is a 8 bit value, but our ranges restrict the - * value which can be set. Here only using top two bits - * effectively. - */ - index = (epp >> 6) + 1; + if (epp == HWP_EPP_PERFORMANCE) + return 1; + if (epp <= HWP_EPP_BALANCE_PERFORMANCE) + return 2; + if (epp <= HWP_EPP_BALANCE_POWERSAVE) + return 3; + else + return 4; } else if (static_cpu_has(X86_FEATURE_EPB)) { /* * Range: @@ -775,15 +778,8 @@ static int intel_pstate_set_energy_pref_index(struct cpudata *cpu_data, value &= ~GENMASK_ULL(31, 24); - /* - * If epp is not default, convert from index into - * energy_perf_strings to epp value, by shifting 6 - * bits left to use only top two bits in epp. - * The resultant epp need to shifted by 24 bits to - * epp position in MSR_HWP_REQUEST. - */ if (epp == -EINVAL) - epp = (pref_index - 1) << 6; + epp = epp_values[pref_index - 1]; value |= (u64)epp << 24; ret = wrmsrl_on_cpu(cpu_data->cpu, MSR_HWP_REQUEST, value); From 64fd1c7040880292710e6592ddc88d0d73cfb6fb Mon Sep 17 00:00:00 2001 From: "Rafael J. Wysocki" Date: Mon, 12 Jun 2017 22:48:41 +0200 Subject: [PATCH 06/69] ACPI / PM: Run wakeup notify handlers synchronously The work functions provided by the users of acpi_add_pm_notifier() should be run synchronously before re-enabling the wakeup GPE in case they are used to clear the status and/or disable the wakeup signaling at the source. Otherwise, which is the case currently in the PCI bus type code, the same wakeup event may be signaled for multiple times while the execution of the work function in response to it has already been queued up. Fortunately, acpi_add_pm_notifier() is only used by PCI and by ACPI device PM code internally, so the change is relatively straightforward to make. Signed-off-by: Rafael J. Wysocki Acked-by: Bjorn Helgaas --- drivers/acpi/device_pm.c | 27 +++++++++++---------------- drivers/pci/pci-acpi.c | 17 +++++++---------- include/acpi/acpi_bus.h | 4 ++-- 3 files changed, 20 insertions(+), 28 deletions(-) diff --git a/drivers/acpi/device_pm.c b/drivers/acpi/device_pm.c index 993fd31394c8..a199983390b6 100644 --- a/drivers/acpi/device_pm.c +++ b/drivers/acpi/device_pm.c @@ -400,8 +400,8 @@ static void acpi_pm_notify_handler(acpi_handle handle, u32 val, void *not_used) if (adev->wakeup.flags.notifier_present) { __pm_wakeup_event(adev->wakeup.ws, 0); - if (adev->wakeup.context.work.func) - queue_pm_work(&adev->wakeup.context.work); + if (adev->wakeup.context.func) + adev->wakeup.context.func(&adev->wakeup.context); } mutex_unlock(&acpi_pm_notifier_lock); @@ -413,7 +413,7 @@ static void acpi_pm_notify_handler(acpi_handle handle, u32 val, void *not_used) * acpi_add_pm_notifier - Register PM notify handler for given ACPI device. * @adev: ACPI device to add the notify handler for. * @dev: Device to generate a wakeup event for while handling the notification. - * @work_func: Work function to execute when handling the notification. + * @func: Work function to execute when handling the notification. * * NOTE: @adev need not be a run-wake or wakeup device to be a valid source of * PM wakeup events. For example, wakeup events may be generated for bridges @@ -421,11 +421,11 @@ static void acpi_pm_notify_handler(acpi_handle handle, u32 val, void *not_used) * bridge itself doesn't have a wakeup GPE associated with it. */ acpi_status acpi_add_pm_notifier(struct acpi_device *adev, struct device *dev, - void (*work_func)(struct work_struct *work)) + void (*func)(struct acpi_device_wakeup_context *context)) { acpi_status status = AE_ALREADY_EXISTS; - if (!dev && !work_func) + if (!dev && !func) return AE_BAD_PARAMETER; mutex_lock(&acpi_pm_notifier_lock); @@ -435,8 +435,7 @@ acpi_status acpi_add_pm_notifier(struct acpi_device *adev, struct device *dev, adev->wakeup.ws = wakeup_source_register(dev_name(&adev->dev)); adev->wakeup.context.dev = dev; - if (work_func) - INIT_WORK(&adev->wakeup.context.work, work_func); + adev->wakeup.context.func = func; status = acpi_install_notify_handler(adev->handle, ACPI_SYSTEM_NOTIFY, acpi_pm_notify_handler, NULL); @@ -469,10 +468,7 @@ acpi_status acpi_remove_pm_notifier(struct acpi_device *adev) if (ACPI_FAILURE(status)) goto out; - if (adev->wakeup.context.work.func) { - cancel_work_sync(&adev->wakeup.context.work); - adev->wakeup.context.work.func = NULL; - } + adev->wakeup.context.func = NULL; adev->wakeup.context.dev = NULL; wakeup_source_unregister(adev->wakeup.ws); @@ -658,16 +654,15 @@ EXPORT_SYMBOL(acpi_pm_device_sleep_state); /** * acpi_pm_notify_work_func - ACPI devices wakeup notification work function. - * @work: Work item to handle. + * @context: Device wakeup context. */ -static void acpi_pm_notify_work_func(struct work_struct *work) +static void acpi_pm_notify_work_func(struct acpi_device_wakeup_context *context) { - struct device *dev; + struct device *dev = context->dev; - dev = container_of(work, struct acpi_device_wakeup_context, work)->dev; if (dev) { pm_wakeup_event(dev, 0); - pm_runtime_resume(dev); + pm_request_resume(dev); } } diff --git a/drivers/pci/pci-acpi.c b/drivers/pci/pci-acpi.c index 001860361434..9284ae6f669a 100644 --- a/drivers/pci/pci-acpi.c +++ b/drivers/pci/pci-acpi.c @@ -395,29 +395,26 @@ bool pciehp_is_native(struct pci_dev *pdev) /** * pci_acpi_wake_bus - Root bus wakeup notification fork function. - * @work: Work item to handle. + * @context: Device wakeup context. */ -static void pci_acpi_wake_bus(struct work_struct *work) +static void pci_acpi_wake_bus(struct acpi_device_wakeup_context *context) { struct acpi_device *adev; struct acpi_pci_root *root; - adev = container_of(work, struct acpi_device, wakeup.context.work); + adev = container_of(context, struct acpi_device, wakeup.context); root = acpi_driver_data(adev); pci_pme_wakeup_bus(root->bus); } /** * pci_acpi_wake_dev - PCI device wakeup notification work function. - * @handle: ACPI handle of a device the notification is for. - * @work: Work item to handle. + * @context: Device wakeup context. */ -static void pci_acpi_wake_dev(struct work_struct *work) +static void pci_acpi_wake_dev(struct acpi_device_wakeup_context *context) { - struct acpi_device_wakeup_context *context; struct pci_dev *pci_dev; - context = container_of(work, struct acpi_device_wakeup_context, work); pci_dev = to_pci_dev(context->dev); if (pci_dev->pme_poll) @@ -425,7 +422,7 @@ static void pci_acpi_wake_dev(struct work_struct *work) if (pci_dev->current_state == PCI_D3cold) { pci_wakeup_event(pci_dev); - pm_runtime_resume(&pci_dev->dev); + pm_request_resume(&pci_dev->dev); return; } @@ -434,7 +431,7 @@ static void pci_acpi_wake_dev(struct work_struct *work) pci_check_pme_status(pci_dev); pci_wakeup_event(pci_dev); - pm_runtime_resume(&pci_dev->dev); + pm_request_resume(&pci_dev->dev); pci_pme_wakeup_bus(pci_dev->subordinate); } diff --git a/include/acpi/acpi_bus.h b/include/acpi/acpi_bus.h index 197f3fffc9a7..79c0af419300 100644 --- a/include/acpi/acpi_bus.h +++ b/include/acpi/acpi_bus.h @@ -319,7 +319,7 @@ struct acpi_device_wakeup_flags { }; struct acpi_device_wakeup_context { - struct work_struct work; + void (*func)(struct acpi_device_wakeup_context *context); struct device *dev; }; @@ -599,7 +599,7 @@ static inline bool acpi_device_always_present(struct acpi_device *adev) #ifdef CONFIG_PM acpi_status acpi_add_pm_notifier(struct acpi_device *adev, struct device *dev, - void (*work_func)(struct work_struct *work)); + void (*func)(struct acpi_device_wakeup_context *context)); acpi_status acpi_remove_pm_notifier(struct acpi_device *adev); int acpi_pm_device_sleep_state(struct device *, int *, int); int acpi_pm_device_run_wake(struct device *, bool); From d438aa223e931bfb74dd459bb6c7f3139c48b528 Mon Sep 17 00:00:00 2001 From: "Rafael J. Wysocki" Date: Mon, 12 Jun 2017 22:49:40 +0200 Subject: [PATCH 07/69] USB / PCI / PM: Allow the PCI core to do the resume cleanup hcd_pci_resume_noirq() used as a universal _resume_noirq handler for PCI USB controllers calls pci_back_from_sleep() which is unnecessary and may become problematic. It is unnecessary, because the PCI bus type carries out post-suspend cleanup of all PCI devices during resume and that covers all things done by the pci_back_from_sleep(). There is no reason why USB cannot follow all of the other PCI devices in that respect. It will become problematic after subsequent changes that make it possible to go back to sleep again after executing dpm_resume_noirq() if no valid system wakeup events have been detected at that point. Namely, calling pci_back_from_sleep() at the _resume_noirq stage will cause the wakeup status of the devices in question to be cleared and if any of them has triggered system wakeup, that event may be missed then. For the above reasons, drop the pci_back_from_sleep() invocation from hcd_pci_resume_noirq(). Signed-off-by: Rafael J. Wysocki Acked-by: Alan Stern Acked-by: Greg Kroah-Hartman --- drivers/usb/core/hcd-pci.c | 7 +------ 1 file changed, 1 insertion(+), 6 deletions(-) diff --git a/drivers/usb/core/hcd-pci.c b/drivers/usb/core/hcd-pci.c index 7859d738df41..ea829ad798c0 100644 --- a/drivers/usb/core/hcd-pci.c +++ b/drivers/usb/core/hcd-pci.c @@ -584,12 +584,7 @@ static int hcd_pci_suspend_noirq(struct device *dev) static int hcd_pci_resume_noirq(struct device *dev) { - struct pci_dev *pci_dev = to_pci_dev(dev); - - powermac_set_asic(pci_dev, 1); - - /* Go back to D0 and disable remote wakeup */ - pci_back_from_sleep(pci_dev); + powermac_set_asic(to_pci_dev(dev), 1); return 0; } From 190cab84711a3f453e2100d0c9238f42261cf426 Mon Sep 17 00:00:00 2001 From: "Rafael J. Wysocki" Date: Mon, 12 Jun 2017 22:50:24 +0200 Subject: [PATCH 08/69] ACPI / PM: Change log level of wakeup-related message Change the log level of the "System wakeup enabled/disabled by ACPI" message in acpi_pm_device_sleep_wake() to "debug" to reduce to log noise level. Signed-off-by: Rafael J. Wysocki --- drivers/acpi/device_pm.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/acpi/device_pm.c b/drivers/acpi/device_pm.c index a199983390b6..fa7405286d71 100644 --- a/drivers/acpi/device_pm.c +++ b/drivers/acpi/device_pm.c @@ -756,8 +756,8 @@ int acpi_pm_device_sleep_wake(struct device *dev, bool enable) error = acpi_device_wakeup(adev, acpi_target_system_state(), enable); if (!error) - dev_info(dev, "System wakeup %s by ACPI\n", - enable ? "enabled" : "disabled"); + dev_dbg(dev, "System wakeup %s by ACPI\n", + enable ? "enabled" : "disabled"); return error; } From 235d81a630ca2d39818da96f0c14bc960ffbaeb5 Mon Sep 17 00:00:00 2001 From: "Rafael J. Wysocki" Date: Mon, 12 Jun 2017 22:51:07 +0200 Subject: [PATCH 09/69] ACPI / PM: Clean up device wakeup enable/disable code The wakeup.flags.enabled flag in struct acpi_device is not used consistently, as there is no reason why it should only apply to the enabling/disabling of the wakeup GPE, so put the invocation of acpi_enable_wakeup_device_power() under it too. Moreover, it is not necessary to call acpi_enable_wakeup_devices() and acpi_disable_wakeup_devices() for suspend-to-idle, so don't do that. Signed-off-by: Rafael J. Wysocki --- drivers/acpi/device_pm.c | 19 ++++++++----------- drivers/acpi/sleep.c | 4 ++-- 2 files changed, 10 insertions(+), 13 deletions(-) diff --git a/drivers/acpi/device_pm.c b/drivers/acpi/device_pm.c index fa7405286d71..f13c62c4b117 100644 --- a/drivers/acpi/device_pm.c +++ b/drivers/acpi/device_pm.c @@ -688,26 +688,23 @@ static int acpi_device_wakeup(struct acpi_device *adev, u32 target_state, acpi_status res; int error; + if (adev->wakeup.flags.enabled) + return 0; + error = acpi_enable_wakeup_device_power(adev, target_state); if (error) return error; - if (adev->wakeup.flags.enabled) - return 0; - res = acpi_enable_gpe(wakeup->gpe_device, wakeup->gpe_number); - if (ACPI_SUCCESS(res)) { - adev->wakeup.flags.enabled = 1; - } else { + if (ACPI_FAILURE(res)) { acpi_disable_wakeup_device_power(adev); return -EIO; } - } else { - if (adev->wakeup.flags.enabled) { - acpi_disable_gpe(wakeup->gpe_device, wakeup->gpe_number); - adev->wakeup.flags.enabled = 0; - } + adev->wakeup.flags.enabled = 1; + } else if (adev->wakeup.flags.enabled) { + acpi_disable_gpe(wakeup->gpe_device, wakeup->gpe_number); acpi_disable_wakeup_device_power(adev); + adev->wakeup.flags.enabled = 0; } return 0; } diff --git a/drivers/acpi/sleep.c b/drivers/acpi/sleep.c index 097d630ab886..a4782c75ebdd 100644 --- a/drivers/acpi/sleep.c +++ b/drivers/acpi/sleep.c @@ -658,19 +658,19 @@ static int acpi_freeze_begin(void) static int acpi_freeze_prepare(void) { - acpi_enable_wakeup_devices(ACPI_STATE_S0); acpi_enable_all_wakeup_gpes(); acpi_os_wait_events_complete(); if (acpi_sci_irq_valid()) enable_irq_wake(acpi_sci_irq); + return 0; } static void acpi_freeze_restore(void) { - acpi_disable_wakeup_devices(ACPI_STATE_S0); if (acpi_sci_irq_valid()) disable_irq_wake(acpi_sci_irq); + acpi_enable_all_runtime_gpes(); } From 604d895857c2fc6748feb805c2290a60491eed43 Mon Sep 17 00:00:00 2001 From: "Rafael J. Wysocki" Date: Mon, 12 Jun 2017 22:51:55 +0200 Subject: [PATCH 10/69] PM / sleep: Print timing information if debug is enabled Avoid printing the device suspend/resume timing information if CONFIG_PM_DEBUG is not set to reduce the log noise level. Signed-off-by: Rafael J. Wysocki --- drivers/base/power/main.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c index 9faee1c893e5..253f860e8981 100644 --- a/drivers/base/power/main.c +++ b/drivers/base/power/main.c @@ -417,6 +417,7 @@ static void pm_dev_err(struct device *dev, pm_message_t state, char *info, dev_name(dev), pm_verb(state.event), info, error); } +#ifdef CONFIG_PM_DEBUG static void dpm_show_time(ktime_t starttime, pm_message_t state, char *info) { ktime_t calltime; @@ -433,6 +434,9 @@ static void dpm_show_time(ktime_t starttime, pm_message_t state, char *info) info ?: "", info ? " " : "", pm_verb(state.event), usecs / USEC_PER_MSEC, usecs % USEC_PER_MSEC); } +#else +static inline void dpm_show_time(ktime_t starttime, pm_message_t state, char *info) {} +#endif /* CONFIG_PM_DEBUG */ static int dpm_run_callback(pm_callback_t cb, struct device *dev, pm_message_t state, char *info) From dc15e71eefc766373833602c353cf6b4f49da036 Mon Sep 17 00:00:00 2001 From: "Rafael J. Wysocki" Date: Mon, 12 Jun 2017 22:53:36 +0200 Subject: [PATCH 11/69] PCI / PM: Restore PME Enable if skipping wakeup setup The wakeup_prepared PCI device flag is used for preventing subsequent changes of PCI device wakeup settings in the same way (e.g. enabling device wakeup twice in a row). However, in some cases PME Enable may be updated by things like PCI configuration space restoration in the meantime and it may need to be set again even though the rest of the settings need not change, so modify __pci_enable_wake() to do that when it is about to return early. Also, it is reasonable to expect that __pci_enable_wake() will always clear PME Status when invoked to disable device wakeup, so make it do so even if it is going to return early then. Signed-off-by: Rafael J. Wysocki Acked-by: Bjorn Helgaas --- drivers/pci/pci.c | 26 ++++++++++++++++++++++++-- 1 file changed, 24 insertions(+), 2 deletions(-) diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index 563901cd9c06..5641035e58fa 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -1805,6 +1805,23 @@ static void __pci_pme_active(struct pci_dev *dev, bool enable) pci_write_config_word(dev, dev->pm_cap + PCI_PM_CTRL, pmcsr); } +static void pci_pme_restore(struct pci_dev *dev) +{ + u16 pmcsr; + + if (!dev->pme_support) + return; + + pci_read_config_word(dev, dev->pm_cap + PCI_PM_CTRL, &pmcsr); + if (dev->wakeup_prepared) { + pmcsr |= PCI_PM_CTRL_PME_ENABLE; + } else { + pmcsr &= ~PCI_PM_CTRL_PME_ENABLE; + pmcsr |= PCI_PM_CTRL_PME_STATUS; + } + pci_write_config_word(dev, dev->pm_cap + PCI_PM_CTRL, pmcsr); +} + /** * pci_pme_active - enable or disable PCI device's PME# function * @dev: PCI device to handle. @@ -1899,9 +1916,14 @@ int __pci_enable_wake(struct pci_dev *dev, pci_power_t state, if (enable && !runtime && !device_may_wakeup(&dev->dev)) return -EINVAL; - /* Don't do the same thing twice in a row for one device. */ - if (!!enable == !!dev->wakeup_prepared) + /* + * Don't do the same thing twice in a row for one device, but restore + * PME Enable in case it has been updated by config space restoration. + */ + if (!!enable == !!dev->wakeup_prepared) { + pci_pme_restore(dev); return 0; + } /* * According to "PCI System Architecture" 4th ed. by Tom Shanley & Don From 63dada87f7ef7d4a536765c816fbbe7c4b9f3c85 Mon Sep 17 00:00:00 2001 From: Hans de Goede Date: Mon, 12 Jun 2017 22:55:46 +0200 Subject: [PATCH 12/69] platform/x86: Add driver for ACPI INT0002 Virtual GPIO device Some peripherals on Bay Trail and Cherry Trail platforms signal a Power Management Event (PME) to the Power Management Controller (PMC) to wakeup the system. When this happens software needs to explicitly clear the PME bus 0 status bit in the GPE0a_STS register to avoid an IRQ storm on IRQ 9. This is modelled in ACPI through the INT0002 ACPI device, which is called a "Virtual GPIO controller" in ACPI because it defines the event handler to call when the PME triggers through _AEI and _L02 methods as would be done for a real GPIO interrupt in ACPI. This commit adds a driver which registers the Virtual GPIOs expected by the DSDT on these devices, letting gpiolib-acpi claim the virtual GPIO and install a GPIO-interrupt handler which call the _L02 handler as it would for a real GPIO controller. Signed-off-by: Hans de Goede Reviewed-by: Andy Shevchenko Reviewed-by: Linus Walleij Signed-off-by: Rafael J. Wysocki --- drivers/platform/x86/Kconfig | 19 ++ drivers/platform/x86/Makefile | 1 + drivers/platform/x86/intel_int0002_vgpio.c | 219 +++++++++++++++++++++ 3 files changed, 239 insertions(+) create mode 100644 drivers/platform/x86/intel_int0002_vgpio.c diff --git a/drivers/platform/x86/Kconfig b/drivers/platform/x86/Kconfig index 8489020ecf44..a3ccc3c795a5 100644 --- a/drivers/platform/x86/Kconfig +++ b/drivers/platform/x86/Kconfig @@ -794,6 +794,25 @@ config INTEL_CHT_INT33FE This driver instantiates i2c-clients for these, so that standard i2c drivers for these chips can bind to the them. +config INTEL_INT0002_VGPIO + tristate "Intel ACPI INT0002 Virtual GPIO driver" + depends on GPIOLIB && ACPI + select GPIOLIB_IRQCHIP + ---help--- + Some peripherals on Bay Trail and Cherry Trail platforms signal a + Power Management Event (PME) to the Power Management Controller (PMC) + to wakeup the system. When this happens software needs to explicitly + clear the PME bus 0 status bit in the GPE0a_STS register to avoid an + IRQ storm on IRQ 9. + + This is modelled in ACPI through the INT0002 ACPI device, which is + called a "Virtual GPIO controller" in ACPI because it defines the + event handler to call when the PME triggers through _AEI and _L02 + methods as would be done for a real GPIO interrupt in ACPI. + + To compile this driver as a module, choose M here: the module will + be called intel_int0002_vgpio. + config INTEL_HID_EVENT tristate "INTEL HID Event" depends on ACPI diff --git a/drivers/platform/x86/Makefile b/drivers/platform/x86/Makefile index 182a3ed6605a..ab22ce77fb66 100644 --- a/drivers/platform/x86/Makefile +++ b/drivers/platform/x86/Makefile @@ -46,6 +46,7 @@ obj-$(CONFIG_TOSHIBA_BT_RFKILL) += toshiba_bluetooth.o obj-$(CONFIG_TOSHIBA_HAPS) += toshiba_haps.o obj-$(CONFIG_TOSHIBA_WMI) += toshiba-wmi.o obj-$(CONFIG_INTEL_CHT_INT33FE) += intel_cht_int33fe.o +obj-$(CONFIG_INTEL_INT0002_VGPIO) += intel_int0002_vgpio.o obj-$(CONFIG_INTEL_HID_EVENT) += intel-hid.o obj-$(CONFIG_INTEL_VBTN) += intel-vbtn.o obj-$(CONFIG_INTEL_SCU_IPC) += intel_scu_ipc.o diff --git a/drivers/platform/x86/intel_int0002_vgpio.c b/drivers/platform/x86/intel_int0002_vgpio.c new file mode 100644 index 000000000000..92dc230ef5b2 --- /dev/null +++ b/drivers/platform/x86/intel_int0002_vgpio.c @@ -0,0 +1,219 @@ +/* + * Intel INT0002 "Virtual GPIO" driver + * + * Copyright (C) 2017 Hans de Goede + * + * Loosely based on android x86 kernel code which is: + * + * Copyright (c) 2014, Intel Corporation. + * + * Author: Dyut Kumar Sil + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + * + * Some peripherals on Bay Trail and Cherry Trail platforms signal a Power + * Management Event (PME) to the Power Management Controller (PMC) to wakeup + * the system. When this happens software needs to clear the PME bus 0 status + * bit in the GPE0a_STS register to avoid an IRQ storm on IRQ 9. + * + * This is modelled in ACPI through the INT0002 ACPI device, which is + * called a "Virtual GPIO controller" in ACPI because it defines the event + * handler to call when the PME triggers through _AEI and _L02 / _E02 + * methods as would be done for a real GPIO interrupt in ACPI. Note this + * is a hack to define an AML event handler for the PME while using existing + * ACPI mechanisms, this is not a real GPIO at all. + * + * This driver will bind to the INT0002 device, and register as a GPIO + * controller, letting gpiolib-acpi.c call the _L02 handler as it would + * for a real GPIO controller. + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include +#include + +#define DRV_NAME "INT0002 Virtual GPIO" + +/* For some reason the virtual GPIO pin tied to the GPE is numbered pin 2 */ +#define GPE0A_PME_B0_VIRT_GPIO_PIN 2 + +#define GPE0A_PME_B0_STS_BIT BIT(13) +#define GPE0A_PME_B0_EN_BIT BIT(13) +#define GPE0A_STS_PORT 0x420 +#define GPE0A_EN_PORT 0x428 + +#define ICPU(model) { X86_VENDOR_INTEL, 6, model, X86_FEATURE_ANY, } + +static const struct x86_cpu_id int0002_cpu_ids[] = { +/* + * Limit ourselves to Cherry Trail for now, until testing shows we + * need to handle the INT0002 device on Baytrail too. + * ICPU(INTEL_FAM6_ATOM_SILVERMONT1), * Valleyview, Bay Trail * + */ + ICPU(INTEL_FAM6_ATOM_AIRMONT), /* Braswell, Cherry Trail */ + {} +}; + +/* + * As this is not a real GPIO at all, but just a hack to model an event in + * ACPI the get / set functions are dummy functions. + */ + +static int int0002_gpio_get(struct gpio_chip *chip, unsigned int offset) +{ + return 0; +} + +static void int0002_gpio_set(struct gpio_chip *chip, unsigned int offset, + int value) +{ +} + +static int int0002_gpio_direction_output(struct gpio_chip *chip, + unsigned int offset, int value) +{ + return 0; +} + +static void int0002_irq_ack(struct irq_data *data) +{ + outl(GPE0A_PME_B0_STS_BIT, GPE0A_STS_PORT); +} + +static void int0002_irq_unmask(struct irq_data *data) +{ + u32 gpe_en_reg; + + gpe_en_reg = inl(GPE0A_EN_PORT); + gpe_en_reg |= GPE0A_PME_B0_EN_BIT; + outl(gpe_en_reg, GPE0A_EN_PORT); +} + +static void int0002_irq_mask(struct irq_data *data) +{ + u32 gpe_en_reg; + + gpe_en_reg = inl(GPE0A_EN_PORT); + gpe_en_reg &= ~GPE0A_PME_B0_EN_BIT; + outl(gpe_en_reg, GPE0A_EN_PORT); +} + +static irqreturn_t int0002_irq(int irq, void *data) +{ + struct gpio_chip *chip = data; + u32 gpe_sts_reg; + + gpe_sts_reg = inl(GPE0A_STS_PORT); + if (!(gpe_sts_reg & GPE0A_PME_B0_STS_BIT)) + return IRQ_NONE; + + generic_handle_irq(irq_find_mapping(chip->irqdomain, + GPE0A_PME_B0_VIRT_GPIO_PIN)); + + pm_system_wakeup(); + + return IRQ_HANDLED; +} + +static struct irq_chip int0002_irqchip = { + .name = DRV_NAME, + .irq_ack = int0002_irq_ack, + .irq_mask = int0002_irq_mask, + .irq_unmask = int0002_irq_unmask, +}; + +static int int0002_probe(struct platform_device *pdev) +{ + struct device *dev = &pdev->dev; + const struct x86_cpu_id *cpu_id; + struct gpio_chip *chip; + int irq, ret; + + /* Menlow has a different INT0002 device? */ + cpu_id = x86_match_cpu(int0002_cpu_ids); + if (!cpu_id) + return -ENODEV; + + irq = platform_get_irq(pdev, 0); + if (irq < 0) { + dev_err(dev, "Error getting IRQ: %d\n", irq); + return irq; + } + + chip = devm_kzalloc(dev, sizeof(*chip), GFP_KERNEL); + if (!chip) + return -ENOMEM; + + chip->label = DRV_NAME; + chip->parent = dev; + chip->owner = THIS_MODULE; + chip->get = int0002_gpio_get; + chip->set = int0002_gpio_set; + chip->direction_input = int0002_gpio_get; + chip->direction_output = int0002_gpio_direction_output; + chip->base = -1; + chip->ngpio = GPE0A_PME_B0_VIRT_GPIO_PIN + 1; + chip->irq_need_valid_mask = true; + + ret = devm_gpiochip_add_data(&pdev->dev, chip, NULL); + if (ret) { + dev_err(dev, "Error adding gpio chip: %d\n", ret); + return ret; + } + + bitmap_clear(chip->irq_valid_mask, 0, GPE0A_PME_B0_VIRT_GPIO_PIN); + + /* + * We manually request the irq here instead of passing a flow-handler + * to gpiochip_set_chained_irqchip, because the irq is shared. + */ + ret = devm_request_irq(dev, irq, int0002_irq, + IRQF_SHARED | IRQF_NO_THREAD, "INT0002", chip); + if (ret) { + dev_err(dev, "Error requesting IRQ %d: %d\n", irq, ret); + return ret; + } + + ret = gpiochip_irqchip_add(chip, &int0002_irqchip, 0, handle_edge_irq, + IRQ_TYPE_NONE); + if (ret) { + dev_err(dev, "Error adding irqchip: %d\n", ret); + return ret; + } + + gpiochip_set_chained_irqchip(chip, &int0002_irqchip, irq, NULL); + + return 0; +} + +static const struct acpi_device_id int0002_acpi_ids[] = { + { "INT0002", 0 }, + { }, +}; +MODULE_DEVICE_TABLE(acpi, int0002_acpi_ids); + +static struct platform_driver int0002_driver = { + .driver = { + .name = DRV_NAME, + .acpi_match_table = int0002_acpi_ids, + }, + .probe = int0002_probe, +}; + +module_platform_driver(int0002_driver); + +MODULE_AUTHOR("Hans de Goede "); +MODULE_DESCRIPTION("Intel INT0002 Virtual GPIO driver"); +MODULE_LICENSE("GPL"); From 33e4f80ee69b5168badf37edbfed796eb48434b9 Mon Sep 17 00:00:00 2001 From: "Rafael J. Wysocki" Date: Mon, 12 Jun 2017 22:56:34 +0200 Subject: [PATCH 13/69] ACPI / PM: Ignore spurious SCI wakeups from suspend-to-idle The ACPI SCI (System Control Interrupt) is set up as a wakeup IRQ during suspend-to-idle transitions and, consequently, any events signaled through it wake up the system from that state. However, on some systems some of the events signaled via the ACPI SCI while suspended to idle should not cause the system to wake up. In fact, quite often they should just be discarded. Arguably, systems should not resume entirely on such events, but in order to decide which events really should cause the system to resume and which are spurious, it is necessary to resume up to the point when ACPI SCIs are actually handled and processed, which is after executing dpm_resume_noirq() in the system resume path. For this reasons, add a loop around freeze_enter() in which the platforms can process events signaled via multiplexed IRQ lines like the ACPI SCI and add suspend-to-idle hooks that can be used for this purpose to struct platform_freeze_ops. In the ACPI case, the ->wake hook is used for checking if the SCI has triggered while suspended and deferring the interrupt-induced system wakeup until the events signaled through it are actually processed sufficiently to decide whether or not the system should resume. In turn, the ->sync hook allows all of the relevant event queues to be flushed so as to prevent events from being missed due to race conditions. In addition to that, some ACPI code processing wakeup events needs to be modified to use the "hard" version of wakeup triggers, so that it will cause a system resume to happen on device-induced wakeup events even if the "soft" mechanism to prevent the system from suspending is not enabled. However, to preserve the existing behavior with respect to suspend-to-RAM, this only is done in the suspend-to-idle case and only if an SCI has occurred while suspended. Signed-off-by: Rafael J. Wysocki --- drivers/acpi/battery.c | 2 +- drivers/acpi/button.c | 5 +++-- drivers/acpi/device_pm.c | 9 ++++++++- drivers/acpi/internal.h | 2 ++ drivers/acpi/sleep.c | 37 +++++++++++++++++++++++++++++++++++++ drivers/base/power/main.c | 5 ----- drivers/base/power/wakeup.c | 18 ++++++++++++------ include/acpi/acpi_bus.h | 6 +++++- include/linux/suspend.h | 7 +++++-- kernel/power/process.c | 2 +- kernel/power/suspend.c | 35 +++++++++++++++++++++++++++++------ 11 files changed, 103 insertions(+), 25 deletions(-) diff --git a/drivers/acpi/battery.c b/drivers/acpi/battery.c index d42eeef9d928..1cbb88d938e5 100644 --- a/drivers/acpi/battery.c +++ b/drivers/acpi/battery.c @@ -782,7 +782,7 @@ static int acpi_battery_update(struct acpi_battery *battery, bool resume) if ((battery->state & ACPI_BATTERY_STATE_CRITICAL) || (test_bit(ACPI_BATTERY_ALARM_PRESENT, &battery->flags) && (battery->capacity_now <= battery->alarm))) - pm_wakeup_event(&battery->device->dev, 0); + acpi_pm_wakeup_event(&battery->device->dev); return result; } diff --git a/drivers/acpi/button.c b/drivers/acpi/button.c index e19f530f1083..91cfdf377df7 100644 --- a/drivers/acpi/button.c +++ b/drivers/acpi/button.c @@ -217,7 +217,7 @@ static int acpi_lid_notify_state(struct acpi_device *device, int state) } if (state) - pm_wakeup_event(&device->dev, 0); + acpi_pm_wakeup_event(&device->dev); ret = blocking_notifier_call_chain(&acpi_lid_notifier, state, device); if (ret == NOTIFY_DONE) @@ -402,7 +402,7 @@ static void acpi_button_notify(struct acpi_device *device, u32 event) } else { int keycode; - pm_wakeup_event(&device->dev, 0); + acpi_pm_wakeup_event(&device->dev); if (button->suspended) break; @@ -534,6 +534,7 @@ static int acpi_button_add(struct acpi_device *device) lid_device = device; } + device_init_wakeup(&device->dev, true); printk(KERN_INFO PREFIX "%s [%s]\n", name, acpi_device_bid(device)); return 0; diff --git a/drivers/acpi/device_pm.c b/drivers/acpi/device_pm.c index f13c62c4b117..ca0210213773 100644 --- a/drivers/acpi/device_pm.c +++ b/drivers/acpi/device_pm.c @@ -24,6 +24,7 @@ #include #include #include +#include #include "internal.h" @@ -385,6 +386,12 @@ EXPORT_SYMBOL(acpi_bus_power_manageable); #ifdef CONFIG_PM static DEFINE_MUTEX(acpi_pm_notifier_lock); +void acpi_pm_wakeup_event(struct device *dev) +{ + pm_wakeup_dev_event(dev, 0, acpi_s2idle_wakeup()); +} +EXPORT_SYMBOL_GPL(acpi_pm_wakeup_event); + static void acpi_pm_notify_handler(acpi_handle handle, u32 val, void *not_used) { struct acpi_device *adev; @@ -399,7 +406,7 @@ static void acpi_pm_notify_handler(acpi_handle handle, u32 val, void *not_used) mutex_lock(&acpi_pm_notifier_lock); if (adev->wakeup.flags.notifier_present) { - __pm_wakeup_event(adev->wakeup.ws, 0); + pm_wakeup_ws_event(adev->wakeup.ws, 0, acpi_s2idle_wakeup()); if (adev->wakeup.context.func) adev->wakeup.context.func(&adev->wakeup.context); } diff --git a/drivers/acpi/internal.h b/drivers/acpi/internal.h index 66229ffa909b..75924ea69071 100644 --- a/drivers/acpi/internal.h +++ b/drivers/acpi/internal.h @@ -198,8 +198,10 @@ void acpi_ec_remove_query_handler(struct acpi_ec *ec, u8 query_bit); Suspend/Resume -------------------------------------------------------------------------- */ #ifdef CONFIG_ACPI_SYSTEM_POWER_STATES_SUPPORT +extern bool acpi_s2idle_wakeup(void); extern int acpi_sleep_init(void); #else +static inline bool acpi_s2idle_wakeup(void) { return false; } static inline int acpi_sleep_init(void) { return -ENXIO; } #endif diff --git a/drivers/acpi/sleep.c b/drivers/acpi/sleep.c index a4782c75ebdd..555de11a56b6 100644 --- a/drivers/acpi/sleep.c +++ b/drivers/acpi/sleep.c @@ -650,6 +650,8 @@ static const struct platform_suspend_ops acpi_suspend_ops_old = { .recover = acpi_pm_finish, }; +static bool s2idle_wakeup; + static int acpi_freeze_begin(void) { acpi_scan_lock_acquire(); @@ -666,6 +668,33 @@ static int acpi_freeze_prepare(void) return 0; } +static void acpi_freeze_wake(void) +{ + /* + * If IRQD_WAKEUP_ARMED is not set for the SCI at this point, it means + * that the SCI has triggered while suspended, so cancel the wakeup in + * case it has not been a wakeup event (the GPEs will be checked later). + */ + if (acpi_sci_irq_valid() && + !irqd_is_wakeup_armed(irq_get_irq_data(acpi_sci_irq))) { + pm_system_cancel_wakeup(); + s2idle_wakeup = true; + } +} + +static void acpi_freeze_sync(void) +{ + /* + * Process all pending events in case there are any wakeup ones. + * + * The EC driver uses the system workqueue, so that one needs to be + * flushed too. + */ + acpi_os_wait_events_complete(); + flush_scheduled_work(); + s2idle_wakeup = false; +} + static void acpi_freeze_restore(void) { if (acpi_sci_irq_valid()) @@ -682,6 +711,8 @@ static void acpi_freeze_end(void) static const struct platform_freeze_ops acpi_freeze_ops = { .begin = acpi_freeze_begin, .prepare = acpi_freeze_prepare, + .wake = acpi_freeze_wake, + .sync = acpi_freeze_sync, .restore = acpi_freeze_restore, .end = acpi_freeze_end, }; @@ -700,9 +731,15 @@ static void acpi_sleep_suspend_setup(void) } #else /* !CONFIG_SUSPEND */ +#define s2idle_wakeup (false) static inline void acpi_sleep_suspend_setup(void) {} #endif /* !CONFIG_SUSPEND */ +bool acpi_s2idle_wakeup(void) +{ + return s2idle_wakeup; +} + #ifdef CONFIG_PM_SLEEP static u32 saved_bm_rld; diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c index 253f860e8981..ef5b6a6e5045 100644 --- a/drivers/base/power/main.c +++ b/drivers/base/power/main.c @@ -1095,11 +1095,6 @@ static int __device_suspend_noirq(struct device *dev, pm_message_t state, bool a if (async_error) goto Complete; - if (pm_wakeup_pending()) { - async_error = -EBUSY; - goto Complete; - } - if (dev->power.syscore || dev->power.direct_complete) goto Complete; diff --git a/drivers/base/power/wakeup.c b/drivers/base/power/wakeup.c index c313b600d356..9c36b27996fc 100644 --- a/drivers/base/power/wakeup.c +++ b/drivers/base/power/wakeup.c @@ -28,8 +28,8 @@ bool events_check_enabled __read_mostly; /* First wakeup IRQ seen by the kernel in the last cycle. */ unsigned int pm_wakeup_irq __read_mostly; -/* If set and the system is suspending, terminate the suspend. */ -static bool pm_abort_suspend __read_mostly; +/* If greater than 0 and the system is suspending, terminate the suspend. */ +static atomic_t pm_abort_suspend __read_mostly; /* * Combined counters of registered wakeup events and wakeup events in progress. @@ -855,20 +855,26 @@ bool pm_wakeup_pending(void) pm_print_active_wakeup_sources(); } - return ret || pm_abort_suspend; + return ret || atomic_read(&pm_abort_suspend) > 0; } void pm_system_wakeup(void) { - pm_abort_suspend = true; + atomic_inc(&pm_abort_suspend); freeze_wake(); } EXPORT_SYMBOL_GPL(pm_system_wakeup); -void pm_wakeup_clear(void) +void pm_system_cancel_wakeup(void) +{ + atomic_dec(&pm_abort_suspend); +} + +void pm_wakeup_clear(bool reset) { - pm_abort_suspend = false; pm_wakeup_irq = 0; + if (reset) + atomic_set(&pm_abort_suspend, 0); } void pm_system_irq_wakeup(unsigned int irq_number) diff --git a/include/acpi/acpi_bus.h b/include/acpi/acpi_bus.h index 79c0af419300..63a90a624a0f 100644 --- a/include/acpi/acpi_bus.h +++ b/include/acpi/acpi_bus.h @@ -598,15 +598,19 @@ static inline bool acpi_device_always_present(struct acpi_device *adev) #endif #ifdef CONFIG_PM +void acpi_pm_wakeup_event(struct device *dev); acpi_status acpi_add_pm_notifier(struct acpi_device *adev, struct device *dev, void (*func)(struct acpi_device_wakeup_context *context)); acpi_status acpi_remove_pm_notifier(struct acpi_device *adev); int acpi_pm_device_sleep_state(struct device *, int *, int); int acpi_pm_device_run_wake(struct device *, bool); #else +static inline void acpi_pm_wakeup_event(struct device *dev) +{ +} static inline acpi_status acpi_add_pm_notifier(struct acpi_device *adev, struct device *dev, - void (*work_func)(struct work_struct *work)) + void (*func)(struct acpi_device_wakeup_context *context)) { return AE_SUPPORT; } diff --git a/include/linux/suspend.h b/include/linux/suspend.h index d9718378a8be..0b1cf32edfd7 100644 --- a/include/linux/suspend.h +++ b/include/linux/suspend.h @@ -189,6 +189,8 @@ struct platform_suspend_ops { struct platform_freeze_ops { int (*begin)(void); int (*prepare)(void); + void (*wake)(void); + void (*sync)(void); void (*restore)(void); void (*end)(void); }; @@ -428,7 +430,8 @@ extern unsigned int pm_wakeup_irq; extern bool pm_wakeup_pending(void); extern void pm_system_wakeup(void); -extern void pm_wakeup_clear(void); +extern void pm_system_cancel_wakeup(void); +extern void pm_wakeup_clear(bool reset); extern void pm_system_irq_wakeup(unsigned int irq_number); extern bool pm_get_wakeup_count(unsigned int *count, bool block); extern bool pm_save_wakeup_count(unsigned int count); @@ -478,7 +481,7 @@ static inline int unregister_pm_notifier(struct notifier_block *nb) static inline bool pm_wakeup_pending(void) { return false; } static inline void pm_system_wakeup(void) {} -static inline void pm_wakeup_clear(void) {} +static inline void pm_wakeup_clear(bool reset) {} static inline void pm_system_irq_wakeup(unsigned int irq_number) {} static inline void lock_system_sleep(void) {} diff --git a/kernel/power/process.c b/kernel/power/process.c index c7209f060eeb..78672d324a6e 100644 --- a/kernel/power/process.c +++ b/kernel/power/process.c @@ -132,7 +132,7 @@ int freeze_processes(void) if (!pm_freezing) atomic_inc(&system_freezing_cnt); - pm_wakeup_clear(); + pm_wakeup_clear(true); pr_info("Freezing user space processes ... "); pm_freezing = true; error = try_to_freeze_tasks(true); diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c index 15e6baef5c73..3ecf275d7e44 100644 --- a/kernel/power/suspend.c +++ b/kernel/power/suspend.c @@ -72,6 +72,8 @@ static void freeze_begin(void) static void freeze_enter(void) { + trace_suspend_resume(TPS("machine_suspend"), PM_SUSPEND_FREEZE, true); + spin_lock_irq(&suspend_freeze_lock); if (pm_wakeup_pending()) goto out; @@ -84,11 +86,9 @@ static void freeze_enter(void) /* Push all the CPUs into the idle loop. */ wake_up_all_idle_cpus(); - pr_debug("PM: suspend-to-idle\n"); /* Make the current CPU wait so it can enter the idle loop too. */ wait_event(suspend_freeze_wait_head, suspend_freeze_state == FREEZE_STATE_WAKE); - pr_debug("PM: resume from suspend-to-idle\n"); cpuidle_pause(); put_online_cpus(); @@ -98,6 +98,31 @@ static void freeze_enter(void) out: suspend_freeze_state = FREEZE_STATE_NONE; spin_unlock_irq(&suspend_freeze_lock); + + trace_suspend_resume(TPS("machine_suspend"), PM_SUSPEND_FREEZE, false); +} + +static void s2idle_loop(void) +{ + pr_debug("PM: suspend-to-idle\n"); + + do { + freeze_enter(); + + if (freeze_ops && freeze_ops->wake) + freeze_ops->wake(); + + dpm_resume_noirq(PMSG_RESUME); + if (freeze_ops && freeze_ops->sync) + freeze_ops->sync(); + + if (pm_wakeup_pending()) + break; + + pm_wakeup_clear(false); + } while (!dpm_suspend_noirq(PMSG_SUSPEND)); + + pr_debug("PM: resume from suspend-to-idle\n"); } void freeze_wake(void) @@ -371,10 +396,8 @@ static int suspend_enter(suspend_state_t state, bool *wakeup) * all the devices are suspended. */ if (state == PM_SUSPEND_FREEZE) { - trace_suspend_resume(TPS("machine_suspend"), state, true); - freeze_enter(); - trace_suspend_resume(TPS("machine_suspend"), state, false); - goto Platform_wake; + s2idle_loop(); + goto Platform_early_resume; } error = disable_nonboot_cpus(); From c0944883c97c0ddc71da67cc731590a7c878a1a2 Mon Sep 17 00:00:00 2001 From: Kees Cook Date: Tue, 9 May 2017 14:00:51 -0700 Subject: [PATCH 14/69] x86/power/64: Use char arrays for asm function names This switches the hibernate_64.S function names into character arrays to match other areas of the kernel where this is done (e.g., linker scripts). Specifically this fixes a compile-time error noticed by the future CONFIG_FORTIFY_SOURCE routines that complained about PAGE_SIZE being copied out of the "single byte" core_restore_code variable. Additionally drops the "acpi_save_state_mem" exern which does not appear to be used anywhere else in the kernel. Signed-off-by: Kees Cook Acked-by: Ingo Molnar Signed-off-by: Rafael J. Wysocki --- arch/x86/include/asm/suspend_64.h | 5 ++--- arch/x86/power/hibernate_64.c | 6 +++--- 2 files changed, 5 insertions(+), 6 deletions(-) diff --git a/arch/x86/include/asm/suspend_64.h b/arch/x86/include/asm/suspend_64.h index 6136a18152af..2bd96b4df140 100644 --- a/arch/x86/include/asm/suspend_64.h +++ b/arch/x86/include/asm/suspend_64.h @@ -42,8 +42,7 @@ struct saved_context { set_debugreg((thread)->debugreg##register, register) /* routines for saving/restoring kernel state */ -extern int acpi_save_state_mem(void); -extern char core_restore_code; -extern char restore_registers; +extern char core_restore_code[]; +extern char restore_registers[]; #endif /* _ASM_X86_SUSPEND_64_H */ diff --git a/arch/x86/power/hibernate_64.c b/arch/x86/power/hibernate_64.c index a6e21fee22ea..2ab1c5059a61 100644 --- a/arch/x86/power/hibernate_64.c +++ b/arch/x86/power/hibernate_64.c @@ -147,7 +147,7 @@ static int relocate_restore_code(void) if (!relocated_restore_code) return -ENOMEM; - memcpy((void *)relocated_restore_code, &core_restore_code, PAGE_SIZE); + memcpy((void *)relocated_restore_code, core_restore_code, PAGE_SIZE); /* Make the page containing the relocated code executable */ pgd = (pgd_t *)__va(read_cr3()) + pgd_index(relocated_restore_code); @@ -292,8 +292,8 @@ int arch_hibernation_header_save(void *addr, unsigned int max_size) if (max_size < sizeof(struct restore_data_record)) return -EOVERFLOW; - rdr->jump_address = (unsigned long)&restore_registers; - rdr->jump_address_phys = __pa_symbol(&restore_registers); + rdr->jump_address = (unsigned long)restore_registers; + rdr->jump_address_phys = __pa_symbol(restore_registers); rdr->cr3 = restore_cr3; rdr->magic = RESTORE_MAGIC; From b4883ca449473e8879a062c1f55f9d062c168ae5 Mon Sep 17 00:00:00 2001 From: Viresh Kumar Date: Tue, 16 May 2017 10:52:43 +0530 Subject: [PATCH 15/69] PM / Domains: pdd->dev can't be NULL in genpd_dev_pm_qos_notifier() The pm_domain_data (pdd) pointer is set from genpd_alloc_dev_data() and pdd->dev is guaranteed to be valid. There is no need to check pdd and pdd->dev in rest of the code as pdd->dev will always be valid for a non NULL pdd pointer. Signed-off-by: Viresh Kumar Acked-by: Ulf Hansson Signed-off-by: Rafael J. Wysocki --- drivers/base/power/domain.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/base/power/domain.c b/drivers/base/power/domain.c index ad196427b4f2..bf3945a58cce 100644 --- a/drivers/base/power/domain.c +++ b/drivers/base/power/domain.c @@ -443,7 +443,7 @@ static int genpd_dev_pm_qos_notifier(struct notifier_block *nb, pdd = dev->power.subsys_data ? dev->power.subsys_data->domain_data : NULL; - if (pdd && pdd->dev) { + if (pdd) { to_gpd_data(pdd)->td.constraint_changed = true; genpd = dev_to_genpd(dev); } else { From c74b32fadc00f2412d86a19295eb867b1a772948 Mon Sep 17 00:00:00 2001 From: Viresh Kumar Date: Tue, 23 May 2017 09:32:10 +0530 Subject: [PATCH 16/69] PM / OPP: Reorganize _generic_set_opp_regulator() The code was overly complicated here because of the limitations that we had with RCUs (Couldn't use opp-table and OPPs outside RCU protected section and can't call sleep-able routines from within that). But that is long gone now. Reorganize _generic_set_opp_regulator() in order to avoid using "struct dev_pm_set_opp_data" and copying data into it for the case where opp_table->set_opp is not set. Signed-off-by: Viresh Kumar Reviewed-by: Stephen Boyd Signed-off-by: Rafael J. Wysocki --- drivers/base/power/opp/core.c | 77 ++++++++++++++++------------------- 1 file changed, 36 insertions(+), 41 deletions(-) diff --git a/drivers/base/power/opp/core.c b/drivers/base/power/opp/core.c index dae61720b314..c4590fc017ba 100644 --- a/drivers/base/power/opp/core.c +++ b/drivers/base/power/opp/core.c @@ -543,17 +543,18 @@ _generic_set_opp_clk_only(struct device *dev, struct clk *clk, return ret; } -static int _generic_set_opp(struct dev_pm_set_opp_data *data) +static int _generic_set_opp_regulator(const struct opp_table *opp_table, + struct device *dev, + unsigned long old_freq, + unsigned long freq, + struct dev_pm_opp_supply *old_supply, + struct dev_pm_opp_supply *new_supply) { - struct dev_pm_opp_supply *old_supply = data->old_opp.supplies; - struct dev_pm_opp_supply *new_supply = data->new_opp.supplies; - unsigned long old_freq = data->old_opp.rate, freq = data->new_opp.rate; - struct regulator *reg = data->regulators[0]; - struct device *dev= data->dev; + struct regulator *reg = opp_table->regulators[0]; int ret; /* This function only supports single regulator per device */ - if (WARN_ON(data->regulator_count > 1)) { + if (WARN_ON(opp_table->regulator_count > 1)) { dev_err(dev, "multiple regulators are not supported\n"); return -EINVAL; } @@ -566,7 +567,7 @@ static int _generic_set_opp(struct dev_pm_set_opp_data *data) } /* Change frequency */ - ret = _generic_set_opp_clk_only(dev, data->clk, old_freq, freq); + ret = _generic_set_opp_clk_only(dev, opp_table->clk, old_freq, freq); if (ret) goto restore_voltage; @@ -580,12 +581,12 @@ static int _generic_set_opp(struct dev_pm_set_opp_data *data) return 0; restore_freq: - if (_generic_set_opp_clk_only(dev, data->clk, freq, old_freq)) + if (_generic_set_opp_clk_only(dev, opp_table->clk, freq, old_freq)) dev_err(dev, "%s: failed to restore old-freq (%lu Hz)\n", __func__, old_freq); restore_voltage: /* This shouldn't harm even if the voltages weren't updated earlier */ - if (old_supply->u_volt) + if (old_supply) _set_opp_voltage(dev, reg, old_supply); return ret; @@ -603,10 +604,7 @@ int dev_pm_opp_set_rate(struct device *dev, unsigned long target_freq) { struct opp_table *opp_table; unsigned long freq, old_freq; - int (*set_opp)(struct dev_pm_set_opp_data *data); struct dev_pm_opp *old_opp, *opp; - struct regulator **regulators; - struct dev_pm_set_opp_data *data; struct clk *clk; int ret, size; @@ -661,38 +659,35 @@ int dev_pm_opp_set_rate(struct device *dev, unsigned long target_freq) dev_dbg(dev, "%s: switching OPP: %lu Hz --> %lu Hz\n", __func__, old_freq, freq); - regulators = opp_table->regulators; - /* Only frequency scaling */ - if (!regulators) { + if (!opp_table->regulators) { ret = _generic_set_opp_clk_only(dev, clk, old_freq, freq); - goto put_opps; + } else if (!opp_table->set_opp) { + ret = _generic_set_opp_regulator(opp_table, dev, old_freq, freq, + IS_ERR(old_opp) ? NULL : old_opp->supplies, + opp->supplies); + } else { + struct dev_pm_set_opp_data *data; + + data = opp_table->set_opp_data; + data->regulators = opp_table->regulators; + data->regulator_count = opp_table->regulator_count; + data->clk = clk; + data->dev = dev; + + data->old_opp.rate = old_freq; + size = sizeof(*opp->supplies) * opp_table->regulator_count; + if (IS_ERR(old_opp)) + memset(data->old_opp.supplies, 0, size); + else + memcpy(data->old_opp.supplies, old_opp->supplies, size); + + data->new_opp.rate = freq; + memcpy(data->new_opp.supplies, opp->supplies, size); + + ret = opp_table->set_opp(data); } - if (opp_table->set_opp) - set_opp = opp_table->set_opp; - else - set_opp = _generic_set_opp; - - data = opp_table->set_opp_data; - data->regulators = regulators; - data->regulator_count = opp_table->regulator_count; - data->clk = clk; - data->dev = dev; - - data->old_opp.rate = old_freq; - size = sizeof(*opp->supplies) * opp_table->regulator_count; - if (IS_ERR(old_opp)) - memset(data->old_opp.supplies, 0, size); - else - memcpy(data->old_opp.supplies, old_opp->supplies, size); - - data->new_opp.rate = freq; - memcpy(data->new_opp.supplies, opp->supplies, size); - - ret = set_opp(data); - -put_opps: dev_pm_opp_put(opp); put_old_opp: if (!IS_ERR(old_opp)) From 478256bddb0323898694bd22acdf42a70d4ff024 Mon Sep 17 00:00:00 2001 From: Viresh Kumar Date: Tue, 23 May 2017 09:32:11 +0530 Subject: [PATCH 17/69] PM / OPP: Don't create copy of regulators unnecessarily This code was required while the OPP core was managed with help of RCUs, but not anymore. Get rid of unnecessary alloc/memcpy operations. Signed-off-by: Viresh Kumar Reviewed-by: Stephen Boyd Signed-off-by: Rafael J. Wysocki --- drivers/base/power/opp/core.c | 14 +++----------- 1 file changed, 3 insertions(+), 11 deletions(-) diff --git a/drivers/base/power/opp/core.c b/drivers/base/power/opp/core.c index c4590fc017ba..5ee7aadf0abf 100644 --- a/drivers/base/power/opp/core.c +++ b/drivers/base/power/opp/core.c @@ -180,7 +180,7 @@ unsigned long dev_pm_opp_get_max_volt_latency(struct device *dev) { struct opp_table *opp_table; struct dev_pm_opp *opp; - struct regulator *reg, **regulators; + struct regulator *reg; unsigned long latency_ns = 0; int ret, i, count; struct { @@ -198,15 +198,9 @@ unsigned long dev_pm_opp_get_max_volt_latency(struct device *dev) if (!count) goto put_opp_table; - regulators = kmalloc_array(count, sizeof(*regulators), GFP_KERNEL); - if (!regulators) - goto put_opp_table; - uV = kmalloc_array(count, sizeof(*uV), GFP_KERNEL); if (!uV) - goto free_regulators; - - memcpy(regulators, opp_table->regulators, count * sizeof(*regulators)); + goto put_opp_table; mutex_lock(&opp_table->lock); @@ -232,15 +226,13 @@ unsigned long dev_pm_opp_get_max_volt_latency(struct device *dev) * isn't freed, while we are executing this routine. */ for (i = 0; i < count; i++) { - reg = regulators[i]; + reg = opp_table->regulators[i]; ret = regulator_set_voltage_time(reg, uV[i].min, uV[i].max); if (ret > 0) latency_ns += ret * 1000; } kfree(uV); -free_regulators: - kfree(regulators); put_opp_table: dev_pm_opp_put_opp_table(opp_table); From 688a48b0d27000167e0450bc1243e29a560395ca Mon Sep 17 00:00:00 2001 From: Viresh Kumar Date: Tue, 23 May 2017 09:32:12 +0530 Subject: [PATCH 18/69] PM / OPP: opp-microvolt is not optional if regulators are set If dev_pm_opp_set_regulators() is called for a device and its regulators are set in the OPP core, the OPP nodes for the device must contain the "opp-microvolt" property, otherwise there is something wrong and we better error out. Signed-off-by: Viresh Kumar Reviewed-by: Stephen Boyd Signed-off-by: Rafael J. Wysocki --- drivers/base/power/opp/of.c | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/drivers/base/power/opp/of.c b/drivers/base/power/opp/of.c index 779428676f63..57eec1ca0569 100644 --- a/drivers/base/power/opp/of.c +++ b/drivers/base/power/opp/of.c @@ -131,8 +131,14 @@ static int opp_parse_supplies(struct dev_pm_opp *opp, struct device *dev, prop = of_find_property(opp->np, name, NULL); /* Missing property isn't a problem, but an invalid entry is */ - if (!prop) - return 0; + if (!prop) { + if (!opp_table->regulator_count) + return 0; + + dev_err(dev, "%s: opp-microvolt missing although OPP managing regulators\n", + __func__); + return -EINVAL; + } } vcount = of_property_count_u32_elems(opp->np, name); From 1fae788ed640e2a961c2edab54bfd56c73e2506a Mon Sep 17 00:00:00 2001 From: Viresh Kumar Date: Tue, 23 May 2017 09:32:13 +0530 Subject: [PATCH 19/69] PM / OPP: Don't create debugfs "supply-0" directory unnecessarily We create "supply-0" debugfs directory even if the device doesn't do voltage scaling. That looks confusing, as if the regulator is found but we never managed to get voltage levels for it. Avoid creating such a directory unnecessarily. Signed-off-by: Viresh Kumar Reviewed-by: Stephen Boyd Signed-off-by: Rafael J. Wysocki --- drivers/base/power/opp/debugfs.c | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/drivers/base/power/opp/debugfs.c b/drivers/base/power/opp/debugfs.c index 95f433db4ac7..81cf120fcf43 100644 --- a/drivers/base/power/opp/debugfs.c +++ b/drivers/base/power/opp/debugfs.c @@ -40,11 +40,10 @@ static bool opp_debug_create_supplies(struct dev_pm_opp *opp, struct dentry *pdentry) { struct dentry *d; - int i = 0; + int i; char *name; - /* Always create at least supply-0 directory */ - do { + for (i = 0; i < opp_table->regulator_count; i++) { name = kasprintf(GFP_KERNEL, "supply-%d", i); /* Create per-opp directory */ @@ -70,7 +69,7 @@ static bool opp_debug_create_supplies(struct dev_pm_opp *opp, if (!debugfs_create_ulong("u_amp", S_IRUGO, d, &opp->supplies[i].u_amp)) return false; - } while (++i < opp_table->regulator_count); + } return true; } From 91f9e850d465147197280cbeaf51d3fb51f61ca0 Mon Sep 17 00:00:00 2001 From: "Rafael J. Wysocki" Date: Thu, 8 Jun 2017 02:15:52 +0200 Subject: [PATCH 20/69] platform: x86: intel-vbtn: Wake up the system from suspend-to-idle Allow the intel-vbtn driver to wake up the system from suspend-to-idle by configuring its platform device as a wakeup one by default and switching it over to a system wakeup events triggering mode during system suspend transitions. Signed-off-by: Rafael J. Wysocki Acked-by: Andy Shevchenko --- drivers/platform/x86/intel-vbtn.c | 39 ++++++++++++++++++++++++++++--- 1 file changed, 36 insertions(+), 3 deletions(-) diff --git a/drivers/platform/x86/intel-vbtn.c b/drivers/platform/x86/intel-vbtn.c index c2035e121ac2..61f106377661 100644 --- a/drivers/platform/x86/intel-vbtn.c +++ b/drivers/platform/x86/intel-vbtn.c @@ -23,6 +23,7 @@ #include #include #include +#include #include MODULE_LICENSE("GPL"); @@ -46,6 +47,7 @@ static const struct key_entry intel_vbtn_keymap[] = { struct intel_vbtn_priv { struct input_dev *input_dev; + bool wakeup_mode; }; static int intel_vbtn_input_setup(struct platform_device *device) @@ -73,9 +75,15 @@ static void notify_handler(acpi_handle handle, u32 event, void *context) struct platform_device *device = context; struct intel_vbtn_priv *priv = dev_get_drvdata(&device->dev); - if (!sparse_keymap_report_event(priv->input_dev, event, 1, true)) - dev_info(&device->dev, "unknown event index 0x%x\n", - event); + if (priv->wakeup_mode) { + if (sparse_keymap_entry_from_scancode(priv->input_dev, event)) { + pm_wakeup_hard_event(&device->dev); + return; + } + } else if (sparse_keymap_report_event(priv->input_dev, event, 1, true)) { + return; + } + dev_info(&device->dev, "unknown event index 0x%x\n", event); } static int intel_vbtn_probe(struct platform_device *device) @@ -109,6 +117,7 @@ static int intel_vbtn_probe(struct platform_device *device) if (ACPI_FAILURE(status)) return -EBUSY; + device_init_wakeup(&device->dev, true); return 0; } @@ -125,10 +134,34 @@ static int intel_vbtn_remove(struct platform_device *device) return 0; } +static int intel_vbtn_pm_prepare(struct device *dev) +{ + struct intel_vbtn_priv *priv = dev_get_drvdata(dev); + + priv->wakeup_mode = true; + return 0; +} + +static int intel_vbtn_pm_resume(struct device *dev) +{ + struct intel_vbtn_priv *priv = dev_get_drvdata(dev); + + priv->wakeup_mode = false; + return 0; +} + +static const struct dev_pm_ops intel_vbtn_pm_ops = { + .prepare = intel_vbtn_pm_prepare, + .resume = intel_vbtn_pm_resume, + .restore = intel_vbtn_pm_resume, + .thaw = intel_vbtn_pm_resume, +}; + static struct platform_driver intel_vbtn_pl_driver = { .driver = { .name = "intel-vbtn", .acpi_match_table = intel_vbtn_ids, + .pm = &intel_vbtn_pm_ops, }, .probe = intel_vbtn_probe, .remove = intel_vbtn_remove, From ef884112e55c60d9e208b6524ae1841ae7e2fb2c Mon Sep 17 00:00:00 2001 From: "Rafael J. Wysocki" Date: Thu, 8 Jun 2017 02:16:13 +0200 Subject: [PATCH 21/69] platform: x86: intel-hid: Wake up the system from suspend-to-idle Allow the intel-hid driver to wake up the system from suspend-to-idle by configuring its platform device as a wakeup one by default and switching it over to a system wakeup events triggering mode during system suspend transitions. Signed-off-by: Rafael J. Wysocki Acked-by: Andy Shevchenko --- drivers/platform/x86/intel-hid.c | 40 ++++++++++++++++++++++++++++---- 1 file changed, 35 insertions(+), 5 deletions(-) diff --git a/drivers/platform/x86/intel-hid.c b/drivers/platform/x86/intel-hid.c index 63ba2cbd04c2..8519e0f97bdd 100644 --- a/drivers/platform/x86/intel-hid.c +++ b/drivers/platform/x86/intel-hid.c @@ -23,6 +23,7 @@ #include #include #include +#include #include MODULE_LICENSE("GPL"); @@ -75,6 +76,7 @@ static const struct key_entry intel_array_keymap[] = { struct intel_hid_priv { struct input_dev *input_dev; struct input_dev *array; + bool wakeup_mode; }; static int intel_hid_set_enable(struct device *device, bool enable) @@ -116,23 +118,37 @@ static void intel_button_array_enable(struct device *device, bool enable) dev_warn(device, "failed to set button capability\n"); } +static int intel_hid_pm_prepare(struct device *device) +{ + struct intel_hid_priv *priv = dev_get_drvdata(device); + + priv->wakeup_mode = true; + return 0; +} + static int intel_hid_pl_suspend_handler(struct device *device) { - intel_hid_set_enable(device, false); - intel_button_array_enable(device, false); - + if (pm_suspend_via_firmware()) { + intel_hid_set_enable(device, false); + intel_button_array_enable(device, false); + } return 0; } static int intel_hid_pl_resume_handler(struct device *device) { - intel_hid_set_enable(device, true); - intel_button_array_enable(device, true); + struct intel_hid_priv *priv = dev_get_drvdata(device); + priv->wakeup_mode = false; + if (pm_resume_via_firmware()) { + intel_hid_set_enable(device, true); + intel_button_array_enable(device, true); + } return 0; } static const struct dev_pm_ops intel_hid_pl_pm_ops = { + .prepare = intel_hid_pm_prepare, .freeze = intel_hid_pl_suspend_handler, .thaw = intel_hid_pl_resume_handler, .restore = intel_hid_pl_resume_handler, @@ -186,6 +202,19 @@ static void notify_handler(acpi_handle handle, u32 event, void *context) unsigned long long ev_index; acpi_status status; + if (priv->wakeup_mode) { + /* Wake up on 5-button array events only. */ + if (event == 0xc0 || !priv->array) + return; + + if (sparse_keymap_entry_from_scancode(priv->array, event)) + pm_wakeup_hard_event(&device->dev); + else + dev_info(&device->dev, "unknown event 0x%x\n", event); + + return; + } + /* 0xC0 is for HID events, other values are for 5 button array */ if (event != 0xc0) { if (!priv->array || @@ -270,6 +299,7 @@ static int intel_hid_probe(struct platform_device *device) "failed to enable HID power button\n"); } + device_init_wakeup(&device->dev, true); return 0; err_remove_notify: From 8110dd281e155e5010ffd657bba4742ebef7a93f Mon Sep 17 00:00:00 2001 From: "Rafael J. Wysocki" Date: Fri, 23 Jun 2017 15:24:32 +0200 Subject: [PATCH 22/69] ACPI / sleep: EC-based wakeup from suspend-to-idle on recent systems Some recent Dell laptops, including the XPS13 model numbers 9360 and 9365, cannot be woken up from suspend-to-idle by pressing the power button which is unexpected and makes that feature less usable on those systems. Moreover, on the 9365 ACPI S3 (suspend-to-RAM) is not expected to be used at all (the OS these systems ship with never exercises the ACPI S3 path in the firmware) and suspend-to-idle is the only viable system suspend mechanism there. The reason why the power button wakeup from suspend-to-idle doesn't work on those systems is because their power button events are signaled by the EC (Embedded Controller), whose GPE (General Purpose Event) line is disabled during suspend-to-idle transitions in Linux. That is done on purpose, because in general the EC tends to be noisy for various reasons (battery and thermal updates and similar, for example) and all events signaled by it would kick the CPUs out of deep idle states while in suspend-to-idle, which effectively might defeat its purpose. Of course, on the Dell systems in question the EC GPE must be enabled during suspend-to-idle transitions for the button press events to be signaled while suspended at all, but fortunately there is a way out of this puzzle. First of all, those systems have the ACPI_FADT_LOW_POWER_S0 flag set in their ACPI tables, which means that the OS is expected to prefer the "low power S0 idle" system state over ACPI S3 on them. That causes the most recent versions of other OSes to simply ignore ACPI S3 on those systems, so it is reasonable to expect that it should not be necessary to block GPEs during suspend-to-idle on them. Second, in addition to that, the systems in question provide a special firmware interface that can be used to indicate to the platform that the OS is transitioning into a system-wide low-power state in which certain types of activity are not desirable or that it is leaving such a state and that (in principle) should allow the platform to adjust its operation mode accordingly. That interface is a special _DSM object under a System Power Management Controller device (PNP0D80). The expected way to use it is to invoke function 0 from it on system initialization, functions 3 and 5 during suspend transitions and functions 4 and 6 during resume transitions (to reverse the actions carried out by the former). In particular, function 5 from the "Low-Power S0" device _DSM is expected to cause the platform to put itself into a low-power operation mode which should include making the EC less verbose (so to speak). Next, on resume, function 6 switches the platform back to the "working-state" operation mode. In accordance with the above, modify the ACPI suspend-to-idle code to look for the "Low-Power S0" _DSM interface on platforms with the ACPI_FADT_LOW_POWER_S0 flag set in the ACPI tables. If it's there, use it during suspend-to-idle transitions as prescribed and avoid changing the GPE configuration in that case. [That should reflect what the most recent versions of other OSes do.] Also modify the ACPI EC driver to make it handle events during suspend-to-idle in the usual way if the "Low-Power S0" _DSM interface is going to be used to make the power button events work while suspended on the Dell machines mentioned above Link: http://www.uefi.org/sites/default/files/resources/Intel_ACPI_Low_Power_S0_Idle.pdf Signed-off-by: Rafael J. Wysocki --- drivers/acpi/ec.c | 2 +- drivers/acpi/internal.h | 2 + drivers/acpi/sleep.c | 113 ++++++++++++++++++++++++++++++++++++++-- 3 files changed, 112 insertions(+), 5 deletions(-) diff --git a/drivers/acpi/ec.c b/drivers/acpi/ec.c index c24235d8fb52..156e15c35ffa 100644 --- a/drivers/acpi/ec.c +++ b/drivers/acpi/ec.c @@ -1835,7 +1835,7 @@ static int acpi_ec_suspend(struct device *dev) struct acpi_ec *ec = acpi_driver_data(to_acpi_device(dev)); - if (ec_freeze_events) + if (acpi_sleep_no_ec_events() && ec_freeze_events) acpi_ec_disable_event(ec); return 0; } diff --git a/drivers/acpi/internal.h b/drivers/acpi/internal.h index 75924ea69071..be79f7db1850 100644 --- a/drivers/acpi/internal.h +++ b/drivers/acpi/internal.h @@ -199,9 +199,11 @@ void acpi_ec_remove_query_handler(struct acpi_ec *ec, u8 query_bit); -------------------------------------------------------------------------- */ #ifdef CONFIG_ACPI_SYSTEM_POWER_STATES_SUPPORT extern bool acpi_s2idle_wakeup(void); +extern bool acpi_sleep_no_ec_events(void); extern int acpi_sleep_init(void); #else static inline bool acpi_s2idle_wakeup(void) { return false; } +static inline bool acpi_sleep_no_ec_events(void) { return true; } static inline int acpi_sleep_init(void) { return -ENXIO; } #endif diff --git a/drivers/acpi/sleep.c b/drivers/acpi/sleep.c index 555de11a56b6..be17664736b2 100644 --- a/drivers/acpi/sleep.c +++ b/drivers/acpi/sleep.c @@ -650,18 +650,108 @@ static const struct platform_suspend_ops acpi_suspend_ops_old = { .recover = acpi_pm_finish, }; +static bool s2idle_in_progress; static bool s2idle_wakeup; +/* + * On platforms supporting the Low Power S0 Idle interface there is an ACPI + * device object with the PNP0D80 compatible device ID (System Power Management + * Controller) and a specific _DSM method under it. That method, if present, + * can be used to indicate to the platform that the OS is transitioning into a + * low-power state in which certain types of activity are not desirable or that + * it is leaving such a state, which allows the platform to adjust its operation + * mode accordingly. + */ +static const struct acpi_device_id lps0_device_ids[] = { + {"PNP0D80", }, + {"", }, +}; + +#define ACPI_LPS0_DSM_UUID "c4eb40a0-6cd2-11e2-bcfd-0800200c9a66" + +#define ACPI_LPS0_SCREEN_OFF 3 +#define ACPI_LPS0_SCREEN_ON 4 +#define ACPI_LPS0_ENTRY 5 +#define ACPI_LPS0_EXIT 6 + +#define ACPI_S2IDLE_FUNC_MASK ((1 << ACPI_LPS0_ENTRY) | (1 << ACPI_LPS0_EXIT)) + +static acpi_handle lps0_device_handle; +static guid_t lps0_dsm_guid; +static char lps0_dsm_func_mask; + +static void acpi_sleep_run_lps0_dsm(unsigned int func) +{ + union acpi_object *out_obj; + + if (!(lps0_dsm_func_mask & (1 << func))) + return; + + out_obj = acpi_evaluate_dsm(lps0_device_handle, &lps0_dsm_guid, 1, func, NULL); + ACPI_FREE(out_obj); + + acpi_handle_debug(lps0_device_handle, "_DSM function %u evaluation %s\n", + func, out_obj ? "successful" : "failed"); +} + +static int lps0_device_attach(struct acpi_device *adev, + const struct acpi_device_id *not_used) +{ + union acpi_object *out_obj; + + if (lps0_device_handle) + return 0; + + if (!(acpi_gbl_FADT.flags & ACPI_FADT_LOW_POWER_S0)) + return 0; + + guid_parse(ACPI_LPS0_DSM_UUID, &lps0_dsm_guid); + /* Check if the _DSM is present and as expected. */ + out_obj = acpi_evaluate_dsm(adev->handle, &lps0_dsm_guid, 1, 0, NULL); + if (out_obj && out_obj->type == ACPI_TYPE_BUFFER) { + char bitmask = *(char *)out_obj->buffer.pointer; + + if ((bitmask & ACPI_S2IDLE_FUNC_MASK) == ACPI_S2IDLE_FUNC_MASK) { + lps0_dsm_func_mask = bitmask; + lps0_device_handle = adev->handle; + } + + acpi_handle_debug(adev->handle, "_DSM function mask: 0x%x\n", + bitmask); + } else { + acpi_handle_debug(adev->handle, + "_DSM function 0 evaluation failed\n"); + } + ACPI_FREE(out_obj); + return 0; +} + +static struct acpi_scan_handler lps0_handler = { + .ids = lps0_device_ids, + .attach = lps0_device_attach, +}; + static int acpi_freeze_begin(void) { acpi_scan_lock_acquire(); + s2idle_in_progress = true; return 0; } static int acpi_freeze_prepare(void) { - acpi_enable_all_wakeup_gpes(); - acpi_os_wait_events_complete(); + if (lps0_device_handle) { + acpi_sleep_run_lps0_dsm(ACPI_LPS0_SCREEN_OFF); + acpi_sleep_run_lps0_dsm(ACPI_LPS0_ENTRY); + } else { + /* + * The configuration of GPEs is changed here to avoid spurious + * wakeups, but that should not be necessary if this is a + * "low-power S0" platform and the low-power S0 _DSM is present. + */ + acpi_enable_all_wakeup_gpes(); + acpi_os_wait_events_complete(); + } if (acpi_sci_irq_valid()) enable_irq_wake(acpi_sci_irq); @@ -700,11 +790,17 @@ static void acpi_freeze_restore(void) if (acpi_sci_irq_valid()) disable_irq_wake(acpi_sci_irq); - acpi_enable_all_runtime_gpes(); + if (lps0_device_handle) { + acpi_sleep_run_lps0_dsm(ACPI_LPS0_EXIT); + acpi_sleep_run_lps0_dsm(ACPI_LPS0_SCREEN_ON); + } else { + acpi_enable_all_runtime_gpes(); + } } static void acpi_freeze_end(void) { + s2idle_in_progress = false; acpi_scan_lock_release(); } @@ -727,11 +823,15 @@ static void acpi_sleep_suspend_setup(void) suspend_set_ops(old_suspend_ordering ? &acpi_suspend_ops_old : &acpi_suspend_ops); + + acpi_scan_add_handler(&lps0_handler); freeze_set_ops(&acpi_freeze_ops); } #else /* !CONFIG_SUSPEND */ -#define s2idle_wakeup (false) +#define s2idle_in_progress (false) +#define s2idle_wakeup (false) +#define lps0_device_handle (NULL) static inline void acpi_sleep_suspend_setup(void) {} #endif /* !CONFIG_SUSPEND */ @@ -740,6 +840,11 @@ bool acpi_s2idle_wakeup(void) return s2idle_wakeup; } +bool acpi_sleep_no_ec_events(void) +{ + return !s2idle_in_progress || !lps0_device_handle; +} + #ifdef CONFIG_PM_SLEEP static u32 saved_bm_rld; From b21569cf1de925e0a42c9964bd7f520cb4a4d875 Mon Sep 17 00:00:00 2001 From: Viresh Kumar Date: Thu, 22 Jun 2017 09:15:11 +0530 Subject: [PATCH 23/69] PM / OPP: Use - instead of @ for DT entries Compiling the DT file with W=1, DTC warns like follows: Warning (unit_address_vs_reg): Node /opp_table0/opp@1000000000 has a unit name, but no reg property Fix this by replacing '@' with '-' as the OPP nodes will never have a "reg" property. Reported-by: Krzysztof Kozlowski Reported-by: Masahiro Yamada Suggested-by: Mark Rutland Signed-off-by: Viresh Kumar Acked-by: Rob Herring Reviewed-by: Stephen Boyd Signed-off-by: Rafael J. Wysocki --- Documentation/devicetree/bindings/opp/opp.txt | 38 +++++++++---------- 1 file changed, 19 insertions(+), 19 deletions(-) diff --git a/Documentation/devicetree/bindings/opp/opp.txt b/Documentation/devicetree/bindings/opp/opp.txt index 63725498bd20..e36d261b9ba6 100644 --- a/Documentation/devicetree/bindings/opp/opp.txt +++ b/Documentation/devicetree/bindings/opp/opp.txt @@ -186,20 +186,20 @@ Example 1: Single cluster Dual-core ARM cortex A9, switch DVFS states together. compatible = "operating-points-v2"; opp-shared; - opp@1000000000 { + opp-1000000000 { opp-hz = /bits/ 64 <1000000000>; opp-microvolt = <975000 970000 985000>; opp-microamp = <70000>; clock-latency-ns = <300000>; opp-suspend; }; - opp@1100000000 { + opp-1100000000 { opp-hz = /bits/ 64 <1100000000>; opp-microvolt = <1000000 980000 1010000>; opp-microamp = <80000>; clock-latency-ns = <310000>; }; - opp@1200000000 { + opp-1200000000 { opp-hz = /bits/ 64 <1200000000>; opp-microvolt = <1025000>; clock-latency-ns = <290000>; @@ -265,20 +265,20 @@ independently. * independently. */ - opp@1000000000 { + opp-1000000000 { opp-hz = /bits/ 64 <1000000000>; opp-microvolt = <975000 970000 985000>; opp-microamp = <70000>; clock-latency-ns = <300000>; opp-suspend; }; - opp@1100000000 { + opp-1100000000 { opp-hz = /bits/ 64 <1100000000>; opp-microvolt = <1000000 980000 1010000>; opp-microamp = <80000>; clock-latency-ns = <310000>; }; - opp@1200000000 { + opp-1200000000 { opp-hz = /bits/ 64 <1200000000>; opp-microvolt = <1025000>; opp-microamp = <90000; @@ -341,20 +341,20 @@ DVFS state together. compatible = "operating-points-v2"; opp-shared; - opp@1000000000 { + opp-1000000000 { opp-hz = /bits/ 64 <1000000000>; opp-microvolt = <975000 970000 985000>; opp-microamp = <70000>; clock-latency-ns = <300000>; opp-suspend; }; - opp@1100000000 { + opp-1100000000 { opp-hz = /bits/ 64 <1100000000>; opp-microvolt = <1000000 980000 1010000>; opp-microamp = <80000>; clock-latency-ns = <310000>; }; - opp@1200000000 { + opp-1200000000 { opp-hz = /bits/ 64 <1200000000>; opp-microvolt = <1025000>; opp-microamp = <90000>; @@ -367,20 +367,20 @@ DVFS state together. compatible = "operating-points-v2"; opp-shared; - opp@1300000000 { + opp-1300000000 { opp-hz = /bits/ 64 <1300000000>; opp-microvolt = <1050000 1045000 1055000>; opp-microamp = <95000>; clock-latency-ns = <400000>; opp-suspend; }; - opp@1400000000 { + opp-1400000000 { opp-hz = /bits/ 64 <1400000000>; opp-microvolt = <1075000>; opp-microamp = <100000>; clock-latency-ns = <400000>; }; - opp@1500000000 { + opp-1500000000 { opp-hz = /bits/ 64 <1500000000>; opp-microvolt = <1100000 1010000 1110000>; opp-microamp = <95000>; @@ -409,7 +409,7 @@ Example 4: Handling multiple regulators compatible = "operating-points-v2"; opp-shared; - opp@1000000000 { + opp-1000000000 { opp-hz = /bits/ 64 <1000000000>; opp-microvolt = <970000>, /* Supply 0 */ <960000>, /* Supply 1 */ @@ -422,7 +422,7 @@ Example 4: Handling multiple regulators /* OR */ - opp@1000000000 { + opp-1000000000 { opp-hz = /bits/ 64 <1000000000>; opp-microvolt = <975000 970000 985000>, /* Supply 0 */ <965000 960000 975000>, /* Supply 1 */ @@ -435,7 +435,7 @@ Example 4: Handling multiple regulators /* OR */ - opp@1000000000 { + opp-1000000000 { opp-hz = /bits/ 64 <1000000000>; opp-microvolt = <975000 970000 985000>, /* Supply 0 */ <965000 960000 975000>, /* Supply 1 */ @@ -467,7 +467,7 @@ Example 5: opp-supported-hw status = "okay"; opp-shared; - opp@600000000 { + opp-600000000 { /* * Supports all substrate and process versions for 0xF * cuts, i.e. only first four cuts. @@ -478,7 +478,7 @@ Example 5: opp-supported-hw ... }; - opp@800000000 { + opp-800000000 { /* * Supports: * - cuts: only one, 6th cut (represented by 6th bit). @@ -510,7 +510,7 @@ Example 6: opp-microvolt-, opp-microamp-: compatible = "operating-points-v2"; opp-shared; - opp@1000000000 { + opp-1000000000 { opp-hz = /bits/ 64 <1000000000>; opp-microvolt-slow = <915000 900000 925000>; opp-microvolt-fast = <975000 970000 985000>; @@ -518,7 +518,7 @@ Example 6: opp-microvolt-, opp-microamp-: opp-microamp-fast = <71000>; }; - opp@1200000000 { + opp-1200000000 { opp-hz = /bits/ 64 <1200000000>; opp-microvolt-slow = <915000 900000 925000>, /* Supply vcc0 */ <925000 910000 935000>; /* Supply vcc1 */ From a0df77348ab37687c9513bc2dd772cbc6eadbe63 Mon Sep 17 00:00:00 2001 From: Tao Wang Date: Tue, 23 May 2017 16:13:18 +0800 Subject: [PATCH 24/69] cpufreq: dt: Add support for hi3660 Add the compatible string for supporting the generic device tree cpufreq-dt driver on Hisilicon's 3660 SoC. Signed-off-by: Tao Wang Acked-by: Viresh Kumar Signed-off-by: Rafael J. Wysocki --- drivers/cpufreq/cpufreq-dt-platdev.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/cpufreq/cpufreq-dt-platdev.c b/drivers/cpufreq/cpufreq-dt-platdev.c index 921b4a6c3d16..1c262923fe58 100644 --- a/drivers/cpufreq/cpufreq-dt-platdev.c +++ b/drivers/cpufreq/cpufreq-dt-platdev.c @@ -31,6 +31,7 @@ static const struct of_device_id machines[] __initconst = { { .compatible = "arm,integrator-ap", }, { .compatible = "arm,integrator-cp", }, + { .compatible = "hisilicon,hi3660", }, { .compatible = "hisilicon,hi6220", }, { .compatible = "fsl,imx27", }, From 3fafb4e77279ed375077e95f54ee687b481c24d2 Mon Sep 17 00:00:00 2001 From: Octavian Purdila Date: Tue, 30 May 2017 18:57:18 +0300 Subject: [PATCH 25/69] cpufreq: imx6q: imx6ull should use the same flow as imx6ul This fixes an issue with imx6ull where setting the frequency to 528Mhz would actually set the ARM clock to 324Mhz. Signed-off-by: Octavian Purdila Signed-off-by: Leonard Crestez Acked-by: Viresh Kumar Reviewed-by: Fabio Estevam Signed-off-by: Rafael J. Wysocki --- drivers/cpufreq/imx6q-cpufreq.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/drivers/cpufreq/imx6q-cpufreq.c b/drivers/cpufreq/imx6q-cpufreq.c index 9c13f097fd8c..b6edd3ccaa55 100644 --- a/drivers/cpufreq/imx6q-cpufreq.c +++ b/drivers/cpufreq/imx6q-cpufreq.c @@ -101,7 +101,8 @@ static int imx6q_set_target(struct cpufreq_policy *policy, unsigned int index) * - Reprogram pll1_sys_clk and reparent pll1_sw_clk back to it * - Disable pll2_pfd2_396m_clk */ - if (of_machine_is_compatible("fsl,imx6ul")) { + if (of_machine_is_compatible("fsl,imx6ul") || + of_machine_is_compatible("fsl,imx6ull")) { /* * When changing pll1_sw_clk's parent to pll1_sys_clk, * CPU may run at higher than 528MHz, this will lead to @@ -215,7 +216,8 @@ static int imx6q_cpufreq_probe(struct platform_device *pdev) goto put_clk; } - if (of_machine_is_compatible("fsl,imx6ul")) { + if (of_machine_is_compatible("fsl,imx6ul") || + of_machine_is_compatible("fsl,imx6ull")) { pll2_bus_clk = clk_get(cpu_dev, "pll2_bus"); secondary_sel_clk = clk_get(cpu_dev, "secondary_sel"); if (IS_ERR(pll2_bus_clk) || IS_ERR(secondary_sel_clk)) { From e773f5c7e8c6279b2560ae1ca2a1d37cc12fe7c3 Mon Sep 17 00:00:00 2001 From: Krzysztof Kozlowski Date: Wed, 7 Jun 2017 20:13:24 +0200 Subject: [PATCH 26/69] cpufreq: exynos5440: Fix inconsistent indenting Fix inconsistent indenting and unneeded white space in assignment. Signed-off-by: Krzysztof Kozlowski Acked-by: Viresh Kumar Signed-off-by: Rafael J. Wysocki --- drivers/cpufreq/exynos5440-cpufreq.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/cpufreq/exynos5440-cpufreq.c b/drivers/cpufreq/exynos5440-cpufreq.c index 9180d34cc9fc..b6b369c22272 100644 --- a/drivers/cpufreq/exynos5440-cpufreq.c +++ b/drivers/cpufreq/exynos5440-cpufreq.c @@ -173,12 +173,12 @@ static void exynos_enable_dvfs(unsigned int cur_frequency) /* Enable PSTATE Change Event */ tmp = __raw_readl(dvfs_info->base + XMU_PMUEVTEN); tmp |= (1 << PSTATE_CHANGED_EVTEN_SHIFT); - __raw_writel(tmp, dvfs_info->base + XMU_PMUEVTEN); + __raw_writel(tmp, dvfs_info->base + XMU_PMUEVTEN); /* Enable PSTATE Change IRQ */ tmp = __raw_readl(dvfs_info->base + XMU_PMUIRQEN); tmp |= (1 << PSTATE_CHANGED_IRQEN_SHIFT); - __raw_writel(tmp, dvfs_info->base + XMU_PMUIRQEN); + __raw_writel(tmp, dvfs_info->base + XMU_PMUIRQEN); /* Set initial performance index */ cpufreq_for_each_entry(pos, freq_table) @@ -330,7 +330,7 @@ static int exynos_cpufreq_probe(struct platform_device *pdev) struct resource res; unsigned int cur_frequency; - np = pdev->dev.of_node; + np = pdev->dev.of_node; if (!np) return -ENODEV; From 829a4e8c0e9aab17bcfe2ddb070388b8ada26292 Mon Sep 17 00:00:00 2001 From: Viresh Kumar Date: Wed, 21 Jun 2017 10:29:13 +0530 Subject: [PATCH 27/69] PM / OPP: Add dev_pm_opp_{set|put}_clkname() In order to support OPP switching, OPP layer needs to get pointer to the clock for the device. Simple cases work fine without using the routines added by this patch (i.e. by passing connection-id as NULL), but for a device with multiple clocks available, the OPP core needs to know the exact name of the clk to use. Add a new set of APIs to get that done. Tested-by: Rajendra Nayak Signed-off-by: Viresh Kumar Reviewed-by: Stephen Boyd Signed-off-by: Rafael J. Wysocki --- drivers/base/power/opp/core.c | 67 +++++++++++++++++++++++++++++++++++ include/linux/pm_opp.h | 9 +++++ 2 files changed, 76 insertions(+) diff --git a/drivers/base/power/opp/core.c b/drivers/base/power/opp/core.c index 5ee7aadf0abf..a8cc14fd8ae4 100644 --- a/drivers/base/power/opp/core.c +++ b/drivers/base/power/opp/core.c @@ -1362,6 +1362,73 @@ void dev_pm_opp_put_regulators(struct opp_table *opp_table) } EXPORT_SYMBOL_GPL(dev_pm_opp_put_regulators); +/** + * dev_pm_opp_set_clkname() - Set clk name for the device + * @dev: Device for which clk name is being set. + * @name: Clk name. + * + * In order to support OPP switching, OPP layer needs to get pointer to the + * clock for the device. Simple cases work fine without using this routine (i.e. + * by passing connection-id as NULL), but for a device with multiple clocks + * available, the OPP core needs to know the exact name of the clk to use. + * + * This must be called before any OPPs are initialized for the device. + */ +struct opp_table *dev_pm_opp_set_clkname(struct device *dev, const char *name) +{ + struct opp_table *opp_table; + int ret; + + opp_table = dev_pm_opp_get_opp_table(dev); + if (!opp_table) + return ERR_PTR(-ENOMEM); + + /* This should be called before OPPs are initialized */ + if (WARN_ON(!list_empty(&opp_table->opp_list))) { + ret = -EBUSY; + goto err; + } + + /* Already have default clk set, free it */ + if (!IS_ERR(opp_table->clk)) + clk_put(opp_table->clk); + + /* Find clk for the device */ + opp_table->clk = clk_get(dev, name); + if (IS_ERR(opp_table->clk)) { + ret = PTR_ERR(opp_table->clk); + if (ret != -EPROBE_DEFER) { + dev_err(dev, "%s: Couldn't find clock: %d\n", __func__, + ret); + } + goto err; + } + + return opp_table; + +err: + dev_pm_opp_put_opp_table(opp_table); + + return ERR_PTR(ret); +} +EXPORT_SYMBOL_GPL(dev_pm_opp_set_clkname); + +/** + * dev_pm_opp_put_clkname() - Releases resources blocked for clk. + * @opp_table: OPP table returned from dev_pm_opp_set_clkname(). + */ +void dev_pm_opp_put_clkname(struct opp_table *opp_table) +{ + /* Make sure there are no concurrent readers while updating opp_table */ + WARN_ON(!list_empty(&opp_table->opp_list)); + + clk_put(opp_table->clk); + opp_table->clk = ERR_PTR(-EINVAL); + + dev_pm_opp_put_opp_table(opp_table); +} +EXPORT_SYMBOL_GPL(dev_pm_opp_put_clkname); + /** * dev_pm_opp_register_set_opp_helper() - Register custom set OPP helper * @dev: Device for which the helper is getting registered. diff --git a/include/linux/pm_opp.h b/include/linux/pm_opp.h index a6685b3dde26..51ec727b4824 100644 --- a/include/linux/pm_opp.h +++ b/include/linux/pm_opp.h @@ -121,6 +121,8 @@ struct opp_table *dev_pm_opp_set_prop_name(struct device *dev, const char *name) void dev_pm_opp_put_prop_name(struct opp_table *opp_table); struct opp_table *dev_pm_opp_set_regulators(struct device *dev, const char * const names[], unsigned int count); void dev_pm_opp_put_regulators(struct opp_table *opp_table); +struct opp_table *dev_pm_opp_set_clkname(struct device *dev, const char * name); +void dev_pm_opp_put_clkname(struct opp_table *opp_table); struct opp_table *dev_pm_opp_register_set_opp_helper(struct device *dev, int (*set_opp)(struct dev_pm_set_opp_data *data)); void dev_pm_opp_register_put_opp_helper(struct opp_table *opp_table); int dev_pm_opp_set_rate(struct device *dev, unsigned long target_freq); @@ -257,6 +259,13 @@ static inline struct opp_table *dev_pm_opp_set_regulators(struct device *dev, co static inline void dev_pm_opp_put_regulators(struct opp_table *opp_table) {} +static inline struct opp_table *dev_pm_opp_set_clkname(struct device *dev, const char * name) +{ + return ERR_PTR(-ENOTSUPP); +} + +static inline void dev_pm_opp_put_clkname(struct opp_table *opp_table) {} + static inline int dev_pm_opp_set_rate(struct device *dev, unsigned long target_freq) { return -ENOTSUPP; From 0370f0f975e5561c658aa86e2c372079e6a425eb Mon Sep 17 00:00:00 2001 From: Colin Ian King Date: Thu, 15 Jun 2017 10:55:59 +0100 Subject: [PATCH 28/69] cpufreq: sfi: make freq_table static pointer freq_table can be made static as it does not need to be in global scope. Cleans up sparse warning: "symbol 'freq_table' was not declared. Should it be static?" Signed-off-by: Colin Ian King Acked-by: Viresh Kumar Signed-off-by: Rafael J. Wysocki --- drivers/cpufreq/sfi-cpufreq.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/cpufreq/sfi-cpufreq.c b/drivers/cpufreq/sfi-cpufreq.c index 992ce6f9abec..3779742f86e3 100644 --- a/drivers/cpufreq/sfi-cpufreq.c +++ b/drivers/cpufreq/sfi-cpufreq.c @@ -24,7 +24,7 @@ #include -struct cpufreq_frequency_table *freq_table; +static struct cpufreq_frequency_table *freq_table; static struct sfi_freq_table_entry *sfi_cpufreq_array; static int num_freq_table_entries; From 51204e0639c49ada02fd823782ad673b6326d748 Mon Sep 17 00:00:00 2001 From: Len Brown Date: Fri, 16 Jun 2017 20:03:11 -0700 Subject: [PATCH 29/69] x86: do not use cpufreq_quick_get() for /proc/cpuinfo "cpu MHz" cpufreq_quick_get() allows cpufreq drivers to over-ride cpu_khz that is otherwise reported in x86 /proc/cpuinfo "cpu MHz". There are four problems with this scheme, any of them is sufficient justification to delete it. 1. Depending on which cpufreq driver is loaded, the behavior of this field is different. 2. Distros complain that they have to explain to users why and how this field changes. Distros have requested a constant. 3. The two major providers of this information, acpi_cpufreq and intel_pstate, both "get it wrong" in different ways. acpi_cpufreq lies to the user by telling them that they are running at whatever frequency was last requested by software. intel_pstate lies to the user by telling them that they are running at the average frequency computed over an undefined measurement. But an average computed over an undefined interval, is itself, undefined... 4. On modern processors, user space utilities, such as turbostat(1), are more accurate and more precise, while supporing concurrent measurement over arbitrary intervals. Users who have been consulting /proc/cpuinfo to track changing CPU frequency will be dissapointed that it no longer wiggles -- perhaps being unaware of the limitations of the information they have been consuming. Yes, they can change their scripts to look in sysfs cpufreq/scaling_cur_frequency. Here they will find the same data of dubious quality here removed from /proc/cpuinfo. The value in sysfs will be addressed in a subsequent patch to address issues 1-3, above. Issue 4 will remain -- users that really care about accurate frequency information should not be using either proc or sysfs kernel interfaces. They should be using using turbostat(8), or a similar purpose-built analysis tool. Signed-off-by: Len Brown Reviewed-by: Thomas Gleixner Signed-off-by: Rafael J. Wysocki --- arch/x86/kernel/cpu/proc.c | 10 ++-------- 1 file changed, 2 insertions(+), 8 deletions(-) diff --git a/arch/x86/kernel/cpu/proc.c b/arch/x86/kernel/cpu/proc.c index 6df621ae62a7..218f79825b3c 100644 --- a/arch/x86/kernel/cpu/proc.c +++ b/arch/x86/kernel/cpu/proc.c @@ -2,7 +2,6 @@ #include #include #include -#include /* * Get CPU information for use by the procfs. @@ -76,14 +75,9 @@ static int show_cpuinfo(struct seq_file *m, void *v) if (c->microcode) seq_printf(m, "microcode\t: 0x%x\n", c->microcode); - if (cpu_has(c, X86_FEATURE_TSC)) { - unsigned int freq = cpufreq_quick_get(cpu); - - if (!freq) - freq = cpu_khz; + if (cpu_has(c, X86_FEATURE_TSC)) seq_printf(m, "cpu MHz\t\t: %u.%03u\n", - freq / 1000, (freq % 1000)); - } + cpu_khz / 1000, (cpu_khz % 1000)); /* Cache size */ if (c->x86_cache_size >= 0) From 1a4fe38add8be45eb3873ad756561b9baf6bcaef Mon Sep 17 00:00:00 2001 From: Srinivas Pandruvada Date: Mon, 12 Jun 2017 16:30:27 -0700 Subject: [PATCH 30/69] cpufreq: intel_pstate: Remove max/min fractions to limit performance In the current model the max/min perf limits are a fraction of current user space limits to the allowed max_freq or 100% for global limits. This results in wrong ratio limits calculation because of rounding issues for some user space limits. Initially we tried to solve this issue by issue by having more shift bits to increase precision. Still there are isolated cases where we still have error. This can be avoided by using ratios all together. Since the way we get cpuinfo.max_freq is by multiplying scaling factor to max ratio, we can easily keep the max/min ratios in terms of ratios and not fractions. For example: if the max ratio = 36 cpuinfo.max_freq = 36 * 100000 = 3600000 Suppose user space sets a limit of 1200000, then we can calculate max ratio limit as = 36 * 1200000 / 3600000 = 12 This will be correct for any user limits. The other advantage is that, we don't need to do any calculation in the fast path as ratio limit is already calculated via set_policy() callback. Signed-off-by: Srinivas Pandruvada Signed-off-by: Rafael J. Wysocki --- drivers/cpufreq/intel_pstate.c | 114 ++++++++++++++++++--------------- 1 file changed, 63 insertions(+), 51 deletions(-) diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c index eb1158532de3..be0290a1a5e2 100644 --- a/drivers/cpufreq/intel_pstate.c +++ b/drivers/cpufreq/intel_pstate.c @@ -231,10 +231,8 @@ struct global_params { * @prev_cummulative_iowait: IO Wait time difference from last and * current sample * @sample: Storage for storing last Sample data - * @min_perf: Minimum capacity limit as a fraction of the maximum - * turbo P-state capacity. - * @max_perf: Maximum capacity limit as a fraction of the maximum - * turbo P-state capacity. + * @min_perf_ratio: Minimum capacity in terms of PERF or HWP ratios + * @max_perf_ratio: Maximum capacity in terms of PERF or HWP ratios * @acpi_perf_data: Stores ACPI perf information read from _PSS * @valid_pss_table: Set to true for valid ACPI _PSS entries found * @epp_powersave: Last saved HWP energy performance preference @@ -266,8 +264,8 @@ struct cpudata { u64 prev_tsc; u64 prev_cummulative_iowait; struct sample sample; - int32_t min_perf; - int32_t max_perf; + int32_t min_perf_ratio; + int32_t max_perf_ratio; #ifdef CONFIG_ACPI struct acpi_processor_performance acpi_perf_data; bool valid_pss_table; @@ -794,25 +792,32 @@ static struct freq_attr *hwp_cpufreq_attrs[] = { NULL, }; +static void intel_pstate_get_hwp_max(unsigned int cpu, int *phy_max, + int *current_max) +{ + u64 cap; + + rdmsrl_on_cpu(cpu, MSR_HWP_CAPABILITIES, &cap); + if (global.no_turbo) + *current_max = HWP_GUARANTEED_PERF(cap); + else + *current_max = HWP_HIGHEST_PERF(cap); + + *phy_max = HWP_HIGHEST_PERF(cap); +} + static void intel_pstate_hwp_set(unsigned int cpu) { struct cpudata *cpu_data = all_cpu_data[cpu]; - int min, hw_min, max, hw_max; - u64 value, cap; + int max, min; + u64 value; s16 epp; - rdmsrl_on_cpu(cpu, MSR_HWP_CAPABILITIES, &cap); - hw_min = HWP_LOWEST_PERF(cap); - if (global.no_turbo) - hw_max = HWP_GUARANTEED_PERF(cap); - else - hw_max = HWP_HIGHEST_PERF(cap); + max = cpu_data->max_perf_ratio; + min = cpu_data->min_perf_ratio; - max = fp_ext_toint(hw_max * cpu_data->max_perf); if (cpu_data->policy == CPUFREQ_POLICY_PERFORMANCE) min = max; - else - min = fp_ext_toint(hw_max * cpu_data->min_perf); rdmsrl_on_cpu(cpu, MSR_HWP_REQUEST, &value); @@ -1528,8 +1533,7 @@ static void intel_pstate_max_within_limits(struct cpudata *cpu) update_turbo_state(); pstate = intel_pstate_get_base_pstate(cpu); - pstate = max(cpu->pstate.min_pstate, - fp_ext_toint(pstate * cpu->max_perf)); + pstate = max(cpu->pstate.min_pstate, cpu->max_perf_ratio); intel_pstate_set_pstate(cpu, pstate); } @@ -1695,9 +1699,8 @@ static int intel_pstate_prepare_request(struct cpudata *cpu, int pstate) int max_pstate = intel_pstate_get_base_pstate(cpu); int min_pstate; - min_pstate = max(cpu->pstate.min_pstate, - fp_ext_toint(max_pstate * cpu->min_perf)); - max_pstate = max(min_pstate, fp_ext_toint(max_pstate * cpu->max_perf)); + min_pstate = max(cpu->pstate.min_pstate, cpu->min_perf_ratio); + max_pstate = max(min_pstate, cpu->max_perf_ratio); return clamp_t(int, pstate, min_pstate, max_pstate); } @@ -1967,52 +1970,61 @@ static void intel_pstate_update_perf_limits(struct cpufreq_policy *policy, { int max_freq = intel_pstate_get_max_freq(cpu); int32_t max_policy_perf, min_policy_perf; + int max_state, turbo_max; - max_policy_perf = div_ext_fp(policy->max, max_freq); - max_policy_perf = clamp_t(int32_t, max_policy_perf, 0, int_ext_tofp(1)); + /* + * HWP needs some special consideration, because on BDX the + * HWP_REQUEST uses abstract value to represent performance + * rather than pure ratios. + */ + if (hwp_active) { + intel_pstate_get_hwp_max(cpu->cpu, &turbo_max, &max_state); + } else { + max_state = intel_pstate_get_base_pstate(cpu); + turbo_max = cpu->pstate.turbo_pstate; + } + + max_policy_perf = max_state * policy->max / max_freq; if (policy->max == policy->min) { min_policy_perf = max_policy_perf; } else { - min_policy_perf = div_ext_fp(policy->min, max_freq); + min_policy_perf = max_state * policy->min / max_freq; min_policy_perf = clamp_t(int32_t, min_policy_perf, 0, max_policy_perf); } + pr_debug("cpu:%d max_state %d min_policy_perf:%d max_policy_perf:%d\n", + policy->cpu, max_state, + min_policy_perf, max_policy_perf); + /* Normalize user input to [min_perf, max_perf] */ if (per_cpu_limits) { - cpu->min_perf = min_policy_perf; - cpu->max_perf = max_policy_perf; + cpu->min_perf_ratio = min_policy_perf; + cpu->max_perf_ratio = max_policy_perf; } else { int32_t global_min, global_max; /* Global limits are in percent of the maximum turbo P-state. */ - global_max = percent_ext_fp(global.max_perf_pct); - global_min = percent_ext_fp(global.min_perf_pct); - if (max_freq != cpu->pstate.turbo_freq) { - int32_t turbo_factor; - - turbo_factor = div_ext_fp(cpu->pstate.turbo_pstate, - cpu->pstate.max_pstate); - global_min = mul_ext_fp(global_min, turbo_factor); - global_max = mul_ext_fp(global_max, turbo_factor); - } + global_max = DIV_ROUND_UP(turbo_max * global.max_perf_pct, 100); + global_min = DIV_ROUND_UP(turbo_max * global.min_perf_pct, 100); global_min = clamp_t(int32_t, global_min, 0, global_max); - cpu->min_perf = max(min_policy_perf, global_min); - cpu->min_perf = min(cpu->min_perf, max_policy_perf); - cpu->max_perf = min(max_policy_perf, global_max); - cpu->max_perf = max(min_policy_perf, cpu->max_perf); + pr_debug("cpu:%d global_min:%d global_max:%d\n", policy->cpu, + global_min, global_max); + + cpu->min_perf_ratio = max(min_policy_perf, global_min); + cpu->min_perf_ratio = min(cpu->min_perf_ratio, max_policy_perf); + cpu->max_perf_ratio = min(max_policy_perf, global_max); + cpu->max_perf_ratio = max(min_policy_perf, cpu->max_perf_ratio); /* Make sure min_perf <= max_perf */ - cpu->min_perf = min(cpu->min_perf, cpu->max_perf); + cpu->min_perf_ratio = min(cpu->min_perf_ratio, + cpu->max_perf_ratio); + } - - cpu->max_perf = round_up(cpu->max_perf, EXT_FRAC_BITS); - cpu->min_perf = round_up(cpu->min_perf, EXT_FRAC_BITS); - - pr_debug("cpu:%d max_perf_pct:%d min_perf_pct:%d\n", policy->cpu, - fp_ext_toint(cpu->max_perf * 100), - fp_ext_toint(cpu->min_perf * 100)); + pr_debug("cpu:%d max_perf_ratio:%d min_perf_ratio:%d\n", policy->cpu, + cpu->max_perf_ratio, + cpu->min_perf_ratio); } static int intel_pstate_set_policy(struct cpufreq_policy *policy) @@ -2115,8 +2127,8 @@ static int __intel_pstate_cpu_init(struct cpufreq_policy *policy) cpu = all_cpu_data[policy->cpu]; - cpu->max_perf = int_ext_tofp(1); - cpu->min_perf = 0; + cpu->max_perf_ratio = 0xFF; + cpu->min_perf_ratio = 0; policy->min = cpu->pstate.min_pstate * cpu->pstate.scaling; policy->max = cpu->pstate.turbo_pstate * cpu->pstate.scaling; From d50a7d8acd780f54c48703ab1069c2e64a24e4a0 Mon Sep 17 00:00:00 2001 From: Daniel Lezcano Date: Mon, 12 Jun 2017 17:55:10 +0200 Subject: [PATCH 31/69] ARM: cpuidle: Support asymmetric idle definition Some hardware have clusters with different idle states. The current code does not support this and fails as it expects all the idle states to be identical. Because of this, the Mediatek mtk8173 had to create the same idle state for a big.Little system and now the Hisilicon 960 is facing the same situation. Solve this by simply assuming the multiple driver will be needed for all the platforms using the ARM generic cpuidle driver which makes sense because of the different topologies we can support with a single kernel for ARM32 or ARM64. Every CPU has its own driver, so every single CPU can specify in the DT the idle states. This simple approach allows to support the future dynamIQ system, current SMP and HMP. Tested on: - 96boards: Hikey 620 - 96boards: Hikey 960 - 96boards: dragonboard410c - Mediatek 8173 Tested-by: Leo Yan Signed-off-by: Daniel Lezcano Acked-by: Sudeep Holla Signed-off-by: Rafael J. Wysocki --- drivers/cpuidle/Kconfig.arm | 1 + drivers/cpuidle/cpuidle-arm.c | 62 +++++++++++++++++++++-------------- 2 files changed, 39 insertions(+), 24 deletions(-) diff --git a/drivers/cpuidle/Kconfig.arm b/drivers/cpuidle/Kconfig.arm index 21340e0be73e..f52144808455 100644 --- a/drivers/cpuidle/Kconfig.arm +++ b/drivers/cpuidle/Kconfig.arm @@ -4,6 +4,7 @@ config ARM_CPUIDLE bool "Generic ARM/ARM64 CPU idle Driver" select DT_IDLE_STATES + select CPU_IDLE_MULTIPLE_DRIVERS help Select this to enable generic cpuidle driver for ARM. It provides a generic idle driver whose idle states are configured diff --git a/drivers/cpuidle/cpuidle-arm.c b/drivers/cpuidle/cpuidle-arm.c index f440d385ed34..7080c384ad5d 100644 --- a/drivers/cpuidle/cpuidle-arm.c +++ b/drivers/cpuidle/cpuidle-arm.c @@ -18,6 +18,7 @@ #include #include #include +#include #include @@ -44,7 +45,7 @@ static int arm_enter_idle_state(struct cpuidle_device *dev, return CPU_PM_CPU_IDLE_ENTER(arm_cpuidle_suspend, idx); } -static struct cpuidle_driver arm_idle_driver = { +static struct cpuidle_driver arm_idle_driver __initdata = { .name = "arm_idle", .owner = THIS_MODULE, /* @@ -80,30 +81,42 @@ static const struct of_device_id arm_idle_state_match[] __initconst = { static int __init arm_idle_init(void) { int cpu, ret; - struct cpuidle_driver *drv = &arm_idle_driver; + struct cpuidle_driver *drv; struct cpuidle_device *dev; - /* - * Initialize idle states data, starting at index 1. - * This driver is DT only, if no DT idle states are detected (ret == 0) - * let the driver initialization fail accordingly since there is no - * reason to initialize the idle driver if only wfi is supported. - */ - ret = dt_init_idle_driver(drv, arm_idle_state_match, 1); - if (ret <= 0) - return ret ? : -ENODEV; - - ret = cpuidle_register_driver(drv); - if (ret) { - pr_err("Failed to register cpuidle driver\n"); - return ret; - } - - /* - * Call arch CPU operations in order to initialize - * idle states suspend back-end specific data - */ for_each_possible_cpu(cpu) { + + drv = kmemdup(&arm_idle_driver, sizeof(*drv), GFP_KERNEL); + if (!drv) { + ret = -ENOMEM; + goto out_fail; + } + + drv->cpumask = (struct cpumask *)cpumask_of(cpu); + + /* + * Initialize idle states data, starting at index 1. This + * driver is DT only, if no DT idle states are detected (ret + * == 0) let the driver initialization fail accordingly since + * there is no reason to initialize the idle driver if only + * wfi is supported. + */ + ret = dt_init_idle_driver(drv, arm_idle_state_match, 1); + if (ret <= 0) { + ret = ret ? : -ENODEV; + goto out_fail; + } + + ret = cpuidle_register_driver(drv); + if (ret) { + pr_err("Failed to register cpuidle driver\n"); + goto out_fail; + } + + /* + * Call arch CPU operations in order to initialize + * idle states suspend back-end specific data + */ ret = arm_cpuidle_init(cpu); /* @@ -141,10 +154,11 @@ out_fail: dev = per_cpu(cpuidle_devices, cpu); cpuidle_unregister_device(dev); kfree(dev); + drv = cpuidle_get_driver(); + cpuidle_unregister_driver(drv); + kfree(drv); } - cpuidle_unregister_driver(drv); - return ret; } device_initcall(arm_idle_init); From a99d87306f83d2a97c8c7e854b6583c4037ecf75 Mon Sep 17 00:00:00 2001 From: Len Brown Date: Sat, 20 May 2017 20:11:55 -0400 Subject: [PATCH 32/69] tools/power turbostat: hide SKL counters, when not requested Skylake has some new counters, and they were erroneously exempt from --show and --hide eg. turbostat --quiet --show CPU CPU Totl%C0 Any%C0 GFX%C0 CPUGFX% - 116.73 90.56 85.69 79.00 0 117.78 91.38 86.47 79.71 2 1 3 is now CPU - 0 2 1 3 Signed-off-by: Len Brown --- tools/power/x86/turbostat/turbostat.c | 58 +++++++++++++++++++++------ 1 file changed, 45 insertions(+), 13 deletions(-) diff --git a/tools/power/x86/turbostat/turbostat.c b/tools/power/x86/turbostat/turbostat.c index b11294730771..2b25727bd3d7 100644 --- a/tools/power/x86/turbostat/turbostat.c +++ b/tools/power/x86/turbostat/turbostat.c @@ -57,7 +57,6 @@ unsigned int list_header_only; unsigned int dump_only; unsigned int do_snb_cstates; unsigned int do_knl_cstates; -unsigned int do_skl_residency; unsigned int do_slm_cstates; unsigned int use_c1_residency_msr; unsigned int has_aperf; @@ -384,8 +383,14 @@ struct msr_counter bic[] = { { 0x0, "CPU" }, { 0x0, "Mod%c6" }, { 0x0, "sysfs" }, + { 0x0, "Totl%C0" }, + { 0x0, "Any%C0" }, + { 0x0, "GFX%C0" }, + { 0x0, "CPUGFX%" }, }; + + #define MAX_BIC (sizeof(bic) / sizeof(struct msr_counter)) #define BIC_Package (1ULL << 0) #define BIC_Avg_MHz (1ULL << 1) @@ -426,6 +431,10 @@ struct msr_counter bic[] = { #define BIC_CPU (1ULL << 36) #define BIC_Mod_c6 (1ULL << 37) #define BIC_sysfs (1ULL << 38) +#define BIC_Totl_c0 (1ULL << 39) +#define BIC_Any_c0 (1ULL << 40) +#define BIC_GFX_c0 (1ULL << 41) +#define BIC_CPUGFX (1ULL << 42) unsigned long long bic_enabled = 0xFFFFFFFFFFFFFFFFULL; unsigned long long bic_present = BIC_sysfs; @@ -599,12 +608,14 @@ void print_header(char *delim) if (DO_BIC(BIC_GFXMHz)) outp += sprintf(outp, "%sGFXMHz", (printed++ ? delim : "")); - if (do_skl_residency) { + if (DO_BIC(BIC_Totl_c0)) outp += sprintf(outp, "%sTotl%%C0", (printed++ ? delim : "")); + if (DO_BIC(BIC_Any_c0)) outp += sprintf(outp, "%sAny%%C0", (printed++ ? delim : "")); + if (DO_BIC(BIC_GFX_c0)) outp += sprintf(outp, "%sGFX%%C0", (printed++ ? delim : "")); + if (DO_BIC(BIC_CPUGFX)) outp += sprintf(outp, "%sCPUGFX%%", (printed++ ? delim : "")); - } if (DO_BIC(BIC_Pkgpc2)) outp += sprintf(outp, "%sPkg%%pc2", (printed++ ? delim : "")); @@ -912,12 +923,14 @@ int format_counters(struct thread_data *t, struct core_data *c, outp += sprintf(outp, "%s%d", (printed++ ? delim : ""), p->gfx_mhz); /* Totl%C0, Any%C0 GFX%C0 CPUGFX% */ - if (do_skl_residency) { + if (DO_BIC(BIC_Totl_c0)) outp += sprintf(outp, "%s%.2f", (printed++ ? delim : ""), 100.0 * p->pkg_wtd_core_c0/tsc); + if (DO_BIC(BIC_Any_c0)) outp += sprintf(outp, "%s%.2f", (printed++ ? delim : ""), 100.0 * p->pkg_any_core_c0/tsc); + if (DO_BIC(BIC_GFX_c0)) outp += sprintf(outp, "%s%.2f", (printed++ ? delim : ""), 100.0 * p->pkg_any_gfxe_c0/tsc); + if (DO_BIC(BIC_CPUGFX)) outp += sprintf(outp, "%s%.2f", (printed++ ? delim : ""), 100.0 * p->pkg_both_core_gfxe_c0/tsc); - } if (DO_BIC(BIC_Pkgpc2)) outp += sprintf(outp, "%s%.2f", (printed++ ? delim : ""), 100.0 * p->pc2/tsc); @@ -1038,12 +1051,16 @@ delta_package(struct pkg_data *new, struct pkg_data *old) int i; struct msr_counter *mp; - if (do_skl_residency) { + + if (DO_BIC(BIC_Totl_c0)) old->pkg_wtd_core_c0 = new->pkg_wtd_core_c0 - old->pkg_wtd_core_c0; + if (DO_BIC(BIC_Any_c0)) old->pkg_any_core_c0 = new->pkg_any_core_c0 - old->pkg_any_core_c0; + if (DO_BIC(BIC_GFX_c0)) old->pkg_any_gfxe_c0 = new->pkg_any_gfxe_c0 - old->pkg_any_gfxe_c0; + if (DO_BIC(BIC_CPUGFX)) old->pkg_both_core_gfxe_c0 = new->pkg_both_core_gfxe_c0 - old->pkg_both_core_gfxe_c0; - } + old->pc2 = new->pc2 - old->pc2; if (DO_BIC(BIC_Pkgpc3)) old->pc3 = new->pc3 - old->pc3; @@ -1292,12 +1309,14 @@ int sum_counters(struct thread_data *t, struct core_data *c, if (!(t->flags & CPU_IS_FIRST_CORE_IN_PACKAGE)) return 0; - if (do_skl_residency) { + if (DO_BIC(BIC_Totl_c0)) average.packages.pkg_wtd_core_c0 += p->pkg_wtd_core_c0; + if (DO_BIC(BIC_Any_c0)) average.packages.pkg_any_core_c0 += p->pkg_any_core_c0; + if (DO_BIC(BIC_GFX_c0)) average.packages.pkg_any_gfxe_c0 += p->pkg_any_gfxe_c0; + if (DO_BIC(BIC_CPUGFX)) average.packages.pkg_both_core_gfxe_c0 += p->pkg_both_core_gfxe_c0; - } average.packages.pc2 += p->pc2; if (DO_BIC(BIC_Pkgpc3)) @@ -1357,12 +1376,14 @@ void compute_average(struct thread_data *t, struct core_data *c, average.cores.c7 /= topo.num_cores; average.cores.mc6_us /= topo.num_cores; - if (do_skl_residency) { + if (DO_BIC(BIC_Totl_c0)) average.packages.pkg_wtd_core_c0 /= topo.num_packages; + if (DO_BIC(BIC_Any_c0)) average.packages.pkg_any_core_c0 /= topo.num_packages; + if (DO_BIC(BIC_GFX_c0)) average.packages.pkg_any_gfxe_c0 /= topo.num_packages; + if (DO_BIC(BIC_CPUGFX)) average.packages.pkg_both_core_gfxe_c0 /= topo.num_packages; - } average.packages.pc2 /= topo.num_packages; if (DO_BIC(BIC_Pkgpc3)) @@ -1603,13 +1624,19 @@ retry: if (!(t->flags & CPU_IS_FIRST_CORE_IN_PACKAGE)) return 0; - if (do_skl_residency) { + if (DO_BIC(BIC_Totl_c0)) { if (get_msr(cpu, MSR_PKG_WEIGHTED_CORE_C0_RES, &p->pkg_wtd_core_c0)) return -10; + } + if (DO_BIC(BIC_Any_c0)) { if (get_msr(cpu, MSR_PKG_ANY_CORE_C0_RES, &p->pkg_any_core_c0)) return -11; + } + if (DO_BIC(BIC_GFX_c0)) { if (get_msr(cpu, MSR_PKG_ANY_GFXE_C0_RES, &p->pkg_any_gfxe_c0)) return -12; + } + if (DO_BIC(BIC_CPUGFX)) { if (get_msr(cpu, MSR_PKG_BOTH_CORE_GFXE_C0_RES, &p->pkg_both_core_gfxe_c0)) return -13; } @@ -4198,7 +4225,12 @@ void process_cpuid() BIC_PRESENT(BIC_Pkgpc10); } do_irtl_hsw = has_hsw_msrs(family, model); - do_skl_residency = has_skl_msrs(family, model); + if (has_skl_msrs(family, model)) { + BIC_PRESENT(BIC_Totl_c0); + BIC_PRESENT(BIC_Any_c0); + BIC_PRESENT(BIC_GFX_c0); + BIC_PRESENT(BIC_CPUGFX); + } do_slm_cstates = is_slm(family, model); do_knl_cstates = is_knl(family, model); From f4fdf2b474606580b95eed95d06c762d4fd3f57b Mon Sep 17 00:00:00 2001 From: Len Brown Date: Sat, 27 May 2017 21:06:55 -0700 Subject: [PATCH 33/69] tools/power turbostat: if --debug, print sampling overhead The --debug option now pre-pends each row with the number of micro-seconds [usec] to collect the finishing snapshot for that row. Signed-off-by: Len Brown --- tools/power/x86/turbostat/turbostat.c | 21 +++++++++++++++++++-- 1 file changed, 19 insertions(+), 2 deletions(-) diff --git a/tools/power/x86/turbostat/turbostat.c b/tools/power/x86/turbostat/turbostat.c index 2b25727bd3d7..6f7c64acee23 100644 --- a/tools/power/x86/turbostat/turbostat.c +++ b/tools/power/x86/turbostat/turbostat.c @@ -150,6 +150,8 @@ size_t cpu_present_setsize, cpu_affinity_setsize, cpu_subset_size; #define MAX_ADDED_COUNTERS 16 struct thread_data { + struct timeval tv_begin; + struct timeval tv_end; unsigned long long tsc; unsigned long long aperf; unsigned long long mperf; @@ -530,6 +532,8 @@ void print_header(char *delim) struct msr_counter *mp; int printed = 0; + if (debug) + outp += sprintf(outp, "usec %s", delim); if (DO_BIC(BIC_Package)) outp += sprintf(outp, "%sPackage", (printed++ ? delim : "")); if (DO_BIC(BIC_Core)) @@ -782,6 +786,14 @@ int format_counters(struct thread_data *t, struct core_data *c, (cpu_subset && !CPU_ISSET_S(t->cpu_id, cpu_subset_size, cpu_subset))) return 0; + if (debug) { + /* on each row, print how many usec each timestamp took to gather */ + struct timeval tv; + + timersub(&t->tv_end, &t->tv_begin, &tv); + outp += sprintf(outp, "%5ld\t", tv.tv_sec * 1000000 + tv.tv_usec); + } + interval_float = tv_delta.tv_sec + tv_delta.tv_usec/1000000.0; tsc = t->tsc * tsc_tweak; @@ -1503,6 +1515,9 @@ int get_counters(struct thread_data *t, struct core_data *c, struct pkg_data *p) struct msr_counter *mp; int i; + + gettimeofday(&t->tv_begin, (struct timezone *)NULL); + if (cpu_migrate(cpu)) { fprintf(outf, "Could not migrate to CPU %d\n", cpu); return -1; @@ -1586,7 +1601,7 @@ retry: /* collect core counters only for 1st thread in core */ if (!(t->flags & CPU_IS_FIRST_THREAD_IN_CORE)) - return 0; + goto done; if (DO_BIC(BIC_CPU_c3) && !do_slm_cstates && !do_knl_cstates) { if (get_msr(cpu, MSR_CORE_C3_RESIDENCY, &c->c3)) @@ -1622,7 +1637,7 @@ retry: /* collect package counters only for 1st core in package */ if (!(t->flags & CPU_IS_FIRST_CORE_IN_PACKAGE)) - return 0; + goto done; if (DO_BIC(BIC_Totl_c0)) { if (get_msr(cpu, MSR_PKG_WEIGHTED_CORE_C0_RES, &p->pkg_wtd_core_c0)) @@ -1715,6 +1730,8 @@ retry: if (get_mp(cpu, mp, &p->counter[i])) return -10; } +done: + gettimeofday(&t->tv_end, (struct timezone *)NULL); return 0; } From c91fc8519d87715a3a173475ea3778794c139996 Mon Sep 17 00:00:00 2001 From: Len Brown Date: Sat, 27 May 2017 21:18:12 -0700 Subject: [PATCH 34/69] tools/power turbostat: stop migrating, unless '-m' Turbostat has the capability to set its own affinity to each CPU so that its MSR accesses are on the local CPU. However, using the in-kernel cross-call in the msr driver tends to be less invasive, so do that -- by-default. '-m' remains to get the old behaviour. Signed-off-by: Len Brown --- tools/power/x86/turbostat/turbostat.c | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/tools/power/x86/turbostat/turbostat.c b/tools/power/x86/turbostat/turbostat.c index 6f7c64acee23..1a3a5b436b80 100644 --- a/tools/power/x86/turbostat/turbostat.c +++ b/tools/power/x86/turbostat/turbostat.c @@ -92,6 +92,7 @@ unsigned int do_ring_perf_limit_reasons; unsigned int crystal_hz; unsigned long long tsc_hz; int base_cpu; +int do_migrate; double discover_bclk(unsigned int family, unsigned int model); unsigned int has_hwp; /* IA32_PM_ENABLE, IA32_HWP_CAPABILITIES */ /* IA32_HWP_REQUEST, IA32_HWP_STATUS */ @@ -302,6 +303,9 @@ int for_all_cpus(int (func)(struct thread_data *, struct core_data *, struct pkg int cpu_migrate(int cpu) { + if (!do_migrate) + return 0; + CPU_ZERO_S(cpu_affinity_setsize, cpu_affinity_set); CPU_SET_S(cpu, cpu_affinity_setsize, cpu_affinity_set); if (sched_setaffinity(0, cpu_affinity_setsize, cpu_affinity_set) == -1) @@ -5000,6 +5004,7 @@ void cmdline(int argc, char **argv) {"hide", required_argument, 0, 'H'}, // meh, -h taken by --help {"Joules", no_argument, 0, 'J'}, {"list", no_argument, 0, 'l'}, + {"migrate", no_argument, 0, 'm'}, {"out", required_argument, 0, 'o'}, {"quiet", no_argument, 0, 'q'}, {"show", required_argument, 0, 's'}, @@ -5011,7 +5016,7 @@ void cmdline(int argc, char **argv) progname = argv[0]; - while ((opt = getopt_long_only(argc, argv, "+C:c:Ddhi:JM:m:o:qST:v", + while ((opt = getopt_long_only(argc, argv, "+C:c:Ddhi:Jmo:qST:v", long_options, &option_index)) != -1) { switch (opt) { case 'a': @@ -5054,6 +5059,9 @@ void cmdline(int argc, char **argv) list_header_only++; quiet++; break; + case 'm': + do_migrate = 1; + break; case 'o': outf = fopen_or_die(optarg, "w"); break; From f26b151977447be3b86f92c91e1caedc9b5eb8bf Mon Sep 17 00:00:00 2001 From: Len Brown Date: Fri, 23 Jun 2017 20:45:54 -0700 Subject: [PATCH 35/69] tools/power turbostat: decode MSR_IA32_MISC_ENABLE only on Intel otherwise, turbostat bails on on AMD Opteron boxes: turbostat: cpu26: msr offset 0x1a0 read failed: Input/output error Reported-by: Kamil Kolakowski Signed-off-by: Len Brown --- tools/power/x86/turbostat/turbostat.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/tools/power/x86/turbostat/turbostat.c b/tools/power/x86/turbostat/turbostat.c index 1a3a5b436b80..33992a93148b 100644 --- a/tools/power/x86/turbostat/turbostat.c +++ b/tools/power/x86/turbostat/turbostat.c @@ -3943,6 +3943,9 @@ void decode_misc_enable_msr(void) { unsigned long long msr; + if (!genuine_intel) + return; + if (!get_msr(base_cpu, MSR_IA32_MISC_ENABLE, &msr)) fprintf(outf, "cpu%d: MSR_IA32_MISC_ENABLE: 0x%08llx (%sTCC %sEIST %sMWAIT %sPREFETCH %sTURBO)\n", base_cpu, msr, From f7d44a8f3fd7f13770470a306a233acbaad5e96d Mon Sep 17 00:00:00 2001 From: Len Brown Date: Sat, 27 May 2017 21:24:58 -0700 Subject: [PATCH 36/69] tools/power turbostat: update version number Signed-off-by: Len Brown --- tools/power/x86/turbostat/turbostat.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/power/x86/turbostat/turbostat.c b/tools/power/x86/turbostat/turbostat.c index 33992a93148b..0dafba2c1e7d 100644 --- a/tools/power/x86/turbostat/turbostat.c +++ b/tools/power/x86/turbostat/turbostat.c @@ -4634,7 +4634,7 @@ int get_and_dump_counters(void) } void print_version() { - fprintf(outf, "turbostat version 17.04.12" + fprintf(outf, "turbostat version 17.06.23" " - Len Brown \n"); } From 6ae78b4e7c276e5306897269443f66cb4d86e47f Mon Sep 17 00:00:00 2001 From: Sherry Hurwitz Date: Tue, 20 Jun 2017 02:07:37 -0500 Subject: [PATCH 37/69] cpupower: Fix bug where return value was not used Save return value from amd_pci_get_num_boost_states and remove redundant setting of *support Signed-off-by: Sherry Hurwitz Reviewed-by: Thomas Renninger Signed-off-by: Rafael J. Wysocki --- tools/power/cpupower/utils/helpers/misc.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/tools/power/cpupower/utils/helpers/misc.c b/tools/power/cpupower/utils/helpers/misc.c index 1609243f5c64..6952a6abd1e5 100644 --- a/tools/power/cpupower/utils/helpers/misc.c +++ b/tools/power/cpupower/utils/helpers/misc.c @@ -16,10 +16,9 @@ int cpufreq_has_boost_support(unsigned int cpu, int *support, int *active, if (cpupower_cpu_info.caps & CPUPOWER_CAP_AMD_CBP) { *support = 1; - amd_pci_get_num_boost_states(active, states); - if (ret <= 0) + ret = amd_pci_get_num_boost_states(active, states); + if (ret) return ret; - *support = 1; } else if (cpupower_cpu_info.caps & CPUPOWER_CAP_INTEL_IDA) *support = *active = 1; return 0; From 902bef73faa99b8c024e0f18c6199872b7cccb52 Mon Sep 17 00:00:00 2001 From: Sherry Hurwitz Date: Tue, 20 Jun 2017 02:08:42 -0500 Subject: [PATCH 38/69] cpupower: Add support for new AMD family 0x17 Add support for new AMD family 0x17 - Add bit field changes to the msr_pstate structure - Add the new formula for the calculation of cof - Changed method to access to CpbDis Signed-off-by: Sherry Hurwitz Acked-by: Thomas Renninger Signed-off-by: Rafael J. Wysocki --- tools/power/cpupower/utils/helpers/amd.c | 31 +++++++++++++++----- tools/power/cpupower/utils/helpers/helpers.h | 2 ++ tools/power/cpupower/utils/helpers/misc.c | 22 ++++++++++++-- 3 files changed, 44 insertions(+), 11 deletions(-) diff --git a/tools/power/cpupower/utils/helpers/amd.c b/tools/power/cpupower/utils/helpers/amd.c index 6437ef39aeea..5fd5c5b8c7b8 100644 --- a/tools/power/cpupower/utils/helpers/amd.c +++ b/tools/power/cpupower/utils/helpers/amd.c @@ -26,6 +26,15 @@ union msr_pstate { unsigned res3:21; unsigned en:1; } bits; + struct { + unsigned fid:8; + unsigned did:6; + unsigned vid:8; + unsigned iddval:8; + unsigned idddiv:2; + unsigned res1:30; + unsigned en:1; + } fam17h_bits; unsigned long long val; }; @@ -35,6 +44,8 @@ static int get_did(int family, union msr_pstate pstate) if (family == 0x12) t = pstate.val & 0xf; + else if (family == 0x17) + t = pstate.fam17h_bits.did; else t = pstate.bits.did; @@ -44,16 +55,20 @@ static int get_did(int family, union msr_pstate pstate) static int get_cof(int family, union msr_pstate pstate) { int t; - int fid, did; + int fid, did, cof; did = get_did(family, pstate); - - t = 0x10; - fid = pstate.bits.fid; - if (family == 0x11) - t = 0x8; - - return (100 * (fid + t)) >> did; + if (family == 0x17) { + fid = pstate.fam17h_bits.fid; + cof = 200 * fid / did; + } else { + t = 0x10; + fid = pstate.bits.fid; + if (family == 0x11) + t = 0x8; + cof = (100 * (fid + t)) >> did; + } + return cof; } /* Needs: diff --git a/tools/power/cpupower/utils/helpers/helpers.h b/tools/power/cpupower/utils/helpers/helpers.h index afb66f80554e..799a18be60aa 100644 --- a/tools/power/cpupower/utils/helpers/helpers.h +++ b/tools/power/cpupower/utils/helpers/helpers.h @@ -70,6 +70,8 @@ enum cpupower_cpu_vendor {X86_VENDOR_UNKNOWN = 0, X86_VENDOR_INTEL, #define CPUPOWER_CAP_IS_SNB 0x00000020 #define CPUPOWER_CAP_INTEL_IDA 0x00000040 +#define CPUPOWER_AMD_CPBDIS 0x02000000 + #define MAX_HW_PSTATES 10 struct cpupower_cpu_info { diff --git a/tools/power/cpupower/utils/helpers/misc.c b/tools/power/cpupower/utils/helpers/misc.c index 6952a6abd1e5..601d719d4e60 100644 --- a/tools/power/cpupower/utils/helpers/misc.c +++ b/tools/power/cpupower/utils/helpers/misc.c @@ -2,11 +2,14 @@ #include "helpers/helpers.h" +#define MSR_AMD_HWCR 0xc0010015 + int cpufreq_has_boost_support(unsigned int cpu, int *support, int *active, int *states) { struct cpupower_cpu_info cpu_info; int ret; + unsigned long long val; *support = *active = *states = 0; @@ -16,9 +19,22 @@ int cpufreq_has_boost_support(unsigned int cpu, int *support, int *active, if (cpupower_cpu_info.caps & CPUPOWER_CAP_AMD_CBP) { *support = 1; - ret = amd_pci_get_num_boost_states(active, states); - if (ret) - return ret; + + /* AMD Family 0x17 does not utilize PCI D18F4 like prior + * families and has no fixed discrete boost states but + * has Hardware determined variable increments instead. + */ + + if (cpu_info.family == 0x17) { + if (!read_msr(cpu, MSR_AMD_HWCR, &val)) { + if (!(val & CPUPOWER_AMD_CPBDIS)) + *active = 1; + } + } else { + ret = amd_pci_get_num_boost_states(active, states); + if (ret) + return ret; + } } else if (cpupower_cpu_info.caps & CPUPOWER_CAP_INTEL_IDA) *support = *active = 1; return 0; From f8475cef90082bf0902ddab106112de130d90395 Mon Sep 17 00:00:00 2001 From: Len Brown Date: Fri, 23 Jun 2017 22:11:52 -0700 Subject: [PATCH 39/69] x86: use common aperfmperf_khz_on_cpu() to calculate KHz using APERF/MPERF The goal of this change is to give users a uniform and meaningful result when they read /sys/...cpufreq/scaling_cur_freq on modern x86 hardware, as compared to what they get today. Modern x86 processors include the hardware needed to accurately calculate frequency over an interval -- APERF, MPERF, and the TSC. Here we provide an x86 routine to make this calculation on supported hardware, and use it in preference to any driver driver-specific cpufreq_driver.get() routine. MHz is computed like so: MHz = base_MHz * delta_APERF / delta_MPERF MHz is the average frequency of the busy processor over a measurement interval. The interval is defined to be the time between successive invocations of aperfmperf_khz_on_cpu(), which are expected to to happen on-demand when users read sysfs attribute cpufreq/scaling_cur_freq. As with previous methods of calculating MHz, idle time is excluded. base_MHz above is from TSC calibration global "cpu_khz". This x86 native method to calculate MHz returns a meaningful result no matter if P-states are controlled by hardware or firmware and/or if the Linux cpufreq sub-system is or is-not installed. When this routine is invoked more frequently, the measurement interval becomes shorter. However, the code limits re-computation to 10ms intervals so that average frequency remains meaningful. Discerning users are encouraged to take advantage of the turbostat(8) utility, which can gracefully handle concurrent measurement intervals of arbitrary length. Signed-off-by: Len Brown Reviewed-by: Thomas Gleixner Signed-off-by: Rafael J. Wysocki --- arch/x86/kernel/cpu/Makefile | 1 + arch/x86/kernel/cpu/aperfmperf.c | 79 ++++++++++++++++++++++++++++++++ drivers/cpufreq/cpufreq.c | 12 ++++- include/linux/cpufreq.h | 2 + 4 files changed, 93 insertions(+), 1 deletion(-) create mode 100644 arch/x86/kernel/cpu/aperfmperf.c diff --git a/arch/x86/kernel/cpu/Makefile b/arch/x86/kernel/cpu/Makefile index 52000010c62e..cdf82492b770 100644 --- a/arch/x86/kernel/cpu/Makefile +++ b/arch/x86/kernel/cpu/Makefile @@ -21,6 +21,7 @@ obj-y += common.o obj-y += rdrand.o obj-y += match.o obj-y += bugs.o +obj-$(CONFIG_CPU_FREQ) += aperfmperf.o obj-$(CONFIG_PROC_FS) += proc.o obj-$(CONFIG_X86_FEATURE_NAMES) += capflags.o powerflags.o diff --git a/arch/x86/kernel/cpu/aperfmperf.c b/arch/x86/kernel/cpu/aperfmperf.c new file mode 100644 index 000000000000..d869c8671e36 --- /dev/null +++ b/arch/x86/kernel/cpu/aperfmperf.c @@ -0,0 +1,79 @@ +/* + * x86 APERF/MPERF KHz calculation for + * /sys/.../cpufreq/scaling_cur_freq + * + * Copyright (C) 2017 Intel Corp. + * Author: Len Brown + * + * This file is licensed under GPLv2. + */ + +#include +#include +#include +#include + +struct aperfmperf_sample { + unsigned int khz; + unsigned long jiffies; + u64 aperf; + u64 mperf; +}; + +static DEFINE_PER_CPU(struct aperfmperf_sample, samples); + +/* + * aperfmperf_snapshot_khz() + * On the current CPU, snapshot APERF, MPERF, and jiffies + * unless we already did it within 10ms + * calculate kHz, save snapshot + */ +static void aperfmperf_snapshot_khz(void *dummy) +{ + u64 aperf, aperf_delta; + u64 mperf, mperf_delta; + struct aperfmperf_sample *s = this_cpu_ptr(&samples); + + /* Don't bother re-computing within 10 ms */ + if (time_before(jiffies, s->jiffies + HZ/100)) + return; + + rdmsrl(MSR_IA32_APERF, aperf); + rdmsrl(MSR_IA32_MPERF, mperf); + + aperf_delta = aperf - s->aperf; + mperf_delta = mperf - s->mperf; + + /* + * There is no architectural guarantee that MPERF + * increments faster than we can read it. + */ + if (mperf_delta == 0) + return; + + /* + * if (cpu_khz * aperf_delta) fits into ULLONG_MAX, then + * khz = (cpu_khz * aperf_delta) / mperf_delta + */ + if (div64_u64(ULLONG_MAX, cpu_khz) > aperf_delta) + s->khz = div64_u64((cpu_khz * aperf_delta), mperf_delta); + else /* khz = aperf_delta / (mperf_delta / cpu_khz) */ + s->khz = div64_u64(aperf_delta, + div64_u64(mperf_delta, cpu_khz)); + s->jiffies = jiffies; + s->aperf = aperf; + s->mperf = mperf; +} + +unsigned int arch_freq_get_on_cpu(int cpu) +{ + if (!cpu_khz) + return 0; + + if (!static_cpu_has(X86_FEATURE_APERFMPERF)) + return 0; + + smp_call_function_single(cpu, aperfmperf_snapshot_khz, NULL, 1); + + return per_cpu(samples.khz, cpu); +} diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c index 26b643d57847..6e7424d12de9 100644 --- a/drivers/cpufreq/cpufreq.c +++ b/drivers/cpufreq/cpufreq.c @@ -632,11 +632,21 @@ show_one(cpuinfo_transition_latency, cpuinfo.transition_latency); show_one(scaling_min_freq, min); show_one(scaling_max_freq, max); +__weak unsigned int arch_freq_get_on_cpu(int cpu) +{ + return 0; +} + static ssize_t show_scaling_cur_freq(struct cpufreq_policy *policy, char *buf) { ssize_t ret; + unsigned int freq; - if (cpufreq_driver && cpufreq_driver->setpolicy && cpufreq_driver->get) + freq = arch_freq_get_on_cpu(policy->cpu); + if (freq) + ret = sprintf(buf, "%u\n", freq); + else if (cpufreq_driver && cpufreq_driver->setpolicy && + cpufreq_driver->get) ret = sprintf(buf, "%u\n", cpufreq_driver->get(policy->cpu)); else ret = sprintf(buf, "%u\n", policy->cur); diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h index a5ce0bbeadb5..905117bd5012 100644 --- a/include/linux/cpufreq.h +++ b/include/linux/cpufreq.h @@ -883,6 +883,8 @@ static inline bool policy_has_boost_freq(struct cpufreq_policy *policy) } #endif +extern unsigned int arch_freq_get_on_cpu(int cpu); + /* the following are really really optional */ extern struct freq_attr cpufreq_freq_attr_scaling_available_freqs; extern struct freq_attr cpufreq_freq_attr_scaling_boost_freqs; From 62611cb912f7cb08ef7815780bcc20a09abedd20 Mon Sep 17 00:00:00 2001 From: Len Brown Date: Fri, 23 Jun 2017 22:11:53 -0700 Subject: [PATCH 40/69] intel_pstate: delete scheduler hook in HWP mode The cpufreq/scaling_cur_freq sysfs attribute is now provided by shared x86 cpufreq code on modern x86 systems, including all systems supported by the intel_pstate driver. In HWP mode, maintaining that value was the sole purpose of the scheduler hook, intel_pstate_update_util_hwp(), so it can now be removed. Signed-off-by: Len Brown Signed-off-by: Rafael J. Wysocki --- drivers/cpufreq/intel_pstate.c | 14 +++----------- 1 file changed, 3 insertions(+), 11 deletions(-) diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c index be0290a1a5e2..b00a45595c89 100644 --- a/drivers/cpufreq/intel_pstate.c +++ b/drivers/cpufreq/intel_pstate.c @@ -1736,16 +1736,6 @@ static void intel_pstate_adjust_pstate(struct cpudata *cpu, int target_pstate) fp_toint(cpu->iowait_boost * 100)); } -static void intel_pstate_update_util_hwp(struct update_util_data *data, - u64 time, unsigned int flags) -{ - struct cpudata *cpu = container_of(data, struct cpudata, update_util); - u64 delta_ns = time - cpu->sample.time; - - if ((s64)delta_ns >= INTEL_PSTATE_HWP_SAMPLING_INTERVAL) - intel_pstate_sample(cpu, time); -} - static void intel_pstate_update_util_pid(struct update_util_data *data, u64 time, unsigned int flags) { @@ -1937,6 +1927,9 @@ static void intel_pstate_set_update_util_hook(unsigned int cpu_num) { struct cpudata *cpu = all_cpu_data[cpu_num]; + if (hwp_active) + return; + if (cpu->update_util_set) return; @@ -2570,7 +2563,6 @@ static int __init intel_pstate_init(void) } else { hwp_active++; intel_pstate.attr = hwp_cpufreq_attrs; - pstate_funcs.update_util = intel_pstate_update_util_hwp; goto hwp_cpu_matched; } } else { From 82b4e03e01bc8def546b52707962809f04ae5c7a Mon Sep 17 00:00:00 2001 From: Len Brown Date: Fri, 23 Jun 2017 22:11:54 -0700 Subject: [PATCH 41/69] intel_pstate: skip scheduler hook when in "performance" mode When the governor is set to "performance", intel_pstate does not need the scheduler hook for doing any calculations. Under these conditions, its only purpose is to continue to maintain cpufreq/scaling_cur_freq. The cpufreq/scaling_cur_freq sysfs attribute is now provided by shared x86 cpufreq code on modern x86 systems, including all systems supported by the intel_pstate driver. So in "performance" governor mode, the scheduler hook can be skipped. This applies to both in Software and Hardware P-state control modes. Suggested-by: Srinivas Pandruvada Signed-off-by: Len Brown Signed-off-by: Rafael J. Wysocki --- drivers/cpufreq/intel_pstate.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c index b00a45595c89..1772d309bea9 100644 --- a/drivers/cpufreq/intel_pstate.c +++ b/drivers/cpufreq/intel_pstate.c @@ -2044,10 +2044,10 @@ static int intel_pstate_set_policy(struct cpufreq_policy *policy) */ intel_pstate_clear_update_util_hook(policy->cpu); intel_pstate_max_within_limits(cpu); + } else { + intel_pstate_set_update_util_hook(policy->cpu); } - intel_pstate_set_update_util_hook(policy->cpu); - if (hwp_active) intel_pstate_hwp_set(policy->cpu); From ea0212f40c6bc0594c8eff79266759e3ecd4bacc Mon Sep 17 00:00:00 2001 From: Thomas Gleixner Date: Sun, 25 Jun 2017 19:31:13 +0200 Subject: [PATCH 42/69] PM / wakeirq: Convert to SRCU The wakeirq infrastructure uses RCU to protect the list of wakeirqs. That breaks the irq bus locking infrastructure, which is allows sleeping functions to be called so interrupt controllers behind slow busses, e.g. i2c, can be handled. The wakeirq functions hold rcu_read_lock and call into irq functions, which in case of interrupts using the irq bus locking will trigger a might_sleep() splat. Convert the wakeirq infrastructure to Sleepable RCU and unbreak it. Fixes: 4990d4fe327b (PM / Wakeirq: Add automated device wake IRQ handling) Reported-by: Brian Norris Suggested-by: Paul E. McKenney Signed-off-by: Thomas Gleixner Reviewed-by: Paul E. McKenney Tested-by: Tony Lindgren Tested-by: Brian Norris Cc: 4.2+ # 4.2+ Signed-off-by: Rafael J. Wysocki --- drivers/base/power/wakeup.c | 32 ++++++++++++++++++-------------- 1 file changed, 18 insertions(+), 14 deletions(-) diff --git a/drivers/base/power/wakeup.c b/drivers/base/power/wakeup.c index c313b600d356..994bbf8b1476 100644 --- a/drivers/base/power/wakeup.c +++ b/drivers/base/power/wakeup.c @@ -60,6 +60,8 @@ static LIST_HEAD(wakeup_sources); static DECLARE_WAIT_QUEUE_HEAD(wakeup_count_wait_queue); +DEFINE_STATIC_SRCU(wakeup_srcu); + static struct wakeup_source deleted_ws = { .name = "deleted", .lock = __SPIN_LOCK_UNLOCKED(deleted_ws.lock), @@ -198,7 +200,7 @@ void wakeup_source_remove(struct wakeup_source *ws) spin_lock_irqsave(&events_lock, flags); list_del_rcu(&ws->entry); spin_unlock_irqrestore(&events_lock, flags); - synchronize_rcu(); + synchronize_srcu(&wakeup_srcu); } EXPORT_SYMBOL_GPL(wakeup_source_remove); @@ -332,12 +334,12 @@ void device_wakeup_detach_irq(struct device *dev) void device_wakeup_arm_wake_irqs(void) { struct wakeup_source *ws; + int srcuidx; - rcu_read_lock(); + srcuidx = srcu_read_lock(&wakeup_srcu); list_for_each_entry_rcu(ws, &wakeup_sources, entry) dev_pm_arm_wake_irq(ws->wakeirq); - - rcu_read_unlock(); + srcu_read_unlock(&wakeup_srcu, srcuidx); } /** @@ -348,12 +350,12 @@ void device_wakeup_arm_wake_irqs(void) void device_wakeup_disarm_wake_irqs(void) { struct wakeup_source *ws; + int srcuidx; - rcu_read_lock(); + srcuidx = srcu_read_lock(&wakeup_srcu); list_for_each_entry_rcu(ws, &wakeup_sources, entry) dev_pm_disarm_wake_irq(ws->wakeirq); - - rcu_read_unlock(); + srcu_read_unlock(&wakeup_srcu, srcuidx); } /** @@ -804,10 +806,10 @@ EXPORT_SYMBOL_GPL(pm_wakeup_dev_event); void pm_print_active_wakeup_sources(void) { struct wakeup_source *ws; - int active = 0; + int srcuidx, active = 0; struct wakeup_source *last_activity_ws = NULL; - rcu_read_lock(); + srcuidx = srcu_read_lock(&wakeup_srcu); list_for_each_entry_rcu(ws, &wakeup_sources, entry) { if (ws->active) { pr_debug("active wakeup source: %s\n", ws->name); @@ -823,7 +825,7 @@ void pm_print_active_wakeup_sources(void) if (!active && last_activity_ws) pr_debug("last active wakeup source: %s\n", last_activity_ws->name); - rcu_read_unlock(); + srcu_read_unlock(&wakeup_srcu, srcuidx); } EXPORT_SYMBOL_GPL(pm_print_active_wakeup_sources); @@ -950,8 +952,9 @@ void pm_wakep_autosleep_enabled(bool set) { struct wakeup_source *ws; ktime_t now = ktime_get(); + int srcuidx; - rcu_read_lock(); + srcuidx = srcu_read_lock(&wakeup_srcu); list_for_each_entry_rcu(ws, &wakeup_sources, entry) { spin_lock_irq(&ws->lock); if (ws->autosleep_enabled != set) { @@ -965,7 +968,7 @@ void pm_wakep_autosleep_enabled(bool set) } spin_unlock_irq(&ws->lock); } - rcu_read_unlock(); + srcu_read_unlock(&wakeup_srcu, srcuidx); } #endif /* CONFIG_PM_AUTOSLEEP */ @@ -1026,15 +1029,16 @@ static int print_wakeup_source_stats(struct seq_file *m, static int wakeup_sources_stats_show(struct seq_file *m, void *unused) { struct wakeup_source *ws; + int srcuidx; seq_puts(m, "name\t\tactive_count\tevent_count\twakeup_count\t" "expire_count\tactive_since\ttotal_time\tmax_time\t" "last_change\tprevent_suspend_time\n"); - rcu_read_lock(); + srcuidx = srcu_read_lock(&wakeup_srcu); list_for_each_entry_rcu(ws, &wakeup_sources, entry) print_wakeup_source_stats(m, ws); - rcu_read_unlock(); + srcu_read_unlock(&wakeup_srcu, srcuidx); print_wakeup_source_stats(m, &deleted_ws); From 5209654a46ee71137ad9b06da99d4ef2794475af Mon Sep 17 00:00:00 2001 From: Yazen Ghannam Date: Wed, 7 Jun 2017 10:19:46 -0500 Subject: [PATCH 43/69] x86/ACPI/cstate: Allow ACPI C1 FFH MWAIT use on AMD systems AMD systems support the Monitor/Mwait instructions and these can be used for ACPI C1 in the same way as on Intel systems. Three things are needed: 1) This patch. 2) BIOS that declares a C1 state in _CST to use FFH, with correct values. 3) CPUID_Fn00000005_EDX is non-zero on the system. The BIOS on AMD systems have historically not defined a C1 state in _CST, so the acpi_idle driver uses HALT for ACPI C1. Currently released systems have CPUID_Fn00000005_EDX as reserved/RAZ. If a BIOS is released for these systems that requests a C1 state with FFH, the FFH implementation in Linux will fail since CPUID_Fn00000005_EDX is 0. The acpi_idle driver will then fallback to using HALT for ACPI C1. Future systems are expected to have non-zero CPUID_Fn00000005_EDX and BIOS support for using FFH for ACPI C1. Allow ffh_cstate_init() to succeed on AMD systems. Tested on Fam15h and Fam17h systems. Signed-off-by: Yazen Ghannam Acked-by: Borislav Petkov Signed-off-by: Rafael J. Wysocki --- arch/x86/kernel/acpi/cstate.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/arch/x86/kernel/acpi/cstate.c b/arch/x86/kernel/acpi/cstate.c index 8233a630280f..dde437f5d14f 100644 --- a/arch/x86/kernel/acpi/cstate.c +++ b/arch/x86/kernel/acpi/cstate.c @@ -167,7 +167,8 @@ static int __init ffh_cstate_init(void) { struct cpuinfo_x86 *c = &boot_cpu_data; - if (c->x86_vendor != X86_VENDOR_INTEL) + if (c->x86_vendor != X86_VENDOR_INTEL && + c->x86_vendor != X86_VENDOR_AMD) return -1; cpu_cstate_entry = alloc_percpu(struct cstate_entry); From 49368a47f6dc1b256e3b83813da5c9b0731fe268 Mon Sep 17 00:00:00 2001 From: Balbir Singh Date: Sat, 3 Jun 2017 20:52:32 +1000 Subject: [PATCH 44/69] PM / hibernate: Use CONFIG_HAVE_SET_MEMORY for include condition Kbuild reported a build failure when CONFIG_STRICT_KERNEL_RWX was enabled on powerpc. We don't yet have ARCH_HAS_SET_MEMORY and ppc32 saw a build failure. I've only done a basic compile test with a config that has hibernation enabled. Fixes: 50327ddfbc92 (kernel/power/snapshot.c: use set_memory.h header) Reported-by: Christophe Leroy Signed-off-by: Balbir Singh Acked-by: Pavel Machek Signed-off-by: Rafael J. Wysocki --- kernel/power/snapshot.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/kernel/power/snapshot.c b/kernel/power/snapshot.c index fa46606f3356..71730d672290 100644 --- a/kernel/power/snapshot.c +++ b/kernel/power/snapshot.c @@ -36,13 +36,13 @@ #include #include #include -#ifdef CONFIG_STRICT_KERNEL_RWX +#ifdef CONFIG_ARCH_HAS_SET_MEMORY #include #endif #include "power.h" -#ifdef CONFIG_STRICT_KERNEL_RWX +#if defined(CONFIG_STRICT_KERNEL_RWX) && defined(CONFIG_ARCH_HAS_SET_MEMORY) static bool hibernate_restore_protection; static bool hibernate_restore_protection_active; @@ -77,7 +77,7 @@ static inline void hibernate_restore_protection_begin(void) {} static inline void hibernate_restore_protection_end(void) {} static inline void hibernate_restore_protect_page(void *page_address) {} static inline void hibernate_restore_unprotect_page(void *page_address) {} -#endif /* CONFIG_STRICT_KERNEL_RWX */ +#endif /* CONFIG_STRICT_KERNEL_RWX && CONFIG_ARCH_HAS_SET_MEMORY */ static int swsusp_page_is_free(struct page *); static void swsusp_set_page_forbidden(struct page *); From eba74c294467d55c697e2199c37dfaf8126fe396 Mon Sep 17 00:00:00 2001 From: BaoJun Luo Date: Tue, 27 Jun 2017 02:10:44 +0200 Subject: [PATCH 45/69] PM / hibernate: Drop redundant parameter of swsusp_alloc() The first parameter of swsusp_alloc is not used, so drop it. Signed-off-by: BaoJun Luo [ rjw: Subject & changelog ] Signed-off-by: Rafael J. Wysocki --- kernel/power/snapshot.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/kernel/power/snapshot.c b/kernel/power/snapshot.c index 71730d672290..b7708e319941 100644 --- a/kernel/power/snapshot.c +++ b/kernel/power/snapshot.c @@ -1929,8 +1929,7 @@ static inline unsigned int alloc_highmem_pages(struct memory_bitmap *bm, * also be located in the high memory, because of the way in which * copy_data_pages() works. */ -static int swsusp_alloc(struct memory_bitmap *orig_bm, - struct memory_bitmap *copy_bm, +static int swsusp_alloc(struct memory_bitmap *copy_bm, unsigned int nr_pages, unsigned int nr_highmem) { if (nr_highmem > 0) { @@ -1976,7 +1975,7 @@ asmlinkage __visible int swsusp_save(void) return -ENOMEM; } - if (swsusp_alloc(&orig_bm, ©_bm, nr_pages, nr_highmem)) { + if (swsusp_alloc(©_bm, nr_pages, nr_highmem)) { printk(KERN_ERR "PM: Memory allocation failed\n"); return -ENOMEM; } From 73808d0fd26b3d3f0f44cc7c469ad1d3c1b570b8 Mon Sep 17 00:00:00 2001 From: "Prakash, Prashanth" Date: Thu, 11 May 2017 16:39:44 -0600 Subject: [PATCH 46/69] cpufreq / CPPC: Initialize policy->min to lowest nonlinear performance Description of Lowest Perfomance in ACPI 6.1 specification states: "Lowest Performance is the absolute lowest performance level of the platform. Selecting a performance level lower than the lowest nonlinear performance level may actually cause an efficiency penalty, but should reduce the instantaneous power consumption of the processor. In traditional terms, this represents the T-state range of performance levels." Set the default value of policy->min to Lowest Nonlinear Performance to avoid any potential efficiency penalty. Signed-off-by: Prashanth Prakash Acked-by: Viresh Kumar Acked-by: Alexey Klimov Signed-off-by: Rafael J. Wysocki --- drivers/cpufreq/cppc_cpufreq.c | 19 ++++++++++++++++--- 1 file changed, 16 insertions(+), 3 deletions(-) diff --git a/drivers/cpufreq/cppc_cpufreq.c b/drivers/cpufreq/cppc_cpufreq.c index e82bb3c30b92..10be285c9055 100644 --- a/drivers/cpufreq/cppc_cpufreq.c +++ b/drivers/cpufreq/cppc_cpufreq.c @@ -144,10 +144,23 @@ static int cppc_cpufreq_cpu_init(struct cpufreq_policy *policy) cppc_dmi_max_khz = cppc_get_dmi_max_khz(); - policy->min = cpu->perf_caps.lowest_perf * cppc_dmi_max_khz / cpu->perf_caps.highest_perf; + /* + * Set min to lowest nonlinear perf to avoid any efficiency penalty (see + * Section 8.4.7.1.1.5 of ACPI 6.1 spec) + */ + policy->min = cpu->perf_caps.lowest_nonlinear_perf * cppc_dmi_max_khz / + cpu->perf_caps.highest_perf; policy->max = cppc_dmi_max_khz; - policy->cpuinfo.min_freq = policy->min; - policy->cpuinfo.max_freq = policy->max; + + /* + * Set cpuinfo.min_freq to Lowest to make the full range of performance + * available if userspace wants to use any perf between lowest & lowest + * nonlinear perf + */ + policy->cpuinfo.min_freq = cpu->perf_caps.lowest_perf * cppc_dmi_max_khz / + cpu->perf_caps.highest_perf; + policy->cpuinfo.max_freq = cppc_dmi_max_khz; + policy->cpuinfo.transition_latency = cppc_get_transition_latency(cpu_num); policy->shared_type = cpu->shared_type; From d8600c8b0cd11d2249e14bf8b2eccbf4fa0db770 Mon Sep 17 00:00:00 2001 From: Krzysztof Kozlowski Date: Mon, 12 Jun 2017 17:17:41 +0200 Subject: [PATCH 47/69] PM / Domains: Constify genpd pointer Mark pointer to struct generic_pm_domain const (either passed in argument or used localy in a function), whenever it is not modifed by the function itself. Signed-off-by: Krzysztof Kozlowski Acked-by: Ulf Hansson Signed-off-by: Rafael J. Wysocki --- drivers/base/power/domain.c | 17 ++++++++++------- 1 file changed, 10 insertions(+), 7 deletions(-) diff --git a/drivers/base/power/domain.c b/drivers/base/power/domain.c index bf3945a58cce..0f7b1bd3680e 100644 --- a/drivers/base/power/domain.c +++ b/drivers/base/power/domain.c @@ -126,7 +126,7 @@ static const struct genpd_lock_ops genpd_spin_ops = { #define genpd_is_always_on(genpd) (genpd->flags & GENPD_FLAG_ALWAYS_ON) static inline bool irq_safe_dev_in_no_sleep_domain(struct device *dev, - struct generic_pm_domain *genpd) + const struct generic_pm_domain *genpd) { bool ret; @@ -181,12 +181,14 @@ static struct generic_pm_domain *dev_to_genpd(struct device *dev) return pd_to_genpd(dev->pm_domain); } -static int genpd_stop_dev(struct generic_pm_domain *genpd, struct device *dev) +static int genpd_stop_dev(const struct generic_pm_domain *genpd, + struct device *dev) { return GENPD_DEV_CALLBACK(genpd, int, stop, dev); } -static int genpd_start_dev(struct generic_pm_domain *genpd, struct device *dev) +static int genpd_start_dev(const struct generic_pm_domain *genpd, + struct device *dev) { return GENPD_DEV_CALLBACK(genpd, int, start, dev); } @@ -738,7 +740,7 @@ static bool pm_genpd_present(const struct generic_pm_domain *genpd) #ifdef CONFIG_PM_SLEEP -static bool genpd_dev_active_wakeup(struct generic_pm_domain *genpd, +static bool genpd_dev_active_wakeup(const struct generic_pm_domain *genpd, struct device *dev) { return GENPD_DEV_CALLBACK(genpd, bool, active_wakeup, dev); @@ -840,7 +842,8 @@ static void genpd_sync_power_on(struct generic_pm_domain *genpd, bool use_lock, * signal remote wakeup from the system's working state as needed by runtime PM. * Return 'true' in either of the above cases. */ -static bool resume_needed(struct device *dev, struct generic_pm_domain *genpd) +static bool resume_needed(struct device *dev, + const struct generic_pm_domain *genpd) { bool active_wakeup; @@ -975,7 +978,7 @@ static int pm_genpd_resume_noirq(struct device *dev) */ static int pm_genpd_freeze_noirq(struct device *dev) { - struct generic_pm_domain *genpd; + const struct generic_pm_domain *genpd; int ret = 0; dev_dbg(dev, "%s()\n", __func__); @@ -999,7 +1002,7 @@ static int pm_genpd_freeze_noirq(struct device *dev) */ static int pm_genpd_thaw_noirq(struct device *dev) { - struct generic_pm_domain *genpd; + const struct generic_pm_domain *genpd; int ret = 0; dev_dbg(dev, "%s()\n", __func__); From 952856db9019f914227f5fbe24f7810e20fec0fd Mon Sep 17 00:00:00 2001 From: Krzysztof Kozlowski Date: Mon, 12 Jun 2017 17:19:31 +0200 Subject: [PATCH 48/69] PM: Constify returned PM event name The pm_verb() returns a pointer to string from .rodata so it should be marked as const. Signed-off-by: Krzysztof Kozlowski Signed-off-by: Rafael J. Wysocki --- drivers/base/power/main.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c index 9faee1c893e5..924a5ba7aa27 100644 --- a/drivers/base/power/main.c +++ b/drivers/base/power/main.c @@ -62,7 +62,7 @@ static pm_message_t pm_transition; static int async_error; -static char *pm_verb(int event) +static const char *pm_verb(int event) { switch (event) { case PM_EVENT_SUSPEND: From e3771fa98e266aba29211c624ecf01200be497ad Mon Sep 17 00:00:00 2001 From: Krzysztof Kozlowski Date: Mon, 12 Jun 2017 17:19:32 +0200 Subject: [PATCH 49/69] PM: Constify info string used in messages The 'info' string appearing in many places points to a .rodata string so it should be passes as pointer to const. Signed-off-by: Krzysztof Kozlowski Signed-off-by: Rafael J. Wysocki --- drivers/base/power/main.c | 28 +++++++++++++++------------- 1 file changed, 15 insertions(+), 13 deletions(-) diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c index 924a5ba7aa27..6add28799f6d 100644 --- a/drivers/base/power/main.c +++ b/drivers/base/power/main.c @@ -208,7 +208,8 @@ static ktime_t initcall_debug_start(struct device *dev) } static void initcall_debug_report(struct device *dev, ktime_t calltime, - int error, pm_message_t state, char *info) + int error, pm_message_t state, + const char *info) { ktime_t rettime; s64 nsecs; @@ -403,21 +404,22 @@ static pm_callback_t pm_noirq_op(const struct dev_pm_ops *ops, pm_message_t stat return NULL; } -static void pm_dev_dbg(struct device *dev, pm_message_t state, char *info) +static void pm_dev_dbg(struct device *dev, pm_message_t state, const char *info) { dev_dbg(dev, "%s%s%s\n", info, pm_verb(state.event), ((state.event & PM_EVENT_SLEEP) && device_may_wakeup(dev)) ? ", may wakeup" : ""); } -static void pm_dev_err(struct device *dev, pm_message_t state, char *info, +static void pm_dev_err(struct device *dev, pm_message_t state, const char *info, int error) { printk(KERN_ERR "PM: Device %s failed to %s%s: error %d\n", dev_name(dev), pm_verb(state.event), info, error); } -static void dpm_show_time(ktime_t starttime, pm_message_t state, char *info) +static void dpm_show_time(ktime_t starttime, pm_message_t state, + const char *info) { ktime_t calltime; u64 usecs64; @@ -435,7 +437,7 @@ static void dpm_show_time(ktime_t starttime, pm_message_t state, char *info) } static int dpm_run_callback(pm_callback_t cb, struct device *dev, - pm_message_t state, char *info) + pm_message_t state, const char *info) { ktime_t calltime; int error; @@ -535,7 +537,7 @@ static void dpm_watchdog_clear(struct dpm_watchdog *wd) static int device_resume_noirq(struct device *dev, pm_message_t state, bool async) { pm_callback_t callback = NULL; - char *info = NULL; + const char *info = NULL; int error = 0; TRACE_DEVICE(dev); @@ -665,7 +667,7 @@ void dpm_resume_noirq(pm_message_t state) static int device_resume_early(struct device *dev, pm_message_t state, bool async) { pm_callback_t callback = NULL; - char *info = NULL; + const char *info = NULL; int error = 0; TRACE_DEVICE(dev); @@ -793,7 +795,7 @@ EXPORT_SYMBOL_GPL(dpm_resume_start); static int device_resume(struct device *dev, pm_message_t state, bool async) { pm_callback_t callback = NULL; - char *info = NULL; + const char *info = NULL; int error = 0; DECLARE_DPM_WATCHDOG_ON_STACK(wd); @@ -955,7 +957,7 @@ void dpm_resume(pm_message_t state) static void device_complete(struct device *dev, pm_message_t state) { void (*callback)(struct device *) = NULL; - char *info = NULL; + const char *info = NULL; if (dev->power.syscore) return; @@ -1080,7 +1082,7 @@ static pm_message_t resume_event(pm_message_t sleep_state) static int __device_suspend_noirq(struct device *dev, pm_message_t state, bool async) { pm_callback_t callback = NULL; - char *info = NULL; + const char *info = NULL; int error = 0; TRACE_DEVICE(dev); @@ -1225,7 +1227,7 @@ int dpm_suspend_noirq(pm_message_t state) static int __device_suspend_late(struct device *dev, pm_message_t state, bool async) { pm_callback_t callback = NULL; - char *info = NULL; + const char *info = NULL; int error = 0; TRACE_DEVICE(dev); @@ -1384,7 +1386,7 @@ EXPORT_SYMBOL_GPL(dpm_suspend_end); */ static int legacy_suspend(struct device *dev, pm_message_t state, int (*cb)(struct device *dev, pm_message_t state), - char *info) + const char *info) { int error; ktime_t calltime; @@ -1426,7 +1428,7 @@ static void dpm_clear_suppliers_direct_complete(struct device *dev) static int __device_suspend(struct device *dev, pm_message_t state, bool async) { pm_callback_t callback = NULL; - char *info = NULL; + const char *info = NULL; int error = 0; DECLARE_DPM_WATCHDOG_ON_STACK(wd); From cbcba35dba56f046bbd3437bd275ad3a7156cbc9 Mon Sep 17 00:00:00 2001 From: Krzysztof Kozlowski Date: Mon, 12 Jun 2017 17:19:33 +0200 Subject: [PATCH 50/69] PM / sysfs: Constify attribute groups Local instances of struct attribute_group are not modified so they can be made const to increase code safeness. Signed-off-by: Krzysztof Kozlowski Signed-off-by: Rafael J. Wysocki --- drivers/base/power/sysfs.c | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/drivers/base/power/sysfs.c b/drivers/base/power/sysfs.c index 33b4b902741a..185a52581cfa 100644 --- a/drivers/base/power/sysfs.c +++ b/drivers/base/power/sysfs.c @@ -607,7 +607,7 @@ static struct attribute *power_attrs[] = { #endif /* CONFIG_PM_ADVANCED_DEBUG */ NULL, }; -static struct attribute_group pm_attr_group = { +static const struct attribute_group pm_attr_group = { .name = power_group_name, .attrs = power_attrs, }; @@ -629,7 +629,7 @@ static struct attribute *wakeup_attrs[] = { #endif NULL, }; -static struct attribute_group pm_wakeup_attr_group = { +static const struct attribute_group pm_wakeup_attr_group = { .name = power_group_name, .attrs = wakeup_attrs, }; @@ -644,7 +644,7 @@ static struct attribute *runtime_attrs[] = { &dev_attr_autosuspend_delay_ms.attr, NULL, }; -static struct attribute_group pm_runtime_attr_group = { +static const struct attribute_group pm_runtime_attr_group = { .name = power_group_name, .attrs = runtime_attrs, }; @@ -653,7 +653,7 @@ static struct attribute *pm_qos_resume_latency_attrs[] = { &dev_attr_pm_qos_resume_latency_us.attr, NULL, }; -static struct attribute_group pm_qos_resume_latency_attr_group = { +static const struct attribute_group pm_qos_resume_latency_attr_group = { .name = power_group_name, .attrs = pm_qos_resume_latency_attrs, }; @@ -662,7 +662,7 @@ static struct attribute *pm_qos_latency_tolerance_attrs[] = { &dev_attr_pm_qos_latency_tolerance_us.attr, NULL, }; -static struct attribute_group pm_qos_latency_tolerance_attr_group = { +static const struct attribute_group pm_qos_latency_tolerance_attr_group = { .name = power_group_name, .attrs = pm_qos_latency_tolerance_attrs, }; @@ -672,7 +672,7 @@ static struct attribute *pm_qos_flags_attrs[] = { &dev_attr_pm_qos_remote_wakeup.attr, NULL, }; -static struct attribute_group pm_qos_flags_attr_group = { +static const struct attribute_group pm_qos_flags_attr_group = { .name = power_group_name, .attrs = pm_qos_flags_attrs, }; From edbdabc62328ec0ac98d83ca384bf9fd5251ade6 Mon Sep 17 00:00:00 2001 From: Adam Lessnau Date: Thu, 1 Jun 2017 11:21:50 +0200 Subject: [PATCH 51/69] powercap/RAPL: prevent overridding bits outside of the mask Fixes wrong bits shift operation in the rapl_write_data_raw function, which might cause overridding bits outside of the mask. For example, writing new TIME_WINDOW1 value can override POWER_LIMIT1. Signed-off-by: Adam Lessnau Signed-off-by: Rafael J. Wysocki --- drivers/powercap/intel_rapl.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/powercap/intel_rapl.c b/drivers/powercap/intel_rapl.c index 9ddad0815ba9..d1694f1def72 100644 --- a/drivers/powercap/intel_rapl.c +++ b/drivers/powercap/intel_rapl.c @@ -874,7 +874,9 @@ static int rapl_write_data_raw(struct rapl_domain *rd, cpu = rd->rp->lead_cpu; bits = rapl_unit_xlate(rd, rp->unit, value, 1); - bits |= bits << rp->shift; + bits <<= rp->shift; + bits &= rp->mask; + memset(&ma, 0, sizeof(ma)); ma.msr_no = rd->msrs[rp->id]; From 1a99d0c7962364d5fba5e0cfe5ced586e133f31a Mon Sep 17 00:00:00 2001 From: David Wu Date: Fri, 9 Jun 2017 17:36:14 +0800 Subject: [PATCH 52/69] PM / AVS: rockchip-io: add io selectors and supplies for rk3228 This adds the necessary data for handling io voltage domains on the rk3228. Signed-off-by: David Wu Reviewed-by: Heiko Stuebner Signed-off-by: Rafael J. Wysocki --- .../bindings/power/rockchip-io-domain.txt | 7 +++++++ drivers/power/avs/rockchip-io-domain.c | 14 ++++++++++++++ 2 files changed, 21 insertions(+) diff --git a/Documentation/devicetree/bindings/power/rockchip-io-domain.txt b/Documentation/devicetree/bindings/power/rockchip-io-domain.txt index d3a5a93a65cd..43c21fb04564 100644 --- a/Documentation/devicetree/bindings/power/rockchip-io-domain.txt +++ b/Documentation/devicetree/bindings/power/rockchip-io-domain.txt @@ -32,6 +32,7 @@ SoC is on the same page. Required properties: - compatible: should be one of: - "rockchip,rk3188-io-voltage-domain" for rk3188 + - "rockchip,rk3228-io-voltage-domain" for rk3228 - "rockchip,rk3288-io-voltage-domain" for rk3288 - "rockchip,rk3328-io-voltage-domain" for rk3328 - "rockchip,rk3368-io-voltage-domain" for rk3368 @@ -59,6 +60,12 @@ Possible supplies for rk3188: - vccio1-supply: The supply connected to VCCIO1. Sometimes also labeled VCCIO1 and VCCIO2. +Possible supplies for rk3228: +- vccio1-supply: The supply connected to VCCIO1. +- vccio2-supply: The supply connected to VCCIO2. +- vccio3-supply: The supply connected to VCCIO3. +- vccio4-supply: The supply connected to VCCIO4. + Possible supplies for rk3288: - audio-supply: The supply connected to APIO4_VDD. - bb-supply: The supply connected to APIO5_VDD. diff --git a/drivers/power/avs/rockchip-io-domain.c b/drivers/power/avs/rockchip-io-domain.c index 85812521b6ba..031a34372191 100644 --- a/drivers/power/avs/rockchip-io-domain.c +++ b/drivers/power/avs/rockchip-io-domain.c @@ -253,6 +253,16 @@ static const struct rockchip_iodomain_soc_data soc_data_rk3188 = { }, }; +static const struct rockchip_iodomain_soc_data soc_data_rk3228 = { + .grf_offset = 0x418, + .supply_names = { + "vccio1", + "vccio2", + "vccio3", + "vccio4", + }, +}; + static const struct rockchip_iodomain_soc_data soc_data_rk3288 = { .grf_offset = 0x380, .supply_names = { @@ -344,6 +354,10 @@ static const struct of_device_id rockchip_iodomain_match[] = { .compatible = "rockchip,rk3188-io-voltage-domain", .data = (void *)&soc_data_rk3188 }, + { + .compatible = "rockchip,rk3228-io-voltage-domain", + .data = (void *)&soc_data_rk3228 + }, { .compatible = "rockchip,rk3288-io-voltage-domain", .data = (void *)&soc_data_rk3288 From dbb1d8b70c222544e9b54fdb3e22c15745be0155 Mon Sep 17 00:00:00 2001 From: Arvind Yadav Date: Thu, 22 Jun 2017 16:23:32 +0530 Subject: [PATCH 53/69] PM / QoS: constify *_attribute_group. File size before: text data bss dec hex filename 3890 1152 8 5050 13ba drivers/base/power/sysfs.o File size After adding 'const': text data bss dec hex filename 4250 800 8 5058 13c2 drivers/base/power/sysfs.o Signed-off-by: Arvind Yadav Acked-by: Pavel Machek Signed-off-by: Rafael J. Wysocki --- drivers/base/power/sysfs.c | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/drivers/base/power/sysfs.c b/drivers/base/power/sysfs.c index 33b4b902741a..185a52581cfa 100644 --- a/drivers/base/power/sysfs.c +++ b/drivers/base/power/sysfs.c @@ -607,7 +607,7 @@ static struct attribute *power_attrs[] = { #endif /* CONFIG_PM_ADVANCED_DEBUG */ NULL, }; -static struct attribute_group pm_attr_group = { +static const struct attribute_group pm_attr_group = { .name = power_group_name, .attrs = power_attrs, }; @@ -629,7 +629,7 @@ static struct attribute *wakeup_attrs[] = { #endif NULL, }; -static struct attribute_group pm_wakeup_attr_group = { +static const struct attribute_group pm_wakeup_attr_group = { .name = power_group_name, .attrs = wakeup_attrs, }; @@ -644,7 +644,7 @@ static struct attribute *runtime_attrs[] = { &dev_attr_autosuspend_delay_ms.attr, NULL, }; -static struct attribute_group pm_runtime_attr_group = { +static const struct attribute_group pm_runtime_attr_group = { .name = power_group_name, .attrs = runtime_attrs, }; @@ -653,7 +653,7 @@ static struct attribute *pm_qos_resume_latency_attrs[] = { &dev_attr_pm_qos_resume_latency_us.attr, NULL, }; -static struct attribute_group pm_qos_resume_latency_attr_group = { +static const struct attribute_group pm_qos_resume_latency_attr_group = { .name = power_group_name, .attrs = pm_qos_resume_latency_attrs, }; @@ -662,7 +662,7 @@ static struct attribute *pm_qos_latency_tolerance_attrs[] = { &dev_attr_pm_qos_latency_tolerance_us.attr, NULL, }; -static struct attribute_group pm_qos_latency_tolerance_attr_group = { +static const struct attribute_group pm_qos_latency_tolerance_attr_group = { .name = power_group_name, .attrs = pm_qos_latency_tolerance_attrs, }; @@ -672,7 +672,7 @@ static struct attribute *pm_qos_flags_attrs[] = { &dev_attr_pm_qos_remote_wakeup.attr, NULL, }; -static struct attribute_group pm_qos_flags_attr_group = { +static const struct attribute_group pm_qos_flags_attr_group = { .name = power_group_name, .attrs = pm_qos_flags_attrs, }; From a1a66393e39a97433bcc1737133ba7478993d247 Mon Sep 17 00:00:00 2001 From: "Rafael J. Wysocki" Date: Sat, 24 Jun 2017 01:53:14 +0200 Subject: [PATCH 54/69] ACPI / PM: Drop run_wake from struct acpi_device_wakeup_flags The run_wake flag in struct acpi_device_wakeup_flags stores the information on whether or not the device can generate wakeup signals at run time, but in ACPI that really is equivalent to being able to generate wakeup signals at all. In fact, run_wake will always be set after successful executeion of acpi_setup_gpe_for_wake(), but if that fails, the device will not be able to use a wakeup GPE at all, so it won't be able to wake up the systems from sleep states too. Hence, run_wake actually means that the device is capable of triggering wakeup and so it is equivalent to the valid flag. For this reason, drop run_wake from struct acpi_device_wakeup_flags and make sure that the valid flag is only set if acpi_setup_gpe_for_wake() has been successful. Signed-off-by: Rafael J. Wysocki Reviewed-by: Mika Westerberg Acked-by: Bjorn Helgaas --- drivers/acpi/pci_root.c | 2 +- drivers/acpi/proc.c | 4 ++-- drivers/acpi/scan.c | 23 ++++++++--------------- drivers/pci/pci-acpi.c | 3 +-- include/acpi/acpi_bus.h | 1 - 5 files changed, 12 insertions(+), 21 deletions(-) diff --git a/drivers/acpi/pci_root.c b/drivers/acpi/pci_root.c index 919be0aa2578..783acbe25520 100644 --- a/drivers/acpi/pci_root.c +++ b/drivers/acpi/pci_root.c @@ -608,7 +608,7 @@ static int acpi_pci_root_add(struct acpi_device *device, pcie_no_aspm(); pci_acpi_add_bus_pm_notifier(device); - if (device->wakeup.flags.run_wake) + if (device->wakeup.flags.valid) device_set_run_wake(root->bus->bridge, true); if (hotadd) { diff --git a/drivers/acpi/proc.c b/drivers/acpi/proc.c index a34669cc823b..85ac848ac6ab 100644 --- a/drivers/acpi/proc.c +++ b/drivers/acpi/proc.c @@ -42,7 +42,7 @@ acpi_system_wakeup_device_seq_show(struct seq_file *seq, void *offset) if (!dev->physical_node_count) { seq_printf(seq, "%c%-8s\n", - dev->wakeup.flags.run_wake ? '*' : ' ', + dev->wakeup.flags.valid ? '*' : ' ', device_may_wakeup(&dev->dev) ? "enabled" : "disabled"); } else { @@ -58,7 +58,7 @@ acpi_system_wakeup_device_seq_show(struct seq_file *seq, void *offset) seq_printf(seq, "\t\t"); seq_printf(seq, "%c%-8s %s:%s\n", - dev->wakeup.flags.run_wake ? '*' : ' ', + dev->wakeup.flags.valid ? '*' : ' ', (device_may_wakeup(&dev->dev) || device_may_wakeup(ldev)) ? "enabled" : "disabled", diff --git a/drivers/acpi/scan.c b/drivers/acpi/scan.c index 3a10d7573477..49849667cd3f 100644 --- a/drivers/acpi/scan.c +++ b/drivers/acpi/scan.c @@ -835,7 +835,7 @@ static int acpi_bus_extract_wakeup_device_power_package(acpi_handle handle, return err; } -static void acpi_wakeup_gpe_init(struct acpi_device *device) +static bool acpi_wakeup_gpe_init(struct acpi_device *device) { static const struct acpi_device_id button_device_ids[] = { {"PNP0C0C", 0}, @@ -845,13 +845,11 @@ static void acpi_wakeup_gpe_init(struct acpi_device *device) }; struct acpi_device_wakeup *wakeup = &device->wakeup; acpi_status status; - acpi_event_status event_status; wakeup->flags.notifier_present = 0; /* Power button, Lid switch always enable wakeup */ if (!acpi_match_device_ids(device, button_device_ids)) { - wakeup->flags.run_wake = 1; if (!acpi_match_device_ids(device, &button_device_ids[1])) { /* Do not use Lid/sleep button for S5 wakeup */ if (wakeup->sleep_state == ACPI_STATE_S5) @@ -859,17 +857,12 @@ static void acpi_wakeup_gpe_init(struct acpi_device *device) } acpi_mark_gpe_for_wake(wakeup->gpe_device, wakeup->gpe_number); device_set_wakeup_capable(&device->dev, true); - return; + return true; } - acpi_setup_gpe_for_wake(device->handle, wakeup->gpe_device, - wakeup->gpe_number); - status = acpi_get_gpe_status(wakeup->gpe_device, wakeup->gpe_number, - &event_status); - if (ACPI_FAILURE(status)) - return; - - wakeup->flags.run_wake = !!(event_status & ACPI_EVENT_FLAG_HAS_HANDLER); + status = acpi_setup_gpe_for_wake(device->handle, wakeup->gpe_device, + wakeup->gpe_number); + return ACPI_SUCCESS(status); } static void acpi_bus_get_wakeup_device_flags(struct acpi_device *device) @@ -887,10 +880,10 @@ static void acpi_bus_get_wakeup_device_flags(struct acpi_device *device) return; } - device->wakeup.flags.valid = 1; + device->wakeup.flags.valid = acpi_wakeup_gpe_init(device); device->wakeup.prepare_count = 0; - acpi_wakeup_gpe_init(device); - /* Call _PSW/_DSW object to disable its ability to wake the sleeping + /* + * Call _PSW/_DSW object to disable its ability to wake the sleeping * system for the ACPI device with the _PRW object. * The _PSW object is depreciated in ACPI 3.0 and is replaced by _DSW. * So it is necessary to call _DSW object first. Only when it is not diff --git a/drivers/pci/pci-acpi.c b/drivers/pci/pci-acpi.c index 61296f20c37a..f084046e1feb 100644 --- a/drivers/pci/pci-acpi.c +++ b/drivers/pci/pci-acpi.c @@ -777,9 +777,8 @@ static void pci_acpi_setup(struct device *dev) return; device_set_wakeup_capable(dev, true); + device_set_run_wake(dev, true); acpi_pci_sleep_wake(pci_dev, false); - if (adev->wakeup.flags.run_wake) - device_set_run_wake(dev, true); } static void pci_acpi_cleanup(struct device *dev) diff --git a/include/acpi/acpi_bus.h b/include/acpi/acpi_bus.h index 30f3dc845aa4..72e32bd1e333 100644 --- a/include/acpi/acpi_bus.h +++ b/include/acpi/acpi_bus.h @@ -314,7 +314,6 @@ struct acpi_device_perf { /* Wakeup Management */ struct acpi_device_wakeup_flags { u8 valid:1; /* Can successfully enable wakeup? */ - u8 run_wake:1; /* Run-Wake GPE devices */ u8 notifier_present:1; /* Wake-up notify handler has been installed */ u8 enabled:1; /* Enabled for wakeup */ }; From 4d183d04195318c8ee8bce048f3f9a89c0e2056d Mon Sep 17 00:00:00 2001 From: "Rafael J. Wysocki" Date: Sat, 24 Jun 2017 01:54:39 +0200 Subject: [PATCH 55/69] ACPI / PM: Consolidate device wakeup settings code Currently, there are two separate ways of handling device wakeup settings in the ACPI core, depending on whether this is runtime wakeup or system wakeup (from sleep states). However, after the previous commit eliminating the run_wake ACPI device wakeup flag, there is no difference between the two any more at the ACPI level, so they can be combined. For this reason, introduce acpi_pm_set_device_wakeup() to replace both acpi_pm_device_run_wake() and acpi_pm_device_sleep_wake() and make it check the ACPI device object's wakeup.valid flag to determine whether or not the device can be set up to generate wakeup signals. Also notice that zpodd_enable/disable_run_wake() only call device_set_run_wake() because acpi_pm_device_run_wake() called device_run_wake(), which is not done by acpi_pm_set_device_wakeup(), so drop the now redundant device_set_run_wake() calls from there. Signed-off-by: Rafael J. Wysocki Reviewed-by: Mika Westerberg Acked-by: Bjorn Helgaas --- drivers/acpi/device_pm.c | 39 ++++++++------------------------------ drivers/ata/libata-zpodd.c | 9 +++------ drivers/pci/pci-acpi.c | 12 ++++++------ drivers/pnp/pnpacpi/core.c | 6 +++--- include/acpi/acpi_bus.h | 13 ++----------- 5 files changed, 22 insertions(+), 57 deletions(-) diff --git a/drivers/acpi/device_pm.c b/drivers/acpi/device_pm.c index ca0210213773..d2e985a4bac2 100644 --- a/drivers/acpi/device_pm.c +++ b/drivers/acpi/device_pm.c @@ -717,55 +717,32 @@ static int acpi_device_wakeup(struct acpi_device *adev, u32 target_state, } /** - * acpi_pm_device_run_wake - Enable/disable remote wakeup for given device. - * @dev: Device to enable/disable the platform to wake up. + * acpi_pm_set_device_wakeup - Enable/disable remote wakeup for given device. + * @dev: Device to enable/disable to generate wakeup events. * @enable: Whether to enable or disable the wakeup functionality. */ -int acpi_pm_device_run_wake(struct device *phys_dev, bool enable) -{ - struct acpi_device *adev; - - if (!device_run_wake(phys_dev)) - return -EINVAL; - - adev = ACPI_COMPANION(phys_dev); - if (!adev) { - dev_dbg(phys_dev, "ACPI companion missing in %s!\n", __func__); - return -ENODEV; - } - - return acpi_device_wakeup(adev, ACPI_STATE_S0, enable); -} -EXPORT_SYMBOL(acpi_pm_device_run_wake); - -#ifdef CONFIG_PM_SLEEP -/** - * acpi_pm_device_sleep_wake - Enable or disable device to wake up the system. - * @dev: Device to enable/desible to wake up the system from sleep states. - * @enable: Whether to enable or disable @dev to wake up the system. - */ -int acpi_pm_device_sleep_wake(struct device *dev, bool enable) +int acpi_pm_set_device_wakeup(struct device *dev, bool enable) { struct acpi_device *adev; int error; - if (!device_can_wakeup(dev)) - return -EINVAL; - adev = ACPI_COMPANION(dev); if (!adev) { dev_dbg(dev, "ACPI companion missing in %s!\n", __func__); return -ENODEV; } + if (!acpi_device_can_wakeup(adev)) + return -EINVAL; + error = acpi_device_wakeup(adev, acpi_target_system_state(), enable); if (!error) - dev_dbg(dev, "System wakeup %s by ACPI\n", + dev_dbg(dev, "Wakeup %s by ACPI\n", enable ? "enabled" : "disabled"); return error; } -#endif /* CONFIG_PM_SLEEP */ +EXPORT_SYMBOL(acpi_pm_set_device_wakeup); /** * acpi_dev_pm_low_power - Put ACPI device into a low-power state. diff --git a/drivers/ata/libata-zpodd.c b/drivers/ata/libata-zpodd.c index f3a65a3140d3..8a01d09ac4db 100644 --- a/drivers/ata/libata-zpodd.c +++ b/drivers/ata/libata-zpodd.c @@ -174,8 +174,7 @@ void zpodd_enable_run_wake(struct ata_device *dev) sdev_disable_disk_events(dev->sdev); zpodd->powered_off = true; - device_set_run_wake(&dev->tdev, true); - acpi_pm_device_run_wake(&dev->tdev, true); + acpi_pm_set_device_wakeup(&dev->tdev, true); } /* Disable runtime wake capability if it is enabled */ @@ -183,10 +182,8 @@ void zpodd_disable_run_wake(struct ata_device *dev) { struct zpodd *zpodd = dev->zpodd; - if (zpodd->powered_off) { - acpi_pm_device_run_wake(&dev->tdev, false); - device_set_run_wake(&dev->tdev, false); - } + if (zpodd->powered_off) + acpi_pm_set_device_wakeup(&dev->tdev, false); } /* diff --git a/drivers/pci/pci-acpi.c b/drivers/pci/pci-acpi.c index f084046e1feb..eb014b8ab01a 100644 --- a/drivers/pci/pci-acpi.c +++ b/drivers/pci/pci-acpi.c @@ -578,20 +578,20 @@ static bool acpi_pci_can_wakeup(struct pci_dev *dev) static void acpi_pci_propagate_wakeup_enable(struct pci_bus *bus, bool enable) { while (bus->parent) { - if (!acpi_pm_device_sleep_wake(&bus->self->dev, enable)) + if (!acpi_pm_set_device_wakeup(&bus->self->dev, enable)) return; bus = bus->parent; } /* We have reached the root bus. */ if (bus->bridge) - acpi_pm_device_sleep_wake(bus->bridge, enable); + acpi_pm_set_device_wakeup(bus->bridge, enable); } static int acpi_pci_sleep_wake(struct pci_dev *dev, bool enable) { if (acpi_pci_can_wakeup(dev)) - return acpi_pm_device_sleep_wake(&dev->dev, enable); + return acpi_pm_set_device_wakeup(&dev->dev, enable); acpi_pci_propagate_wakeup_enable(dev->bus, enable); return 0; @@ -604,14 +604,14 @@ static void acpi_pci_propagate_run_wake(struct pci_bus *bus, bool enable) if (bridge->pme_interrupt) return; - if (!acpi_pm_device_run_wake(&bridge->dev, enable)) + if (!acpi_pm_set_device_wakeup(&bridge->dev, enable)) return; bus = bus->parent; } /* We have reached the root bus. */ if (bus->bridge) - acpi_pm_device_run_wake(bus->bridge, enable); + acpi_pm_set_device_wakeup(bus->bridge, enable); } static int acpi_pci_run_wake(struct pci_dev *dev, bool enable) @@ -625,7 +625,7 @@ static int acpi_pci_run_wake(struct pci_dev *dev, bool enable) if (dev->pme_interrupt && !dev->runtime_d3cold) return 0; - if (!acpi_pm_device_run_wake(&dev->dev, enable)) + if (!acpi_pm_set_device_wakeup(&dev->dev, enable)) return 0; acpi_pci_propagate_run_wake(dev->bus, enable); diff --git a/drivers/pnp/pnpacpi/core.c b/drivers/pnp/pnpacpi/core.c index 9113876487ed..3a4c1aa0201e 100644 --- a/drivers/pnp/pnpacpi/core.c +++ b/drivers/pnp/pnpacpi/core.c @@ -149,8 +149,8 @@ static int pnpacpi_suspend(struct pnp_dev *dev, pm_message_t state) } if (device_can_wakeup(&dev->dev)) { - error = acpi_pm_device_sleep_wake(&dev->dev, - device_may_wakeup(&dev->dev)); + error = acpi_pm_set_device_wakeup(&dev->dev, + device_may_wakeup(&dev->dev)); if (error) return error; } @@ -185,7 +185,7 @@ static int pnpacpi_resume(struct pnp_dev *dev) } if (device_may_wakeup(&dev->dev)) - acpi_pm_device_sleep_wake(&dev->dev, false); + acpi_pm_set_device_wakeup(&dev->dev, false); if (acpi_device_power_manageable(acpi_dev)) error = acpi_device_set_power(acpi_dev, ACPI_STATE_D0); diff --git a/include/acpi/acpi_bus.h b/include/acpi/acpi_bus.h index 72e32bd1e333..6bf0f843f7d7 100644 --- a/include/acpi/acpi_bus.h +++ b/include/acpi/acpi_bus.h @@ -603,7 +603,7 @@ acpi_status acpi_add_pm_notifier(struct acpi_device *adev, struct device *dev, void (*func)(struct acpi_device_wakeup_context *context)); acpi_status acpi_remove_pm_notifier(struct acpi_device *adev); int acpi_pm_device_sleep_state(struct device *, int *, int); -int acpi_pm_device_run_wake(struct device *, bool); +int acpi_pm_set_device_wakeup(struct device *dev, bool enable); #else static inline void acpi_pm_wakeup_event(struct device *dev) { @@ -626,16 +626,7 @@ static inline int acpi_pm_device_sleep_state(struct device *d, int *p, int m) return (m >= ACPI_STATE_D0 && m <= ACPI_STATE_D3_COLD) ? m : ACPI_STATE_D0; } -static inline int acpi_pm_device_run_wake(struct device *dev, bool enable) -{ - return -ENODEV; -} -#endif - -#ifdef CONFIG_PM_SLEEP -int acpi_pm_device_sleep_wake(struct device *, bool); -#else -static inline int acpi_pm_device_sleep_wake(struct device *dev, bool enable) +static inline int acpi_pm_set_device_wakeup(struct device *dev, bool enable) { return -ENODEV; } From 8370c2dc4c7b91be7e1231130f0ae08b5aebecf4 Mon Sep 17 00:00:00 2001 From: "Rafael J. Wysocki" Date: Sat, 24 Jun 2017 01:56:13 +0200 Subject: [PATCH 56/69] PCI / PM: Drop pme_interrupt flag from struct pci_dev The pme_interrupt flag in struct pci_dev is set when PMEs generated by the device are going to be signaled via root port PME interrupts. Ironically enough, that information is only used by the code setting up device wakeup through ACPI which returns as soon as it sees the pme_interrupt flag set while setting up "remote runtime wakeup". That is questionable, however, because in theory there may be PCIe devices using out-of-band PME signaling under root ports handled by the native PME code or devices requiring wakeup power setup to be carried out by AML. For such devices, ACPI wakeup should be invoked regardless of whether or not native PME signaling is used in general. For this reason, drop the pme_interrupt flag and rework the code using it which then allows the ACPI-based device wakeup handling in PCI to be consolidated to use one code path for both "runtime remote wakeup" and system wakeup (from sleep states). Signed-off-by: Rafael J. Wysocki Reviewed-by: Mika Westerberg Acked-by: Bjorn Helgaas --- drivers/acpi/device_pm.c | 10 ++++-- drivers/pci/pci-acpi.c | 68 +++++++++------------------------------- drivers/pci/pcie/pme.c | 14 ++++----- include/acpi/acpi_bus.h | 5 +++ include/linux/pci.h | 1 - 5 files changed, 34 insertions(+), 64 deletions(-) diff --git a/drivers/acpi/device_pm.c b/drivers/acpi/device_pm.c index d2e985a4bac2..28938b5a334e 100644 --- a/drivers/acpi/device_pm.c +++ b/drivers/acpi/device_pm.c @@ -496,6 +496,13 @@ bool acpi_bus_can_wakeup(acpi_handle handle) } EXPORT_SYMBOL(acpi_bus_can_wakeup); +bool acpi_pm_device_can_wakeup(struct device *dev) +{ + struct acpi_device *adev = ACPI_COMPANION(dev); + + return adev ? acpi_device_can_wakeup(adev) : false; +} + /** * acpi_dev_pm_get_state - Get preferred power state of ACPI device. * @dev: Device whose preferred target power state to return. @@ -737,8 +744,7 @@ int acpi_pm_set_device_wakeup(struct device *dev, bool enable) error = acpi_device_wakeup(adev, acpi_target_system_state(), enable); if (!error) - dev_dbg(dev, "Wakeup %s by ACPI\n", - enable ? "enabled" : "disabled"); + dev_dbg(dev, "Wakeup %s by ACPI\n", enable ? "enabled" : "disabled"); return error; } diff --git a/drivers/pci/pci-acpi.c b/drivers/pci/pci-acpi.c index eb014b8ab01a..de255bc9736b 100644 --- a/drivers/pci/pci-acpi.c +++ b/drivers/pci/pci-acpi.c @@ -569,67 +569,29 @@ static pci_power_t acpi_pci_get_power_state(struct pci_dev *dev) return state_conv[state]; } -static bool acpi_pci_can_wakeup(struct pci_dev *dev) -{ - struct acpi_device *adev = ACPI_COMPANION(&dev->dev); - return adev ? acpi_device_can_wakeup(adev) : false; -} - -static void acpi_pci_propagate_wakeup_enable(struct pci_bus *bus, bool enable) +static int acpi_pci_propagate_wakeup(struct pci_bus *bus, bool enable) { while (bus->parent) { - if (!acpi_pm_set_device_wakeup(&bus->self->dev, enable)) - return; + if (acpi_pm_device_can_wakeup(&bus->self->dev)) + return acpi_pm_set_device_wakeup(&bus->self->dev, enable); + bus = bus->parent; } /* We have reached the root bus. */ - if (bus->bridge) - acpi_pm_set_device_wakeup(bus->bridge, enable); + if (bus->bridge) { + if (acpi_pm_device_can_wakeup(bus->bridge)) + return acpi_pm_set_device_wakeup(bus->bridge, enable); + } + return 0; } -static int acpi_pci_sleep_wake(struct pci_dev *dev, bool enable) +static int acpi_pci_wakeup(struct pci_dev *dev, bool enable) { - if (acpi_pci_can_wakeup(dev)) + if (acpi_pm_device_can_wakeup(&dev->dev)) return acpi_pm_set_device_wakeup(&dev->dev, enable); - acpi_pci_propagate_wakeup_enable(dev->bus, enable); - return 0; -} - -static void acpi_pci_propagate_run_wake(struct pci_bus *bus, bool enable) -{ - while (bus->parent) { - struct pci_dev *bridge = bus->self; - - if (bridge->pme_interrupt) - return; - if (!acpi_pm_set_device_wakeup(&bridge->dev, enable)) - return; - bus = bus->parent; - } - - /* We have reached the root bus. */ - if (bus->bridge) - acpi_pm_set_device_wakeup(bus->bridge, enable); -} - -static int acpi_pci_run_wake(struct pci_dev *dev, bool enable) -{ - /* - * Per PCI Express Base Specification Revision 2.0 section - * 5.3.3.2 Link Wakeup, platform support is needed for D3cold - * waking up to power on the main link even if there is PME - * support for D3cold - */ - if (dev->pme_interrupt && !dev->runtime_d3cold) - return 0; - - if (!acpi_pm_set_device_wakeup(&dev->dev, enable)) - return 0; - - acpi_pci_propagate_run_wake(dev->bus, enable); - return 0; + return acpi_pci_propagate_wakeup(dev->bus, enable); } static bool acpi_pci_need_resume(struct pci_dev *dev) @@ -653,8 +615,8 @@ static const struct pci_platform_pm_ops acpi_pci_platform_pm = { .set_state = acpi_pci_set_power_state, .get_state = acpi_pci_get_power_state, .choose_state = acpi_pci_choose_state, - .sleep_wake = acpi_pci_sleep_wake, - .run_wake = acpi_pci_run_wake, + .sleep_wake = acpi_pci_wakeup, + .run_wake = acpi_pci_wakeup, .need_resume = acpi_pci_need_resume, }; @@ -778,7 +740,7 @@ static void pci_acpi_setup(struct device *dev) device_set_wakeup_capable(dev, true); device_set_run_wake(dev, true); - acpi_pci_sleep_wake(pci_dev, false); + acpi_pci_wakeup(pci_dev, false); } static void pci_acpi_cleanup(struct device *dev) diff --git a/drivers/pci/pcie/pme.c b/drivers/pci/pcie/pme.c index 2dd1c68e6de8..1db1615838ac 100644 --- a/drivers/pci/pcie/pme.c +++ b/drivers/pci/pcie/pme.c @@ -294,31 +294,29 @@ static irqreturn_t pcie_pme_irq(int irq, void *context) } /** - * pcie_pme_set_native - Set the PME interrupt flag for given device. + * pcie_pme_can_wakeup - Set the wakeup capability flag. * @dev: PCI device to handle. * @ign: Ignored. */ -static int pcie_pme_set_native(struct pci_dev *dev, void *ign) +static int pcie_pme_can_wakeup(struct pci_dev *dev, void *ign) { device_set_run_wake(&dev->dev, true); - dev->pme_interrupt = true; return 0; } /** - * pcie_pme_mark_devices - Set the PME interrupt flag for devices below a port. + * pcie_pme_mark_devices - Set the wakeup flag for devices below a port. * @port: PCIe root port or event collector to handle. * * For each device below given root port, including the port itself (or for each * root complex integrated endpoint if @port is a root complex event collector) - * set the flag indicating that it can signal run-time wake-up events via PCIe - * PME interrupts. + * set the flag indicating that it can signal run-time wake-up events. */ static void pcie_pme_mark_devices(struct pci_dev *port) { - pcie_pme_set_native(port, NULL); + pcie_pme_can_wakeup(port, NULL); if (port->subordinate) - pci_walk_bus(port->subordinate, pcie_pme_set_native, NULL); + pci_walk_bus(port->subordinate, pcie_pme_can_wakeup, NULL); } /** diff --git a/include/acpi/acpi_bus.h b/include/acpi/acpi_bus.h index 6bf0f843f7d7..be6b6bb4ef9c 100644 --- a/include/acpi/acpi_bus.h +++ b/include/acpi/acpi_bus.h @@ -602,6 +602,7 @@ void acpi_pm_wakeup_event(struct device *dev); acpi_status acpi_add_pm_notifier(struct acpi_device *adev, struct device *dev, void (*func)(struct acpi_device_wakeup_context *context)); acpi_status acpi_remove_pm_notifier(struct acpi_device *adev); +bool acpi_pm_device_can_wakeup(struct device *dev); int acpi_pm_device_sleep_state(struct device *, int *, int); int acpi_pm_set_device_wakeup(struct device *dev, bool enable); #else @@ -618,6 +619,10 @@ static inline acpi_status acpi_remove_pm_notifier(struct acpi_device *adev) { return AE_SUPPORT; } +static inline bool acpi_pm_device_can_wakeup(struct device *dev) +{ + return false; +} static inline int acpi_pm_device_sleep_state(struct device *d, int *p, int m) { if (p) diff --git a/include/linux/pci.h b/include/linux/pci.h index 8039f9f0ca05..d3d5bca82b43 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -307,7 +307,6 @@ struct pci_dev { u8 pm_cap; /* PM capability offset */ unsigned int pme_support:5; /* Bitmask of states from which PME# can be generated */ - unsigned int pme_interrupt:1; unsigned int pme_poll:1; /* Poll device's PME status bit */ unsigned int d1_support:1; /* Low power state D1 is supported */ unsigned int d2_support:1; /* Low power state D2 is supported */ From 0847684cfc5f0e9f009919bfdcb041d60e19b856 Mon Sep 17 00:00:00 2001 From: "Rafael J. Wysocki" Date: Sat, 24 Jun 2017 01:57:35 +0200 Subject: [PATCH 57/69] PCI / PM: Simplify device wakeup settings code After previous changes it is not necessary to distinguish between device wakeup for run time and device wakeup from system sleep states any more, so rework the PCI device wakeup settings code accordingly. Signed-off-by: Rafael J. Wysocki Reviewed-by: Mika Westerberg Acked-by: Bjorn Helgaas --- drivers/pci/pci-acpi.c | 3 +-- drivers/pci/pci-driver.c | 2 +- drivers/pci/pci-mid.c | 10 ++-------- drivers/pci/pci.c | 36 ++++++++++-------------------------- drivers/pci/pci.h | 9 ++------- include/linux/pci.h | 9 +-------- 6 files changed, 17 insertions(+), 52 deletions(-) diff --git a/drivers/pci/pci-acpi.c b/drivers/pci/pci-acpi.c index de255bc9736b..138a3c55d80e 100644 --- a/drivers/pci/pci-acpi.c +++ b/drivers/pci/pci-acpi.c @@ -615,8 +615,7 @@ static const struct pci_platform_pm_ops acpi_pci_platform_pm = { .set_state = acpi_pci_set_power_state, .get_state = acpi_pci_get_power_state, .choose_state = acpi_pci_choose_state, - .sleep_wake = acpi_pci_wakeup, - .run_wake = acpi_pci_wakeup, + .set_wakeup = acpi_pci_wakeup, .need_resume = acpi_pci_need_resume, }; diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c index 192e7b681b96..ffe7d54d9328 100644 --- a/drivers/pci/pci-driver.c +++ b/drivers/pci/pci-driver.c @@ -1216,7 +1216,7 @@ static int pci_pm_runtime_resume(struct device *dev) pci_restore_standard_config(pci_dev); pci_fixup_device(pci_fixup_resume_early, pci_dev); - __pci_enable_wake(pci_dev, PCI_D0, true, false); + pci_enable_wake(pci_dev, PCI_D0, false); pci_fixup_device(pci_fixup_resume, pci_dev); rc = pm->runtime_resume(dev); diff --git a/drivers/pci/pci-mid.c b/drivers/pci/pci-mid.c index 1c4af7227bca..a4ac940c7696 100644 --- a/drivers/pci/pci-mid.c +++ b/drivers/pci/pci-mid.c @@ -39,12 +39,7 @@ static pci_power_t mid_pci_choose_state(struct pci_dev *pdev) return PCI_D3hot; } -static int mid_pci_sleep_wake(struct pci_dev *dev, bool enable) -{ - return 0; -} - -static int mid_pci_run_wake(struct pci_dev *dev, bool enable) +static int mid_pci_wakeup(struct pci_dev *dev, bool enable) { return 0; } @@ -59,8 +54,7 @@ static const struct pci_platform_pm_ops mid_pci_platform_pm = { .set_state = mid_pci_set_power_state, .get_state = mid_pci_get_power_state, .choose_state = mid_pci_choose_state, - .sleep_wake = mid_pci_sleep_wake, - .run_wake = mid_pci_run_wake, + .set_wakeup = mid_pci_wakeup, .need_resume = mid_pci_need_resume, }; diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index 5641035e58fa..d378262d30e3 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -574,8 +574,7 @@ static const struct pci_platform_pm_ops *pci_platform_pm; int pci_set_platform_pm(const struct pci_platform_pm_ops *ops) { if (!ops->is_manageable || !ops->set_state || !ops->get_state || - !ops->choose_state || !ops->sleep_wake || !ops->run_wake || - !ops->need_resume) + !ops->choose_state || !ops->set_wakeup || !ops->need_resume) return -EINVAL; pci_platform_pm = ops; return 0; @@ -603,16 +602,10 @@ static inline pci_power_t platform_pci_choose_state(struct pci_dev *dev) pci_platform_pm->choose_state(dev) : PCI_POWER_ERROR; } -static inline int platform_pci_sleep_wake(struct pci_dev *dev, bool enable) +static inline int platform_pci_set_wakeup(struct pci_dev *dev, bool enable) { return pci_platform_pm ? - pci_platform_pm->sleep_wake(dev, enable) : -ENODEV; -} - -static inline int platform_pci_run_wake(struct pci_dev *dev, bool enable) -{ - return pci_platform_pm ? - pci_platform_pm->run_wake(dev, enable) : -ENODEV; + pci_platform_pm->set_wakeup(dev, enable) : -ENODEV; } static inline bool platform_pci_need_resume(struct pci_dev *dev) @@ -1889,10 +1882,9 @@ void pci_pme_active(struct pci_dev *dev, bool enable) EXPORT_SYMBOL(pci_pme_active); /** - * __pci_enable_wake - enable PCI device as wakeup event source + * pci_enable_wake - enable PCI device as wakeup event source * @dev: PCI device affected * @state: PCI state from which device will issue wakeup events - * @runtime: True if the events are to be generated at run time * @enable: True to enable event generation; false to disable * * This enables the device as a wakeup event source, or disables it. @@ -1908,14 +1900,10 @@ EXPORT_SYMBOL(pci_pme_active); * Error code depending on the platform is returned if both the platform and * the native mechanism fail to enable the generation of wake-up events */ -int __pci_enable_wake(struct pci_dev *dev, pci_power_t state, - bool runtime, bool enable) +int pci_enable_wake(struct pci_dev *dev, pci_power_t state, bool enable) { int ret = 0; - if (enable && !runtime && !device_may_wakeup(&dev->dev)) - return -EINVAL; - /* * Don't do the same thing twice in a row for one device, but restore * PME Enable in case it has been updated by config space restoration. @@ -1938,24 +1926,20 @@ int __pci_enable_wake(struct pci_dev *dev, pci_power_t state, pci_pme_active(dev, true); else ret = 1; - error = runtime ? platform_pci_run_wake(dev, true) : - platform_pci_sleep_wake(dev, true); + error = platform_pci_set_wakeup(dev, true); if (ret) ret = error; if (!ret) dev->wakeup_prepared = true; } else { - if (runtime) - platform_pci_run_wake(dev, false); - else - platform_pci_sleep_wake(dev, false); + platform_pci_set_wakeup(dev, false); pci_pme_active(dev, false); dev->wakeup_prepared = false; } return ret; } -EXPORT_SYMBOL(__pci_enable_wake); +EXPORT_SYMBOL(pci_enable_wake); /** * pci_wake_from_d3 - enable/disable device to wake up from D3_hot or D3_cold @@ -2097,12 +2081,12 @@ int pci_finish_runtime_suspend(struct pci_dev *dev) dev->runtime_d3cold = target_state == PCI_D3cold; - __pci_enable_wake(dev, target_state, true, pci_dev_run_wake(dev)); + pci_enable_wake(dev, target_state, pci_dev_run_wake(dev)); error = pci_set_power_state(dev, target_state); if (error) { - __pci_enable_wake(dev, target_state, true, false); + pci_enable_wake(dev, target_state, false); dev->runtime_d3cold = false; } diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h index f8113e5b9812..240b2c0fed4b 100644 --- a/drivers/pci/pci.h +++ b/drivers/pci/pci.h @@ -47,11 +47,7 @@ int pci_probe_reset_function(struct pci_dev *dev); * platform; to be used during system-wide transitions from a * sleeping state to the working state and vice versa * - * @sleep_wake: enables/disables the system wake up capability of given device - * - * @run_wake: enables/disables the platform to generate run-time wake-up events - * for given device (the device's wake-up capability has to be - * enabled by @sleep_wake for this feature to work) + * @set_wakeup: enables/disables wakeup capability for the device * * @need_resume: returns 'true' if the given device (which is currently * suspended) needs to be resumed to be configured for system @@ -65,8 +61,7 @@ struct pci_platform_pm_ops { int (*set_state)(struct pci_dev *dev, pci_power_t state); pci_power_t (*get_state)(struct pci_dev *dev); pci_power_t (*choose_state)(struct pci_dev *dev); - int (*sleep_wake)(struct pci_dev *dev, bool enable); - int (*run_wake)(struct pci_dev *dev, bool enable); + int (*set_wakeup)(struct pci_dev *dev, bool enable); bool (*need_resume)(struct pci_dev *dev); }; diff --git a/include/linux/pci.h b/include/linux/pci.h index d3d5bca82b43..ff200c84240b 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -1097,8 +1097,7 @@ int pci_set_power_state(struct pci_dev *dev, pci_power_t state); pci_power_t pci_choose_state(struct pci_dev *dev, pm_message_t state); bool pci_pme_capable(struct pci_dev *dev, pci_power_t state); void pci_pme_active(struct pci_dev *dev, bool enable); -int __pci_enable_wake(struct pci_dev *dev, pci_power_t state, - bool runtime, bool enable); +int pci_enable_wake(struct pci_dev *dev, pci_power_t state, bool enable); int pci_wake_from_d3(struct pci_dev *dev, bool enable); int pci_prepare_to_sleep(struct pci_dev *dev); int pci_back_from_sleep(struct pci_dev *dev); @@ -1108,12 +1107,6 @@ void pci_pme_wakeup_bus(struct pci_bus *bus); void pci_d3cold_enable(struct pci_dev *dev); void pci_d3cold_disable(struct pci_dev *dev); -static inline int pci_enable_wake(struct pci_dev *dev, pci_power_t state, - bool enable) -{ - return __pci_enable_wake(dev, state, false, enable); -} - /* PCI Virtual Channel */ int pci_save_vc_state(struct pci_dev *dev); void pci_restore_vc_state(struct pci_dev *dev); From de3ef1eb1cd0cc3a75f7a3661e10ed827f370ab8 Mon Sep 17 00:00:00 2001 From: "Rafael J. Wysocki" Date: Sat, 24 Jun 2017 01:58:53 +0200 Subject: [PATCH 58/69] PM / core: Drop run_wake flag from struct dev_pm_info The run_wake flag in struct dev_pm_info is used to indicate whether or not the device is capable of generating remote wakeup signals at run time (or in the system working state), but the distinction between runtime remote wakeup and system wakeup signaling has always been rather artificial. The only practical reason for it to exist at the core level was that ACPI and PCI treated those two cases differently, but that's not the case any more after recent changes. For this reason, get rid of the run_wake flag and, when applicable, use device_set_wakeup_capable() and device_can_wakeup() instead of device_set_run_wake() and device_run_wake(), respectively. Signed-off-by: Rafael J. Wysocki Reviewed-by: Mika Westerberg Acked-by: Bjorn Helgaas --- Documentation/power/runtime_pm.txt | 7 ++----- drivers/acpi/pci_root.c | 5 ++--- drivers/pci/pci-acpi.c | 5 +---- drivers/pci/pci.c | 6 +++--- drivers/pci/pcie/pme.c | 2 +- drivers/usb/dwc3/dwc3-pci.c | 3 +-- drivers/usb/host/uhci-pci.c | 2 +- include/linux/pm.h | 1 - include/linux/pm_runtime.h | 12 ------------ 9 files changed, 11 insertions(+), 32 deletions(-) diff --git a/Documentation/power/runtime_pm.txt b/Documentation/power/runtime_pm.txt index ee69d7532172..0fde3dcf077a 100644 --- a/Documentation/power/runtime_pm.txt +++ b/Documentation/power/runtime_pm.txt @@ -105,9 +105,9 @@ knows what to do to handle the device). In particular, if the driver requires remote wakeup capability (i.e. hardware mechanism allowing the device to request a change of its power state, such as -PCI PME) for proper functioning and device_run_wake() returns 'false' for the +PCI PME) for proper functioning and device_can_wakeup() returns 'false' for the device, then ->runtime_suspend() should return -EBUSY. On the other hand, if -device_run_wake() returns 'true' for the device and the device is put into a +device_can_wakeup() returns 'true' for the device and the device is put into a low-power state during the execution of the suspend callback, it is expected that remote wakeup will be enabled for the device. Generally, remote wakeup should be enabled for all input devices put into low-power states at run time. @@ -253,9 +253,6 @@ defined in include/linux/pm.h: being executed for that device and it is not practical to wait for the suspend to complete; means "start a resume as soon as you've suspended" - unsigned int run_wake; - - set if the device is capable of generating runtime wake-up events - enum rpm_status runtime_status; - the runtime PM status of the device; this field's initial value is RPM_SUSPENDED, which means that each device is initially regarded by the diff --git a/drivers/acpi/pci_root.c b/drivers/acpi/pci_root.c index 783acbe25520..0d34e622a8b5 100644 --- a/drivers/acpi/pci_root.c +++ b/drivers/acpi/pci_root.c @@ -608,8 +608,7 @@ static int acpi_pci_root_add(struct acpi_device *device, pcie_no_aspm(); pci_acpi_add_bus_pm_notifier(device); - if (device->wakeup.flags.valid) - device_set_run_wake(root->bus->bridge, true); + device_set_wakeup_capable(root->bus->bridge, device->wakeup.flags.valid); if (hotadd) { pcibios_resource_survey_bus(root->bus); @@ -649,7 +648,7 @@ static void acpi_pci_root_remove(struct acpi_device *device) pci_stop_root_bus(root->bus); pci_ioapic_remove(root); - device_set_run_wake(root->bus->bridge, false); + device_set_wakeup_capable(root->bus->bridge, false); pci_acpi_remove_bus_pm_notifier(device); pci_remove_root_bus(root->bus); diff --git a/drivers/pci/pci-acpi.c b/drivers/pci/pci-acpi.c index 138a3c55d80e..e70c1c7ba1bf 100644 --- a/drivers/pci/pci-acpi.c +++ b/drivers/pci/pci-acpi.c @@ -738,7 +738,6 @@ static void pci_acpi_setup(struct device *dev) return; device_set_wakeup_capable(dev, true); - device_set_run_wake(dev, true); acpi_pci_wakeup(pci_dev, false); } @@ -750,10 +749,8 @@ static void pci_acpi_cleanup(struct device *dev) return; pci_acpi_remove_pm_notifier(adev); - if (adev->wakeup.flags.valid) { + if (adev->wakeup.flags.valid) device_set_wakeup_capable(dev, false); - device_set_run_wake(dev, false); - } } static bool pci_acpi_bus_match(struct device *dev) diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index d378262d30e3..0b5302a9fdae 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -2105,7 +2105,7 @@ bool pci_dev_run_wake(struct pci_dev *dev) { struct pci_bus *bus = dev->bus; - if (device_run_wake(&dev->dev)) + if (device_can_wakeup(&dev->dev)) return true; if (!dev->pme_support) @@ -2118,7 +2118,7 @@ bool pci_dev_run_wake(struct pci_dev *dev) while (bus->parent) { struct pci_dev *bridge = bus->self; - if (device_run_wake(&bridge->dev)) + if (device_can_wakeup(&bridge->dev)) return true; bus = bus->parent; @@ -2126,7 +2126,7 @@ bool pci_dev_run_wake(struct pci_dev *dev) /* We have reached the root bus. */ if (bus->bridge) - return device_run_wake(bus->bridge); + return device_can_wakeup(bus->bridge); return false; } diff --git a/drivers/pci/pcie/pme.c b/drivers/pci/pcie/pme.c index 1db1615838ac..80e58d25006d 100644 --- a/drivers/pci/pcie/pme.c +++ b/drivers/pci/pcie/pme.c @@ -300,7 +300,7 @@ static irqreturn_t pcie_pme_irq(int irq, void *context) */ static int pcie_pme_can_wakeup(struct pci_dev *dev, void *ign) { - device_set_run_wake(&dev->dev, true); + device_set_wakeup_capable(&dev->dev, true); return 0; } diff --git a/drivers/usb/dwc3/dwc3-pci.c b/drivers/usb/dwc3/dwc3-pci.c index fe851544d7fb..7e995df7a797 100644 --- a/drivers/usb/dwc3/dwc3-pci.c +++ b/drivers/usb/dwc3/dwc3-pci.c @@ -230,7 +230,6 @@ static int dwc3_pci_probe(struct pci_dev *pci, } device_init_wakeup(dev, true); - device_set_run_wake(dev, true); pci_set_drvdata(pci, dwc); pm_runtime_put(dev); @@ -310,7 +309,7 @@ static int dwc3_pci_runtime_suspend(struct device *dev) { struct dwc3_pci *dwc = dev_get_drvdata(dev); - if (device_run_wake(dev)) + if (device_can_wakeup(dev)) return dwc3_pci_dsm(dwc, PCI_INTEL_BXT_STATE_D3); return -EBUSY; diff --git a/drivers/usb/host/uhci-pci.c b/drivers/usb/host/uhci-pci.c index 02260cfdedb1..49effdc0d857 100644 --- a/drivers/usb/host/uhci-pci.c +++ b/drivers/usb/host/uhci-pci.c @@ -131,7 +131,7 @@ static int uhci_pci_init(struct usb_hcd *hcd) /* Intel controllers use non-PME wakeup signalling */ if (to_pci_dev(uhci_dev(uhci))->vendor == PCI_VENDOR_ID_INTEL) - device_set_run_wake(uhci_dev(uhci), 1); + device_set_wakeup_capable(uhci_dev(uhci), true); /* Set up pointers to PCI-specific functions */ uhci->reset_hc = uhci_pci_reset_hc; diff --git a/include/linux/pm.h b/include/linux/pm.h index a0894bc52bb4..b8b4df09fd8f 100644 --- a/include/linux/pm.h +++ b/include/linux/pm.h @@ -584,7 +584,6 @@ struct dev_pm_info { unsigned int idle_notification:1; unsigned int request_pending:1; unsigned int deferred_resume:1; - unsigned int run_wake:1; unsigned int runtime_auto:1; bool ignore_children:1; unsigned int no_callbacks:1; diff --git a/include/linux/pm_runtime.h b/include/linux/pm_runtime.h index ca4823e675e2..2efb08a60e63 100644 --- a/include/linux/pm_runtime.h +++ b/include/linux/pm_runtime.h @@ -76,16 +76,6 @@ static inline void pm_runtime_put_noidle(struct device *dev) atomic_add_unless(&dev->power.usage_count, -1, 0); } -static inline bool device_run_wake(struct device *dev) -{ - return dev->power.run_wake; -} - -static inline void device_set_run_wake(struct device *dev, bool enable) -{ - dev->power.run_wake = enable; -} - static inline bool pm_runtime_suspended(struct device *dev) { return dev->power.runtime_status == RPM_SUSPENDED @@ -163,8 +153,6 @@ static inline void pm_runtime_forbid(struct device *dev) {} static inline void pm_suspend_ignore_children(struct device *dev, bool enable) {} static inline void pm_runtime_get_noresume(struct device *dev) {} static inline void pm_runtime_put_noidle(struct device *dev) {} -static inline bool device_run_wake(struct device *dev) { return false; } -static inline void device_set_run_wake(struct device *dev, bool enable) {} static inline bool pm_runtime_suspended(struct device *dev) { return false; } static inline bool pm_runtime_active(struct device *dev) { return true; } static inline bool pm_runtime_status_suspended(struct device *dev) { return false; } From 10da65423fdbee185da5bb65f829a9d9312c1198 Mon Sep 17 00:00:00 2001 From: Mikko Perttunen Date: Thu, 22 Jun 2017 10:18:33 +0300 Subject: [PATCH 59/69] PM / Domains: Call driver's noirq callbacks Currently genpd installs its own noirq callbacks, but never calls down to the driver's corresponding callbacks. Add these calls. Signed-off-by: Mikko Perttunen Acked-by: Ulf Hansson Signed-off-by: Rafael J. Wysocki --- drivers/base/power/domain.c | 68 ++++++++++++++++++++++++++++++++----- 1 file changed, 59 insertions(+), 9 deletions(-) diff --git a/drivers/base/power/domain.c b/drivers/base/power/domain.c index 0f7b1bd3680e..bbbb1d72395b 100644 --- a/drivers/base/power/domain.c +++ b/drivers/base/power/domain.c @@ -902,19 +902,19 @@ static int pm_genpd_prepare(struct device *dev) } /** - * pm_genpd_suspend_noirq - Completion of suspend of device in an I/O PM domain. + * genpd_finish_suspend - Completion of suspend or hibernation of device in an + * I/O pm domain. * @dev: Device to suspend. + * @poweroff: Specifies if this is a poweroff_noirq or suspend_noirq callback. * * Stop the device and remove power from the domain if all devices in it have * been stopped. */ -static int pm_genpd_suspend_noirq(struct device *dev) +static int genpd_finish_suspend(struct device *dev, bool poweroff) { struct generic_pm_domain *genpd; int ret; - dev_dbg(dev, "%s()\n", __func__); - genpd = dev_to_genpd(dev); if (IS_ERR(genpd)) return -EINVAL; @@ -922,6 +922,13 @@ static int pm_genpd_suspend_noirq(struct device *dev) if (dev->power.wakeup_path && genpd_dev_active_wakeup(genpd, dev)) return 0; + if (poweroff) + ret = pm_generic_poweroff_noirq(dev); + else + ret = pm_generic_suspend_noirq(dev); + if (ret) + return ret; + if (genpd->dev_ops.stop && genpd->dev_ops.start) { ret = pm_runtime_force_suspend(dev); if (ret) @@ -936,6 +943,20 @@ static int pm_genpd_suspend_noirq(struct device *dev) return 0; } +/** + * pm_genpd_suspend_noirq - Completion of suspend of device in an I/O PM domain. + * @dev: Device to suspend. + * + * Stop the device and remove power from the domain if all devices in it have + * been stopped. + */ +static int pm_genpd_suspend_noirq(struct device *dev) +{ + dev_dbg(dev, "%s()\n", __func__); + + return genpd_finish_suspend(dev, false); +} + /** * pm_genpd_resume_noirq - Start of resume of device in an I/O PM domain. * @dev: Device to resume. @@ -964,6 +985,10 @@ static int pm_genpd_resume_noirq(struct device *dev) if (genpd->dev_ops.stop && genpd->dev_ops.start) ret = pm_runtime_force_resume(dev); + ret = pm_generic_resume_noirq(dev); + if (ret) + return ret; + return ret; } @@ -987,6 +1012,10 @@ static int pm_genpd_freeze_noirq(struct device *dev) if (IS_ERR(genpd)) return -EINVAL; + ret = pm_generic_freeze_noirq(dev); + if (ret) + return ret; + if (genpd->dev_ops.stop && genpd->dev_ops.start) ret = pm_runtime_force_suspend(dev); @@ -1011,10 +1040,28 @@ static int pm_genpd_thaw_noirq(struct device *dev) if (IS_ERR(genpd)) return -EINVAL; - if (genpd->dev_ops.stop && genpd->dev_ops.start) + if (genpd->dev_ops.stop && genpd->dev_ops.start) { ret = pm_runtime_force_resume(dev); + if (ret) + return ret; + } - return ret; + return pm_generic_thaw_noirq(dev); +} + +/** + * pm_genpd_poweroff_noirq - Completion of hibernation of device in an + * I/O PM domain. + * @dev: Device to poweroff. + * + * Stop the device and remove power from the domain if all devices in it have + * been stopped. + */ +static int pm_genpd_poweroff_noirq(struct device *dev) +{ + dev_dbg(dev, "%s()\n", __func__); + + return genpd_finish_suspend(dev, true); } /** @@ -1051,10 +1098,13 @@ static int pm_genpd_restore_noirq(struct device *dev) genpd_sync_power_on(genpd, true, 0); genpd_unlock(genpd); - if (genpd->dev_ops.stop && genpd->dev_ops.start) + if (genpd->dev_ops.stop && genpd->dev_ops.start) { ret = pm_runtime_force_resume(dev); + if (ret) + return ret; + } - return ret; + return pm_generic_restore_noirq(dev); } /** @@ -1496,7 +1546,7 @@ int pm_genpd_init(struct generic_pm_domain *genpd, genpd->domain.ops.resume_noirq = pm_genpd_resume_noirq; genpd->domain.ops.freeze_noirq = pm_genpd_freeze_noirq; genpd->domain.ops.thaw_noirq = pm_genpd_thaw_noirq; - genpd->domain.ops.poweroff_noirq = pm_genpd_suspend_noirq; + genpd->domain.ops.poweroff_noirq = pm_genpd_poweroff_noirq; genpd->domain.ops.restore_noirq = pm_genpd_restore_noirq; genpd->domain.ops.complete = pm_genpd_complete; From 8b55e55ee44356d68f4a7ee4b11f9cbb1f5958cc Mon Sep 17 00:00:00 2001 From: Krzysztof Kozlowski Date: Wed, 28 Jun 2017 16:56:17 +0200 Subject: [PATCH 60/69] PM / Domains: Handle safely genpd_syscore_switch() call on non-genpd device genpd_syscore_switch() had two problems: 1. It silently assumed that device, it is being called for, belongs to generic power domain and used container_of() on its power domain pointer. Such assumption might not be true always. 2. It iterated over list of generic power domains without holding gpd_list_lock mutex thus list could have been modified at the same time. Usage of genpd_lookup_dev() solves both problems as it is safe a call for non-generic power domains and uses mutex when iterating. Reported-by: Ulf Hansson Signed-off-by: Krzysztof Kozlowski Acked-by: Ulf Hansson Signed-off-by: Rafael J. Wysocki --- drivers/base/power/domain.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/base/power/domain.c b/drivers/base/power/domain.c index bbbb1d72395b..b8d7907ee101 100644 --- a/drivers/base/power/domain.c +++ b/drivers/base/power/domain.c @@ -1148,8 +1148,8 @@ static void genpd_syscore_switch(struct device *dev, bool suspend) { struct generic_pm_domain *genpd; - genpd = dev_to_genpd(dev); - if (!pm_genpd_present(genpd)) + genpd = genpd_lookup_dev(dev); + if (!genpd) return; if (suspend) { From c6e83cac3eda5f7dd32ee1453df2f7abb5c6cd46 Mon Sep 17 00:00:00 2001 From: Krzysztof Kozlowski Date: Wed, 28 Jun 2017 16:56:18 +0200 Subject: [PATCH 61/69] PM / Domains: Fix unsafe iteration over modified list of device links pm_genpd_remove_subdomain() iterates over domain's master_links list and removes matching element thus it has to use safe version of list iteration. Fixes: f721889ff65a ("PM / Domains: Support for generic I/O PM domains (v8)") Cc: 3.1+ # 3.1+ Signed-off-by: Krzysztof Kozlowski Acked-by: Ulf Hansson Signed-off-by: Rafael J. Wysocki --- drivers/base/power/domain.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/base/power/domain.c b/drivers/base/power/domain.c index b8d7907ee101..048dc74e3d72 100644 --- a/drivers/base/power/domain.c +++ b/drivers/base/power/domain.c @@ -1446,7 +1446,7 @@ EXPORT_SYMBOL_GPL(pm_genpd_add_subdomain); int pm_genpd_remove_subdomain(struct generic_pm_domain *genpd, struct generic_pm_domain *subdomain) { - struct gpd_link *link; + struct gpd_link *l, *link; int ret = -EINVAL; if (IS_ERR_OR_NULL(genpd) || IS_ERR_OR_NULL(subdomain)) @@ -1462,7 +1462,7 @@ int pm_genpd_remove_subdomain(struct generic_pm_domain *genpd, goto out; } - list_for_each_entry(link, &genpd->master_links, master_node) { + list_for_each_entry_safe(link, l, &genpd->master_links, master_node) { if (link->slave != subdomain) continue; From b556b15dc04e9b9b98790f04c21acf5e24f994b2 Mon Sep 17 00:00:00 2001 From: Krzysztof Kozlowski Date: Wed, 28 Jun 2017 16:56:19 +0200 Subject: [PATCH 62/69] PM / Domains: Fix unsafe iteration over modified list of domain providers of_genpd_del_provider() iterates over list of domain provides and removes matching element thus it has to use safe version of list iteration. Fixes: aa42240ab254 (PM / Domains: Add generic OF-based PM domain look-up) Cc: 3.19+ # 3.19+ Signed-off-by: Krzysztof Kozlowski Acked-by: Ulf Hansson Signed-off-by: Rafael J. Wysocki --- drivers/base/power/domain.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/base/power/domain.c b/drivers/base/power/domain.c index 048dc74e3d72..0b836fdc99ad 100644 --- a/drivers/base/power/domain.c +++ b/drivers/base/power/domain.c @@ -1835,12 +1835,12 @@ EXPORT_SYMBOL_GPL(of_genpd_add_provider_onecell); */ void of_genpd_del_provider(struct device_node *np) { - struct of_genpd_provider *cp; + struct of_genpd_provider *cp, *tmp; struct generic_pm_domain *gpd; mutex_lock(&gpd_list_lock); mutex_lock(&of_genpd_mutex); - list_for_each_entry(cp, &of_genpd_providers, link) { + list_for_each_entry_safe(cp, tmp, &of_genpd_providers, link) { if (cp->node == np) { /* * For each PM domain associated with the From a7e2d1bce4c1db471f1cbc0c4666a3112bbf0994 Mon Sep 17 00:00:00 2001 From: Krzysztof Kozlowski Date: Wed, 28 Jun 2017 16:56:20 +0200 Subject: [PATCH 63/69] PM / Domains: Fix unsafe iteration over modified list of domains of_genpd_remove_last() iterates over list of domains and removes matching element thus it has to use safe version of list iteration. Fixes: 17926551c98a (PM / Domains: Add support for removing nested PM domains by provider) Cc: 4.9+ # 4.9+ Signed-off-by: Krzysztof Kozlowski Acked-by: Ulf Hansson Signed-off-by: Rafael J. Wysocki --- drivers/base/power/domain.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/base/power/domain.c b/drivers/base/power/domain.c index 0b836fdc99ad..e342408cfb8d 100644 --- a/drivers/base/power/domain.c +++ b/drivers/base/power/domain.c @@ -1980,14 +1980,14 @@ EXPORT_SYMBOL_GPL(of_genpd_add_subdomain); */ struct generic_pm_domain *of_genpd_remove_last(struct device_node *np) { - struct generic_pm_domain *gpd, *genpd = ERR_PTR(-ENOENT); + struct generic_pm_domain *gpd, *tmp, *genpd = ERR_PTR(-ENOENT); int ret; if (IS_ERR_OR_NULL(np)) return ERR_PTR(-EINVAL); mutex_lock(&gpd_list_lock); - list_for_each_entry(gpd, &gpd_list, gpd_list_node) { + list_for_each_entry_safe(gpd, tmp, &gpd_list, gpd_list_node) { if (gpd->provider == &np->fwnode) { ret = genpd_remove(gpd); genpd = ret ? ERR_PTR(ret) : gpd; From 268cd2ed3d4413a963d765eb6c0b87f06932b971 Mon Sep 17 00:00:00 2001 From: Krzysztof Kozlowski Date: Wed, 28 Jun 2017 16:56:21 +0200 Subject: [PATCH 64/69] PM / Domains: Fix missing default_power_down_ok comment Commit fc5cbf0c94b6 (PM / Domains: Support for multiple states) split out some code out of default_power_down_ok() function so the documentation has to be moved to appropriate place. Signed-off-by: Krzysztof Kozlowski Acked-by: Ulf Hansson Signed-off-by: Rafael J. Wysocki --- drivers/base/power/domain_governor.c | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/drivers/base/power/domain_governor.c b/drivers/base/power/domain_governor.c index 2e0fce711135..281f949c5ffe 100644 --- a/drivers/base/power/domain_governor.c +++ b/drivers/base/power/domain_governor.c @@ -92,12 +92,6 @@ static bool default_suspend_ok(struct device *dev) return td->cached_suspend_ok; } -/** - * default_power_down_ok - Default generic PM domain power off governor routine. - * @pd: PM domain to check. - * - * This routine must be executed under the PM domain's lock. - */ static bool __default_power_down_ok(struct dev_pm_domain *pd, unsigned int state) { @@ -187,6 +181,12 @@ static bool __default_power_down_ok(struct dev_pm_domain *pd, return true; } +/** + * default_power_down_ok - Default generic PM domain power off governor routine. + * @pd: PM domain to check. + * + * This routine must be executed under the PM domain's lock. + */ static bool default_power_down_ok(struct dev_pm_domain *pd) { struct generic_pm_domain *genpd = pd_to_genpd(pd); From 654d08a42a5660514df5a5a6ed4dc1b0c978a0a3 Mon Sep 17 00:00:00 2001 From: Joe Perches Date: Fri, 9 Jun 2017 12:29:20 -0700 Subject: [PATCH 65/69] intel_idle: Use more common logging style Remove #define PREFIX and add #define pr_fmt to use more common logging. Miscellanea: o Add missing newline to format o Convert a single printk without KERN_ to pr_info Signed-off-by: Joe Perches Acked-by: Jacob Pan Signed-off-by: Rafael J. Wysocki --- drivers/idle/intel_idle.c | 32 ++++++++++++++++---------------- 1 file changed, 16 insertions(+), 16 deletions(-) diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c index 216d7ec88c0c..c2ae819a871c 100644 --- a/drivers/idle/intel_idle.c +++ b/drivers/idle/intel_idle.c @@ -51,6 +51,8 @@ /* un-comment DEBUG to enable pr_debug() statements */ #define DEBUG +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt + #include #include #include @@ -65,7 +67,6 @@ #include #define INTEL_IDLE_VERSION "0.4.1" -#define PREFIX "intel_idle: " static struct cpuidle_driver intel_idle_driver = { .name = "intel_idle", @@ -1111,7 +1112,7 @@ static int __init intel_idle_probe(void) const struct x86_cpu_id *id; if (max_cstate == 0) { - pr_debug(PREFIX "disabled\n"); + pr_debug("disabled\n"); return -EPERM; } @@ -1119,8 +1120,8 @@ static int __init intel_idle_probe(void) if (!id) { if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL && boot_cpu_data.x86 == 6) - pr_debug(PREFIX "does not run on family %d model %d\n", - boot_cpu_data.x86, boot_cpu_data.x86_model); + pr_debug("does not run on family %d model %d\n", + boot_cpu_data.x86, boot_cpu_data.x86_model); return -ENODEV; } @@ -1134,13 +1135,13 @@ static int __init intel_idle_probe(void) !mwait_substates) return -ENODEV; - pr_debug(PREFIX "MWAIT substates: 0x%x\n", mwait_substates); + pr_debug("MWAIT substates: 0x%x\n", mwait_substates); icpu = (const struct idle_cpu *)id->driver_data; cpuidle_state_table = icpu->state_table; - pr_debug(PREFIX "v" INTEL_IDLE_VERSION - " model 0x%X\n", boot_cpu_data.x86_model); + pr_debug("v" INTEL_IDLE_VERSION " model 0x%X\n", + boot_cpu_data.x86_model); return 0; } @@ -1340,8 +1341,7 @@ static void __init intel_idle_cpuidle_driver_init(void) break; if (cstate + 1 > max_cstate) { - printk(PREFIX "max_cstate %d reached\n", - max_cstate); + pr_info("max_cstate %d reached\n", max_cstate); break; } @@ -1358,8 +1358,8 @@ static void __init intel_idle_cpuidle_driver_init(void) /* if state marked as disabled, skip it */ if (cpuidle_state_table[cstate].disabled != 0) { - pr_debug(PREFIX "state %s is disabled", - cpuidle_state_table[cstate].name); + pr_debug("state %s is disabled\n", + cpuidle_state_table[cstate].name); continue; } @@ -1395,7 +1395,7 @@ static int intel_idle_cpu_init(unsigned int cpu) dev->cpu = cpu; if (cpuidle_register_device(dev)) { - pr_debug(PREFIX "cpuidle_register_device %d failed!\n", cpu); + pr_debug("cpuidle_register_device %d failed!\n", cpu); return -EIO; } @@ -1447,8 +1447,8 @@ static int __init intel_idle_init(void) retval = cpuidle_register_driver(&intel_idle_driver); if (retval) { struct cpuidle_driver *drv = cpuidle_get_driver(); - printk(KERN_DEBUG PREFIX "intel_idle yielding to %s", - drv ? drv->name : "none"); + printk(KERN_DEBUG pr_fmt("intel_idle yielding to %s\n"), + drv ? drv->name : "none"); goto init_driver_fail; } @@ -1460,8 +1460,8 @@ static int __init intel_idle_init(void) if (retval < 0) goto hp_setup_fail; - pr_debug(PREFIX "lapic_timer_reliable_states 0x%x\n", - lapic_timer_reliable_states); + pr_debug("lapic_timer_reliable_states 0x%x\n", + lapic_timer_reliable_states); return 0; From 3ed09c94580de9d5b18cc35d1f97e9f24cd9233b Mon Sep 17 00:00:00 2001 From: Nicholas Piggin Date: Mon, 26 Jun 2017 15:38:15 +1000 Subject: [PATCH 66/69] cpuidle: menu: allow state 0 to be disabled The menu driver does not allow state0 to be disabled completely. If it is disabled but other enabled states don't meet latency requirements, it is still used. Fix this by starting with the first enabled idle state. Fall back to state 0 if no idle states are enabled (arguably this should be -EINVAL if it is attempted, but this is the minimal fix). Acked-by: Gautham R. Shenoy Signed-off-by: Nicholas Piggin Signed-off-by: Rafael J. Wysocki --- drivers/cpuidle/governors/menu.c | 20 +++++++++++++++----- 1 file changed, 15 insertions(+), 5 deletions(-) diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c index b2330fd69e34..61b64c2b2cb8 100644 --- a/drivers/cpuidle/governors/menu.c +++ b/drivers/cpuidle/governors/menu.c @@ -286,6 +286,8 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev) struct device *device = get_cpu_device(dev->cpu); int latency_req = pm_qos_request(PM_QOS_CPU_DMA_LATENCY); int i; + int first_idx; + int idx; unsigned int interactivity_req; unsigned int expected_interval; unsigned long nr_iowaiters, cpu_load; @@ -335,11 +337,11 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev) if (data->next_timer_us > polling_threshold && latency_req > s->exit_latency && !s->disabled && !dev->states_usage[CPUIDLE_DRIVER_STATE_START].disable) - data->last_state_idx = CPUIDLE_DRIVER_STATE_START; + first_idx = CPUIDLE_DRIVER_STATE_START; else - data->last_state_idx = CPUIDLE_DRIVER_STATE_START - 1; + first_idx = CPUIDLE_DRIVER_STATE_START - 1; } else { - data->last_state_idx = CPUIDLE_DRIVER_STATE_START; + first_idx = 0; } /* @@ -359,20 +361,28 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev) * Find the idle state with the lowest power while satisfying * our constraints. */ - for (i = data->last_state_idx + 1; i < drv->state_count; i++) { + idx = -1; + for (i = first_idx; i < drv->state_count; i++) { struct cpuidle_state *s = &drv->states[i]; struct cpuidle_state_usage *su = &dev->states_usage[i]; if (s->disabled || su->disable) continue; + if (idx == -1) + idx = i; /* first enabled state */ if (s->target_residency > data->predicted_us) break; if (s->exit_latency > latency_req) break; - data->last_state_idx = i; + idx = i; } + if (idx == -1) + idx = 0; /* No states enabled. Must use 0. */ + + data->last_state_idx = idx; + return data->last_state_idx; } From 59494fe2c89e46e85d83de9bc45dd1d528955c49 Mon Sep 17 00:00:00 2001 From: Arvind Yadav Date: Thu, 29 Jun 2017 16:58:40 +0530 Subject: [PATCH 67/69] PM: hibernate: constify attribute_group structures. attribute_groups are not supposed to change at runtime. All functions working with attribute_groups provided by work with const attribute_group. So mark the non-const structs as const. File size before: text data bss dec hex filename 6332 488 308 7128 1bd8 kernel/power/hibernate.o File size After adding 'const': text data bss dec hex filename 6396 424 308 7128 1bd8 kernel/power/hibernate.o Signed-off-by: Arvind Yadav Acked-by: Pavel Machek Signed-off-by: Rafael J. Wysocki --- kernel/power/hibernate.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/kernel/power/hibernate.c b/kernel/power/hibernate.c index a8b978c35a6a..e1914c7b85b1 100644 --- a/kernel/power/hibernate.c +++ b/kernel/power/hibernate.c @@ -1108,7 +1108,7 @@ static struct attribute * g[] = { }; -static struct attribute_group attr_group = { +static const struct attribute_group attr_group = { .attrs = g, }; From fab24dcc395637557a7988a867e7b3a5823917a9 Mon Sep 17 00:00:00 2001 From: "Rafael J. Wysocki" Date: Thu, 29 Jun 2017 01:47:56 +0200 Subject: [PATCH 68/69] cpufreq: intel_pstate: Clean up after performance governor changes After commit 82b4e03e01bc (intel_pstate: skip scheduler hook when in "performance" mode) get_target_pstate_use_performance() and get_target_pstate_use_cpu_load() are never called if scaling_governor is "performance", so drop the CPUFREQ_POLICY_PERFORMANCE checks from them as they will never trigger anyway. Moreover, the documentation needs to be updated to reflect the change made by the above commit, so do that too. Signed-off-by: Rafael J. Wysocki Acked-by: Srinivas Pandruvada --- Documentation/admin-guide/pm/intel_pstate.rst | 6 ++---- drivers/cpufreq/intel_pstate.c | 6 ------ 2 files changed, 2 insertions(+), 10 deletions(-) diff --git a/Documentation/admin-guide/pm/intel_pstate.rst b/Documentation/admin-guide/pm/intel_pstate.rst index 33d703989ea8..1d6249825efc 100644 --- a/Documentation/admin-guide/pm/intel_pstate.rst +++ b/Documentation/admin-guide/pm/intel_pstate.rst @@ -157,10 +157,8 @@ Without HWP, this P-state selection algorithm is always the same regardless of the processor model and platform configuration. It selects the maximum P-state it is allowed to use, subject to limits set via -``sysfs``, every time the P-state selection computations are carried out by the -driver's utilization update callback for the given CPU (that does not happen -more often than every 10 ms), but the hardware configuration will not be changed -if the new P-state is the same as the current one. +``sysfs``, every time the driver configuration for the given CPU is updated +(e.g. via ``sysfs``). This is the default P-state selection algorithm if the :c:macro:`CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE` kernel configuration option diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c index 1772d309bea9..756483251832 100644 --- a/drivers/cpufreq/intel_pstate.c +++ b/drivers/cpufreq/intel_pstate.c @@ -1620,9 +1620,6 @@ static inline int32_t get_target_pstate_use_cpu_load(struct cpudata *cpu) int32_t busy_frac, boost; int target, avg_pstate; - if (cpu->policy == CPUFREQ_POLICY_PERFORMANCE) - return cpu->pstate.turbo_pstate; - busy_frac = div_fp(sample->mperf, sample->tsc); boost = cpu->iowait_boost; @@ -1659,9 +1656,6 @@ static inline int32_t get_target_pstate_use_performance(struct cpudata *cpu) int32_t perf_scaled, max_pstate, current_pstate, sample_ratio; u64 duration_ns; - if (cpu->policy == CPUFREQ_POLICY_PERFORMANCE) - return cpu->pstate.turbo_pstate; - /* * perf_scaled is the ratio of the average P-state during the last * sampling period to the P-state requested last time (in percent). From 8183003e48cbb9d6d276b6330eb5edecf95830f3 Mon Sep 17 00:00:00 2001 From: "Rafael J. Wysocki" Date: Thu, 29 Jun 2017 01:49:44 +0200 Subject: [PATCH 69/69] cpufreq: Update scaling_cur_freq documentation Commit f8475cef9008 "x86: use common aperfmperf_khz_on_cpu() to calculate KHz using APERF/MPERF" modified the way the scaling_cur_freq cpufreq policy attribute in sysfs is handled on contemporary Intel-based x86 systems, so update the documentation to reflect that change. Signed-off-by: Rafael J. Wysocki --- Documentation/admin-guide/pm/cpufreq.rst | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/Documentation/admin-guide/pm/cpufreq.rst b/Documentation/admin-guide/pm/cpufreq.rst index 09aa2e949787..463cf7e73db8 100644 --- a/Documentation/admin-guide/pm/cpufreq.rst +++ b/Documentation/admin-guide/pm/cpufreq.rst @@ -269,16 +269,16 @@ are the following: ``scaling_cur_freq`` Current frequency of all of the CPUs belonging to this policy (in kHz). - For the majority of scaling drivers, this is the frequency of the last - P-state requested by the driver from the hardware using the scaling + In the majority of cases, this is the frequency of the last P-state + requested by the scaling driver from the hardware using the scaling interface provided by it, which may or may not reflect the frequency the CPU is actually running at (due to hardware design and other limitations). - Some scaling drivers (e.g. |intel_pstate|) attempt to provide - information more precisely reflecting the current CPU frequency through - this attribute, but that still may not be the exact current CPU - frequency as seen by the hardware at the moment. + Some architectures (e.g. ``x86``) may attempt to provide information + more precisely reflecting the current CPU frequency through this + attribute, but that still may not be the exact current CPU frequency as + seen by the hardware at the moment. ``scaling_driver`` The scaling driver currently in use.