Commit Graph

1222 Commits

Author SHA1 Message Date
Jason Low c8d75aa47d posix_cpu_timer: Reduce unnecessary sighand lock contention
It was found while running a database workload on large systems that
significant time was spent trying to acquire the sighand lock.

The issue was that whenever an itimer expired, many threads ended up
simultaneously trying to send the signal. Most of the time, nothing
happened after acquiring the sighand lock because another thread
had just already sent the signal and updated the "next expire" time.
The fastpath_timer_check() didn't help much since the "next expire"
time was updated after the threads exit fastpath_timer_check().

This patch addresses this by having the thread_group_cputimer structure
maintain a boolean to signify when a thread in the group is already
checking for process wide timers, and adds extra logic in the fastpath
to check the boolean.

Signed-off-by: Jason Low <jason.low2@hp.com>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: George Spelvin <linux@horizon.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: hideaki.kimura@hpe.com
Cc: terry.rudd@hpe.com
Cc: scott.norton@hpe.com
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1444849677-29330-5-git-send-email-jason.low2@hp.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-10-15 11:23:41 +02:00
Jason Low d5c373eb56 posix_cpu_timer: Convert cputimer->running to bool
In the next patch in this series, a new field 'checking_timer' will
be added to 'struct thread_group_cputimer'. Both this and the
existing 'running' integer field are just used as boolean values. To
save space in the structure, we can make both of these fields booleans.

This is a preparatory patch to convert the existing running integer
field to a boolean.

Suggested-by: George Spelvin <linux@horizon.com>
Signed-off-by: Jason Low <jason.low2@hp.com>
Reviewed: George Spelvin <linux@horizon.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: hideaki.kimura@hpe.com
Cc: terry.rudd@hpe.com
Cc: scott.norton@hpe.com
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1444849677-29330-4-git-send-email-jason.low2@hp.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-10-15 11:23:41 +02:00
Jason Low 934715a191 posix_cpu_timer: Check thread timers only when there are active thread timers
The fastpath_timer_check() contains logic to check for if any timers
are set by checking if !task_cputime_zero(). Similarly, we can do this
before calling check_thread_timers(). In the case where there
are only process-wide timers, this will skip all of the computations for
per-thread timers when there are no per-thread timers.

As suggested by George, we can put the task_cputime_zero() check in
check_thread_timers(), since that is more of an optization to the
function. Similarly, we move the existing check of cputimer->running
to check_process_timers().

Signed-off-by: Jason Low <jason.low2@hp.com>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: George Spelvin <linux@horizon.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: hideaki.kimura@hpe.com
Cc: terry.rudd@hpe.com
Cc: scott.norton@hpe.com
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1444849677-29330-3-git-send-email-jason.low2@hp.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-10-15 11:23:41 +02:00
Jason Low 7c177d994e posix_cpu_timer: Optimize fastpath_timer_check()
In fastpath_timer_check(), the task_cputime() function is always
called to compute the utime and stime values. However, this is not
necessary if there are no per-thread timers to check for. This patch
modifies the code such that we compute the task_cputime values only
when there are per-thread timers set.

Signed-off-by: Jason Low <jason.low2@hp.com>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Frederic Weisbecker <fweisbec@gmail.com>
Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
Reviewed-by: George Spelvin <linux@horizon.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: hideaki.kimura@hpe.com
Cc: terry.rudd@hpe.com
Cc: scott.norton@hpe.com
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1444849677-29330-2-git-send-email-jason.low2@hp.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-10-15 11:23:41 +02:00
Ingo Molnar b9f27c0f4f Linux 4.3-rc5
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJWGqXxAAoJEHm+PkMAQRiGP80H+wX+0PTYpCNaH09LpvlRVZvM
 vG3RnqIYQ8cyOrJJEoVqLznm4vRmw3GrbHxhzDEVrX1beXNVRJdqyZUOttQuDuyS
 A/NWGaZCcu45VcL08NVMCqYv9D7HwDxe5WOhXigo2QX4nlbmJsBoU24ibV35nGYT
 2xdhEwHIH3X+qlzp8Mya6mheYHO+eZ6C+jSy7dYjQoXto0Acz6SoGC6/lhsV4biw
 ENRTJY7y+wzG6ND2PrjF6QV0SwCDAU/f7KcYe01+wm74/uCLYgQuUOPRUeu6ydfN
 Li6CNwN8NzcimLTF4zmQWBte8SkQDVM9LeC8Eyoz2aUYzq7hf6fTfihmQSZnQtQ=
 =vDtg
 -----END PGP SIGNATURE-----

Merge tag 'v4.3-rc5' into timers/core, to pick up fixes before applying new changes

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-12 09:51:18 +02:00
Rasmus Villemoes 9fc4468d54 timers: Use __fls in apply_slack()
In apply_slack(), find_last_bit() is applied to a bitmask consisting
of precisely BITS_PER_LONG bits. Since mask is non-zero, we might as
well eliminate the function call and use __fls() directly. On x86_64,
this shaves 23 bytes of the only caller, mod_timer().

This also gets rid of Coverity CID 1192106, but that is a false
positive: Coverity is not aware that mask != 0 implies that
find_last_bit will not return BITS_PER_LONG.

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: John Stultz <john.stultz@linaro.org>
Link: http://lkml.kernel.org/r/1443771931-6284-1-git-send-email-linux@rasmusvillemoes.dk
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-10-11 22:13:46 +02:00
Guillaume Gomez cfed432d7f clocksource: Remove return statement from void functions
Signed-off-by: Guillaume Gomez <guillaume1.gomez@gmail.com>
Cc: John Stultz <john.stultz@linaro.org>
Link: http://lkml.kernel.org/r/CAAOQCfSDgmqSWDBsetau%2ByF8x0%2BDagCF_pfFw0p5xH_BKkKEog@mail.gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-10-11 22:13:46 +02:00
Linus Torvalds 37cc7ab1d2 Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fixes from Ingo Molnar:
 "An abs64() fix in the watchdog driver, and two clocksource driver
  NO_IRQ assumption fixes"

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  clocksource: Fix abs() usage w/ 64bit values
  clocksource/drivers/keystone: Fix bad NO_IRQ usage
  clocksource/drivers/rockchip: Fix bad NO_IRQ usage
2015-10-03 10:51:41 -04:00
John Stultz 67dfae0cd7 clocksource: Fix abs() usage w/ 64bit values
This patch fixes one cases where abs() was being used with 64-bit
nanosecond values, where the result may be capped at 32-bits.

This potentially could cause watchdog false negatives on 32-bit
systems, so this patch addresses the issue by using abs64().

Signed-off-by: John Stultz <john.stultz@linaro.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/1442279124-7309-2-git-send-email-john.stultz@linaro.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-10-02 22:53:01 +02:00
Dmitry Vyukov 3ed769bdb2 timers: Fix data race in timer_stats_account_timer()
timer_stats_account_timer() reads timer->start_site, then checks it
for NULL and then re-reads it again, while
timer_stats_timer_clear_start_info() can concurrently reset
timer->start_site to NULL. This should not lead to crashes, but can
double number of entries in timer stats as start_site is used during
comparison, the doubled entries will have unuseful NULL start_site.

Read timer->start_site only once in timer_stats_account_timer().

The data race was found with KernelThreadSanitizer (KTSAN).

Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Cc: andreyknvl@google.com
Cc: glider@google.com
Cc: kcc@google.com
Cc: ktsan@googlegroups.com
Cc: john.stultz@linaro.org
Link: http://lkml.kernel.org/r/1442584463-69553-1-git-send-email-dvyukov@google.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-09-22 15:43:18 +02:00
Zhen Lei 571af55a31 time: Fix spelling in comments
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Cc: Hanjun Guo <guohanjun@huawei.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tianhong Ding <dingtianhong@huawei.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Xinwei Hu <huxinwei@huawei.com>
Cc: Xunlei Pang <pang.xunlei@linaro.org>
Cc: Zefan Li <lizefan@huawei.com>
Link: http://lkml.kernel.org/r/1440484973-13892-1-git-send-email-thunder.leizhen@huawei.com
[ Fixed yet another typo in one of the sentences fixed. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-22 12:54:23 +02:00
Linus Torvalds 1345df21ac Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fixes from Ingo Molnar:
 "A fix for an abs()/abs64() bug that caused too slow NTP convergence on
  32-bit kernels, plus a removal of an obsolete clockevents driver
  facility after all users got converted during the merge window"

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  clockevents: Remove unused set_mode() callback
  time: Fix timekeeping_freqadjust()'s incorrect use of abs() instead of abs64()
2015-09-17 10:55:25 -07:00
Viresh Kumar eef7635a22 clockevents: Remove unused set_mode() callback
All users are migrated to the per-state callbacks, get rid of the
unused interface and the core support code.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: linaro-kernel@lists.linaro.org
Cc: John Stultz <john.stultz@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/fd60de14cf6d125489c031207567bb255ad946f6.1441943991.git.viresh.kumar@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-14 11:00:55 +02:00
John Stultz 2619d7e9c9 time: Fix timekeeping_freqadjust()'s incorrect use of abs() instead of abs64()
The internal clocksteering done for fine-grained error
correction uses a logarithmic approximation, so any time
adjtimex() adjusts the clock steering, timekeeping_freqadjust()
quickly approximates the correct clock frequency over a series
of ticks.

Unfortunately, the logic in timekeeping_freqadjust(), introduced
in commit:

  dc491596f6 ("timekeeping: Rework frequency adjustments to work better w/ nohz")

used the abs() function with a s64 error value to calculate the
size of the approximated adjustment to be made.

Per include/linux/kernel.h:

  "abs() should not be used for 64-bit types (s64, u64, long long) - use abs64()".

Thus on 32-bit platforms, this resulted in the clocksteering to
take a quite dampended random walk trying to converge on the
proper frequency, which caused the adjustments to be made much
slower then intended (most easily observed when large
adjustments are made).

This patch fixes the issue by using abs64() instead.

Reported-by: Nuno Gonçalves <nunojpg@gmail.com>
Tested-by: Nuno Goncalves <nunojpg@gmail.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Cc: <stable@vger.kernel.org> # v3.17+
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Miroslav Lichvar <mlichvar@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1441840051-20244-1-git-send-email-john.stultz@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-13 10:30:47 +02:00
Frederic Weisbecker 7c8bb6cb95 nohz: Assert existing housekeepers when nohz full enabled
The code ensures that when nohz full is running, at least the
boot CPU serves as a housekeeper and it can't be later offlined.

Let's assert this assumption to make sure that we have CPUs to
handle unbound jobs like workqueues and timers while nohz full
CPUs run undisturbed.

Also improve the comments on housekeeper offlining prevention.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Cc: Vatika Harlalka <vatikaharlalka@gmail.com>
Link: http://lkml.kernel.org/r/1441119060-2230-3-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-02 10:33:22 +02:00
Linus Torvalds 5e359bf221 Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer updates from Thomas Gleixner:
 "Rather large, but nothing exiting:

   - new range check for settimeofday() to prevent that boot time
     becomes negative.
   - fix for file time rounding
   - a few simplifications of the hrtimer code
   - fix for the proc/timerlist code so the output of clock realtime
     timers is accurate
   - more y2038 work
   - tree wide conversion of clockevent drivers to the new callbacks"

* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (88 commits)
  hrtimer: Handle failure of tick_init_highres() gracefully
  hrtimer: Unconfuse switch_hrtimer_base() a bit
  hrtimer: Simplify get_target_base() by returning current base
  hrtimer: Drop return code of hrtimer_switch_to_hres()
  time: Introduce timespec64_to_jiffies()/jiffies_to_timespec64()
  time: Introduce current_kernel_time64()
  time: Introduce struct itimerspec64
  time: Add the common weak version of update_persistent_clock()
  time: Always make sure wall_to_monotonic isn't positive
  time: Fix nanosecond file time rounding in timespec_trunc()
  timer_list: Add the base offset so remaining nsecs are accurate for non monotonic timers
  cris/time: Migrate to new 'set-state' interface
  kernel: broadcast-hrtimer: Migrate to new 'set-state' interface
  xtensa/time: Migrate to new 'set-state' interface
  unicore/time: Migrate to new 'set-state' interface
  um/time: Migrate to new 'set-state' interface
  sparc/time: Migrate to new 'set-state' interface
  sh/localtimer: Migrate to new 'set-state' interface
  score/time: Migrate to new 'set-state' interface
  s390/time: Migrate to new 'set-state' interface
  ...
2015-09-01 14:04:50 -07:00
Linus Torvalds 65a99597f0 Merge branch 'timers-nohz-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull NOHZ updates from Ingo Molnar:
 "The main changes, mostly written by Frederic Weisbecker, include:

   - Fix some jiffies based cputime assumptions.  (No real harm because
     the concerned code isn't used by full dynticks.)

   - Simplify jiffies <-> usecs conversions.  Remove dead code.

   - Remove early hacks on nohz full code that avoided messing up idle
     nohz internals.  Now nohz integrates well full and idle and such
     hack have become needless.

   - Restart nohz full tick from irq exit.  (A simplification and a
     preparation for future optimization on scheduler kick to nohz
     full)

   - Code cleanups.

   - Tile driver isolation enhancement on top of nohz.  (Chris Metcalf)"

* 'timers-nohz-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  nohz: Remove useless argument on tick_nohz_task_switch()
  nohz: Move tick_nohz_restart_sched_tick() above its users
  nohz: Restart nohz full tick from irq exit
  nohz: Remove idle task special case
  nohz: Prevent tilegx network driver interrupts
  alpha: Fix jiffies based cputime assumption
  apm32: Fix cputime == jiffies assumption
  jiffies: Remove HZ > USEC_PER_SEC special case
2015-08-31 21:04:24 -07:00
Linus Torvalds 7073bc6612 Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RCU updates from Ingo Molnar:
 "The main RCU changes in this cycle are:

   - the combination of tree geometry-initialization simplifications and
     OS-jitter-reduction changes to expedited grace periods.  These two
     are stacked due to the large number of conflicts that would
     otherwise result.

   - privatize smp_mb__after_unlock_lock().

     This commit moves the definition of smp_mb__after_unlock_lock() to
     kernel/rcu/tree.h, in recognition of the fact that RCU is the only
     thing using this, that nothing else is likely to use it, and that
     it is likely to go away completely.

   - documentation updates.

   - torture-test updates.

   - misc fixes"

* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (60 commits)
  rcu,locking: Privatize smp_mb__after_unlock_lock()
  rcu: Silence lockdep false positive for expedited grace periods
  rcu: Don't disable CPU hotplug during OOM notifiers
  scripts: Make checkpatch.pl warn on expedited RCU grace periods
  rcu: Update MAINTAINERS entry
  rcu: Clarify CONFIG_RCU_EQS_DEBUG help text
  rcu: Fix backwards RCU_LOCKDEP_WARN() in synchronize_rcu_tasks()
  rcu: Rename rcu_lockdep_assert() to RCU_LOCKDEP_WARN()
  rcu: Make rcu_is_watching() really notrace
  cpu: Wait for RCU grace periods concurrently
  rcu: Create a synchronize_rcu_mult()
  rcu: Fix obsolete priority-boosting comment
  rcu: Use WRITE_ONCE in RCU_INIT_POINTER
  rcu: Hide RCU_NOCB_CPU behind RCU_EXPERT
  rcu: Add RCU-sched flavors of get-state and cond-sync
  rcu: Add fastpath bypassing funnel locking
  rcu: Rename RCU_GP_DONE_FQS to RCU_GP_DOING_FQS
  rcu: Pull out wait_event*() condition into helper function
  documentation: Describe new expedited stall warnings
  rcu: Add stall warnings to synchronize_sched_expedited()
  ...
2015-08-31 18:12:07 -07:00
Guenter Roeck 85e1cd6e76 hrtimer: Handle failure of tick_init_highres() gracefully
Commit 75e3b37d05 ("hrtimer: Drop return code of hrtimer_switch_to_hres()")
drops the return code of hrtimer_switch_to_hres(). While doing so, it also
drops the return statement itself on failure. This may cause a system hang.
Seen when running arm:multi_v7_defconfig in qemu with devicetree file
vexpress-v2p-ca9.

Fixes: 75e3b37d05 ("hrtimer: Drop return code of hrtimer_switch_to_hres()")
Cc: Luiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Link: http://lkml.kernel.org/r/1440231047-16256-1-git-send-email-linux@roeck-us.net
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-08-22 10:57:50 +02:00
Thomas Gleixner 05ddaa4d6d Merge branch 'fortglx/4.3/time' of https://git.linaro.org/people/john.stultz/linux into timers/core
- A handful or y2038 related items
- A walltime to monotonic limit
- Small fixes for timespec_trunc() and timer_list output
2015-08-20 21:13:22 +02:00
Frederic Weisbecker b48362d8aa hrtimer: Unconfuse switch_hrtimer_base() a bit
The variable called "this_base" is confusing because its name suggests
it's of "struct hrtimer_clock_base" type, along with "base" and "new_base"
which doesn't help understanding this complicated function.

Make its name clearer and fix the misleading comment while at it.

[ tglx: Fixed the comment for real ]

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/r/1439907509-9553-3-git-send-email-fweisbec@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-08-18 18:36:59 +02:00
Frederic Weisbecker 662b3e1946 hrtimer: Simplify get_target_base() by returning current base
Instead of fetching again the current cpu base, just take it from the
parameter.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/r/1439907509-9553-2-git-send-email-fweisbec@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-08-18 18:36:59 +02:00
Eric Dumazet d0023a1448 timer: Write timer->flags atomically
lock_timer_base() cannot prevent the following :

CPU1 ( in __mod_timer()
timer->flags |= TIMER_MIGRATING;
spin_unlock(&base->lock);
base = new_base;
spin_lock(&base->lock);
// The next line clears TIMER_MIGRATING
timer->flags &= ~TIMER_BASEMASK;
                                  CPU2 (in lock_timer_base())
                                  see timer base is cpu0 base
                                  spin_lock_irqsave(&base->lock, *flags);
                                  if (timer->flags == tf)
                                       return base; // oops, wrong base
timer->flags |= base->cpu // too late

We must write timer->flags in one go, otherwise we can fool other cpus.

Fixes: bc7a34b8b9 ("timer: Reduce timer migration overhead if disabled")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jon Christopherson <jon@jons.org>
Cc: David Miller <davem@davemloft.net>
Cc: xen-devel@lists.xen.org
Cc: david.vrabel@citrix.com
Cc: Sander Eikelenboom <linux@eikelenboom.it>
Link: http://lkml.kernel.org/r/1439831928.32680.11.camel@edumazet-glaptop2.roam.corp.google.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
2015-08-18 15:31:16 +02:00
Luiz Capitulino 75e3b37d05 hrtimer: Drop return code of hrtimer_switch_to_hres()
It's not checked by the caller.

Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Link: http://lkml.kernel.org/r/20150811164043.538241ef@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-08-17 23:19:03 +02:00
Baolin Wang 9ca3085060 time: Introduce timespec64_to_jiffies()/jiffies_to_timespec64()
The conversion between struct timespec and jiffies is not year 2038
safe on 32bit systems. Introduce timespec64_to_jiffies() and
jiffies_to_timespec64() functions which use struct timespec64 to
make it ready for 2038 issue.

Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Baolin Wang <baolin.wang@linaro.org>
Signed-off-by: John Stultz <john.stultz@linaro.org>
2015-08-17 11:25:41 -07:00
Baolin Wang 8758a240e2 time: Introduce current_kernel_time64()
The current_kernel_time() is not year 2038 safe on 32bit systems
since it returns a timespec value. Introduce current_kernel_time64()
which returns a timespec64 value.

Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Baolin Wang <baolin.wang@linaro.org>
Signed-off-by: John Stultz <john.stultz@linaro.org>
2015-08-17 11:25:35 -07:00
Xunlei Pang 7494e9eede time: Add the common weak version of update_persistent_clock()
The weak update_persistent_clock64() calls update_persistent_clock(),
if the architecture defines an update_persistent_clock64() to replace
and remove its update_persistent_clock() version, when building the
kernel the linker will throw an undefined symbol error, that is, any
arch that switches to update_persistent_clock64() will have this issue.

To solve the issue, we add the common weak update_persistent_clock().

Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Xunlei Pang <pang.xunlei@linaro.org>
Signed-off-by: John Stultz <john.stultz@linaro.org>
2015-08-17 11:25:16 -07:00
Wang YanQing e1d7ba8735 time: Always make sure wall_to_monotonic isn't positive
Two issues were found on an IMX6 development board without an
enabled RTC device(resulting in the boot time and monotonic
time being initialized to 0).

Issue 1:exportfs -a generate:
       "exportfs: /opt/nfs/arm does not support NFS export"
Issue 2:cat /proc/stat:
       "btime 4294967236"

The same issues can be reproduced on x86 after running the
following code:
	int main(void)
	{
	    struct timeval val;
	    int ret;

	    val.tv_sec = 0;
	    val.tv_usec = 0;
	    ret = settimeofday(&val, NULL);
	    return 0;
	}

Two issues are different symptoms of same problem:
The reason is a positive wall_to_monotonic pushes boot time back
to the time before Epoch, and getboottime will return negative
value.

In symptom 1:
          negative boot time cause get_expiry() to overflow time_t
          when input expire time is 2147483647, then cache_flush()
          always clears entries just added in ip_map_parse.
In symptom 2:
          show_stat() uses "unsigned long" to print negative btime
          value returned by getboottime.

This patch fix the problem by prohibiting time from being set to a value which
would cause a negative boot time. As a result one can't set the CLOCK_REALTIME
time prior to (1970 + system uptime).

Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Wang YanQing <udknight@gmail.com>
[jstultz: reworded commit message]
Signed-off-by: John Stultz <john.stultz@linaro.org>
2015-08-17 11:24:54 -07:00
Karsten Blees de4a95faf1 time: Fix nanosecond file time rounding in timespec_trunc()
timespec_trunc() avoids rounding if granularity <= nanoseconds-per-jiffie
(or TICK_NSEC). This optimization assumes that:

 1. current_kernel_time().tv_nsec is already rounded to TICK_NSEC (i.e.
    with HZ=1000 you'd get 1000000, 2000000, 3000000... but never 1000001).
    This is no longer true (probably since hrtimers introduced in 2.6.16).

 2. TICK_NSEC is evenly divisible by all possible granularities. This may
    be true for HZ=100, 250, 1000, but obviously not for HZ=300 /
    TICK_NSEC=3333333 (introduced in 2.6.20).

Thus, sub-second portions of in-core file times are not rounded to on-disk
granularity. I.e. file times may change when the inode is re-read from disk
or when the file system is remounted.

This affects all file systems with file time granularities > 1 ns and < 1s,
e.g. CEPH (1000 ns), UDF (1000 ns), CIFS (100 ns), NTFS (100 ns) and FUSE
(configurable from user mode via struct fuse_init_out.time_gran).

Steps to reproduce with e.g. UDF:

  $ dd if=/dev/zero of=udfdisk count=10000 && mkudffs udfdisk
  $ mkdir udf && mount udfdisk udf
  $ touch udf/test && stat -c %y udf/test
  2015-06-09 10:22:56.130006767 +0200
  $ umount udf && mount udfdisk udf
  $ stat -c %y udf/test
  2015-06-09 10:22:56.130006000 +0200

Remounting truncates the mtime to 1 µs.

Fix the rounding in timespec_trunc() and update the documentation.

timespec_trunc() is exclusively used to calculate inode's [acm]time (mostly
via current_fs_time()), and always with super_block.s_time_gran as second
argument. So this can safely be changed without side effects.

Note: This does _not_ fix the issue for FAT's 2 second mtime resolution,
as super_block.s_time_gran isn't prepared to handle different ctime /
mtime / atime resolutions nor resolutions > 1 second.

Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Karsten Blees <blees@dcon.de>
Signed-off-by: John Stultz <john.stultz@linaro.org>
2015-08-17 11:23:46 -07:00
John Stultz 38bf985b05 timer_list: Add the base offset so remaining nsecs are accurate for non monotonic timers
I noticed for non-monotonic timers in timer_list, some of the
output looked a little confusing.

For example:
 #1: <0000000000000000>, posix_timer_fn, S:01, hrtimer_start_range_ns, leap-a-day/2360
 # expires at 1434412800000000000-1434412800000000000 nsecs [in 1434410725062375469 to 1434410725062375469 nsecs]

You'll note the relative time till the expiration "[in xxx to
yyy nsecs]" is incorrect. This is because its printing the delta
between CLOCK_MONOTONIC time to the CLOCK_REALTIME expiration.

This patch fixes this issue by adding the clock offset to the
"now" time which we use to calculate the delta.

Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jiri Bohac <jbohac@suse.cz>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Shuah Khan <shuahkh@osg.samsung.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
2015-08-17 11:23:31 -07:00
Ingo Molnar 9b9412dc70 Merge branch 'for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu
Pull RCU changes from Paul E. McKenney:

  - The combination of tree geometry-initialization simplifications
    and OS-jitter-reduction changes to expedited grace periods.
    These two are stacked due to the large number of conflicts
    that would otherwise result.

    [ With one addition, a temporary commit to silence a lockdep false
      positive. Additional changes to the expedited grace-period
      primitives (queued for 4.4) remove the cause of this false
      positive, and therefore include a revert of this temporary commit. ]

  - Documentation updates.

  - Torture-test updates.

  - Miscellaneous fixes.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-08-12 12:12:12 +02:00
Viresh Kumar ecbebcb868 kernel: broadcast-hrtimer: Migrate to new 'set-state' interface
Migrate broadcast-hrtimer driver to the new 'set-state' interface
provided by clockevents core, the earlier 'set-mode' interface is marked
obsolete now.

Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2015-08-10 11:41:08 +02:00
Luiz Capitulino d74892c5b2 clockevents: Drop redundant cpumask check in tick_check_new_device()
The same check is performed by tick_check_percpu().

Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Link: http://lkml.kernel.org/r/20150729151417.069d1bb0@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-08-01 12:00:13 +02:00
Frederic Weisbecker de734f89b6 nohz: Remove useless argument on tick_nohz_task_switch()
Leftover from early code.

Cc: Christoph Lameter <cl@linux.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2015-07-29 15:45:01 +02:00
Frederic Weisbecker 59d2c7ca49 nohz: Move tick_nohz_restart_sched_tick() above its users
Fix the function declaration/definition dance.

Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2015-07-29 15:45:01 +02:00
Frederic Weisbecker 73738a95d0 nohz: Restart nohz full tick from irq exit
Restart the tick when necessary from the irq exit path. It makes nohz
full more flexible, simplify the related IPIs and doesn't bring
significant overhead on irq exit.

In a longer term view, it will allow us to piggyback the nohz kick
on the scheduler IPI in the future instead of sending a dedicated IPI
that often doubles the scheduler IPI on task wakeup. This will require
more changes though including careful review of resched_curr() callers
to include nohz full needs.

Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2015-07-29 15:45:00 +02:00
Frederic Weisbecker 5944935943 nohz: Remove idle task special case
On nohz full early days, idle dynticks and full dynticks weren't well
integrated and we couldn't risk full dynticks calls on idle without
risking messing up tick idle statistics. This is why we prevented such
thing to happen.

Nowadays full dynticks and idle dynticks are better integrated and
interact without known issue.

So lets remove that.

Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2015-07-29 15:44:59 +02:00
Frederic Weisbecker e075867681 jiffies: Remove HZ > USEC_PER_SEC special case
HZ never goes much further 1000 and a bit. And if we ever reach one tick
per microsecond, we might be having a problem.

Lets stop maintaining this special case, just leave a paranoid check.

Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc; John Stultz <john.stultz@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2015-07-29 15:44:57 +02:00
Thomas Gleixner 0f44705175 tick: Move the export of tick_broadcast_oneshot_control to the proper place
tick_broadcast_oneshot_control got moved from tick-broadcast to
tick-common, but the export stayed in the old place. Fix it up.

Fixes: f32dd11705 'tick/broadcast: Make idle check independent from mode and config'
Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-07-14 12:01:04 +02:00
Thomas Gleixner c4d029f2d4 tick/broadcast: Prevent NULL pointer dereference
Dan reported that the recent changes to the broadcast code introduced
a potential NULL dereference.

Add the proper check.

Fixes: e045431190 "tick/broadcast: Sanity check the shutdown of the local clock_event"
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-07-11 14:26:34 +02:00
Thomas Gleixner c428833481 tick/broadcast: Handle spurious interrupts gracefully
Andriy reported that on a virtual machine the warning about negative
expiry time in the clock events programming code triggered:

hpet: hpet0 irq 40 for MSI
hpet: hpet1 irq 41 for MSI
Switching to clocksource hpet
WARNING: at kernel/time/clockevents.c:239

[<ffffffff810ce6eb>] clockevents_program_event+0xdb/0xf0
[<ffffffff810cf211>] tick_handle_periodic_broadcast+0x41/0x50
[<ffffffff81016525>] timer_interrupt+0x15/0x20

When the second hpet is installed as a per cpu timer the broadcast
event is not longer required and stopped, which sets the next_evt of
the broadcast device to KTIME_MAX.

If after that a spurious interrupt happens on the broadcast device,
then the current code blindly handles it and tries to reprogram the
broadcast device afterwards, which adds the period to
next_evt. KTIME_MAX + period results in a negative expiry value
causing the WARN_ON in the clockevents code to trigger.

Add a proper check for the state of the broadcast device into the
interrupt handler and return if the interrupt is spurious.

[ Folded in pointer fix from Sudeep ]

Reported-by: Andriy Gapon <avg@FreeBSD.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Sudeep Holla <sudeep.holla@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/20150705205221.802094647@linutronix.de
2015-07-07 18:46:48 +02:00
Thomas Gleixner d5113e13a5 tick/broadcast: Check for hrtimer broadcast active early
If the current cpu is the one which has the hrtimer based broadcast
queued then we better return busy immediately instead of going through
loops and hoops to figure that out.

[ Split out from a larger combo patch ]

Tested-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Suzuki Poulose <Suzuki.Poulose@arm.com>
Cc: Lorenzo Pieralisi <Lorenzo.Pieralisi@arm.com>
Cc: Catalin Marinas <Catalin.Marinas@arm.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1507070929360.3916@nanos
2015-07-07 18:46:48 +02:00
Thomas Gleixner 0cc5281aa5 tick/broadcast: Return busy when IPI is pending
Tell the idle code not to go deep if the broadcast IPI is about to
arrive.

[ Split out from a larger combo patch ]

Tested-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Suzuki Poulose <Suzuki.Poulose@arm.com>
Cc: Lorenzo Pieralisi <Lorenzo.Pieralisi@arm.com>
Cc: Catalin Marinas <Catalin.Marinas@arm.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1507070929360.3916@nanos
2015-07-07 18:46:48 +02:00
Thomas Gleixner d33257264b tick/broadcast: Return busy if periodic mode and hrtimer broadcast
If the system is in periodic mode and the broadcast device is hrtimer
based, return busy as we have no proper handling for this.

[ Split out from a larger combo patch ]

Tested-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Suzuki Poulose <Suzuki.Poulose@arm.com>
Cc: Lorenzo Pieralisi <Lorenzo.Pieralisi@arm.com>
Cc: Catalin Marinas <Catalin.Marinas@arm.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1507070929360.3916@nanos
2015-07-07 18:46:48 +02:00
Thomas Gleixner e3ac79e087 tick/broadcast: Move the check for periodic mode inside state handling
We need to check more than the periodic mode for proper operation in
all runtime combinations. To avoid code duplication move the check
into the enter state handling.

No functional change.

[ Split out from a larger combo patch ]

Reported-and-tested-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Suzuki Poulose <Suzuki.Poulose@arm.com>
Cc: Lorenzo Pieralisi <Lorenzo.Pieralisi@arm.com>
Cc: Catalin Marinas <Catalin.Marinas@arm.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1507070929360.3916@nanos
2015-07-07 18:46:47 +02:00
Thomas Gleixner b78f3f3c89 tick/broadcast: Prevent deep idle if no broadcast device available
Add a check for a installed broadcast device to the oneshot control
function and return busy if not.

[ Split out from a larger combo patch ]

Reported-and-tested-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Suzuki Poulose <Suzuki.Poulose@arm.com>
Cc: Lorenzo Pieralisi <Lorenzo.Pieralisi@arm.com>
Cc: Catalin Marinas <Catalin.Marinas@arm.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1507070929360.3916@nanos
2015-07-07 18:46:47 +02:00
Thomas Gleixner f32dd11705 tick/broadcast: Make idle check independent from mode and config
Currently the broadcast busy check, which prevents the idle code from
going into deep idle, works only in one shot mode.

If NOHZ and HIGHRES are off (config or command line) there is no
sanity check at all, so under certain conditions cpus are allowed to
go into deep idle, where the local timer stops, and are not woken up
again because there is no broadcast timer installed or a hrtimer based
broadcast device is not evaluated.

Move tick_broadcast_oneshot_control() into the common code and provide
proper subfunctions for the various config combinations.

The common check in tick_broadcast_oneshot_control() is for the C3STOP
misfeature flag of the local clock event device. If its not set, idle
can proceed. If set, further checks are necessary.

Provide checks for the trivial cases:

 - If broadcast is disabled in the config, then return busy

 - If oneshot mode (NOHZ/HIGHES) is disabled in the config, return
   busy if the broadcast device is hrtimer based.

 - If oneshot mode is enabled in the config call the original
   tick_broadcast_oneshot_control() function. That function needs
   extra checks which will be implemented in seperate patches.

[ Split out from a larger combo patch ]

Reported-and-tested-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Suzuki Poulose <Suzuki.Poulose@arm.com>
Cc: Lorenzo Pieralisi <Lorenzo.Pieralisi@arm.com>
Cc: Catalin Marinas <Catalin.Marinas@arm.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1507070929360.3916@nanos
2015-07-07 18:46:47 +02:00
Thomas Gleixner e045431190 tick/broadcast: Sanity check the shutdown of the local clock_event
The broadcast code shuts down the local clock event unconditionally
even if no broadcast device is installed or if the broadcast device is
hrtimer based.

Add proper sanity checks.

[ Split out from a larger combo patch ]

Reported-and-tested-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Suzuki Poulose <Suzuki.Poulose@arm.com>
Cc: Lorenzo Pieralisi <Lorenzo.Pieralisi@arm.com>
Cc: Catalin Marinas <Catalin.Marinas@arm.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1507070929360.3916@nanos
2015-07-07 18:46:47 +02:00
Thomas Gleixner 8eb231261f tick/broadcast: Prevent hrtimer recursion
The hrtimer based broadcast vehicle can cause a hrtimer recursion
which went unnoticed until we changed the hrtimer expiry code to keep
track of the currently running timer.

local_timer_interrupt()
  local_handler()
    hrtimer_interrupt()
      expire_hrtimers()
        broadcast_hrtimer()
	  send_ipis()
	  local_handler()
	    hrtimer_interrupt()
	     ....

Solution is simple: Prevent the local handler call from the broadcast
code when the broadcast 'device' is hrtimer based.

[ Split out from a larger combo patch ]

Tested-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Suzuki Poulose <Suzuki.Poulose@arm.com>
Cc: Lorenzo Pieralisi <Lorenzo.Pieralisi@arm.com>
Cc: Catalin Marinas <Catalin.Marinas@arm.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1507070929360.3916@nanos
2015-07-07 18:46:47 +02:00
Viresh Kumar 7c4a976cd5 clockevents: Allow set-state callbacks to be optional
Its mandatory for the drivers to provide set_state_{oneshot|periodic}()
(only if related modes are supported) and set_state_shutdown() callbacks
today, if they are implementing the new set-state interface.

But this leads to unnecessary noop callbacks for drivers which don't
want to implement them. Over that, it will lead to a full function call
for nothing really useful.

Lets make all set-state callbacks optional.

Suggested-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: http://lkml.kernel.org/r/1436256875-15562-1-git-send-email-daniel.lezcano@linaro.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-07-07 10:44:45 +02:00