This is mostly for cleanup value:
- We don't need several gotos to handle the case where the first
  waiter is a writer. Two simple tests will do (and generate very
  similar code).
- In the remainder of the function, we know the first waiter is a reader,
  so we don't have to double check that. We can use do..while loops
  to iterate over the readers to wake (generates slightly better code).
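The control flow described in the two points above can be sketched in user space as follows. This is a simplified model, not the kernel code: `enum waiter_type`, `struct waiter`, `wake_front()` and the array-based queue are hypothetical names chosen for illustration, and "waking" is reduced to setting a flag.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's waiter bookkeeping. */
enum waiter_type { WAITING_FOR_WRITE, WAITING_FOR_READ };

struct waiter {
	enum waiter_type type;
	int woken;
};

/*
 * Wake waiters at the head of a FIFO queue (modelled as an array).
 * Returns the number of waiters woken.
 */
static size_t wake_front(struct waiter *q, size_t n, int wakewrite)
{
	size_t i = 0, woken = 0;

	assert(q != NULL || n == 0);
	if (n == 0)
		return 0;

	/* Two simple tests handle a writer at the head of the queue. */
	if (q[0].type == WAITING_FOR_WRITE) {
		if (wakewrite) {
			q[0].woken = 1;
			return 1;
		}
		return 0;
	}

	/* From here on the first waiter is known to be a reader. */
	do {
		q[i].woken = 1;
		woken++;
		i++;
	} while (i < n && q[i].type == WAITING_FOR_READ);

	return woken;
}
```

Because the reader case is only reached once the head of the queue is known to be a reader, the loop body can run unconditionally on its first pass, which is what makes do..while the natural form.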
Signed-off-by: Michel Lespinasse <walken@google.com>
Reviewed-by: Peter Hurley <peter@hurleysoftware.com>
Acked-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We are not planning to add any new waiter flags, so we can convert the
waiter type into an enumeration.
Background: David Howells suggested I do this back when I tried adding
a new waiter type for unfair readers. However, I believe the cleanup
applies regardless of that use case.
Signed-off-by: Michel Lespinasse <walken@google.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Peter Hurley <peter@hurleysoftware.com>
Acked-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We (Linux Kernel Performance project) found a regression
introduced by commit:
5a505085f0 mm/rmap: Convert the struct anon_vma::mutex to an rwsem
which converted all anon_vma::mutex locks into rwsem write locks.
The semantics are the same, but the behavioral difference is
quite huge in some cases. After investigating it we found the
root cause: mutexes support lock stealing while rwsems don't.
Here is the link for the detailed regression report:
https://lkml.org/lkml/2013/1/29/84
Ingo suggested adding write lock stealing to rwsems:
"I think we should allow lock-steal between rwsem writers - that
will not hurt fairness as most rwsem fairness concerns relate to
reader vs. writer fairness"
And here is the rwsem-spinlock version.
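The difference that write lock stealing makes can be sketched in user space as below. This is a minimal model under stated assumptions, not the patch itself: `struct sem_model` and both helper functions are hypothetical, the `activity` convention (positive for readers, -1 for a writer, 0 for free) is borrowed from the spinlock-based rwsem, and the protecting spinlock is omitted for brevity.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: activity > 0 readers, -1 writer, 0 free. */
struct sem_model {
	int activity;
	int nr_waiters;		/* tasks already queued, in FIFO order */
};

/* Without stealing: a writer takes a free lock only if nobody is queued. */
static bool write_trylock_fifo(struct sem_model *s)
{
	assert(s->activity >= -1);
	if (s->activity == 0 && s->nr_waiters == 0) {
		s->activity = -1;
		return true;
	}
	return false;
}

/* With stealing: any writer may take the lock as soon as it is free. */
static bool write_trylock_steal(struct sem_model *s)
{
	assert(s->activity >= -1);
	if (s->activity == 0) {
		s->activity = -1;
		return true;
	}
	return false;
}
```

In the FIFO variant a running writer that finds the lock free but the queue non-empty has to sleep behind the queued tasks, which is where the extra voluntary context switches come from.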
With this patch, we got a 2x performance increase on one
test box with the following aim7 workfile:
FILESIZE: 1M
POOLSIZE: 10M
10 fork_test
/usr/bin/time output             w/o patch      with patch
------------------------------------------------------------
Percent of CPU this job got:     369%           537%
Voluntary context switches:      640595016      157915561
We got a 45% increase in CPU usage and saved about 3/4 of the voluntary context switches.
Reported-by: LKP project <lkp@linux.intel.com>
Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Cc: Alex Shi <alex.shi@intel.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Anton Blanchard <anton@samba.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: paul.gortmaker@windriver.com
Link: http://lkml.kernel.org/r/1359716356-23865-1-git-send-email-yuanhan.liu@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
For files only using THIS_MODULE and/or EXPORT_SYMBOL, map
them onto including export.h -- or if the file isn't even
using those, then just delete the include. Fix up any implicit
include dependencies that were being masked by module.h along
the way.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
There is no reason to allow the lock protecting rwsems (the
ownerless variant) to be preemptible on -rt. Convert it to raw.
In mainline this change documents the low level nature of
the lock - otherwise there's no functional difference. Lockdep
and Sparse checking will work as usual.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
rwsems can be used with IRQs disabled, particularly in early boot
before IRQs are enabled. Currently the spin_unlock_irq() usage in the
slowpath will unconditionally enable interrupts and cause problems
since interrupts are not yet initialized or enabled.
This patch uses save/restore versions of IRQ spinlocks in the slowpath
to ensure interrupts are not unintentionally enabled.
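The difference between the two unlock styles can be illustrated with a small user-space simulation. Everything here is hypothetical scaffolding: `irqs_enabled` stands in for the CPU interrupt-enable state, and the `sim_*` helpers only mimic the interrupt side effects of the kernel's spin_lock_irq()/spin_unlock_irq() and spin_lock_irqsave()/spin_unlock_irqrestore(); no real locking takes place.

```c
#include <assert.h>
#include <stdbool.h>

static bool irqs_enabled;	/* simulated CPU interrupt-enable state */

static void sim_lock_irq(void)
{
	irqs_enabled = false;
}

static void sim_unlock_irq(void)
{
	assert(!irqs_enabled);	/* critical section ran with IRQs off */
	irqs_enabled = true;	/* unconditional enable */
}

static void sim_lock_irqsave(bool *flags)
{
	*flags = irqs_enabled;	/* remember the caller's state */
	irqs_enabled = false;
}

static void sim_unlock_irqrestore(bool flags)
{
	assert(!irqs_enabled);
	irqs_enabled = flags;	/* put back whatever the caller had */
}

/* Slowpath as it was: leaves IRQs on regardless of the entry state. */
static void slowpath_irq(void)
{
	sim_lock_irq();
	sim_unlock_irq();
}

/* Slowpath with save/restore: entry state is preserved. */
static void slowpath_irqsave(void)
{
	bool flags;

	sim_lock_irqsave(&flags);
	sim_unlock_irqrestore(flags);
}
```

Entering `slowpath_irq()` with interrupts off leaves them on afterwards, which is the early-boot problem described above; `slowpath_irqsave()` returns with the same state it was entered with.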
Signed-off-by: Kevin Hilman <khilman@deeprootsystems.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
rwsem_is_locked() tests ->activity without locks, so we should always keep
->activity consistent. However, the code in __rwsem_do_wake() breaks this
rule: it updates ->activity only after _all_ readers have been woken up.
A woken reader may therefore see a stale ->activity value, which causes
rwsem_is_locked() to return the wrong result.
Quote from Andrew:
"
- we have one or more processes sleeping in down_read(), waiting for access.
- we wake one or more processes up without altering ->activity
- they start to run and they do rwsem_is_locked(). This incorrectly
returns "false", because the waker process is still crunching away in
__rwsem_do_wake().
- the waker now alters ->activity, but it was too late.
"
So we need to take a spinlock to protect this. And since rwsem_is_locked()
should not block, we use spin_trylock_irqsave().
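The trylock approach can be sketched in user space as follows. This is an illustration under assumptions, not the kernel function: `struct sem_model` and `model_is_locked()` are hypothetical, a C11 `atomic_flag` stands in for the spinlock, and the interrupt save/restore part is left out. The choice to report "locked" when the trylock fails is an assumption of the sketch: whoever holds the internal lock is in the middle of changing the semaphore state.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of a spinlock-protected rwsem. */
struct sem_model {
	atomic_flag wait_lock;	/* stands in for the internal spinlock */
	int activity;		/* > 0 readers, -1 writer, 0 free */
};

/* Never blocks: if the internal lock is busy, report "locked". */
static bool model_is_locked(struct sem_model *s)
{
	bool ret = true;

	if (!atomic_flag_test_and_set_explicit(&s->wait_lock,
					       memory_order_acquire)) {
		assert(s->activity >= -1);
		ret = (s->activity != 0);
		atomic_flag_clear_explicit(&s->wait_lock,
					   memory_order_release);
	}
	return ret;
}
```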
[akpm@linux-foundation.org: simplify code]
Reported-by: Brian Behlendorf <behlendorf1@llnl.gov>
Cc: Ben Woodard <bwoodard@llnl.gov>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: WANG Cong <amwang@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
These functions need not be exported, since no drivers should use them.
__init_rwsem() is an exception, because init_rwsem(), which is a macro
that calls it, is used by drivers.
Signed-off-by: WANG Cong <amwang@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This annotation makes it possible to assign a subclass on lock init. It
is meant to reduce the number of _nested() annotations by assigning a
default subclass.
One could do without this annotation and rely on lockdep_set_class()
exclusively, but that would require a manual stack of struct lock_class_key
objects.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
Use the lock validator framework to prove rwsem locking correctness.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Replace a number of memory barriers with smp_ variants. This means we won't
take the unnecessary hit on UP machines.
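The idea behind the smp_ variants can be sketched with C11 fences. This is a user-space analogy, not the kernel's definitions: `MODEL_CONFIG_SMP`, `model_mb()`, `model_smp_mb()`, `publish()` and `consume()` are all hypothetical names. The sketch assumes the uniprocessor variant only needs to stop the compiler from reordering, which `atomic_signal_fence()` does without emitting a hardware barrier.

```c
#include <assert.h>
#include <stdatomic.h>

/* Full barrier: always a hardware fence. */
#define model_mb()	atomic_thread_fence(memory_order_seq_cst)

/* SMP-only barrier: hardware fence on SMP, compiler barrier on UP. */
#ifdef MODEL_CONFIG_SMP
#define model_smp_mb()	model_mb()
#else
#define model_smp_mb()	atomic_signal_fence(memory_order_seq_cst)
#endif

static int data, ready;

/* Writer side: make the payload visible before the flag. */
static void publish(int v)
{
	data = v;
	model_smp_mb();
	ready = 1;
}

/* Reader side: check the flag, then read the payload. */
static int consume(void)
{
	assert(ready);
	model_smp_mb();
	return data;
}
```

Building with `-DMODEL_CONFIG_SMP` selects the full fence; without it, the barrier costs nothing at run time.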
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Initial git repository build. I'm not bothering with the full history,
even though we have it. We can create a separate "historical" git
archive of that later if we want to, and in the meantime it's about
3.2GB when imported into git - space that would just make the early
git days unnecessarily complicated, when we don't have a lot of good
infrastructure for it.
Let it rip!