Commit Graph

20666 Commits

Author SHA1 Message Date
Linus Torvalds 1dc51b8288 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull more vfs updates from Al Viro:
 "Assorted VFS fixes and related cleanups (IMO the most interesting in
  that part are f_path-related things and Eric's descriptor-related
  stuff).  UFS regression fixes (it got broken last cycle).  9P fixes.
  fs-cache series, DAX patches, Jan's file_remove_suid() work"

[ I'd say this is much more than "fixes and related cleanups".  The
  file_table locking rule change by Eric Dumazet is a rather big and
  fundamental update even if the patch isn't huge.   - Linus ]

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (49 commits)
  9p: cope with bogus responses from server in p9_client_{read,write}
  p9_client_write(): avoid double p9_free_req()
  9p: forgetting to cancel request on interrupted zero-copy RPC
  dax: bdev_direct_access() may sleep
  block: Add support for DAX reads/writes to block devices
  dax: Use copy_from_iter_nocache
  dax: Add block size note to documentation
  fs/file.c: __fget() and dup2() atomicity rules
  fs/file.c: don't acquire files->file_lock in fd_install()
  fs:super:get_anon_bdev: fix race condition could cause dev exceed its upper limitation
  vfs: avoid creation of inode number 0 in get_next_ino
  namei: make set_root_rcu() return void
  make simple_positive() public
  ufs: use dir_pages instead of ufs_dir_pages()
  pagemap.h: move dir_pages() over there
  remove the pointless include of lglock.h
  fs: cleanup slight list_entry abuse
  xfs: Correctly lock inode when removing suid and file capabilities
  fs: Call security_ops->inode_killpriv on truncate
  fs: Provide function telling whether file_remove_privs() will do anything
  ...
2015-07-04 19:36:06 -07:00
Linus Torvalds 1b3618b60a Except for the preempt notifiers fix, these are all small bugfixes
that could have been waited for -rc2.  Sending them now since I
 was taking care of Peter's patch anyway.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJVmB5pAAoJEL/70l94x66DpLAH/A0p2HICsG5Qw3gnI3NxAmK4
 YUvtMx0d67mFXPg0kuYRMO7C2Is6XHKtnmsX8oqkg3JTRFfn7XYqlwvrrK3Be08U
 tGvhigneJTGDXwU74jyik+D6VLmyJP3CxEvXM3d9AFyy7Ro9Grxx0Ja8c9cmKGQE
 esCwNAEJOcqaQMtNIix3WtXifOVFr40NZlbAawsMyxVw8LZK/K5maXyUTRDI57Qn
 B1wbTN1KD847/0rLrit+8VlsGEZBorUgCFhueeYGy/7EdiY0bNkzhLWb4erlWnRq
 ZlKzsLdfXmEg2CEepaHCm5jlLfIurgbLfoV1tzQ5jAuj/SHmUxq+k3lZZYTYA3w=
 =vDKM
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
 "Except for the preempt notifiers fix, these are all small bugfixes
  that could have been waited for -rc2.  Sending them now since I was
  taking care of Peter's patch anyway"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  kvm: add hyper-v crash msrs values
  KVM: x86: remove data variable from kvm_get_msr_common
  KVM: s390: virtio-ccw: don't overwrite config space values
  KVM: x86: keep track of LVT0 changes under APICv
  KVM: x86: properly restore LVT0
  KVM: x86: make vapics_in_nmi_mode atomic
  sched, preempt_notifier: separate notifier registration from static_key inc/dec
2015-07-04 11:29:59 -07:00
Linus Torvalds 22a093b2fb Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Ingo Molnar:
 "Debug info and other statistics fixes and related enhancements"

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/numa: Fix numa balancing stats in /proc/pid/sched
  sched/numa: Show numa_group ID in /proc/sched_debug task listings
  sched/debug: Move print_cfs_rq() declaration to kernel/sched/sched.h
  sched/stat: Expose /proc/pid/schedstat if CONFIG_SCHED_INFO=y
  sched/stat: Simplify the sched_info accounting dependency
2015-07-04 08:56:53 -07:00
Srikar Dronamraju 397f2378f1 sched/numa: Fix numa balancing stats in /proc/pid/sched
Commit 44dba3d5d6 ("sched: Refactor task_struct to use
numa_faults instead of numa_* pointers") modified the way
tsk->numa_faults stats are accounted.

However that commit never touched show_numa_stats() that is displayed
in /proc/pid/sched and thus the numbers displayed in /proc/pid/sched
don't match the actual numbers.

Fix it by making sure that /proc/pid/sched reflects the task
fault numbers. Also add group fault stats too.

Also couple of more modifications are added here:

1. Format changes:

  - Previously we would list two entries per node, one for private
    and one for shared. Also the home node info was listed in each entry.

  - Now preferred node, total_faults and current node are
    displayed separately.

  - Now there is one entry per node, that lists private,shared task and
    group faults.

2. Unit changes:

  - p->numa_pages_migrated was getting reset after every read of
    /proc/pid/sched. It's more useful to have absolute numbers since
    differential migrations between two accesses can be more easily
    calculated.

Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Iulia Manda <iulia.manda21@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1435252903-1081-4-git-send-email-srikar@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-04 10:04:33 +02:00
Srikar Dronamraju e3d24d0a60 sched/numa: Show numa_group ID in /proc/sched_debug task listings
Having the numa group ID in /proc/sched_debug helps to see how
the numa groups have spread across the system.

Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Iulia Manda <iulia.manda21@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1435252903-1081-3-git-send-email-srikar@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-04 10:04:32 +02:00
Srikar Dronamraju 6b55c9654f sched/debug: Move print_cfs_rq() declaration to kernel/sched/sched.h
Currently print_cfs_rq() is declared in include/linux/sched.h.
However it's not used outside kernel/sched. Hence move the
declaration to kernel/sched/sched.h

Also some functions are only available for CONFIG_SCHED_DEBUG=y.
Hence move the declarations to within the #ifdef.

Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Iulia Manda <iulia.manda21@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1435252903-1081-2-git-send-email-srikar@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-04 10:04:31 +02:00
Naveen N. Rao f6db834799 sched/stat: Simplify the sched_info accounting dependency
Both CONFIG_SCHEDSTATS=y and CONFIG_TASK_DELAY_ACCT=y track task
sched_info, which results in ugly #if clauses.

Simplify the code by introducing a synthethic CONFIG_SCHED_INFO
switch, selected by both.

Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: a.p.zijlstra@chello.nl
Cc: ricklind@us.ibm.com
Link: http://lkml.kernel.org/r/8d19eef800811a94b0f91bcbeb27430a884d7433.1435255405.git.naveen.n.rao@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-04 10:04:30 +02:00
Linus Torvalds 0cbee99269 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull user namespace updates from Eric Biederman:
 "Long ago and far away when user namespaces where young it was realized
  that allowing fresh mounts of proc and sysfs with only user namespace
  permissions could violate the basic rule that only root gets to decide
  if proc or sysfs should be mounted at all.

  Some hacks were put in place to reduce the worst of the damage could
  be done, and the common sense rule was adopted that fresh mounts of
  proc and sysfs should allow no more than bind mounts of proc and
  sysfs.  Unfortunately that rule has not been fully enforced.

  There are two kinds of gaps in that enforcement.  Only filesystems
  mounted on empty directories of proc and sysfs should be ignored but
  the test for empty directories was insufficient.  So in my tree
  directories on proc, sysctl and sysfs that will always be empty are
  created specially.  Every other technique is imperfect as an ordinary
  directory can have entries added even after a readdir returns and
  shows that the directory is empty.  Special creation of directories
  for mount points makes the code in the kernel a smidge clearer about
  it's purpose.  I asked container developers from the various container
  projects to help test this and no holes were found in the set of mount
  points on proc and sysfs that are created specially.

  This set of changes also starts enforcing the mount flags of fresh
  mounts of proc and sysfs are consistent with the existing mount of
  proc and sysfs.  I expected this to be the boring part of the work but
  unfortunately unprivileged userspace winds up mounting fresh copies of
  proc and sysfs with noexec and nosuid clear when root set those flags
  on the previous mount of proc and sysfs.  So for now only the atime,
  read-only and nodev attributes which userspace happens to keep
  consistent are enforced.  Dealing with the noexec and nosuid
  attributes remains for another time.

  This set of changes also addresses an issue with how open file
  descriptors from /proc/<pid>/ns/* are displayed.  Recently readlink of
  /proc/<pid>/fd has been triggering a WARN_ON that has not been
  meaningful since it was added (as all of the code in the kernel was
  converted) and is not now actively wrong.

  There is also a short list of issues that have not been fixed yet that
  I will mention briefly.

  It is possible to rename a directory from below to above a bind mount.
  At which point any directory pointers below the renamed directory can
  be walked up to the root directory of the filesystem.  With user
  namespaces enabled a bind mount of the bind mount can be created
  allowing the user to pick a directory whose children they can rename
  to outside of the bind mount.  This is challenging to fix and doubly
  so because all obvious solutions must touch code that is in the
  performance part of pathname resolution.

  As mentioned above there is also a question of how to ensure that
  developers by accident or with purpose do not introduce exectuable
  files on sysfs and proc and in doing so introduce security regressions
  in the current userspace that will not be immediately obvious and as
  such are likely to require breaking userspace in painful ways once
  they are recognized"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
  vfs: Remove incorrect debugging WARN in prepend_path
  mnt: Update fs_fully_visible to test for permanently empty directories
  sysfs: Create mountpoints with sysfs_create_mount_point
  sysfs: Add support for permanently empty directories to serve as mount points.
  kernfs: Add support for always empty directories.
  proc: Allow creating permanently empty directories that serve as mount points
  sysctl: Allow creating permanently empty directories that serve as mountpoints.
  fs: Add helper functions for permanently empty directories.
  vfs: Ignore unlocked mounts in fs_fully_visible
  mnt: Modify fs_fully_visible to deal with locked ro nodev and atime
  mnt: Refactor the logic for mounting sysfs and proc in a user namespace
2015-07-03 15:20:57 -07:00
Peter Zijlstra 2ecd9d29ab sched, preempt_notifier: separate notifier registration from static_key inc/dec
Commit 1cde2930e1 ("sched/preempt: Add static_key() to preempt_notifiers")
had two problems.  First, the preempt-notifier API needs to sleep with the
addition of the static_key, we do however need to hold off preemption
while modifying the preempt notifier list, otherwise a preemption could
observe an inconsistent list state.  KVM correctly registers and
unregisters preempt notifiers with preemption disabled, so the sleep
caused dmesg splats.

Second, KVM registers and unregisters preemption notifiers very often
(in vcpu_load/vcpu_put).  With a single uniprocessor guest the static key
would move between 0 and 1 continuously, hitting the slow path on every
userspace exit.

To fix this, wrap the static_key inc/dec in a new API, and call it from
KVM.

Fixes: 1cde2930e1 ("sched/preempt: Add static_key() to preempt_notifiers")
Reported-by: Pontus Fuchs <pontus.fuchs@gmail.com>
Reported-by: Takashi Iwai <tiwai@suse.de>
Tested-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-07-03 18:55:00 +02:00
Linus Torvalds 7df9ab845c make certificate list change message more useful
It's a bug in our Makefile rules, make it show what the changing
certificate list was, and make it a warning so that people actually see
it.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-07-02 16:42:13 -07:00
Linus Torvalds 2d01eedf1d Merge branch 'akpm' (patches from Andrew)
Merge third patchbomb from Andrew Morton:

 - the rest of MM

 - scripts/gdb updates

 - ipc/ updates

 - lib/ updates

 - MAINTAINERS updates

 - various other misc things

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (67 commits)
  genalloc: rename of_get_named_gen_pool() to of_gen_pool_get()
  genalloc: rename dev_get_gen_pool() to gen_pool_get()
  x86: opt into HAVE_COPY_THREAD_TLS, for both 32-bit and 64-bit
  MAINTAINERS: add zpool
  MAINTAINERS: BCACHE: Kent Overstreet has changed email address
  MAINTAINERS: move Jens Osterkamp to CREDITS
  MAINTAINERS: remove unused nbd.h pattern
  MAINTAINERS: update brcm gpio filename pattern
  MAINTAINERS: update brcm dts pattern
  MAINTAINERS: update sound soc intel patterns
  MAINTAINERS: remove website for paride
  MAINTAINERS: update Emulex ocrdma email addresses
  bcache: use kvfree() in various places
  libcxgbi: use kvfree() in cxgbi_free_big_mem()
  target: use kvfree() in session alloc and free
  IB/ehca: use kvfree() in ipz_queue_{cd}tor()
  drm/nouveau/gem: use kvfree() in u_free()
  drm: use kvfree() in drm_free_large()
  cxgb4: use kvfree() in t4_free_mem()
  cxgb3: use kvfree() in cxgb_free_mem()
  ...
2015-07-01 17:47:51 -07:00
Linus Torvalds 6ac15baacb Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fixes from Thomas Gleixner:
 "This contains:

   - a build regression fix introduced by the timeconst move

   - a hotplug regression fix introduced by the timer wheel diet

   - a cpu hotplug bug fix for the exynos clocksource driver"

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  time: Remove development rules from Kbuild/Makefile
  timer: Fix hotplug regression
  clocksource: exynos_mct: Avoid blocking calls in the cpu hotplug notifier
2015-07-01 15:44:18 -07:00
Linus Torvalds 5c3950970b Power management and ACPI fixes for v4.2-rc1
- Fix a recently added memory leak in an error path in the ACPI
    resources management code (Dan Carpenter).
 
  - Fix a build warning triggered by an ACPI video header function
    that should be static inline (Borislav Petkov).
 
  - Change names of helper function converting struct fwnode_handle
    pointers to either struct device_node or struct acpi_device
    pointers so they don't conflict with local variable names
    (Alexander Sverdlin).
 
  - Make the hibernate core re-enable nonboot CPUs on failures to
    disable them as expected (Vitaly Kuznetsov).
 
  - Increase the default timeout of the device suspend watchdog to
    prevent it from triggering too early on some systems (Takashi Iwai).
 
  - Prevent the cpuidle powernv driver from registering idle
    states with CPUIDLE_FLAG_TIMER_STOP set if CONFIG_TICK_ONESHOT
    is unset which leads to boot hangs (Preeti U Murthy).
 
 /
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABCAAGBQJVksL4AAoJEILEb/54YlRxY9cP/jnXE12Jv2aYQAram5Fd7nY7
 LSiuKAzVCqJbdBa3sRILKjMwgxciYABXypw6Zapa8TimAV374GZh6W4VXgIAifDf
 gdicSSxB4A5cViUEte3uebzdaM2QMcOcQ6A+UOe849q3emfbu91f0LXTWwahR0og
 Hjs7QCcvZS/swQIIY0JaIivC5mRwxrx141oPVn4GNf0tnOzH7eOUktYCZwh4U4Qo
 FwUn66XI+ttlrRxs/IV5QSQ9S5qDBHOKdv6MgmGwMzUXINTL4w2Zawg+Pw0m3Puf
 2VnT9jsz4anD1jZMJGLpeKNAjFJZ6ODiv06HjYhscaUZuMSECUiE9f6auUkiSU1F
 r463ujaIcCW5MWUGjRBq5GwP15IuIL/NxjWAVyyMxeYcjOKrpQeEJ2liWnXPv/Bh
 JVOoktFvFu1iOojlWfwnaC83bjZiJIJ6BFEywIm7l0wD1VAmk8IFepMqIrTp4U9R
 ptKD8Go9GripftHryqSQA5wgp64hIQKxSHi+fEM6RfcLskXVKxEYSButva3wnFHx
 2lQSxro42jsaP8O9T1wmb+br0h3efPeo9qyewmjld+vISeUhmsCILta55VtXI0wu
 ektWoAuaA97d+u+mTee4U2kONXo+yUuXe0YqlCb3x/7m8oMg6hoblWZPnWFSeqeH
 T/vI5cRS+a3NqAjctwmX
 =vkde
 -----END PGP SIGNATURE-----

Merge tag 'pm+acpi-4.2-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management and ACPI fixes from Rafael Wysocki:
 "These are fixes that didn't make it to the previous PM+ACPI pull
  request or are fixing issues introduced by it.

  Specifics:

   - Fix a recently added memory leak in an error path in the ACPI
     resources management code (Dan Carpenter)

   - Fix a build warning triggered by an ACPI video header function that
     should be static inline (Borislav Petkov)

   - Change names of helper function converting struct fwnode_handle
     pointers to either struct device_node or struct acpi_device
     pointers so they don't conflict with local variable names
     (Alexander Sverdlin)

   - Make the hibernate core re-enable nonboot CPUs on failures to
     disable them as expected (Vitaly Kuznetsov)

   - Increase the default timeout of the device suspend watchdog to
     prevent it from triggering too early on some systems (Takashi Iwai)

   - Prevent the cpuidle powernv driver from registering idle states
     with CPUIDLE_FLAG_TIMER_STOP set if CONFIG_TICK_ONESHOT is unset
     which leads to boot hangs (Preeti U Murthy)"

* tag 'pm+acpi-4.2-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  tick/idle/powerpc: Do not register idle states with CPUIDLE_FLAG_TIMER_STOP set in periodic mode
  PM / sleep: Increase default DPM watchdog timeout to 60
  PM / hibernate: re-enable nonboot cpus on disable_nonboot_cpus() failure
  ACPI / OF: Rename of_node() and acpi_node() to to_of_node() and to_acpi_node()
  ACPI / video: Inline acpi_video_set_dmi_backlight_type
  ACPI / resources: free memory on error in add_region_before()
2015-07-01 14:17:44 -07:00
Linus Torvalds 7adf12b87f xen: features and cleanups for 4.2-rc0
- Add "make xenconfig" to assist in generating configs for Xen guests.
 - Preparatory cleanups necessary for supporting 64 KiB pages in ARM
   guests.
 - Automatically use hvc0 as the default console in ARM guests.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQEcBAABAgAGBQJVkpoqAAoJEFxbo/MsZsTRu3IH/2AMPx2i65hoSqfHtGf3sz/z
 XNfcidVmOElFVXGaW83m0tBWMemT5LpOGRfiq5sIo8xt/8xD2vozEkl/3kkf3RrX
 EmZDw3E8vmstBdBTjWdovVhNenRc0m0pB5daS7wUdo9cETq1ag1L3BHTB3fEBApO
 74V6qAfnhnq+snqWhRD3XAk3LKI0nWuWaV+5HsmxDtnunGhuRLGVs7mwxZGg56sM
 mILA0eApGPdwyVVpuDe0SwV52V8E/iuVOWTcomGEN2+cRWffG5+QpHxQA8bOtF6O
 KfqldiNXOY/idM+5+oSm9hespmdWbyzsFqmTYz0LvQvxE8eEZtHHB3gIcHkE8QU=
 =danz
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-4.2-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen updates from David Vrabel:
 "Xen features and cleanups for 4.2-rc0:

   - add "make xenconfig" to assist in generating configs for Xen guests

   - preparatory cleanups necessary for supporting 64 KiB pages in ARM
     guests

   - automatically use hvc0 as the default console in ARM guests"

* tag 'for-linus-4.2-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  block/xen-blkback: s/nr_pages/nr_segs/
  block/xen-blkfront: Remove invalid comment
  block/xen-blkfront: Remove unused macro MAXIMUM_OUTSTANDING_BLOCK_REQS
  arm/xen: Drop duplicate define mfn_to_virt
  xen/grant-table: Remove unused macro SPP
  xen/xenbus: client: Fix call of virt_to_mfn in xenbus_grant_ring
  xen: Include xen/page.h rather than asm/xen/page.h
  kconfig: add xenconfig defconfig helper
  kconfig: clarify kvmconfig is for kvm
  xen/pcifront: Remove usage of struct timeval
  xen/tmem: use BUILD_BUG_ON() in favor of BUG_ON()
  hvc_xen: avoid uninitialized variable warning
  xenbus: avoid uninitialized variable warning
  xen/arm: allow console=hvc0 to be omitted for guests
  arm,arm64/xen: move Xen initialization earlier
  arm/xen: Correctly check if the event channel interrupt is present
2015-07-01 11:53:46 -07:00
Linus Torvalds 02201e3f1b Minor merge needed, due to function move.
Main excitement here is Peter Zijlstra's lockless rbtree optimization to
 speed module address lookup.  He found some abusers of the module lock
 doing that too.
 
 A little bit of parameter work here too; including Dan Streetman's breaking
 up the big param mutex so writing a parameter can load another module (yeah,
 really).  Unfortunately that broke the usual suspects, !CONFIG_MODULES and
 !CONFIG_SYSFS, so those fixes were appended too.
 
 Cheers,
 Rusty.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJVkgKHAAoJENkgDmzRrbjxQpwQAJVmBN6jF3SnwbQXv9vRixjH
 58V33sb1G1RW+kXxQ3/e8jLX/4VaN479CufruXQp+IJWXsN/CH0lbC3k8m7u50d7
 b1Zeqd/Yrh79rkc11b0X1698uGCSMlzz+V54Z0QOTEEX+nSu2ZZvccFS4UaHkn3z
 rqDo00lb7rxQz8U25qro2OZrG6D3ub2q20TkWUB8EO4AOHkPn8KWP2r429Axrr0K
 wlDWDTTt8/IsvPbuPf3T15RAhq1avkMXWn9nDXDjyWbpLfTn8NFnWmtesgY7Jl4t
 GjbXC5WYekX3w2ZDB9KaT/DAMQ1a7RbMXNSz4RX4VbzDl+yYeSLmIh2G9fZb1PbB
 PsIxrOgy4BquOWsJPm+zeFPSC3q9Cfu219L4AmxSjiZxC3dlosg5rIB892Mjoyv4
 qxmg6oiqtc4Jxv+Gl9lRFVOqyHZrTC5IJ+xgfv1EyP6kKMUKLlDZtxZAuQxpUyxR
 HZLq220RYnYSvkWauikq4M8fqFM8bdt6hLJnv7bVqllseROk9stCvjSiE3A9szH5
 OgtOfYV5GhOeb8pCZqJKlGDw+RoJ21jtNCgOr6DgkNKV9CX/kL/Puwv8gnA0B0eh
 dxCeB7f/gcLl7Cg3Z3gVVcGlgak6JWrLf5ITAJhBZ8Lv+AtL2DKmwEWS/iIMRmek
 tLdh/a9GiCitqS0bT7GE
 =tWPQ
 -----END PGP SIGNATURE-----

Merge tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux

Pull module updates from Rusty Russell:
 "Main excitement here is Peter Zijlstra's lockless rbtree optimization
  to speed module address lookup.  He found some abusers of the module
  lock doing that too.

  A little bit of parameter work here too; including Dan Streetman's
  breaking up the big param mutex so writing a parameter can load
  another module (yeah, really).  Unfortunately that broke the usual
  suspects, !CONFIG_MODULES and !CONFIG_SYSFS, so those fixes were
  appended too"

* tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: (26 commits)
  modules: only use mod->param_lock if CONFIG_MODULES
  param: fix module param locks when !CONFIG_SYSFS.
  rcu: merge fix for Convert ACCESS_ONCE() to READ_ONCE() and WRITE_ONCE()
  module: add per-module param_lock
  module: make perm const
  params: suppress unused variable error, warn once just in case code changes.
  modules: clarify CONFIG_MODULE_COMPRESS help, suggest 'N'.
  kernel/module.c: avoid ifdefs for sig_enforce declaration
  kernel/workqueue.c: remove ifdefs over wq_power_efficient
  kernel/params.c: export param_ops_bool_enable_only
  kernel/params.c: generalize bool_enable_only
  kernel/module.c: use generic module param operaters for sig_enforce
  kernel/params: constify struct kernel_param_ops uses
  sysfs: tightened sysfs permission checks
  module: Rework module_addr_{min,max}
  module: Use __module_address() for module_address_lookup()
  module: Make the mod_tree stuff conditional on PERF_EVENTS || TRACING
  module: Optimize __module_address() using a latched RB-tree
  rbtree: Implement generic latch_tree
  seqlock: Introduce raw_read_seqcount_latch()
  ...
2015-07-01 10:49:25 -07:00
Eric W. Biederman f9bb48825a sysfs: Create mountpoints with sysfs_create_mount_point
This allows for better documentation in the code and
it allows for a simpler and fully correct version of
fs_fully_visible to be written.

The mount points converted and their filesystems are:
/sys/hypervisor/s390/       s390_hypfs
/sys/kernel/config/         configfs
/sys/kernel/debug/          debugfs
/sys/firmware/efi/efivars/  efivarfs
/sys/fs/fuse/connections/   fusectl
/sys/fs/pstore/             pstore
/sys/kernel/tracing/        tracefs
/sys/fs/cgroup/             cgroup
/sys/kernel/security/       securityfs
/sys/fs/selinux/            selinuxfs
/sys/fs/smackfs/            smackfs

Cc: stable@vger.kernel.org
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2015-07-01 10:36:47 -05:00
Eric W. Biederman f9bd6733d3 sysctl: Allow creating permanently empty directories that serve as mountpoints.
Add a magic sysctl table sysctl_mount_point that when used to
create a directory forces that directory to be permanently empty.

Update the code to use make_empty_dir_inode when accessing permanently
empty directories.

Update the code to not allow adding to permanently empty directories.

Update /proc/sys/fs/binfmt_misc to be a permanently empty directory.

Cc: stable@vger.kernel.org
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2015-07-01 10:36:39 -05:00
Thomas Gleixner 65f26062cd time: Remove development rules from Kbuild/Makefile
time.o gets rebuilt unconditionally due to a leftover Makefile rule
which was placed there for development purposes.

Remove it along with the commented out always rule in the toplevel
Kbuild file.

Fixes: 0a227985d4 'time: Move timeconst.h into include/generated'
Reported-by; Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Nicholas Mc Guire <der.herr@hofr.at>
2015-07-01 09:57:35 +02:00
Pekka Enberg 200f1ce365 kernel/relay.c: use kvfree() in relay_free_page_array()
Use kvfree() instead of open-coding it.

Signed-off-by: Pekka Enberg <penberg@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-30 19:44:59 -07:00
Antonio Ospite b389645f04 printk: improve the description of /dev/kmsg line format
The comment about /dev/kmsg does not mention the additional values which
may actually be exported, fix that.

Also move up the part of the comment instructing the users to ignore these
additional values, this way the reading is more fluent and logically
compact.

Signed-off-by: Antonio Ospite <ao2@ao2.it>
Cc: Joe Perches <joe@perches.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-30 19:44:59 -07:00
Lorenzo Stoakes 3e44c471a2 gcov: add support for GCC 5.1
Fix kernel gcov support for GCC 5.1.  Similar to commit a992bf836f
("gcov: add support for GCC 4.9"), this patch takes into account the
existence of a new gcov counter (see gcc's gcc/gcov-counter.def.)

Firstly, it increments GCOV_COUNTERS (to 10), which makes the data
structure struct gcov_info compatible with GCC 5.1.

Secondly, a corresponding counter function __gcov_merge_icall_topn (Top N
value tracking for indirect calls) is included in base.c with the other
gcov counters unused for kernel profiling.

Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Andrey Ryabinin <a.ryabinin@samsung.com>
Cc: Yuan Pengfei <coolypf@qq.com>
Tested-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
Reviewed-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-30 19:44:57 -07:00
HATAYAMA Daisuke 5375b708f2 kernel/panic/kexec: fix "crash_kexec_post_notifiers" option issue in oops path
Commit f06e5153f4 ("kernel/panic.c: add "crash_kexec_post_notifiers"
option for kdump after panic_notifers") introduced
"crash_kexec_post_notifiers" kernel boot option, which toggles wheather
panic() calls crash_kexec() before panic_notifiers and dump kmsg or after.

The problem is that the commit overlooks panic_on_oops kernel boot option.
 If it is enabled, crash_kexec() is called directly without going through
panic() in oops path.

To fix this issue, this patch adds a check to "crash_kexec_post_notifiers"
in the condition of kexec_should_crash().

Also, put a comment in kexec_should_crash() to explain not obvious things
on this patch.

Signed-off-by: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
Acked-by: Baoquan He <bhe@redhat.com>
Tested-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Cc: Baoquan He <bhe@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-30 19:44:57 -07:00
HATAYAMA Daisuke f45d85ff1f kernel/panic: call the 2nd crash_kexec() only if crash_kexec_post_notifiers is enabled
For compatibility with the behaviour before the commit f06e5153f4
("kernel/panic.c: add "crash_kexec_post_notifiers" option for kdump after
panic_notifers"), the 2nd crash_kexec() should be called only if
crash_kexec_post_notifiers is enabled.

Note that crash_kexec() returns immediately if kdump crash kernel is not
loaded, so in this case, this patch makes no functionality change, but the
point is to make it explicit, from the caller panic() side, that the 2nd
crash_kexec() does nothing.

Signed-off-by: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
Suggested-by: Ingo Molnar <mingo@kernel.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Cc: Baoquan He <bhe@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-30 19:44:57 -07:00
Stephen Rothwell 20bdc2cfdb modules: only use mod->param_lock if CONFIG_MODULES
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2015-06-28 14:50:12 +09:30
Rusty Russell cf2fde7b39 param: fix module param locks when !CONFIG_SYSFS.
As Dan Streetman points out, the entire point of locking for is to
stop sysfs accesses, so they're elided entirely in the !SYSFS case.

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2015-06-28 14:46:14 +09:30
Linus Torvalds 4a10a91756 Merge branch 'upstream' of git://git.infradead.org/users/pcmoore/audit
Pull audit updates from Paul Moore:
 "Four small audit patches for v4.2, all bug fixes.  Only 10 lines of
  change this time so very unremarkable, the patch subject lines pretty
  much tell the whole story"

* 'upstream' of git://git.infradead.org/users/pcmoore/audit:
  audit: Fix check of return value of strnlen_user()
  audit: obsolete audit_context check is removed in audit_filter_rules()
  audit: fix for typo in comment to function audit_log_link_denied()
  lsm: rename duplicate labels in LSM_AUDIT_DATA_TASK audit message type
2015-06-27 13:53:16 -07:00
Linus Torvalds e22619a29f Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull security subsystem updates from James Morris:
 "The main change in this kernel is Casey's generalized LSM stacking
  work, which removes the hard-coding of Capabilities and Yama stacking,
  allowing multiple arbitrary "small" LSMs to be stacked with a default
  monolithic module (e.g.  SELinux, Smack, AppArmor).

  See
        https://lwn.net/Articles/636056/

  This will allow smaller, simpler LSMs to be incorporated into the
  mainline kernel and arbitrarily stacked by users.  Also, this is a
  useful cleanup of the LSM code in its own right"

* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (38 commits)
  tpm, tpm_crb: fix le64_to_cpu conversions in crb_acpi_add()
  vTPM: set virtual device before passing to ibmvtpm_reset_crq
  tpm_ibmvtpm: remove unneccessary message level.
  ima: update builtin policies
  ima: extend "mask" policy matching support
  ima: add support for new "euid" policy condition
  ima: fix ima_show_template_data_ascii()
  Smack: freeing an error pointer in smk_write_revoke_subj()
  selinux: fix setting of security labels on NFS
  selinux: Remove unused permission definitions
  selinux: enable genfscon labeling for sysfs and pstore files
  selinux: enable per-file labeling for debugfs files.
  selinux: update netlink socket classes
  signals: don't abuse __flush_signals() in selinux_bprm_committed_creds()
  selinux: Print 'sclass' as string when unrecognized netlink message occurs
  Smack: allow multiple labels in onlycap
  Smack: fix seq operations in smackfs
  ima: pass iint to ima_add_violation()
  ima: wrap event related data to the new ima_event_data structure
  integrity: add validity checks for 'path' parameter
  ...
2015-06-27 13:26:03 -07:00
Linus Torvalds e0dd880a54 Merge branch 'for-4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq
Pull workqueue updates from Tejun Heo:
 "Most of the changes are around implementing and fixing fallouts from
  sysfs and internal interface to limit the CPUs available to all
  unbound workqueues to help isolating CPUs.  It needs more work as
  ordered workqueues can roam unrestricted but still is a significant
  improvement"

* 'for-4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
  workqueue: fix typos in comments
  workqueue: move flush_scheduled_work() to workqueue.h
  workqueue: remove the lock from wq_sysfs_prep_attrs()
  workqueue: remove the declaration of copy_workqueue_attrs()
  workqueue: ensure attrs changes are properly synchronized
  workqueue: separate out and refactor the locking of applying attrs
  workqueue: simplify wq_update_unbound_numa()
  workqueue: wq_pool_mutex protects the attrs-installation
  workqueue: fix a typo
  workqueue: function name in the comment differs from the real function name
  workqueue: fix trivial typo in Documentation/workqueue.txt
  workqueue: Allow modifying low level unbound workqueue cpumask
  workqueue: Create low-level unbound workqueues cpumask
  workqueue: split apply_workqueue_attrs() into 3 stages
2015-06-26 20:12:21 -07:00
Linus Torvalds bbe179f88d Merge branch 'for-4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup updates from Tejun Heo:

 - threadgroup_lock got reorganized so that its users can pick the
   actual locking mechanism to use.  Its only user - cgroups - is
   updated to use a percpu_rwsem instead of per-process rwsem.

   This makes things a bit lighter on hot paths and allows cgroups to
   perform and fail multi-task (a process) migrations atomically.
   Multi-task migrations are used in several places including the
   unified hierarchy.

 - Delegation rule and documentation added to unified hierarchy.  This
   will likely be the last interface update from the cgroup core side
   for unified hierarchy before lifting the devel mask.

 - Some groundwork for the pids controller which is scheduled to be
   merged in the coming devel cycle.

* 'for-4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cgroup: add delegation section to unified hierarchy documentation
  cgroup: require write perm on common ancestor when moving processes on the default hierarchy
  cgroup: separate out cgroup_procs_write_permission() from __cgroup_procs_write()
  kernfs: make kernfs_get_inode() public
  MAINTAINERS: add a cgroup core co-maintainer
  cgroup: fix uninitialised iterator in for_each_subsys_which
  cgroup: replace explicit ss_mask checking with for_each_subsys_which
  cgroup: use bitmask to filter for_each_subsys
  cgroup: add seq_file forward declaration for struct cftype
  cgroup: simplify threadgroup locking
  sched, cgroup: replace signal_struct->group_rwsem with a global percpu_rwsem
  sched, cgroup: reorganize threadgroup locking
  cgroup: switch to unsigned long for bitmasks
  cgroup: reorganize include/linux/cgroup.h
  cgroup: separate out include/linux/cgroup-defs.h
  cgroup: fix some comment typos
2015-06-26 19:50:04 -07:00
Linus Torvalds 8d7804a2f0 Driver core patches for 4.2-rc1
Here is the driver core / firmware changes for 4.2-rc1.
 
 A number of small changes all over the place in the driver core, and in
 the firmware subsystem.  Nothing really major, full details in the
 shortlog.  Some of it is a bit of churn, given that the platform driver
 probing changes was found to not work well, so they were reverted.
 
 All of these have been in linux-next for a while with no reported
 issues.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iEYEABECAAYFAlWNoCQACgkQMUfUDdst+ym4JACdFrrXoMt2pb8nl5gMidGyM9/D
 jg8AnRgdW8ArDA/xOarULd/X43eA3J3C
 =Al2B
 -----END PGP SIGNATURE-----

Merge tag 'driver-core-4.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core updates from Greg KH:
 "Here is the driver core / firmware changes for 4.2-rc1.

  A number of small changes all over the place in the driver core, and
  in the firmware subsystem.  Nothing really major, full details in the
  shortlog.  Some of it is a bit of churn, given that the platform
  driver probing changes was found to not work well, so they were
  reverted.

  All of these have been in linux-next for a while with no reported
  issues"

* tag 'driver-core-4.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (31 commits)
  Revert "base/platform: Only insert MEM and IO resources"
  Revert "base/platform: Continue on insert_resource() error"
  Revert "of/platform: Use platform_device interface"
  Revert "base/platform: Remove code duplication"
  firmware: add missing kfree for work on async call
  fs: sysfs: don't pass count == 0 to bin file readers
  base:dd - Fix for typo in comment to function driver_deferred_probe_trigger().
  base/platform: Remove code duplication
  of/platform: Use platform_device interface
  base/platform: Continue on insert_resource() error
  base/platform: Only insert MEM and IO resources
  firmware: use const for remaining firmware names
  firmware: fix possible use after free on name on asynchronous request
  firmware: check for file truncation on direct firmware loading
  firmware: fix __getname() missing failure check
  drivers: of/base: move of_init to driver_init
  drivers/base: cacheinfo: fix annoying typo when DT nodes are absent
  sysfs: disambiguate between "error code" and "failure" in comments
  driver-core: fix build for !CONFIG_MODULES
  driver-core: make __device_attach() static
  ...
2015-06-26 15:07:37 -07:00
Linus Torvalds e382608254 This patch series contains several clean ups and even a new trace clock
"monitonic raw". Also some enhancements to make the ring buffer even
 faster. But the biggest and most noticeable change is the renaming of
 the ftrace* files, structures and variables that have to deal with
 trace events.
 
 Over the years I've had several developers tell me about their confusion
 with what ftrace is compared to events. Technically, "ftrace" is the
 infrastructure to do the function hooks, which include tracing and also
 helps with live kernel patching. But the trace events are a separate
 entity altogether, and the files that affect the trace events should
 not be named "ftrace". These include:
 
   include/trace/ftrace.h	->	include/trace/trace_events.h
   include/linux/ftrace_event.h	->	include/linux/trace_events.h
 
 Also, functions that are specific for trace events have also been renamed:
 
   ftrace_print_*()		->	trace_print_*()
   (un)register_ftrace_event()	->	(un)register_trace_event()
   ftrace_event_name()		->	trace_event_name()
   ftrace_trigger_soft_disabled()->	trace_trigger_soft_disabled()
   ftrace_define_fields_##call() ->	trace_define_fields_##call()
   ftrace_get_offsets_##call()	->	trace_get_offsets_##call()
 
 Structures have been renamed:
 
   ftrace_event_file		->	trace_event_file
   ftrace_event_{call,class}	->	trace_event_{call,class}
   ftrace_event_buffer		->	trace_event_buffer
   ftrace_subsystem_dir		->	trace_subsystem_dir
   ftrace_event_raw_##call	->	trace_event_raw_##call
   ftrace_event_data_offset_##call->	trace_event_data_offset_##call
   ftrace_event_type_funcs_##call ->	trace_event_type_funcs_##call
 
 And a few various variables and flags have also been updated.
 
 This has been sitting in linux-next for some time, and I have not heard
 a single complaint about this rename breaking anything. Mostly because
 these functions, variables and structures are mostly internal to the
 tracing system and are seldom (if ever) used by anything external to that.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJViYhVAAoJEEjnJuOKh9ldcJ0IAI+mytwoMAN/CWDE8pXrTrgs
 aHlcr1zorSzZ0Lq6lKsWP+V0VGVhP8KWO16vl35HaM5ZB9U+cDzWiGobI8JTHi/3
 eeTAPTjQdgrr/L+ZO1ApzS1jYPhN3Xi5L7xublcYMJjKfzU+bcYXg/x8gRt0QbG3
 S9QN/kBt0JIIjT7McN64m5JVk2OiU36LxXxwHgCqJvVCPHUrriAdIX7Z5KRpEv13
 zxgCN4d7Jiec/FsMW8dkO0vRlVAvudZWLL7oDmdsvNhnLy8nE79UOeHos2c1qifQ
 LV4DeQ+2Hlu7w9wxixHuoOgNXDUEiQPJXzPc/CuCahiTL9N/urQSGQDoOVMltR4=
 =hkdz
 -----END PGP SIGNATURE-----

Merge tag 'trace-v4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing updates from Steven Rostedt:
 "This patch series contains several clean ups and even a new trace
  clock "monitonic raw".  Also some enhancements to make the ring buffer
  even faster.  But the biggest and most noticeable change is the
  renaming of the ftrace* files, structures and variables that have to
  deal with trace events.

  Over the years I've had several developers tell me about their
  confusion with what ftrace is compared to events.  Technically,
  "ftrace" is the infrastructure to do the function hooks, which include
  tracing and also helps with live kernel patching.  But the trace
  events are a separate entity altogether, and the files that affect the
  trace events should not be named "ftrace".  These include:

    include/trace/ftrace.h         ->    include/trace/trace_events.h
    include/linux/ftrace_event.h   ->    include/linux/trace_events.h

  Also, functions that are specific for trace events have also been renamed:

    ftrace_print_*()               ->    trace_print_*()
    (un)register_ftrace_event()    ->    (un)register_trace_event()
    ftrace_event_name()            ->    trace_event_name()
    ftrace_trigger_soft_disabled() ->    trace_trigger_soft_disabled()
    ftrace_define_fields_##call()  ->    trace_define_fields_##call()
    ftrace_get_offsets_##call()    ->    trace_get_offsets_##call()

  Structures have been renamed:

    ftrace_event_file              ->    trace_event_file
    ftrace_event_{call,class}      ->    trace_event_{call,class}
    ftrace_event_buffer            ->    trace_event_buffer
    ftrace_subsystem_dir           ->    trace_subsystem_dir
    ftrace_event_raw_##call        ->    trace_event_raw_##call
    ftrace_event_data_offset_##call->    trace_event_data_offset_##call
    ftrace_event_type_funcs_##call ->    trace_event_type_funcs_##call

  And a few various variables and flags have also been updated.

  This has been sitting in linux-next for some time, and I have not
  heard a single complaint about this rename breaking anything.  Mostly
  because these functions, variables and structures are mostly internal
  to the tracing system and are seldom (if ever) used by anything
  external to that"

* tag 'trace-v4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (33 commits)
  ring_buffer: Allow to exit the ring buffer benchmark immediately
  ring-buffer-benchmark: Fix the wrong type
  ring-buffer-benchmark: Fix the wrong param in module_param
  ring-buffer: Add enum names for the context levels
  ring-buffer: Remove useless unused tracing_off_permanent()
  ring-buffer: Give NMIs a chance to lock the reader_lock
  ring-buffer: Add trace_recursive checks to ring_buffer_write()
  ring-buffer: Allways do the trace_recursive checks
  ring-buffer: Move recursive check to per_cpu descriptor
  ring-buffer: Add unlikelys to make fast path the default
  tracing: Rename ftrace_get_offsets_##call() to trace_event_get_offsets_##call()
  tracing: Rename ftrace_define_fields_##call() to trace_event_define_fields_##call()
  tracing: Rename ftrace_event_type_funcs_##call to trace_event_type_funcs_##call
  tracing: Rename ftrace_data_offset_##call to trace_event_data_offset_##call
  tracing: Rename ftrace_raw_##call event structures to trace_event_raw_##call
  tracing: Rename ftrace_trigger_soft_disabled() to trace_trigger_soft_disabled()
  tracing: Rename FTRACE_EVENT_FL_* flags to EVENT_FILE_FL_*
  tracing: Rename struct ftrace_subsystem_dir to trace_subsystem_dir
  tracing: Rename ftrace_event_name() to trace_event_name()
  tracing: Rename FTRACE_MAX_EVENT to TRACE_EVENT_TYPE_MAX
  ...
2015-06-26 14:02:43 -07:00
Thomas Gleixner 24bfcb1009 timer: Fix hotplug regression
The recent timer wheel rework removed the get/put_cpu_var() pair in
the hotplug migration code, which results in:

BUG: using smp_processor_id() in preemptible [00000000] code: hib.sh/2845
...
[<ffffffff810d4fa3>] timer_cpu_notify+0x53/0x12

That hunk is a leftover from an earlier iteration and went unnoticed
so far.

Restore the previous code which was obviously correct.

Fixes: 0eeda71bc3 'timer: Replace timer base by a cpu index'
Reported-and_tested-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-06-26 22:58:06 +02:00
Linus Torvalds fcbc1777ce After fixing the previous filter issue reported by Vince Weaver,
I could not come up with a situation where the operand counter (cnt)
 could go below zero, so I added a WARN_ON_ONCE(cnt < 0). Vince was
 able to trigger that warn on with his fuzzer test, but didn't have
 a filter input that caused it.
 
 Later, Sasha Levin was able to trigger that same warning, and was
 able to give me the filter string that triggered it. It was simply
 a single operation ">".
 
 I wrapped the filtering code in a userspace program such that I could
 single step through the logic. With a single operator the operand
 counter can legitimately go below zero, and should be reported to the
 user as an error, but should not produce a kernel warning. The
 WARN_ON_ONCE(cnt < 0) should be just a "if (cnt < 0) break;" and the
 code following it will produce the error message for the user.
 
 While debugging this, I found that there was another bug that let
 the pointer to the filter string go beyond the filter string.
 This too was fixed.
 
 Finally, there was a typo in a stub function that only gets compiled
 if trace events is disabled but tracing is enabled (I'm not even sure
 that's possible).
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJVjWh2AAoJEEjnJuOKh9ldOn0IANHPW82++0O87U1pEe3hHnKv
 gSTKiNPVNC3GBt9DHnawA0EuyPfPa+Wj5X2xgrstWA+KRADZErZzdWpzbh/iHosJ
 0kaUFqFcaKBheOSqhHfz3WQshD16pb1lQYbV7vbdzMjpcIpYT3VcuKQq3zQVb5Pr
 njPmgZXK+I9ITYQ8E+DysnTg0+Mo+l/2P/tqnBoIkAVmuZitfJS5okTtVw1GNzyR
 7VRMGBE3G0GxB++57T/xILXjFc9sSGQH5lZgLHQhEh36YgWuDvc0R2FfxDKROmeq
 b/xw68uCO1Hv8oEng6r/UceVtUoaXhf+JamSJqxztBTsjsqR/iXCV78Jac1vnPY=
 =cmr8
 -----END PGP SIGNATURE-----

Merge tag 'trace-fixes-4.1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing fixes from Steven Rostedt:
 "This isn't my 4.2 pull request (yet).  I found a few more bugs that I
  would have sent to fix 4.1, but since 4.1 is already out, I'm sending
  this before sending my 4.2 request (which is ready to go).

  After fixing the previous filter issue reported by Vince Weaver, I
  could not come up with a situation where the operand counter (cnt)
  could go below zero, so I added a WARN_ON_ONCE(cnt < 0).  Vince was
  able to trigger that warn on with his fuzzer test, but didn't have a
  filter input that caused it.

  Later, Sasha Levin was able to trigger that same warning, and was able
  to give me the filter string that triggered it.  It was simply a
  single operation ">".

  I wrapped the filtering code in a userspace program such that I could
  single step through the logic.  With a single operator the operand
  counter can legitimately go below zero, and should be reported to the
  user as an error, but should not produce a kernel warning.  The
  WARN_ON_ONCE(cnt < 0) should be just a "if (cnt < 0) break;" and the
  code following it will produce the error message for the user.

  While debugging this, I found that there was another bug that let the
  pointer to the filter string go beyond the filter string.  This too
  was fixed.

  Finally, there was a typo in a stub function that only gets compiled
  if trace events is disabled but tracing is enabled (I'm not even sure
  that's possible)"

* tag 'trace-fixes-4.1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing: Fix typo from "static inlin" to "static inline"
  tracing/filter: Do not allow infix to exceed end of string
  tracing/filter: Do not WARN on operand count going below zero
2015-06-26 13:56:55 -07:00
Linus Torvalds e8a0b37d28 Merge branch 'for-linus' of git://ftp.arm.linux.org.uk/~rmk/linux-arm
Pull ARM updates from Russell King:
 "Bigger items included in this update are:

   - A series of updates from Arnd for ARM randconfig build failures
   - Updates from Dmitry for StrongARM SA-1100 to move IRQ handling to
     drivers/irqchip/
   - Move ARMs SP804 timer to drivers/clocksource/
   - Perf updates from Mark Rutland in preparation to move the ARM perf
     code into drivers/ so it can be shared with ARM64.
   - MCPM updates from Nicolas
   - Add support for taking platform serial number from DT
   - Re-implement Keystone2 physical address space switch to conform to
     architecture requirements
   - Clean up ARMv7 LPAE code, which goes in hand with the Keystone2
     changes.
   - L2C cleanups to avoid unlocking caches if we're prevented by the
     secure support to unlock.
   - Avoid cleaning a potentially dirty cache containing stale data on
     CPU initialisation
   - Add ARM-only entry point for secondary startup (for machines that
     can only call into a Thumb kernel in ARM mode).  Same thing is also
     done for the resume entry point.
   - Provide arch_irqs_disabled via asm-generic
   - Enlarge ARMv7M vector table
   - Always use BFD linker for VDSO, as gold doesn't accept some of the
     options we need.
   - Fix an incorrect BSYM (for Thumb symbols) usage, and convert all
     BSYM compiler macros to a "badr" (for branch address).
   - Shut up compiler warnings provoked by our cmpxchg() implementation.
   - Ensure bad xchg sizes fail to link"

* 'for-linus' of git://ftp.arm.linux.org.uk/~rmk/linux-arm: (75 commits)
  ARM: Fix build if CLKDEV_LOOKUP is not configured
  ARM: fix new BSYM() usage introduced via for-arm-soc branch
  ARM: 8383/1: nommu: avoid deprecated source register on mov
  ARM: 8391/1: l2c: add options to overwrite prefetching behavior
  ARM: 8390/1: irqflags: Get arch_irqs_disabled from asm-generic
  ARM: 8387/1: arm/mm/dma-mapping.c: Add arm_coherent_dma_mmap
  ARM: 8388/1: tcm: Don't crash when TCM banks are protected by TrustZone
  ARM: 8384/1: VDSO: force use of BFD linker
  ARM: 8385/1: VDSO: group link options
  ARM: cmpxchg: avoid warnings from macro-ized cmpxchg() implementations
  ARM: remove __bad_xchg definition
  ARM: 8369/1: ARMv7M: define size of vector table for Vybrid
  ARM: 8382/1: clocksource: make ARM_TIMER_SP804 depend on GENERIC_SCHED_CLOCK
  ARM: 8366/1: move Dual-Timer SP804 driver to drivers/clocksource
  ARM: 8365/1: introduce sp804_timer_disable and remove arm_timer.h inclusion
  ARM: 8364/1: fix BE32 module loading
  ARM: 8360/1: add secondary_startup_arm prototype in header file
  ARM: 8359/1: correct secondary_startup_arm mode
  ARM: proc-v7: sanitise and document registers around errata
  ARM: proc-v7: clean up MIDR access
  ...
2015-06-26 12:20:00 -07:00
Linus Torvalds 47a469421d Merge branch 'akpm' (patches from Andrew)
Merge second patchbomb from Andrew Morton:

 - most of the rest of MM

 - lots of misc things

 - procfs updates

 - printk feature work

 - updates to get_maintainer, MAINTAINERS, checkpatch

 - lib/ updates

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (96 commits)
  exit,stats: /* obey this comment */
  coredump: add __printf attribute to cn_*printf functions
  coredump: use from_kuid/kgid when formatting corename
  fs/reiserfs: remove unneeded cast
  NILFS2: support NFSv2 export
  fs/befs/btree.c: remove unneeded initializations
  fs/minix: remove unneeded cast
  init/do_mounts.c: add create_dev() failure log
  kasan: remove duplicate definition of the macro KASAN_FREE_PAGE
  fs/efs: femove unneeded cast
  checkpatch: emit "NOTE: <types>" message only once after multiple files
  checkpatch: emit an error when there's a diff in a changelog
  checkpatch: validate MODULE_LICENSE content
  checkpatch: add multi-line handling for PREFER_ETHER_ADDR_COPY
  checkpatch: suggest using eth_zero_addr() and eth_broadcast_addr()
  checkpatch: fix processing of MEMSET issues
  checkpatch: suggest using ether_addr_equal*()
  checkpatch: avoid NOT_UNIFIED_DIFF errors on cover-letter.patch files
  checkpatch: remove local from codespell path
  checkpatch: add --showfile to allow input via pipe to show filenames
  ...
2015-06-26 09:52:05 -07:00
Rafael J. Wysocki 132c242d95 Merge branches 'acpi-video', 'device-properties', 'pm-sleep' and 'pm-cpuidle'
* acpi-video:
  ACPI / video: Inline acpi_video_set_dmi_backlight_type

* device-properties:
  ACPI / OF: Rename of_node() and acpi_node() to to_of_node() and to_acpi_node()

* pm-sleep:
  PM / sleep: Increase default DPM watchdog timeout to 60
  PM / hibernate: re-enable nonboot cpus on disable_nonboot_cpus() failure

* pm-cpuidle:
  tick/idle/powerpc: Do not register idle states with CPUIDLE_FLAG_TIMER_STOP set in periodic mode
2015-06-26 03:30:37 +02:00
Rik van Riel 51229b4953 exit,stats: /* obey this comment */
There is a helpful comment in do_exit() that states we sync the mm's RSS
info before statistics gathering.

The function that does the statistics gathering is called right above that
comment.

Change the code to obey the comment.

Signed-off-by: Rik van Riel <riel@redhat.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-25 17:00:43 -07:00
Rasmus Villemoes ff14417c0a kernel/trace/blktrace.c: use strreplace() in do_blk_trace_setup()
Part of the disassembly of do_blk_trace_setup:

    231b:       e8 00 00 00 00          callq  2320 <do_blk_trace_setup+0x50>
                        231c: R_X86_64_PC32     strlen+0xfffffffffffffffc
    2320:       eb 0a                   jmp    232c <do_blk_trace_setup+0x5c>
    2322:       66 0f 1f 44 00 00       nopw   0x0(%rax,%rax,1)
    2328:       48 83 c3 01             add    $0x1,%rbx
    232c:       48 39 d8                cmp    %rbx,%rax
    232f:       76 47                   jbe    2378 <do_blk_trace_setup+0xa8>
    2331:       41 80 3c 1c 2f          cmpb   $0x2f,(%r12,%rbx,1)
    2336:       75 f0                   jne    2328 <do_blk_trace_setup+0x58>
    2338:       41 c6 04 1c 5f          movb   $0x5f,(%r12,%rbx,1)
    233d:       4c 89 e7                mov    %r12,%rdi
    2340:       e8 00 00 00 00          callq  2345 <do_blk_trace_setup+0x75>
                        2341: R_X86_64_PC32     strlen+0xfffffffffffffffc
    2345:       eb e1                   jmp    2328 <do_blk_trace_setup+0x58>

Yep, that's right: gcc isn't smart enough to realize that replacing '/' by
'_' cannot change the strlen(), so we call it again and again (at least
when a '/' is found).  Even if gcc were that smart, this construction
would still loop over the string twice, once for the initial strlen() call
and then the open-coded loop.

Let's simply use strreplace() instead.

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Liked-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-25 17:00:40 -07:00
Rasmus Villemoes 1bb564718f kernel/trace/trace_events_filter.c: use strreplace()
There's no point in starting over every time we see a ','...

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-25 17:00:40 -07:00
Vasily Averin 3ea4331c60 check_syslog_permissions() cleanup
Patch fixes drawbacks in heck_syslog_permissions() noticed by AKPM:
"from_file handling makes me cry.

That's not a boolean - it's an enumerated value with two values
currently defined.

But the code in check_syslog_permissions() treats it as a boolean and
also hardwires the knowledge that SYSLOG_FROM_PROC == 1 (or == `true`).

And the name is wrong: it should be called from_proc to match
SYSLOG_FROM_PROC."

Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Josh Boyer <jwboyer@redhat.com>
Cc: Eric Paris <eparis@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-25 17:00:39 -07:00
Vasily Averin d194e5d666 security_syslog() should be called once only
The final version of commit 637241a900 ("kmsg: honor dmesg_restrict
sysctl on /dev/kmsg") lost few hooks, as result security_syslog() are
processed incorrectly:

- open of /dev/kmsg checks syslog access permissions by using
  check_syslog_permissions() where security_syslog() is not called if
  dmesg_restrict is set.

- syslog syscall and /proc/kmsg calls do_syslog() where security_syslog
  can be executed twice (inside check_syslog_permissions() and then
  directly in do_syslog())

With this patch security_syslog() is called once only in all
syslog-related operations regardless of dmesg_restrict value.

Fixes: 637241a900 ("kmsg: honor dmesg_restrict sysctl on /dev/kmsg")
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Josh Boyer <jwboyer@redhat.com>
Cc: Eric Paris <eparis@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-25 17:00:39 -07:00
Tejun Heo 6fe29354be printk: implement support for extended console drivers
printk log_buf keeps various metadata for each message including its
sequence number and timestamp.  The metadata is currently available only
through /dev/kmsg and stripped out before passed onto console drivers.  We
want this metadata to be available to console drivers too so that console
consumers can get full information including the metadata and dictionary,
which among other things can be used to detect whether messages got lost
in transit.

This patch implements support for extended console drivers.  Consoles can
indicate that they want extended messages by setting the new CON_EXTENDED
flag and they'll be fed messages formatted the same way as /dev/kmsg.

 "<level>,<sequnum>,<timestamp>,<contflag>;<message text>\n"

If extended consoles exist, in-kernel fragment assembly is disabled.  This
ensures that all messages emitted to consoles have full metadata including
sequence number.  The contflag carries enough information to reassemble
the fragments from the reader side trivially.  Note that this only affects
/dev/kmsg.  Regular console and /proc/kmsg outputs are not affected by
this change.

* Extended message formatting for console drivers is enabled iff there
  are registered extended consoles.

* Comment describing /dev/kmsg message format updated to add missing
  contflag field and help distinguishing variable from verbatim terms.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: David Miller <davem@davemloft.net>
Cc: Kay Sievers <kay@vrfy.org>
Reviewed-by: Petr Mladek <pmladek@suse.cz>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-25 17:00:38 -07:00
Tejun Heo 0a295e67ec printk: factor out message formatting from devkmsg_read()
The extended message formatting used for /dev/kmsg will be used implement
extended consoles.  Factor out msg_print_ext_header() and
msg_print_ext_body() from devkmsg_read().

This is pure restructuring.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: David Miller <davem@davemloft.net>
Cc: Kay Sievers <kay@vrfy.org>
Reviewed-by: Petr Mladek <pmladek@suse.cz>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-25 17:00:38 -07:00
Tejun Heo d43ff430f4 printk: guard the amount written per line by devkmsg_read()
This patchset updates netconsole so that it can emit messages with the
same header as used in /dev/kmsg which gives neconsole receiver full log
information which enables things like structured logging and detection
of lost messages.

This patch (of 7):

devkmsg_read() uses 8k buffer and assumes that the formatted output
message won't overrun which seems safe given LOG_LINE_MAX, the current use
of dict and the escaping method being used; however, we're planning to use
devkmsg formatting wider and accounting for the buffer size properly isn't
that complicated.

This patch defines CONSOLE_EXT_LOG_MAX as 8192 and updates devkmsg_read()
so that it limits output accordingly.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: David Miller <davem@davemloft.net>
Cc: Kay Sievers <kay@vrfy.org>
Reviewed-by: Petr Mladek <pmladek@suse.cz>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-25 17:00:38 -07:00
Josh Triplett 3033f14ab7 clone: support passing tls argument via C rather than pt_regs magic
clone has some of the quirkiest syscall handling in the kernel, with a
pile of special cases, historical curiosities, and architecture-specific
calling conventions.  In particular, clone with CLONE_SETTLS accepts a
parameter "tls" that the C entry point completely ignores and some
assembly entry points overwrite; instead, the low-level arch-specific
code pulls the tls parameter out of the arch-specific register captured
as part of pt_regs on entry to the kernel.  That's a massive hack, and
it makes the arch-specific code only work when called via the specific
existing syscall entry points; because of this hack, any new clone-like
system call would have to accept an identical tls argument in exactly
the same arch-specific position, rather than providing a unified system
call entry point across architectures.

The first patch allows architectures to handle the tls argument via
normal C parameter passing, if they opt in by selecting
HAVE_COPY_THREAD_TLS.  The second patch makes 32-bit and 64-bit x86 opt
into this.

These two patches came out of the clone4 series, which isn't ready for
this merge window, but these first two cleanup patches were entirely
uncontroversial and have acks.  I'd like to go ahead and submit these
two so that other architectures can begin building on top of this and
opting into HAVE_COPY_THREAD_TLS.  However, I'm also happy to wait and
send these through the next merge window (along with v3 of clone4) if
anyone would prefer that.

This patch (of 2):

clone with CLONE_SETTLS accepts an argument to set the thread-local
storage area for the new thread.  sys_clone declares an int argument
tls_val in the appropriate point in the argument list (based on the
various CLONE_BACKWARDS variants), but doesn't actually use or pass along
that argument.  Instead, sys_clone calls do_fork, which calls
copy_process, which calls the arch-specific copy_thread, and copy_thread
pulls the corresponding syscall argument out of the pt_regs captured at
kernel entry (knowing what argument of clone that architecture passes tls
in).

Apart from being awful and inscrutable, that also only works because only
one code path into copy_thread can pass the CLONE_SETTLS flag, and that
code path comes from sys_clone with its architecture-specific
argument-passing order.  This prevents introducing a new version of the
clone system call without propagating the same architecture-specific
position of the tls argument.

However, there's no reason to pull the argument out of pt_regs when
sys_clone could just pass it down via C function call arguments.

Introduce a new CONFIG_HAVE_COPY_THREAD_TLS for architectures to opt into,
and a new copy_thread_tls that accepts the tls parameter as an additional
unsigned long (syscall-argument-sized) argument.  Change sys_clone's tls
argument to an unsigned long (which does not change the ABI), and pass
that down to copy_thread_tls.

Architectures that don't opt into copy_thread_tls will continue to ignore
the C argument to sys_clone in favor of the pt_regs captured at kernel
entry, and thus will be unable to introduce new versions of the clone
syscall.

Patch co-authored by Josh Triplett and Thiago Macieira.

Signed-off-by: Josh Triplett <josh@joshtriplett.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Thiago Macieira <thiago.macieira@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-25 17:00:38 -07:00
Alexey Dobriyan 4a00e9df29 prctl: more prctl(PR_SET_MM_*) checks
Individual prctl(PR_SET_MM_*) calls do some checking to maintain a
consistent view of mm->arg_start et al fields, but not enough.  In
particular PR_SET_MM_ARG_START/PR_SET_MM_ARG_END/ R_SET_MM_ENV_START/
PR_SET_MM_ENV_END only check that the address lies in an existing VMA,
but don't check that the start address is lower than the end address _at
all_.

Consolidate all consistency checks, so there will be no difference in
the future between PR_SET_MM_MAP and individual PR_SET_MM_* calls.

The program below makes both ARGV and ENVP areas be reversed.  It makes
/proc/$PID/cmdline show garbage (it doesn't oops by luck).

#include <sys/mman.h>
#include <sys/prctl.h>
#include <unistd.h>

enum {PAGE_SIZE=4096};

int main(void)
{
	void *p;

	p = mmap(NULL, PAGE_SIZE, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);

#define PR_SET_MM               35
#define PR_SET_MM_ARG_START     8
#define PR_SET_MM_ARG_END       9
#define PR_SET_MM_ENV_START     10
#define PR_SET_MM_ENV_END       11
	prctl(PR_SET_MM, PR_SET_MM_ARG_START, (unsigned long)p + PAGE_SIZE - 1, 0, 0);
	prctl(PR_SET_MM, PR_SET_MM_ARG_END,   (unsigned long)p, 0, 0);
	prctl(PR_SET_MM, PR_SET_MM_ENV_START, (unsigned long)p + PAGE_SIZE - 1, 0, 0);
	prctl(PR_SET_MM, PR_SET_MM_ENV_END,   (unsigned long)p, 0, 0);

	pause();
	return 0;
}

[akpm@linux-foundation.org: tidy code, tweak comment]
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Jarod Wilson <jarod@redhat.com>
Cc: Jan Stancek <jstancek@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-25 17:00:37 -07:00
Steven Rostedt (Red Hat) cc9e4bde03 tracing: Fix typo from "static inlin" to "static inline"
The trace.h header when called without CONFIG_EVENT_TRACING enabled
(seldom done), will not compile because of a typo in the protocol
of trace_event_enum_update().

Cc: stable@vger.kernel.org # 4.1+
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-06-25 18:21:34 -04:00
Steven Rostedt (Red Hat) 6b88f44e16 tracing/filter: Do not allow infix to exceed end of string
While debugging a WARN_ON() for filtering, I found that it is possible
for the filter string to be referenced after its end. With the filter:

 # echo '>' > /sys/kernel/debug/events/ext4/ext4_truncate_exit/filter

The filter_parse() function can call infix_get_op() which calls
infix_advance() that updates the infix filter pointers for the cnt
and tail without checking if the filter is already at the end, which
will put the cnt to zero and the tail beyond the end. The loop then calls
infix_next() that has

	ps->infix.cnt--;
	return ps->infix.string[ps->infix.tail++];

The cnt will now be below zero, and the tail that is returned is
already passed the end of the filter string. So far the allocation
of the filter string usually has some buffer that is zeroed out, but
if the filter string is of the exact size of the allocated buffer
there's no guarantee that the charater after the nul terminating
character will be zero.

Luckily, only root can write to the filter.

Cc: stable@vger.kernel.org # 2.6.33+
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-06-25 18:18:17 -04:00
Steven Rostedt (Red Hat) b4875bbe7e tracing/filter: Do not WARN on operand count going below zero
When testing the fix for the trace filter, I could not come up with
a scenario where the operand count goes below zero, so I added a
WARN_ON_ONCE(cnt < 0) to the logic. But there is legitimate case
that it can happen (although the filter would be wrong).

 # echo '>' > /sys/kernel/debug/events/ext4/ext4_truncate_exit/filter

That is, a single operation without any operands will hit the path
where the WARN_ON_ONCE() can trigger. Although this is harmless,
and the filter is reported as a error. But instead of spitting out
a warning to the kernel dmesg, just fail nicely and report it via
the proper channels.

Link: http://lkml.kernel.org/r/558C6082.90608@oracle.com

Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Cc: stable@vger.kernel.org # 2.6.33+
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-06-25 18:02:29 -04:00
Linus Torvalds bfffa1cc9d Merge branch 'for-4.2/core' of git://git.kernel.dk/linux-block
Pull core block IO update from Jens Axboe:
 "Nothing really major in here, mostly a collection of smaller
  optimizations and cleanups, mixed with various fixes.  In more detail,
  this contains:

   - Addition of policy specific data to blkcg for block cgroups.  From
     Arianna Avanzini.

   - Various cleanups around command types from Christoph.

   - Cleanup of the suspend block I/O path from Christoph.

   - Plugging updates from Shaohua and Jeff Moyer, for blk-mq.

   - Eliminating atomic inc/dec of both remaining IO count and reference
     count in a bio.  From me.

   - Fixes for SG gap and chunk size support for data-less (discards)
     IO, so we can merge these better.  From me.

   - Small restructuring of blk-mq shared tag support, freeing drivers
     from iterating hardware queues.  From Keith Busch.

   - A few cfq-iosched tweaks, from Tahsin Erdogan and me.  Makes the
     IOPS mode the default for non-rotational storage"

* 'for-4.2/core' of git://git.kernel.dk/linux-block: (35 commits)
  cfq-iosched: fix other locations where blkcg_to_cfqgd() can return NULL
  cfq-iosched: fix sysfs oops when attempting to read unconfigured weights
  cfq-iosched: move group scheduling functions under ifdef
  cfq-iosched: fix the setting of IOPS mode on SSDs
  blktrace: Add blktrace.c to BLOCK LAYER in MAINTAINERS file
  block, cgroup: implement policy-specific per-blkcg data
  block: Make CFQ default to IOPS mode on SSDs
  block: add blk_set_queue_dying() to blkdev.h
  blk-mq: Shared tag enhancements
  block: don't honor chunk sizes for data-less IO
  block: only honor SG gap prevention for merges that contain data
  block: fix returnvar.cocci warnings
  block, dm: don't copy bios for request clones
  block: remove management of bi_remaining when restoring original bi_end_io
  block: replace trylock with mutex_lock in blkdev_reread_part()
  block: export blkdev_reread_part() and __blkdev_reread_part()
  suspend: simplify block I/O handling
  block: collapse bio bit space
  block: remove unused BIO_RW_BLOCK and BIO_EOF flags
  block: remove BIO_EOPNOTSUPP
  ...
2015-06-25 14:29:53 -07:00