Commit Graph

296603 Commits

Author SHA1 Message Date
Russell King 2b7da084d4 ARM: riscpc: use DEFINE_RES_xxx()
Use DEFINE_RES_xxx() to define device resources.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2012-03-24 09:37:37 +00:00
Russell King 530c2eaa6a ARM: riscpc: remove expansion card irq mask register
This register is only present on older platforms, and not on RiscPC,
so lets remove this unused support.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2012-03-24 09:37:37 +00:00
Russell King b4ac08492d ARM: riscpc: convert ecard to use irq_alloc_descs()
Use irq_alloc_descs() to allocate IRQs for expansion cards.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2012-03-24 09:37:36 +00:00
Russell King c402c11072 ARM: riscpc: use irq chip data in ecard.c
Use irq chip data to store the expansion card data pointer, rather
than converting from the interrupt number to a slot number.  This
allows the interrupt chip methods to avoid knowing about interrupt
numbering.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2012-03-24 09:37:36 +00:00
Russell King 6e747b4b83 ARM: riscpc: move ecard.c to arch/arm/mach-rpc
RiscPC is the only platform using the Acorn expansion card support, so
move it into its mach-* directory.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2012-03-24 09:37:36 +00:00
Russell King 927b6c4da9 ARM: riscpc: remove IRQ_TIMER
Use IRQ_TIMER0 instead, which is the same thing.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2012-03-24 09:37:36 +00:00
Russell King 18a66d5ae9 ARM: riscpc: use definition for serial port interrupt
Rather than using a plain integer, use the definition already provided.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2012-03-24 09:37:36 +00:00
Russell King 8c6d9d0a01 ARM: riscpc: pass IRQ resources into keyboard driver
Rather than including asm/irq.h into the keyboard driver, pass the
IRQ numbers via the platform device instead.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2012-03-24 09:37:36 +00:00
Russell King a1be5d6496 ARM: riscpc: move time-acorn.c to mach-rpc
Nothing but RiscPC makes use of the Acorn timekeeping code, so move
it into mach-rpc.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2012-03-24 09:37:30 +00:00
Linus Torvalds f1d38e423a Merge git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/sysctl
Pull sysctl updates from Eric Biederman:

 - Rewrite of sysctl for speed and clarity.

   Insert/remove/Lookup in sysctl are all now O(NlogN) operations, and
   are no longer bottlenecks in the process of adding and removing
   network devices.

   sysctl is now focused on being a filesystem instead of system call
   and the code can all be found in fs/proc/proc_sysctl.c.  Hopefully
   this means the code is now approachable.

   Much thanks is owed to Lucian Grinjincu for keeping at this until
   something was found that was usable.

 - The recent proc_sys_poll oops found by the fuzzer during hibernation
   is fixed.

* git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/sysctl: (36 commits)
  sysctl: protect poll() in entries that may go away
  sysctl: Don't call sysctl_follow_link unless we are a link.
  sysctl: Comments to make the code clearer.
  sysctl: Correct error return from get_subdir
  sysctl: An easier to read version of find_subdir
  sysctl: fix memset parameters in setup_sysctl_set()
  sysctl: remove an unused variable
  sysctl: Add register_sysctl for normal sysctl users
  sysctl: Index sysctl directories with rbtrees.
  sysctl: Make the header lists per directory.
  sysctl: Move sysctl_check_dups into insert_header
  sysctl: Modify __register_sysctl_paths to take a set instead of a root and an nsproxy
  sysctl: Replace root_list with links between sysctl_table_sets.
  sysctl: Add sysctl_print_dir and use it in get_subdir
  sysctl: Stop requiring explicit management of sysctl directories
  sysctl: Add a root pointer to ctl_table_set
  sysctl: Rewrite proc_sys_readdir in terms of first_entry and next_entry
  sysctl: Rewrite proc_sys_lookup introducing find_entry and lookup_entry.
  sysctl: Normalize the root_table data structure.
  sysctl: Factor out insert_header and erase_header
  ...
2012-03-23 18:08:58 -07:00
Linus Torvalds dae430c6f6 A bunch of fixes/updates for the AMD side of EDAC including
* MCE decoding updates
 * tree-wide EDAC sweep making pci_device_ids __devinitconst
 * Scrub rate API correction
 * two amd64_edac corrections for K8 boxes and sysfs csrow nodes
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJPZ0GkAAoJEBLB8Bhh3lVKjr4P/2q+7iZY8skYJqjrGel1vpKG
 aMwn1DTv9Mhy2omE+UDSuMVWrZ8CI/G+5aW5j6uulj6uko+3Q1aQtkN3zQhoa1RG
 7Q2X4t0skRnr7xOJKIYour+ufX4MLoVdEZqu4jvIFoniAQEXbDfbgT7KnA/hRwFp
 i04GaFRsPb97cvi7OfMpczTVan3lDUhT81kvdlrVZGhu3dVIkZw//hmsrt98SF1G
 qeBSvY/ji45D/mro6QMDZcN5+AWgLuI4ivM3Bk1tFgxs6KVaHcCOfPC8szQAuupl
 fQl/N4sHyj7BuTHlCWEog+V67ifpZcb5peya9Vs5nArhj/GzFmVhE0JBgFG7aCXQ
 hnAEhmgUI5NdHhLmrmGWSPOEQbG+V53/mBEk45sdH9zzl7yBH20gILdYFejmwEuX
 HfTTugIby4ncsw9Gkmi2rx33wNuzdOsXNRILk56bSf9X3e1m7LDtZV6EMeQ9I3D9
 ONjrrmNrkz5AmI765IsGs6vJT6ZkpZk6DNbykEOTPtoOhwiCSsbapowh9NpdhL1k
 EwUuZfyWoulVeYJw+m6a6Aw9bp2stj694p0tfdXh4C9g09Qlvx2RANBAgEJQE/oD
 5rWE/p/HiDFuMd7IrRuLaC1vV7YnWNX/EHLvtV7Ei2qslaK9XxFIb9p7jqyUIpoj
 nwUDJYIB4Dbk+NSV34N3
 =Q+wu
 -----END PGP SIGNATURE-----

Merge tag 'amd64-edac-updates-for-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp

Pull AMD64 EDAC fixes from Borislav Petkov:
 "A bunch of fixes/updates for the AMD side of EDAC including

   * MCE decoding updates
   * tree-wide EDAC sweep making pci_device_ids __devinitconst
   * Scrub rate API correction
   * two amd64_edac corrections for K8 boxes and sysfs csrow nodes"

* tag 'amd64-edac-updates-for-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp:
  MCE, AMD: Constify error tables
  MCE, AMD: Correct bank 5 error signatures
  MCE, AMD: Rework NB MCE signatures
  MCE, AMD: Correct VB data error description
  MCE, AMD: Correct ucode patch buffer description
  MCE, AMD: Correct some MC0 error types
  EDAC: Make pci_device_id tables __devinitconst.
  EDAC: Correct scrub rate API
  amd64_edac: Fix K8 revD and later chip select sizes
  amd64_edac: Fix missing csrows sysfs nodes
2012-03-23 17:59:47 -07:00
Linus Torvalds cf821923ba Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq
Pull cpufreq updates for 3.4 from Dave Jones: new drivers and some fixes.

* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq:
  provide disable_cpufreq() function to disable the API.
  EXYNOS5250: Add support cpufreq for EXYNOS5250
  EXYNOS4X12: Add support cpufreq for EXYNOS4X12
  [CPUFREQ] CPUfreq ondemand: update sampling rate without waiting for next sampling
  [CPUFREQ] Add S3C2416/S3C2450 cpufreq driver
  [CPUFREQ] Fix exposure of ARM_EXYNOS4210_CPUFREQ
  [CPUFREQ] EXYNOS4210: update the name of EXYNOS clock register
  [CPUFREQ] EXYNOS: Initialize locking_frequency with initial frequency
  [CPUFREQ] s3c64xx: Fix mis-cherry pick of VDDINT

Fix up trivial conflicts in Kconfig and Makefile due to just changes
next to each other (OMAP2PLUS changes vs some new EXYNOS cpufreq
drivers).
2012-03-23 17:56:39 -07:00
Linus Torvalds 4416b0eaa3 Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq
Pull cpufreq fixes from Dave Jones:
 "I meant to get some of these in for 3.3 final, but left things too
  late, so I've got two trees this time."

* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq:
  cpufreq: OMAP: specify range for voltage scaling
  cpufreq: OMAP: scale voltage along with frequency
  cpufreq: OMAP driver depends CPUfreq tables
2012-03-23 17:51:50 -07:00
Linus Torvalds 24613ff927 Merge branch 'pcmcia' of git://git.linaro.org/people/rmk/linux-arm
Pull #3 ARM updates from Russell King:
 "This adds gpio support to soc_common, allowing an amount of code to be
  deleted from each PCMCIA socket driver for the PXA/SA11x0 SoCs."

* 'pcmcia' of git://git.linaro.org/people/rmk/linux-arm:
  PCMCIA: sa1111: rename sa1111 socket drivers to have sa1111_ prefix.
  PCMCIA: make lubbock socket driver part of sa1111_cs
  PCMCIA: add Kconfig control for building sa11xx_base.c
  PCMCIA: sa1111: jornada720: no need to disable IRQs around sa1111_set_io
  PCMCIA: sa1111: pass along sa1111_pcmcia_configure_socket() failure code
  PCMCIA: soc_common: remove explicit wrprot initialization in socket drivers
  PCMCIA: soc_common: remove soc_pcmcia_*_irqs functions
  PCMCIA: sa11x0: h3600: convert to use new irq/gpio management
  PCMCIA: sa11x0: simpad: convert to use new irq/gpio management
  PCMCIA: sa11x0: shannon: convert to use new irq/gpio management
  PCMCIA: sa11x0: nanoengine: convert reset handling to use GPIO subsystem
  PCMCIA: sa11x0: nanoengine: convert to use new irq/gpio management
  PCMCIA: sa11x0: cerf: convert reset handling to use GPIO subsystem
  PCMCIA: sa11x0: cerf: convert to use new irq/gpio management
  PCMCIA: sa11x0: assabet: convert to use new irq/gpio management
  PCMCIA: sa1111: use new per-socket irq/gpio infrastructure
  PCMCIA: pxa: convert PXA socket drivers to use new irq/gpio management
  PCMCIA: soc_common: add GPIO support for card status signals
  PCMCIA: soc_common: move common initialization into soc_common
2012-03-23 17:37:40 -07:00
Linus Torvalds 0d19eac120 Merge branch 'amba' of git://git.linaro.org/people/rmk/linux-arm
Pull #2 ARM updates from Russell King:
 "Further ARM AMBA primecell updates which aren't included directly in
  the previous commit.  I wanted to keep these separate as they're
  touching stuff outside arch/arm/."

* 'amba' of git://git.linaro.org/people/rmk/linux-arm:
  ARM: 7362/1: AMBA: Add module_amba_driver() helper macro for amba_driver
  ARM: 7335/1: mach-u300: do away with MMC config files
  ARM: 7280/1: mmc: mmci: Cache MMCICLOCK and MMCIPOWER register
  ARM: 7309/1: realview: fix unconnected interrupts on EB11MP
  ARM: 7230/1: mmc: mmci: Fix PIO read for small SDIO packets
  ARM: 7227/1: mmc: mmci: Prepare for SDIO before setting up DMA job
  ARM: 7223/1: mmc: mmci: Fixup use of runtime PM and use autosuspend
  ARM: 7221/1: mmc: mmci: Change from using legacy suspend
  ARM: 7219/1: mmc: mmci: Change vdd_handler to a generic ios_handler
  ARM: 7218/1: mmc: mmci: Provide option to configure bus signal direction
  ARM: 7217/1: mmc: mmci: Put power register deviations in variant data
  ARM: 7216/1: mmc: mmci: Do not release spinlock in request_end
  ARM: 7215/1: mmc: mmci: Increase max_segs from 16 to 128
2012-03-23 17:36:29 -07:00
Linus Torvalds 56c10bf82c Merge branch 'for-armsoc' of git://git.linaro.org/people/rmk/linux-arm
Pull #1 ARM updates from Russell King:
 "This one covers stuff which Arnd is waiting for me to push, as this is
  shared between both our trees and probably other trees elsewhere.

  Essentially, this contains:
   - AMBA primecell device initializer updates - mostly shrinking the
     size of the device declarations in platform code to something more
     reasonable.
   - Getting rid of the NO_IRQ crap from AMBA primecell stuff.
   - Nicolas' idle cleanups.  This in combination with the restart
     cleanups from the last merge window results in a great many
     mach/system.h files being deleted."

Yay: ~80 files, ~2000 lines deleted.

* 'for-armsoc' of git://git.linaro.org/people/rmk/linux-arm: (60 commits)
  ARM: remove disable_fiq and arch_ret_to_user macros
  ARM: make entry-macro.S depend on !MULTI_IRQ_HANDLER
  ARM: rpc: make default fiq handler run-time installed
  ARM: make arch_ret_to_user macro optional
  ARM: amba: samsung: use common amba device initializers
  ARM: amba: spear: use common amba device initializers
  ARM: amba: nomadik: use common amba device initializers
  ARM: amba: u300: use common amba device initializers
  ARM: amba: lpc32xx: use common amba device initializers
  ARM: amba: netx: use common amba device initializers
  ARM: amba: bcmring: use common amba device initializers
  ARM: amba: ep93xx: use common amba device initializers
  ARM: amba: omap2: use common amba device initializers
  ARM: amba: integrator: use common amba device initializers
  ARM: amba: realview: get rid of private platform amba_device initializer
  ARM: amba: versatile: get rid of private platform amba_device initializer
  ARM: amba: vexpress: get rid of private platform amba_device initializer
  ARM: amba: provide common initializers for static amba devices
  ARM: amba: make use of -1 IRQs warn
  ARM: amba: u300: get rid of NO_IRQ initializers
  ...
2012-03-23 17:30:49 -07:00
Linus Torvalds bab2d8c602 OpenRISC changes for 3.4
This series for the OpenRISC architecture consists of mostly trivial fixups.
 The most interesting bits of the series are:
 
 * A fix to the timer code whereby the shortest trigger period is set to
   100 cycles; previously, it was possible to set this to 1 cycle, but by
   the time the register was written, that time had already passed and the
   timer interrupt would not go off until the cycle counter had gone a full
   cycle.
 
 * Allowing a device tree binary to be passed in to the kernel from u-boot.
   The OpenRISC architecture has been recently merged into upstream u-boot,
   so this change gets OpenRISC Linux into sync with that project.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iEYEABECAAYFAk9sO1EACgkQ70gcjN2673MC3wCgrCMVzL0htkO7o6oYAP2+oMeh
 FKIAnRZ+8KFQ1/WM/jnPHLJxVClyJpBp
 =Uk9/
 -----END PGP SIGNATURE-----

Merge tag 'for-3.4' of git://openrisc.net/jonas/linux

Pull OpenRISC changes for 3.4 from Jonas Bonn:
 "This series for the OpenRISC architecture consists of mostly trivial
  fixups.  The most interesting bits of the series are:

  * A fix to the timer code whereby the shortest trigger period is set
    to 100 cycles; previously, it was possible to set this to 1 cycle,
    but by the time the register was written, that time had already
    passed and the timer interrupt would not go off until the cycle
    counter had gone a full cycle.

  * Allowing a device tree binary to be passed in to the kernel from
    u-boot.  The OpenRISC architecture has been recently merged into
    upstream u-boot, so this change gets OpenRISC Linux into sync with
    that project."

* tag 'for-3.4' of git://openrisc.net/jonas/linux:
  OpenRISC: Remove memory_start/end prototypes
  openrisc: remove semicolon from KSTK_ defs
  openrisc: sanitize use of orig_gpr11
  openrisc: fix virt_addr_valid
  OpenRISC: Export dump_stack()
  OpenRISC: Select GENERIC_ATOMIC64
  openrisc: Set shortest clock event to 100 ticks
  openrisc: included linux/thread_info.h twice
  OpenRISC: Use set_current_blocked() and block_sigmask()
  OpenRISC: Don't mask signals if we fail to setup signal stack
  OpenRISC: No need to reset handler if SA_ONESHOT
  OpenRISC: Don't reimplement force_sigsegv()
  openrisc: enable passing of flattened device tree pointer
  arch/openrisc/mm/init.c: trivial: use BUG_ON
2012-03-23 17:24:25 -07:00
Linus Torvalds 0e65ae099c Miscellaneous Itanium patches
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.10 (GNU/Linux)
 
 iQIcBAABAgAGBQJPa7EVAAoJEKurIx+X31iBADgP/1eHhyb4A2ULOk19MhrEAb3/
 myoz76qVZvFC7i/5kmtkpAZqCYNnNfiBzr0vpd6MNq9mgFx5jna/IiaV8cZ08kcj
 JPknyrAoK8IgGgVQmi+pSHoaiNRAjnvkLwBsadLSFGmcjpi1BYlzmQrFYvMh5GAN
 J6j7+2Sv2YNec2QthZZ56L29QqQnbzg0DfS6ZTcOLSvitCnAsUb54Hoi/saK1c9r
 loUdWeVen8FPBSTfyqFtJLWJj1Yo3KxVoUgME+f7TeBQNRfihzDhMLuK+IQ5Bsp5
 /Z4+Pi4MxiTccYETWCLFWyS9cheK7tfR/SOpEd+XQYx7BQ2gLp/WMNd3q1iUTsg+
 AXr3/XOtmKWKAlu6Wy/oFUOjokPVm/yl9MSB6Io1bmlYfaCmb8I+Eyr+fJzPKLyI
 wVd+SQl02Z2PlC9dnr73Wu+bCrx0u9xQncucNEpZIpBZLGeuU0sPP0Y7nCGAOL5C
 NL7yQmb45pvT7kvzBfrrFSqEfWccycIL+CJw9kb+k1aspTxT0YBZPbpNTKj83aXW
 H4KAF9Av/ZHeacriDpxrywwMtTsTS/xLhxEdHB/XEzjkGLuB0R7ZlzvTSuLBbm/1
 fu8/bty4ehOHRqACIuNNMGbdIZ5ruThylSC4Yx2a3JRIJISZS/Bo/0mU5zLRRaH+
 XuEw3+mfhLq9oU8SdAug
 =78St
 -----END PGP SIGNATURE-----

Merge tag 'ia64-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux

Pull miscellaneous Itanium patches from Tony Luck.

The conflicts in arch/ia64/hp/sim/simserial.c were due to patches to
simserial that had alredy been included (with lots of further cleanups)
in the serial tree.

* tag 'ia64-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux:
  Documentation/kernel-parameters: remove inttest parameter
  [IA64] Fix ISA IRQ trigger model and polarity setting
  [IA64] Fix a couple of warnings for EXPORT_SYMBOL
  [IA64] Check return from device_register() in cx_device_register()
  [IA64] Fix warning from machine_kexec.c
  [IA64] simserial, bail out when request_irq fails
  [IA64] hpsim, initialize chip for assigned irqs
  [IA64] simserial, include some headers
  [IA64] hpsim, fix SAL handling in fw-emu
  [IA64] genirq fixup for SGI/SN
  [IA64] disable interrupts when exiting from ia64_mca_cmc_int_handler()
2012-03-23 17:19:37 -07:00
Russell King bba1594d34 Merge branch 'mmci' into amba 2012-03-24 00:10:36 +00:00
Linus Torvalds 2fb9e96cad Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull additional x86 fixes from Peter Anvin:
 - address a long-standing bug related to when a kernel-spawned process
   gets a signal on an i386 kernel compiled without CONFIG_VM86.

 - fix the newly introduced build warning in arch/x86/boot.

 - fix a typo in the i386 system call table which affects building some
   libcs.

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86-32: Fix endless loop when processing signals for kernel tasks
  x86, boot: Correct CFLAGS for hostprogs
  x86-32: Fix typo for mq_getsetattr in syscall table
2012-03-23 17:10:09 -07:00
Linus Torvalds 8e3ade251b Merge branch 'akpm' (Andrew's patch-bomb)
Merge second batch of patches from Andrew Morton:
 - various misc things
 - core kernel changes to prctl, exit, exec, init, etc.
 - kernel/watchdog.c updates
 - get_maintainer
 - MAINTAINERS
 - the backlight driver queue
 - core bitops code cleanups
 - the led driver queue
 - some core prio_tree work
 - checkpatch udpates
 - largeish crc32 update
 - a new poll() feature for the v4l guys
 - the rtc driver queue
 - fatfs
 - ptrace
 - signals
 - kmod/usermodehelper updates
 - coredump
 - procfs updates

* emailed from Andrew Morton <akpm@linux-foundation.org>: (141 commits)
  seq_file: add seq_set_overflow(), seq_overflow()
  proc-ns: use d_set_d_op() API to set dentry ops in proc_ns_instantiate().
  procfs: speed up /proc/pid/stat, statm
  procfs: add num_to_str() to speed up /proc/stat
  proc: speed up /proc/stat handling
  fs/proc/kcore.c: make get_sparsemem_vmemmap_info() static
  coredump: add VM_NODUMP, MADV_NODUMP, MADV_CLEAR_NODUMP
  coredump: remove VM_ALWAYSDUMP flag
  kmod: make __request_module() killable
  kmod: introduce call_modprobe() helper
  usermodehelper: ____call_usermodehelper() doesn't need do_exit()
  usermodehelper: kill umh_wait, renumber UMH_* constants
  usermodehelper: implement UMH_KILLABLE
  usermodehelper: introduce umh_complete(sub_info)
  usermodehelper: use UMH_WAIT_PROC consistently
  signal: zap_pid_ns_processes: s/SEND_SIG_NOINFO/SEND_SIG_FORCED/
  signal: oom_kill_task: use SEND_SIG_FORCED instead of force_sig()
  signal: cosmetic, s/from_ancestor_ns/force/ in prepare_signal() paths
  signal: give SEND_SIG_FORCED more power to beat SIGNAL_UNKILLABLE
  Hexagon: use set_current_blocked() and block_sigmask()
  ...
2012-03-23 16:59:10 -07:00
KAMEZAWA Hiroyuki e075f59152 seq_file: add seq_set_overflow(), seq_overflow()
It is undocumented but a seq_file's overflow state is indicated by
m->count == m->size.  Add seq_set_overflow() and seq_overflow() to
set/check overflow status explicitly.

Based on an idea from Eric Dumazet.

[akpm@linux-foundation.org: tweak code comment]
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-23 16:58:42 -07:00
Pravin B Shelar 1b26c9b334 proc-ns: use d_set_d_op() API to set dentry ops in proc_ns_instantiate().
The namespace cleanup path leaks a dentry which holds a reference count
on a network namespace.  Keeping that network namespace from being freed
when the last user goes away.  Leaving things like vlan devices in the
leaked network namespace.

If you use ip netns add for much real work this problem becomes apparent
pretty quickly.  It light testing the problem hides because frequently
you simply don't notice the leak.

Use d_set_d_op() so that DCACHE_OP_* flags are set correctly.

This issue exists back to 3.0.

Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Reported-by: Justin Pettit <jpettit@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Cc: David Miller <davem@davemloft.net>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-23 16:58:42 -07:00
KAMEZAWA Hiroyuki bda7bad62b procfs: speed up /proc/pid/stat, statm
Process accounting applications as top, ps visit some files under
/proc/<pid>.  With seq_put_decimal_ull(), we can optimize /proc/<pid>/stat
and /proc/<pid>/statm files.

This patch adds
  - seq_put_decimal_ll() for signed values.
  - allow delimiter == 0.
  - convert seq_printf() to seq_put_decimal_ull/ll in /proc/stat, statm.

Test result on a system with 2000+ procs.

Before patch:
  [kamezawa@bluextal test]$ top -b -n 1 | wc -l
  2223
  [kamezawa@bluextal test]$ time top -b -n 1 > /dev/null

  real    0m0.675s
  user    0m0.044s
  sys     0m0.121s

  [kamezawa@bluextal test]$ time ps -elf > /dev/null

  real    0m0.236s
  user    0m0.056s
  sys     0m0.176s

After patch:
  kamezawa@bluextal ~]$ time top -b -n 1 > /dev/null

  real    0m0.657s
  user    0m0.052s
  sys     0m0.100s

  [kamezawa@bluextal ~]$ time ps -elf > /dev/null

  real    0m0.198s
  user    0m0.050s
  sys     0m0.145s

Considering top, ps tend to scan /proc periodically, this will reduce cpu
consumption by top/ps to some extent.

[akpm@linux-foundation.org: checkpatch fixes]
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-23 16:58:42 -07:00
KAMEZAWA Hiroyuki 1ac101a5d6 procfs: add num_to_str() to speed up /proc/stat
== stat_check.py
num = 0
with open("/proc/stat") as f:
        while num < 1000 :
                data = f.read()
                f.seek(0, 0)
                num = num + 1
==

perf shows

    20.39%  stat_check.py  [kernel.kallsyms]    [k] format_decode
    13.41%  stat_check.py  [kernel.kallsyms]    [k] number
    12.61%  stat_check.py  [kernel.kallsyms]    [k] vsnprintf
    10.85%  stat_check.py  [kernel.kallsyms]    [k] memcpy
     4.85%  stat_check.py  [kernel.kallsyms]    [k] radix_tree_lookup
     4.43%  stat_check.py  [kernel.kallsyms]    [k] seq_printf

This patch removes most of calls to vsnprintf() by adding num_to_str()
and seq_print_decimal_ull(), which prints decimal numbers without rich
functions provided by printf().

On my 8cpu box.
== Before patch ==
[root@bluextal test]# time ./stat_check.py

real    0m0.150s
user    0m0.026s
sys     0m0.121s

== After patch ==
[root@bluextal test]# time ./stat_check.py

real    0m0.055s
user    0m0.022s
sys     0m0.030s

[akpm@linux-foundation.org: remove incorrect comment, use less statck in num_to_str(), move comment from .h to .c, simplify seq_put_decimal_ull()]
[andrea@betterlinux.com: avoid breaking the ABI in /proc/stat]
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrea Righi <andrea@betterlinux.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Glauber Costa <glommer@parallels.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Paul Turner <pjt@google.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-23 16:58:42 -07:00
Eric Dumazet 59a32e2ce5 proc: speed up /proc/stat handling
On a typical 16 cpus machine, "cat /proc/stat" gives more than 4096 bytes,
and is slow :

  # strace -T -o /tmp/STRACE cat /proc/stat | wc -c
  5826
  # grep "cpu " /tmp/STRACE
  read(0, "cpu  1949310 19 2144714 12117253"..., 32768) = 5826 <0.001504>

Thats partly because show_stat() must be called twice since initial
buffer size is too small (4096 bytes for less than 32 possible cpus)

Fix this by :

 1) Taking into account nr_irqs in the initial buffer sizing.

 2) Using ksize() to allow better filling of initial buffer.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Glauber Costa <glommer@parallels.com>
Cc: Russell King - ARM Linux <linux@arm.linux.org.uk>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Paul Turner <pjt@google.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-23 16:58:42 -07:00
Djalal Harouni b908243c54 fs/proc/kcore.c: make get_sparsemem_vmemmap_info() static
get_sparsemem_vmemmap_info() is only used inside fs/proc/kcore.c

Signed-off-by: Djalal Harouni <tixxdz@opendz.org>
Reviewed-by: WANG Cong <xiyou.wangcong@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-23 16:58:42 -07:00
Jason Baron accb61fe7b coredump: add VM_NODUMP, MADV_NODUMP, MADV_CLEAR_NODUMP
Since we no longer need the VM_ALWAYSDUMP flag, let's use the freed bit
for 'VM_NODUMP' flag.  The idea is is to add a new madvise() flag:
MADV_DONTDUMP, which can be set by applications to specifically request
memory regions which should not dump core.

The specific application I have in mind is qemu: we can add a flag there
that wouldn't dump all of guest memory when qemu dumps core.  This flag
might also be useful for security sensitive apps that want to absolutely
make sure that parts of memory are not dumped.  To clear the flag use:
MADV_DODUMP.

[akpm@linux-foundation.org: s/MADV_NODUMP/MADV_DONTDUMP/, s/MADV_CLEAR_NODUMP/MADV_DODUMP/, per Roland]
[akpm@linux-foundation.org: fix up the architectures which broke]
Signed-off-by: Jason Baron <jbaron@redhat.com>
Acked-by: Roland McGrath <roland@hack.frob.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Avi Kivity <avi@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Matt Turner <mattst88@gmail.com>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: Helge Deller <deller@gmx.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-23 16:58:42 -07:00
Jason Baron 909af768e8 coredump: remove VM_ALWAYSDUMP flag
The motivation for this patchset was that I was looking at a way for a
qemu-kvm process, to exclude the guest memory from its core dump, which
can be quite large.  There are already a number of filter flags in
/proc/<pid>/coredump_filter, however, these allow one to specify 'types'
of kernel memory, not specific address ranges (which is needed in this
case).

Since there are no more vma flags available, the first patch eliminates
the need for the 'VM_ALWAYSDUMP' flag.  The flag is used internally by
the kernel to mark vdso and vsyscall pages.  However, it is simple
enough to check if a vma covers a vdso or vsyscall page without the need
for this flag.

The second patch then replaces the 'VM_ALWAYSDUMP' flag with a new
'VM_NODUMP' flag, which can be set by userspace using new madvise flags:
'MADV_DONTDUMP', and unset via 'MADV_DODUMP'.  The core dump filters
continue to work the same as before unless 'MADV_DONTDUMP' is set on the
region.

The qemu code which implements this features is at:

  http://people.redhat.com/~jbaron/qemu-dump/qemu-dump.patch

In my testing the qemu core dump shrunk from 383MB -> 13MB with this
patch.

I also believe that the 'MADV_DONTDUMP' flag might be useful for
security sensitive apps, which might want to select which areas are
dumped.

This patch:

The VM_ALWAYSDUMP flag is currently used by the coredump code to
indicate that a vma is part of a vsyscall or vdso section.  However, we
can determine if a vma is in one these sections by checking it against
the gate_vma and checking for a non-NULL return value from
arch_vma_name().  Thus, freeing a valuable vma bit.

Signed-off-by: Jason Baron <jbaron@redhat.com>
Acked-by: Roland McGrath <roland@hack.frob.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Avi Kivity <avi@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-23 16:58:42 -07:00
Oleg Nesterov 1cc684ab75 kmod: make __request_module() killable
As Tetsuo Handa pointed out, request_module() can stress the system
while the oom-killed caller sleeps in TASK_UNINTERRUPTIBLE.

The task T uses "almost all" memory, then it does something which
triggers request_module().  Say, it can simply call sys_socket().  This
in turn needs more memory and leads to OOM.  oom-killer correctly
chooses T and kills it, but this can't help because it sleeps in
TASK_UNINTERRUPTIBLE and after that oom-killer becomes "disabled" by the
TIF_MEMDIE task T.

Make __request_module() killable.  The only necessary change is that
call_modprobe() should kmalloc argv and module_name, they can't live in
the stack if we use UMH_KILLABLE.  This memory is freed via
call_usermodehelper_freeinfo()->cleanup.

Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Tejun Heo <tj@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-23 16:58:42 -07:00
Oleg Nesterov 3e63a93b98 kmod: introduce call_modprobe() helper
No functional changes.  Move the call_usermodehelper code from
__request_module() into the new simple helper, call_modprobe().

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Tejun Heo <tj@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-23 16:58:42 -07:00
Oleg Nesterov 5b9bd473e3 usermodehelper: ____call_usermodehelper() doesn't need do_exit()
Minor cleanup.  ____call_usermodehelper() can simply return, no need to
call do_exit() explicitely.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Tejun Heo <tj@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-23 16:58:41 -07:00
Oleg Nesterov 9d944ef32e usermodehelper: kill umh_wait, renumber UMH_* constants
No functional changes.  It is not sane to use UMH_KILLABLE with enum
umh_wait, but obviously we do not want another argument in
call_usermodehelper_* helpers.  Kill this enum, use the plain int.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Tejun Heo <tj@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-23 16:58:41 -07:00
Oleg Nesterov d0bd587a80 usermodehelper: implement UMH_KILLABLE
Implement UMH_KILLABLE, should be used along with UMH_WAIT_EXEC/PROC.
The caller must ensure that subprocess_info->path/etc can not go away
until call_usermodehelper_freeinfo().

call_usermodehelper_exec(UMH_KILLABLE) does
wait_for_completion_killable.  If it fails, it uses
xchg(&sub_info->complete, NULL) to serialize with umh_complete() which
does the same xhcg() to access sub_info->complete.

If call_usermodehelper_exec wins, it can safely return.  umh_complete()
should get NULL and call call_usermodehelper_freeinfo().

Otherwise we know that umh_complete() was already called, in this case
call_usermodehelper_exec() falls back to wait_for_completion() which
should succeed "very soon".

Note: UMH_NO_WAIT == -1 but it obviously should not be used with
UMH_KILLABLE.  We delay the neccessary cleanup to simplify the back
porting.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Tejun Heo <tj@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-23 16:58:41 -07:00
Oleg Nesterov b344992250 usermodehelper: introduce umh_complete(sub_info)
Preparation.  Add the new trivial helper, umh_complete().  Currently it
simply does complete(sub_info->complete).

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Tejun Heo <tj@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-23 16:58:41 -07:00
Oleg Nesterov 70834d3070 usermodehelper: use UMH_WAIT_PROC consistently
A few call_usermodehelper() callers use the hardcoded constant instead of
the proper UMH_WAIT_PROC, fix them.

Reported-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Lars Ellenberg <drbd-dev@lists.linbit.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Michal Januszewski <spock@gentoo.org>
Cc: Florian Tobias Schandinat <FlorianSchandinat@gmx.de>
Cc: Kentaro Takeda <takedakn@nttdata.co.jp>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: James Morris <jmorris@namei.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-23 16:58:41 -07:00
Oleg Nesterov a02d6fd643 signal: zap_pid_ns_processes: s/SEND_SIG_NOINFO/SEND_SIG_FORCED/
Change zap_pid_ns_processes() to use SEND_SIG_FORCED, it looks more
clear compared to SEND_SIG_NOINFO which relies on from_ancestor_ns logic
send_signal().

It is also more efficient if we need to kill a lot of tasks because it
doesn't alloc sigqueue.

While at it, add the __fatal_signal_pending(task) check as a minor
optimization.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Anton Vorontsov <anton.vorontsov@linaro.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-23 16:58:41 -07:00
Oleg Nesterov d2d393099d signal: oom_kill_task: use SEND_SIG_FORCED instead of force_sig()
Change oom_kill_task() to use do_send_sig_info(SEND_SIG_FORCED) instead
of force_sig(SIGKILL).  With the recent changes we do not need force_ to
kill the CLONE_NEWPID tasks.

And this is more correct.  force_sig() can race with the exiting thread
even if oom_kill_task() checks p->mm != NULL, while
do_send_sig_info(group => true) kille the whole process.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Anton Vorontsov <anton.vorontsov@linaro.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-23 16:58:41 -07:00
Oleg Nesterov def8cf7256 signal: cosmetic, s/from_ancestor_ns/force/ in prepare_signal() paths
Cosmetic, rename the from_ancestor_ns argument in prepare_signal()
paths.  After the previous change it doesn't match the reality.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Anton Vorontsov <anton.vorontsov@linaro.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-23 16:58:41 -07:00
Oleg Nesterov 629d362b99 signal: give SEND_SIG_FORCED more power to beat SIGNAL_UNKILLABLE
force_sig_info() and friends have the special semantics for synchronous
signals, this interface should not be used if the target is not current.
And it needs the fixes, in particular the clearing of SIGNAL_UNKILLABLE
is not exactly right.

However there are callers which have to use force_ exactly because it
clears SIGNAL_UNKILLABLE and thus it can kill the CLONE_NEWPID tasks,
although this is almost always is wrong by various reasons.

With this patch SEND_SIG_FORCED ignores SIGNAL_UNKILLABLE, like we do if
the signal comes from the ancestor namespace.

This makes the naming in prepare_signal() paths insane, fixed by the
next cleanup.

Note: this only affects SIGKILL/SIGSTOP, but this is enough for
force_sig() abusers.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Anton Vorontsov <anton.vorontsov@linaro.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-23 16:58:41 -07:00
Matt Fleming 43aca3246c Hexagon: use set_current_blocked() and block_sigmask()
As described in e6fa16ab9c ("signal: sigprocmask() should do
retarget_shared_pending()") the modification of current->blocked is
incorrect as we need to check whether the signal we're about to block is
pending in the shared queue.

Also, use the new helper function introduced in commit 5e6292c0f2
("signal: add block_sigmask() for adding sigmask to current->blocked")
which centralises the code for updating current->blocked after
successfully delivering a signal and reduces the amount of duplicate
code across architectures.  In the past some architectures got this code
wrong, so using this helper function should stop that from happening
again.

Acked-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Richard Kuo <rkuo@codeaurora.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-23 16:58:41 -07:00
Denys Vlasenko ee00560c7d ptrace: remove PTRACE_SEIZE_DEVEL bit
PTRACE_SEIZE code is tested and ready for production use, remove the
code which requires special bit in data argument to make PTRACE_SEIZE
work.

Strace team prepares for a new release of strace, and we would like to
ship the code which uses PTRACE_SEIZE, preferably after this change goes
into released kernel.

Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Pedro Alves <palves@redhat.com>
Cc: Jan Kratochvil <jan.kratochvil@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-23 16:58:41 -07:00
Denys Vlasenko 5cdf389aee ptrace: renumber PTRACE_EVENT_STOP so that future new options and events can match
PTRACE_EVENT_foo and PTRACE_O_TRACEfoo used to match.

New PTRACE_EVENT_STOP is the first event which has no corresponding
PTRACE_O_TRACE option.  If we will ever want to add another such option,
its PTRACE_EVENT's value will collide with PTRACE_EVENT_STOP's value.

This patch changes PTRACE_EVENT_STOP value to prevent this.

While at it, added a comment - the one atop PTRACE_EVENT block, saying
"Wait extended result codes for the above trace options", is not true
for PTRACE_EVENT_STOP.

Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
Cc: Tejun Heo <tj@kernel.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Cc: Pedro Alves <palves@redhat.com>
Cc: Jan Kratochvil <jan.kratochvil@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-23 16:58:41 -07:00
Denys Vlasenko aa9147c98f ptrace: make PTRACE_SEIZE set ptrace options specified in 'data' parameter
This can be used to close a few corner cases in strace where we get
unwanted racy behavior after attach, but before we have a chance to set
options (the notorious post-execve SIGTRAP comes to mind), and removes
the need to track "did we set opts for this task" state in strace
internals.

While we are at it:

Make it possible to extend SEIZE in the future with more functionality
by passing non-zero 'addr' parameter.  To that end, error out if 'addr'
is non-zero.  PTRACE_ATTACH did not (and still does not) have such
check, and users (strace) do pass garbage there...  let's avoid
repeating this mistake with SEIZE.

Set all task->ptrace bits in one operation - before this change, we were
adding PT_SEIZED and PT_PTRACE_CAP with task->ptrace |= BIT ops.  This
was probably ok (not a bug), but let's be on a safer side.

Changes since v2: use (unsigned long) casts instead of (long) ones, move
PTRACE_SEIZE_DEVEL-related code to separate lines of code.

Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Pedro Alves <palves@redhat.com>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Cc: Jan Kratochvil <jan.kratochvil@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-23 16:58:40 -07:00
Denys Vlasenko 86b6c1f301 ptrace: simplify PTRACE_foo constants and PTRACE_SETOPTIONS code
Exchange PT_TRACESYSGOOD and PT_PTRACE_CAP bit positions, which makes
PT_option bits contiguous and therefore makes code in
ptrace_setoptions() much simpler.

Every PTRACE_O_TRACEevent is defined to (1 << PTRACE_EVENT_event)
instead of using explicit numeric constants, to ensure we don't mess up
relationship between bit positions and event ids.

PT_EVENT_FLAG_SHIFT was not particularly useful, PT_OPT_FLAG_SHIFT with
value of PT_EVENT_FLAG_SHIFT-1 is easier to use.

PT_TRACE_MASK constant is nuked, the only its use is replaced by
(PTRACE_O_MASK << PT_OPT_FLAG_SHIFT).

Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
Acked-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Cc: Pedro Alves <palves@redhat.com>
Cc: Jan Kratochvil <jan.kratochvil@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-23 16:58:40 -07:00
Denys Vlasenko 8c5cf9e5c5 ptrace: don't modify flags on PTRACE_SETOPTIONS failure
On ptrace(PTRACE_SETOPTIONS, pid, 0, <opts>), we used to set those
option bits which are known, and then fail with -EINVAL if there are
some unknown bits in <opts>.

This is inconsistent with typical error handling, which does not change
any state if input is invalid.

This patch changes PTRACE_SETOPTIONS behavior so that in this case, we
return -EINVAL and don't change any bits in task->ptrace.

It's very unlikely that there is userspace code in the wild which will
be affected by this change: it should have the form

    ptrace(PTRACE_SETOPTIONS, pid, 0, PTRACE_O_BOGUSOPT)

where PTRACE_O_BOGUSOPT is a constant unknown to the kernel.  But kernel
headers, naturally, don't contain any PTRACE_O_BOGUSOPTs, thus the only
way userspace can use one if it defines one itself.  I can't see why
anyone would do such a thing deliberately.

Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
Acked-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Cc: Pedro Alves <palves@redhat.com>
Cc: Jan Kratochvil <jan.kratochvil@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-23 16:58:40 -07:00
Oleg Nesterov b1845ff53f ptrace: don't send SIGTRAP on exec if SEIZED
ptrace_event(PTRACE_EVENT_EXEC) sends SIGTRAP if PT_TRACE_EXEC is not
set.  This is because this SIGTRAP predates PTRACE_O_TRACEEXEC option,
we do not need/want this with PT_SEIZED which can set the options during
attach.

Suggested-by: Pedro Alves <palves@redhat.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Chris Evans <scarybeasts@gmail.com>
Cc: Indan Zupancic <indan@nul.nu>
Cc: Denys Vlasenko <vda.linux@googlemail.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Pedro Alves <palves@redhat.com>
Cc: Jan Kratochvil <jan.kratochvil@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-23 16:58:40 -07:00
Oleg Nesterov 15cab95213 ptrace: the killed tracee should not enter the syscall
Another old/known problem.  If the tracee is killed after it reports
syscall_entry, it starts the syscall and debugger can't control this.
This confuses the users and this creates the security problems for
ptrace jailers.

Change tracehook_report_syscall_entry() to return non-zero if killed,
this instructs syscall_trace_enter() to abort the syscall.

Reported-by: Chris Evans <scarybeasts@gmail.com>
Tested-by: Indan Zupancic <indan@nul.nu>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Denys Vlasenko <vda.linux@googlemail.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Pedro Alves <palves@redhat.com>
Cc: Jan Kratochvil <jan.kratochvil@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-23 16:58:40 -07:00
Namjae Jeon d533df07c2 fat: fix bug in enforcing Long File Name length
Since '*outlen' is initialized to zero, it is currently possible to
create a filename of length (FAT_LFN_LEN + 1) when utf8 is not enabled.
To enforce the FAT_LFN_LEN limit, we must perform one less iteration.

Signed-off-by: Namjae Jeon <linkinjeon@gmail.com>
Signed-off-by: Ravishankar N <cyberax82@gmail.com>
Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-23 16:58:40 -07:00
Namjae Jeon 41f0c02eac fat: clean up xlate_to_uni()
xlate_to_uni() is called by vfat_build_slots() with sbi->nls_io as the
final argument.  nls_io can never be null at this point because the
check is already being done in fat_fill_super() wherein the mount fails
if it is null.

Signed-off-by: Namjae Jeon <linkinjeon@gmail.com>
Signed-off-by: Ravishankar N <cyberax82@gmail.com>
Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-23 16:58:40 -07:00