Commit Graph

2759 Commits

Author SHA1 Message Date
Hugh Dickins 5f30fc94ca lib/radix-tree.c: swapoff tmpfs radix_tree: remember to rcu_read_unlock
Running fsx on tmpfs with concurrent memhog-swapoff-swapon, lots of

  BUG: sleeping function called from invalid context at kernel/fork.c:606
  in_atomic(): 0, irqs_disabled(): 0, pid: 1394, name: swapoff
  1 lock held by swapoff/1394:
   #0:  (rcu_read_lock){.+.+.+}, at: [<ffffffff812520a1>] radix_tree_locate_item+0x1f/0x2b6

followed by

  ================================================
  [ BUG: lock held when returning to user space! ]
  3.14.0-rc1 #3 Not tainted
  ------------------------------------------------
  swapoff/1394 is leaving the kernel with locks still held!
  1 lock held by swapoff/1394:
   #0:  (rcu_read_lock){.+.+.+}, at: [<ffffffff812520a1>] radix_tree_locate_item+0x1f/0x2b6

after which the system recovered nicely.

Whoops, I long ago forgot the rcu_read_unlock() on one unlikely branch.

Fixes e504f3fdd6 ("tmpfs radix_tree: locate_item to speed up swapoff")

Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-03-04 07:55:47 -08:00
Dan Williams 3b7a6418c7 dma debug: account for cachelines and read-only mappings in overlap tracking
While debug_dma_assert_idle() checks if a given *page* is actively
undergoing dma the valid granularity of a dma mapping is a *cacheline*.
Sander's testing shows that the warning message "DMA-API: exceeded 7
overlapping mappings of pfn..." is falsely triggering.  The test is
simply mapping multiple cachelines in a given page.

Ultimately we want overlap tracking to be valid as it is a real api
violation, so we need to track active mappings by cachelines.  Update
the active dma tracking to use the page-frame-relative cacheline of the
mapping as the key, and update debug_dma_assert_idle() to check for all
possible mapped cachelines for a given page.

However, the need to track active mappings is only relevant when the
dma-mapping is writable by the device.  In fact it is fairly standard
for read-only mappings to have hundreds or thousands of overlapping
mappings at once.  Limiting the overlap tracking to writable
(!DMA_TO_DEVICE) eliminates this class of false-positive overlap
reports.

Note, the radix gang lookup is sub-optimal.  It would be best if it
stopped fetching entries once the search passed a page boundary.
Nevertheless, this implementation does not perturb the original net_dma
failing case.  That is to say the extra overhead does not show up in
terms of making the failing case pass due to a timing change.

References:
  http://marc.info/?l=linux-netdev&m=139232263419315&w=2
  http://marc.info/?l=linux-netdev&m=139217088107122&w=2

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reported-by: Sander Eikelenboom <linux@eikelenboom.it>
Reported-by: Dave Jones <davej@redhat.com>
Tested-by: Dave Jones <davej@redhat.com>
Tested-by: Sander Eikelenboom <linux@eikelenboom.it>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Francois Romieu <romieu@fr.zoreil.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-03-04 07:55:47 -08:00
Bjorn Helgaas d19cb803a2 vsprintf: Add support for IORESOURCE_UNSET in %pR
Sometimes we have a struct resource where we know the type (MEM/IO/etc.)
and the size, but we haven't assigned address space for it.  The
IORESOURCE_UNSET flag is a way to indicate this situation.  For these
"unset" resources, the start address is meaningless, so print only the
size, e.g.,

  - pci 0000:0c:00.0: reg 184: [mem 0x00000000-0x00001fff 64bit]
  + pci 0000:0c:00.0: reg 184: [mem size 0x2000 64bit]

For %pr (printing with raw flags), we still print the address range,
because %pr is mostly used for debugging anyway.

Thanks to Fengguang Wu <fengguang.wu@intel.com> for suggesting
resource_size().

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2014-02-26 14:42:09 -07:00
Paul E. McKenney 0af3fe1efa locktorture: Add a lock-torture kernel module
This commit adds the locking counterpart to rcutorture.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
[ paulmck: Make n_lock_torture_errors and torture_spinlock static
  as suggested by Fengguang Wu. ]
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2014-02-23 09:04:29 -08:00
Paul E. McKenney 51b1130eb5 rcutorture: Abstract rcu_torture_random()
Because rcu_torture_random() will be used by the locking equivalent to
rcutorture, pull it out into its own module.  This new module cannot
be separately configured, instead, use the Kconfig "select" statement
from the Kconfig options of tests depending on it.

Suggested-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2014-02-23 09:00:58 -08:00
Greg Kroah-Hartman 91219a3b20 Merge 3.14-rc3 into driver-core-next
We want those fixes here for testing and development.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-18 08:57:10 -08:00
Andreas Gruenbacher 05f7a7d6a7 idr: Add new function idr_is_empty()
Signed-off-by: Andreas Gruenbacher <agruen@linbit.com>
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
2014-02-17 16:27:47 +01:00
Linus Torvalds 5e57dc8110 Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull block IO fixes from Jens Axboe:
 "Second round of updates and fixes for 3.14-rc2.  Most of this stuff
  has been queued up for a while.  The notable exception is the blk-mq
  changes, which are naturally a bit more in flux still.

  The pull request contains:

   - Two bug fixes for the new immutable vecs, causing crashes with raid
     or swap.  From Kent.

   - Various blk-mq tweaks and fixes from Christoph.  A fix for
     integrity bio's from Nic.

   - A few bcache fixes from Kent and Darrick Wong.

   - xen-blk{front,back} fixes from David Vrabel, Matt Rushton, Nicolas
     Swenson, and Roger Pau Monne.

   - Fix for a vec miscount with integrity vectors from Martin.

   - Minor annotations or fixes from Masanari Iida and Rashika Kheria.

   - Tweak to null_blk to do more normal FIFO processing of requests
     from Shlomo Pongratz.

   - Elevator switching bypass fix from Tejun.

   - Softlockup in blkdev_issue_discard() fix when !CONFIG_PREEMPT from
     me"

* 'for-linus' of git://git.kernel.dk/linux-block: (31 commits)
  block: add cond_resched() to potentially long running ioctl discard loop
  xen-blkback: init persistent_purge_work work_struct
  blk-mq: pair blk_mq_start_request / blk_mq_requeue_request
  blk-mq: dont assume rq->errors is set when returning an error from ->queue_rq
  block: Fix cloning of discard/write same bios
  block: Fix type mismatch in ssize_t_blk_mq_tag_sysfs_show
  blk-mq: rework flush sequencing logic
  null_blk: use blk_complete_request and blk_mq_complete_request
  virtio_blk: use blk_mq_complete_request
  blk-mq: rework I/O completions
  fs: Add prototype declaration to appropriate header file include/linux/bio.h
  fs: Mark function as static in fs/bio-integrity.c
  block/null_blk: Fix completion processing from LIFO to FIFO
  block: Explicitly handle discard/write same segments
  block: Fix nr_vecs for inline integrity vectors
  blk-mq: Add bio_integrity setup to blk_mq_make_request
  blk-mq: initialize sg_reserved_size
  blk-mq: handle dma_drain_size
  blk-mq: divert __blk_put_request for MQ ops
  blk-mq: support at_head inserations for blk_execute_rq
  ...
2014-02-14 10:45:18 -08:00
Andi Kleen a7330c997d asmlinkage Make __stack_chk_failed and memcmp visible
In LTO symbols implicitely referenced by the compiler need
to be visible. Earlier these symbols were visible implicitely
from being exported, but we disabled implicit visibility fo
 EXPORTs when modules are disabled to improve code size. So
now these symbols have to be marked visible explicitely.

Do this for __stack_chk_fail (with stack protector)
and memcmp.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Link: http://lkml.kernel.org/r/1391845930-28580-10-git-send-email-ak@linux.intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-02-13 18:13:43 -08:00
Tejun Heo a8fa94e0f2 Merge branch 'master' into driver-core-next-test-merge-rc2
da9846ae15 ("kernfs: make kernfs_deactivate() honor KERNFS_LOCKDEP
flag") in driver-core-linus conflicts with kernfs_drain() updates in
driver-core-next.  The former just adds the missing KERNFS_LOCKDEP
checks which are already handled by kernfs_lockdep() checks in
driver-core-next.  The conflict can be resolved by taking code from
driver-core-next.

Conflicts:
	fs/kernfs/dir.c
2014-02-10 19:34:30 -05:00
Linus Torvalds c1ff84317f Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Peter Anvin:
 "Quite a varied little collection of fixes.  Most of them are
  relatively small or isolated; the biggest one is Mel Gorman's fixes
  for TLB range flushing.

  A couple of AMD-related fixes (including not crashing when given an
  invalid microcode image) and fix a crash when compiled with gcov"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, microcode, AMD: Unify valid container checks
  x86, hweight: Fix BUG when booting with CONFIG_GCOV_PROFILE_ALL=y
  x86/efi: Allow mapping BGRT on x86-32
  x86: Fix the initialization of physnode_map
  x86, cpu hotplug: Fix stack frame warning in check_irq_vectors_for_cpu_disable()
  x86/intel/mid: Fix X86_INTEL_MID dependencies
  arch/x86/mm/srat: Skip NUMA_NO_NODE while parsing SLIT
  mm, x86: Revisit tlb_flushall_shift tuning for page flushes except on IvyBridge
  x86: mm: change tlb_flushall_shift for IvyBridge
  x86/mm: Eliminate redundant page table walk during TLB range flushing
  x86/mm: Clean up inconsistencies when flushing TLB ranges
  mm, x86: Account for TLB flushes only when debugging
  x86/AMD/NB: Fix amd_set_subcaches() parameter type
  x86/quirks: Add workaround for AMD F16h Erratum792
  x86, doc, kconfig: Fix dud URL for Microcode data
2014-02-08 11:54:43 -08:00
Tejun Heo fa4cd451cc sysfs, kobject: add sysfs wrapper for kernfs_enable_ns()
Currently, kobject is invoking kernfs_enable_ns() directly.  This is
fine now as sysfs and kernfs are enabled and disabled together.  If
sysfs is disabled, kernfs_enable_ns() is switched to dummy
implementation too and everything is fine; however, kernfs will soon
have its own config option CONFIG_KERNFS and !SYSFS && KERNFS will be
possible, which can make kobject call into non-dummy
kernfs_enable_ns() with NULL kernfs_node pointers leading to an oops.

Introduce sysfs_enable_ns() which is a wrapper around
kernfs_enable_ns() so that it can be made a noop depending only on
CONFIG_SYSFS regardless of the planned CONFIG_KERNFS.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-07 16:08:57 -08:00
H. Peter Anvin a3b072cd18 * Avoid WARN_ON() when mapping BGRT on Baytrail (EFI 32-bit).
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJS9P2pAAoJEC84WcCNIz1VABsP/j0fwiWdaoEn/9p9EUQcBcMn
 CILuiKI8GoBR+0nH7vm/jSgUKfUxzM6T0+qjoZPVkO/u/qup+JCT5JxoywUyEbfb
 +wPNIhBgKSwJdWXPd4ZvObA9jknrP6r5sJTsixpESVpVWmcC8egPgkyBFILjokKS
 BCJqqruHZcXfWaDQTocYErl3T217J7RfnCG1o7yP96g4jteX8EdvIkPjEf2mM83I
 GXacHLy8IhGhdyPKSXcRqZ0ivhZ2WX1iFQYIY19FhmzBs364WulyXve+oV+l/4h4
 Kx7ks4Ob5AlUIizchGNwVz7058F9o/v7m0CezTgi4Q/RrZi34samxbjk/95/Xc60
 JSvWkekm3/jOODubad5zj+ZnJmG83/ZlCUuTaqsE47ftbaedSUHBN9QSpm8iHEex
 n3d4J3AdP/3amcP33kZ5MRALDYIFKb4ZxtDkADqDcXhS56COivGAdZe5hnyCpb/9
 RPUDXTOlxfJQK/y2Atcdb400JjJ/Yr9Kew81LRIt0UMZU3dKSh05UZ+a7Ym0yCkt
 3k0NNkgsFCZbYTO/Z3aPDcprwU5Lq9UrwjB17U2ev/qK+qRYDzCzSR0XGPrLMRv7
 C5Bnov6uCn/0ZG/NlAx8UXK9wdWDsLhp1QkBz+daX3sGwRAS+OiKBv6+l8dqsdOc
 1L8PMkTX2rgtELiv4PJ/
 =hLCC
 -----END PGP SIGNATURE-----

Merge tag 'efi-urgent' into x86/urgent

 * Avoid WARN_ON() when mapping BGRT on Baytrail (EFI 32-bit).

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-02-07 11:27:30 -08:00
Peter Oberparleiter 6583327c4d x86, hweight: Fix BUG when booting with CONFIG_GCOV_PROFILE_ALL=y
Commit d61931d89b, "x86: Add optimized popcnt variants" introduced
compile flag -fcall-saved-rdi for lib/hweight.c. When combined with
options -fprofile-arcs and -O2, this flag causes gcc to generate
broken constructor code. As a result, a 64 bit x86 kernel compiled
with CONFIG_GCOV_PROFILE_ALL=y prints message "gcov: could not create
file" and runs into sproadic BUGs during boot.

The gcc people indicate that these kinds of problems are endemic when
using ad hoc calling conventions.  It is therefore best to treat any
file compiled with ad hoc calling conventions as an isolated
environment and avoid things like profiling or coverage analysis,
since those subsystems assume a "normal" calling conventions.

This patch avoids the bug by excluding lib/hweight.o from coverage
profiling.

Reported-by: Meelis Roos <mroos@linux.ee>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/52F3A30C.7050205@linux.vnet.ibm.com
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: <stable@vger.kernel.org>
2014-02-06 07:15:20 -08:00
Linus Torvalds 12b13835a0 kbuild: don't enable DEBUG_INFO when building for COMPILE_TEST
It really isn't very interesting to have DEBUG_INFO when doing compile
coverage stuff (you wouldn't want to run the result anyway, that's kind
of the whole point of COMPILE_TEST), and it currently makes the build
take longer and use much more disk space for "all{yes,mod}config".

There's somewhat active discussion about this still, and we might end up
with some new config option for things like this (Andi points out that
the silly X86_DECODER_SELFTEST option also slows down the normal
coverage tests hugely), but I'm starting the ball rolling with this
simple one-liner.

DEBUG_INFO isn't that noticeable if you have tons of memory and a good
IO subsystem, but it hurts you a lot if you don't - for very little
upside for the common use.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-02-04 12:20:01 -08:00
Helge Deller 8a10bc9d27 parisc/sti_console: prefer Linux fonts over built-in ROM fonts
The built-in ROM fonts lack many necessary ASCII characters, which is
why it makes sens to prefer the Linux fonts instead if they are
available.  This makes consoles on STI graphics cards which are not
supported by the stifb driver (e.g. Visualize FXe) looks much nicer.

Signed-off-by: Helge Deller <deller@gmx.de>
Cc: stable@vger.kernel.org # v3.13
2014-02-02 20:56:47 +01:00
Linus Torvalds 4e13c5d021 Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending
Pull SCSI target updates from Nicholas Bellinger:
 "The highlights this round include:

  - add support for SCSI Referrals (Hannes)
  - add support for T10 DIF into target core (nab + mkp)
  - add support for T10 DIF emulation in FILEIO + RAMDISK backends (Sagi + nab)
  - add support for T10 DIF -> bio_integrity passthrough in IBLOCK backend (nab)
  - prep changes to iser-target for >= v3.15 T10 DIF support (Sagi)
  - add support for qla2xxx N_Port ID Virtualization - NPIV (Saurav + Quinn)
  - allow percpu_ida_alloc() to receive task state bitmask (Kent)
  - fix >= v3.12 iscsi-target session reset hung task regression (nab)
  - fix >= v3.13 percpu_ref se_lun->lun_ref_active race (nab)
  - fix a long-standing network portal creation race (Andy)"

* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (51 commits)
  target: Fix percpu_ref_put race in transport_lun_remove_cmd
  target/iscsi: Fix network portal creation race
  target: Report bad sector in sense data for DIF errors
  iscsi-target: Convert gfp_t parameter to task state bitmask
  iscsi-target: Fix connection reset hang with percpu_ida_alloc
  percpu_ida: Make percpu_ida_alloc + callers accept task state bitmask
  iscsi-target: Pre-allocate more tags to avoid ack starvation
  qla2xxx: Configure NPIV fc_vport via tcm_qla2xxx_npiv_make_lport
  qla2xxx: Enhancements to enable NPIV support for QLOGIC ISPs with TCM/LIO.
  qla2xxx: Fix scsi_host leak on qlt_lport_register callback failure
  IB/isert: pass scatterlist instead of cmd to fast_reg_mr routine
  IB/isert: Move fastreg descriptor creation to a function
  IB/isert: Avoid frwr notation, user fastreg
  IB/isert: seperate connection protection domains and dma MRs
  tcm_loop: Enable DIF/DIX modes in SCSI host LLD
  target/rd: Add DIF protection into rd_execute_rw
  target/rd: Add support for protection SGL setup + release
  target/rd: Refactor rd_build_device_space + rd_release_device_space
  target/file: Add DIF protection support to fd_execute_rw
  target/file: Add DIF protection init/format support
  ...
2014-01-31 15:31:23 -08:00
Linus Torvalds e7651b819e Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs updates from Chris Mason:
 "This is a pretty big pull, and most of these changes have been
  floating in btrfs-next for a long time.  Filipe's properties work is a
  cool building block for inheriting attributes like compression down on
  a per inode basis.

  Jeff Mahoney kicked in code to export filesystem info into sysfs.

  Otherwise, lots of performance improvements, cleanups and bug fixes.

  Looks like there are still a few other small pending incrementals, but
  I wanted to get the bulk of this in first"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (149 commits)
  Btrfs: fix spin_unlock in check_ref_cleanup
  Btrfs: setup inode location during btrfs_init_inode_locked
  Btrfs: don't use ram_bytes for uncompressed inline items
  Btrfs: fix btrfs_search_slot_for_read backwards iteration
  Btrfs: do not export ulist functions
  Btrfs: rework ulist with list+rb_tree
  Btrfs: fix memory leaks on walking backrefs failure
  Btrfs: fix send file hole detection leading to data corruption
  Btrfs: add a reschedule point in btrfs_find_all_roots()
  Btrfs: make send's file extent item search more efficient
  Btrfs: fix to catch all errors when resolving indirect ref
  Btrfs: fix protection between walking backrefs and root deletion
  btrfs: fix warning while merging two adjacent extents
  Btrfs: fix infinite path build loops in incremental send
  btrfs: undo sysfs when open_ctree() fails
  Btrfs: fix snprintf usage by send's gen_unique_name
  btrfs: fix defrag 32-bit integer overflow
  btrfs: sysfs: list the NO_HOLES feature
  btrfs: sysfs: don't show reserved incompat feature
  btrfs: call permission checks earlier in ioctls and return EPERM
  ...
2014-01-30 20:08:20 -08:00
Shaohua Li d835502f3d percpu_ida: fix a live lock
steal_tags only happens when free tags is more than half of the total
tags.  This is too strict and can cause live lock. I found that if one
cpu has free tags, but other cpu can't steal (thread is bound to
specific cpus), threads which want to allocate tags are always
sleeping. I found this when I run next patch, but this could happen
without it I think.

I did performance test too with null_blk. Two cases (each cpu has enough
percpu tags, or total tags are limited), no performance changes were
observed.

Signed-off-by: Shaohua Li <shli@fusionio.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2014-01-30 12:57:25 -07:00
Dan Williams 59f2e7df57 dma-debug: fix overlap detection
Commit 0abdd7a81b ("dma-debug: introduce debug_dma_assert_idle()") was
reworked to expand the overlap counter to the full range expressable by
3 tag bits, but it has a thinko in treating the overlap counter as a
pure reference count for the entry.

Instead of deleting when the reference-count drops to zero, we need to
delete when the overlap-count drops below zero.  Also, when detecting
overflow we can just test the overlap-count > MAX rather than applying
special meaning to 0.

Regression report available here:
http://marc.info/?l=linux-netdev&m=139073373932386&w=2

This patch, now tested on the original net_dma case, sees the expected
handful of reports before the eventual data corruption occurs.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reported-by: Sander Eikelenboom <linux@eikelenboom.it>
Cc: Francois Romieu <romieu@fr.zoreil.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-29 16:22:40 -08:00
Lad, Prabhakar 0368dfd01a lib/genalloc.c: add check gen_pool_dma_alloc() if dma pointer is not NULL
In the gen_pool_dma_alloc() the dma pointer can be NULL and while
assigning gen_pool_virt_to_phys(pool, vaddr) to dma caused the following
crash on da850 evm:

   Unable to handle kernel NULL pointer dereference at virtual address 00000000
   Internal error: Oops: 805 [#1] PREEMPT ARM
   Modules linked in:
   CPU: 0 PID: 1 Comm: swapper Tainted: G        W    3.13.0-rc1-00001-g0609e45-dirty #5
   task: c4830000 ti: c4832000 task.ti: c4832000
   PC is at gen_pool_dma_alloc+0x30/0x3c
   LR is at gen_pool_virt_to_phys+0x74/0x80
   Process swapper, call trace:
     gen_pool_dma_alloc+0x30/0x3c
     davinci_pm_probe+0x40/0xa8
     platform_drv_probe+0x1c/0x4c
     driver_probe_device+0x98/0x22c
     __driver_attach+0x8c/0x90
     bus_for_each_dev+0x6c/0x8c
     bus_add_driver+0x124/0x1d4
     driver_register+0x78/0xf8
     platform_driver_probe+0x20/0xa4
     davinci_init_late+0xc/0x14
     init_machine_late+0x1c/0x28
     do_one_initcall+0x34/0x15c
     kernel_init_freeable+0xe4/0x1ac
     kernel_init+0x8/0xec

This patch fixes the above.

[akpm@linux-foundation.org: update kerneldoc]
Signed-off-by: Lad, Prabhakar <prabhakar.csengg@gmail.com>
Cc: Philipp Zabel <p.zabel@pengutronix.de>
Cc: Nicolin Chen <b42378@freescale.com>
Cc: Joe Perches <joe@perches.com>
Cc: Sachin Kamat <sachin.kamat@linaro.org>
Cc: <stable@vger.kernel.org>	[3.13.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-29 16:22:39 -08:00
Jeff Mahoney 29dfe2dc0e kobject: export kobj_sysfs_ops
struct kobj_attribute implements the baseline attribute functionality
that can be used all over the place. We should export the ops associated
with it.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <clm@fb.com>
2014-01-28 13:19:24 -08:00
Andrey Ryabinin 4592599af3 dynamic_debug: replace obselete simple_strtoul() with kstrtouint()
Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com>
Cc: Jason Baron <jbaron@akamai.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-27 21:02:39 -08:00
Andrey Ryabinin 3ace678fd1 dynamic_debug: fix ddebug_parse_query()
This fixes following scenario:

  $ echo 'file dynamic_debug.c line 1-123 +p' > /sys/kernel/debug/dynamic_debug/control
  -bash: echo: write error: Invalid argument
  $ dmesg | grep dynamic_debug
  dynamic_debug:ddebug_parse_query: last-line:123 < 1st-line:1
  dynamic_debug:ddebug_parse_query: query parse failed

Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com>
Cc: Jason Baron <jbaron@akamai.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-27 21:02:39 -08:00
Andrey Ryabinin d9e133e6f0 dynamic_debug: remove wrong error message
parse_lineno() returns either negative error code or zero.  We don't
need to print something here because if parse_lineno fails it will print
error message.

Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com>
Cc: Jason Baron <jbaron@akamai.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-27 21:02:39 -08:00
Yinghai Lu ad6492b80f memblock, nobootmem: add memblock_virt_alloc_low()
The new memblock_virt APIs are used to replaced old bootmem API.

We need to allocate page below 4G for swiotlb.

That should fix regression on Andrew's system that is using swiotlb.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-27 21:02:38 -08:00
Linus Torvalds ba6b5084e6 Bug-fixes:
- Don't DoS with 'swiotlb is full' message.
  - Documentation update.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJS5nImAAoJEFjIrFwIi8fJ7vMH/Rh0rSL66ffVXy6XYOYIOlrZ
 6ZyBDFEdVhsH25tpVn+HlrIY8Fo6tb6e8Jnyh9qzVO487pWcawv+OyEzKaw+bbst
 aDO6rygwsca4DpdJWg6Q3kD+Kusi444eg/7h3jCMHIzW/g2fRmu9HVpXP6GSPqGB
 390113RYpF/KdEjuNL5DZqK/1ciE9IOwVnZcuR/aa2R7TswWWQ9yjWpY7GcNCYMT
 m7Gv/kf34a6UC/TPLLV6mtuIpZQRNtlPlhUW461/oGCEFgvakMR42AzFSgpaicdz
 45JnG0Gxr4rvOcNjCZPsJe6ehfyJFNgkF/p8LGqPcw2LGZvVHHToodCaPmZwGIs=
 =q2Dl
 -----END PGP SIGNATURE-----

Merge tag 'stable/for-linus-3.14-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb

Pull swiotlb bug-fixes from Konrad Rzeszutek Wilk:
 - Don't DoS with 'swiotlb is full' message.
 - Documentation update.

* tag 'stable/for-linus-3.14-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb:
  swiotlb: Don't DoS us with 'swiotlb buffer is full' (v2)
  swiotlb: update format
2014-01-27 08:17:09 -08:00
Linus Torvalds 028e219eff IEEE 1394 (FireWire) subsystem changes:
- make remote debugging over 1394 a runtime option instead of a
     buildtime option
 
   - extend remote debug access past the 4 GB barrier on respectively
     capable hardware
 
   - documentation update
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABAgAGBQJS5mQyAAoJEHnzb7JUXXnQW/UQAI8+oZAmq67dTMNnASO3XJjc
 B4rO/xU1ISvf4uCO7gZb1KHUDem1XnkG1YLGtJbrlWqEwqsF6RizUSNtStaiVKo3
 WXu1OK74KvtyGgqZhpbn1OttYLac+Tj22XooPQ7q1fQ/ihzeODKEWnKpTv0RFaQf
 EMgKxorfDe7JTr8ZoSk5JJ/Vmg47RaPeymX+wQoLZRQtrSiKt3+wbH51XHOIacjj
 DJgmek9zjmd0S8D7uRyF7/35HdPazMh8uclI8HjSM9r7YfGhUetuPMRNIETbuLAj
 gj59kgJujCnMuuBQ2DBZWXjcZEUf6+0ttcMBVME6I2vKZ+M+Z5VsdbWZrfjvKzXN
 QF5wJpFVfVPoPOP2YUq3OGqK9mJ/rOPzhm2mpkvIe8qIiKne4EcHnLIlZBnGQxOL
 axI1FLlwjmqxL0WAnnfUWp3jqDfUJm53X+bRgMYjsyhZTYfurQuVgPcJ0odlEu7o
 47Al3VEV97hCJlGyNpp+T4G1Fpd8WpPH5FyDEwQm5N4P0Xl3r6llD+DTBdil2OYW
 nqe/tXKI1Xu71jXKq/qtBvIRwFNFgIsX5pYhLvXBnKSUYIT1j346Ya1WeaDBQ865
 EWMUVB+13UopLJgFA4qHVX/X35OXG3NJBuAWb7e1AMcXvfKniUrPPtycmN5oFlv1
 wksJpVNH8xBc8q+MANSy
 =82Pd
 -----END PGP SIGNATURE-----

Merge tag 'firewire-updates' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394

Pull firewire updates from Stefan Richter:
 "IEEE 1394 (FireWire) subsystem changes:

   - make remote debugging over 1394 a runtime option instead of a
     buildtime option
   - extend remote debug access past the 4 GB barrier on respectively
     capable hardware
   - documentation update"

* tag 'firewire-updates' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394:
  firewire: Enable remote DMA above 4 GB
  firewire: ohci: Turn remote DMA support into a module parameter
  Documentation/: update FireWire debugging documentation
2014-01-27 08:14:08 -08:00
Linus Torvalds 4ba9920e5e Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:

 1) BPF debugger and asm tool by Daniel Borkmann.

 2) Speed up create/bind in AF_PACKET, also from Daniel Borkmann.

 3) Correct reciprocal_divide and update users, from Hannes Frederic
    Sowa and Daniel Borkmann.

 4) Currently we only have a "set" operation for the hw timestamp socket
    ioctl, add a "get" operation to match.  From Ben Hutchings.

 5) Add better trace events for debugging driver datapath problems, also
    from Ben Hutchings.

 6) Implement auto corking in TCP, from Eric Dumazet.  Basically, if we
    have a small send and a previous packet is already in the qdisc or
    device queue, defer until TX completion or we get more data.

 7) Allow userspace to manage ipv6 temporary addresses, from Jiri Pirko.

 8) Add a qdisc bypass option for AF_PACKET sockets, from Daniel
    Borkmann.

 9) Share IP header compression code between Bluetooth and IEEE802154
    layers, from Jukka Rissanen.

10) Fix ipv6 router reachability probing, from Jiri Benc.

11) Allow packets to be captured on macvtap devices, from Vlad Yasevich.

12) Support tunneling in GRO layer, from Jerry Chu.

13) Allow bonding to be configured fully using netlink, from Scott
    Feldman.

14) Allow AF_PACKET users to obtain the VLAN TPID, just like they can
    already get the TCI.  From Atzm Watanabe.

15) New "Heavy Hitter" qdisc, from Terry Lam.

16) Significantly improve the IPSEC support in pktgen, from Fan Du.

17) Allow ipv4 tunnels to cache routes, just like sockets.  From Tom
    Herbert.

18) Add Proportional Integral Enhanced packet scheduler, from Vijay
    Subramanian.

19) Allow openvswitch to mmap'd netlink, from Thomas Graf.

20) Key TCP metrics blobs also by source address, not just destination
    address.  From Christoph Paasch.

21) Support 10G in generic phylib.  From Andy Fleming.

22) Try to short-circuit GRO flow compares using device provided RX
    hash, if provided.  From Tom Herbert.

The wireless and netfilter folks have been busy little bees too.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (2064 commits)
  net/cxgb4: Fix referencing freed adapter
  ipv6: reallocate addrconf router for ipv6 address when lo device up
  fib_frontend: fix possible NULL pointer dereference
  rtnetlink: remove IFLA_BOND_SLAVE definition
  rtnetlink: remove check for fill_slave_info in rtnl_have_link_slave_info
  qlcnic: update version to 5.3.55
  qlcnic: Enhance logic to calculate msix vectors.
  qlcnic: Refactor interrupt coalescing code for all adapters.
  qlcnic: Update poll controller code path
  qlcnic: Interrupt code cleanup
  qlcnic: Enhance Tx timeout debugging.
  qlcnic: Use bool for rx_mac_learn.
  bonding: fix u64 division
  rtnetlink: add missing IFLA_BOND_AD_INFO_UNSPEC
  sfc: Use the correct maximum TX DMA ring size for SFC9100
  Add Shradha Shah as the sfc driver maintainer.
  net/vxlan: Share RX skb de-marking and checksum checks with ovs
  tulip: cleanup by using ARRAY_SIZE()
  ip_tunnel: clear IPCB in ip_tunnel_xmit() in case dst_link_failure() is called
  net/cxgb4: Don't retrieve stats during recovery
  ...
2014-01-25 11:17:34 -08:00
Nicholas Bellinger 555b270e25 iscsi-target: Fix connection reset hang with percpu_ida_alloc
This patch addresses a bug where connection reset would hang
indefinately once percpu_ida_alloc() was starved for tags, due
to the fact that it always assumed uninterruptible sleep mode.

So now make percpu_ida_alloc() check for signal_pending_state() for
making interruptible sleep optional, and convert iscsit_allocate_cmd()
to set TASK_INTERRUPTIBLE for GFP_KERNEL, or TASK_RUNNING for
GFP_ATOMIC.

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Kent Overstreet <kmo@daterainc.com>
Cc: <stable@vger.kernel.org> #3.12+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-01-25 06:58:52 +00:00
Jan Beulich 2a1d689c9b lib/decompress_unlz4.c: always set an error return code on failures
"ret", being set to -1 early on, gets cleared by the first invocation of
lz4_decompress()/lz4_decompress_unknownoutputsize(), and hence subsequent
failures wouldn't be noticed by the caller without setting it back to -1
right after those calls.

Reported-by: Matthew Daley <mattjd@gmail.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Cc: Kyungsik Lee <kyungsik.lee@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-23 16:37:04 -08:00
Cody P Schafer 964fe94d71 rbtree/test: test rbtree_postorder_for_each_entry_safe()
Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-23 16:37:03 -08:00
Cody P Schafer dbf128cbf9 rbtree/test: move rb_node to the middle of the test struct
Avoid making the rb_node the first entry to catch some bugs around NULL
checking the rb_node.

Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-23 16:37:03 -08:00
Kees Cook 3e2a4c183a test: check copy_to/from_user boundary validation
To help avoid an architecture failing to correctly check kernel/user
boundaries when handling copy_to_user, copy_from_user, put_user, or
get_user, perform some simple tests and fail to load if any of them
behave unexpectedly.

Specifically, this is to make sure there is a way to notice if things
like what was fixed in commit 8404663f81 ("ARM: 7527/1: uaccess:
explicitly check __user pointer when !CPU_USE_DOMAINS") ever regresses
again, for any architecture.

Additionally, adds new "user" selftest target, which loads this module.

Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-23 16:36:57 -08:00
Kees Cook 93e9ef83f4 test: add minimal module for verification testing
This is a pair of test modules I'd like to see in the tree.  Instead of
putting these in lkdtm, where I've been adding various tests that trigger
crashes, these don't make sense there since they need to be either
distinctly separate, or their pass/fail state don't need to crash the
machine.

These live in lib/ for now, along with a few other in-kernel test modules,
and use the slightly more common "test_" naming convention, instead of
"test-".  We should likely standardize on the former:

$ find . -name 'test_*.c' | grep -v /tools/ | wc -l
4
$ find . -name 'test-*.c' | grep -v /tools/ | wc -l
2

The first is entirely a no-op module, designed to allow simple testing of
the module loading and verification interface.  It's useful to have a
module that has no other uses or dependencies so it can be reliably used
for just testing module loading and verification.

The second is a module that exercises the user memory access functions, in
an effort to make sure that we can quickly catch any regressions in
boundary checking (e.g.  like what was recently fixed on ARM).

This patch (of 2):

When doing module loading verification tests (for example, with module
signing, or LSM hooks), it is very handy to have a module that can be
built on all systems under test, isn't auto-loaded at boot, and has no
device or similar dependencies.  This creates the "test_module.ko" module
for that purpose, which only reports its load and unload to printk.

Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-23 16:36:57 -08:00
Felipe Contreras ff6f9bbb58 lib/cmdline.c: declare exported symbols immediately
WARNING: EXPORT_SYMBOL(foo); should immediately follow its function/variable
+EXPORT_SYMBOL(memparse);

WARNING: EXPORT_SYMBOL(foo); should immediately follow its function/variable
+EXPORT_SYMBOL(get_option);

WARNING: EXPORT_SYMBOL(foo); should immediately follow its function/variable
+EXPORT_SYMBOL(get_options);

Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
Cc: Levente Kurusa <levex@linux.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-23 16:36:57 -08:00
Felipe Contreras 9fd4305448 lib/cmdline.c: fix style issues
WARNING: space prohibited between function name and open parenthesis '('
+int get_option (char **str, int *pint)

WARNING: space prohibited between function name and open parenthesis '('
+	*pint = simple_strtol (cur, str, 0);

ERROR: trailing whitespace
+ $

WARNING: please, no spaces at the start of a line
+ $

WARNING: space prohibited between function name and open parenthesis '('
+		res = get_option ((char **)&str, ints + i);

Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-23 16:36:57 -08:00
Felipe Contreras ae2924a2bd lib/kstrtox.c: remove redundant cleanup
We can't reach the cleanup code unless the flag KSTRTOX_OVERFLOW is not
set, so there's not no point in clearing a bit that we know is not set.

Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
Acked-by: Levente Kurusa <levex@linux.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-23 16:36:57 -08:00
Joe Perches aaf07621b8 vsprintf: add %pad extension for dma_addr_t use
dma_addr_t's can be either u32 or u64 depending on a CONFIG option.

There are a few hundred dma_addr_t's printed via either cast to unsigned
long long, unsigned long or no cast at all.

Add %pad to be able to emit them without the cast.

Update Documentation/printk-formats.txt too.

Signed-off-by: Joe Perches <joe@perches.com>
Cc: "Shevchenko, Andriy" <andriy.shevchenko@intel.com>
Cc: Rob Landley <rob@landley.net>
Cc: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>
Cc: Julia Lawall <julia.lawall@lip6.fr>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-23 16:36:56 -08:00
Du, Changbin 578b1e0701 dynamic_debug: add wildcard support to filter files/functions/modules
Add wildcard '*'(matches zero or more characters) and '?' (matches one
character) support when qurying debug flags.

Now we can open debug messages using keywords. eg:
1. open debug logs in all usb drivers
    echo "file drivers/usb/* +p" > <debugfs>/dynamic_debug/control
2.  open debug logs for usb xhci code
    echo "file *xhci* +p" > <debugfs>/dynamic_debug/control

Signed-off-by: Du, Changbin <changbin.du@gmail.com>
Cc: Jason Baron <jbaron@akamai.com>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-23 16:36:55 -08:00
Andrew Morton a3d2cca43c lib/parser.c: put EXPORT_SYMBOLs in the conventional place
Cc: Du, Changbin <changbin.du@gmail.com>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-23 16:36:55 -08:00
Du, Changbin aace05097a lib/parser.c: add match_wildcard() function
match_wildcard function is a simple implementation of wildcard
matching algorithm. It only supports two usual wildcardes:
    '*' - matches zero or more characters
    '?' - matches one character
This algorithm is safe since it is non-recursive.

Signed-off-by: Du, Changbin <changbin.du@gmail.com>
Cc: Jason Baron <jbaron@akamai.com>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-23 16:36:55 -08:00
Kent Overstreet 6f6b5d1ec5 percpu_ida: Make percpu_ida_alloc + callers accept task state bitmask
This patch changes percpu_ida_alloc() + callers to accept task state
bitmask for prepare_to_wait() for code like target/iscsi that needs
it for interruptible sleep, that is provided in a subsequent patch.

It now expects TASK_UNINTERRUPTIBLE when the caller is able to sleep
waiting for a new tag, or TASK_RUNNING when the caller cannot sleep,
and is forced to return a negative value when no tags are available.

v2 changes:
  - Include blk-mq + tcm_fc + vhost/scsi + target/iscsi changes
  - Drop signal_pending_state() call
v3 changes:
  - Only call prepare_to_wait() + finish_wait() when != TASK_RUNNING
    (PeterZ)

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Cc: <stable@vger.kernel.org> #3.12+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-01-23 20:17:18 +00:00
Stephen Hemminger 30b02c4b2a assoc_array: remove global variable
The associative array code creates unnecessary and potentially
problematic global variable 'status'.  Remove it since never used.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-23 09:25:10 -08:00
Hannes Frederic Sowa 809fa972fd reciprocal_divide: update/correction of the algorithm
Jakub Zawadzki noticed that some divisions by reciprocal_divide()
were not correct [1][2], which he could also show with BPF code
after divisions are transformed into reciprocal_value() for runtime
invariance which can be passed to reciprocal_divide() later on;
reverse in BPF dump ended up with a different, off-by-one K in
some situations.

This has been fixed by Eric Dumazet in commit aee636c480
("bpf: do not use reciprocal divide"). This follow-up patch
improves reciprocal_value() and reciprocal_divide() to work in
all cases by using Granlund and Montgomery method, so that also
future use is safe and without any non-obvious side-effects.
Known problems with the old implementation were that division by 1
always returned 0 and some off-by-ones when the dividend and divisor
where very large. This seemed to not be problematic with its
current users, as far as we can tell. Eric Dumazet checked for
the slab usage, we cannot surely say so in the case of flex_array.
Still, in order to fix that, we propose an extension from the
original implementation from commit 6a2d7a955d resp. [3][4],
by using the algorithm proposed in "Division by Invariant Integers
Using Multiplication" [5], Torbjörn Granlund and Peter L.
Montgomery, that is, pseudocode for q = n/d where q, n, d is in
u32 universe:

1) Initialization:

  int l = ceil(log_2 d)
  uword m' = floor((1<<32)*((1<<l)-d)/d)+1
  int sh_1 = min(l,1)
  int sh_2 = max(l-1,0)

2) For q = n/d, all uword:

  uword t = (n*m')>>32
  q = (t+((n-t)>>sh_1))>>sh_2

The assembler implementation from Agner Fog [6] also helped a lot
while implementing. We have tested the implementation on x86_64,
ppc64, i686, s390x; on x86_64/haswell we're still half the latency
compared to normal divide.

Joint work with Daniel Borkmann.

  [1] http://www.wireshark.org/~darkjames/reciprocal-buggy.c
  [2] http://www.wireshark.org/~darkjames/set-and-dump-filter-k-bug.c
  [3] https://gmplib.org/~tege/division-paper.pdf
  [4] http://homepage.cs.uiowa.edu/~jones/bcd/divide.html
  [5] http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.1.2556
  [6] http://www.agner.org/optimize/asmlib.zip

Reported-by: Jakub Zawadzki <darkjames-ws@darkjames.pl>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Austin S Hemmelgarn <ahferroin7@gmail.com>
Cc: linux-kernel@vger.kernel.org
Cc: Jesse Gross <jesse@nicira.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: Matt Mackall <mpm@selenic.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Andy Gospodarek <andy@greyhouse.net>
Cc: Veaceslav Falico <vfalico@redhat.com>
Cc: Jay Vosburgh <fubar@us.ibm.com>
Cc: Jakub Zawadzki <darkjames-ws@darkjames.pl>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-21 23:17:20 -08:00
Linus Torvalds df32e43a54 Merge branch 'akpm' (incoming from Andrew)
Merge first patch-bomb from Andrew Morton:

 - a couple of misc things

 - inotify/fsnotify work from Jan

 - ocfs2 updates (partial)

 - about half of MM

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (117 commits)
  mm/migrate: remove unused function, fail_migrate_page()
  mm/migrate: remove putback_lru_pages, fix comment on putback_movable_pages
  mm/migrate: correct failure handling if !hugepage_migration_support()
  mm/migrate: add comment about permanent failure path
  mm, page_alloc: warn for non-blockable __GFP_NOFAIL allocation failure
  mm: compaction: reset scanner positions immediately when they meet
  mm: compaction: do not mark unmovable pageblocks as skipped in async compaction
  mm: compaction: detect when scanners meet in isolate_freepages
  mm: compaction: reset cached scanner pfn's before reading them
  mm: compaction: encapsulate defer reset logic
  mm: compaction: trace compaction begin and end
  memcg, oom: lock mem_cgroup_print_oom_info
  sched: add tracepoints related to NUMA task migration
  mm: numa: do not automatically migrate KSM pages
  mm: numa: trace tasks that fail migration due to rate limiting
  mm: numa: limit scope of lock for NUMA migrate rate limiting
  mm: numa: make NUMA-migrate related functions static
  lib/show_mem.c: show num_poisoned_pages when oom
  mm/hwpoison: add '#' to hwpoison_inject
  mm/memblock: use WARN_ONCE when MAX_NUMNODES passed as input parameter
  ...
2014-01-21 19:05:45 -08:00
Linus Torvalds 5cb7398caf Merge branch 'for-3.14' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu
Pull percpu changes from Tejun Heo:
 "Two trivial changes - addition of WARN_ONCE() in lib/percpu-refcount.c
  and use of VMALLOC_TOTAL instead of END - START in percpu.c"

* 'for-3.14' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu:
  percpu: use VMALLOC_TOTAL instead of VMALLOC_END - VMALLOC_START
  percpu-refcount: Add a WARN() for ref going negative
2014-01-21 17:48:41 -08:00
Xishi Qiu 25487d73a9 lib/show_mem.c: show num_poisoned_pages when oom
Show num_poisoned_pages when oom, it is a little helpful to find the
reason.  Also it will be emitted anytime show_mem() is called.

Signed-off-by: Xishi Qiu <qiuxishi@huawei.com>
Suggested-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Acked-by: David Rientjes <rientjes@google.com>
Reviewed-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-21 16:19:48 -08:00
Santosh Shilimkar c15295001a lib/cpumask.c: use memblock apis for early memory allocations
Switch to memblock interfaces for early memory allocator instead of
bootmem allocator.  No functional change in beahvior than what it is in
current code from bootmem users points of view.

Archs already converted to NO_BOOTMEM now directly use memblock
interfaces instead of bootmem wrappers build on top of memblock.  And
the archs which still uses bootmem, these new apis just fallback to
exiting bootmem APIs.

Signed-off-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Grygorii Strashko <grygorii.strashko@ti.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Paul Walmsley <paul@pwsan.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Tejun Heo <tj@kernel.org>
Cc: Tony Lindgren <tony@atomide.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-21 16:19:47 -08:00
Santosh Shilimkar 457ff1de2d lib/swiotlb.c: use memblock apis for early memory allocations
Switch to memblock interfaces for early memory allocator instead of
bootmem allocator.  No functional change in beahvior than what it is in
current code from bootmem users points of view.

Archs already converted to NO_BOOTMEM now directly use memblock
interfaces instead of bootmem wrappers build on top of memblock.  And
the archs which still uses bootmem, these new apis just fallback to
exiting bootmem APIs.

Signed-off-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Grygorii Strashko <grygorii.strashko@ti.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Paul Walmsley <paul@pwsan.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Tejun Heo <tj@kernel.org>
Cc: Tony Lindgren <tony@atomide.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-21 16:19:47 -08:00
Mel Gorman aec6a8889a mm, show_mem: remove SHOW_MEM_FILTER_PAGE_COUNT
Commit 4b59e6c473 ("mm, show_mem: suppress page counts in
non-blockable contexts") introduced SHOW_MEM_FILTER_PAGE_COUNT to
suppress PFN walks on large memory machines.  Commit c78e93630d ("mm:
do not walk all of system memory during show_mem") avoided a PFN walk in
the generic show_mem helper which removes the requirement for
SHOW_MEM_FILTER_PAGE_COUNT in that case.

This patch removes PFN walkers from the arch-specific implementations
that report on a per-node or per-zone granularity.  ARM and unicore32
still do a PFN walk as they report memory usage on each bank which is a
much finer granularity where the debugging information may still be of
use.  As the remaining arches doing PFN walks have relatively small
amounts of memory, this patch simply removes SHOW_MEM_FILTER_PAGE_COUNT.

[akpm@linux-foundation.org: fix parisc]
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: James Bottomley <jejb@parisc-linux.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-21 16:19:44 -08:00
Dan Williams 0abdd7a81b dma-debug: introduce debug_dma_assert_idle()
Record actively mapped pages and provide an api for asserting a given
page is dma inactive before execution proceeds.  Placing
debug_dma_assert_idle() in cow_user_page() flagged the violation of the
dma-api in the NET_DMA implementation (see commit 7787380336 "net_dma:
mark broken").

The implementation includes the capability to count, in a limited way,
repeat mappings of the same page that occur without an intervening
unmap.  This 'overlap' counter is limited to the few bits of tag space
in a radix tree.  This mechanism is added to mitigate false negative
cases where, for example, a page is dma mapped twice and
debug_dma_assert_idle() is called after the page is un-mapped once.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Vinod Koul <vinod.koul@intel.com>
Cc: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-21 16:19:41 -08:00
Kent Overstreet 687b0ad275 percpu-refcount: Add a WARN() for ref going negative
AIO had a missing get, which led to an ioctx leak - after percpu_ref_kill() the
ref was 0 so percpu_ref_put() never saw it hit 0.

This wasn't noticed at the time because it all happened completely silently,
this adds a WARN() which would've caught the aio bug.

tj: Used WARN_ONCE() instead of WARN().

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2014-01-21 04:40:56 -05:00
Linus Torvalds ec513b16c4 USB patches for 3.14-rc1
Here's the big USB pull request for 3.14-rc1
 
 Lots of little things all over the place, and the usual USB gadget
 updates, and XHCI fixes (some for an issue reported by a lot of people.)
 USB PHY updates as well as chipidea updates and fixes.
 
 All of these have been in the linux-next tree with no reported issues.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iEYEABECAAYFAlLdjwIACgkQMUfUDdst+ylBIQCgkRoR8lJc0L5lZ3fugIJL4IzZ
 j6AAn0nBLP6VI0cvUEi3TcrPTzv4MEuL
 =Yb+I
 -----END PGP SIGNATURE-----

Merge tag 'usb-3.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb

Pull USB updates from Greg KH:
 "Here's the big USB pull request for 3.14-rc1

  Lots of little things all over the place, and the usual USB gadget
  updates, and XHCI fixes (some for an issue reported by a lot of
  people).  USB PHY updates as well as chipidea updates and fixes.

  All of these have been in the linux-next tree with no reported issues"

* tag 'usb-3.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (318 commits)
  usb: chipidea: udc: using MultO at TD as real mult value for ISO-TX
  usb: chipidea: need to mask INT_STATUS when write otgsc
  usb: chipidea: put hw_phymode_configure before ci_usb_phy_init
  usb: chipidea: Fix Internal error: : 808 [#1] ARM related to STS flag
  usb: chipidea: imx: set CI_HDRC_IMX28_WRITE_FIX for imx28
  usb: chipidea: add freescale imx28 special write register method
  usb: ehci: add freescale imx28 special write register method
  usb: core: check for valid id_table when using the RefId feature
  usb: cdc-wdm: resp_count can be 0 even if WDM_READ is set
  usb: core: bail out if user gives an unknown RefId when using new_id
  usb: core: allow a reference device for new_id
  usb: core: add sanity checks when using bInterfaceClass with new_id
  USB: image: correct spelling mistake in comment
  USB: c67x00: correct spelling mistakes in comments
  usb: delete non-required instances of include <linux/init.h>
  usb:hub set hub->change_bits when over-current happens
  Revert "usb: chipidea: imx: set CI_HDRC_IMX28_WRITE_FIX for imx28"
  xhci: Set scatter-gather limit to avoid failed block writes.
  xhci: Avoid infinite loop when sg urb requires too many trbs
  usb: gadget: remove unused variable in gr_queue_int()
  ...
2014-01-20 16:13:02 -08:00
Linus Torvalds d3bad75a6d Driver core / sysfs patches for 3.14-rc1
Here's the big driver core and sysfs patch set for 3.14-rc1.
 
 There's a lot of work here moving sysfs logic out into a "kernfs" to
 allow other subsystems to also have a virtual filesystem with the same
 attributes of sysfs (handle device disconnect, dynamic creation /
 removal  as needed / unneeded, etc.  This is primarily being done for
 the cgroups filesystem, but the goal is to also move debugfs to it when
 it is ready, solving all of the known issues in that filesystem as well.
 The code isn't completed yet, but all should be stable now (there is a
 big section that was reverted due to problems found when testing.)
 
 There's also some other smaller fixes, and a driver core addition that
 allows for a "collection" of objects, that the DRM people will be using
 soon (it's in this tree to make merges after -rc1 easier.)
 
 All of this has been in linux-next with no reported issues.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iEYEABECAAYFAlLdh0cACgkQMUfUDdst+ylv4QCfeDKDgLo4LsaBIIrFSxLoH/c7
 UUsAoMPRwA0h8wy+BQcJAg4H4J4maKj3
 =0pc0
 -----END PGP SIGNATURE-----

Merge tag 'driver-core-3.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core / sysfs patches from Greg KH:
 "Here's the big driver core and sysfs patch set for 3.14-rc1.

  There's a lot of work here moving sysfs logic out into a "kernfs" to
  allow other subsystems to also have a virtual filesystem with the same
  attributes of sysfs (handle device disconnect, dynamic creation /
  removal as needed / unneeded, etc)

  This is primarily being done for the cgroups filesystem, but the goal
  is to also move debugfs to it when it is ready, solving all of the
  known issues in that filesystem as well.  The code isn't completed
  yet, but all should be stable now (there is a big section that was
  reverted due to problems found when testing)

  There's also some other smaller fixes, and a driver core addition that
  allows for a "collection" of objects, that the DRM people will be
  using soon (it's in this tree to make merges after -rc1 easier)

  All of this has been in linux-next with no reported issues"

* tag 'driver-core-3.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (113 commits)
  kernfs: associate a new kernfs_node with its parent on creation
  kernfs: add struct dentry declaration in kernfs.h
  kernfs: fix get_active failure handling in kernfs_seq_*()
  Revert "kernfs: fix get_active failure handling in kernfs_seq_*()"
  Revert "kernfs: replace kernfs_node->u.completion with kernfs_root->deactivate_waitq"
  Revert "kernfs: remove KERNFS_ACTIVE_REF and add kernfs_lockdep()"
  Revert "kernfs: remove KERNFS_REMOVED"
  Revert "kernfs: restructure removal path to fix possible premature return"
  Revert "kernfs: invoke kernfs_unmap_bin_file() directly from __kernfs_remove()"
  Revert "kernfs: remove kernfs_addrm_cxt"
  Revert "kernfs: make kernfs_get_active() block if the node is deactivated but not removed"
  Revert "kernfs: implement kernfs_{de|re}activate[_self]()"
  Revert "kernfs, sysfs, driver-core: implement kernfs_remove_self() and its wrappers"
  Revert "pci: use device_remove_file_self() instead of device_schedule_callback()"
  Revert "scsi: use device_remove_file_self() instead of device_schedule_callback()"
  Revert "s390: use device_remove_file_self() instead of device_schedule_callback()"
  Revert "sysfs, driver-core: remove unused {sysfs|device}_schedule_callback_owner()"
  Revert "kernfs: remove unnecessary NULL check in __kernfs_remove()"
  kernfs: remove unnecessary NULL check in __kernfs_remove()
  drivers/base: provide an infrastructure for componentised subsystems
  ...
2014-01-20 15:49:44 -08:00
Linus Torvalds 897aea303f Merge branch 'core-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull core debug changes from Ingo Molnar:
 "Currently there are two methods to set the panic_timeout: via
  'panic=X' boot commandline option, or via /proc/sys/kernel/panic.

  This tree adds a third panic_timeout configuration method:
  configuration via Kconfig, via CONFIG_PANIC_TIMEOUT=X - useful to
  distros that generally want their kernel defaults to come with the
  .config.

  CONFIG_PANIC_TIMEOUT defaults to 0, which was the previous default
  value of panic_timeout.

  Doing that unearthed a few arch trickeries regarding arch-special
  panic_timeout values and related complications - hopefully all
  resolved to the satisfaction of everyone"

* 'core-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  powerpc: Clean up panic_timeout usage
  MIPS: Remove panic_timeout settings
  panic: Make panic_timeout configurable
2014-01-20 10:22:12 -08:00
Weilong Chen 82ef3d5d5f net: fix "queues" uevent between network namespaces
When I create a new namespace with 'ip netns add net0', or add/remove
new links in a namespace with 'ip link add/delete type veth', rx/tx
queues events can be got in all namespaces. That is because rx/tx queue
ktypes do not have namespace support, and their kobj parents are setted to
NULL. This patch is to fix it.

Reported-by: Libo Chen <chenlibo@huawei.com>
Signed-off-by: Libo Chen <chenlibo@huawei.com>
Signed-off-by: Weilong Chen <chenweilong@huawei.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-19 20:02:02 -08:00
David S. Miller 4180442058 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
	net/ipv4/tcp_metrics.c

Overlapping changes between the "don't create two tcp metrics objects
with the same key" race fix in net and the addition of the destination
address in the lookup key in net-next.

Minor overlapping changes in bnx2x driver.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-18 00:55:41 -08:00
Michael Dalton 03144b5869 lib: Ensure EWMA does not store wrong intermediate values
To ensure ewma_read() without a lock returns a valid but possibly
out of date average, modify ewma_add() by using ACCESS_ONCE to prevent
intermediate wrong values from being written to avg->internal.

Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Michael Dalton <mwdalton@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-16 23:46:06 -08:00
Hugh Dickins d1969a84dd percpu_counter: unbreak __percpu_counter_add()
Commit 74e72f894d ("lib/percpu_counter.c: fix __percpu_counter_add()")
looked very plausible, but its arithmetic was badly wrong: obvious once
you see the fix, but maddening to get there from the weird tmpfs ENOSPCs

Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Ming Lei <tom.leiming@gmail.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Shaohua Li <shli@fusionio.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Fan Du <fan.du@windriver.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-17 11:32:24 +11:00
Ming Lei 74e72f894d lib/percpu_counter.c: fix __percpu_counter_add()
__percpu_counter_add() may be called in softirq/hardirq handler (such
as, blk_mq_queue_exit() is typically called in hardirq/softirq handler),
so we need to call this_cpu_add()(irq safe helper) to update percpu
counter, otherwise counts may be lost.

This fixes the problem that 'rmmod null_blk' hangs in blk_cleanup_queue()
because of miscounting of request_queue->mq_usage_counter.

This patch is the v1 of previous one of "lib/percpu_counter.c:
disable local irq when updating percpu couter", and takes Andrew's
approach which may be more efficient for ARCHs(x86, s390) that
have optimized this_cpu_add().

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Shaohua Li <shli@fusionio.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Fan Du <fan.du@windriver.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-15 14:19:42 +07:00
Lubomir Rintel 8bc588e0e5 firewire: ohci: Turn remote DMA support into a module parameter
This makes it possible to debug kernel over FireWire without the need to
recompile it.

[Stefan R: changed description from "...0" to "...N"]

Cc: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Lubomir Rintel <lkundrak@v3.sk>
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
2014-01-12 18:54:38 +01:00
Bart Van Assche 9705710e40 kobject: Fix source code comment spelling
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-01-08 15:36:18 -08:00
Konrad Rzeszutek Wilk 0cb637bff8 swiotlb: Don't DoS us with 'swiotlb buffer is full' (v2)
There is no need for that so lets use ratelimiting.
Also add some extra information to be helpful.

Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
[v2: s/ld/zs on the printk]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2014-01-07 10:15:21 -05:00
Greg Kroah-Hartman eb4c69033f Revert "kobject: introduce kobj_completion"
This reverts commit eee0316497.

Jeff writes:
	I have no objections to reverting it. There were concerns from
	Al Viro that it'd be tough to get right by callers and I had
	assumed it got dropped after that. I had planned on using it in
	my btrfs sysfs exports patchset but came up with a better way.

Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-01-04 19:56:40 -08:00
Greg Kroah-Hartman 5bd2010fbe Merge 3.13-rc5 into staging-next
We want these fixes here to handle some merge issues.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-12-24 09:43:21 -08:00
Francesco Fusco 237217546d lib: hash: follow-up fixups for arch hash
This patch adds the include file to pull in __read_mostly on some
architectures e.g. ppc and also fixes up signatures in generic
asm.

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Francesco Fusco <ffusco@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-19 00:14:53 -05:00
David S. Miller 143c905494 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/intel/i40e/i40e_main.c
	drivers/net/macvtap.c

Both minor merge hassles, simple overlapping changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-18 16:42:06 -05:00
Francesco Fusco 71ae8aac3e lib: introduce arch optimized hash library
We introduce a new hashing library that is meant to be used in
the contexts where speed is more important than uniformity of the
hashed values. The hash library leverages architecture specific
implementation to achieve high performance and fall backs to
jhash() for the generic case.

On Intel-based x86 architectures, the library can exploit the crc32l
instruction, part of the Intel SSE4.2 instruction set, if the
instruction is supported by the processor. This implementation
is twice as fast as the jhash() implementation on an i7 processor.

Additional architectures, such as Arm64 provide instructions for
accelerating the computation of CRC, so they could be added as well
in follow-up work.

Signed-off-by: Francesco Fusco <ffusco@redhat.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Thomas Graf <tgraf@redhat.com>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-17 14:27:17 -05:00
Greg Kroah-Hartman d59abb9325 Merge branch 3.13-rc4 into usb-next 2013-12-16 08:46:03 -08:00
Tejun Heo 324a56e16e kernfs: s/sysfs_dirent/kernfs_node/ and rename its friends accordingly
kernfs has just been separated out from sysfs and we're already in
full conflict mode.  Nothing can make the situation any worse.  Let's
take the chance to name things properly.

This patch performs the following renames.

* s/sysfs_elem_dir/kernfs_elem_dir/
* s/sysfs_elem_symlink/kernfs_elem_symlink/
* s/sysfs_elem_attr/kernfs_elem_file/
* s/sysfs_dirent/kernfs_node/
* s/sd/kn/ in kernfs proper
* s/parent_sd/parent/
* s/target_sd/target/
* s/dir_sd/parent/
* s/to_sysfs_dirent()/rb_to_kn()/
* misc renames of local vars when they conflict with the above

Because md, mic and gpio dig into sysfs details, this patch ends up
modifying them.  All are sysfs_dirent renames and trivial.  While we
can avoid these by introducing a dummy wrapping struct sysfs_dirent
around kernfs_node, given the limited usage outside kernfs and sysfs
proper, I don't think such workaround is called for.

This patch is strictly rename only and doesn't introduce any
functional difference.

- mic / gpio renames were missing.  Spotted by kbuild test robot.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Neil Brown <neilb@suse.de>
Cc: Linus Walleij <linus.walleij@linaro.org>
Cc: Ashutosh Dixit <ashutosh.dixit@intel.com>
Cc: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-12-11 15:28:36 -08:00
Maurizio Lombardi 020d30f17f kobject: fix memory leak in kobject_set_name_vargs
If the call to kvasprintf fails then the old name of the object will be leaked,
this patch fixes the bug by restoring the old name before returning ENOMEM.

Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-12-08 18:19:15 -08:00
Ming Lei 0d6077f8b4 lib/scatterlist: export sg_miter_skip()
sg_copy_buffer() can't meet demand for some drrivers(such usb
mass storage), so we have to use the sg_miter_* APIs to access
sg buffer, then need export sg_miter_skip() for these drivers.

The API is needed for converting to sg_miter_* APIs in USB storage
driver for accessing sg buffer.

Acked-by: Andrew Morton <akpm@linux-foundation.org>
Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
Reviewed-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-12-08 17:56:37 -08:00
Bjorn Helgaas 35a5fe695b kobject: remove kset from sysfs immediately in kset_unregister()
There's no "unlink from sysfs" interface for ksets, so I think callers of
kset_unregister() expect the kset to be removed from sysfs immediately,
without waiting for the last reference to be released.

This patch makes the sysfs removal happen immediately, so the caller may
create a new kset with the same name as soon as kset_unregister() returns.
Without this, every caller has to call "kobject_del(&kset->kobj)" first
unless it knows it will never create a new kset with the same name.

This sometimes shows up on module unload and reload, where the reload fails
because it tries to create a kobject with the same name as one from the
original load that still exists.  CONFIG_DEBUG_KOBJECT_RELEASE=y makes this
problem easier to hit.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-12-07 21:20:11 -08:00
Bjorn Helgaas 89c86a64cd kobject: delay kobject release for random time
When CONFIG_DEBUG_KOBJECT_RELEASE=y, delay kobject release functions for a
random time between 1 and 8 seconds, which effectively changes the order in
which they're called.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-12-07 21:14:13 -08:00
David Howells 23fd78d764 KEYS: Fix multiple key add into associative array
If sufficient keys (or keyrings) are added into a keyring such that a node in
the associative array's tree overflows (each node has a capacity N, currently
16) and such that all N+1 keys have the same index key segment for that level
of the tree (the level'th nibble of the index key), then assoc_array_insert()
calls ops->diff_objects() to indicate at which bit position the two index keys
vary.

However, __key_link_begin() passes a NULL object to assoc_array_insert() with
the intention of supplying the correct pointer later before we commit the
change.  This means that keyring_diff_objects() is given a NULL pointer as one
of its arguments which it does not expect.  This results in an oops like the
attached.

With the previous patch to fix the keyring hash function, this can be forced
much more easily by creating a keyring and only adding keyrings to it.  Add any
other sort of key and a different insertion path is taken - all 16+1 objects
must want to cluster in the same node slot.

This can be tested by:

	r=`keyctl newring sandbox @s`
	for ((i=0; i<=16; i++)); do keyctl newring ring$i $r; done

This should work fine, but oopses when the 17th keyring is added.

Since ops->diff_objects() is always called with the first pointer pointing to
the object to be inserted (ie. the NULL pointer), we can fix the problem by
changing the to-be-inserted object pointer to point to the index key passed
into assoc_array_insert() instead.

Whilst we're at it, we also switch the arguments so that they are the same as
for ->compare_object().

BUG: unable to handle kernel NULL pointer dereference at 0000000000000088
IP: [<ffffffff81191ee4>] hash_key_type_and_desc+0x18/0xb0
...
RIP: 0010:[<ffffffff81191ee4>] hash_key_type_and_desc+0x18/0xb0
...
Call Trace:
 [<ffffffff81191f9d>] keyring_diff_objects+0x21/0xd2
 [<ffffffff811f09ef>] assoc_array_insert+0x3b6/0x908
 [<ffffffff811929a7>] __key_link_begin+0x78/0xe5
 [<ffffffff81191a2e>] key_create_or_update+0x17d/0x36a
 [<ffffffff81192e0a>] SyS_add_key+0x123/0x183
 [<ffffffff81400ddb>] tracesys+0xdd/0xe2

Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Stephen Gallagher <sgallagh@redhat.com>
2013-12-02 11:24:18 +00:00
Tejun Heo 93b2b8e4aa sysfs, kernfs: introduce kernfs_create_dir[_ns]()
Introduce kernfs interface to manipulate a directory which takes and
returns sysfs_dirents.

create_dir() is renamed to kernfs_create_dir_ns() and its argumantes
and return value are updated.  create_dir() usages are replaced with
kernfs_create_dir_ns() and sysfs_create_subdir() usages are replaced
with kernfs_create_dir().  Dup warnings are handled explicitly by
sysfs users of the kernfs interface.

sysfs_enable_ns() is renamed to kernfs_enable_ns().

This patch doesn't introduce any behavior changes.

v2: Dummy implementation for !CONFIG_SYSFS updated to return -ENOSYS.

v3: kernfs_enable_ns() added.

v4: Refreshed on top of "sysfs: drop kobj_ns_type handling, take #2"
    so that this patch removes sysfs_enable_ns().

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-11-29 17:20:13 -08:00
Greg Kroah-Hartman 45b4539309 Merge 3.13-rc2 into driver-core-next
We want those fixes in here as well.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-11-29 16:48:22 -08:00
Will Deacon 14058d20c1 lockref: include mutex.h rather than reinvent arch_mutex_cpu_relax
arch_mutex_cpu_relax is already conditionally defined in mutex.h, so
simply include that header rather than replicate the code here.

Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-11-27 20:37:33 -08:00
Tejun Heo c84a3b2779 sysfs: drop kobj_ns_type handling, take #2
The way namespace tags are implemented in sysfs is more complicated
than necessary.  As each tag is a pointer value and required to be
non-NULL under a namespace enabled parent, there's no need to record
separately what type each tag is.  If multiple namespace types are
needed, which currently aren't, we can simply compare the tag to a set
of allowed tags in the superblock assuming that the tags, being
pointers, won't have the same value across multiple types.

This patch rips out kobj_ns_type handling from sysfs.  sysfs now has
an enable switch to turn on namespace under a node.  If enabled, all
children are required to have non-NULL namespace tags and filtered
against the super_block's tag.

kobject namespace determination is now performed in
lib/kobject.c::create_dir() making sysfs_read_ns_type() unnecessary.
The sanity checks are also moved.  create_dir() is restructured to
ease such addition.  This removes most kobject namespace knowledge
from sysfs proper which will enable proper separation and layering of
sysfs.

This is the second try.  The first one was cb26a31157 ("sysfs: drop
kobj_ns_type handling") which tried to automatically enable namespace
if there are children with non-NULL namespace tags; however, it was
broken for symlinks as they should inherit the target's tag iff
namespace is enabled in the parent.  This led to namespace filtering
enabled incorrectly for wireless net class devices through phy80211
symlinks and thus network configuration failure.  a1212d278c
("Revert "sysfs: drop kobj_ns_type handling"") reverted the commit.

This shouldn't introduce any behavior changes, for real.

v2: Dummy implementation of sysfs_enable_ns() for !CONFIG_SYSFS was
    missing and caused build failure.  Reported by kbuild test robot.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Kay Sievers <kay@vrfy.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-11-27 13:01:03 -08:00
Jason Baron 5800dc3cff panic: Make panic_timeout configurable
The panic_timeout value can be set via the command line option
'panic=x', or via /proc/sys/kernel/panic, however that is not
sufficient when the panic occurs before we are able to set up
these values. Thus, add a CONFIG_PANIC_TIMEOUT so that we can
set the desired value from the .config.

The default panic_timeout value continues to be 0 - wait
forever. Also adds set_arch_panic_timeout(new_timeout,
arch_default_timeout), which is intended to be used by arches in
arch_setup(). The idea being that the new_timeout is only set if
the user hasn't changed from the arch_default_timeout.

Signed-off-by: Jason Baron <jbaron@akamai.com>
Cc: benh@kernel.crashing.org
Cc: paulus@samba.org
Cc: ralf@linux-mips.org
Cc: mpe@ellerman.id.au
Cc: felipe.contreras@gmail.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1a1674daec27c534df409697025ac568ebcee91e.1385418410.git.jbaron@akamai.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-11-26 12:12:26 +01:00
Linus Torvalds b0e3636f65 Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending
Pull SCSI target updates from Nicholas Bellinger:
 "Things have been quiet this round with mostly bugfixes, percpu
  conversions, and other minor iscsi-target conformance testing changes.

  The highlights include:

   - Add demo_mode_discovery attribute for iscsi-target (Thomas)
   - Convert tcm_fc(FCoE) to use percpu-ida pre-allocation
   - Add send completion interrupt coalescing for ib_isert
   - Convert target-core to use percpu-refcounting for se_lun
   - Fix mutex_trylock usage bug in iscsit_increment_maxcmdsn
   - tcm_loop updates (Hannes)
   - target-core ALUA cleanups + prep for v3.14 SCSI Referrals support (Hannes)

  v3.14 is currently shaping to be a busy development cycle in target
  land, with initial support for T10 Referrals and T10 DIF currently on
  the roadmap"

* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (40 commits)
  iscsi-target: chap auth shouldn't match username with trailing garbage
  iscsi-target: fix extract_param to handle buffer length corner case
  iscsi-target: Expose default_erl as TPG attribute
  target_core_configfs: split up ALUA supported states
  target_core_alua: Make supported states configurable
  target_core_alua: Store supported ALUA states
  target_core_alua: Rename ALUA_ACCESS_STATE_OPTIMIZED
  target_core_alua: spellcheck
  target core: rename (ex,im)plict -> (ex,im)plicit
  percpu-refcount: Add percpu-refcount.o to obj-y
  iscsi-target: Do not reject non-immediate CmdSNs exceeding MaxCmdSN
  iscsi-target: Convert iscsi_session statistics to atomic_long_t
  target: Convert se_device statistics to atomic_long_t
  target: Fix delayed Task Aborted Status (TAS) handling bug
  iscsi-target: Reject unsupported multi PDU text command sequence
  ib_isert: Avoid duplicate iscsit_increment_maxcmdsn call
  iscsi-target: Fix mutex_trylock usage in iscsit_increment_maxcmdsn
  target: Core does not need blkdev.h
  target: Pass through I/O topology for block backstores
  iser-target: Avoid using FRMR for single dma entry requests
  ...
2013-11-22 10:52:03 -08:00
Linus Torvalds 78dc53c422 Merge branch 'for-linus2' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull security subsystem updates from James Morris:
 "In this patchset, we finally get an SELinux update, with Paul Moore
  taking over as maintainer of that code.

  Also a significant update for the Keys subsystem, as well as
  maintenance updates to Smack, IMA, TPM, and Apparmor"

and since I wanted to know more about the updates to key handling,
here's the explanation from David Howells on that:

 "Okay.  There are a number of separate bits.  I'll go over the big bits
  and the odd important other bit, most of the smaller bits are just
  fixes and cleanups.  If you want the small bits accounting for, I can
  do that too.

   (1) Keyring capacity expansion.

        KEYS: Consolidate the concept of an 'index key' for key access
        KEYS: Introduce a search context structure
        KEYS: Search for auth-key by name rather than target key ID
        Add a generic associative array implementation.
        KEYS: Expand the capacity of a keyring

     Several of the patches are providing an expansion of the capacity of a
     keyring.  Currently, the maximum size of a keyring payload is one page.
     Subtract a small header and then divide up into pointers, that only gives
     you ~500 pointers on an x86_64 box.  However, since the NFS idmapper uses
     a keyring to store ID mapping data, that has proven to be insufficient to
     the cause.

     Whatever data structure I use to handle the keyring payload, it can only
     store pointers to keys, not the keys themselves because several keyrings
     may point to a single key.  This precludes inserting, say, and rb_node
     struct into the key struct for this purpose.

     I could make an rbtree of records such that each record has an rb_node
     and a key pointer, but that would use four words of space per key stored
     in the keyring.  It would, however, be able to use much existing code.

     I selected instead a non-rebalancing radix-tree type approach as that
     could have a better space-used/key-pointer ratio.  I could have used the
     radix tree implementation that we already have and insert keys into it by
     their serial numbers, but that means any sort of search must iterate over
     the whole radix tree.  Further, its nodes are a bit on the capacious side
     for what I want - especially given that key serial numbers are randomly
     allocated, thus leaving a lot of empty space in the tree.

     So what I have is an associative array that internally is a radix-tree
     with 16 pointers per node where the index key is constructed from the key
     type pointer and the key description.  This means that an exact lookup by
     type+description is very fast as this tells us how to navigate directly to
     the target key.

     I made the data structure general in lib/assoc_array.c as far as it is
     concerned, its index key is just a sequence of bits that leads to a
     pointer.  It's possible that someone else will be able to make use of it
     also.  FS-Cache might, for example.

   (2) Mark keys as 'trusted' and keyrings as 'trusted only'.

        KEYS: verify a certificate is signed by a 'trusted' key
        KEYS: Make the system 'trusted' keyring viewable by userspace
        KEYS: Add a 'trusted' flag and a 'trusted only' flag
        KEYS: Separate the kernel signature checking keyring from module signing

     These patches allow keys carrying asymmetric public keys to be marked as
     being 'trusted' and allow keyrings to be marked as only permitting the
     addition or linkage of trusted keys.

     Keys loaded from hardware during kernel boot or compiled into the kernel
     during build are marked as being trusted automatically.  New keys can be
     loaded at runtime with add_key().  They are checked against the system
     keyring contents and if their signatures can be validated with keys that
     are already marked trusted, then they are marked trusted also and can
     thus be added into the master keyring.

     Patches from Mimi Zohar make this usable with the IMA keyrings also.

   (3) Remove the date checks on the key used to validate a module signature.

        X.509: Remove certificate date checks

     It's not reasonable to reject a signature just because the key that it was
     generated with is no longer valid datewise - especially if the kernel
     hasn't yet managed to set the system clock when the first module is
     loaded - so just remove those checks.

   (4) Make it simpler to deal with additional X.509 being loaded into the kernel.

        KEYS: Load *.x509 files into kernel keyring
        KEYS: Have make canonicalise the paths of the X.509 certs better to deduplicate

     The builder of the kernel now just places files with the extension ".x509"
     into the kernel source or build trees and they're concatenated by the
     kernel build and stuffed into the appropriate section.

   (5) Add support for userspace kerberos to use keyrings.

        KEYS: Add per-user_namespace registers for persistent per-UID kerberos caches
        KEYS: Implement a big key type that can save to tmpfs

     Fedora went to, by default, storing kerberos tickets and tokens in tmpfs.
     We looked at storing it in keyrings instead as that confers certain
     advantages such as tickets being automatically deleted after a certain
     amount of time and the ability for the kernel to get at these tokens more
     easily.

     To make this work, two things were needed:

     (a) A way for the tickets to persist beyond the lifetime of all a user's
         sessions so that cron-driven processes can still use them.

         The problem is that a user's session keyrings are deleted when the
         session that spawned them logs out and the user's user keyring is
         deleted when the UID is deleted (typically when the last log out
         happens), so neither of these places is suitable.

         I've added a system keyring into which a 'persistent' keyring is
         created for each UID on request.  Each time a user requests their
         persistent keyring, the expiry time on it is set anew.  If the user
         doesn't ask for it for, say, three days, the keyring is automatically
         expired and garbage collected using the existing gc.  All the kerberos
         tokens it held are then also gc'd.

     (b) A key type that can hold really big tickets (up to 1MB in size).

         The problem is that Active Directory can return huge tickets with lots
         of auxiliary data attached.  We don't, however, want to eat up huge
         tracts of unswappable kernel space for this, so if the ticket is
         greater than a certain size, we create a swappable shmem file and dump
         the contents in there and just live with the fact we then have an
         inode and a dentry overhead.  If the ticket is smaller than that, we
         slap it in a kmalloc()'d buffer"

* 'for-linus2' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (121 commits)
  KEYS: Fix keyring content gc scanner
  KEYS: Fix error handling in big_key instantiation
  KEYS: Fix UID check in keyctl_get_persistent()
  KEYS: The RSA public key algorithm needs to select MPILIB
  ima: define '_ima' as a builtin 'trusted' keyring
  ima: extend the measurement list to include the file signature
  kernel/system_certificate.S: use real contents instead of macro GLOBAL()
  KEYS: fix error return code in big_key_instantiate()
  KEYS: Fix keyring quota misaccounting on key replacement and unlink
  KEYS: Fix a race between negating a key and reading the error set
  KEYS: Make BIG_KEYS boolean
  apparmor: remove the "task" arg from may_change_ptraced_domain()
  apparmor: remove parent task info from audit logging
  apparmor: remove tsk field from the apparmor_audit_struct
  apparmor: fix capability to not use the current task, during reporting
  Smack: Ptrace access check mode
  ima: provide hash algo info in the xattr
  ima: enable support for larger default filedata hash algorithms
  ima: define kernel parameter 'ima_template=' to change configured default
  ima: add Kconfig default measurement list template
  ...
2013-11-21 19:46:00 -08:00
Randy Dunlap fcd40d69af percpu-refcount: Add percpu-refcount.o to obj-y
Drop percpu_ida.o from lib-y since it is also listed in obj-y
and it doesn't need to be listed in both places.

Move percpu-refcount.o from lib-y to obj-y to fix build errors
in target_core_mod:

ERROR: "percpu_ref_cancel_init" [drivers/target/target_core_mod.ko] undefined!
ERROR: "percpu_ref_kill_and_confirm" [drivers/target/target_core_mod.ko] undefined!
ERROR: "percpu_ref_init" [drivers/target/target_core_mod.ko] undefined!

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-11-19 21:39:21 -08:00
Linus Torvalds 1ee2dcc224 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:
 "Mostly these are fixes for fallout due to merge window changes, as
  well as cures for problems that have been with us for a much longer
  period of time"

 1) Johannes Berg noticed two major deficiencies in our genetlink
    registration.  Some genetlink protocols we passing in constant
    counts for their ops array rather than something like
    ARRAY_SIZE(ops) or similar.  Also, some genetlink protocols were
    using fixed IDs for their multicast groups.

    We have to retain these fixed IDs to keep existing userland tools
    working, but reserve them so that other multicast groups used by
    other protocols can not possibly conflict.

    In dealing with these two problems, we actually now use less state
    management for genetlink operations and multicast groups.

 2) When configuring interface hardware timestamping, fix several
    drivers that simply do not validate that the hwtstamp_config value
    is one the driver actually supports.  From Ben Hutchings.

 3) Invalid memory references in mwifiex driver, from Amitkumar Karwar.

 4) In dev_forward_skb(), set the skb->protocol in the right order
    relative to skb_scrub_packet().  From Alexei Starovoitov.

 5) Bridge erroneously fails to use the proper wrapper functions to make
    calls to netdev_ops->ndo_vlan_rx_{add,kill}_vid.  Fix from Toshiaki
    Makita.

 6) When detaching a bridge port, make sure to flush all VLAN IDs to
    prevent them from leaking, also from Toshiaki Makita.

 7) Put in a compromise for TCP Small Queues so that deep queued devices
    that delay TX reclaim non-trivially don't have such a performance
    decrease.  One particularly problematic area is 802.11 AMPDU in
    wireless.  From Eric Dumazet.

 8) Fix crashes in tcp_fastopen_cache_get(), we can see NULL socket dsts
    here.  Fix from Eric Dumzaet, reported by Dave Jones.

 9) Fix use after free in ipv6 SIT driver, from Willem de Bruijn.

10) When computing mergeable buffer sizes, virtio-net fails to take the
    virtio-net header into account.  From Michael Dalton.

11) Fix seqlock deadlock in ip4_datagram_connect() wrt.  statistic
    bumping, this one has been with us for a while.  From Eric Dumazet.

12) Fix NULL deref in the new TIPC fragmentation handling, from Erik
    Hugne.

13) 6lowpan bit used for traffic classification was wrong, from Jukka
    Rissanen.

14) macvlan has the same issue as normal vlans did wrt.  propagating LRO
    disabling down to the real device, fix it the same way.  From Michal
    Kubecek.

15) CPSW driver needs to soft reset all slaves during suspend, from
    Daniel Mack.

16) Fix small frame pacing in FQ packet scheduler, from Eric Dumazet.

17) The xen-netfront RX buffer refill timer isn't properly scheduled on
    partial RX allocation success, from Ma JieYue.

18) When ipv6 ping protocol support was added, the AF_INET6 protocol
    initialization cleanup path on failure was borked a little.  Fix
    from Vlad Yasevich.

19) If a socket disconnects during a read/recvmsg/recvfrom/etc that
    blocks we can do the wrong thing with the msg_name we write back to
    userspace.  From Hannes Frederic Sowa.  There is another fix in the
    works from Hannes which will prevent future problems of this nature.

20) Fix route leak in VTI tunnel transmit, from Fan Du.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (106 commits)
  genetlink: make multicast groups const, prevent abuse
  genetlink: pass family to functions using groups
  genetlink: add and use genl_set_err()
  genetlink: remove family pointer from genl_multicast_group
  genetlink: remove genl_unregister_mc_group()
  hsr: don't call genl_unregister_mc_group()
  quota/genetlink: use proper genetlink multicast APIs
  drop_monitor/genetlink: use proper genetlink multicast APIs
  genetlink: only pass array to genl_register_family_with_ops()
  tcp: don't update snd_nxt, when a socket is switched from repair mode
  atm: idt77252: fix dev refcnt leak
  xfrm: Release dst if this dst is improper for vti tunnel
  netlink: fix documentation typo in netlink_set_err()
  be2net: Delete secondary unicast MAC addresses during be_close
  be2net: Fix unconditional enabling of Rx interface options
  net, virtio_net: replace the magic value
  ping: prevent NULL pointer dereference on write to msg_name
  bnx2x: Prevent "timeout waiting for state X"
  bnx2x: prevent CFC attention
  bnx2x: Prevent panic during DMAE timeout
  ...
2013-11-19 15:50:47 -08:00
Linus Torvalds eda670c626 Features:
- SWIOTLB has tracing added when doing bounce buffer.
  - Xen ARM/ARM64 can use Xen-SWIOTLB. This work allows Linux to
    safely program real devices for DMA operations when running as
    a guest on Xen on ARM, without IOMMU support.*1
  - xen_raw_printk works with PVHVM guests if needed.
 Bug-fixes:
  - Make memory ballooning work under HVM with large MMIO region.
  - Inform hypervisor of MCFG regions found in ACPI DSDT.
  - Remove deprecated IRQF_DISABLED.
  - Remove deprecated __cpuinit.
 
 [*1]:
 "On arm and arm64 all Xen guests, including dom0, run with second stage
 translation enabled. As a consequence when dom0 programs a device for a
 DMA operation is going to use (pseudo) physical addresses instead
 machine addresses. This work introduces two trees to track physical to
 machine and machine to physical mappings of foreign pages. Local pages
 are assumed mapped 1:1 (physical address == machine address).  It
 enables the SWIOTLB-Xen driver on ARM and ARM64, so that Linux can
 translate physical addresses to machine addresses for dma operations
 when necessary. " (Stefano).
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.15 (GNU/Linux)
 
 iQEcBAABAgAGBQJSgS86AAoJEFjIrFwIi8fJpY4H/R2gke1A1p9UvTwbkaDhgPs/
 u/mkI6aH+ktgvu5QZNprki660uydtc4Ck7y8leeLGYw+ed1Ys559SJhRc/x8jBYZ
 Hh2chnplld0LAjSpdIDTTePArE1xBo4Gz+fT0zc5cVh0leJwOXn92Kx8N5AWD/T3
 gwH4Ok4K1dzZBIls7imM2AM/L1xcApcx3Dl/QpNcoePQtR4yLuPWMUbb3LM8pbUY
 0B6ZVN4GOhtJ84z8HRKnh4uMnBYmhmky6laTlHVa6L+j1fv7aAPCdNbePjIt/Pvj
 HVYB1O/ht73yHw0zGfK6lhoGG8zlu+Q7sgiut9UsGZZfh34+BRKzNTypqJ3ezQo=
 =xc43
 -----END PGP SIGNATURE-----

Merge tag 'stable/for-linus-3.13-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull Xen updates from Konrad Rzeszutek Wilk:
 "This has tons of fixes and two major features which are concentrated
  around the Xen SWIOTLB library.

  The short <blurb> is that the tracing facility (just one function) has
  been added to SWIOTLB to make it easier to track I/O progress.
  Additionally under Xen and ARM (32 & 64) the Xen-SWIOTLB driver
  "is used to translate physical to machine and machine to physical
  addresses of foreign[guest] pages for DMA operations" (Stefano) when
  booting under hardware without proper IOMMU.

  There are also bug-fixes, cleanups, compile warning fixes, etc.

  The commit times for some of the commits is a bit fresh - that is b/c
  we wanted to make sure we have the Ack's from the ARM folks - which
  with the string of back-to-back conferences took a bit of time.  Rest
  assured - the code has been stewing in #linux-next for some time.

  Features:
   - SWIOTLB has tracing added when doing bounce buffer.
   - Xen ARM/ARM64 can use Xen-SWIOTLB.  This work allows Linux to
     safely program real devices for DMA operations when running as a
     guest on Xen on ARM, without IOMMU support. [*1]
   - xen_raw_printk works with PVHVM guests if needed.

  Bug-fixes:
   - Make memory ballooning work under HVM with large MMIO region.
   - Inform hypervisor of MCFG regions found in ACPI DSDT.
   - Remove deprecated IRQF_DISABLED.
   - Remove deprecated __cpuinit.

  [*1]:
  "On arm and arm64 all Xen guests, including dom0, run with second
   stage translation enabled.  As a consequence when dom0 programs a
   device for a DMA operation is going to use (pseudo) physical
   addresses instead machine addresses.  This work introduces two trees
   to track physical to machine and machine to physical mappings of
   foreign pages.  Local pages are assumed mapped 1:1 (physical address
   == machine address).  It enables the SWIOTLB-Xen driver on ARM and
   ARM64, so that Linux can translate physical addresses to machine
   addresses for dma operations when necessary.  " (Stefano)"

* tag 'stable/for-linus-3.13-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: (32 commits)
  xen/arm: pfn_to_mfn and mfn_to_pfn return the argument if nothing is in the p2m
  arm,arm64/include/asm/io.h: define struct bio_vec
  swiotlb-xen: missing include dma-direction.h
  pci-swiotlb-xen: call pci_request_acs only ifdef CONFIG_PCI
  arm: make SWIOTLB available
  xen: delete new instances of added __cpuinit
  xen/balloon: Set balloon's initial state to number of existing RAM pages
  xen/mcfg: Call PHYSDEVOP_pci_mmcfg_reserved for MCFG areas.
  xen: remove deprecated IRQF_DISABLED
  x86/xen: remove deprecated IRQF_DISABLED
  swiotlb-xen: fix error code returned by xen_swiotlb_map_sg_attrs
  swiotlb-xen: static inline xen_phys_to_bus, xen_bus_to_phys, xen_virt_to_bus and range_straddles_page_boundary
  grant-table: call set_phys_to_machine after mapping grant refs
  arm,arm64: do not always merge biovec if we are running on Xen
  swiotlb: print a warning when the swiotlb is full
  swiotlb-xen: use xen_dma_map/unmap_page, xen_dma_sync_single_for_cpu/device
  xen: introduce xen_dma_map/unmap_page and xen_dma_sync_single_for_cpu/device
  tracing/events: Fix swiotlb tracepoint creation
  swiotlb-xen: use xen_alloc/free_coherent_pages
  xen: introduce xen_alloc/free_coherent_pages
  ...
2013-11-15 13:34:37 +09:00
Lars-Peter Clausen a019e48cfb kfifo: kfifo_copy_{to,from}_user: fix copied bytes calculation
'copied' and 'len' are in bytes, while 'ret' is in elements, so we need to
multiply 'ret' with the size of one element to get the correct result.

Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Cc: Stefani Seibold <stefani@seibold.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-11-15 09:32:23 +09:00
Andrew Morton 0791a6057c llists-move-llist_reverse_order-from-raid5-to-llistc-fix
fix comment typo, per Jan

Cc: Christoph Hellwig <hch@lst.de>
Cc: Jan Kara <jack@suse.cz>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-11-15 09:32:22 +09:00
Christoph Hellwig b89241e8cd llists: move llist_reverse_order from raid5 to llist.c
Make this useful helper available for other users.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-11-15 09:32:22 +09:00
Kees Cook 9196436ab2 vsprintf: ignore %n again
This ignores %n in printf again, as was originally documented.
Implementing %n poses a greater security risk than utility, so it should
stay ignored.  To help anyone attempting to use %n, a warning will be
emitted if it is encountered.

Based on an earlier patch by Joe Perches.

Because %n was designed to write to pointers on the stack, it has been
frequently used as an attack vector when bugs are found that leak
user-controlled strings into functions that ultimately process format
strings.  While this class of bug can still be turned into an
information leak, removing %n eliminates the common method of elevating
such a bug into an arbitrary kernel memory writing primitive,
significantly reducing the danger of this class of bug.

For seq_file users that need to know the length of a written string for
padding, please see seq_setwidth() and seq_pad() instead.

Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Joe Perches <joe@perches.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-11-15 09:32:20 +09:00
Peter Zijlstra 57f4257eae lockref: use BLOATED_SPINLOCKS to avoid explicit config dependencies
Avoid the fragile Kconfig construct guestimating spinlock_t sizes; use a
friendly compile-time test to determine this.

[kirill.shutemov@linux.intel.com: drop CONFIG_CMPXCHG_LOCKREF]
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-11-15 09:32:20 +09:00
Daniel Borkmann 0125737acc random32: use msecs_to_jiffies for reseed timer
Use msecs_to_jiffies, for these calculations as different HZ
considerations are taken into account for conversion of the timer
shot, and also it makes the code more readable.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-14 16:06:02 -05:00
Daniel Borkmann 66b251422b random32: add __init prefix to prandom_start_seed_timer
We only call that in functions annotated with __init, so add __init
prefix in prandom_start_seed_timer() as well, so that the kernel can
make use of this hint and we can possibly free up resources after it's
usage. And since it's an internal function rename it to
__prandom_start_seed_timer().

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-14 16:06:02 -05:00
Linus Torvalds 5e30025a31 Merge branch 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull core locking changes from Ingo Molnar:
 "The biggest changes:

   - add lockdep support for seqcount/seqlocks structures, this
     unearthed both bugs and required extra annotation.

   - move the various kernel locking primitives to the new
     kernel/locking/ directory"

* 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (21 commits)
  block: Use u64_stats_init() to initialize seqcounts
  locking/lockdep: Mark __lockdep_count_forward_deps() as static
  lockdep/proc: Fix lock-time avg computation
  locking/doc: Update references to kernel/mutex.c
  ipv6: Fix possible ipv6 seqlock deadlock
  cpuset: Fix potential deadlock w/ set_mems_allowed
  seqcount: Add lockdep functionality to seqcount/seqlock structures
  net: Explicitly initialize u64_stats_sync structures for lockdep
  locking: Move the percpu-rwsem code to kernel/locking/
  locking: Move the lglocks code to kernel/locking/
  locking: Move the rwsem code to kernel/locking/
  locking: Move the rtmutex code to kernel/locking/
  locking: Move the semaphore core to kernel/locking/
  locking: Move the spinlock code to kernel/locking/
  locking: Move the lockdep code to kernel/locking/
  locking: Move the mutex code to kernel/locking/
  hung_task debugging: Add tracepoint to report the hang
  x86/locking/kconfig: Update paravirt spinlock Kconfig description
  lockstat: Report avg wait and hold times
  lockdep, x86/alternatives: Drop ancient lockdep fixup message
  ...
2013-11-14 16:30:30 +09:00
Linus Torvalds 0910c0bdf7 Merge branch 'for-3.13/core' of git://git.kernel.dk/linux-block
Pull block IO core updates from Jens Axboe:
 "This is the pull request for the core changes in the block layer for
  3.13.  It contains:

   - The new blk-mq request interface.

     This is a new and more scalable queueing model that marries the
     best part of the request based interface we currently have (which
     is fully featured, but scales poorly) and the bio based "interface"
     which the new drivers for high IOPS devices end up using because
     it's much faster than the request based one.

     The bio interface has no block layer support, since it taps into
     the stack much earlier.  This means that drivers end up having to
     implement a lot of functionality on their own, like tagging,
     timeout handling, requeue, etc.  The blk-mq interface provides all
     these.  Some drivers even provide a switch to select bio or rq and
     has code to handle both, since things like merging only works in
     the rq model and hence is faster for some workloads.  This is a
     huge mess.  Conversion of these drivers nets us a substantial code
     reduction.  Initial results on converting SCSI to this model even
     shows an 8x improvement on single queue devices.  So while the
     model was intended to work on the newer multiqueue devices, it has
     substantial improvements for "classic" hardware as well.  This code
     has gone through extensive testing and development, it's now ready
     to go.  A pull request is coming to convert virtio-blk to this
     model will be will be coming as well, with more drivers scheduled
     for 3.14 conversion.

   - Two blktrace fixes from Jan and Chen Gang.

   - A plug merge fix from Alireza Haghdoost.

   - Conversion of __get_cpu_var() from Christoph Lameter.

   - Fix for sector_div() with 64-bit divider from Geert Uytterhoeven.

   - A fix for a race between request completion and the timeout
     handling from Jeff Moyer.  This is what caused the merge conflict
     with blk-mq/core, in case you are looking at that.

   - A dm stacking fix from Mike Snitzer.

   - A code consolidation fix and duplicated code removal from Kent
     Overstreet.

   - A handful of block bug fixes from Mikulas Patocka, fixing a loop
     crash and memory corruption on blk cg.

   - Elevator switch bug fix from Tomoki Sekiyama.

  A heads-up that I had to rebase this branch.  Initially the immutable
  bio_vecs had been queued up for inclusion, but a week later, it became
  clear that it wasn't fully cooked yet.  So the decision was made to
  pull this out and postpone it until 3.14.  It was a straight forward
  rebase, just pruning out the immutable series and the later fixes of
  problems with it.  The rest of the patches applied directly and no
  further changes were made"

* 'for-3.13/core' of git://git.kernel.dk/linux-block: (31 commits)
  block: replace IS_ERR and PTR_ERR with PTR_ERR_OR_ZERO
  block: replace IS_ERR and PTR_ERR with PTR_ERR_OR_ZERO
  block: Do not call sector_div() with a 64-bit divisor
  kernel: trace: blktrace: remove redundent memcpy() in compat_blk_trace_setup()
  block: Consolidate duplicated bio_trim() implementations
  block: Use rw_copy_check_uvector()
  block: Enable sysfs nomerge control for I/O requests in the plug list
  block: properly stack underlying max_segment_size to DM device
  elevator: acquire q->sysfs_lock in elevator_change()
  elevator: Fix a race in elevator switching and md device initialization
  block: Replace __get_cpu_var uses
  bdi: test bdi_init failure
  block: fix a probe argument to blk_register_region
  loop: fix crash if blk_alloc_queue fails
  blk-core: Fix memory corruption if blkcg_init_queue fails
  block: fix race between request completion and timeout handling
  blktrace: Send BLK_TN_PROCESS events to all running traces
  blk-mq: don't disallow request merges for req->special being set
  blk-mq: mq plug list breakage
  blk-mq: fix for flush deadlock
  ...
2013-11-14 12:08:14 +09:00
Linus Torvalds 42a2d923cc Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:

 1) The addition of nftables.  No longer will we need protocol aware
    firewall filtering modules, it can all live in userspace.

    At the core of nftables is a, for lack of a better term, virtual
    machine that executes byte codes to inspect packet or metadata
    (arriving interface index, etc.) and make verdict decisions.

    Besides support for loading packet contents and comparing them, the
    interpreter supports lookups in various datastructures as
    fundamental operations.  For example sets are supports, and
    therefore one could create a set of whitelist IP address entries
    which have ACCEPT verdicts attached to them, and use the appropriate
    byte codes to do such lookups.

    Since the interpreted code is composed in userspace, userspace can
    do things like optimize things before giving it to the kernel.

    Another major improvement is the capability of atomically updating
    portions of the ruleset.  In the existing netfilter implementation,
    one has to update the entire rule set in order to make a change and
    this is very expensive.

    Userspace tools exist to create nftables rules using existing
    netfilter rule sets, but both kernel implementations will need to
    co-exist for quite some time as we transition from the old to the
    new stuff.

    Kudos to Patrick McHardy, Pablo Neira Ayuso, and others who have
    worked so hard on this.

 2) Daniel Borkmann and Hannes Frederic Sowa made several improvements
    to our pseudo-random number generator, mostly used for things like
    UDP port randomization and netfitler, amongst other things.

    In particular the taus88 generater is updated to taus113, and test
    cases are added.

 3) Support 64-bit rates in HTB and TBF schedulers, from Eric Dumazet
    and Yang Yingliang.

 4) Add support for new 577xx tigon3 chips to tg3 driver, from Nithin
    Sujir.

 5) Fix two fatal flaws in TCP dynamic right sizing, from Eric Dumazet,
    Neal Cardwell, and Yuchung Cheng.

 6) Allow IP_TOS and IP_TTL to be specified in sendmsg() ancillary
    control message data, much like other socket option attributes.
    From Francesco Fusco.

 7) Allow applications to specify a cap on the rate computed
    automatically by the kernel for pacing flows, via a new
    SO_MAX_PACING_RATE socket option.  From Eric Dumazet.

 8) Make the initial autotuned send buffer sizing in TCP more closely
    reflect actual needs, from Eric Dumazet.

 9) Currently early socket demux only happens for TCP sockets, but we
    can do it for connected UDP sockets too.  Implementation from Shawn
    Bohrer.

10) Refactor inet socket demux with the goal of improving hash demux
    performance for listening sockets.  With the main goals being able
    to use RCU lookups on even request sockets, and eliminating the
    listening lock contention.  From Eric Dumazet.

11) The bonding layer has many demuxes in it's fast path, and an RCU
    conversion was started back in 3.11, several changes here extend the
    RCU usage to even more locations.  From Ding Tianhong and Wang
    Yufen, based upon suggestions by Nikolay Aleksandrov and Veaceslav
    Falico.

12) Allow stackability of segmentation offloads to, in particular, allow
    segmentation offloading over tunnels.  From Eric Dumazet.

13) Significantly improve the handling of secret keys we input into the
    various hash functions in the inet hashtables, TCP fast open, as
    well as syncookies.  From Hannes Frederic Sowa.  The key fundamental
    operation is "net_get_random_once()" which uses static keys.

    Hannes even extended this to ipv4/ipv6 fragmentation handling and
    our generic flow dissector.

14) The generic driver layer takes care now to set the driver data to
    NULL on device removal, so it's no longer necessary for drivers to
    explicitly set it to NULL any more.  Many drivers have been cleaned
    up in this way, from Jingoo Han.

15) Add a BPF based packet scheduler classifier, from Daniel Borkmann.

16) Improve CRC32 interfaces and generic SKB checksum iterators so that
    SCTP's checksumming can more cleanly be handled.  Also from Daniel
    Borkmann.

17) Add a new PMTU discovery mode, IP_PMTUDISC_INTERFACE, which forces
    using the interface MTU value.  This helps avoid PMTU attacks,
    particularly on DNS servers.  From Hannes Frederic Sowa.

18) Use generic XPS for transmit queue steering rather than internal
    (re-)implementation in virtio-net.  From Jason Wang.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1622 commits)
  random32: add test cases for taus113 implementation
  random32: upgrade taus88 generator to taus113 from errata paper
  random32: move rnd_state to linux/random.h
  random32: add prandom_reseed_late() and call when nonblocking pool becomes initialized
  random32: add periodic reseeding
  random32: fix off-by-one in seeding requirement
  PHY: Add RTL8201CP phy_driver to realtek
  xtsonic: add missing platform_set_drvdata() in xtsonic_probe()
  macmace: add missing platform_set_drvdata() in mace_probe()
  ethernet/arc/arc_emac: add missing platform_set_drvdata() in arc_emac_probe()
  ipv6: protect for_each_sk_fl_rcu in mem_check with rcu_read_lock_bh
  vlan: Implement vlan_dev_get_egress_qos_mask as an inline.
  ixgbe: add warning when max_vfs is out of range.
  igb: Update link modes display in ethtool
  netfilter: push reasm skb through instead of original frag skbs
  ip6_output: fragment outgoing reassembled skb properly
  MAINTAINERS: mv643xx_eth: take over maintainership from Lennart
  net_sched: tbf: support of 64bit rates
  ixgbe: deleting dfwd stations out of order can cause null ptr deref
  ixgbe: fix build err, num_rx_queues is only available with CONFIG_RPS
  ...
2013-11-13 17:40:34 +09:00
Nicolin Chen 684f0d3d14 lib/genalloc: add a helper function for DMA buffer allocation
When using pool space for DMA buffer, there might be duplicated calling of
gen_pool_alloc() and gen_pool_virt_to_phys() in each implementation.

Thus it's better to add a simple helper function, a compatible one to the
common dma_alloc_coherent(), to save some code.

Signed-off-by: Nicolin Chen <b42378@freescale.com>
Cc: "Hans J. Koch" <hjk@hansjkoch.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Eric Miao <eric.y.miao@gmail.com>
Cc: Grant Likely <grant.likely@linaro.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Haojian Zhuang <haojian.zhuang@gmail.com>
Cc: Jaroslav Kysela <perex@perex.cz>
Cc: Kevin Hilman <khilman@deeprootsystems.com>
Cc: Liam Girdwood <lgirdwood@gmail.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Mauro Carvalho Chehab <m.chehab@samsung.com>
Cc: Rob Herring <rob.herring@calxeda.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Sekhar Nori <nsekhar@ti.com>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: Vinod Koul <vinod.koul@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-11-13 12:09:22 +09:00
Duan Jiong ff6092a8a4 lib/digsig.c: use ERR_CAST inlined function instead of ERR_PTR(PTR_ERR(...))
Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com>
Cc: James Morris <james.l.morris@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-11-13 12:09:22 +09:00
Olof Johansson c0d92a57a8 lib/vsprintf.c: document formats for dentry and struct file
Looks like these were added to Documentation/printk-formats.txt but
not the in-file table.

Signed-off-by: Olof Johansson <olof@lixom.net>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-11-13 12:09:22 +09:00
Xie XiuQi d3773ba13c lib/debugobjects.c: remove unnecessary work pending test
Remove unnecessary work pending test before calling schedule_work().  It
has been tested in queue_work_on() already.  No functional changed.

Signed-off-by: Xie XiuQi <xiexiuqi@huawei.com>
Reviewed-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-11-13 12:09:22 +09:00
Ryan Mallon 312b4e2269 vsprintf: check real user/group id for %pK
Some setuid binaries will allow reading of files which have read
permission by the real user id.  This is problematic with files which
use %pK because the file access permission is checked at open() time,
but the kptr_restrict setting is checked at read() time.  If a setuid
binary opens a %pK file as an unprivileged user, and then elevates
permissions before reading the file, then kernel pointer values may be
leaked.

This happens for example with the setuid pppd application on Ubuntu 12.04:

  $ head -1 /proc/kallsyms
  00000000 T startup_32

  $ pppd file /proc/kallsyms
  pppd: In file /proc/kallsyms: unrecognized option 'c1000000'

This will only leak the pointer value from the first line, but other
setuid binaries may leak more information.

Fix this by adding a check that in addition to the current process having
CAP_SYSLOG, that effective user and group ids are equal to the real ids.
If a setuid binary reads the contents of a file which uses %pK then the
pointer values will be printed as NULL if the real user is unprivileged.

Update the sysctl documentation to reflect the changes, and also correct
the documentation to state the kptr_restrict=0 is the default.

This is a only temporary solution to the issue.  The correct solution is
to do the permission check at open() time on files, and to replace %pK
with a function which checks the open() time permission.  %pK uses in
printk should be removed since no sane permission check can be done, and
instead protected by using dmesg_restrict.

Signed-off-by: Ryan Mallon <rmallon@gmail.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Joe Perches <joe@perches.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-11-13 12:09:14 +09:00
Greg Thelen 623fd8072c percpu: add test module for various percpu operations
Tests various percpu operations.

Enable with CONFIG_PERCPU_TEST=m.

Signed-off-by: Greg Thelen <gthelen@google.com>
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-11-13 12:09:11 +09:00
Mel Gorman c78e93630d mm: do not walk all of system memory during show_mem
It has been reported on very large machines that show_mem is taking almost
5 minutes to display information.  This is a serious problem if there is
an OOM storm.  The bulk of the cost is in show_mem doing a very expensive
PFN walk to give us the following information

  Total RAM:       Also available as totalram_pages
  Highmem pages:   Also available as totalhigh_pages
  Reserved pages:  Can be inferred from the zone structure
  Shared pages:    PFN walk required
  Unshared pages:  PFN walk required
  Quick pages:     Per-cpu walk required

Only the shared/unshared pages requires a full PFN walk but that
information is useless.  It is also inaccurate as page pins of unshared
pages would be accounted for as shared.  Even if the information was
accurate, I'm struggling to think how the shared/unshared information
could be useful for debugging OOM conditions.  Maybe it was useful before
rmap existed when reclaiming shared pages was costly but it is less
relevant today.

The PFN walk could be optimised a bit but why bother as the information is
useless.  This patch deletes the PFN walker and infers the total RAM,
highmem and reserved pages count from struct zone.  It omits the
shared/unshared page usage on the grounds that it is useless.  It also
corrects the reporting of HighMem as HighMem/MovableOnly as ZONE_MOVABLE
has similar problems to HighMem with respect to lowmem/highmem exhaustion.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Cc: David Rientjes <rientjes@google.com>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-11-13 12:09:09 +09:00
Linus Torvalds 39cf275a1a Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler changes from Ingo Molnar:
 "The main changes in this cycle are:

   - (much) improved CONFIG_NUMA_BALANCING support from Mel Gorman, Rik
     van Riel, Peter Zijlstra et al.  Yay!

   - optimize preemption counter handling: merge the NEED_RESCHED flag
     into the preempt_count variable, by Peter Zijlstra.

   - wait.h fixes and code reorganization from Peter Zijlstra

   - cfs_bandwidth fixes from Ben Segall

   - SMP load-balancer cleanups from Peter Zijstra

   - idle balancer improvements from Jason Low

   - other fixes and cleanups"

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (129 commits)
  ftrace, sched: Add TRACE_FLAG_PREEMPT_RESCHED
  stop_machine: Fix race between stop_two_cpus() and stop_cpus()
  sched: Remove unnecessary iteration over sched domains to update nr_busy_cpus
  sched: Fix asymmetric scheduling for POWER7
  sched: Move completion code from core.c to completion.c
  sched: Move wait code from core.c to wait.c
  sched: Move wait.c into kernel/sched/
  sched/wait: Fix __wait_event_interruptible_lock_irq_timeout()
  sched: Avoid throttle_cfs_rq() racing with period_timer stopping
  sched: Guarantee new group-entities always have weight
  sched: Fix hrtimer_cancel()/rq->lock deadlock
  sched: Fix cfs_bandwidth misuse of hrtimer_expires_remaining
  sched: Fix race on toggling cfs_bandwidth_used
  sched: Remove extra put_online_cpus() inside sched_setaffinity()
  sched/rt: Fix task_tick_rt() comment
  sched/wait: Fix build breakage
  sched/wait: Introduce prepare_to_wait_event()
  sched/wait: Add ___wait_cond_timeout() to wait_event*_timeout() too
  sched: Remove get_online_cpus() usage
  sched: Fix race in migrate_swap_stop()
  ...
2013-11-12 10:20:12 +09:00
Daniel Borkmann a6a9c0f1bf random32: add test cases for taus113 implementation
We generated a battery of 100 test cases from GSL taus113 implemention
and compare the results from a particular seed and a particular
iteration with our implementation in the kernel. We have verified on
32 and 64 bit machines that our taus113 kernel implementation gives
same results as GSL taus113 implementation:

  [    0.147370] prandom: seed boundary self test passed
  [    0.148078] prandom: 100 self tests passed

This is a Kconfig option that is disabled on default, just like the
crc32 init selftests in order to not unnecessary slow down boot process.
We also refactored out prandom_seed_very_weak() as it's now used in
multiple places in order to reduce redundant code.

GSL code we used for generating test cases:

  int i, j;
  srand(time(NULL));
  for (i = 0; i < 100; ++i) {
    int iteration = 500 + (rand() % 500);
    gsl_rng_default_seed = rand() + 1;
    gsl_rng *r = gsl_rng_alloc(gsl_rng_taus113);
    printf("\t{ %lu, ", gsl_rng_default_seed);
    for (j = 0; j < iteration - 1; ++j)
      gsl_rng_get(r);
    printf("%u, %lu },\n", iteration, gsl_rng_get(r));
    gsl_rng_free(r);
  }

Joint work with Hannes Frederic Sowa.

Cc: Florian Weimer <fweimer@redhat.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-11 14:32:15 -05:00
Daniel Borkmann a98814cef8 random32: upgrade taus88 generator to taus113 from errata paper
Since we use prandom*() functions quite often in networking code
i.e. in UDP port selection, netfilter code, etc, upgrade the PRNG
from Pierre L'Ecuyer's original paper "Maximally Equidistributed
Combined Tausworthe Generators", Mathematics of Computation, 65,
213 (1996), 203--213 to the version published in his errata paper [1].

The Tausworthe generator is a maximally-equidistributed generator,
that is fast and has good statistical properties [1].

The version presented there upgrades the 3 state LFSR to a 4 state
LFSR with increased periodicity from about 2^88 to 2^113. The
algorithm is presented in [1] by the very same author who also
designed the original algorithm in [2].

Also, by increasing the state, we make it a bit harder for attackers
to "guess" the PRNGs internal state. See also discussion in [3].

Now, as we use this sort of weak initialization discussed in [3]
only between core_initcall() until late_initcall() time [*] for
prandom32*() users, namely in prandom_init(), it is less relevant
from late_initcall() onwards as we overwrite seeds through
prandom_reseed() anyways with a seed source of higher entropy, that
is, get_random_bytes(). In other words, a exhaustive keysearch of
96 bit would be needed. Now, with the help of this patch, this
state-search increases further to 128 bit. Initialization needs
to make sure that s1 > 1, s2 > 7, s3 > 15, s4 > 127.

taus88 and taus113 algorithm is also part of GSL. I added a test
case in the next patch to verify internal behaviour of this patch
with GSL and ran tests with the dieharder 3.31.1 RNG test suite:

$ dieharder -g 052 -a -m 10 -s 1 -S 4137730333 #taus88
$ dieharder -g 054 -a -m 10 -s 1 -S 4137730333 #taus113

With this seed configuration, in order to compare both, we get
the following differences:

algorithm                 taus88           taus113
rands/second [**]         1.61e+08         1.37e+08
sts_serial(4, 1st run)    WEAK             PASSED
sts_serial(9, 2nd run)    WEAK             PASSED
rgb_lagged_sum(31)        WEAK             PASSED

We took out diehard_sums test as according to the authors it is
considered broken and unusable [4]. Despite that and the slight
decrease in performance (which is acceptable), taus113 here passes
all 113 tests (only rgb_minimum_distance_5 in WEAK, the rest PASSED).
In general, taus/taus113 is considered "very good" by the authors
of dieharder [5].

The papers [1][2] states a single warm-up step is sufficient by
running quicktaus once on each state to ensure proper initialization
of ~s_{0}:

Our selection of (s) according to Table 1 of [1] row 1 holds the
condition L - k <= r - s, that is,

  (32 32 32 32) - (31 29 28 25) <= (25 27 15 22) - (18 2 7 13)

with r = k - q and q = (6 2 13 3) as also stated by the paper.
So according to [2] we are safe with one round of quicktaus for
initialization. However we decided to include the warm-up phase
of the PRNG as done in GSL in every case as a safety net. We also
use the warm up phase to make the output of the RNG easier to
verify by the GSL output.

In prandom_init(), we also mix random_get_entropy() into it, just
like drivers/char/random.c does it, jiffies ^ random_get_entropy().
random-get_entropy() is get_cycles(). xor is entropy preserving so
it is fine if it is not implemented by some architectures.

Note, this PRNG is *not* used for cryptography in the kernel, but
rather as a fast PRNG for various randomizations i.e. in the
networking code, or elsewhere for debugging purposes, for example.

[*]: In order to generate some "sort of pseduo-randomness", since
get_random_bytes() is not yet available for us, we use jiffies and
initialize states s1 - s3 with a simple linear congruential generator
(LCG), that is x <- x * 69069; and derive s2, s3, from the 32bit
initialization from s1. So the above quote from [3] accounts only
for the time from core to late initcall, not afterwards.
[**] Single threaded run on MacBook Air w/ Intel Core i5-3317U

 [1] http://www.iro.umontreal.ca/~lecuyer/myftp/papers/tausme2.ps
 [2] http://www.iro.umontreal.ca/~lecuyer/myftp/papers/tausme.ps
 [3] http://thread.gmane.org/gmane.comp.encryption.general/12103/
 [4] http://code.google.com/p/dieharder/source/browse/trunk/libdieharder/diehard_sums.c?spec=svn490&r=490#20
 [5] http://www.phy.duke.edu/~rgb/General/dieharder.php

Joint work with Hannes Frederic Sowa.

Cc: Florian Weimer <fweimer@redhat.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-11 14:32:15 -05:00
Hannes Frederic Sowa 4af712e8df random32: add prandom_reseed_late() and call when nonblocking pool becomes initialized
The Tausworthe PRNG is initialized at late_initcall time. At that time the
entropy pool serving get_random_bytes is not filled sufficiently. This
patch adds an additional reseeding step as soon as the nonblocking pool
gets marked as initialized.

On some machines it might be possible that late_initcall gets called after
the pool has been initialized. In this situation we won't reseed again.

(A call to prandom_seed_late blocks later invocations of early reseed
attempts.)

Joint work with Daniel Borkmann.

Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-11 14:32:14 -05:00
Hannes Frederic Sowa 6d31920246 random32: add periodic reseeding
The current Tausworthe PRNG is never reseeded with truly random data after
the first attempt in late_initcall. As this PRNG is used for some critical
random data as e.g. UDP port randomization we should try better and reseed
the PRNG once in a while with truly random data from get_random_bytes().

When we reseed with prandom_seed we now make also sure to throw the first
output away. This suffices the reseeding procedure.

The delay calculation is based on a proposal from Eric Dumazet.

Joint work with Daniel Borkmann.

Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-11 14:32:14 -05:00
Daniel Borkmann 51c37a70aa random32: fix off-by-one in seeding requirement
For properly initialising the Tausworthe generator [1], we have
a strict seeding requirement, that is, s1 > 1, s2 > 7, s3 > 15.

Commit 697f8d0348 ("random32: seeding improvement") introduced
a __seed() function that imposes boundary checks proposed by the
errata paper [2] to properly ensure above conditions.

However, we're off by one, as the function is implemented as:
"return (x < m) ? x + m : x;", and called with __seed(X, 1),
__seed(X, 7), __seed(X, 15). Thus, an unwanted seed of 1, 7, 15
would be possible, whereas the lower boundary should actually
be of at least 2, 8, 16, just as GSL does. Fix this, as otherwise
an initialization with an unwanted seed could have the effect
that Tausworthe's PRNG properties cannot not be ensured.

Note that this PRNG is *not* used for cryptography in the kernel.

 [1] http://www.iro.umontreal.ca/~lecuyer/myftp/papers/tausme.ps
 [2] http://www.iro.umontreal.ca/~lecuyer/myftp/papers/tausme2.ps

Joint work with Hannes Frederic Sowa.

Fixes: 697f8d0348 ("random32: seeding improvement")
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-11 14:32:14 -05:00
Linus Torvalds 8b5baa460b The main feature of interest this time is quota updates. There are
some clean ups and some patches to use the new generic lru list
 code. There is still plenty of scope for some further changes in
 due course - faster lookups of quota structures is very much
 on the todo list. Also, a start has been made towards the more tricky
 issue of using the generic lru code with glocks, but that will
 have to be completed in a subsequent merge window.
 
 The other, more minor feature, is that there have been a number of
 performance patches which relate to block allocation. In particular
 they will improve performance when the disk is nearly full.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJSeQMSAAoJEMrg3m4a/8jS4/EP/AtkfsT+GATPmK2R3Yoy0Hrb
 4KucaloOtlUmSsVwTpzYGYZaqJo2D5BndJWw9jekPJOS4aB5CbE1ZYCMIKyuhhr4
 Y70kjfGlwK5hRSItPJ5gHnWkiTZzR65wBLj/+EBAFm2gF3UsJ4DJvLNvd8DP9SJC
 3IfYfqV6cPa7aPDhmEbdq5h0X5iSI+Ee/X2Z3a6fe7rErR1cD4iAEFEyPHa0aHgt
 TkrS32DodOn/J26PvUFq5MUb+El+Ul6EpeB3CC8UN0+pvucAKCMVy8+sPROTbViz
 mMRyWxHHHPDEEulFPWJlXW/tOAhHMHTPGbnWu4bH+iDudzOHif7E0tWklPR9bJAY
 4/1Fa4ILIxV02kdGBHxO74Vv/ir4gyLzzXCPbXOpxu5jMw3VdN9dp0/Uck+rmsya
 rC5Q/8vm4AUO7YYHUBEEaT7Nqp8HRRWGwv11Wdyqf3RQyC5jYHNEXYJkdMsODEae
 p+Ju/O6MfLw68IrG38RaGT4/tCBPonggsCVxwqNxqyDnjtNEpO/o3VjJMJ/3j2b5
 CCRx+9JYENT8EsdpIFWasfABy66xbKPTE9RiMUbk1e73julXLfzIMI3/Ol/Bj7rQ
 YLs5XYrKcz3QfYgMvNS3nMbI3w3nJrCnzdV7STps8nyaOa1oQndGxe9b7tDb+Fb8
 /acuYuQclbvsAkzvH4jc
 =qwXe
 -----END PGP SIGNATURE-----

Merge tag 'gfs2-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-nmw

Pull gfs2 updates from Steven Whitehouse:
 "The main feature of interest this time is quota updates.  There are
  some clean ups and some patches to use the new generic lru list code.

  There is still plenty of scope for some further changes in due course -
  faster lookups of quota structures is very much on the todo list.
  Also, a start has been made towards the more tricky issue of using the
  generic lru code with glocks, but that will have to be completed in a
  subsequent merge window.

  The other, more minor feature, is that there have been a number of
  performance patches which relate to block allocation.  In particular
  they will improve performance when the disk is nearly full"

* tag 'gfs2-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-nmw:
  GFS2: Use generic list_lru for quota
  GFS2: Rename quota qd_lru_lock qd_lock
  GFS2: Use reflink for quota data cache
  GFS2: Use lockref for glocks
  GFS2: Protect quota sync generation
  GFS2: Inline qd_trylock into gfs2_quota_unlock
  GFS2: Make two similar quota code fragments into a function
  GFS2: Remove obsolete quota tunable
  GFS2: Move gfs2_icbit_munge into quota.c
  GFS2: Speed up starting point selection for block allocation
  GFS2: Add allocation parameters structure
  GFS2: Clean up reservation removal
  GFS2: fix dentry leaks
  GFS2: new function gfs2_rbm_incr
  GFS2: Introduce rbm field bii
  GFS2: Do not reset flags on active reservations
  GFS2: introduce bi_blocks for optimization
  GFS2: optimize rbm_from_block wrt bi_start
  GFS2: d_splice_alias() can't return error
2013-11-11 07:11:00 +09:00
Konrad Rzeszutek Wilk e1d8f62ad4 Merge remote-tracking branch 'stefano/swiotlb-xen-9.1' into stable/for-linus-3.13
* stefano/swiotlb-xen-9.1:
  swiotlb-xen: fix error code returned by xen_swiotlb_map_sg_attrs
  swiotlb-xen: static inline xen_phys_to_bus, xen_bus_to_phys, xen_virt_to_bus and range_straddles_page_boundary
  grant-table: call set_phys_to_machine after mapping grant refs
  arm,arm64: do not always merge biovec if we are running on Xen
  swiotlb: print a warning when the swiotlb is full
  swiotlb-xen: use xen_dma_map/unmap_page, xen_dma_sync_single_for_cpu/device
  xen: introduce xen_dma_map/unmap_page and xen_dma_sync_single_for_cpu/device
  swiotlb-xen: use xen_alloc/free_coherent_pages
  xen: introduce xen_alloc/free_coherent_pages
  arm64/xen: get_dma_ops: return xen_dma_ops if we are running as xen_initial_domain
  arm/xen: get_dma_ops: return xen_dma_ops if we are running as xen_initial_domain
  swiotlb-xen: introduce xen_swiotlb_set_dma_mask
  xen/arm,arm64: enable SWIOTLB_XEN
  xen: make xen_create_contiguous_region return the dma address
  xen/x86: allow __set_phys_to_machine for autotranslate guests
  arm/xen,arm64/xen: introduce p2m
  arm64: define DMA_ERROR_CODE
  arm: make SWIOTLB available

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

Conflicts:
	arch/arm/include/asm/dma-mapping.h
	drivers/xen/swiotlb-xen.c

[Conflicts arose b/c "arm: make SWIOTLB available" v8 was in Stefano's
branch, while I had v9 + Ack from Russel. I also fixed up white-space
issues]
2013-11-08 16:10:48 -05:00
Konrad Rzeszutek Wilk bad97817de Linux 3.12-rc5
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.14 (GNU/Linux)
 
 iQEcBAABAgAGBQJSWyGgAAoJEHm+PkMAQRiGA8MH/35upHXImoRCsI5uC1qvHJtI
 QvQAhDFxoEXbFUKeaYgTfcM8q9FgnqnfjhLf8eYa4Q7tDZeqLXOE8bkI807mSZMl
 yECr3jcwlV+zyhV2MP/HdwTjzy25bwxLM3Zy43S7QROrYoMHZYznil/QPfyMATCJ
 XLPuXZC1FtuUen89n4BoDIuL8QaVrIR/zLqFklAQcdTcGpLHSOwFtH8gb2WaRLhv
 +4IikFRFgTNZiMR5tP0GPc6UH6TVTvRb4QKSqqa7J8OmfAIvOzAUdhqWSPOIwWwt
 Z/+JFxFDczAcNmpv4gE6jkgc2vR8CVeHsvh0j61RDSFObBWspwk337CSyUZxYSA=
 =w4VQ
 -----END PGP SIGNATURE-----

Merge tag 'v3.12-rc5' into stable/for-linus-3.13

Linux 3.12-rc5

Because the Stefano branch (for SWIOTLB ARM changes) is based on that.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

* tag 'v3.12-rc5': (550 commits)
  Linux 3.12-rc5
  watchdog: sunxi: Fix section mismatch
  watchdog: kempld_wdt: Fix bit mask definition
  watchdog: ts72xx_wdt: locking bug in ioctl
  ARM: exynos: dts: Update 5250 arch timer node with clock frequency
  parisc: let probe_kernel_read() capture access to page zero
  parisc: optimize variable initialization in do_page_fault
  parisc: fix interruption handler to respect pagefault_disable()
  parisc: mark parisc_terminate() noreturn and cold.
  parisc: remove unused syscall_ipi() function.
  parisc: kill SMP single function call interrupt
  parisc: Export flush_cache_page() (needed by lustre)
  vfs: allow O_PATH file descriptors for fstatfs()
  ext4: fix memory leak in xattr
  ARC: Ignore ptrace SETREGSET request for synthetic register "stop_pc"
  ALSA: hda - Sony VAIO Pro 13 (haswell) now has a working headset jack
  ALSA: hda - Add a headset mic model for ALC269 and friends
  ALSA: hda - Fix microphone for Sony VAIO Pro 13 (Haswell model)
  compiler/gcc4: Add quirk for 'asm goto' miscompilation bug
  Revert "i915: Update VGA arbiter support for newer devices"
  ...
2013-11-08 15:28:05 -05:00
Jens Axboe e37459b8e2 Merge branch 'blk-mq/core' into for-3.13/core
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Conflicts:
	block/blk-timeout.c
2013-11-08 09:08:12 -07:00
Nicholas Bellinger c9e8d128fe percpu-refcount: Add EXPORT_SYMBOL to use percpu_ref from modules
This patch adds EXPORT_SYMBOL() for percpu_ref_init(),
percpu_ref_cancel_init() and percpu_ref_kill_and_confirm() so
that percpu refcounting can be used by external modules.

Cc: Kent Overstreet <kmo@daterainc.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-11-07 12:14:52 -08:00
Linus Torvalds a1212d278c Revert "sysfs: drop kobj_ns_type handling"
This reverts commit cb26a31157.

It mysteriously causes NetworkManager to not find the wireless device
for me.  As far as I can tell, Tejun *meant* for this commit to not make
any semantic changes, but there clearly are some.  So revert it, taking
into account some of the calling convention changes that happened in
this area in subsequent commits.

Cc: Tejun Heo <tj@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-11-07 20:47:28 +09:00
Linus Torvalds 56edff7529 TTY/Serial driver updates for 3.13-rc1
Here's the big tty/serial driver update for 3.13-rc1.
 
 There's some more minor n_tty work here, but nothing like previous
 kernel releases.  Also some new driver ids, driver updates for new
 hardware, and other small things.
 
 All of this has been in linux-next for a while with no issues.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.19 (GNU/Linux)
 
 iEYEABECAAYFAlJ6xnEACgkQMUfUDdst+ylj1QCfaIzUNuFfzTmyB6K6iZrRNhCc
 WPgAnRLkkEpI/3rRo7jKwGQsuIuyybUu
 =r149
 -----END PGP SIGNATURE-----

Merge tag 'tty-3.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty

Pull tty/serial driver updates from Greg KH:
 "Here's the big tty/serial driver update for 3.13-rc1.

  There's some more minor n_tty work here, but nothing like previous
  kernel releases.  Also some new driver ids, driver updates for new
  hardware, and other small things.

  All of this has been in linux-next for a while with no issues"

* tag 'tty-3.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (84 commits)
  serial: omap: fix missing comma
  serial: sh-sci: Enable the driver on all ARM platforms
  serial: mfd: Staticize local symbols
  serial: omap: fix a few checkpatch warnings
  serial: omap: improve RS-485 performance
  mrst_max3110: fix unbalanced IRQ issue during resume
  serial: omap: Add support for optional wake-up
  serial: sirf: remove duplicate defines
  tty: xuartps: Fix build error when COMMON_CLK is not set
  tty: xuartps: Fix build error due to missing forward declaration
  tty: xuartps: Fix "may be used uninitialized" build warning
  serial: 8250_pci: add Pericom PCIe Serial board Support (12d8:7952/4/8) - Chip PI7C9X7952/4/8
  tty: xuartps: Update copyright information
  tty: xuartps: Implement suspend/resume callbacks
  tty: xuartps: Dynamically adjust to input frequency changes
  tty: xuartps: Updating set_baud_rate()
  tty: xuartps: Force enable the UART in xuartps_console_write
  tty: xuartps: support 64 byte FIFO size
  tty: xuartps: Add polled mode support for xuartps
  tty: xuartps: Implement BREAK detection, add SYSRQ support
  ...
2013-11-07 12:17:06 +09:00
Linus Torvalds 0324e74534 Driver Core / sysfs patches for 3.13-rc1
Here's the big driver core / sysfs update for 3.13-rc1.
 
 There's lots of dev_groups updates for different subsystems, as they all
 get slowly migrated over to the safe versions of the attribute groups
 (removing userspace races with the creation of the sysfs files.)  Also
 in here are some kobject updates, devres expansions, and the first round
 of Tejun's sysfs reworking to enable it to be used by other subsystems
 as a backend for an in-kernel filesystem.
 
 All of these have been in linux-next for a while with no reported
 issues.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.19 (GNU/Linux)
 
 iEYEABECAAYFAlJ6xAMACgkQMUfUDdst+yk1kQCfcHXhfnrvFZ5J/mDP509IzhNS
 ddEAoLEWoivtBppNsgrWqXpD1vi4UMsE
 =JmVW
 -----END PGP SIGNATURE-----

Merge tag 'driver-core-3.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core / sysfs patches from Greg KH:
 "Here's the big driver core / sysfs update for 3.13-rc1.

  There's lots of dev_groups updates for different subsystems, as they
  all get slowly migrated over to the safe versions of the attribute
  groups (removing userspace races with the creation of the sysfs
  files.) Also in here are some kobject updates, devres expansions, and
  the first round of Tejun's sysfs reworking to enable it to be used by
  other subsystems as a backend for an in-kernel filesystem.

  All of these have been in linux-next for a while with no reported
  issues"

* tag 'driver-core-3.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (83 commits)
  sysfs: rename sysfs_assoc_lock and explain what it's about
  sysfs: use generic_file_llseek() for sysfs_file_operations
  sysfs: return correct error code on unimplemented mmap()
  mdio_bus: convert bus code to use dev_groups
  device: Make dev_WARN/dev_WARN_ONCE print device as well as driver name
  sysfs: separate out dup filename warning into a separate function
  sysfs: move sysfs_hash_and_remove() to fs/sysfs/dir.c
  sysfs: remove unused sysfs_get_dentry() prototype
  sysfs: honor bin_attr.attr.ignore_lockdep
  sysfs: merge sysfs_elem_bin_attr into sysfs_elem_attr
  devres: restore zeroing behavior of devres_alloc()
  sysfs: fix sysfs_write_file for bin file
  input: gameport: convert bus code to use dev_groups
  input: serio: remove bus usage of dev_attrs
  input: serio: use DEVICE_ATTR_RO()
  i2o: convert bus code to use dev_groups
  memstick: convert bus code to use dev_groups
  tifm: convert bus code to use dev_groups
  virtio: convert bus code to use dev_groups
  ipack: convert bus code to use dev_groups
  ...
2013-11-07 11:42:15 +09:00
Peter Zijlstra 32cf7c3c94 locking: Move the percpu-rwsem code to kernel/locking/
Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/n/tip-52bjmtty46we26hbfd9sc9iy@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-11-06 09:24:22 +01:00
Peter Zijlstra ed428bfc3c locking: Move the rwsem code to kernel/locking/
Notably: changed lib/rwsem* targets from lib- to obj-, no idea about
the ramifications of that.

Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/n/tip-g0kynfh5feriwc6p3h6kpbw6@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-11-06 09:24:18 +01:00
Peter Zijlstra 60fc28746a locking: Move the spinlock code to kernel/locking/
Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/n/tip-b81ol0z3mon45m51o131yc9j@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-11-06 07:55:21 +01:00
Ingo Molnar c90423d1de Merge branch 'sched/core' into core/locking, to prepare the kernel/locking/ file move
Conflicts:
	kernel/Makefile

There are conflicts in kernel/Makefile due to file moving in the
scheduler tree - resolve them.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-11-06 07:50:37 +01:00
Daniel Borkmann 165148396d lib: crc32: reduce number of cases for crc32{, c}_combine
We can safely reduce the number of test cases by a tenth.
There is no particular need to run as many as we're running
now for crc32{,c}_combine, that gives us still ~8000 tests
we're doing if people run kernels with crc selftests enabled
which is perfectly fine.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-04 15:27:08 -05:00
Daniel Borkmann cc0ac19995 lib: crc32: conditionally resched when running testcases
Fengguang reports that when crc32 selftests are running on startup, on
some e.g. 32bit systems, we can get a CPU stall like "INFO: rcu_sched
self-detected stall on CPU { 0} (t=2101 jiffies g=4294967081 c=4294967080
q=41)". As this is not intended, add a cond_resched() at the end of a
test case to fix it. Introduced by efba721f63 ("lib: crc32: add test cases
for crc32{, c}_combine routines").

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-04 15:27:08 -05:00
David S. Miller 394efd19d5 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/emulex/benet/be.h
	drivers/net/netconsole.c
	net/bridge/br_private.h

Three mostly trivial conflicts.

The net/bridge/br_private.h conflict was a function signature (argument
addition) change overlapping with the extern removals from Joe Perches.

In drivers/net/netconsole.c we had one change adjusting a printk message
whilst another changed "printk(KERN_INFO" into "pr_info(".

Lastly, the emulex change was a new inline function addition overlapping
with Joe Perches's extern removals.

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-04 13:48:30 -05:00
Daniel Borkmann efba721f63 lib: crc32: add test cases for crc32{, c}_combine routines
We already have 100 test cases for crcs itself, so split the test
buffer with a-prio known checksums, and test crc of two blocks
against crc of the whole block for the same results.

Output/result with CONFIG_CRC32_SELFTEST=y:

  [    2.687095] crc32: CRC_LE_BITS = 64, CRC_BE BITS = 64
  [    2.687097] crc32: self tests passed, processed 225944 bytes in 278177 nsec
  [    2.687383] crc32c: CRC_LE_BITS = 64
  [    2.687385] crc32c: self tests passed, processed 225944 bytes in 141708 nsec
  [    7.336771] crc32_combine: 113072 self tests passed
  [   12.050479] crc32c_combine: 113072 self tests passed
  [   17.633089] alg: No test for crc32 (crc32-pclmul)

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-03 23:04:56 -05:00
Daniel Borkmann 6e95fcaa42 lib: crc32: add functionality to combine two crc32{, c}s in GF(2)
This patch adds a combinator to merge two or more crc32{,c}s
into a new one. This is useful for checksum computations of
fragmented skbs that use crc32/crc32c as checksums.

The arithmetics for combining both in the GF(2) was taken and
slightly modified from zlib. Only passing two crcs is insufficient
as two crcs and the length of the second piece is needed for
merging. The code is made generic, so that only polynomials
need to be passed for crc32_le resp. crc32c_le.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-03 23:04:56 -05:00
Daniel Borkmann d921e049a0 lib: crc32: clean up spacing in test cases
This is nothing more but a whitepace cleanup, as 80 chars is not a
hard but soft limit, and otherwise makes the test cases array really
look ugly. So fix it up.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-03 23:04:56 -05:00
Ingo Molnar fb10d5b7ef Merge branch 'linus' into sched/core
Resolve cherry-picking conflicts:

Conflicts:
	mm/huge_memory.c
	mm/memory.c
	mm/mprotect.c

See this upstream merge commit for more details:

  52469b4fcd Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-11-01 08:24:41 +01:00
Ming Lei 3d77b50c58 lib/scatterlist.c: don't flush_kernel_dcache_page on slab page
Commit b1adaf65ba ("[SCSI] block: add sg buffer copy helper
functions") introduces two sg buffer copy helpers, and calls
flush_kernel_dcache_page() on pages in SG list after these pages are
written to.

Unfortunately, the commit may introduce a potential bug:

 - Before sending some SCSI commands, kmalloc() buffer may be passed to
   block layper, so flush_kernel_dcache_page() can see a slab page
   finally

 - According to cachetlb.txt, flush_kernel_dcache_page() is only called
   on "a user page", which surely can't be a slab page.

 - ARCH's implementation of flush_kernel_dcache_page() may use page
   mapping information to do optimization so page_mapping() will see the
   slab page, then VM_BUG_ON() is triggered.

Aaro Koskinen reported the bug on ARM/kirkwood when DEBUG_VM is enabled,
and this patch fixes the bug by adding test of '!PageSlab(miter->page)'
before calling flush_kernel_dcache_page().

Signed-off-by: Ming Lei <ming.lei@canonical.com>
Reported-by: Aaro Koskinen <aaro.koskinen@iki.fi>
Tested-by: Simon Baatz <gmbnomis@gmail.com>
Cc: Russell King - ARM Linux <linux@arm.linux.org.uk>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Aaro Koskinen <aaro.koskinen@iki.fi>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Tejun Heo <tj@kernel.org>
Cc: "James E.J. Bottomley" <JBottomley@parallels.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: <stable@vger.kernel.org>	[3.2+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-10-31 16:58:13 -07:00
Linus Torvalds 2a999aa0a1 Kconfig: make KOBJECT_RELEASE debugging require timer debugging
Without the timer debugging, the delayed kobject release will just
result in undebuggable oopses if it triggers any latent bugs.  That
doesn't actually help debugging at all.

So make DEBUG_KOBJECT_RELEASE depend on DEBUG_OBJECTS_TIMERS to avoid
having people enable one without the other.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-10-29 08:33:36 -07:00
Shaohua Li 1dddc01af0 percpu_ida: add an API to return free tags
Add an API to return free tags, blk-mq-tag will use it.

Note, this just returns a snapshot of free tags number. blk-mq-tag has
two usages of it. One is for info output for diagnosis. The other is to
quickly check if there are free tags for request dispatch checking.
Neither requires very precise.

Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Shaohua Li <shli@fusionio.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2013-10-25 11:56:00 +01:00
Shaohua Li 7fc2ba17e8 percpu_ida: add percpu_ida_for_each_free
Add a new API to iterate free ids. blk-mq-tag will use it.

Note, this doesn't guarantee to iterate all free ids restrictly. Caller
should be aware of this. blk-mq uses it to do sanity check for request
timedout, so can tolerate the limitation.

Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Shaohua Li <shli@fusionio.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2013-10-25 11:55:59 +01:00
Shaohua Li e26b53d0b2 percpu_ida: make percpu_ida percpu size/batch configurable
Make percpu_ida percpu size/batch configurable. The block-mq-tag will
use it.

After block-mq uses percpu_ida to manage tags, performance is improved.
My test is done in a 2 sockets machine, 12 process cross the 2 sockets.
So if there is lock contention or ipi, should be stressed heavily.
Testing is done for null-blk.

hw_queue_depth	nopatch iops	patch iops
64		~800k/s		~1470k/s
2048		~4470k/s	~4340k/s

Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Shaohua Li <shli@fusionio.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2013-10-25 11:55:59 +01:00
Shaohua Li 098faf5805 percpu_counter: make APIs irq safe
In my usage, sometimes the percpu APIs are called with irq locked,
sometimes not. lockdep complains there is potential deadlock. Let's
always use percpucounter lock in irq safe way. There should be no
performance penality, as all those are slow code path.

Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Shaohua Li <shli@fusionio.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2013-10-25 11:55:59 +01:00
Stefano Stabellini 783d028104 swiotlb: print a warning when the swiotlb is full
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>

Changes in v7:
- use dev_warn instead of pr_warn.
2013-10-25 10:33:26 +00:00
Thierry Reding ce5be5a163 tracing/events: Fix swiotlb tracepoint creation
Tracepoints are only created when Xen support is enabled, but they are
also referenced within lib/swiotlb.c. So unless Xen support is enabled
the tracepoints will be missing, therefore causing builds to fail. Fix
this by moving the tracepoint creation to lib/swiotlb.c, which works
nicely because the Xen swiotlb support selects the generic swiotlb
support.

Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2013-10-24 10:37:06 -04:00
Greg Kroah-Hartman a7204d72db Merge 3.12-rc6 into driver-core-next
We want these fixes here too.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-19 13:05:38 -07:00
Matias Bjorling 5e9dd373de percpu_refcount: export symbols
Export the interface to be used within modules.

Signed-off-by: Matias Bjorling <m@bjorling.me>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-10-16 21:35:53 -07:00
Ben Hutchings 8eaede49df sysrq: Allow magic SysRq key functions to be disabled through Kconfig
Turn the initial value of sysctl kernel.sysrq (SYSRQ_DEFAULT_ENABLE)
into a Kconfig variable.

Original version by Bastian Blank <waldi@debian.org>.

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-16 13:01:44 -07:00
Steven Whitehouse e66cf16109 GFS2: Use lockref for glocks
Currently glocks have an atomic reference count and also a spinlock
which covers various internal fields, such as the state. This intent of
this patch is to replace the spinlock and the atomic reference count
with a lockref structure. This contains a spinlock which we can continue
to use as before, and a reference counter which is used in conjuction
with the spinlock to replace the previous atomic counter.

As a result of this there are some new rules for reference counting on
glocks. We need to distinguish between reference count changes under
gl_spin (which are now just increment or decrement of the new counter,
provided the count cannot hit zero) and those which are outside of
gl_spin, but which now take gl_spin internally.

The conversion is relatively straight forward. There is probably some
further clean up which can be done, but the priority at this stage is to
make the change in as simple a manner as possible.

A consequence of this change is that the reference count is being
decoupled from the lru list processing. This should allow future
adoption of the lru_list code with glocks in due course.

The reason for using the "dead" state and not just relying on 0 being
the "invalid state" is so that in due course 0 ref counts can be
allowable. The intent is to eventually be able to remove the ref count
changes which are currently hidden away in state_change().

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2013-10-15 15:18:08 +01:00
Fengguang Wu 1461c5be7b kobject: show debug info on delayed kobject release
Useful for locating buggy drivers on kernel oops.

It may add dozens of new lines to boot dmesg. DEBUG_KOBJECT_RELEASE is
hopefully only enabled in debug kernels (like maybe the Fedora rawhide
one, or at developers), so being a bit more verbose is likely ok.

Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-11 16:30:10 -07:00
Ingo Molnar ec0ad3d01f Merge branch 'core/urgent' into sched/core
Merge in asm goto fix, to be able to apply the asm/rmwcc.h fix.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-11 07:39:37 +02:00
Fengguang Wu 0ff18e3734 kobject: show debug info on delayed kobject release
Useful for locating buggy drivers on kernel oops.

It may add dozens of new lines to boot dmesg. DEBUG_KOBJECT_RELEASE is
hopefully only enabled in debug kernels (like maybe the Fedora rawhide
one, or at developers), so being a bit more verbose is likely ok.

Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-10-10 13:56:52 -07:00
Ingo Molnar 37bf06375c Linux 3.12-rc4
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.14 (GNU/Linux)
 
 iQEcBAABAgAGBQJSUc9zAAoJEHm+PkMAQRiG9DMH/AtpuAF6LlMRPjrCeuJQ1pyh
 T0IUO+CsLKO6qtM5IyweP8V6zaasNjIuW1+B6IwVIl8aOrM+M7CwRiKvpey26ldM
 I8G2ron7hqSOSQqSQs20jN2yGAqQGpYIbTmpdGLAjQ350NNNvEKthbP5SZR5PAmE
 UuIx5OGEkaOyZXvCZJXU9AZkCxbihlMSt2zFVxybq2pwnGezRUYgCigE81aeyE0I
 QLwzzMVdkCxtZEpkdJMpLILAz22jN4RoVDbXRa2XC7dA9I2PEEXI9CcLzqCsx2Ii
 8eYS+no2K5N2rrpER7JFUB2B/2X8FaVDE+aJBCkfbtwaYTV9UYLq3a/sKVpo1Cs=
 =xSFJ
 -----END PGP SIGNATURE-----

Merge tag 'v3.12-rc4' into sched/core

Merge Linux v3.12-rc4 to fix a conflict and also to refresh the tree
before applying more scheduler patches.

Conflicts:
	arch/avr32/include/asm/Kbuild

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-09 12:36:13 +02:00
Tejun Heo 26ea12dec0 kobject: grab an extra reference on kobject->sd to allow duplicate deletes
sysfs currently has a rather weird behavior regarding removals.  A
directory removal would delete all files directly under it but
wouldn't recurse into subdirectories, which, while a bit inconsistent,
seems to make sense at the first glance as each directory is
supposedly associated with a kobject and each kobject can take care of
the directory deletion; however, this doesn't really hold as we have
groups which can be directories without a kobject associated with it
and require explicit deletions.

We're in the process of separating out sysfs from kboject / driver
core and want a consistent behavior.  A removal should delete either
only the specified node or everything under it.  I think it is helpful
to support recursive atomic removal and later patches will implement
it.

Such change means that a sysfs_dirent associated with kobject may be
deleted before the kobject itself is removed if one of its ancestor
gets removed before it.  As sysfs_remove_dir() puts the base ref, we
may end up with dangling pointer on descendants.  This can be solved
by holding an extra reference on the sd from kobject.

Acquire an extra reference on the associated sysfs_dirent on directory
creation and put it after removal.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-03 16:38:52 -07:00
Nick Swenson 6b37838239 percpu_ida: Removing unused arguement from alloc_local_tag
Removing unused struct percpu_ida *pool from arguements of
alloc_local_tag, changed it's one use in percpu_ida.c

(nab: Fixed reference of idr.c -> percpu_ida.c)

Signed-Off-By: Nick Swenson <nks@daterainc.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-10-03 04:30:16 -07:00
Zoltan Kiss 2b2b614dd2 tracing/events: Add bounce tracing to swiotbl
Ftrace is currently not able to detect when SWIOTLB has to do double buffering.
Under Xen you can only see it indirectly in function_graph, when
xen_swiotlb_map_page() doesn't stop after range_straddles_page_boundary(), but
calls spinlock functions, memcpy() and xen_phys_to_bus() as well. This patch
introduces the swiotlb:swiotlb_bounced event, which also prints out the
following informations to help you find out why bouncing happened:

dev_name: 0000:08:00.0 dma_mask=ffffffffffffffff dev_addr=9149f000 size=32768
swiotlb_force=0

If you use Xen, and (dev_addr + size + 1) > dma_mask, the buffer is out of the
device's DMA range. If swiotlb_force == 1, you should really change the kernel
parameters. Otherwise, the buffer is not contiguous in mfn space.

Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
[v1: Don't print 'swiotlb_force=X', just print swiotlb_force if it is enabled]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2013-10-02 12:53:26 -04:00
Linus Torvalds c31eeaced2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking changes from David Miller:

 1) Multiply in netfilter IPVS can overflow when calculating destination
    weight.  From Simon Kirby.

 2) Use after free fixes in IPVS from Julian Anastasov.

 3) SFC driver bug fixes from Daniel Pieczko.

 4) Memory leak in pcan_usb_core failure paths, from Alexey Khoroshilov.

 5) Locking and encapsulation fixes to serial line CAN driver, from
    Andrew Naujoks.

 6) Duplex and VF handling fixes to bnx2x driver from Yaniv Rosner,
    Eilon Greenstein, and Ariel Elior.

 7) In lapb, if no other packets are outstanding, T1 timeouts actually
    stall things and no packet gets sent.  Fix from Josselin Costanzi.

 8) ICMP redirects should not make it to the socket error queues, from
    Duan Jiong.

 9) Fix bugs in skge DMA mapping error handling, from Nikulas Patocka.

10) Fix setting of VLAN priority field on via-rhine driver, from Roget
    Luethi.

11) Fix TX stalls and VLAN promisc programming in be2net driver from
    Ajit Khaparde.

12) Packet padding doesn't get handled correctly in new usbnet SG
    support code, from Ming Lei.

13) Fix races in netdevice teardown wrt.  network namespace closing.
    From Eric W.  Biederman.

14) Fix potential missed initialization of net_secret if not TCP
    connections are openned.  From Eric Dumazet.

15) Cinterion PLXX product ID in qmi_wwan driver is wrong, from
    Aleksander Morgado.

16) skb_cow_head() can change skb->data and thus packet header pointers,
    don't use stale ip_hdr reference in ip_tunnel code.

17) Backend state transition handling fixes in xen-netback, from Paul
    Durrant.

18) Packet offset for AH protocol is handled wrong in flow dissector,
    from Eric Dumazet.

19) Taking down an fq packet scheduler instance can leave stale packets
    in the queues, fix from Eric Dumazet.

20) Fix performance regressions introduced by TCP Small Queues.  From
    Eric Dumazet.

21) IPV6 GRE tunneling code calculates max_headroom incorrectly, from
    Hannes Frederic Sowa.

22) Multicast timer handlers in ipv4 and ipv6 can be the last and final
    reference to the ipv4/ipv6 specific network device state, so use the
    reference put that will check and release the object if the
    reference hits zero.  From Salam Noureddine.

23) Fix memory corruption in ip_tunnel driver, and use skb_push()
    instead of __skb_push() so that similar bugs are less hard to find.
    From Steffen Klassert.

24) Add forgotten hookup of rtnl_ops in SIT and ip6tnl drivers, from
    Nicolas Dichtel.

25) fq scheduler doesn't accurately rate limit in certain circumstances,
    from Eric Dumazet.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (103 commits)
  pkt_sched: fq: rate limiting improvements
  ip6tnl: allow to use rtnl ops on fb tunnel
  sit: allow to use rtnl ops on fb tunnel
  ip_tunnel: Remove double unregister of the fallback device
  ip_tunnel_core: Change __skb_push back to skb_push
  ip_tunnel: Add fallback tunnels to the hash lists
  ip_tunnel: Fix a memory corruption in ip_tunnel_xmit
  qlcnic: Fix SR-IOV configuration
  ll_temac: Reset dma descriptors indexes on ndo_open
  skbuff: size of hole is wrong in a comment
  ipv6 mcast: use in6_dev_put in timer handlers instead of __in6_dev_put
  ipv4 igmp: use in_dev_put in timer handlers instead of __in_dev_put
  ethernet: moxa: fix incorrect placement of __initdata tag
  ipv6: gre: correct calculation of max_headroom
  powerpc/83xx: gianfar_ptp: select 1588 clock source through dts file
  Revert "powerpc/83xx: gianfar_ptp: select 1588 clock source through dts file"
  bonding: Fix broken promiscuity reference counting issue
  tcp: TSQ can use a dynamic limit
  dm9601: fix IFF_ALLMULTI handling
  pkt_sched: fq: qdisc dismantle fixes
  ...
2013-10-01 12:58:48 -07:00
Greg Kroah-Hartman 88502b9c0a Merge 3.12-rc3 into driver-core-next
We want the driver core and sysfs fixes in here to make merges and
development easier.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-09-29 18:29:23 -07:00
Heiko Carstens 491f6f8e5f lockref: use arch_mutex_cpu_relax() in CMPXCHG_LOOP()
Make use of arch_mutex_cpu_relax() so architectures can override the
default cpu_relax() semantics.
This is especially useful for s390, where cpu_relax() means that we
yield() the current (virtual) cpu and therefore is very expensive,
and would contradict the whole purpose of the lockless cmpxchg loop.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2013-09-28 12:46:24 +02:00