Commit Graph

9698 Commits

Author SHA1 Message Date
minskey guo cf23422b9d cpu/mem hotplug: enable CPUs online before local memory online
Enable users to online CPUs even if the CPUs belongs to a numa node which
doesn't have onlined local memory.

The zonlists(pg_data_t.node_zonelists[]) of a numa node are created either
in system boot/init period, or at the time of local memory online.  For a
numa node without onlined local memory, its zonelists are not initialized
at present.  As a result, any memory allocation operations executed by
CPUs within this node will fail.  In fact, an out-of-memory error is
triggered when attempt to online CPUs before memory comes to online.

This patch tries to create zonelists for such numa nodes, so that the
memory allocation for this node can be fallback'ed to other nodes.

[akpm@linux-foundation.org: remove unneeded export]
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: minskey guo<chaohong.guo@intel.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-25 08:07:00 -07:00
Mel Gorman 5e77190580 mm: compaction: add a tunable that decides when memory should be compacted and when it should be reclaimed
The kernel applies some heuristics when deciding if memory should be
compacted or reclaimed to satisfy a high-order allocation.  One of these
is based on the fragmentation.  If the index is below 500, memory will not
be compacted.  This choice is arbitrary and not based on data.  To help
optimise the system and set a sensible default for this value, this patch
adds a sysctl extfrag_threshold.  The kernel will only compact memory if
the fragmentation index is above the extfrag_threshold.

[randy.dunlap@oracle.com: Fix build errors when proc fs is not configured]
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-25 08:06:59 -07:00
Mel Gorman 76ab0f530e mm: compaction: add /proc trigger for memory compaction
Add a proc file /proc/sys/vm/compact_memory.  When an arbitrary value is
written to the file, all zones are compacted.  The expected user of such a
trigger is a job scheduler that prepares the system before the target
application runs.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: Rik van Riel <riel@redhat.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Reviewed-by: Christoph Lameter <cl@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-25 08:06:59 -07:00
Miao Xie c0ff7453bb cpuset,mm: fix no node to alloc memory when changing cpuset's mems
Before applying this patch, cpuset updates task->mems_allowed and
mempolicy by setting all new bits in the nodemask first, and clearing all
old unallowed bits later.  But in the way, the allocator may find that
there is no node to alloc memory.

The reason is that cpuset rebinds the task's mempolicy, it cleans the
nodes which the allocater can alloc pages on, for example:

(mpol: mempolicy)
	task1			task1's mpol	task2
	alloc page		1
	  alloc on node0? NO	1
				1		change mems from 1 to 0
				1		rebind task1's mpol
				0-1		  set new bits
				0	  	  clear disallowed bits
	  alloc on node1? NO	0
	  ...
	can't alloc page
	  goto oom

This patch fixes this problem by expanding the nodes range first(set newly
allowed bits) and shrink it lazily(clear newly disallowed bits).  So we
use a variable to tell the write-side task that read-side task is reading
nodemask, and the write-side task clears newly disallowed nodes after
read-side task ends the current memory allocation.

[akpm@linux-foundation.org: fix spello]
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Paul Menage <menage@google.com>
Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: Ravikiran Thirumalai <kiran@scalex86.org>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-25 08:06:57 -07:00
Miao Xie 708c1bbc9d mempolicy: restructure rebinding-mempolicy functions
Nick Piggin reported that the allocator may see an empty nodemask when
changing cpuset's mems[1].  It happens only on the kernel that do not do
atomic nodemask_t stores.  (MAX_NUMNODES > BITS_PER_LONG)

But I found that there is also a problem on the kernel that can do atomic
nodemask_t stores.  The problem is that the allocator can't find a node to
alloc page when changing cpuset's mems though there is a lot of free
memory.  The reason is like this:

(mpol: mempolicy)
	task1			task1's mpol	task2
	alloc page		1
	  alloc on node0? NO	1
				1		change mems from 1 to 0
				1		rebind task1's mpol
				0-1		  set new bits
				0	  	  clear disallowed bits
	  alloc on node1? NO	0
	  ...
	can't alloc page
	  goto oom

I can use the attached program reproduce it by the following step:

# mkdir /dev/cpuset
# mount -t cpuset cpuset /dev/cpuset
# mkdir /dev/cpuset/1
# echo `cat /dev/cpuset/cpus` > /dev/cpuset/1/cpus
# echo `cat /dev/cpuset/mems` > /dev/cpuset/1/mems
# echo $$ > /dev/cpuset/1/tasks
# numactl --membind=`cat /dev/cpuset/mems` ./cpuset_mem_hog <nr_tasks> &
   <nr_tasks> = max(nr_cpus - 1, 1)
# killall -s SIGUSR1 cpuset_mem_hog
# ./change_mems.sh

several hours later, oom will happen though there is a lot of free memory.

This patchset fixes this problem by expanding the nodes range first(set
newly allowed bits) and shrink it lazily(clear newly disallowed bits).  So
we use a variable to tell the write-side task that read-side task is
reading nodemask, and the write-side task clears newly disallowed nodes
after read-side task ends the current memory allocation.

This patch:

In order to fix no node to alloc memory, when we want to update mempolicy
and mems_allowed, we expand the set of nodes first (set all the newly
nodes) and shrink the set of nodes lazily(clean disallowed nodes), But the
mempolicy's rebind functions may breaks the expanding.

So we restructure the mempolicy's rebind functions and split the rebind
work to two steps, just like the update of cpuset's mems: The 1st step:
expand the set of the mempolicy's nodes.  The 2nd step: shrink the set of
the mempolicy's nodes.  It is used when there is no real lock to protect
the mempolicy in the read-side.  Otherwise we can do rebind work at once.

In order to implement it, we define

	enum mpol_rebind_step {
		MPOL_REBIND_ONCE,
		MPOL_REBIND_STEP1,
		MPOL_REBIND_STEP2,
		MPOL_REBIND_NSTEP,
	};

If the mempolicy needn't be updated by two steps, we can pass
MPOL_REBIND_ONCE to the rebind functions.  Or we can pass
MPOL_REBIND_STEP1 to do the first step of the rebind work and pass
MPOL_REBIND_STEP2 to do the second step work.

Besides that, it maybe long time between these two step and we have to
release the lock that protects mempolicy and mems_allowed.  If we hold the
lock once again, we must check whether the current mempolicy is under the
rebinding (the first step has been done) or not, because the task may
alloc a new mempolicy when we don't hold the lock.  So we defined the
following flag to identify it:

#define MPOL_F_REBINDING (1 << 2)

The new functions will be used in the next patch.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Paul Menage <menage@google.com>
Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: Ravikiran Thirumalai <kiran@scalex86.org>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-25 08:06:57 -07:00
Jeff Chua f00e047efd timers: Fix slack calculation for expired timers
commit 3bbb9ec946 (timers: Introduce the concept of timer slack for
legacy timers) does not take the case into account when the timer is
already expired. This broke wireless drivers.

The solution is not to apply slack to already expired timers.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Arjan van de Ven <arjan@linux.intel.com>
2010-05-24 12:10:23 +02:00
Thomas Gleixner bd45b7a385 timekeeping: Fix timezone update
commit 64ce4c2f (time: Clean up warp_clock()) breaks the timezone
update in a very subtle way. To avoid the direct access to timekeeping
internals it adds the timezone delta to the current time with
timespec_add_safe(). This works nicely when the timezone delta is > 0.
If timezone delta is < 0 then the wrap check in timespec_add_safe()
triggers and timespec_add_safe() returns TIME_MAX and screws up
timekeeping completely. 

The comment above timespec_add_safe() says:
    It's assumed that both values are valid (>= 0)

Add the timezone seconds adjustment directly.

Reported-by: Rafael J. Wysocki <rjw@sisk.pl>
Tested-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-05-24 11:50:38 +02:00
Linus Torvalds 6109e2ce26 Merge branch 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6
* 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6: (36 commits)
  PCI: hotplug: pciehp: Removed check for hotplug of display devices
  PCI: read memory ranges out of Broadcom CNB20LE host bridge
  PCI: Allow manual resource allocation for PCI hotplug bridges
  x86/PCI: make ACPI MCFG reserved error messages ACPI specific
  PCI hotplug: Use kmemdup
  PM/PCI: Update PCI power management documentation
  PCI: output FW warning in pci_read/write_vpd
  PCI: fix typos pci_device_dis/enable to pci_dis/enable_device in comments
  PCI quirks: disable msi on AMD rs4xx internal gfx bridges
  PCI: Disable MSI for MCP55 on P5N32-E SLI
  x86/PCI: irq and pci_ids patch for additional Intel Cougar Point DeviceIDs
  PCI: aerdrv: trivial cleanup for aerdrv_core.c
  PCI: aerdrv: trivial cleanup for aerdrv.c
  PCI: aerdrv: introduce default_downstream_reset_link
  PCI: aerdrv: rework find_aer_service
  PCI: aerdrv: remove is_downstream
  PCI: aerdrv: remove magical ROOT_ERR_STATUS_MASKS
  PCI: aerdrv: redefine PCI_ERR_ROOT_*_SRC
  PCI: aerdrv: rework do_recovery
  PCI: aerdrv: rework get_e_source()
  ...
2010-05-21 18:58:52 -07:00
Linus Torvalds 0961d6581c Merge git://git.infradead.org/iommu-2.6
* git://git.infradead.org/iommu-2.6:
  intel-iommu: Set a more specific taint flag for invalid BIOS DMAR tables
  intel-iommu: Combine the BIOS DMAR table warning messages
  panic: Add taint flag TAINT_FIRMWARE_WORKAROUND ('I')
  panic: Allow warnings to set different taint flags
  intel-iommu: intel_iommu_map_range failed at very end of address space
  intel-iommu: errors with smaller iommu widths
  intel-iommu: Fix boot inside 64bit virtualbox with io-apic disabled
  intel-iommu: use physfn to search drhd for VF
  intel-iommu: Print out iommu seq_id
  intel-iommu: Don't complain that ACPI_DMAR_SCOPE_TYPE_IOAPIC is not supported
  intel-iommu: Avoid global flushes with caching mode.
  intel-iommu: Use correct domain ID when caching mode is enabled
  intel-iommu mistakenly uses offset_pfn when caching mode is enabled
  intel-iommu: use for_each_set_bit()
  intel-iommu: Fix section mismatch dmar_ir_support() uses dmar_tbl.
2010-05-21 17:25:01 -07:00
Linus Torvalds a8251096b4 Merge branch 'modules' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus
* 'modules' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus:
  module: drop the lock while waiting for module to complete initialization.
  MODULE_DEVICE_TABLE(isapnp, ...) does nothing
  hisax_fcpcipnp: fix broken isapnp device table.
  isapnp: move definitions to mod_devicetable.h so file2alias can reach them.
2010-05-21 17:15:44 -07:00
Linus Torvalds 6e80e8ed5e Merge branch 'for-2.6.35' of git://git.kernel.dk/linux-2.6-block
* 'for-2.6.35' of git://git.kernel.dk/linux-2.6-block: (86 commits)
  pipe: set lower and upper limit on max pages in the pipe page array
  pipe: add support for shrinking and growing pipes
  drbd: This is now equivalent to drbd release 8.3.8rc1
  drbd: Do not free p_uuid early, this is done in the exit code of the receiver
  drbd: Null pointer deref fix to the large "multi bio rewrite"
  drbd: Fix: Do not detach, if a bio with a barrier fails
  drbd: Ensure to not trigger late-new-UUID creation multiple times
  drbd: Do not Oops when C_STANDALONE when uuid gets generated
  writeback: fix mixed up arguments to bdi_start_writeback()
  writeback: fix problem with !CONFIG_BLOCK compilation
  block: improve automatic native capacity unlocking
  block: use struct parsed_partitions *state universally in partition check code
  block,ide: simplify bdops->set_capacity() to ->unlock_native_capacity()
  block: restart partition scan after resizing a device
  buffer: make invalidate_bdev() drain all percpu LRU add caches
  block: remove all rcu head initializations
  writeback: fixups for !dirty_writeback_centisecs
  writeback: bdi_writeback_task() must set task state before calling schedule()
  writeback: ensure that WB_SYNC_NONE writeback with sb pinned is sync
  drivers/block/drbd: Use kzalloc
  ...
2010-05-21 15:25:33 -07:00
Randy Dunlap 0fc377bd64 sysctl: fix kernel-doc notation and typos
Fix kernel-doc warnings, kernel-doc special characters, and
typos in recent kernel/sysctl.c additions.

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Amerigo Wang <amwang@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-21 15:23:12 -07:00
Linus Torvalds 2a8ba8f032 Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (46 commits)
  random: simplify fips mode
  crypto: authenc - Fix cryptlen calculation
  crypto: talitos - add support for sha224
  crypto: talitos - add hash algorithms
  crypto: talitos - second prepare step for adding ahash algorithms
  crypto: talitos - prepare for adding ahash algorithms
  crypto: n2 - Add Niagara2 crypto driver
  crypto: skcipher - Add ablkcipher_walk interfaces
  crypto: testmgr - Add testing for async hashing and update/final
  crypto: tcrypt - Add speed tests for async hashing
  crypto: scatterwalk - Fix scatterwalk_done() test
  crypto: hifn_795x - Rename ablkcipher_walk to hifn_cipher_walk
  padata: Use get_online_cpus/put_online_cpus in padata_free
  padata: Add some code comments
  padata: Flush the padata queues actively
  padata: Use a timer to handle remaining objects in the reorder queues
  crypto: shash - Remove usage of CRYPTO_MINALIGN
  crypto: mv_cesa - Use resource_size
  crypto: omap - OMAP macros corrected
  padata: Use get_online_cpus/put_online_cpus
  ...

Fix up conflicts in arch/arm/mach-omap2/devices.c
2010-05-21 14:46:51 -07:00
Jens Axboe ee9a3607fb Merge branch 'master' into for-2.6.35
Conflicts:
	fs/ext3/fsync.c

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2010-05-21 21:27:26 +02:00
Jens Axboe b492e95be0 pipe: set lower and upper limit on max pages in the pipe page array
We need at least two to guarantee proper POSIX behaviour, so
never allow a smaller limit than that.

Also expose a /proc/sys/fs/pipe-max-pages sysctl file that allows
root to define a sane upper limit. Make it default to 16 times the
default size, which is 16 pages.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2010-05-21 21:12:52 +02:00
Jens Axboe 35f3d14dbb pipe: add support for shrinking and growing pipes
This patch adds F_GETPIPE_SZ and F_SETPIPE_SZ fcntl() actions for
growing and shrinking the size of a pipe and adjusts pipe.c and splice.c
(and relay and network splice) usage to work with these larger (or smaller)
pipes.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2010-05-21 21:12:40 +02:00
Linus Torvalds ac3ee84c60 Merge branch 'dbg-early-merge' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb
* 'dbg-early-merge' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb:
  echi-dbgp: Add kernel debugger support for the usb debug port
  earlyprintk,vga,kdb: Fix \b and \r for earlyprintk=vga with kdb
  kgdboc: Add ekgdboc for early use of the kernel debugger
  x86,early dr regs,kgdb: Allow kernel debugger early dr register access
  x86,kgdb: Implement early hardware breakpoint debugging
  x86, kgdb, init: Add early and late debug states
  x86, kgdb: early trap init for early debug
2010-05-21 11:10:41 -07:00
Linus Torvalds 90b9a32d8f Merge branch 'kdb-merge' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb
* 'kdb-merge' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb: (25 commits)
  kdb,debug_core: Allow the debug core to receive a panic notification
  MAINTAINERS: update kgdb, kdb, and debug_core info
  debug_core,kdb: Allow the debug core to process a recursive debug entry
  printk,kdb: capture printk() when in kdb shell
  kgdboc,kdb: Allow kdb to work on a non open console port
  kgdb: Add the ability to schedule a breakpoint via a tasklet
  mips,kgdb: kdb low level trap catch and stack trace
  powerpc,kgdb: Introduce low level trap catching
  x86,kgdb: Add low level debug hook
  kgdb: remove post_primary_code references
  kgdb,docs: Update the kgdb docs to include kdb
  kgdboc,keyboard: Keyboard driver for kdb with kgdb
  kgdb: gdb "monitor" -> kdb passthrough
  sparc,sunzilog: Add console polling support for sunzilog serial driver
  sh,sh-sci: Use NO_POLL_CHAR in the SCIF polled console code
  kgdb,8250,pl011: Return immediately from console poll
  kgdb: core changes to support kdb
  kdb: core for kgdb back end (2 of 2)
  kdb: core for kgdb back end (1 of 2)
  kgdb,blackfin: Add in kgdb_arch_set_pc for blackfin
  ...
2010-05-21 11:08:05 -07:00
Chris Wright 2c3c8bea60 sysfs: add struct file* to bin_attr callbacks
This allows bin_attr->read,write,mmap callbacks to check file specific data
(such as inode owner) as part of any privilege validation.

Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-05-21 09:37:31 -07:00
Peter Zijlstra 1704f47b50 lockdep: Add novalidate class for dev->mutex conversion
The conversion of device->sem to device->mutex resulted in lockdep
warnings. Create a novalidate class for now until the driver folks
come up with separate classes. That way we have at least the basic
mutex debugging coverage.

Add a checkpatch error so the usage is reserved for device->mutex.

[ tglx: checkpatch and compile fix for LOCKDEP=n ]

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-05-21 09:37:30 -07:00
NeilBrown db1afffab0 kref: remove kref_set
Of the three uses of kref_set in the kernel:

 One really should be kref_put as the code is letting go of a
    reference,
 Two really should be kref_init because the kref is being
    initialised.

This suggests that making kref_set available encourages bad code.
So fix the three uses and remove kref_set completely.

Signed-off-by: NeilBrown <neilb@suse.de>
Acked-by: Mimi Zohar <zohar@us.ibm.com>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-05-21 09:37:29 -07:00
Linus Torvalds 33cf23b0a5 Merge git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (182 commits)
  [SCSI] aacraid: add an ifdef'd device delete case instead of taking the device offline
  [SCSI] aacraid: prohibit access to array container space
  [SCSI] aacraid: add support for handling ATA pass-through commands.
  [SCSI] aacraid: expose physical devices for models with newer firmware
  [SCSI] aacraid: respond automatically to volumes added by config tool
  [SCSI] fcoe: fix fcoe module ref counting
  [SCSI] libfcoe: FIP Keep-Alive messages for VPorts are sent with incorrect port_id and wwn
  [SCSI] libfcoe: Fix incorrect MAC address clearing
  [SCSI] fcoe: fix a circular locking issue with rtnl and sysfs mutex
  [SCSI] libfc: Move the port_id into lport
  [SCSI] fcoe: move link speed checking into its own routine
  [SCSI] libfc: Remove extra pointer check
  [SCSI] libfc: Remove unused fc_get_host_port_type
  [SCSI] fcoe: fixes wrong error exit in fcoe_create
  [SCSI] libfc: set seq_id for incoming sequence
  [SCSI] qla2xxx: Updates to ISP82xx support.
  [SCSI] qla2xxx: Optionally disable target reset.
  [SCSI] qla2xxx: ensure flash operation and host reset via sg_reset are mutually exclusive
  [SCSI] qla2xxx: Silence bogus warning by gcc for wrap and did.
  [SCSI] qla2xxx: T10 DIF support added.
  ...
2010-05-21 07:19:18 -07:00
Linus Torvalds 7a9b149212 Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb-2.6: (229 commits)
  USB: remove unused usb_buffer_alloc and usb_buffer_free macros
  usb: musb: update gfp/slab.h includes
  USB: ftdi_sio: fix legacy SIO-device header
  USB: kl5usb105: reimplement using generic framework
  USB: kl5usb105: minor clean ups
  USB: kl5usb105: fix memory leak
  USB: io_ti: use kfifo to implement write buffering
  USB: io_ti: remove unsused private counter
  USB: ti_usb: use kfifo to implement write buffering
  USB: ir-usb: fix incorrect write-buffer length
  USB: aircable: fix incorrect write-buffer length
  USB: safe_serial: straighten out read processing
  USB: safe_serial: reimplement read using generic framework
  USB: safe_serial: reimplement write using generic framework
  usb-storage: always print quirks
  USB: usb-storage: trivial debug improvements
  USB: oti6858: use port write fifo
  USB: oti6858: use kfifo to implement write buffering
  USB: cypress_m8: use kfifo to implement write buffering
  USB: cypress_m8: remove unused drain define
  ...

Fix up conflicts (due to usb_buffer_alloc/free renaming) in
	drivers/input/tablet/acecad.c
	drivers/input/tablet/kbtab.c
	drivers/input/tablet/wacom_sys.c
	drivers/media/video/gspca/gspca.c
	sound/usb/usbaudio.c
2010-05-20 21:26:12 -07:00
Linus Torvalds f8965467f3 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1674 commits)
  qlcnic: adding co maintainer
  ixgbe: add support for active DA cables
  ixgbe: dcb, do not tag tc_prio_control frames
  ixgbe: fix ixgbe_tx_is_paused logic
  ixgbe: always enable vlan strip/insert when DCB is enabled
  ixgbe: remove some redundant code in setting FCoE FIP filter
  ixgbe: fix wrong offset to fc_frame_header in ixgbe_fcoe_ddp
  ixgbe: fix header len when unsplit packet overflows to data buffer
  ipv6: Never schedule DAD timer on dead address
  ipv6: Use POSTDAD state
  ipv6: Use state_lock to protect ifa state
  ipv6: Replace inet6_ifaddr->dead with state
  cxgb4: notify upper drivers if the device is already up when they load
  cxgb4: keep interrupts available when the ports are brought down
  cxgb4: fix initial addition of MAC address
  cnic: Return SPQ credit to bnx2x after ring setup and shutdown.
  cnic: Convert cnic_local_flags to atomic ops.
  can: Fix SJA1000 command register writes on SMP systems
  bridge: fix build for CONFIG_SYSFS disabled
  ARCNET: Limit com20020 PCI ID matches for SOHARD cards
  ...

Fix up various conflicts with pcmcia tree drivers/net/
{pcmcia/3c589_cs.c, wireless/orinoco/orinoco_cs.c and
wireless/orinoco/spectrum_cs.c} and feature removal
(Documentation/feature-removal-schedule.txt).

Also fix a non-content conflict due to pm_qos_requirement getting
renamed in the PM tree (now pm_qos_request) in net/mac80211/scan.c
2010-05-20 21:04:44 -07:00
Jason Wessel 0b4b3827db x86, kgdb, init: Add early and late debug states
The kernel debugger can operate well before mm_init(), but the x86
hardware breakpoint code which uses the perf api requires that the
kernel allocators are initialized.

This means the kernel debug core needs to provide an optional arch
specific call back to allow the initialization functions to run after
the kernel has been further initialized.

The kdb shell already had a similar restriction with an early
initialization and late initialization.  The kdb_init() was moved into
the debug core's version of the late init which is called
dbg_late_init();

CC: kgdb-bugreport@lists.sourceforge.net
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
2010-05-20 21:04:29 -05:00
Jason Wessel 4402c153cb kdb,debug_core: Allow the debug core to receive a panic notification
It is highly desirable to trap into kdb on panic.  The debug core will
attempt to register as the first in line for the panic notifier.

CC: Ingo Molnar <mingo@elte.hu>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
2010-05-20 21:04:28 -05:00
Jason Wessel 6d90634076 debug_core,kdb: Allow the debug core to process a recursive debug entry
This allows kdb to debug a crash with in the kms code with a
single level recursive re-entry.

Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
2010-05-20 21:04:27 -05:00
Jason Wessel d37d39ae3b printk,kdb: capture printk() when in kdb shell
Certain calls from the kdb shell will call out to printk(), and any of
these calls should get vectored back to the kdb_printf() so that the
kdb pager and processing can be used, as well as to properly channel
I/O to the polled I/O devices.

CC: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
2010-05-20 21:04:27 -05:00
Jason Wessel efe2f29e32 kgdboc,kdb: Allow kdb to work on a non open console port
If kdb is open on a serial port that is not actually a console make
sure to call the poll routines to emit and receive characters.

Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Acked-by: Martin Hicks <mort@sgi.com>
2010-05-20 21:04:26 -05:00
Jason Wessel 1cee5e35f1 kgdb: Add the ability to schedule a breakpoint via a tasklet
Some kgdb I/O modules require the ability to create a breakpoint
tasklet, such as kgdboc and external modules such as kgdboe.  The
breakpoint tasklet is used as an asynchronous entry point into the
debugger which will have a different function scope than the current
execution path where it might not be safe to have an inline
breakpoint.  This is true of some of the kgdb I/O drivers which share
code with kgdb and rest of the kernel users.

Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
2010-05-20 21:04:26 -05:00
Jason Wessel f503b5ae53 x86,kgdb: Add low level debug hook
The only way the debugger can handle a trap in inside rcu_lock,
notify_die, or atomic_notifier_call_chain without a triple fault is
to have a low level "first opportunity handler" in the int3 exception
handler.

Generally this will be something the vast majority of folks will not
need, but for those who need it, it is added as a kernel .config
option called KGDB_LOW_LEVEL_TRAP.

CC: Ingo Molnar <mingo@elte.hu>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: H. Peter Anvin <hpa@zytor.com>
CC: x86@kernel.org
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
2010-05-20 21:04:25 -05:00
Jason Wessel 98ec1878ca kgdb: remove post_primary_code references
Remove all the references to the kgdb_post_primary_code.  This
function serves no useful purpose because you can obtain the same
information from the "struct kgdb_state *ks" from with in the
debugger, if for some reason you want the data.

Also remove the unintentional duplicate assignment for ks->ex_vector.

Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
2010-05-20 21:04:25 -05:00
Jason Wessel ada64e4c98 kgdboc,keyboard: Keyboard driver for kdb with kgdb
This patch adds in the kdb PS/2 keyboard driver.  This was mostly a
direct port from the original kdb where I cleaned up the code against
checkpatch.pl and added the glue to stitch it into kgdb.

This patch also enables early kdb debug via kgdbwait and the keyboard.

All the access to configure kdb using either a serial console or the
keyboard is done via kgdboc.

If you want to use only the keyboard and want to break in early you
would add to your kernel command arguments:

    kgdboc=kbd kgdbwait

If you wanted serial and or the keyboard access you could use:

    kgdboc=kbd,ttyS0

You can also configure kgdboc as a kernel module or at run time with
the sysfs where you can activate and deactivate kgdb.

Turn it on:
    echo kbd,ttyS0 > /sys/module/kgdboc/parameters/kgdboc

Turn it off:
    echo "" > /sys/module/kgdboc/parameters/kgdboc

Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Reviewed-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2010-05-20 21:04:24 -05:00
Jason Wessel a0de055cf6 kgdb: gdb "monitor" -> kdb passthrough
One of the driving forces behind integrating another front end (kdb)
to the debug core is to allow front end commands to be accessible via
gdb's monitor command.  It is true that you could write gdb macros to
get certain data, but you may want to just use gdb to access the
commands that are available in the kdb front end.

This patch implements the Rcmd gdb stub packet.  In gdb you access
this with the "monitor" command.  For instance you could type "monitor
help", "monitor lsmod" or "monitor ps A" etc...

There is no error checking or command restrictions on what you can and
cannot access at this point.  Doing something like trying to set
breakpoints with the monitor command is going to cause nothing but
problems.  Perhaps in the future only the commands that are actually
known to work with the gdb monitor command will be available.

Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
2010-05-20 21:04:24 -05:00
Jason Wessel f5316b4aea kgdb,8250,pl011: Return immediately from console poll
The design of the kdb shell requires that every device that can
provide input to kdb have a polling routine that exits immediately if
there is no character available.  This is required in order to get the
page scrolling mechanism working.

Changing the kernel debugger I/O API to require all polling character
routines to exit immediately if there is no data allows the kernel
debugger to process multiple input channels.

NO_POLL_CHAR will be the return code to the polling routine when ever
there is no character available.

CC: linux-serial@vger.kernel.org
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
2010-05-20 21:04:22 -05:00
Jason Wessel dcc7871128 kgdb: core changes to support kdb
These are the minimum changes to the kgdb core in order to enable an
API to connect a new front end (kdb) to the debug core.

This patch introduces the dbg_kdb_mode variable controls where the
user level I/O is routed.  It will be routed to the gdbstub (kgdb) or
to the kdb front end which is a simple shell available over the kgdboc
connection.

You can switch back and forth between kdb or the gdb stub mode of
operation dynamically.  From gdb stub mode you can blindly type
"$3#33", or from the kdb mode you can enter "kgdb" to switch to the
gdb stub.

The logic in the debug core depends on kdb to look for the typical gdb
connection sequences and return immediately with KGDB_PASS_EVENT if a
gdb serial command sequence is detected.  That should allow a
reasonably seamless transition between kdb -> gdb without leaving the
kernel exception state.  The two gdb serial queries that kdb is
responsible for detecting are the "?" and "qSupported" packets.

CC: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Acked-by: Martin Hicks <mort@sgi.com>
2010-05-20 21:04:21 -05:00
Jason Wessel 67fc4e0cb9 kdb: core for kgdb back end (2 of 2)
This patch contains the hooks and instrumentation into kernel which
live outside the kernel/debug directory, which the kdb core
will call to run commands like lsmod, dmesg, bt etc...

CC: linux-arch@vger.kernel.org
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Signed-off-by: Martin Hicks <mort@sgi.com>
2010-05-20 21:04:21 -05:00
Jason Wessel 5d5314d679 kdb: core for kgdb back end (1 of 2)
This patch contains only the kdb core.  Because the change set was
large, it was split.  The next patch in the series includes the
instrumentation into the core kernel which are mainly helper functions
for kdb.

This work is directly derived from kdb v4.4 found at:

ftp://oss.sgi.com/projects/kdb/download/v4.4/

The kdb internals have been re-organized to make them mostly platform
independent and to connect everything to the debug core which is used by
gdbstub (which has long been known as kgdb).

The original version of kdb was 58,000 lines worth of changes to
support x86.  From that implementation only the kdb shell, and basic
commands for memory access, runcontrol, lsmod, and dmesg where carried
forward.

This is a generic implementation which aims to cover all the current
architectures using the kgdb core: ppc, arm, x86, mips, sparc, sh and
blackfin.  More archictectures can be added by implementing the
architecture specific kgdb functions.

[mort@sgi.com: Compile fix with hugepages enabled]
[mort@sgi.com: Clean breakpoint code renaming kdba_ -> kdb_]
[mort@sgi.com: fix new line after printing registers]
[mort@sgi.com: Remove the concept of global vs. local breakpoints]
[mort@sgi.com: Rework kdb_si_swapinfo to use more generic name]
[mort@sgi.com: fix the information dump macros, remove 'arch' from the names]
[sfr@canb.auug.org.au: include fixup to include linux/slab.h]

CC: linux-arch@vger.kernel.org
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Signed-off-by: Martin Hicks <mort@sgi.com>
2010-05-20 21:04:20 -05:00
Jason Wessel 53197fc495 Separate the gdbstub from the debug core
Split the former kernel/kgdb.c into debug_core.c which contains the
kernel debugger exception logic and to the gdbstub.c which contains
the logic for allowing gdb to talk to the debug core.

This also created a private include file called debug_core.h which
contains all the definitions to glue the debug_core to any other
debugger connections.

CC: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
2010-05-20 21:04:19 -05:00
Jason Wessel c433820971 Move kernel/kgdb.c to kernel/debug/debug_core.c
Move kgdb.c in preparation to separate the gdbstub from the debug
core and exception handling.

CC: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
2010-05-20 21:04:18 -05:00
Michal Nazarewicz 22c43c81a5 wait_event_interruptible_locked() interface
New wait_event_interruptible{,_exclusive}_locked{,_irq} macros added.
They work just like versions without _locked* suffix but require the
wait queue's lock to be held.  Also __wake_up_locked() is now exported
as to pair it with the above macros.

The use case of this new facility is when one uses wait queue's lock
to  protect a data structure.  This may be advantageous if the
structure needs to be protected by a spinlock anyway.  In particular,
with additional spinlock the following code has to be used to wait
for a condition:

spin_lock(&data.lock);
...
for (ret = 0; !ret && !(condition); ) {
	spin_unlock(&data.lock);
	ret = wait_event_interruptible(data.wqh, (condition));
	spin_lock(&data.lock);
}
...
spin_unlock(&data.lock);

This looks bizarre plus wait_event_interruptible() locks the wait
queue's lock anyway so there is a unlock+lock sequence where it could
be avoided.

To avoid those problems and benefit from wait queue's lock, a code
similar to the following should be used:

/* Waiting */
spin_lock(&data.wqh.lock);
...
ret = wait_event_interruptible_locked(data.wqh, (condition));
...
spin_unlock(&data.wqh.lock);

/* Waiting exclusively */
spin_lock(&data.whq.lock);
...
ret = wait_event_interruptible_exclusive_locked(data.whq, (condition));
...
spin_unlock(&data.whq.lock);

/* Waking up */
spin_lock(&data.wqh.lock);
...
wake_up_locked(&data.wqh);
...
spin_unlock(&data.wqh.lock);

When spin_lock_irq() is used matching versions of macros need to be
used (*_locked_irq()).

Signed-off-by: Michal Nazarewicz <m.nazarewicz@samsung.com>
Cc: Kyungmin Park <kyungmin.park@samsung.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: David Howells <dhowells@redhat.com>
Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Mike Galbraith <efault@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-05-20 13:21:42 -07:00
Linus Torvalds a0fe3cc5d3 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: (40 commits)
  Input: psmouse - small formatting changes to better follow coding style
  Input: synaptics - set dimensions as reported by firmware
  Input: elantech - relax signature checks
  Input: elantech - enforce common prefix on messages
  Input: wistron_btns - switch to using kmemdup()
  Input: usbtouchscreen - switch to using kmemdup()
  Input: do not force selecting i8042 on Moorestown
  Input: Documentation/sysrq.txt - update KEY_SYSRQ info
  Input: 88pm860x_onkey - remove invalid irq number assignment
  Input: i8042 - add a PNP entry to the aux device list
  Input: i8042 - add some extra PNP keyboard types
  Input: wm9712 - fix wm97xx_set_gpio() logic
  Input: add keypad driver for keys interfaced to TCA6416
  Input: remove obsolete {corgi,spitz,tosa}kbd.c
  Input: kbtab - do not advertise unsupported events
  Input: kbtab - simplify kbtab_disconnect()
  Input: kbtab - fix incorrect size parameter in usb_buffer_free
  Input: acecad - don't advertise mouse events
  Input: acecad - fix some formatting issues
  Input: acecad - simplify usb_acecad_disconnect()
  ...

Trivial conflict in Documentation/feature-removal-schedule.txt
2010-05-20 10:33:06 -07:00
Linus Torvalds f39d01be4c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (44 commits)
  vlynq: make whole Kconfig-menu dependant on architecture
  add descriptive comment for TIF_MEMDIE task flag declaration.
  EEPROM: max6875: Header file cleanup
  EEPROM: 93cx6: Header file cleanup
  EEPROM: Header file cleanup
  agp: use NULL instead of 0 when pointer is needed
  rtc-v3020: make bitfield unsigned
  PCI: make bitfield unsigned
  jbd2: use NULL instead of 0 when pointer is needed
  cciss: fix shadows sparse warning
  doc: inode uses a mutex instead of a semaphore.
  uml: i386: Avoid redefinition of NR_syscalls
  fix "seperate" typos in comments
  cocbalt_lcdfb: correct sections
  doc: Change urls for sparse
  Powerpc: wii: Fix typo in comment
  i2o: cleanup some exit paths
  Documentation/: it's -> its where appropriate
  UML: Fix compiler warning due to missing task_struct declaration
  UML: add kernel.h include to signal.c
  ...
2010-05-20 09:20:59 -07:00
Linus Torvalds 46ee964509 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6:
  PM: PM QOS update fix
  Freezer / cgroup freezer: Update stale locking comments
  PM / platform_bus: Allow runtime PM by default
  i2c: Fix bus-level power management callbacks
  PM QOS update
  PM / Hibernate: Fix block_io.c printk warning
  PM / Hibernate: Group swap ops
  PM / Hibernate: Move the first_sector out of swsusp_write
  PM / Hibernate: Separate block_io
  PM / Hibernate: Snapshot cleanup
  FS / libfs: Implement simple_write_to_buffer
  PM / Hibernate: document open(/dev/snapshot) side effects
  PM / Runtime: Add sysfs debug files
  PM: Improve device power management document
  PM: Update device power management document
  PM: Allow runtime_suspend methods to call pm_schedule_suspend()
  PM: pm_wakeup - switch to using bool
2010-05-20 09:03:55 -07:00
Linus Torvalds fa5312d9e8 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
  workqueue: change cancel_work_sync() to clear work->data
  workqueue: warn about flush_scheduled_work()
2010-05-20 09:03:02 -07:00
Linus Torvalds 96b5b7f4f2 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6: (61 commits)
  KEYS: Return more accurate error codes
  LSM: Add __init to fixup function.
  TOMOYO: Add pathname grouping support.
  ima: remove ACPI dependency
  TPM: ACPI/PNP dependency removal
  security/selinux/ss: Use kstrdup
  TOMOYO: Use stack memory for pending entry.
  Revert "ima: remove ACPI dependency"
  Revert "TPM: ACPI/PNP dependency removal"
  KEYS: Do preallocation for __key_link()
  TOMOYO: Use mutex_lock_interruptible.
  KEYS: Better handling of errors from construct_alloc_key()
  KEYS: keyring_serialise_link_sem is only needed for keyring->keyring links
  TOMOYO: Use GFP_NOFS rather than GFP_KERNEL.
  ima: remove ACPI dependency
  TPM: ACPI/PNP dependency removal
  selinux: generalize disabling of execmem for plt-in-heap archs
  LSM Audit: rename LSM_AUDIT_NO_AUDIT to LSM_AUDIT_DATA_NONE
  CRED: Holding a spinlock does not imply the holding of RCU read lock
  SMACK: Don't #include Ext2 headers
  ...
2010-05-20 08:55:50 -07:00
Linus Torvalds 164d44fd92 Merge branch 'timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  clocksource: Add clocksource_register_hz/khz interface
  posix-cpu-timers: Optimize run_posix_cpu_timers()
  time: Remove xtime_cache
  mqueue: Convert message queue timeout to use hrtimers
  hrtimers: Provide schedule_hrtimeout for CLOCK_REALTIME
  timers: Introduce the concept of timer slack for legacy timers
  ntp: Remove tickadj
  ntp: Make time_adjust static
  time: Add xtime, wall_to_monotonic to feature-removal-schedule
  timer: Try to survive timer callback preempt_count leak
  timer: Split out timer function call
  timer: Print function name for timer callbacks modifying preemption count
  time: Clean up warp_clock()
  cpu-timers: Avoid iterating over all threads in fastpath_timer_check()
  cpu-timers: Change SIGEV_NONE timer implementation
  cpu-timers: Return correct previous timer reload value
  cpu-timers: Cleanup arm_timer()
  cpu-timers: Simplify RLIMIT_CPU handling
2010-05-19 17:11:10 -07:00
Linus Torvalds 6e0b7b2c39 Merge branch 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  genirq: Clear CPU mask in affinity_hint when none is provided
  genirq: Add CPU mask affinity hint
  genirq: Remove IRQF_DISABLED from core code
  genirq: Run irq handlers with interrupts disabled
  genirq: Introduce request_any_context_irq()
  genirq: Expose irq_desc->node in proc/irq

Fixed up trivial conflicts in Documentation/feature-removal-schedule.txt
2010-05-19 17:09:40 -07:00
KOSAKI Motohiro fa9dc265ac cpumask: fix compat getaffinity
Commit a45185d2d "cpumask: convert kernel/compat.c" broke libnuma, which
abuses sched_getaffinity to find out NR_CPUS in order to parse
/sys/devices/system/node/node*/cpumap.

On NUMA systems with less than 32 possibly CPUs, the current
compat_sys_sched_getaffinity now returns '4' instead of the actual
NR_CPUS/8, which makes libnuma bail out when parsing the cpumap.

The libnuma call sched_getaffinity(0, bitmap, 4096) at first.  It mean
the libnuma expect the return value of sched_getaffinity() is either len
argument or NR_CPUS.  But it doesn't expect to return nr_cpu_ids.

Strictly speaking, userland requirement are

1) Glibc assume the return value mean the lengh of initialized
   of mask argument. E.g. if sched_getaffinity(1024) return 128,
   glibc make zero fill rest 896 byte.
2) Libnuma assume the return value can be used to guess NR_CPUS
   in kernel. It assume len-arg<NR_CPUS makes -EINVAL. But
   it try len=4096 at first and 4096 is always bigger than
   NR_CPUS. Then, if we remove strange min_length normalization,
   we never hit -EINVAL case.

sched_getaffinity() already solved this issue.  This patch adapts
compat_sys_sched_getaffinity() to match the non-compat case.

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Reported-by: Ken Werner <ken.werner@web.de>
Cc: stable@kernel.org
Cc: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-19 11:48:18 -07:00
Linus Torvalds ba0234ec35 Merge branch 'for-linus' of git://git390.marist.edu/pub/scm/linux-2.6
* 'for-linus' of git://git390.marist.edu/pub/scm/linux-2.6: (24 commits)
  [S390] drivers/s390/char: Use kmemdup
  [S390] drivers/s390/char: Use kstrdup
  [S390] debug: enable exception-trace debug facility
  [S390] s390_hypfs: Add new attributes
  [S390] qdio: remove API wrappers
  [S390] qdio: set correct bit in dsci
  [S390] qdio: dont convert timestamps to microseconds
  [S390] qdio: remove memset hack
  [S390] qdio: prevent starvation on PCI devices
  [S390] qdio: count number of qdio interrupts
  [S390] user space fault: report fault before calling do_exit
  [S390] topology: expose core identifier
  [S390] dasd: remove uid from devmap
  [S390] dasd: add dynamic pav toleration
  [S390] vdso: add missing vdso_install target
  [S390] vdso: remove redundant check for CONFIG_64BIT
  [S390] avoid default_llseek in s390 drivers
  [S390] vmcp: disallow modular build
  [S390] add breaking event address for user space
  [S390] virtualization aware cpu measurement
  ...
2010-05-19 11:35:30 -07:00