The recent change to the way scsi_device_get()/put() works broke the
non-modular build (we call module_refcount() on a NULL module). Fix this by
checking for non-NULL before calling module_refcount().
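A minimal userspace model of the fix, assuming the put path drops a module reference only when the refcount is nonzero. In a non-modular build the host template's module pointer is NULL, so it must be tested first. struct fake_module, fake_module_refcount() and fake_device_put() are hypothetical stand-ins, not kernel APIs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct module; only the
 * reference count matters for this sketch. */
struct fake_module {
	int refcount;
};

/* Hypothetical model of module_refcount(): it dereferences its
 * argument, so it must never be handed a NULL pointer. */
static int fake_module_refcount(const struct fake_module *mod)
{
	return mod->refcount;
}

/* Put-side logic: built-in drivers have no module (NULL), so check
 * the pointer before looking at the refcount. Returns 1 if a module
 * reference was dropped, 0 otherwise. */
static int fake_device_put(struct fake_module *mod)
{
	if (mod && fake_module_refcount(mod) != 0) {
		mod->refcount--;
		return 1;
	}
	return 0;
}
```

Calling fake_device_put(NULL) is now harmless; without the `mod &&` test it would dereference NULL, which is the crash described above.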
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Spotted by: Dan Aloni <da-xx@monatomic.org>
The problem is that the locking semantics of scsi_alloc_target() are used
inconsistently. Two callers assume the target comes back with its
reference unincremented, and the third assumes it is incremented. Fix this by
always returning with the reference incremented. Also fix a path in
target alloc that could consistently increment the parent lock.
Finally, document scsi_alloc_target() so that its callers know what the
expectations are.
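The calling convention described above can be sketched as follows: whether the target is found or newly allocated, the caller always gets one reference and must drop it. All names here (fake_target, fake_alloc_target(), fake_target_put()) are hypothetical, and the one-slot "list" is a simplification, not the SCSI midlayer's data structure.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical target object with a simple reference count. */
struct fake_target {
	int id;
	int refs;
};

/* One-slot "target list" to keep the sketch short. */
static struct fake_target *target_list;

/* Found or freshly allocated, the target is returned with one
 * reference held for the caller, who must drop it with
 * fake_target_put(). */
static struct fake_target *fake_alloc_target(int id)
{
	if (target_list && target_list->id == id) {
		target_list->refs++;          /* existing target: caller's ref */
		return target_list;
	}
	struct fake_target *t = malloc(sizeof(*t));
	if (!t)
		return NULL;
	t->id = id;
	t->refs = 2;                          /* one for the list, one for caller */
	target_list = t;
	return t;
}

static void fake_target_put(struct fake_target *t)
{
	if (--t->refs == 0) {
		if (target_list == t)
			target_list = NULL;
		free(t);
	}
}
```

Because both paths behave identically, every caller can follow the same "alloc, use, put" pattern, which is the consistency the commit is after.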
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Add support for a new lpfc soft_wwpn sysfs attribute
Signed-off-by: James Smart <James.Smart@emulex.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Add support for the new dev_loss_tmo callback.
The good news is that this removes the code for a parallel nodev timer that
existed in the driver.
Add support for the new fast_io_fail callback
Signed-off-by: James Smart <James.Smart@emulex.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
This patch adds the following functionality to the FC transport:
- dev_loss_tmo LLDD callback :
Called to essentially confirm the deletion of an rport. Thus, it is
called whenever the dev_loss_tmo fires, or when the rport is deleted
due to other circumstances (module unload, etc). It is expected that
the callback will initiate the termination of any outstanding i/o on
the rport.
- fast_io_fail_tmo and LLD callback:
There are some cases where it may take a long while to truly determine
device loss, but the system is in a multipathing configuration where, if
the i/o were failed quickly (faster than dev_loss_tmo), it could be
redirected to a different path and completed sooner.
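The interaction between the two timeouts can be sketched as a decision on how long an rport has been blocked. The enum, the function name, and the convention that a negative fast_io_fail_tmo means "disabled" are all assumptions for illustration, not the FC transport's actual code.

```c
#include <assert.h>

/* Hypothetical actions for a blocked rport. */
enum rport_action {
	RPORT_WAIT,     /* keep i/o queued, the port may come back        */
	RPORT_FAIL_IO,  /* fast_io_fail_tmo expired: fail outstanding i/o */
	RPORT_DELETE    /* dev_loss_tmo expired: delete the rport         */
};

/* Decide what to do `elapsed` seconds after the rport was blocked.
 * fast_io_fail_tmo < 0 means "disabled" in this model. It is only
 * useful when it is shorter than dev_loss_tmo. */
static enum rport_action rport_timeout_action(int elapsed,
					      int fast_io_fail_tmo,
					      int dev_loss_tmo)
{
	if (elapsed >= dev_loss_tmo)
		return RPORT_DELETE;
	if (fast_io_fail_tmo >= 0 && elapsed >= fast_io_fail_tmo)
		return RPORT_FAIL_IO;
	return RPORT_WAIT;
}
```

With fast_io_fail_tmo = 5 and dev_loss_tmo = 30, a multipath layer sees failed i/o after 5 seconds and can retry on another path, while the rport itself is not deleted until 30 seconds have passed.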
Many thanks to Mike Reed who cleaned up the initial RFC in support
of this post.
The original RFC is at:
http://marc.theaimsgroup.com/?l=linux-scsi&m=115505981027246&w=2
Signed-off-by: James Smart <James.Smart@emulex.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Add support to return adapter symbolic name (now that attribute is dynamic)
Signed-off-by: James Smart <James.Smart@emulex.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Add support to post events via new FC event interfaces
Signed-off-by: James Smart <James.Smart@emulex.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
During discussions with Mike Christie, I became convinced that we needed
a larger vendor id. This patch extends the id from 32 to 64 bits.
This applies on top of the prior patches that add SCSI transport events
via netlink.
Signed-off-by: James Smart <James.Smart@emulex.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
This patch formally adds support for the posting of FC events via netlink.
It is a followup to the original RFC at:
http://marc.theaimsgroup.com/?l=linux-scsi&m=114530667923464&w=2
and the initial posting at:
http://marc.theaimsgroup.com/?l=linux-scsi&m=115507374832500&w=2
The patch has been updated to optimize the send path, per the discussions
in the initial posting.
Per discussions at the Storage Summit and at OLS, we are to use netlink for
async events from transports. Also per discussions, to avoid a netlink
protocol per transport, I've created a single NETLINK_SCSITRANSPORT protocol,
which can then be used by all transports.
This patch:
- Creates new files scsi_netlink.c and scsi_netlink.h, which contain the
single and shared definitions for the SCSI Transport. It is tied into the
base SCSI subsystem initialization.
Contains a single interface routine, scsi_send_transport_event(), for a
transport to send an event (via multicast to a protocol specific group).
- Creates a new scsi_netlink_fc.h file, which contains the FC netlink event
messages
- Adds 3 new routines to the fc transport:
fc_get_event_number() - to get a FC event #
fc_host_post_event() - to send a simple FC event (32 bits of data)
fc_host_post_vendor_event() - to send a Vendor unique event, with
arbitrary amounts of data.
Note: the separation of event number allows for a LLD to send a standard
event, followed by vendor-specific data for the event.
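The point of the separate event number can be sketched in userspace: one number is allocated, then both the standard event and the vendor data carry it, so a listener can pair them. The struct layout and all function names are hypothetical; they are not the definitions in scsi_netlink_fc.h, although the 64-bit vendor id mirrors the later patch in this log.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical event record; illustrative fields only. */
struct fake_fc_event {
	uint32_t event_num;
	uint32_t event_code;   /* standard event: 32 bits of data */
	uint64_t vendor_id;    /* vendor events only              */
	size_t data_len;
	unsigned char data[64];
};

static atomic_uint fake_event_counter;

/* Model of fc_get_event_number(): a unique, increasing id. */
static uint32_t fake_get_event_number(void)
{
	return atomic_fetch_add(&fake_event_counter, 1) + 1;
}

static void fake_post_event(struct fake_fc_event *ev, uint32_t num,
			    uint32_t code)
{
	memset(ev, 0, sizeof(*ev));
	ev->event_num = num;
	ev->event_code = code;
}

/* Vendor data posted under the same number is linked to the standard
 * event. Data beyond the buffer is truncated in this sketch. */
static void fake_post_vendor_event(struct fake_fc_event *ev, uint32_t num,
				   uint64_t vendor_id,
				   const void *data, size_t len)
{
	memset(ev, 0, sizeof(*ev));
	ev->event_num = num;
	ev->vendor_id = vendor_id;
	ev->data_len = len < sizeof(ev->data) ? len : sizeof(ev->data);
	memcpy(ev->data, data, ev->data_len);
}
```

An LLD would call fake_get_event_number() once, then post the standard event and the vendor event with that same number.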
Note: This patch assumes 2 prior fc transport patches have been installed:
http://marc.theaimsgroup.com/?l=linux-scsi&m=115555807316329&w=2
http://marc.theaimsgroup.com/?l=linux-scsi&m=115581614930261&w=2
Sorry - next time I'll do something like making these individual
patches of the same posting when I know they'll be posted closely
together.
Signed-off-by: James Smart <James.Smart@emulex.com>
Tidy up the configuration so that SCSI does not always select NET
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Use block shared tags entirely within the driver. In the case of
shutdown, assume that there are no other outstanding commands, so tag
0 is fine.
Signed-off-by: Ed Lin <ed.lin@promise.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Add Promise SuperTrak 'stex' driver, supporting SuperTrak
EX8350/8300/16350/16300 controllers. The controller's firmware accepts
SCSI commands, handing them to the underlying RAID or JBOD disks.
The driver includes the following cleanups and fixes, made after its
initial submission:
Ed Lin:
stex: cleanup and minor fixes
stex: add new device ids
stex: update internal copy code path
stex: add hard reset function
stex: adjust command timeout in slave_config routine
stex: use more efficient method for unload/shutdown flush
Jeff Garzik:
[SCSI] Add Promise SuperTrak 'shasta' driver.
Rename drivers/scsi/shasta.c to stex.c ("SuperTrak EX").
[SCSI] stex: update with community comments from 'Promise SuperTrak' thread
[SCSI] stex: Fix warning, trim trailing whitespace.
[SCSI] stex: remove last remnants of "shasta" project code name
[SCSI] stex: removed 6-byte command emulation
[SCSI] stex: minor cleanups
[SCSI] stex: minor fixes: irq flag, error return value
[SCSI] stex: use dma_alloc_coherent()
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
When accessing a device with read access disabled, the capacity is
arbitrarily set to 1GB. This makes it impossible for userspace tools to
detect invalid device capacities.
Signed-off-by: Mike Anderson <andmike@us.ibm.com>
Acked-by: Chris Mason <mason@suse.com>
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
In the normal IO path we should not be calling back
into the LLD since the LLD will have cleaned up the
task before or after calling complete pdu.
For the fail_command path we still need to do this
to force the cleanup.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
If the scsi eh sends a TUR and the session is down, we could
return SCSI_MLQUEUE_HOST_BUSY. The scsi eh will ignore this and
ask us to abort the command, and we then blindly access the
command pointer.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
When a digest is spread across two network buffers, we currently
ignore this and try to check the digest with the partial buffer.
Of course this fails. This patch uses iscsi_tcp_copy to
copy the whole digest before testing it.
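The fix amounts to accumulating digest bytes across network buffers and comparing only once all four have arrived. The sketch below models that with a hypothetical accumulator; it does not reproduce iscsi_tcp_copy's actual signature.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DIGEST_LEN 4

/* Hypothetical receive-side accumulator for a 4-byte iSCSI digest. */
struct digest_acc {
	unsigned char buf[DIGEST_LEN];
	size_t have;
};

/* Copy as much of the digest as this network buffer holds and return
 * the number of bytes consumed. Compare only when complete. */
static size_t digest_copy(struct digest_acc *acc,
			  const unsigned char *seg, size_t seg_len)
{
	size_t want = DIGEST_LEN - acc->have;
	size_t n = seg_len < want ? seg_len : want;

	memcpy(acc->buf + acc->have, seg, n);
	acc->have += n;
	return n;
}

static int digest_complete(const struct digest_acc *acc)
{
	return acc->have == DIGEST_LEN;
}
```

If the first buffer ends two bytes into the digest, digest_copy() consumes those two, digest_complete() stays false, and the comparison waits for the next buffer instead of failing on a partial value.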
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
The first burst length is only relevant if immediate data = Yes
or if Initial R2T is No.
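This rule can be stated as a one-line predicate. FirstBurstLength bounds unsolicited data, which is sent either as immediate data or as unsolicited Data-Out PDUs, so it matters only when one of those is allowed. The function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* FirstBurstLength limits unsolicited data: immediate data
 * (ImmediateData=Yes) and/or unsolicited Data-Out PDUs
 * (InitialR2T=No). If neither is allowed, it is irrelevant. */
static bool first_burst_relevant(bool immediate_data, bool initial_r2t)
{
	return immediate_data || !initial_r2t;
}
```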
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
When we relogin to a target, we have not yet negotiated digests
so we must reset the hdr_size var.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
This patch, built on top of the previous ones, fixes a bug in the partial
header resend code, where we add another 4 bytes to the send length on the
resend. We want just the header plus digest.
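The intended resend arithmetic, as a small sketch: what remains is the header plus its 4-byte digest (when enabled) minus what already went out, with no extra bytes added. The function name is hypothetical, and 48 bytes in the usage below is the size of the iSCSI basic header segment.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes still to send for a partially sent header: header plus its
 * digest (if enabled) minus what has already gone out. */
static size_t hdr_resend_len(size_t hdr_len, int hdr_digest, size_t sent)
{
	size_t total = hdr_len + (hdr_digest ? 4 : 0);

	return sent >= total ? 0 : total - sent;
}
```

For a 48-byte header with header digests on and 20 bytes already sent, 32 bytes remain. The bug described above would have asked for 36.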
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
We currently allocate separate tfms for data and header digests. There
is no reason for this, since we can never calculate an rx header digest and
an rx data digest at the same time. The same goes for sends. So this patch
removes the data tfms and has the send and recv sides use the rx_tfm or tx_tfm.
I also made the connection creation code preallocate the tfms, because I
thought I hit a bug where I changed the digest settings during a
relogin but could not allocate the tfm, and then we just failed.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
iscsi_tcp calculates padding by using the expected transfer length. This
has the problem where if we have immediate data = no and initial R2T =
yes, and the transfer length ended up needing padding then we send:
1. header
2. padding which should have gone after data
3. data
Besides this bug, we also assume the target will always ask for nice
transfer lengths and the first burst length will always be a nice value.
As far as I can tell from the RFC, this is not a requirement. It would be
silly to do this, but if someone did, we would end up doing bad things.
Finally, the last bug in that bit of code is in our handling of the
recalculation of data digests when we do not send a whole iscsi_buf in
one try. The bug here is that we call crypto_digest_final on an
iscsi_sendpage error; then, when we send the rest of the iscsi_buf, we
do iscsi_data_digest_init, and this causes the previous data digest to be
lost.
And to make matters worse, some of these bugs are replicated over and
over and over again for immediate data, solicited data and unsolicited
data. So the attached patch made over the iscsi git tree (see
kernel.org/git for details) which I updated today to include the patches
I said I merged, consolidates the sending of data, padding and digests
and calculation of data digests and fixes the above bugs.
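Two of the fixes above can be sketched in isolation: padding computed from the bytes actually sent in a PDU (iSCSI pads data segments to a 4-byte boundary), and a data digest kept open across partial sends. The CRC32C below is a plain bitwise implementation of the Castagnoli CRC used for iSCSI digests, standing in for the kernel crypto calls; the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pad needed after a data segment of seg_len bytes to reach a 4-byte
 * boundary. Compute it from what this PDU carries, not from the
 * command's expected transfer length. */
static size_t iscsi_pad_len(size_t seg_len)
{
	return (4 - (seg_len & 3)) & 3;
}

/* Bitwise CRC32C (reflected polynomial 0x82F63B78), split into
 * init/update/final so one digest can span several partial sends. */
static uint32_t crc32c_init(void)
{
	return 0xFFFFFFFFu;
}

static uint32_t crc32c_update(uint32_t crc, const void *data, size_t len)
{
	const unsigned char *p = data;

	while (len--) {
		crc ^= *p++;
		for (int k = 0; k < 8; k++)
			crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	}
	return crc;
}

static uint32_t crc32c_final(uint32_t crc)
{
	return crc ^ 0xFFFFFFFFu;
}
```

After a short send, the sender keeps the running value and calls crc32c_update() on the rest. It calls crc32c_final() only after the last byte. Re-running crc32c_init() at that point, which corresponds to the bug described above, would throw away the state for the bytes already sent.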
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
A couple of targets, like string bean and MDS, send r2ts with
a data length greater than the max burst we agreed to. We
were strict in enforcing the iSCSI RFC in that
code path, but there is no driver limitation that prevents
us from fulfilling the request. To allow those targets
to work, we will ignore the max_burst length and send as
much data as the target asks for, assuming it has consciously
decided to override its max burst length.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
It is possible that a ctask could be completing and getting
cleaned up at the same time that we are finishing up the last
data transfer. This could then result in the data transfer
code using stale or invalid values. This patch adds a refcount
to the ctask. When the count goes to zero then we know the
transmit thread and recv thread or softirq are not touching
it and we can safely release it.
The eh should not need to grab a reference because it only cleans
up a task if it has both the xmit mutex and recv lock (or recv
side suspended).
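The refcount scheme can be sketched with C11 atomics: each path that may touch the task holds a reference, and whichever path drops the last one performs the release. struct fake_ctask and its helpers are hypothetical, not the libiscsi structures.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical command task guarded by a reference count. */
struct fake_ctask {
	atomic_int refcount;
	int released;   /* set by the release hook, for the demo */
};

static void fake_ctask_release(struct fake_ctask *ct)
{
	ct->released = 1;   /* real code would free or recycle ct */
}

static void fake_ctask_get(struct fake_ctask *ct)
{
	atomic_fetch_add(&ct->refcount, 1);
}

/* Drop a reference. The last holder (xmit, recv or completion)
 * releases the task, so nobody touches it afterwards. */
static void fake_ctask_put(struct fake_ctask *ct)
{
	if (atomic_fetch_sub(&ct->refcount, 1) == 1)
		fake_ctask_release(ct);
}
```

If completion drops its reference while the transmit path still holds one, the release waits until the transmit path also calls fake_ctask_put(). This closes the stale-data window described above.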
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
The iSCSI RFC states that the first burst length must not exceed the
max burst length. We currently assume targets will be well behaved, but that
may not be the case, so this patch adds a check.
This patch also moves the unsol data out offset to the lib so the LLDs
do not have to track it.
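A sketch of such a check, based on the RFC 3720 rule that FirstBurstLength must not exceed MaxBurstLength. The function name and the 0/-1 return convention are assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* Reject a negotiated pair where FirstBurstLength exceeds
 * MaxBurstLength. Returns 0 if acceptable, -1 otherwise. */
static int check_burst_lengths(uint32_t first_burst, uint32_t max_burst)
{
	return first_burst > max_burst ? -1 : 0;
}
```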
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Sanitize the Vendor, Product, and Revision strings contained in an
INQUIRY result by setting all non-graphic or non-ASCII characters to ' '.
Since the standard disallows such characters, this will affect
only non-compliant devices.
To help maintain backward compatibility, NUL characters are treated
specially. They are taken as string terminators; they and all the
following characters are set to ' '. If some valid characters get
erased as a result... well, we weren't seeing them before so we haven't
lost anything.
The primary purpose of this change is to allow blacklist entries to
match devices with illegal Vendor or Product strings.
In addition, the patch updates a couple of function prototypes, giving
inq_result its correct type (unsigned char *).
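The sanitizing rule can be sketched on a raw INQUIRY buffer. It uses the standard INQUIRY field offsets (vendor identification in bytes 8-15, product identification in 16-31, product revision level in 32-35). The function names are hypothetical, not the ones in scsi_scan.c.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Non-graphic or non-ASCII bytes become ' '. A NUL ends the string,
 * so it and everything after it in the field become ' '. */
static void sanitize_inquiry_field(unsigned char *s, size_t len)
{
	int terminated = 0;

	for (size_t i = 0; i < len; i++) {
		if (s[i] == 0)
			terminated = 1;
		if (terminated || s[i] < 0x20 || s[i] > 0x7e)
			s[i] = ' ';
	}
}

/* Apply the rule to the vendor, product and revision fields only. */
static void sanitize_inquiry(unsigned char *inq_result)
{
	sanitize_inquiry_field(inq_result + 8, 8);
	sanitize_inquiry_field(inq_result + 16, 16);
	sanitize_inquiry_field(inq_result + 32, 4);
}
```

A vendor field of "ACME\0ZZZ" becomes "ACME    ", so a blacklist entry for "ACME" can match it.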
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
The fix isn't actually in sd: it's in scsi_device_get(). I modified it
to allow devices to be returned in SDEV_CANCEL, but not SDEV_DEL. This
means that the device_remove_driver, which occurs in device_del() in
scsi_remove_device() after the device has gone into SDEV_CANCEL, is now
effective at flushing the cache.
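The new acceptance rule can be sketched as a state check. The enum is a reduced, hypothetical subset that borrows the SDEV_* names (the real state machine has more states). The point is that SDEV_CANCEL still allows a reference, so the driver can flush the cache during removal, while SDEV_DEL does not.

```c
#include <assert.h>

/* Reduced, illustrative subset of device states. */
enum fake_sdev_state { SDEV_RUNNING, SDEV_CANCEL, SDEV_DEL };

/* Take a reference unless the device is already deleted.
 * Returns 0 on success, -1 on refusal. */
static int fake_device_get(enum fake_sdev_state state, int *refs)
{
	if (state == SDEV_DEL)
		return -1;
	(*refs)++;
	return 0;
}
```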
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
This patch adds support for sharing tag maps at the host level
(i.e. either every queue [LUN] has its own tag map or there's a single
one for the entire host). This formulation is primarily intended to
help single issue queue hardware, like the aic7xxx.
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
The current block queue implementation already contains most of the
machinery for shared tag maps. The only remaining piece is a way to
allocate and destroy a tag map independently of the queues (so that
the maps can be managed on the life cycle of the overseeing entity).
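The shared formulation can be sketched as a single host-owned bitmap that several queues point at. This is a hypothetical, single-threaded model. It is not the block layer's blk_queue_tag code, which also needs locking.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical host-wide tag map: up to 64 tags in one bitmap. */
struct fake_tag_map {
	uint64_t busy;
	int depth;   /* number of usable tags, <= 64 */
};

/* Returns a free tag, or -1 if the whole host is out of tags. */
static int fake_tag_alloc(struct fake_tag_map *map)
{
	for (int tag = 0; tag < map->depth; tag++) {
		if (!(map->busy & (UINT64_C(1) << tag))) {
			map->busy |= UINT64_C(1) << tag;
			return tag;
		}
	}
	return -1;
}

static void fake_tag_free(struct fake_tag_map *map, int tag)
{
	map->busy &= ~(UINT64_C(1) << tag);
}

/* Each queue (LUN) only points at the map. The map's lifetime
 * belongs to the host, not to any single queue. */
struct fake_queue {
	struct fake_tag_map *tags;
};
```

With a depth of 2, two LUNs together can hold at most two tags. That models a host with a single issue queue.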
Acked-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
This patch sets can_queue in the aic94xx driver's scsi_host to better
performing values than what's there currently. It seems that
asd_ha->seq.can_queue reflects the number of requests that can be
queued per controller; so long as there's one scsi_host per
controller, it seems logical that the scsi_host ought to have the same
can_queue value. To the best of my (still limited) knowledge, this
method provides the correct value.
The effect of leaving this value set to 1 is terrible performance in
the case of either (a) certain Maxtor SAS drives flying solo or (b)
flooding several disks with I/O simultaneously (md-raid). There may be
more scenarios where we see similar problems that I haven't uncovered.
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Right now, various kernel modules are being migrated over to use
request_firmware in order to pull in binary firmware blobs from userland
when the module is loaded. This makes sense.
However, there is currently little mechanism in place to automatically
determine which binary firmware blobs must be included with a kernel in
order to satisfy the prerequisites of these drivers. This affects
vendors, and to a certain extent regular users too.
The attached patch introduces MODULE_FIRMWARE as a mechanism for
advertising that a particular firmware file is to be loaded - it will
then show up via modinfo and could be used e.g. when packaging a kernel.
Signed-off-by: Jon Masters <jcm@redhat.com>
Comments added in line with all the other MODULE_ tags
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This is the end point of the separate aic94xx driver based on the
original driver and transport class from Luben Tuikov
<ltuikov@yahoo.com>
The log of the separate development is:
Alexis Bruemmer:
o aic94xx: fix hotplug/unplug for expanderless systems
o aic94xx: disable split completion timer/setting by default
o aic94xx: wide port off expander support
o aic94xx: remove various inline functions
o aic94xx: use bitops
o aic94xx: remove queue comment
o aic94xx: remove sas_common.c
o aic94xx: sas remove depot's
o aic94xx: use available list_for_each_entry_safe_reverse()
o aic94xx: sas header file merge
James Bottomley:
o aic94xx: fix TF_TMF_NO_CTX processing
o aic94xx: convert to request_firmware interface
o aic94xx: fix hotplug/unplug
o aic94xx: add link error counts to the expander phys
o aic94xx: add transport class phy reset capability
o aic94xx: remove local_attached flag
o Remove README
o Fixup Makefile variable for libsas rename
o Rename sas->libsas
o aic94xx: correct return code for sas_discover_event
o aic94xx: use parent backlink port
o aic94xx: remove channel abstraction
o aic94xx: fix routing algorithms
o aic94xx: add backlink port
o aic94xx: fix cascaded expander properties
o aic94xx: fix sleep under lock
o aic94xx: fix panic on module removal in complex topology
o aic94xx: make use of the new sas_port
o rename sas_port to asd_sas_port
o Fix for eh_strategy_handler move
o aic94xx: move entirely over to correct transport class formulation
o remove last vestiges of sas_rphy_alloc()
o update for eh_timed_out move
o Preliminary expander support for aic94xx
o sas: remove event thread
o minor warning cleanups
o remove last vestiges of id mapping arrays
o Further updates
o Convert aic94xx over entirely to the transport class end device and
o update aic94xx/sas to use the new sas transport class end device
o [PATCH] aic94xx: attaching to the sas transport class
o Add missing completion removal from prior patch
o [PATCH] aic94xx: attaching to the sas transport class
o Build fixes from akpm
Jeff Garzik:
o [scsi aic94xx] Remove ->owner from PCI info table
Luben Tuikov:
o initial aic94xx driver
Mike Anderson:
o aic94xx: fix panic on module insertion
o aic94xx: stub out SATA_DEV case
o aic94xx: compile warning cleanups
o aic94xx: sas_alloc_task
o aic94xx: ref count update
o aic94xx nexus loss time value
o [PATCH] aic94xx: driver assertion in non-x86 BIOS env
Randy Dunlap:
o libsas: externs not needed
Robert Tarte:
o aic94xx: sequence patch - fixes SATA support
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
This flag denotes local attachment of the phy. There are two problems
with it:
1) It's actually redundant ... you can get the same information simply
by checking whether the host is the phy's parent
2) we condition a lot of phy parameters on it on the false assumption
that we can only control local phys. I'm wiring up phy resets in the
aic94xx now, and it will be able to reset non-local phys as well.
I fixed 2) by moving the local check into the reset and stats functions
of mptsas, since that seems to be the only HBA that can't
(currently) control non-local phys.
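Point 1) can be sketched as deriving locality from the device tree instead of storing a flag. A phy is local exactly when its parent is the host. The struct and function here are hypothetical simplifications of the driver-model parent links.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical minimal device tree: each object records its parent. */
struct fake_dev {
	struct fake_dev *parent;
};

/* Local phys hang directly off the host. Phys on an expander have
 * the expander as their parent. */
static int fake_phy_is_local(const struct fake_dev *phy,
			     const struct fake_dev *host)
{
	return phy->parent == host;
}
```

An HBA that, like mptsas, cannot control remote phys would call such a check in its reset and stats handlers and refuse non-local phys there.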
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Unlike the other tty comment patch, this one has code changes. Specifically,
it limits the queue size for a tty to 64K characters (128Kbytes) worst case,
even if the tty is ignoring tty->throttle. This is because certain drivers
don't honour the throttle value correctly, and it is a useful
safeguard anyway.
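The cap can be sketched as simple accounting. In this model each queued character costs two bytes (the character plus a flag byte), which is how 64K characters corresponds to 128Kbytes. The names and the drop-the-excess policy are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* 64K characters; at 2 bytes each (char + flag) that is 128Kbytes. */
#define FAKE_TTY_CHAR_LIMIT (64 * 1024)

struct fake_tty_buf {
	size_t queued_chars;
};

/* Accept up to `count` characters but never grow past the cap, even
 * if the driver ignores throttling. Returns how many were accepted. */
static size_t fake_tty_queue(struct fake_tty_buf *b, size_t count)
{
	size_t room = FAKE_TTY_CHAR_LIMIT - b->queued_chars;
	size_t n = count < room ? count : room;

	b->queued_chars += n;
	return n;
}
```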
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Doesn't fix them but does show up some interesting areas that need review
and fixing.
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fix printk format warning:
drivers/cdrom/gscd.c:269: warning: format '%lu' expects type 'long unsigned int', but argument 2 has type 'unsigned int'
Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Cc: Jens Axboe <axboe@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
When NUMA is selected on i386, the system must be either X86_NUMAQ or using ACPI.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
None of the other /proc/meminfo lines have a space in the identifier. This
post-2.6.17 addition has the potential to break existing parsers, so use an
underscore instead (like Committed_AS).
Cc: Christoph Lameter <clameter@engr.sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This fixes the locking error noticed by lockdep:
=============================================
[ INFO: possible recursive locking detected ]
---------------------------------------------
init/1 is trying to acquire lock:
(&sighand->siglock){....}, at: [<c047a78a>] flush_old_exec+0x3ae/0x859
but task is already holding lock:
(&sighand->siglock){....}, at: [<c047a77a>] flush_old_exec+0x39e/0x859
other info that might help us debug this:
2 locks held by init/1:
#0: (tasklist_lock){..--}, at: [<c047a76a>] flush_old_exec+0x38e/0x859
#1: (&sighand->siglock){....}, at: [<c047a77a>] flush_old_exec+0x39e/0x859
stack backtrace:
[<c04051e1>] show_trace_log_lvl+0x54/0xfd
[<c040579d>] show_trace+0xd/0x10
[<c04058b6>] dump_stack+0x19/0x1b
[<c043b33a>] __lock_acquire+0x773/0x997
[<c043bacf>] lock_acquire+0x4b/0x6c
[<c060630b>] _spin_lock+0x19/0x28
[<c047a78a>] flush_old_exec+0x3ae/0x859
[<c0498053>] load_elf_binary+0x4aa/0x1628
[<c0479cab>] search_binary_handler+0xa7/0x24e
[<c047b577>] do_execve+0x15b/0x1f9
[<c04022b4>] sys_execve+0x29/0x4d
[<c0403faf>] syscall_call+0x7/0xb
Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
reiserfs seems to have another locking level layer for the i_mutex due to the
xattrs-are-a-directory thing.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
idescsi_pc_intr() uses local_irq_enable() in IRQ context: annotate it.
(this has no effect on kernels with lockdep disabled. On kernels with lockdep
enabled this means that we won't actually disable interrupts, and the warning
message will go away as well.)
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
In file included from include/asm/mmzone.h:18,
from include/linux/mmzone.h:439,
<snip>
include/asm/srat.h:31:2: error: #error CONFIG_ACPI_SRAT not defined, and srat.h header has been included
make[1]: *** [arch/i386/kernel/asm-offsets.s] Error 1
This can happen with CONFIG_NUMA && !CONFIG_ACPI && !CONFIG_X86_NUMAQ
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
cpuset_excl_nodes_overlap() always returns 0 if current is exiting. This caused
customers' systems to panic in the OOM killer when processes were having
trouble getting memory for the final put_user() in mm_release(), even though
there were lots of processes to kill.
Change it to return 1 in this case. This achieves parity with the
!CONFIG_CPUSETS case, and was observed to fix the problem.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Acked-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
register_one_node() should also be defined (as a stub) when CONFIG_NUMA=n.
This fixes the following bug:
CC init/version.o
LD init/built-in.o
LD .tmp_vmlinux1
mm/built-in.o: In function `add_memory': undefined reference to `register_one_node'
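The usual kernel pattern for this kind of link error is a static inline stub that is compiled in when the feature is configured out. The sketch below assumes the stub simply returns 0. In this userspace example CONFIG_NUMA is an ordinary macro that is left undefined, so the stub is the version that gets used.

```c
#include <assert.h>

/* With the feature configured out, a static inline stub lets callers
 * such as add_memory() still compile and link. */
#ifdef CONFIG_NUMA
int register_one_node(int nid);   /* real implementation elsewhere */
#else
static inline int register_one_node(int nid)
{
	(void)nid;
	return 0;                      /* nothing to register */
}
#endif
```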
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
JBD currently allocates commit and frozen buffers from slabs. With
CONFIG_SLAB_DEBUG, it's possible for an allocation to cross a page
boundary, causing IO problems.
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=200127
So, instead of allocating these from regular slabs, manage allocation from
its own slabs and disable slab debug for these slabs.
[akpm@osdl.org: cleanups]
Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Change the list of cpus allowed to tasks in the top (root) cpuset to
dynamically track what cpus are online, using a CPU hotplug notifier. Make
this top cpus file read-only.
On systems that have cpusets configured in their kernel, but that aren't
actively using cpusets (for some distros, this covers the majority of
systems) all tasks end up in the top cpuset.
If that system does support CPU hotplug, then these tasks cannot make use
of CPUs that are added after system boot, because the CPUs are not allowed
in the top cpuset. This is a surprising regression over earlier kernels
that didn't have cpusets enabled.
In order to keep the behaviour of cpusets consistent between systems
actively making use of them and systems not using them, this patch changes
the behaviour of the 'cpus' file in the top (root) cpuset, making it read
only, and making it automatically track the value of cpu_online_map. Thus
tasks in the top cpuset will have automatic use of hot plugged CPUs allowed
by their cpuset.
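The new behaviour can be sketched as a hotplug callback that keeps the top cpuset's mask equal to the online map, so a hot-added CPU becomes usable by tasks in the top cpuset. Masks are modelled as 64-bit integers, and every name here is hypothetical, not cpuset.c's.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical CPU masks: bit n set means CPU n. */
static uint64_t fake_cpu_online_map;
static uint64_t fake_top_cpuset_cpus;

/* Hotplug notifier model: the top cpuset's cpus are read-only and
 * simply mirror the online map after every CPU up/down event. */
static void fake_cpu_hotplug_event(int cpu, int online)
{
	if (online)
		fake_cpu_online_map |= UINT64_C(1) << cpu;
	else
		fake_cpu_online_map &= ~(UINT64_C(1) << cpu);
	fake_top_cpuset_cpus = fake_cpu_online_map;
}
```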
Thanks to Anton Blanchard and Nathan Lynch for reporting this problem,
driving the fix, and earlier versions of this patch.
Signed-off-by: Paul Jackson <pj@sgi.com>
Cc: Nathan Lynch <ntl@pobox.com>
Cc: Anton Blanchard <anton@samba.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>