Commit Graph

265 Commits

Author SHA1 Message Date
Greg Kroah-Hartman 19c38de88a kobjects: fix up improper use of the kobject name field
A number of different drivers incorrect access the kobject name field
directly.  This is not correct as the name might not be in the array.
Use the proper accessor function instead.
2007-10-12 14:51:02 -07:00
NeilBrown 6712ecf8f6 Drop 'size' argument from bio_endio and bi_end_io
As bi_end_io is only called once when the reqeust is complete,
the 'size' argument is now redundant.  Remove it.

Now there is no need for bio_endio to subtract the size completed
from bi_size.  So don't do that either.

While we are at it, change bi_end_io to return void.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-10-10 09:25:57 +02:00
Jens Axboe 165125e1e4 [BLOCK] Get rid of request_queue_t typedef
Some of the code has been gradually transitioned to using the proper
struct request_queue, but there's lots left. So do a full sweet of
the kernel and get rid of this typedef and replace its uses with
the proper type.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-07-24 09:28:11 +02:00
NeilBrown 4ad1366376 md: change bitmap_unplug and others to void functions
bitmap_unplug only ever returns 0, so it may as well be void.  Two callers try
to print a message if it returns non-zero, but that message is already printed
by bitmap_file_kick.

write_page returns an error which is not consistently checked.  It always
causes BITMAP_WRITE_ERROR to be set on an error, and that can more
conveniently be checked.

When the return of write_page is checked, an error causes bitmap_file_kick to
be called - so move that call into write_page - and protect against recursive
calls into bitmap_file_kick.

bitmap_update_sb returns an error that is never checked.

So make these 'void' and be consistent about checking the bit.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-17 10:23:15 -07:00
NeilBrown f0d76d70bc md: check that internal bitmap does not overlap other data
We current completely trust user-space to set up metadata describing an
consistant array.  In particlar, that the metadata, data, and bitmap do not
overlap.

But userspace can be buggy, and it is better to report an error than corrupt
data.  So put in some appropriate checks.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-17 10:23:15 -07:00
NeilBrown 713f6ab18b md: improve the is_mddev_idle test fix
Don't use 'unsigned' variable to track sync vs non-sync IO, as the only thing
we want to do with them is a signed comparison, and fix up the comment which
had become quite wrong.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-17 10:23:15 -07:00
NeilBrown df968c4e8d md: improve message about invalid superblock during autodetect
People try to use raid auto-detect with version-1 superblocks (which is not
supported) and get confused when they are told they have an invalid
superblock.

So be more explicit, and say it it is not a valid v0.90 superblock.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-17 10:23:15 -07:00
Rafael J. Wysocki 8314418629 Freezer: make kernel threads nonfreezable by default
Currently, the freezer treats all tasks as freezable, except for the kernel
threads that explicitly set the PF_NOFREEZE flag for themselves.  This
approach is problematic, since it requires every kernel thread to either
set PF_NOFREEZE explicitly, or call try_to_freeze(), even if it doesn't
care for the freezing of tasks at all.

It seems better to only require the kernel threads that want to or need to
be frozen to use some freezer-related code and to remove any
freezer-related code from the other (nonfreezable) kernel threads, which is
done in this patch.

The patch causes all kernel threads to be nonfreezable by default (ie.  to
have PF_NOFREEZE set by default) and introduces the set_freezable()
function that should be called by the freezable kernel threads in order to
unset PF_NOFREEZE.  It also makes all of the currently freezable kernel
threads call set_freezable(), so it shouldn't cause any (intentional)
change of behaviour to appear.  Additionally, it updates documentation to
describe the freezing of tasks more accurately.

[akpm@linux-foundation.org: build fixes]
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Nigel Cunningham <nigel@nigel.suspend2.net>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Gautham R Shenoy <ego@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-17 10:23:02 -07:00
Dan Williams 685784aaf3 xor: make 'xor_blocks' a library routine for use with async_tx
The async_tx api tries to use a dma engine for an operation, but will fall
back to an optimized software routine otherwise.  Xor support is
implemented using the raid5 xor routines.  For organizational purposes this
routine is moved to a common area.

The following fixes are also made:
* rename xor_block => xor_blocks, suggested by Adrian Bunk
* ensure that xor.o initializes before md.o in the built-in case
* checkpatch.pl fixes
* mark calibrate_xor_blocks __init, Adrian Bunk

Cc: Adrian Bunk <bunk@stusta.de>
Cc: NeilBrown <neilb@suse.de>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2007-07-13 08:06:14 -07:00
NeilBrown a778b73ff7 md: fix bug with linear hot-add and elsewhere
Adding a drive to a linear array seems to have stopped working, due to changes
elsewhere in md, and insufficient ongoing testing...

So the patch to make linear hot-add work in the first place introduced a
subtle bug elsewhere that interracts poorly with older version of mdadm.

This fixes it all up.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-23 20:14:14 -07:00
NeilBrown 435b71be20 md: improve the is_mddev_idle test
During a 'resync' or similar activity, md checks if the devices in the
array are otherwise active and winds back resync activity when they are.
This test in done in is_mddev_idle, and it is somewhat fragile - it
sometimes thinks there is non-sync io when there isn't.

The test compares the total sectors of io (disk_stat_read) with the sectors
of resync io (disk->sync_io).  This has problems because total sectors gets
updated when a request completes, while resync io gets updated when the
request is submitted.  The time difference can cause large differenced
between the two which do not actually imply non-resync activity.  The test
currently allows for some fuzz (+/- 4096) but there are some cases when it
is not enough.

The test currently looks for any (non-fuzz) difference, either positive or
negative.  This clearly is not needed.  Any non-sync activity will cause
the total sectors to grow faster than the sync_io count (never slower) so
we only need to look for a positive differences.

If we do this then the amount of in-flight sync io will never cause the
appearance of non-sync IO.  Once enough non-sync IO to worry about starts
happening, resync will be slowed down and the measurements will thus be
more precise (as there is less in-flight) and control of resync will still
be suitably responsive.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-11 08:29:37 -07:00
Linus Torvalds 44ce6294d0 Revert "md: improve partition detection in md array"
This reverts commit 5b479c91da.

Quoth Neil Brown:

  "It causes an oops when auto-detecting raid arrays, and it doesn't
   seem easy to fix.

   The array may not be 'open' when do_md_run is called, so
   bdev->bd_disk might be NULL, so bd_set_size can oops.

   This whole approach of opening an md device before it has been
   assembled just seems to get more and more painful.  I think I'm going
   to have to come up with something clever to provide both backward
   comparability with usage expectation, and sane integration into the
   rest of the kernel."

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09 18:51:36 -07:00
NeilBrown 5b479c91da md: improve partition detection in md array
md currently uses ->media_changed to make sure rescan_partitions
is call on md array after they are assembled.

However that doesn't happen until the array is opened, which is later
than some people would like.

So use blkdev_ioctl to do the rescan immediately that the
array has been assembled.

This means we can remove all the ->change infrastructure as it was only used
to trigger a partition rescan.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09 12:30:57 -07:00
NeilBrown 08a02ecd28 md: allow reshape_position for md arrays to be set via sysfs
"reshape_position" records how much progress has been made on a "reshape"
(adding drives, changing layout or chunksize).

When it is set, the number of drives, layout and chunksize can have
two possible values, an old an a new.

So allow these different values to be visible, and allow both old and new to
be set: Set the old ones first, then the reshape_position, then the new
values.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09 12:30:57 -07:00
NeilBrown 4d167f0937 md: stop using csum_partial for checksum calculation in md
If CONFIG_NET is not selected, csum_partial is not exported, so md.ko cannot
use it.  We shouldn't really be using csum_partial anyway as it is an
internal-to-networking interface.

So replace it with C code to do the same thing.  Speed is not crucial here, so
something simple and correct is best.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09 12:30:57 -07:00
NeilBrown e11e93facc md: move test for whether level supports bitmap to correct place
We need to check for internal-consistency of superblock in load_super.
validate_super is for inter-device consistency.

With the test in the wrong place, a badly created array will confuse md rather
an produce sensible errors.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09 12:30:57 -07:00
Martin Peschke c3f94b40e1 md: cleanup: use seq_release_private() where appropriate
We can save some lines of code by using seq_release_private().

Signed-off-by: Martin Peschke <mp3@de.ibm.com>
Acked-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09 12:30:57 -07:00
Ahmed S. Darwish 50511da3da drivers/md.c: Use ARRAY_SIZE macro when appropriate
Use ARRAY_SIZE macro already defined in kernel.h

Signed-off-by: Ahmed S. Darwish <darwish.07@gmail.com>
Acked-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09 12:30:57 -07:00
Peter Zijlstra f98393a64c mm: remove destroy_dirty_buffers from invalidate_bdev()
Remove the destroy_dirty_buffers argument from invalidate_bdev(), it hasn't
been used in 6 years (so akpm says).

find * -name \*.[ch] | xargs grep -l invalidate_bdev |
while read file; do
	quilt add $file;
	sed -ie 's/invalidate_bdev(\([^,]*\),[^)]*)/invalidate_bdev(\1)/g' $file;
done

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-07 12:12:55 -07:00
NeilBrown 5792a2856a [PATCH] md: avoid a deadlock when removing a device from an md array via sysfs
A device can be removed from an md array via e.g.
  echo remove > /sys/block/md3/md/dev-sde/state

This will try to remove the 'dev-sde' subtree which will deadlock
since
  commit e7b0d26a86

With this patch we run the kobject_del via schedule_work so as to
avoid the deadlock.

Cc: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-04-04 21:12:47 -07:00
NeilBrown 5e55e2f5fc [PATCH] md: convert compile time warnings into runtime warnings
...  still not sure why we need this ....

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-03-27 09:05:15 -07:00
NeilBrown 041ae52e26 [PATCH] md: clear the congested_fn when stopping a raid5
If this mddev and queue got reused for another array that doesn't register a
congested_fn, this function would get called incorretly.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-03-27 09:05:14 -07:00
NeilBrown b4c4c7b809 [PATCH] md: restart a (raid5) reshape that has been aborted due to a read/write error
An error always aborts any resync/recovery/reshape on the understanding that
it will immediately be restarted if that still makes sense.  However a reshape
currently doesn't get restarted.  With this patch it does.

To avoid restarting when it is not possible to do work, we call into the
personality to check that a reshape is ok, and strengthen raid5_check_reshape
to fail if there are too many failed devices.

We also break some code out into a separate function: remove_and_add_spares as
the indent level for that code was getting crazy.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-03-01 14:53:36 -08:00
NeilBrown d1b5380c7f [PATCH] md: clean out unplug and other queue function on md shutdown
The mddev and queue might be used for another array which does not set these,
so they need to be cleared.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-03-01 14:53:36 -08:00
NeilBrown 7dd5e7c3db [PATCH] md: move warning about creating a raid array on partitions of the one device
md tries to warn the user if they e.g.  create a raid1 using two partitions of
the same device, as this does not provide true redundancy.

However it also warns if a raid0 is created like this, and there is nothing
wrong with that.

At the place where the warning is currently printer, we don't necessarily know
what level the array will be, so move the warning from the point where the
device is added to the point where the array is started.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-03-01 14:53:36 -08:00
Eric W. Biederman 0b4d414714 [PATCH] sysctl: remove insert_at_head from register_sysctl
The semantic effect of insert_at_head is that it would allow new registered
sysctl entries to override existing sysctl entries of the same name.  Which is
pain for caching and the proc interface never implemented.

I have done an audit and discovered that none of the current users of
register_sysctl care as (excpet for directories) they do not register
duplicate sysctl entries.

So this patch simply removes the support for overriding existing entries in
the sys_sysctl interface since no one uses it or cares and it makes future
enhancments harder.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: David Howells <dhowells@redhat.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Andi Kleen <ak@muc.de>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Corey Minyard <minyard@acm.org>
Cc: Neil Brown <neilb@suse.de>
Cc: "John W. Linville" <linville@tuxdriver.com>
Cc: James Bottomley <James.Bottomley@steeleye.com>
Cc: Jan Kara <jack@ucw.cz>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Mark Fasheh <mark.fasheh@oracle.com>
Cc: David Chinner <dgc@sgi.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Patrick McHardy <kaber@trash.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-14 08:09:59 -08:00
Eric W. Biederman ff1d28efc5 [PATCH] sysctl: md: remove unnecessary insert_at_head flag
The sysctls used by the md driver are have unique binary numbers so remove the
insert_at_head flag as it serves no useful purpose.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-14 08:09:55 -08:00
Arjan van de Ven fa027c2a0a [PATCH] mark struct file_operations const 4
Many struct file_operations in the kernel can be "const".  Marking them const
moves these to the .rodata section, which avoids false sharing with potential
dirty data.  In addition it'll catch accidental writes at compile time to
these shared resources.

[akpm@sdl.org: dvb fix]
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-12 09:48:45 -08:00
NeilBrown 2a2275d630 [PATCH] md: fix potential memalloc deadlock in md
If a GFP_KERNEL allocation is attempted in md while the mddev_lock is held,
it is possible for a deadlock to eventuate.

This happens if the array was marked 'clean', and the memalloc triggers a
write-out to the md device.

For the writeout to succeed, the array must be marked 'dirty', and that
requires getting the mddev_lock.

So, before attempting a GFP_KERNEL allocation while holding the lock, make
sure the array is marked 'dirty' (unless it is currently read-only).

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-01-26 13:51:00 -08:00
NeilBrown 1031be7a5f [PATCH] md: make sure the events count in an md array never returns to zero
Now that we sometimes step the array events count backwards (when
transitioning dirty->clean where nothing else interesting has happened - so
that we don't need to write to spares all the time), it is possible for the
event count to return to zero, which is potentially confusing and triggers and
MD_BUG.

We could possibly remove the MD_BUG, but is just as easy, and probably safer,
to make sure we never return to zero.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-01-26 13:50:59 -08:00
NeilBrown 3f9d7b0d81 [PATCH] md: fix a few problems with the interface (sysfs and ioctl) to md
While developing more functionality in mdadm I found some bugs in md...

- When we remove a device from an inactive array (write 'remove' to
  the 'state' sysfs file - see 'state_store') would should not
  update the superblock information - as we may not have
  read and processed it all properly yet.

- initialise all raid_disk entries to '-1' else the 'slot sysfs file
  will claim '0' for all devices in an array before the array is
  started.

- all '\n' not to be present at the end of words written to
  sysfs files

- when we use SET_ARRAY_INFO to set the md metadata version,
  set the flag to say that there is persistant metadata.

- allow GET_BITMAP_FILE to be called on an array that hasn't
  been started yet.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-22 08:55:51 -08:00
NeilBrown 1757128438 [PATCH] md: assorted md and raid1 one-liners
Fix few bugs that meant that:
  - superblocks weren't alway written at exactly the right time (this
    could show up if the array was not written to - writting to the array
    causes lots of superblock updates and so hides these errors).

  - restarting device recovery after a clean shutdown (version-1 metadata
    only) didn't work as intended (or at all).

1/ Ensure superblock is updated when a new device is added.
2/ Remove an inappropriate test on MD_RECOVERY_SYNC in md_do_sync.
   The body of this if takes one of two branches depending on whether
   MD_RECOVERY_SYNC is set, so testing it in the clause of the if
   is wrong.
3/ Flag superblock for updating after a resync/recovery finishes.
4/ If we find the neeed to restart a recovery in the middle (version-1
   metadata only) make sure a full recovery (not just as guided by
   bitmaps) does get done.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 09:57:21 -08:00
Jeff Garzik fdee8ae449 [PATCH] MD: conditionalize some code
The autorun code is only used if this module is built into the static
kernel image.  Adjust #ifdefs accordingly.

Signed-off-by: Jeff Garzik <jeff@garzik.org>
Acked-by: NeilBrown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 09:57:21 -08:00
NeilBrown 0d4ca600fc [PATCH] md: tidy up device-change notification when an md array is stopped
An md array can be stopped leaving all the setting still in place, or it can
torn down and destroyed.  set_capacity and other change notifications only
happen in the latter case, but should happen in both.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 09:57:20 -08:00
Josef Sipek c649bb9c55 [PATCH] struct path: convert md
Signed-off-by: Josef Sipek <jsipek@fsl.cs.sunysb.edu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-08 08:28:47 -08:00
NeilBrown d63a5a74de [PATCH] lockdep: avoid lockdep warning in md
md_open takes ->reconfig_mutex which causes lockdep to complain.  This
(normally) doesn't have deadlock potential as the possible conflict is with a
reconfig_mutex in a different device.

I say "normally" because if a loop were created in the array->member hierarchy
a deadlock could happen.  However that causes bigger problems than a deadlock
and should be fixed independently.

So we flag the lock in md_open as a nested lock.  This requires defining
mutex_lock_interruptible_nested.

Cc: Ingo Molnar <mingo@elte.hu>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-08 08:28:39 -08:00
Peter Zijlstra 2e7b651df1 [PATCH] remove the old bd_mutex lockdep annotation
Remove the old complex and crufty bd_mutex annotation.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Neil Brown <neilb@cse.unsw.edu.au>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Jason Baron <jbaron@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-08 08:28:38 -08:00
Nigel Cunningham 7dfb71030f [PATCH] Add include/linux/freezer.h and move definitions from sched.h
Move process freezing functions from include/linux/sched.h to freezer.h, so
that modifications to the freezer or the kernel configuration don't require
recompiling just about everything.

[akpm@osdl.org: fix ueagle driver]
Signed-off-by: Nigel Cunningham <nigel@suspend2.net>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:27 -08:00
Rafael J. Wysocki 4b438a23fb [PATCH] md: do not freeze md threads for suspend
If there's a swap file on a software RAID, it should be possible to use this
file for saving the swsusp's suspend image.  Also, this file should be
available to the memory management subsystem when memory is being freed before
the suspend image is created.

For the above reasons it seems that md_threads should not be frozen during the
suspend and the appended patch makes this happen, but then there is the
question if they don't cause any data to be written to disks after the suspend
image has been created, provided that all filesystems are frozen at that time.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-11-08 18:29:24 -08:00
NeilBrown 2f47130361 [PATCH] md: change ONLINE/OFFLINE events to a single CHANGE event
It turns out that CHANGE is preferred to ONLINE/OFFLINE for various reasons
(not least of which being that udev understands it already).

So remove the recently added KOBJ_OFFLINE (no-one is likely to care anyway)
and change the ONLINE to a CHANGE event

Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Neil Brown <neilb@suse.de>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-11-08 18:29:23 -08:00
NeilBrown 7870db4c7f [PATCH] md: send online/offline uevents when an md array starts/stops
This allows udev to do something intelligent when an array becomes
available.

Acked-by: Greg KH <greg@kroah.com>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-11-03 12:27:55 -08:00
NeilBrown 01ab5662f5 [PATCH] md: simplify checking of available size when resizing an array
When "mdadm --grow --size=xxx" is used to resize an array (use more or less of
each device), we check the new siza against the available space in each
device.

We already have that number recorded in rdev->size, so calculating it is
pointless (and wrong in one obscure case).

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-28 11:30:51 -07:00
NeilBrown 2b6e845986 [PATCH] md: fix bug where spares don't always get rebuilt properly when they become live
If save_raid_disk is >= 0, then the device could be a device that is already
in sync that is being re-added.  So we need to default this value to -1.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-28 11:30:51 -07:00
NeilBrown 1c05b4bc22 [PATCH] md: endian annotation for v1 superblock access
Includes a couple of bugfixes found by sparse.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-21 13:35:05 -07:00
Akinobu Mita e24650c2e7 [PATCH] md: fix /proc/mdstat refcounting
I have seen mdadm oops after successfully unloading md module.

This patch privents from unloading md module while
mdadm is polling /proc/mdstat.

Cc: Neil Brown <neilb@suse.de>
Signed-off-by: Akinbou Mita <akinobu.mita@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-17 08:18:43 -07:00
NeilBrown 5842730de1 [PATCH] md: fix bug where new drives added to an md array sometimes don't sync properly
This fixes a bug introduced in 2.6.18.

If a drive is added to a raid1 using older tools (mdadm-1.x or raidtools)
then it will be included in the array without any resync happening.

It has been submitted for 2.6.18.1.

Signed-off-by: Neil Brown <neilb@suse.de>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-06 08:53:41 -07:00
Eric Sesterhenn 52e5f9d1cf BUG_ON cleanup for drivers/md/
This changes two if() BUG(); usages to BUG_ON(); so people
can disable it safely.

Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-10-03 23:33:23 +02:00
NeilBrown 3a0f5bbb1a [PATCH] md: add error reporting to superblock write failure
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-03 08:04:19 -07:00
NeilBrown e8703fe1f5 [PATCH] md: remove MAX_MD_DEVS which is an arbitrary limit
Once upon a time we needed to fixed limit to the number of md devices,
probably because we preallocated some array.  This need no longer exists, but
we still have an arbitrary limit.

So remove MAX_MD_DEVS and allow as many devices as we can fit into the 'minor'
part of a device number.

Also remove some useless noise at init time (which reports MAX_MD_DEVS) and
remove MD_THREAD_NAME_MAX which hasn't been used for a while.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-03 08:04:18 -07:00
NeilBrown 61df9d91e9 [PATCH] md: make messages about resync/recovery etc more specific
It is possible to request a 'check' of an md/raid array where the whole array
is read and consistancies are reported.

This uses the same mechanisms as 'resync' and so reports in the kernel logs
that a resync is being started.  This understandably confuses/worries people.

Also the text in /proc/mdstat suggests a 'resync' is happen when it is just a
check.

This patch changes those messages to be more specific about what is happening.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-03 08:04:18 -07:00
Paul Clements 9b1d1dac18 [PATCH] md: new sysfs interface for setting bits in the write-intent-bitmap
Add a new sysfs interface that allows the bitmap of an array to be dirtied.
The interface is write-only, and is used as follows:

echo "1000" > /sys/block/md2/md/bitmap

(dirty the bit for chunk 1000 [offset 0] in the in-memory and on-disk
bitmaps of array md2)

echo "1000-2000" > /sys/block/md1/md/bitmap

(dirty the bits for chunks 1000-2000 in md1's bitmap)

This is useful, for example, in cluster environments where you may need to
combine two disjoint bitmaps into one (following a server failure, after a
secondary server has taken over the array).  By combining the bitmaps on
the two servers, a full resync can be avoided (This was discussed on the
list back on March 18, 2005, "[PATCH 1/2] md bitmap bug fixes" thread).

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-03 08:04:17 -07:00
NeilBrown 850b2b420c [PATCH] md: replace magic numbers in sb_dirty with well defined bit flags
Instead of magic numbers (0,1,2,3) in sb_dirty, we have
some flags instead:
MD_CHANGE_DEVS
   Some device state has changed requiring superblock update
   on all devices.
MD_CHANGE_CLEAN
   The array has transitions from 'clean' to 'dirty' or back,
   requiring a superblock update on active devices, but possibly
   not on spares
MD_CHANGE_PENDING
   A superblock update is underway.

We wait for an update to complete by waiting for all flags to be clear.  A
flag can be set at any time, even during an update, without risk that the
change will be lost.

Stop exporting md_update_sb - isn't needed.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-03 08:04:17 -07:00
Adrian Bunk fbedac04fa [PATCH] md: the scheduled removal of the START_ARRAY ioctl for md
This patch contains the scheduled removal of the START_ARRAY ioctl for md.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-03 08:04:16 -07:00
NeilBrown 8469219596 [PATCH] md: avoid backward event updates in md superblock when degraded.
If we
  - shut down a clean array,
  - restart with one (or more) drive(s) missing
  - make some changes
  - pause, so that they array gets marked 'clean',
the event count on the superblock of included drives
will be the same as that of the removed drives.
So adding the removed drive back in will cause it
to be included with no resync.

To avoid this, we only update the eventcount backwards when the array
is not degraded.  In this case there can (should) be no non-connected
drives that we can get confused with, and this is the particular case
where updating-backwards is valuable.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-08-27 11:01:31 -07:00
Andrew Morton d0a0a5ee7a [PATCH] md: fix oops in error-handling
During early MD setup (superblock reading), we don't have a personality yet.
But the error-handling code tries to dereference mddev->pers.  Fix.

Acked-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-10 13:24:17 -07:00
NeilBrown 67463acb64 [PATCH] md: require CAP_SYS_ADMIN for (re-)configuring md devices via sysfs
The ioctl requires CAP_SYS_ADMIN, so sysfs should too.  Note that we don't
require CAP_SYS_ADMIN for reading attributes even though the ioctl does.
There is no reason to limit the read access, and much of the information is
already available via /proc/mdstat

Cc: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-10 13:24:17 -07:00
NeilBrown 80ca3a44f5 [PATCH] md: unify usage of symbolic names for perms
Some places we use number (0660) someplaces names (S_IRUGO).  Change all
numbers to be names, and change 0655 to be what it should be.

Also make some formatting more consistent.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-10 13:24:17 -07:00
NeilBrown ff4e8d9a9f [PATCH] md: fix resync speed calculation for restarted resyncs
We introduced 'io_sectors' recently so we could count the sectors that causes
io during resync separate from sectors which didn't cause IO - there can be a
difference if a bitmap is being used to accelerate resync.

However when a speed is reported, we find the number of sectors processed
recently by subtracting an oldish io_sectors count from a current
'curr_resync' count.  This is wrong because curr_resync counts all sectors,
not just io sectors.

So, add a field to mddev to store the curren io_sectors separately from
curr_resync, and use that in the calculations.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-10 13:24:16 -07:00
NeilBrown 0b8c9de05c [PATCH] md: delay starting md threads until array is completely setup
When an array is started we start one or two threads (two if there is a
reshape or recovery that needs to be completed).

We currently start these *before* the array is completely set up and in
particular before queue->queuedata is set.  If the thread actually starts
very quickly on another CPU, we can end up dereferencing queue->queuedata
and oops.

This patch also makes sure we don't try to start a recovery if a reshape is
being restarted.

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-10 13:24:16 -07:00
NeilBrown 31b65a0d38 [PATCH] md: set desc_nr correctly for version-1 superblocks
This has to be done in ->load_super, not ->validate_super

Without this, hot-adding devices to an array doesn't always
work right - though there is a work around in mdadm-2.5.2 to
make this less of an issue.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-10 13:24:16 -07:00
Ingo Molnar 663d440eaa [PATCH] lockdep: annotate blkdev nesting
Teach special (recursive) locking code to the lock validator.

Effects on non-lockdep kernels:

- the introduction of the following function variants:

  extern struct block_device *open_partition_by_devnum(dev_t, unsigned);

  extern int blkdev_put_partition(struct block_device *);

  static int
  blkdev_get_whole(struct block_device *bdev, mode_t mode, unsigned flags);

 which on non-lockdep are the same as open_by_devnum(), blkdev_put()
 and blkdev_get().

- a subclass parameter to do_open(). [unused on non-lockdep]

- a subclass parameter to __blkdev_put(), which is a new internal
  function for the main blkdev_put*() functions. [parameter unused
  on non-lockdep kernels, except for two sanity check WARN_ON()s]

these functions carry no semantical difference - they only express
object dependencies towards the lockdep subsystem.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-03 15:27:10 -07:00
Jörn Engel 6ab3d5624e Remove obsolete #include <linux/config.h>
Signed-off-by: Jörn Engel <joern@wohnheim.fh-wedel.de>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-06-30 19:25:36 +02:00
Greg Kroah-Hartman ce7b0f46bb [PATCH] devfs: Remove the gendisk devfs_name field as it's no longer needed
And remove the now unneeded number field.
Also fixes all drivers that set these fields.

Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-06-26 12:25:08 -07:00
Greg Kroah-Hartman ff23eca3e8 [PATCH] devfs: Remove the devfs_fs_kernel.h file from the tree
Also fixes up all files that #include it.

Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-06-26 12:25:08 -07:00
Greg Kroah-Hartman 8ab5e4c15b [PATCH] devfs: Remove devfs_remove() function from the kernel tree
Removes the devfs_remove() function and all callers of it.

Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-06-26 12:25:07 -07:00
Greg Kroah-Hartman 1a715c5cf9 [PATCH] devfs: Remove devfs_mk_bdev() function from the kernel tree
Removes the devfs_mk_bdev() function and all callers of it.

Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-06-26 12:25:06 -07:00
Greg Kroah-Hartman 95dc112a57 [PATCH] devfs: Remove devfs_mk_dir() function from the kernel tree
Removes the devfs_mk_dir() function and all callers of it.

Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-06-26 12:25:06 -07:00
Adrian Bunk 0538195424 [PATCH] drivers/md/md.c: make code static
Make needlessly global code static.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:40 -07:00
NeilBrown f655675b3f [PATCH] md: Allow the write_mostly flag to be set via sysfs
It appears in /sys/mdX/md/dev-YYY/state
and can be set or cleared by writing 'writemostly' or '-writemostly'
respectively.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:40 -07:00
NeilBrown a94213b1fa [PATCH] md: Allow resync_start to be set and queried via sysfs
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:40 -07:00
NeilBrown d4dbd0250e [PATCH] md: Allow raid 'layout' to be read and set via sysfs
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:39 -07:00
NeilBrown 45dc2de1e5 [PATCH] md: Allow rdev state to be set via sysfs
The md/dev-XXX/state file can now be written:

 "faulty" simulates an error on the device
 "remove" removes the device from the array (if it is not busy)

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:39 -07:00
NeilBrown 9e653b6342 [PATCH] md: Set/get state of array via sysfs
This allows the state of an md/array to be directly controlled via sysfs and
adds the ability to stop and array without tearing it down.

Array states/settings:

 clear
     No devices, no size, no level
     Equivalent to STOP_ARRAY ioctl
 inactive
     May have some settings, but array is not active
        all IO results in error
     When written, doesn't tear down array, but just stops it
 suspended (not supported yet)
     All IO requests will block. The array can be reconfigured.
     Writing this, if accepted, will block until array is quiescent
 readonly
     no resync can happen.  no superblocks get written.
     write requests fail
 read-auto
     like readonly, but behaves like 'clean' on a write request.

 clean - no pending writes, but otherwise active.
     When written to inactive array, starts without resync
     If a write request arrives then
       if metadata is known, mark 'dirty' and switch to 'active'.
       if not known, block and switch to write-pending
     If written to an active array that has pending writes, then fails.
 active
     fully active: IO and resync can be happening.
     When written to inactive array, starts with resync

 write-pending (not supported yet)
     clean, but writes are blocked waiting for 'active' to be written.

 active-idle
     like active, but no writes have been seen for a while (100msec).

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:39 -07:00
NeilBrown 4254376914 [PATCH] md: Don't write dirty/clean update to spares - leave them alone
- record the 'event' count on each individual device (they
  might sometimes be slightly different now)
- add a new value for 'sb_dirty': '3' means that the super
  block only needs to be updated to record a clean<->dirty
  transition.
- Prefer odd event numbers for dirty states and even numbers
  for clean states
- Using all the above, don't update the superblock on
  a spare device if the update is just doing a clean-dirty
  transition.  To accomodate this, a transition from
  dirty back to clean might now decrement the events counter
  if nothing else has changed.

The net effect of this is that spare drives will not see any IO requests
during normal running of the array, so they can go to sleep if that is what
they want to do.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:39 -07:00
NeilBrown 07d84d109d [PATCH] md: Allow re-add to work on array without bitmaps
When an array has a bitmap, a device can be removed and re-added and only
blocks changes since the removal (as recorded in the bitmap) will be resynced.

It should be possible to do a similar thing to arrays without bitmaps.  i.e.
if a device is removed and re-added and *no* changes have been made in the
interim, then the add should not require a resync.

This patch allows that option.  This means that when assembling an array one
device at a time (e.g.  during device discovery) the array can be enabled
read-only as soon as enough devices are available, but extra devices can still
be added without causing a resync.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:39 -07:00
NeilBrown acc55e2201 [PATCH] md/bitmap: tidy up i_writecount handling in md/bitmap
md/bitmap modifies i_writecount of a bitmap file to make sure that no-one else
writes to it.  The reverting of the change is sometimes done twice, and there
is one error path where it is omitted.

This patch tidies that up.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:38 -07:00
NeilBrown d7375ab324 [PATCH] md/bitmap: fix online removal of file-backed bitmaps
When "mdadm --grow /dev/mdX --bitmap=none" is used to remove a filebacked
bitmap, the bitmap was disconnected from the array, but the file wasn't closed
(until the array was stopped).

The file also wasn't closed if adding the bitmap file failed.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:38 -07:00
Adrian Bunk 5e56341d02 [PATCH] md: make md_print_devices() static
This patch makes the needlessly global md_print_devices() static.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:37 -07:00
NeilBrown 7c7546ccf6 [PATCH] md: allow a linear array to have drives added while active
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:37 -07:00
NeilBrown 5fd6c1dce0 [PATCH] md: allow checkpoint of recovery with version-1 superblock
For a while we have had checkpointing of resync.  The version-1 superblock
allows recovery to be checkpointed as well, and this patch implements that.

Due to early carelessness we need to add a feature flag to signal that the
recovery_offset field is in use, otherwise older kernels would assume that a
partially recovered array is in fact fully recovered.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:37 -07:00
NeilBrown a8a55c387d [PATCH] md: remove nuisance message at shutdown
At shutdown, we switch all arrays to read-only, which creates a message for
every instantiated array, even those which aren't actually active.

So remove the message for non-active arrays.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:37 -07:00
NeilBrown 16f17b39f3 [PATCH] md: increase the delay before marking metadata clean, and make it configurable
When a md array has been idle (no writes) for 20msecs it is marked as 'clean'.
 This delay turns out to be too short for some real workloads.  So increase it
to 200msec (the time to update the metadata should be a tiny fraction of that)
and make it sysfs-configurable.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:37 -07:00
NeilBrown 9443a1d1f7 [PATCH] md: remove useless ioctl warning
This warning was slightly useful back in 2.2 days, but is more an annoyance
now.  It makes it awkward to add new ioctls (that we we are likely to do that
in the current climate, but it is possible).

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:36 -07:00
NeilBrown c331eb04b9 [PATCH] md: Fix badness in sysfs_notify caused by md_new_event
From: NeilBrown <neilb@suse.de>

If an error is reported by a drive in a RAID array (which is done via
bi_end_io - in interrupt context), we call md_error and md_new_event which
calls sysfs_notify.  However sysfs_notify grabs a mutex and so cannot be
called in interrupt context.

This patch just creates a variant of md_new_event which avoids the sysfs
call, and uses that.  A better fix for later is to arrange for the event to
be called from user-context.

Note: avoiding the sysfs call isn't a problem as an error will not, by
itself, modify the sync_action attribute.  (We do still need to
wake_up(&md_event_waiters) as an error by itself will modify /proc/mdstat).

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-05-31 16:27:11 -07:00
Neil Brown c71d48877e [PATCH] Unlock md devices when stopping them on reboot.
otherwise we get nasty messages about locks not being released.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-05-26 11:52:11 -07:00
NeilBrown 2adc7d47c4 [PATCH] md: Fix inverted test for 'repair' directive.
We should be able to write 'repair' to /sys/block/mdX/md/sync_action,
however due to and inverted test, that always given EINVAL.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-05-21 12:59:17 -07:00
Ingo Molnar 5dc5cf7dd2 [PATCH] md: locking fix
- fix mddev_lock() usage bugs in md_attr_show() and md_attr_store().
  [they did not anticipate the possibility of getting a signal]

- remove mddev_lock_uninterruptible() [unused]

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-04-20 07:54:04 -07:00
NeilBrown 4508a7a734 [PATCH] sysfs: Allow sysfs attribute files to be pollable
It works like this:
  Open the file
  Read all the contents.
  Call poll requesting POLLERR or POLLPRI (so select/exceptfds works)
  When poll returns,
     close the file and go to top of loop.
   or lseek to start of file and go back to the 'read'.

Events are signaled by an object manager calling
   sysfs_notify(kobj, dir, attr);

If the dir is non-NULL, it is used to find a subdirectory which
contains the attribute (presumably created by sysfs_create_group).

This has a cost of one int  per attribute, one wait_queuehead per kobject,
one int per open file.

The name "sysfs_notify" may be confused with the inotify
functionality.  Maybe it would be nice to support inotify for sysfs
attributes as well?

This patch also uses sysfs_notify to allow /sys/block/md*/md/sync_action
to be pollable

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-04-14 11:41:24 -07:00
NeilBrown 926ce2d8a7 [PATCH] md: Remove some code that can sleep from under a spinlock
And remove the comments that were put in inplace of a fix too....

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-31 12:19:01 -08:00
NeilBrown df5b89b323 [PATCH] md: Convert reconfig_sem to reconfig_mutex
... being careful that mutex_trylock is inverted wrt down_trylock

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27 08:45:03 -08:00
Arjan van de Ven 48c9c27b8b [PATCH] sem2mutex: drivers/md
Semaphore to mutex conversion.

The conversion was generated via scripts, and the result was validated
automatically via a script as well.

Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27 08:45:03 -08:00
NeilBrown 8ddeeae51f [PATCH] md: Fix md grow/size code to correctly find the maximum available space
An md array can be asked to change the amount of each device that it is using,
and in particular can be asked to use the maximum available space.  This
currently only works if the first device is not larger than the rest.  As
'size' gets changed and so 'fit' becomes wrong.  So check if a 'fit' is
required early and don't corrupt it.

Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27 08:45:03 -08:00
NeilBrown e464eafdb4 [PATCH] md: Support suspending of IO to regions of an md array
This allows user-space to access data safely.  This is needed for raid5
reshape as user-space needs to take a backup of the first few stripes before
allowing reshape to commence.

It will also be useful in cluster-aware raid1 configurations so that all
cluster members can leave a section of the array untouched while a
resync/recovery happens.

A 'start' and 'end' of the suspended range are written to 2 sysfs attributes.
Note that only one range can be suspended at a time.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27 08:45:02 -08:00
NeilBrown 16484bf596 [PATCH] md: Make 'reshape' a possible sync_action action
This allows reshape to be triggerred via sysfs (which is the only way to start
it happening).

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27 08:45:02 -08:00
NeilBrown 63c70c4f3a [PATCH] md: Split reshape handler in check_reshape and start_reshape
check_reshape checks validity and does things that can be done instantly -
like adding devices to raid1.  start_reshape initiates a restriping process to
convert the whole array.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27 08:45:02 -08:00
NeilBrown f67055780c [PATCH] md: Checkpoint and allow restart of raid5 reshape
We allow the superblock to record an 'old' and a 'new' geometry, and a
position where any conversion is up to.  The geometry allows for changing
chunksize, layout and level as well as number of devices.

When using verion-0.90 superblock, we convert the version to 0.91 while the
conversion is happening so that an old kernel will refuse the assemble the
array.  For version-1, we use a feature bit for the same effect.

When starting an array we check for an incomplete reshape and restart the
reshape process if needed.  If the reshape stopped at an awkward time (like
when updating the first stripe) we refuse to assemble the array, and let
user-space worry about it.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27 08:45:01 -08:00
NeilBrown 292695531a [PATCH] md: Final stages of raid5 expand code
This patch adds raid5_reshape and end_reshape which will start and finish the
reshape processes.

raid5_reshape is only enabled in CONFIG_MD_RAID5_RESHAPE is set, to discourage
accidental use.

Read the 'help' for the CONFIG_MD_RAID5_RESHAPE entry.

and Make sure that you have backups, just in case.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27 08:45:01 -08:00
NeilBrown ccfcc3c10b [PATCH] md: Core of raid5 resize process
This patch provides the core of the resize/expand process.

sync_request notices if a 'reshape' is happening and acts accordingly.

It allocated new stripe_heads for the next chunk-wide-stripe in the target
geometry, marking them STRIPE_EXPANDING.

Then it finds which stripe heads in the old geometry can provide data needed
by these and marks them STRIPE_EXPAND_SOURCE.  This causes stripe_handle to
read all blocks on those stripes.

Once all blocks on a STRIPE_EXPAND_SOURCE stripe_head are read, any that are
needed are copied into the corresponding STRIPE_EXPANDING stripe_head.  Once a
STRIPE_EXPANDING stripe_head is full, it is marks STRIPE_EXPAND_READY and then
is written out and released.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27 08:45:01 -08:00
NeilBrown ad01c9e375 [PATCH] md: Allow stripes to be expanded in preparation for expanding an array
Before a RAID-5 can be expanded, we need to be able to expand the stripe-cache
data structure.

This requires allocating new stripes in a new kmem_cache.  If this succeeds,
we copy cache pages over and release the old stripes and kmem_cache.

We then allocate new pages.  If that fails, we leave the stripe cache at it's
new size.  It isn't worth the effort to shrink it back again.

Unfortuanately this means we need two kmem_cache names as we, for a short
period of time, we have two kmem_caches.  So they are raid5/%s and
raid5/%s-alt

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27 08:45:01 -08:00
NeilBrown 4588b42e9d [PATCH] md: Update status_resync to handle LARGE devices
status_resync - used by /proc/mdstat to report the status of a resync, assumes
that device sizes will always fit into an 'unsigned long' This is no longer
the case...

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27 08:45:01 -08:00