Commit Graph

148 Commits

Author SHA1 Message Date
Hugh Dickins 4c21e2f244 [PATCH] mm: split page table lock
Christoph Lameter demonstrated very poor scalability on the SGI 512-way, with
a many-threaded application which concurrently initializes different parts of
a large anonymous area.

This patch corrects that, by using a separate spinlock per page table page, to
guard the page table entries in that page, instead of using the mm's single
page_table_lock.  (But even then, page_table_lock is still used to guard page
table allocation, and anon_vma allocation.)

In this implementation, the spinlock is tucked inside the struct page of the
page table page: with a BUILD_BUG_ON in case it overflows - which it would in
the case of 32-bit PA-RISC with spinlock debugging enabled.

Splitting the lock is not quite for free: another cacheline access.  Ideally,
I suppose we would use split ptlock only for multi-threaded processes on
multi-cpu machines; but deciding that dynamically would have its own costs.
So for now enable it by config, at some number of cpus - since the Kconfig
language doesn't support inequalities, let preprocessor compare that with
NR_CPUS.  But I don't think it's worth being user-configurable: for good
testing of both split and unsplit configs, split now at 4 cpus, and perhaps
change that to 8 later.

There is a benefit even for singly threaded processes: kswapd can be attacking
one part of the mm while another part is busy faulting.

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-29 21:40:42 -07:00
Linus Torvalds 8caf89157d Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/shaggy/jfs-2.6 2005-10-28 12:44:24 -07:00
Dave Kleikamp 7038f1cbac JFS: make sure right-most xtree pages have header.next set to zero
The xtTruncate code was only doing this for leaf pages.  When a file is
horribly fragmented, we may truncate a file leaving an internal page with
an invalid head.next field, which may cause a stale page to be referenced.

Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
2005-10-28 13:27:40 -05:00
Al Viro 27496a8c67 [PATCH] gfp_t: fs/*
- ->releasepage() annotated (s/int/gfp_t), instances updated
 - missing gfp_t in fs/* added
 - fixed misannotation from the original sweep caught by bitwise checks:
   XFS used __nocast both for gfp_t and for flags used by XFS allocator.
   The latter left with unsigned int __nocast; we might want to add a
   different type for those but for now let's leave them alone.  That,
   BTW, is a case when __nocast use had been actively confusing - it had
   been used in the same code for two different and similar types, with
   no way to catch misuses.  Switch of gfp_t to bitwise had caught that
   immediately...

One tricky bit is left alone to be dealt with later - mapping->flags is
a mix of gfp_t and error indications.  Left alone for now.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-28 08:16:47 -07:00
Dave Kleikamp b6a47fd8ff JFS: Corrupted block map should not cause trap
Replace assert statements with better error handling.

Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
2005-10-11 09:06:59 -05:00
Dave Kleikamp ac17b8b570 JFS: make special inodes play nicely with page balancing
This patch fixes up a few problems with jfs's reserved inodes.

1. There is no need for the jfs code setting the I_DIRTY bits in i_state.
   I am ashamed that the code ever did this, and surprised it hasn't been
   noticed until now.

2. Make sure special inodes are on an inode hash list.  If the inodes are
   unhashed, __mark_inode_dirty will fail to put the inode on the
   superblock's dirty list, and the data will not be flushed under memory
   pressure.

3. Force writing journal data to disk when metapage_writepage is unable to
   write a metadata page due to pending journal I/O.

Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
2005-10-03 15:32:11 -05:00
Dave Kleikamp 438282d85d JFS: don't dereference tlck->ip from txUpdateMap
The inode pointer may no longer be valid

Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
2005-09-20 14:58:11 -05:00
Dave Kleikamp 6cb1269b96 JFS: Fix sparse warnings, including endian error
The fix in inode.c is a real bug.  It could result in undeleted, yet
unconnected files on big-endian hardware.

The others are trivial.

Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
2005-09-15 23:25:41 -05:00
Linus Torvalds 32983696a4 Merge branch 'for-linus' from kernel.org:/.../shaggy/jfs-2.6 manually
Clash due to new delete_inode behavior (the filesystem now needs to do
the truncate_inode_pages() call itself).

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-11 10:14:54 -07:00
Mark Fasheh fef266580e [PATCH] update filesystems for new delete_inode behavior
Update the file systems in fs/ implementing a delete_inode() callback to
call truncate_inode_pages().  One implementation note: In developing this
patch I put the calls to truncate_inode_pages() at the very top of those
filesystems delete_inode() callbacks in order to retain the previous
behavior.  I'm guessing that some of those could probably be optimized.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Acked-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-09 13:57:27 -07:00
Mark Bellon 8fc2751beb [PATCH] disk quotas fail when /etc/mtab is symlinked to /proc/mounts
If /etc/mtab is a regular file all of the mount options (of a file system)
are written to /etc/mtab by the mount command.  The quota tools look there
for the quota strings for their operation.  If, however, /etc/mtab is a
symlink to /proc/mounts (a "good thing" in some environments) the tools
don't write anything - they assume the kernel will take care of things.

While the quota options are sent down to the kernel via the mount system
call and the file system codes handle them properly unfortunately there is
no code to echo the quota strings into /proc/mounts and the quota tools
fail in the symlink case.

The attached patchs modify the EXT[2|3] and JFS codes to add the necessary
hooks.  The show_options function of each file system in these patches
currently deal with only those things that seemed related to quotas;
especially in the EXT3 case more can be done (later?).

Jan Kara also noted the difficulty in moving these changes above the FS
codes responding similarly to myself to Andrew's comment about possible
VFS migration. Issue summary:

 - FS codes have to process the entire string of options anyway.

 - Only FS codes that use quotas must have a show_options function (for
   quotas to work properly) however quotas are only used in a small number
   of FS.

 - Since most of the quota using FS support other options these FS codes
   should have the a show_options function to show those options - and the
   quota echoing becomes virtually negligible.

Based on feedback I have modified my patches from the original:

   JFS a missing patch has been restored to the posting
   EXT[2|3] and JFS always use the show_options function
       - Each FS has at least one FS specific option displayed
       - QUOTA output is under a CONFIG_QUOTA ifdef
       - a follow-on patch will add a multitude of options for each FS
   EXT[2|3] and JFS "quota" is treated as "usrquota"
   EXT3 journalled data check for journalled quota removed
   EXT[2|3] mount when quota specified but not compiled in

 - no changes from my original patch.  I tested the patch and the codes
   warn but

 - still mount.  With all due respection I believe the comments
   otherwise were a

 - misread of the patch.  Please reread/test and comment.  XFS patch
   removed - the XFS team already made the necessary changes EXT3 mixing
   old and new quotas are handled differently (not purely exclusive)

 - if old and new quotas for the same type are used together the old
   type is silently depricated for compatability (e.g.  usrquota and
   usrjquota)

 - mixing of old and new quotas is an error (e.g.  usrjquota and
   grpquota)

Signed-off-by: Mark Bellon <mbellon@mvista.com>
Acked-by: Dave Kleikamp <shaggy@austin.ibm.com>
Cc: Jan Kara <jack@ucw.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07 16:57:23 -07:00
Dave Kleikamp 1d15b10f95 JFS: Implement jfs_init_security
This atomically initializes the security xattr when an object is created

Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
2005-09-01 09:05:39 -05:00
Dave Kleikamp 4f4b401bfa JFS: allow extended attributes to be set within a existing transaction
Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
2005-09-01 09:02:43 -05:00
Dave Kleikamp b1b5d7f9b7 JFS: jfs_delete_inode should always call clear_inode.
Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
2005-08-30 14:28:56 -05:00
Linus Torvalds ae11be6f37 Merge refs/heads/for-linus from master.kernel.org:/pub/scm/linux/kernel/git/shaggy/jfs-2.6.git 2005-08-30 07:47:42 -07:00
Al Viro 008b150a3c [PATCH] Fix up symlink function pointers
This fixes up the symlink functions for the calling convention change:

 * afs, autofs4, befs, devfs, freevxfs, jffs2, jfs, ncpfs, procfs,
   smbfs, sysvfs, ufs, xfs - prototype change for ->follow_link()
 * befs, smbfs, xfs - same for ->put_link()

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-08-19 18:08:21 -07:00
Dave Kleikamp 01fa90cb2f Merge with /home/shaggy/git/linus-clean/ 2005-08-19 10:37:59 -05:00
Dave Kleikamp 686762c804 JFS: Initialize dentry->d_op for negative dentries too
Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
2005-08-17 13:53:13 -05:00
Dave Kleikamp 8a9cd6d676 JFS: Fix race in txLock
TxAnchor.anon_list is protected by jfsTxnLock (TXN_LOCK), but there was
a place in txLock() that was removing an entry from the list without holding
the spinlock.

Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
2005-08-10 11:14:39 -05:00
Dave Kleikamp 30db1ae864 JFS: Check for invalid inodes in jfs_delete_inode
Some error paths may iput an invalid inode with i_nlink=0.  jfs should
not try to actually delete such an inode.

Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
2005-08-01 16:54:26 -05:00
Dave Kleikamp cbc3d65ebc JFS: Improve sync barrier processing
Under heavy load, hot metadata pages are often locked by non-committed
transactions, making them difficult to flush to disk.  This prevents
the sync point from advancing past a transaction that had modified the
page.

There is a point during the sync barrier processing where all
outstanding transactions have been committed to disk, but no new
transaction have been allowed to proceed.  This is the best time
to write the metadata.

Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
2005-07-27 09:17:57 -05:00
Dave Kleikamp 18190cc08d JFS: Fix i_blocks accounting when allocation fails
A failure in dbAlloc caused a directory's i_blocks to be incorrectly
incremented, causing jfs_fsck to find the inode to be corrupt.

Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
2005-07-26 09:29:13 -05:00
Dave Kleikamp c2783f3a62 JFS: Don't set log_SYNCBARRIER when log->active == 0
If a metadata page is kept active, it is possible that the sync barrier logic
continues to trigger, even if all active transactions have been phyically
written to the journal.  This can cause a hang, since the completion of the
journal I/O is what unsets the sync barrier flag to allow new transactions
to be created.

Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
2005-07-25 08:58:54 -05:00
Dave Kleikamp c40c202493 JFS: Fix typo in last patch
Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
2005-07-22 11:08:44 -05:00
Qu Fuping 3d9b1cdd24 JFS: fsync wrong behavior when I/O failure occurs
This is half of a patch that Qu Fuping submitted in April.  The first part
was applied to fs/mpage.c in 2.6.12-rc4.

jfs_fsync should return error, but it doesn't wait for the metadata page to
be uptodate, e.g.:
jfs_fsync->jfs_commit_inode->txCommit->diWrite->read_metapage->
__get_metapage->read_cache_page reads a page from disk. Because read is
async, when read_cache_page: err = filler(data, page), filler will not
return error, it just submits I/O request and returns. So, page is not
uptodate.  Checking only if(IS_ERROR(mp->page)) is not enough, we should
add "|| !PageUptodate(mp->page)"

Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
2005-07-15 10:36:08 -05:00
Dave Kleikamp 56d1254917 JFS: Remove assert statement in dbJoin & return -EIO instead
Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
2005-07-15 09:43:36 -05:00
Dave Kleikamp 00be3e7e5c JFS: Remove bogus WARN_ON statement and some dead code
Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
2005-07-14 15:15:39 -05:00
Ian Dall 59192ed9e7 JFS: Need to be root to create files with security context
It turns out this is due to some inverted logic in xattr.c

Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
2005-07-13 09:15:18 -05:00
Dave Kleikamp 6211502d7e JFS: Allow security.* xattrs to be set on symlinks
All of the different xattr namespaces have different rules.
user.* and ACL's are not allowed on symlinks, and since these were the
first xattrs implemented, I assumed there was no need to support xattrs
on symlinks.  This one-line patch should fix it.

Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
2005-07-13 09:07:53 -05:00
Dave Kleikamp f7f24758ac Merge with /home/shaggy/git/linus-clean/
Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
2005-07-13 08:57:38 -05:00
Dave Kleikamp b38a3ab3d1 JFS: Code cleanup - getting rid of never-used debug code
I'm finally getting around to cleaning out debug code that I've never used.
There has always been code ifdef'ed out by _JFS_DEBUG_DMAP, _JFS_DEBUG_IMAP,
_JFS_DEBUG_DTREE, and _JFS_DEBUG_XTREE, which I have personally never used,
and I doubt that anyone has since the design stage back in OS/2.  There is
also a function, xtGather, that has never been used, and I don't know why it
was ever there.

Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
2005-06-27 15:35:37 -05:00
Christoph Lameter 3e1d1d28d9 [PATCH] Cleanup patch for process freezing
1. Establish a simple API for process freezing defined in linux/include/sched.h:

   frozen(process)		Check for frozen process
   freezing(process)		Check if a process is being frozen
   freeze(process)		Tell a process to freeze (go to refrigerator)
   thaw_process(process)	Restart process
   frozen_process(process)	Process is frozen now

2. Remove all references to PF_FREEZE and PF_FROZEN from all
   kernel sources except sched.h

3. Fix numerous locations where try_to_freeze is manually done by a driver

4. Remove the argument that is no longer necessary from two function calls.

5. Some whitespace cleanup

6. Clear potential race in refrigerator (provides an open window of PF_FREEZE
   cleared before setting PF_FROZEN, recalc_sigpending does not check
   PF_FROZEN).

This patch does not address the problem of freeze_processes() violating the rule
that a task may only modify its own flags by setting PF_FREEZE. This is not clean
in an SMP environment. freeze(process) is therefore not SMP safe!

Signed-off-by: Christoph Lameter <christoph@lameter.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-25 17:10:13 -07:00
Sonny Rao f5f287738b JFS: performance patch
Basically, we saw a large amount of time spent in the
jfs_strfromUCS_le() function, mispredicting the branch inside the
loop, so I just added some unlikely modifiers to the if statements to
re-ordered the code.  Again, these simple changes provided > 2 % on
spec-sfs, so please consider it for inclusion.

Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
2005-06-23 16:57:56 -05:00
Christoph Hellwig 9a59f452ab [PATCH] remove <linux/xattr_acl.h>
This file duplicates <linux/posix_acl_xattr.h>, using slightly different
names.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-23 09:45:33 -07:00
Dave Kleikamp 72e3148a6e JFS: Fix compiler warning in jfs_logmgr.c
fs/jfs/jfs_logmgr.c: In function `jfs_flush_journal':
fs/jfs/jfs_logmgr.c:1632: warning: unused variable `mp'

Some debug code in jfs_flush_journal does nothing when CONFIG_JFS_DEBUG
is not defined.  Place the whole code segment within an ifdef to avoid
unnecessary code to be compiled and the warning to be issued.

Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
2005-06-03 14:09:54 -05:00
Dave Kleikamp c2731509cf JFS: kernel BUG at fs/jfs/jfs_txnmgr.c:859
add_missing_indices() must set tlck->type to tlckBTROOT when modifying
a root btree root to avoid a trap in txRelease()

Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
2005-06-02 12:18:20 -05:00
Jesper Juhl 259692bd5a JFS: Remove redundant kfree() NULL pointer checks
kfree() can handle a NULL pointer, don't worry about passing it one. 

Signed-off-by: Jesper Juhl <juhl-lkml@dif.dk>
Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
2005-05-09 10:47:14 -05:00
Dave Kleikamp 7a694ca749 JFS: Fix sparse warning
Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
2005-05-04 15:31:14 -05:00
Dave Kleikamp dcc9871270 JFS: cleanup - remove unneeded sanity check
Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
2005-05-04 15:30:51 -05:00
Dave Kleikamp 1868f4aa5a JFS: fix sparse warnings by moving extern declarations to headers
Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
2005-05-04 15:29:35 -05:00
Dave Kleikamp 6b6bf51081 JFS: Endian errors
Thanks sparse!

Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
2005-05-04 09:11:49 -05:00
Dave Kleikamp 6628465e33 [PATCH] JFS: Don't allocate extents that overlap existing extents
Modify xtSearch so that it returns the next allocated block when the
requested block is unmapped.  This can be used to make sure we don't
create a new extent that overlaps the next one.

Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-02 22:23:54 -07:00
Dave Kleikamp 1c6278295d [PATCH] JFS: Write journal sync points more often
This patch adds jfs_syncpt, which calls lmLogSync to write sync points
to the journal both in jfs_sync_fs and when sync barrier processing
completes.

lmLogSync accomplishes two things:  1) it pushes logged-but-dirty
metadata pages to disk, and 2) it writes a sync record to the journal
so that jfs_fsck doesn't need to replay more transactions than is
necessary.

Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-02 22:23:53 -07:00
Dave Kleikamp 7fab479beb [PATCH] JFS: Support page sizes greater than 4K
jfs has never worked on architecutures where the page size was not 4K.

Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-02 22:23:53 -07:00
Dave Kleikamp dc5798d9a7 [PATCH] JFS: Changes for larger page size
JFS code has always assumed a page size of 4K.  This patch fixes the
non-pagecache uses of pages to deal with larger pages.

Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-02 22:23:53 -07:00
Dave Kleikamp d2e83707ed [PATCH] JFS: Simplify creation of new iag
JFS was creating a new IAG (inode aggregate group) in one address
space, and afterwards, accessing it from another.  This could lead to
complications when cache pages contain more than one page of jfs
metadata.  This patch causes the IAG to be initialized in the same
address space that it is subsequently accessed with.

This also elimitates an I/O, but IAG's aren't created too often.

Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-02 22:23:53 -07:00
Dave Kleikamp 66f3131f54 [PATCH] JFS: reduce number of synchronous transactions
Use an inline pxd list rather than an xad list in the xadlock.
When the number of extents being modified can fit with the xadlock,
a transaction can be committed asynchronously.  Using a list of
pxd's instead of xad's allows us to fit 4 extents, rather than 2.

Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-02 22:23:52 -07:00
Linus Torvalds 1da177e4c3 Linux-2.6.12-rc2
Initial git repository build. I'm not bothering with the full history,
even though we have it. We can create a separate "historical" git
archive of that later if we want to, and in the meantime it's about
3.2GB when imported into git - space that would just make the early
git days unnecessarily complicated, when we don't have a lot of good
infrastructure for it.

Let it rip!
2005-04-16 15:20:36 -07:00