Commit Graph

19429 Commits

Author SHA1 Message Date
Linus Torvalds
30c0f6a049 Merge branch 'for-linus' of git://git.infradead.org/users/eparis/notify
* 'for-linus' of git://git.infradead.org/users/eparis/notify:
  fsnotify: drop two useless bools in the fnsotify main loop
  fsnotify: fix list walk order
  fanotify: Return EPERM when a process is not privileged
  fanotify: resize pid and reorder structure
  fanotify: drop duplicate pr_debug statement
  fanotify: flush outstanding perm requests on group destroy
  fsnotify: fix ignored mask handling between inode and vfsmount marks
  fanotify: add MAINTAINERS entry
  fsnotify: reset used_inode and used_vfsmount on each pass
  fanotify: do not dereference inode_mark when it is unset
2010-08-28 14:11:04 -07:00
Linus Torvalds
e933424c48 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ecryptfs/ecryptfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ecryptfs/ecryptfs-2.6:
  eCryptfs: Fix encrypted file name lookup regression
  ecryptfs: properly mark init functions
  fs/ecryptfs: Return -ENOMEM on memory allocation failure
2010-08-28 14:10:43 -07:00
Linus Torvalds
997396a73a Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
  ceph: fix get_ticket_handler() error handling
  ceph: don't BUG on ENOMEM during mds reconnect
  ceph: ceph_mdsc_build_path() returns an ERR_PTR
  ceph: Fix warnings
  ceph: ceph_get_inode() returns an ERR_PTR
  ceph: initialize fields on new dentry_infos
  ceph: maintain i_head_snapc when any caps are dirty, not just for data
  ceph: fix osd request lru adjustment when sending request
  ceph: don't improperly set dir complete when holding EXCL cap
  mm: exporting account_page_dirty
  ceph: direct requests in snapped namespace based on nonsnap parent
  ceph: queue cap snap writeback for realm children on snap update
  ceph: include dirty xattrs state in snapped caps
  ceph: fix xattr cap writeback
  ceph: fix multiple mds session shutdown
2010-08-28 14:07:20 -07:00
Linus Torvalds
2547d1d20f Merge branch 'for-2.6.36' of git://linux-nfs.org/~bfields/linux
* 'for-2.6.36' of git://linux-nfs.org/~bfields/linux:
  nfsd: fix NULL dereference in nfsd_statfs()
  nfsd4: fix downgrade/lock logic
  nfsd4: typo fix in find_any_file
  nfsd4: bad BUG() in preprocess_stateid_op
2010-08-28 14:05:55 -07:00
J. Bruce Fields
b76b4014f9 writeback: Fix lost wake-up shutting down writeback thread
Setting the task state here may cause us to miss the wake up from
kthread_stop(), so we need to recheck kthread_should_stop() or risk
sleeping forever in the following schedule().

Symptom was an indefinite hang on an NFSv4 mount.  (NFSv4 may create
multiple mounts in a temporary namespace while traversing the mount
path, and since the temporary namespace is immediately destroyed, it may
end up destroying a mount very soon after it was created, possibly
making this race more likely.)

INFO: task mount.nfs4:4314 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
mount.nfs4    D 0000000000000000  2880  4314   4313 0x00000000
 ffff88001ed6da28 0000000000000046 ffff88001ed6dfd8 ffff88001ed6dfd8
 ffff88001ed6c000 ffff88001ed6c000 ffff88001ed6c000 ffff88001e5003a0
 ffff88001ed6dfd8 ffff88001e5003a8 ffff88001ed6c000 ffff88001ed6dfd8
Call Trace:
 [<ffffffff8196090d>] schedule_timeout+0x1cd/0x2e0
 [<ffffffff8106a31c>] ? mark_held_locks+0x6c/0xa0
 [<ffffffff819639a0>] ? _raw_spin_unlock_irq+0x30/0x60
 [<ffffffff8106a5fd>] ? trace_hardirqs_on_caller+0x14d/0x190
 [<ffffffff819671fe>] ? sub_preempt_count+0xe/0xd0
 [<ffffffff8195fc80>] wait_for_common+0x120/0x190
 [<ffffffff81033c70>] ? default_wake_function+0x0/0x20
 [<ffffffff8195fdcd>] wait_for_completion+0x1d/0x20
 [<ffffffff810595fa>] kthread_stop+0x4a/0x150
 [<ffffffff81061a60>] ? thaw_process+0x70/0x80
 [<ffffffff810cc68a>] bdi_unregister+0x10a/0x1a0
 [<ffffffff81229dc9>] nfs_put_super+0x19/0x20
 [<ffffffff810ee8c4>] generic_shutdown_super+0x54/0xe0
 [<ffffffff810ee9b6>] kill_anon_super+0x16/0x60
 [<ffffffff8122d3b9>] nfs4_kill_super+0x39/0x90
 [<ffffffff810eda45>] deactivate_locked_super+0x45/0x60
 [<ffffffff810edfb9>] deactivate_super+0x49/0x70
 [<ffffffff81108294>] mntput_no_expire+0x84/0xe0
 [<ffffffff811084ef>] release_mounts+0x9f/0xc0
 [<ffffffff81108575>] put_mnt_ns+0x65/0x80
 [<ffffffff8122cc56>] nfs_follow_remote_path+0x1e6/0x420
 [<ffffffff8122cfbf>] nfs4_try_mount+0x6f/0xd0
 [<ffffffff8122d0c2>] nfs4_get_sb+0xa2/0x360
 [<ffffffff810edcb8>] vfs_kern_mount+0x88/0x1f0
 [<ffffffff810ede92>] do_kern_mount+0x52/0x130
 [<ffffffff81963d9a>] ? _lock_kernel+0x6a/0x170
 [<ffffffff81108e9e>] do_mount+0x26e/0x7f0
 [<ffffffff81106b3a>] ? copy_mount_options+0xea/0x190
 [<ffffffff811094b8>] sys_mount+0x98/0xf0
 [<ffffffff810024d8>] system_call_fastpath+0x16/0x1b
1 lock held by mount.nfs4/4314:
 #0:  (&type->s_umount_key#24){+.+...}, at: [<ffffffff810edfb1>] deactivate_super+0x41/0x70

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Acked-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2010-08-28 08:52:10 +02:00
Eric Paris
92b4678efa fsnotify: drop two useless bools in the fnsotify main loop
The fsnotify main loop has 2 bools which indicated if we processed the
inode or vfsmount mark in that particular pass through the loop.  These
bool can we replaced with the inode_group and vfsmount_group variables
and actually make the code a little easier to understand.

Signed-off-by: Eric Paris <eparis@redhat.com>
2010-08-27 21:42:11 -04:00
Eric Paris
f72adfd540 fsnotify: fix list walk order
Marks were stored on the inode and vfsmonut mark list in order from
highest memory address to lowest memory address.  The code to walk those
lists thought they were in order from lowest to highest with
unpredictable results when trying to match up marks from each.  It was
possible that extra events would be sent to userspace when inode
marks ignoring events wouldn't get matched with the vfsmount marks.

This problem only affected fanotify when using both vfsmount and inode
marks simultaneously.

Signed-off-by: Eric Paris <eparis@redhat.com>
2010-08-27 21:41:26 -04:00
Andreas Gruenbacher
a2f13ad0ba fanotify: Return EPERM when a process is not privileged
The appropriate error code when privileged operations are denied is
EPERM, not EACCES.

Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Signed-off-by: Eric Paris <paris@paris.rdu.redhat.com>
2010-08-27 19:59:42 -04:00
Tyler Hicks
93c3fe40c2 eCryptfs: Fix encrypted file name lookup regression
Fixes a regression caused by 21edad3220

When file name encryption was enabled, ecryptfs_lookup() failed to use
the encrypted and encoded version of the upper, plaintext, file name
when performing a lookup in the lower file system. This made it
impossible to lookup existing encrypted file names and any newly created
files would have plaintext file names in the lower file system.

https://bugs.launchpad.net/ecryptfs/+bug/623087

Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
2010-08-27 10:50:53 -05:00
Jerome Marchand
7371a38201 ecryptfs: properly mark init functions
Some ecryptfs init functions are not prefixed by __init and thus not
freed after initialization. This patch saved about 1kB in ecryptfs
module.

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
2010-08-27 10:50:52 -05:00
Julia Lawall
f137f15072 fs/ecryptfs: Return -ENOMEM on memory allocation failure
In this code, 0 is returned on memory allocation failure, even though other
failures return -ENOMEM or other similar values.

A simplified version of the semantic match that finds this problem is as
follows: (http://coccinelle.lip6.fr/)

// <smpl>
@@
expression ret;
expression x,e1,e2,e3;
@@

ret = 0
... when != ret = e1
*x = \(kmalloc\|kcalloc\|kzalloc\)(...)
... when != ret = e2
if (x == NULL) { ... when != ret = e3
  return ret;
}
// </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
2010-08-27 10:50:52 -05:00
Takashi Iwai
f6360efb83 nfsd: fix NULL dereference in nfsd_statfs()
The commit ebabe9a900
    pass a struct path to vfs_statfs
introduced the struct path initialization, and this seems to trigger
an Oops on my machine.

fh_dentry field may be NULL and set later in fh_verify(), thus the
initialization of path must be after fh_verify().

Signed-off-by: Takashi Iwai <tiwai@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2010-08-26 13:23:16 -04:00
J. Bruce Fields
f632265d0f Merge commit 'v2.6.36-rc1' into HEAD 2010-08-26 13:22:27 -04:00
J. Bruce Fields
7d94784293 nfsd4: fix downgrade/lock logic
If we already had a RW open for a file, and get a readonly open, we were
piggybacking on the existing RW open.  That's inconsistent with the
downgrade logic which blows away the RW open assuming you'll still have
a readonly open.

Also, make sure there is a readonly or writeonly open available for
locking, again to prevent bad behavior in downgrade cases when any RW
open may be lost.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2010-08-26 13:22:02 -04:00
J. Bruce Fields
18608ad49c nfsd4: typo fix in find_any_file
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2010-08-26 13:21:09 -04:00
J. Bruce Fields
30c0e1ef0a nfsd4: bad BUG() in preprocess_stateid_op
It's OK for this function to return without setting filp--we do it in
the special-stateid case.

And there's a legitimate case where we can hit this, since we do permit
reads on write-only stateid's.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2010-08-26 13:20:51 -04:00
Suresh Jayaraman
f0138a79d7 Cannot allocate memory error on mount
On 08/26/2010 01:56 AM, joe hefner wrote:
> On a recent Fedora (13), I am seeing a mount failure message that I can not explain. I have a Windows Server 2003ýa with a share set up for access only for a specific username (say userfoo). If I try to mount it from Linux,ýusing userfoo and the correct password all is well. If I try with a bad password or with some other username (userbar), it fails with "Permission denied" as expected. If I try to mount as username = administrator, and give the correct administrator password, I would also expect "Permission denied", but I see "Cannot allocate memory" instead.

> ýfs/cifs/netmisc.c: Mapping smb error code 5 to POSIX err -13
> ýfs/cifs/cifssmb.c: Send error in QPathInfo = -13
> ýCIFS VFS: cifs_read_super: get root inode failed

Looks like the commit 0b8f18e3 assumed that cifs_get_inode_info() and
friends fail only due to memory allocation error when the inode is NULL
which is not the case if CIFSSMBQPathInfo() fails and returns an error.
Fix this by propagating the actual error code back.

Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Suresh Jayaraman <sjayaraman@suse.de>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2010-08-26 16:53:27 +00:00
Dan Carpenter
b545787dbb ceph: fix get_ticket_handler() error handling
get_ticket_handler() returns a valid pointer or it returns
ERR_PTR(-ENOMEM) if kzalloc() fails.

Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Sage Weil <sage@newdream.net>
2010-08-26 09:26:50 -07:00
Sage Weil
e072f8aa35 ceph: don't BUG on ENOMEM during mds reconnect
We are in a position to return an error; do that instead.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-08-26 09:26:37 -07:00
Dan Carpenter
f44c3890d9 ceph: ceph_mdsc_build_path() returns an ERR_PTR
ceph_mdsc_build_path() returns an ERR_PTR but this code is set up to
handle NULL returns.

Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Sage Weil <sage@newdream.net>
2010-08-26 09:24:28 -07:00
Steve French
c89e5198b2 [CIFS] Eliminate unused variable warning
CC: Shirish Pargaonkar <shirishp@us.ibm.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2010-08-26 02:11:54 +00:00
Alan Cox
ad8453ab0a ceph: Fix warnings
Just scrubbing some warnings so I can see real problem ones in the build
noise. For 32bit we need to coax gcc politely into believing we really
honestly intend to the casts. Using (u64)(unsigned long) means we cast from
a pointer to a type of the right size and then extend it. This stops the
warning spew.

Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Sage Weil <sage@newdream.net>
2010-08-25 12:02:14 -07:00
Dan Carpenter
ac1f12ef56 ceph: ceph_get_inode() returns an ERR_PTR
ceph_get_inode() returns an ERR_PTR and it doesn't return a NULL.

Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Sage Weil <sage@newdream.net>
2010-08-25 12:01:54 -07:00
Linus Torvalds
37822188ef Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6:
  Eliminate sparse warning - bad constant expression
  cifs: check for NULL session password
  missing changes during ntlmv2/ntlmssp auth and sign
  [CIFS] Fix ntlmv2 auth with ntlmssp
  cifs: correction of unicode header files
  cifs: fix NULL pointer dereference in cifs_find_smb_ses
  cifs: consolidate error handling in several functions
  cifs: clean up error handling in cifs_mknod
2010-08-25 09:57:59 -07:00
Sage Weil
36e21687e6 ceph: initialize fields on new dentry_infos
Signed-off-by: Sage Weil <sage@newdream.net>
2010-08-24 16:24:19 -07:00
Sage Weil
7d8cb26d7d ceph: maintain i_head_snapc when any caps are dirty, not just for data
We used to use i_head_snapc to keep track of which snapc the current epoch
of dirty data was dirtied under.  It is used by queue_cap_snap to set up
the cap_snap.  However, since we queue cap snaps for any dirty caps, not
just for dirty file data, we need to keep a valid i_head_snapc anytime
we have dirty|flushing caps.  This fixes a NULL pointer deref in
queue_cap_snap when writing back dirty caps without data (e.g.,
snaptest-authwb.sh).

Signed-off-by: Sage Weil <sage@newdream.net>
2010-08-24 16:24:18 -07:00
shirishpargaonkar@gmail.com
2d20ca8358 Eliminate sparse warning - bad constant expression
Eliminiate sparse warning during usage of crypto_shash_* APIs
       error: bad constant expression

Allocate memory for shash descriptors once, so that we do not kmalloc/kfree it
for every signature generation (shash descriptor for md5 hash).

From ed7538619817777decc44b5660b52268077b74f3 Mon Sep 17 00:00:00 2001
From: Shirish Pargaonkar <shirishpargaonkar@gmail.com>
Date: Tue, 24 Aug 2010 11:47:43 -0500
Subject: [PATCH] eliminate sparse warnings during crypto_shash_* APis usage

Signed-off-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2010-08-24 18:12:52 +00:00
Christoph Hellwig
b5420f2359 xfs: do not discard page cache data on EAGAIN
If xfs_map_blocks returns EAGAIN because of lock contention we must redirty the
page and not disard the pagecache content and return an error from writepage.
We used to do this correctly, but the logic got lost during the recent
reshuffle of the writepage code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reported-by: Mike Gao <ygao.linux@gmail.com>
Tested-by: Mike Gao <ygao.linux@gmail.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
2010-08-24 11:47:51 +10:00
Dave Chinner
3b93c7aaef xfs: don't do memory allocation under the CIL context lock
Formatting items requires memory allocation when using delayed
logging. Currently that memory allocation is done while holding the
CIL context lock in read mode. This means that if memory allocation
takes some time (e.g. enters reclaim), we cannot push on the CIL
until the allocation(s) required by formatting complete. This can
stall CIL pushes for some time, and once a push is stalled so are
all new transaction commits.

Fix this splitting the item formatting into two steps. The first
step which does the allocation and memcpy() into the allocated
buffer is now done outside the CIL context lock, and only the CIL
insert is done inside the CIL context lock. This avoids the stall
issue.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2010-08-24 11:45:53 +10:00
Dave Chinner
a44f13edf0 xfs: Reduce log force overhead for delayed logging
Delayed logging adds some serialisation to the log force process to
ensure that it does not deference a bad commit context structure
when determining if a CIL push is necessary or not. It does this by
grabing the CIL context lock exclusively, then dropping it before
pushing the CIL if necessary. This causes serialisation of all log
forces and pushes regardless of whether a force is necessary or not.
As a result fsync heavy workloads (like dbench) can be significantly
slower with delayed logging than without.

To avoid this penalty, copy the current sequence from the context to
the CIL structure when they are swapped. This allows us to do
unlocked checks on the current sequence without having to worry
about dereferencing context structures that may have already been
freed. Hence we can remove the CIL context locking in the forcing
code and only call into the push code if the current context matches
the sequence we need to force.

By passing the sequence into the push code, we can check the
sequence again once we have the CIL lock held exclusive and abort if
the sequence has already been pushed. This avoids a lock round-trip
and unnecessary CIL pushes when we have racing push calls.

The result is that the regression in dbench performance goes away -
this change improves dbench performance on a ramdisk from ~2100MB/s
to ~2500MB/s. This compares favourably to not using delayed logging
which retuns ~2500MB/s for the same workload.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2010-08-24 11:40:03 +10:00
Dave Chinner
1a387d3be2 xfs: dummy transactions should not dirty VFS state
When we  need to cover the log, we issue dummy transactions to ensure
the current log tail is on disk. Unfortunately we currently use the
root inode in the dummy transaction, and the act of committing the
transaction dirties the inode at the VFS level.

As a result, the VFS writeback of the dirty inode will prevent the
filesystem from idling long enough for the log covering state
machine to complete. The state machine gets stuck in a loop issuing
new dummy transactions to cover the log and never makes progress.

To avoid this problem, the dummy transactions should not cause
externally visible state changes. To ensure this occurs, make sure
that dummy transactions log an unchanging field in the superblock as
it's state is never propagated outside the filesystem. This allows
the log covering state machine to complete successfully and the
filesystem now correctly enters a fully idle state about 90s after
the last modification was made.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2010-08-24 11:46:31 +10:00
Stuart Brodsky
2fe33661fc xfs: ensure f_ffree returned by statfs() is non-negative
Because of delayed updates to sb_icount field in the super block, it
is possible to allocate over maxicount number of inodes.  This
causes the arithmetic to calculate a negative number of free inodes
in user commands like df or stat -f.

Since maxicount is a somewhat arbitrary number, a slight over
allocation is not critical but user commands should be displayed as
0 or greater and never go negative.  To do this the value in the
stats buffer f_ffree is capped to never go negative.

[ Modified to use max_t as per Christoph's comment. ]

Signed-off-by: Stu Brodsky <sbrodsky@sgi.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
2010-08-24 11:46:05 +10:00
Dave Chinner
efceab1d56 xfs: handle negative wbc->nr_to_write during sync writeback
During data integrity (WB_SYNC_ALL) writeback, wbc->nr_to_write will
go negative on inodes with more than 1024 dirty pages due to
implementation details of write_cache_pages(). Currently XFS will
abort page clustering in writeback once nr_to_write drops below
zero, and so for data integrity writeback we will do very
inefficient page at a time allocation and IO submission for inodes
with large numbers of dirty pages.

Fix this by only aborting the page clustering code when
wbc->nr_to_write is negative and the sync mode is WB_SYNC_NONE.

Cc: <stable@kernel.org>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2010-08-24 11:44:56 +10:00
Dave Chinner
4536f2ad8b xfs: fix untrusted inode number lookup
Commit 7124fe0a5b ("xfs: validate untrusted inode
numbers during lookup") changes the inode lookup code to do btree lookups for
untrusted inode numbers. This change made an invalid assumption about the
alignment of inodes and hence incorrectly calculated the first inode in the
cluster. As a result, some inode numbers were being incorrectly considered
invalid when they were actually valid.

The issue was not picked up by the xfstests suite because it always runs fsr
and dump (the two utilities that utilise the bulkstat interface) on cache hot
inodes and hence the lookup code in the cold cache path was not sufficiently
exercised to uncover this intermittent problem.

Fix the issue by relaxing the btree lookup criteria and then checking if the
record returned contains the inode number we are lookup for. If it we get an
incorrect record, then the inode number is invalid.

Cc: <stable@kernel.org>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2010-08-24 11:42:30 +10:00
Dave Chinner
5b3eed756c xfs: ensure we mark all inodes in a freed cluster XFS_ISTALE
Under heavy load parallel metadata loads (e.g. dbench), we can fail
to mark all the inodes in a cluster being freed as XFS_ISTALE as we
skip inodes we cannot get the XFS_ILOCK_EXCL or the flush lock on.
When this happens and the inode cluster buffer has already been
marked stale and freed, inode reclaim can try to write the inode out
as it is dirty and not marked stale. This can result in writing th
metadata to an freed extent, or in the case it has already
been overwritten trigger a magic number check failure and return an
EUCLEAN error such as:

Filesystem "ram0": inode 0x442ba1 background reclaim flush failed with 117

Fix this by ensuring that we hoover up all in memory inodes in the
cluster and mark them XFS_ISTALE when freeing the cluster.

Cc: <stable@kernel.org>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2010-08-24 11:42:41 +10:00
Dave Chinner
d17c701ce6 xfs: unlock items before allowing the CIL to commit
When we commit a transaction using delayed logging, we need to
unlock the items in the transaciton before we unlock the CIL context
and allow it to be checkpointed. If we unlock them after we release
the CIl context lock, the CIL can checkpoint and complete before
we free the log items. This breaks stale buffer item unlock and
unpin processing as there is an implicit assumption that the unlock
will occur before the unpin.

Also, some log items need to store the LSN of the transaction commit
in the item (inodes and EFIs) and so can race with other transaction
completions if we don't prevent the CIL from checkpointing before
the unlock occurs.

Cc: <stable@kernel.org>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2010-08-24 11:42:52 +10:00
Jeff Layton
24e6cf92fd cifs: check for NULL session password
It's possible for a cifsSesInfo struct to have a NULL password, so we
need to check for that prior to running strncmp on it.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2010-08-23 17:38:31 +00:00
Shirish Pargaonkar
3ec6bbcdb4 missing changes during ntlmv2/ntlmssp auth and sign
Signed-off-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2010-08-23 17:38:24 +00:00
Andrew Morton
220eb7fd98 fs/bio-integrity.c: return -ENOMEM on kmalloc failure
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-08-23 13:36:59 +02:00
David Rientjes
72f4650337 bio-integrity.c: remove dependency on __GFP_NOFAIL
The kmalloc() in bio_integrity_prep() is failable, so remove __GFP_NOFAIL
from its mask.

Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-08-23 13:36:58 +02:00
Henry C Chang
07a27e226d ceph: fix osd request lru adjustment when sending request
Fix argument order.  We want to move the item to the end of the list, not
change the position of the head.

Signed-off-by: Henry C Chang <henry_c_chang@tcloudcomputing.com>
Signed-off-by: Sage Weil <sage@newdream.net>
2010-08-22 21:34:27 -07:00
Sage Weil
124514918b ceph: don't improperly set dir complete when holding EXCL cap
If we hold the EXCL cap, we cannot trust the dir stats from the MDS (num
files, subdirs) and must not incorrectly conclude that the directory is
empty.  If we do, we get can bad results from lookup (bad ENOENT) and
bad readdir results.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-08-22 21:33:32 -07:00
Tvrtko Ursulin
ff8d6e9831 fanotify: drop duplicate pr_debug statement
This reminded me... you have two pr_debugs in fanotify_should_send_event
which output redundant information. Maybe you intended it like that so
it is selectable how much log spam you want, or if not you may want to
apply this patch.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@sophos.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
2010-08-22 20:30:12 -04:00
Eric Paris
2eebf582c9 fanotify: flush outstanding perm requests on group destroy
When an fanotify listener is closing it may cause a deadlock between the
listener and the original task doing an fs operation.  If the original task
is waiting for a permissions response it will be holding the srcu lock.  The
listener cannot clean up and exit until after that srcu lock is syncronized.
Thus deadlock.  The fix introduced here is to stop accepting new permissions
events when a listener is shutting down and to grant permission for all
outstanding events.  Thus the original task will eventually release the srcu
lock and the listener can complete shutdown.

Reported-by: Andreas Gruenbacher <agruen@suse.de>
Cc: Andreas Gruenbacher <agruen@suse.de>
Signed-off-by: Eric Paris <eparis@redhat.com>
2010-08-22 20:28:16 -04:00
Eric Paris
84e1ab4d87 fsnotify: fix ignored mask handling between inode and vfsmount marks
The interesting 2 list lockstep walking didn't quite work out if the inode
marks only had ignores and the vfsmount list requested events.  The code to
shortcut list traversal would not run the inode list since it didn't have real
event requests.  This code forces inode list traversal when a vfsmount mark
matches the event type.  Maybe we could add an i_fsnotify_ignored_mask field
to struct inode to get the shortcut back, but it doesn't seem worth it to grow
struct inode again.

I bet with the recent changes to lock the way we do now it would actually not
be a major perf hit to just drop i_fsnotify_mark_mask altogether.  But that is
for another day.

Signed-off-by: Eric Paris <eparis@redhat.com>
2010-08-22 20:09:41 -04:00
Eric Paris
5f3f259fa8 fsnotify: reset used_inode and used_vfsmount on each pass
The fsnotify main loop has 2 booleans which tell if a particular mark was
sent to the listeners or if it should be processed in the next pass.  The
problem is that the booleans were not reset on each traversal of the loop.
So marks could get skipped even when they were not sent to the notifiers.

Reported-by: Tvrtko Ursulin <tvrtko.ursulin@sophos.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
2010-08-22 20:09:41 -04:00
Eric Paris
faa9560ae7 fanotify: do not dereference inode_mark when it is unset
The fanotify code is supposed to get the group from the mark.  It accidentally
only used the inode_mark.  If the vfsmount_mark was set but not the inode_mark
it would deref the NULL inode_mark.  Get the group from the correct place.

Reported-by: Tvrtko Ursulin <tvrtko.ursulin@sophos.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
2010-08-22 20:09:41 -04:00
Michael Rubin
679ceace84 mm: exporting account_page_dirty
This allows code outside of the mm core to safely manipulate page state
and not worry about the other accounting. Not using these routines means
that some code will lose track of the accounting and we get bugs. This
has happened once already.

Signed-off-by: Michael Rubin <mrubin@google.com>
Signed-off-by: Sage Weil <sage@newdream.net>
2010-08-22 15:16:51 -07:00
Sage Weil
eb6bb1c5bd ceph: direct requests in snapped namespace based on nonsnap parent
When making a request in the virtual snapdir or a snapped portion of the
namespace, we should choose the MDS based on the first nonsnap parent (and
its caps).  If that is not the best place, we will get forward hints to
find the right MDS in the cluster.  This fixes ESTALE errors when using
the .snap directory and namespace with multiple MDSs.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-08-22 15:16:48 -07:00
Sage Weil
ed32604448 ceph: queue cap snap writeback for realm children on snap update
When a realm is updated, we need to queue writeback on inodes in that
realm _and_ its children.  Otherwise, if the inode gets cowed on the
server, we can get a hang later due to out-of-sync cap/snap state.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-08-22 15:16:47 -07:00
Sage Weil
4a625be472 ceph: include dirty xattrs state in snapped caps
When we snapshot dirty metadata that needs to be written back to the MDS,
include dirty xattr metadata.  Make the capsnap reference the encoded
xattr blob so that it will be written back in the FLUSHSNAP op.

Also fix the capsnap creation guard to include dirty auth or file bits,
not just tests specific to dirty file data or file writes in progress
(this fixes auth metadata writeback).

Signed-off-by: Sage Weil <sage@newdream.net>
2010-08-22 15:16:46 -07:00
Sage Weil
082afec92d ceph: fix xattr cap writeback
We should include the xattr metadata blob in the cap update message any
time we are flushing dirty state, NOT just when we are also dropping the
cap.  This fixes async xattr writeback.

Also, clean up the code slightly to avoid duplicating the bit test.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-08-22 15:16:41 -07:00
Sage Weil
f3c60c5918 ceph: fix multiple mds session shutdown
The use of a completion when waiting for session shutdown during umount is
inappropriate, given the complexity of the condition.  For multiple MDS's,
this resulted in the umount thread spinning, often preventing the session
close message from being processed in some cases.

Switch to a waitqueue and defined a condition helper.  This cleans things
up nicely.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-08-22 15:04:43 -07:00
Linus Torvalds
a28e0852d4 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2:
  nilfs2: wait for discard to finish
2010-08-22 09:44:47 -07:00
Steve French
9fbc590860 [CIFS] Fix ntlmv2 auth with ntlmssp
Make ntlmv2 as an authentication mechanism within ntlmssp
instead of ntlmv1.
Parse type 2 response in ntlmssp negotiation to pluck
AV pairs and use them to calculate ntlmv2 response token.
Also, assign domain name from the sever response in type 2
packet of ntlmssp and use that (netbios) domain name in
calculation of response.

Enable cifs/smb signing using rc4 and md5.

Changed name of the structure mac_key to session_key to reflect
the type of key it holds.

Use kernel crypto_shash_* APIs instead of the equivalent cifs functions.

Signed-off-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2010-08-20 20:42:26 +00:00
Igor Druzhinin
bf4f121138 cifs: correction of unicode header files
This patch corrects a problem of compilation errors at removal of
UNIUPR_NOLOWER definition and adds include guards to cifs_unicode.h.

Signed-off-by: Igor Druzhinin <jaxbrigs@gmail.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2010-08-20 00:46:42 +00:00
Linus Torvalds
763008c435 Merge branch 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6
* 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6:
  NFS: Fix an Oops in the NFSv4 atomic open code
  NFS: Fix the selection of security flavours in Kconfig
  NFS: fix the return value of nfs_file_fsync()
  rpcrdma: Fix SQ size calculation when memreg is FRMR
  xprtrdma: Do not truncate iova_start values in frmr registrations.
  nfs: Remove redundant NULL check upon kfree()
  nfs: Add "lookupcache" to displayed mount options
  NFS: allow close-to-open cache semantics to apply to root of NFS filesystem
  SUNRPC: fix NFS client over TCP hangs due to packet loss (Bug 16494)
2010-08-18 15:45:23 -07:00
Jeff Layton
fc87a40677 cifs: fix NULL pointer dereference in cifs_find_smb_ses
cifs_find_smb_ses assumes that the vol->password field is a valid
pointer, but that's only the case if a password was passed in via
the options string. It's possible that one won't be if there is
no mount helper on the box.

Reported-by: diabel <gacek-2004@wp.pl>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2010-08-18 17:26:25 +00:00
Linus Torvalds
145c3ae46b Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
  fs: brlock vfsmount_lock
  fs: scale files_lock
  lglock: introduce special lglock and brlock spin locks
  tty: fix fu_list abuse
  fs: cleanup files_lock locking
  fs: remove extra lookup in __lookup_hash
  fs: fs_struct rwlock to spinlock
  apparmor: use task path helpers
  fs: dentry allocation consolidation
  fs: fix do_lookup false negative
  mbcache: Limit the maximum number of cache entries
  hostfs ->follow_link() braino
  hostfs: dumb (and usually harmless) tpyo - strncpy instead of strlcpy
  remove SWRITE* I/O types
  kill BH_Ordered flag
  vfs: update ctime when changing the file's permission by setfacl
  cramfs: only unlock new inodes
  fix reiserfs_evict_inode end_writeback second call
2010-08-18 09:35:08 -07:00
Ryusuke Konishi
1cb0c924fa nilfs2: wait for discard to finish
nilfs_discard_segment() doesn't wait for completion of discard
requests.  This specifies BLKDEV_IFL_WAIT flag when calling
blkdev_issue_discard() in order to fix the sync failure.

Reported-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: Christoph Hellwig <hch@lst.de>
2010-08-19 00:11:06 +09:00
Trond Myklebust
0a377cff94 NFS: Fix an Oops in the NFSv4 atomic open code
Adam Lackorzynski reports:

with 2.6.35.2 I'm getting this reproducible Oops:

[  110.825396] BUG: unable to handle kernel NULL pointer dereference at
(null)
[  110.828638] IP: [<ffffffff811247b7>] encode_attrs+0x1a/0x2a4
[  110.828638] PGD be89f067 PUD bf18f067 PMD 0
[  110.828638] Oops: 0000 [#1] SMP
[  110.828638] last sysfs file: /sys/class/net/lo/operstate
[  110.828638] CPU 2
[  110.828638] Modules linked in: rtc_cmos rtc_core rtc_lib amd64_edac_mod
i2c_amd756 edac_core i2c_core dm_mirror dm_region_hash dm_log dm_snapshot
sg sr_mod usb_storage ohci_hcd mptspi tg3 mptscsih mptbase usbcore nls_base
[last unloaded: scsi_wait_scan]
[  110.828638]
[  110.828638] Pid: 11264, comm: setchecksum Not tainted 2.6.35.2 #1
[  110.828638] RIP: 0010:[<ffffffff811247b7>]  [<ffffffff811247b7>]
encode_attrs+0x1a/0x2a4
[  110.828638] RSP: 0000:ffff88003bf5b878  EFLAGS: 00010296
[  110.828638] RAX: ffff8800bddb48a8 RBX: ffff88003bf5bb18 RCX:
0000000000000000
[  110.828638] RDX: ffff8800be258800 RSI: 0000000000000000 RDI:
ffff88003bf5b9f8
[  110.828638] RBP: 0000000000000000 R08: ffff8800bddb48a8 R09:
0000000000000004
[  110.828638] R10: 0000000000000003 R11: ffff8800be779000 R12:
ffff8800be258800
[  110.828638] R13: ffff88003bf5b9f8 R14: ffff88003bf5bb20 R15:
ffff8800be258800
[  110.828638] FS:  0000000000000000(0000) GS:ffff880041e00000(0063)
knlGS:00000000556bd6b0
[  110.828638] CS:  0010 DS: 002b ES: 002b CR0: 000000008005003b
[  110.828638] CR2: 0000000000000000 CR3: 00000000be8ef000 CR4:
00000000000006e0
[  110.828638] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[  110.828638] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
0000000000000400
[  110.828638] Process setchecksum (pid: 11264, threadinfo
ffff88003bf5a000, task ffff88003f232210)
[  110.828638] Stack:
[  110.828638]  0000000000000000 ffff8800bfbcf920 0000000000000000
0000000000000ffe
[  110.828638] <0> 0000000000000000 0000000000000000 0000000000000000
0000000000000000
[  110.828638] <0> 0000000000000000 0000000000000000 0000000000000000
0000000000000000
[  110.828638] Call Trace:
[  110.828638]  [<ffffffff81124c1f>] ? nfs4_xdr_enc_setattr+0x90/0xb4
[  110.828638]  [<ffffffff81371161>] ? call_transmit+0x1c3/0x24a
[  110.828638]  [<ffffffff813774d9>] ? __rpc_execute+0x78/0x22a
[  110.828638]  [<ffffffff81371a91>] ? rpc_run_task+0x21/0x2b
[  110.828638]  [<ffffffff81371b7e>] ? rpc_call_sync+0x3d/0x5d
[  110.828638]  [<ffffffff8111e284>] ? _nfs4_do_setattr+0x11b/0x147
[  110.828638]  [<ffffffff81109466>] ? nfs_init_locked+0x0/0x32
[  110.828638]  [<ffffffff810ac521>] ? ifind+0x4e/0x90
[  110.828638]  [<ffffffff8111e2fb>] ? nfs4_do_setattr+0x4b/0x6e
[  110.828638]  [<ffffffff8111e634>] ? nfs4_do_open+0x291/0x3a6
[  110.828638]  [<ffffffff8111ed81>] ? nfs4_open_revalidate+0x63/0x14a
[  110.828638]  [<ffffffff811056c4>] ? nfs_open_revalidate+0xd7/0x161
[  110.828638]  [<ffffffff810a2de4>] ? do_lookup+0x1a4/0x201
[  110.828638]  [<ffffffff810a4733>] ? link_path_walk+0x6a/0x9d5
[  110.828638]  [<ffffffff810a42b6>] ? do_last+0x17b/0x58e
[  110.828638]  [<ffffffff810a5fbe>] ? do_filp_open+0x1bd/0x56e
[  110.828638]  [<ffffffff811cd5e0>] ? _atomic_dec_and_lock+0x30/0x48
[  110.828638]  [<ffffffff810a9b1b>] ? dput+0x37/0x152
[  110.828638]  [<ffffffff810ae063>] ? alloc_fd+0x69/0x10a
[  110.828638]  [<ffffffff81099f39>] ? do_sys_open+0x56/0x100
[  110.828638]  [<ffffffff81027a22>] ? ia32_sysret+0x0/0x5
[  110.828638] Code: 83 f1 01 e8 f5 ca ff ff 48 83 c4 50 5b 5d 41 5c c3 41
57 41 56 41 55 49 89 fd 41 54 49 89 d4 55 48 89 f5 53 48 81 ec 18 01 00 00
<8b> 06 89 c2 83 e2 08 83 fa 01 19 db 83 e3 f8 83 c3 18 a8 01 8d
[  110.828638] RIP  [<ffffffff811247b7>] encode_attrs+0x1a/0x2a4
[  110.828638]  RSP <ffff88003bf5b878>
[  110.828638] CR2: 0000000000000000
[  112.840396] ---[ end trace 95282e83fd77358f ]---

We need to ensure that the O_EXCL flag is turned off if the user doesn't
set O_CREAT.

Cc: stable@kernel.org
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-08-18 09:25:42 -04:00
Nick Piggin
99b7db7b8f fs: brlock vfsmount_lock
fs: brlock vfsmount_lock

Use a brlock for the vfsmount lock. It must be taken for write whenever
modifying the mount hash or associated fields, and may be taken for read when
performing mount hash lookups.

A new lock is added for the mnt-id allocator, so it doesn't need to take
the heavy vfsmount write-lock.

The number of atomics should remain the same for fastpath rlock cases, though
code would be slightly slower due to per-cpu access. Scalability is not not be
much improved in common cases yet, due to other locks (ie. dcache_lock) getting
in the way. However path lookups crossing mountpoints should be one case where
scalability is improved (currently requiring the global lock).

The slowpath is slower due to use of brlock. On a 64 core, 64 socket, 32 node
Altix system (high latency to remote nodes), a simple umount microbenchmark
(mount --bind mnt mnt2 ; umount mnt2 loop 1000 times), before this patch it
took 6.8s, afterwards took 7.1s, about 5% slower.

Cc: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Nick Piggin <npiggin@kernel.dk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-18 08:35:48 -04:00
Nick Piggin
6416ccb789 fs: scale files_lock
fs: scale files_lock

Improve scalability of files_lock by adding per-cpu, per-sb files lists,
protected with an lglock. The lglock provides fast access to the per-cpu lists
to add and remove files. It also provides a snapshot of all the per-cpu lists
(although this is very slow).

One difficulty with this approach is that a file can be removed from the list
by another CPU. We must track which per-cpu list the file is on with a new
variale in the file struct (packed into a hole on 64-bit archs). Scalability
could suffer if files are frequently removed from different cpu's list.

However loads with frequent removal of files imply short interval between
adding and removing the files, and the scheduler attempts to avoid moving
processes too far away. Also, even in the case of cross-CPU removal, the
hardware has much more opportunity to parallelise cacheline transfers with N
cachelines than with 1.

A worst-case test of 1 CPU allocating files subsequently being freed by N CPUs
degenerates to contending on a single lock, which is no worse than before. When
more than one CPU are allocating files, even if they are always freed by
different CPUs, there will be more parallelism than the single-lock case.

Testing results:

On a 2 socket, 8 core opteron, I measure the number of times the lock is taken
to remove the file, the number of times it is removed by the same CPU that
added it, and the number of times it is removed by the same node that added it.

Booting:    locks=  25049 cpu-hits=  23174 (92.5%) node-hits=  23945 (95.6%)
kbuild -j16 locks=2281913 cpu-hits=2208126 (96.8%) node-hits=2252674 (98.7%)
dbench 64   locks=4306582 cpu-hits=4287247 (99.6%) node-hits=4299527 (99.8%)

So a file is removed from the same CPU it was added by over 90% of the time.
It remains within the same node 95% of the time.

Tim Chen ran some numbers for a 64 thread Nehalem system performing a compile.

                throughput
2.6.34-rc2      24.5
+patch          24.9

                us      sys     idle    IO wait (in %)
2.6.34-rc2      51.25   28.25   17.25   3.25
+patch          53.75   18.5    19      8.75

So significantly less CPU time spent in kernel code, higher idle time and
slightly higher throughput.

Single threaded performance difference was within the noise of microbenchmarks.
That is not to say penalty does not exist, the code is larger and more memory
accesses required so it will be slightly slower.

Cc: linux-kernel@vger.kernel.org
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Nick Piggin <npiggin@kernel.dk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-18 08:35:48 -04:00
Nick Piggin
d996b62a8d tty: fix fu_list abuse
tty: fix fu_list abuse

tty code abuses fu_list, which causes a bug in remount,ro handling.

If a tty device node is opened on a filesystem, then the last link to the inode
removed, the filesystem will be allowed to be remounted readonly. This is
because fs_may_remount_ro does not find the 0 link tty inode on the file sb
list (because the tty code incorrectly removed it to use for its own purpose).
This can result in a filesystem with errors after it is marked "clean".

Taking idea from Christoph's initial patch, allocate a tty private struct
at file->private_data and put our required list fields in there, linking
file and tty. This makes tty nodes behave the same way as other device nodes
and avoid meddling with the vfs, and avoids this bug.

The error handling is not trivial in the tty code, so for this bugfix, I take
the simple approach of using __GFP_NOFAIL and don't worry about memory errors.
This is not a problem because our allocator doesn't fail small allocs as a rule
anyway. So proper error handling is left as an exercise for tty hackers.

[ Arguably filesystem's device inode would ideally be divorced from the
driver's pseudo inode when it is opened, but in practice it's not clear whether
that will ever be worth implementing. ]

Cc: linux-kernel@vger.kernel.org
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Nick Piggin <npiggin@kernel.dk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-18 08:35:47 -04:00
Nick Piggin
ee2ffa0dfd fs: cleanup files_lock locking
fs: cleanup files_lock locking

Lock tty_files with a new spinlock, tty_files_lock; provide helpers to
manipulate the per-sb files list; unexport the files_lock spinlock.

Cc: linux-kernel@vger.kernel.org
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Acked-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Nick Piggin <npiggin@kernel.dk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-18 08:35:47 -04:00
Nick Piggin
b04f784e5d fs: remove extra lookup in __lookup_hash
fs: remove extra lookup in __lookup_hash

Optimize lookup for create operations, where no dentry should often be
common-case. In cases where it is not, such as unlink, the added overhead
is much smaller than the removed.

Also, move comments about __d_lookup racyness to the __d_lookup call site.
d_lookup is intuitive; __d_lookup is what needs commenting. So in that same
vein, add kerneldoc comments to __d_lookup and clean up some of the comments:

- We are interested in how the RCU lookup works here, particularly with
  renames. Make that explicit, and point to the document where it is explained
  in more detail.
- RCU is pretty standard now, and macros make implementations pretty mindless.
  If we want to know about RCU barrier details, we look in RCU code.
- Delete some boring legacy comments because we don't care much about how the
  code used to work, more about the interesting parts of how it works now. So
  comments about lazy LRU may be interesting, but would better be done in the
  LRU or refcount management code.

Signed-off-by: Nick Piggin <npiggin@kernel.dk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-18 08:35:47 -04:00
Nick Piggin
2a4419b5b2 fs: fs_struct rwlock to spinlock
fs: fs_struct rwlock to spinlock

struct fs_struct.lock is an rwlock with the read-side used to protect root and
pwd members while taking references to them. Taking a reference to a path
typically requires just 2 atomic ops, so the critical section is very small.
Parallel read-side operations would have cacheline contention on the lock, the
dentry, and the vfsmount cachelines, so the rwlock is unlikely to ever give a
real parallelism increase.

Replace it with a spinlock to avoid one or two atomic operations in typical
path lookup fastpath.

Signed-off-by: Nick Piggin <npiggin@kernel.dk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-18 08:35:46 -04:00
Nick Piggin
baa0389073 fs: dentry allocation consolidation
fs: dentry allocation consolidation

There are 2 duplicate copies of code in dentry allocation in path lookup.
Consolidate them into a single function.

Signed-off-by: Nick Piggin <npiggin@kernel.dk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-18 08:35:45 -04:00
Nick Piggin
2e2e88ea8c fs: fix do_lookup false negative
fs: fix do_lookup false negative

In do_lookup, if we initially find no dentry, we take the directory i_mutex and
re-check the lookup. If we find a dentry there, then we revalidate it if
needed. However if that revalidate asks for the dentry to be invalidated, we
return -ENOENT from do_lookup. What should happen instead is an attempt to
allocate and lookup a new dentry.

This is probably not noticed because it is rare. It is only reached if a
concurrent create races in first (in which case, the dentry probably won't be
invalidated anyway), or if the racy __d_lookup has failed due to a
false-negative (which is very rare).

Fix this by removing code and have it use the normal reval path.

Signed-off-by: Nick Piggin <npiggin@kernel.dk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-18 08:35:45 -04:00
Andreas Gruenbacher
3a48ee8a4a mbcache: Limit the maximum number of cache entries
Limit the maximum number of mb_cache entries depending on the number of
hash buckets: if the only limit to the number of cache entries is the
available memory the hash chains can grow very long, taking a long time
to search.

At least partially solves https://bugzilla.lustre.org/show_bug.cgi?id=22771.

Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-18 06:24:41 -04:00
Al Viro
3b6036d148 hostfs ->follow_link() braino
we want the assignment to err done inside the if () to be
visible after it, so (re)declaring err inside if () body
is wrong.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-18 06:21:10 -04:00
Al Viro
850a496f96 hostfs: dumb (and usually harmless) tpyo - strncpy instead of strlcpy
... not harmless in this case - we have a string in the end of buffer
already.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-18 06:18:57 -04:00
Christoph Hellwig
9cb569d601 remove SWRITE* I/O types
These flags aren't real I/O types, but tell ll_rw_block to always
lock the buffer instead of giving up on a failed trylock.

Instead add a new write_dirty_buffer helper that implements this semantic
and use it from the existing SWRITE* callers.  Note that the ll_rw_block
code had a bug where it didn't promote WRITE_SYNC_PLUG properly, which
this patch fixes.

In the ufs code clean up the helper that used to call ll_rw_block
to mirror sync_dirty_buffer, which is the function it implements for
compound buffers.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-18 01:09:01 -04:00
Christoph Hellwig
87e99511ea kill BH_Ordered flag
Instead of abusing a buffer_head flag just add a variant of
sync_dirty_buffer which allows passing the exact type of write
flag required.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-18 01:09:00 -04:00
Jan Kara
dad5eb6daa vfs: update ctime when changing the file's permission by setfacl
generic_acl_set didn't update the ctime of the file when its permission was
changed.

Steps to reproduce:
 # touch aaa
 # stat -c %Z aaa
 1275289822
 # setfacl -m  'u::x,g::x,o::x' aaa
 # stat -c %Z aaa
 1275289822                         <- unchanged

But, according to the spec of the ctime, vfs must update it.

Port of ext3 patch by Miao Xie <miaox@cn.fujitsu.com>.

CC: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-18 01:04:22 -04:00
Alexander Shishkin
b845ff8f3e cramfs: only unlock new inodes
Commit 77b8a75f5b introduced a warning at fs/inode.c:692 unlock_new_inode(),
caused by unlock_new_inode() being called on existing inodes as well.

This patch changes setup_inode() to only call unlock_new_inode() for I_NEW
inodes.

Signed-off-by: Alexander Shishkin <virtuoso@slind.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-18 01:01:33 -04:00
Sergey Senozhatsky
f4ae2faa40 fix reiserfs_evict_inode end_writeback second call
reiserfs_evict_inode calls end_writeback two times hitting
kernel BUG at fs/inode.c:298 becase inode->i_state is I_CLEAR already.

Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-18 00:58:57 -04:00
Linus Torvalds
351f13d708 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2:
  nilfs2: fix false warning saying one of two super blocks is broken
  nilfs2: fix list corruption after ifile creation failure
2010-08-17 18:35:39 -07:00
David Howells
d7627467b7 Make do_execve() take a const filename pointer
Make do_execve() take a const filename pointer so that kernel_execve() compiles
correctly on ARM:

arch/arm/kernel/sys_arm.c:88: warning: passing argument 1 of 'do_execve' discards qualifiers from pointer target type

This also requires the argv and envp arguments to be consted twice, once for
the pointer array and once for the strings the array points to.  This is
because do_execve() passes a pointer to the filename (now const) to
copy_strings_kernel().  A simpler alternative would be to cast the filename
pointer in do_execve() when it's passed to copy_strings_kernel().

do_execve() may not change any of the strings it is passed as part of the argv
or envp lists as they are some of them in .rodata, so marking these strings as
const should be fine.

Further kernel_execve() and sys_execve() need to be changed to match.

This has been test built on x86_64, frv, arm and mips.

Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Ralf Baechle <ralf@linux-mips.org>
Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-17 18:07:43 -07:00
Trond Myklebust
df486a2590 NFS: Fix the selection of security flavours in Kconfig
Randy Dunlap reports:

ERROR: "svc_gss_principal" [fs/nfs/nfs.ko] undefined!


because in fs/nfs/Kconfig, NFS_V4 selects RPCSEC_GSS_KRB5
and/or in fs/nfsd/Kconfig, NFSD_V4 selects RPCSEC_GSS_KRB5.

RPCSEC_GSS_KRB5 does 5 selects, but none of these is enforced/followed
by the fs/nfs[d]/Kconfig configs:

	select SUNRPC_GSS
	select CRYPTO
	select CRYPTO_MD5
	select CRYPTO_DES
	select CRYPTO_CBC

Reported-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: J. Bruce Fields <bfields@fieldses.org>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-08-17 17:42:45 -04:00
Jeff Layton
232341ba7f cifs: consolidate error handling in several functions
cifs has a lot of complicated functions that have to clean up things on
error, but some of them don't have all of the cleanup code
well-consolidated. Clean up and consolidate error handling in several
functions.

This is in preparation of later patches that will need to put references
to the tcon link container.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2010-08-16 20:34:48 +00:00
Jeff Layton
5d9ac7fd32 cifs: clean up error handling in cifs_mknod
Get rid of some nesting and add a label we can goto on error.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2010-08-16 17:00:44 +00:00
Ryusuke Konishi
ea1a16f716 nilfs2: fix false warning saying one of two super blocks is broken
After applying commit b2ac86e1, the following message got appeared
after unclean shutdown:

> NILFS warning: broken superblock. using spare superblock.

This turns out to be a false message due to the change which updates
two super blocks alternately.  The secondary super block now can be
selected if it's newer than the primary one.

This kills the false warning by suppressing it if another super block
is not actually broken.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-08-16 11:08:36 +09:00
Ryusuke Konishi
af4e36318e nilfs2: fix list corruption after ifile creation failure
If nilfs_attach_checkpoint() gets a memory allocation failure during
creation of ifile, it will return without removing nilfs_sb_info
struct from ns_supers list.  When a concurrently mounted snapshot is
unmounted or another new snapshot is mounted after that, this causes
kernel oops as below:

> BUG: unable to handle kernel NULL pointer dereference at (null)
> IP: [<f83662ff>] nilfs_find_sbinfo+0x74/0xa4 [nilfs2]
> *pde = 00000000
> Oops: 0000 [#1] SMP
<snip>
> Call Trace:
>  [<f835dc29>] ? nilfs_get_sb+0x165/0x532 [nilfs2]
>  [<c1173c87>] ? ida_get_new_above+0x16d/0x187
>  [<c109a7f8>] ? alloc_vfsmnt+0x7e/0x10a
>  [<c1070790>] ? kstrdup+0x2c/0x40
>  [<c1089041>] ? vfs_kern_mount+0x96/0x14e
>  [<c108913d>] ? do_kern_mount+0x32/0xbd
>  [<c109b331>] ? do_mount+0x642/0x6a1
>  [<c101a415>] ? do_page_fault+0x0/0x2d1
>  [<c1099c00>] ? copy_mount_options+0x80/0xe2
>  [<c10705d8>] ? strndup_user+0x48/0x67
>  [<c109b3f1>] ? sys_mount+0x61/0x90
>  [<c10027cc>] ? sysenter_do_call+0x12/0x22

This fixes the problem.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Tested-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: stable@kernel.org
2010-08-16 11:08:36 +09:00
Linus Torvalds
d7824370e2 mm: fix up some user-visible effects of the stack guard page
This commit makes the stack guard page somewhat less visible to user
space. It does this by:

 - not showing the guard page in /proc/<pid>/maps

   It looks like lvm-tools will actually read /proc/self/maps to figure
   out where all its mappings are, and effectively do a specialized
   "mlockall()" in user space.  By not showing the guard page as part of
   the mapping (by just adding PAGE_SIZE to the start for grows-up
   pages), lvm-tools ends up not being aware of it.

 - by also teaching the _real_ mlock() functionality not to try to lock
   the guard page.

   That would just expand the mapping down to create a new guard page,
   so there really is no point in trying to lock it in place.

It would perhaps be nice to show the guard page specially in
/proc/<pid>/maps (or at least mark grow-down segments some way), but
let's not open ourselves up to more breakage by user space from programs
that depends on the exact deails of the 'maps' file.

Special thanks to Henrique de Moraes Holschuh for diving into lvm-tools
source code to see what was going on with the whole new warning.

Reported-and-tested-by: François Valenduc <francois.valenduc@tvcablenet.be
Reported-by: Henrique de Moraes Holschuh <hmh@hmh.eng.br>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-15 11:35:52 -07:00
Randy Dunlap
cd956a1c03 fs/dcache: fix function param name in kernel-doc
Fix parameter name in kernel-doc notation (causes a warning).

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-14 16:21:00 -07:00
Linus Torvalds
83ae170092 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4: clean up compiler warning in start_this_handle()
2010-08-13 17:59:26 -07:00
Linus Torvalds
10041d2d14 Merge branch 'bkl/ioctl' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing
* 'bkl/ioctl' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing:
  bkl: Remove locked .ioctl file operation
  v4l: Remove reference to bkl ioctl in compat ioctl handling
  logfs: kill BKL
2010-08-13 17:52:35 -07:00
David Howells
c788732523 Mark arguments to certain syscalls as being const
Mark arguments to certain system calls as being const where they should be but
aren't.  The list includes:

 (*) The filename arguments of various stat syscalls, execve(), various utimes
     syscalls and some mount syscalls.

 (*) The filename arguments of some syscall helpers relating to the above.

 (*) The buffer argument of various write syscalls.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-13 16:53:13 -07:00
Arnd Bergmann
b19dd42faf bkl: Remove locked .ioctl file operation
The last user is gone, so we can safely remove this

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: John Kacur <jkacur@redhat.com>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2010-08-14 00:24:24 +02:00
Arnd Bergmann
02d6d685fc logfs: kill BKL
logfs does not need the BKL, so use ->unlocked_ioctl instead
of ->ioctl in file operations.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Joern Engel <joern@logfs.org>
[ fixed trivial conflict ]
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2010-08-14 00:24:24 +02:00
Linus Torvalds
2be1f3a73d Merge branch 'for-linus' of git://git390.marist.edu/pub/scm/linux-2.6
* 'for-linus' of git://git390.marist.edu/pub/scm/linux-2.6:
  [S390] partitions: fix build error in ibm partition detection code
  [S390] appldata: fix dev_get_stats 64 bit conversion
  [S390] wire up prlimit64 and fanotify* syscalls
  [S390] zcrypt: fix Kconfig dependencies
  [S390] sys_personality: follow u_long to unsigned int conversion
  [S390] dasd: fix format string types
2010-08-13 10:54:04 -07:00
Linus Torvalds
a30bfd6cd4 Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2:
  O2net: Disallow o2net accept connection request from itself.
  ocfs2/dlm: remove potential deadlock -V3
  ocfs2/dlm: avoid incorrect bit set in refmap on recovery master
  Fix the nested PR lock calling issue in ACL
  ocfs2: Count more refcount records in file system fragmentation.
  ocfs2 fix o2dlm dlm run purgelist (rev 3)
  ocfs2/dlm: fix a dead lock
  ocfs2: do not overwrite error codes in ocfs2_init_acl
2010-08-13 10:43:50 -07:00
Linus Torvalds
2897c684d1 Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6:
  [NFS] Set CONFIG_KEYS when CONFIG_NFS_USE_KERNEL_DNS is set
  AFS: Implement an autocell mount capability [ver #2]
  DNS: If the DNS server returns an error, allow that to be cached [ver #2]
  NFS: Use kernel DNS resolver [ver #2]
  cifs: update README to include details about 'fsc' option
2010-08-13 10:37:30 -07:00
Heiko Carstens
2041f657aa [S390] partitions: fix build error in ibm partition detection code
9c867fbe "partitions: fix sometimes unreadable partition strings" coverted
one line within the ibm partition code incorrectly. Fix this to get rid of
a build error.

fs/partitions/ibm.c: In function 'ibm_partition':
[...]
fs/partitions/ibm.c:185: error: too many arguments to function 'strlcat'

Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2010-08-13 10:06:55 +02:00
Linus Torvalds
2069601b3f Revert "fsnotify: store struct file not struct path"
This reverts commit 3bcf3860a4 (and the
accompanying commit c1e5c95402 "vfs/fsnotify: fsnotify_close can delay
the final work in fput" that was a horribly ugly hack to make it work at
all).

The 'struct file' approach not only causes that disgusting hack, it
somehow breaks pulseaudio, probably due to some other subtlety with
f_count handling.

Fix up various conflicts due to later fsnotify work.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-12 14:23:04 -07:00
Steve French
3f43231230 [NFS] Set CONFIG_KEYS when CONFIG_NFS_USE_KERNEL_DNS is set
Previous patch relied on DNS_RESOLVER setting CONFIG_KEYS
but needs to be selected in NFS config when using the new
DNS resolver

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
CC: David Howells <dhowells@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2010-08-12 18:16:45 +00:00
Linus Torvalds
26df0766a7 Merge branch 'params' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus
* 'params' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus: (22 commits)
  param: don't deref arg in __same_type() checks
  param: update drivers/acpi/debug.c to new scheme
  param: use module_param in drivers/message/fusion/mptbase.c
  ide: use module_param_named rather than module_param_call
  param: update drivers/char/ipmi/ipmi_watchdog.c to new scheme
  param: lock if_sdio's lbs_helper_name and lbs_fw_name against sysfs changes.
  param: lock myri10ge_fw_name against sysfs changes.
  param: simple locking for sysfs-writable charp parameters
  param: remove unnecessary writable charp
  param: add kerneldoc to moduleparam.h
  param: locking for kernel parameters
  param: make param sections const.
  param: use free hook for charp (fix leak of charp parameters)
  param: add a free hook to kernel_param_ops.
  param: silence .init.text references from param ops
  Add param ops struct for hvc_iucv driver.
  nfs: update for module_param_named API change
  AppArmor: update for module_param_named API change
  param: use ops in struct kernel_param, rather than get and set fns directly
  param: move the EXPORT_SYMBOL to after the definitions.
  ...
2010-08-12 10:01:59 -07:00
David Howells
12fdff3fc2 Add a dummy printk function for the maintenance of unused printks
Add a dummy printk function for the maintenance of unused printks through gcc
format checking, and also so that side-effect checking is maintained too.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-12 09:51:35 -07:00
Jan Kara
81d73a32d7 mm: fix writeback_in_progress()
Commit 83ba7b071f ("writeback: simplify the write back thread queue")
broke writeback_in_progress() as in that commit we started to remove work
items from the list at the moment we start working on them and not at the
moment they are finished.  Thus if the flusher thread was doing some work
but there was no other work queued, writeback_in_progress() returned
false.  This could in particular cause unnecessary queueing of background
writeback from balance_dirty_pages() or writeout work from
writeback_sb_if_idle().

This patch fixes the problem by introducing a bit in the bdi state which
indicates that the flusher thread is processing some work and uses this
bit for writeback_in_progress() test.

NOTE: Both callsites of writeback_in_progress() (namely,
writeback_inodes_sb_if_idle() and balance_dirty_pages()) would actually
need a different information than what writeback_in_progress() provides.
They would need to know whether *the kind of writeback they are going to
submit* is already queued.  But this information isn't that simple to
provide so let's fix writeback_in_progress() for the time being.

Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Acked-by: Jens Axboe <jaxboe@fusionio.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-12 08:43:30 -07:00
Wu Fengguang
a50aeb4014 writeback: merge for_kupdate and !for_kupdate cases
Unify the logic for kupdate and non-kupdate cases.  There won't be
starvation because the inodes requeued into b_more_io will later be
spliced _after_ the remaining inodes in b_io, hence won't stand in the way
of other inodes in the next run.

It avoids unnecessary redirty_tail() calls, hence the update of
i_dirtied_when.  The timestamp update is undesirable because it could
later delay the inode's periodic writeback, or may exclude the inode from
the data integrity sync operation (which checks timestamp to avoid extra
work and livelock).

===
How the redirty_tail() comes about:

It was a long story..  This redirty_tail() was introduced with
wbc.more_io.  The initial patch for more_io actually does not have the
redirty_tail(), and when it's merged, several 100% iowait bug reports
arised:

reiserfs:
        http://lkml.org/lkml/2007/10/23/93

jfs:
        commit 29a424f283
        JFS: clear PAGECACHE_TAG_DIRTY for no-write pages

ext2:
        http://www.spinics.net/linux/lists/linux-ext4/msg04762.html

They are all old bugs hidden in various filesystems that become "visible"
with the more_io patch.  At the time, the ext2 bug is thought to be
"trivial", so not fixed.  Instead the following updated more_io patch with
redirty_tail() is merged:

	http://www.spinics.net/linux/lists/linux-ext4/msg04507.html

This will in general prevent 100% on ext2 and possibly other unknown FS bugs.

Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Martin Bligh <mbligh@google.com>
Cc: Michael Rubin <mrubin@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-12 08:43:30 -07:00
Wu Fengguang
4ea879b96d writeback: fix queue_io() ordering
This was not a bug, since b_io is empty for kupdate writeback.  The next
patch will do requeue_io() for non-kupdate writeback, so let's fix it.

Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Martin Bligh <mbligh@google.com>
Cc: Michael Rubin <mrubin@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-12 08:43:30 -07:00
Wu Fengguang
23539afc71 writeback: don't redirty tail an inode with dirty pages
Avoid delaying writeback for an expire inode with lots of dirty pages, but
no active dirtier at the moment.  Previously we only do that for the
kupdate case.

Any filesystem that does delayed allocation or unwritten extent conversion
after IO completion will cause this - for example, XFS.

Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Acked-by: Jan Kara <jack@suse.cz>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-12 08:43:30 -07:00
Wu Fengguang
16c4042f08 writeback: avoid unnecessary calculation of bdi dirty thresholds
Split get_dirty_limits() into global_dirty_limits()+bdi_dirty_limit(), so
that the latter can be avoided when under global dirty background
threshold (which is the normal state for most systems).

Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-12 08:43:29 -07:00
wanglei
bec5eb6141 AFS: Implement an autocell mount capability [ver #2]
Implement the ability for the root directory of a mounted AFS filesystem to
accept lookups of arbitrary directory names, to interpet the names as the names
of cells, to look the cell names up in the DNS for AFSDB records and to mount
the root.cell volume of the nominated cell on the pseudo-directory created by
lookup.

This facility is requested by passing:

	-o autocell

to the mountpoint for which this is desired, usually the /afs mount.

To use this facility, a DNS upcall program is required for AFSDB records.  This
can be obtained from:

	http://people.redhat.com/~dhowells/afs/dns.afsdb.c

It should be compiled with -lresolv and -lkeyutils and installed as, say:

	/usr/sbin/dns.afsdb

Then the following line needs to be added to /sbin/request-key.conf:

	create	dns_resolver afsdb:*	*	/usr/sbin/dns.afsdb %k

This can be tested by mounting AFS, say:

	insmod dns_resolver.ko
	insmod af-rxrpc.ko
	insmod kafs.ko rootcell=grand.central.org
	mount -t afs "#grand.central.org:root.cell." /afs -o autocell

and doing:

	ls /afs/grand.central.org/

which should show:

	archive/  cvs/  doc/  local/  project/  service/  software/  user/  www/

if it works.

Signed-off-by: Wang Lei <wang840925@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2010-08-11 17:11:29 +00:00
Wang Lei
4a2d789267 DNS: If the DNS server returns an error, allow that to be cached [ver #2]
If the DNS server returns an error, allow that to be cached in the DNS resolver
key in lieu of a value.  Userspace passes the desired error number as an option
in the payload:

	"#dnserror=<number>"

Userspace must map h_errno from the name resolution routines to an appropriate
Linux error before passing it up.  Something like the following mapping is
recommended:

	[HOST_NOT_FOUND]	= ENODATA,
	[TRY_AGAIN]		= EAGAIN,
	[NO_RECOVERY]		= ECONNREFUSED,
	[NO_DATA]		= ENODATA,

in lieu of Linux errors specifically for representing name service errors.  The
filesystem must map these errors appropropriately before passing them to
userspace.  AFS is made to map ENODATA and EAGAIN to EDESTADDRREQ for the
return to userspace; ECONNREFUSED is allowed to stand as is.

The error can be seen in /proc/keys as a negative number after the description
of the key.  Compare, for example, the following key entries:

2f97238c I--Q--     1  53s 3f010000     0     0 dns_resol afsdb:grand.centrall.org: -61
338bfbbe I--Q--     1  59m 3f010000     0     0 dns_resol afsdb:grand.central.org: 37

If the error option is supplied in the payload, the main part of the payload is
discarded.  The key should have an expiry time set by userspace.

Signed-off-by: Wang Lei <wang840925@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2010-08-11 17:11:28 +00:00
Bryan Schumaker
c2e8139c9f NFS: Use kernel DNS resolver [ver #2]
Use the kernel DNS resolver to translate hostnames to IP addresses.  Create a
new config option to choose between the legacy DNS resolver and the new
resolver.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2010-08-11 17:11:28 +00:00
Suresh Jayaraman
3694b91a59 cifs: update README to include details about 'fsc' option
Signed-off-by: Suresh Jayaraman <sjayaraman@suse.de>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2010-08-11 17:11:28 +00:00
J. R. Okajima
0702099bd8 NFS: fix the return value of nfs_file_fsync()
By the commit af7fa16 2010-08-03 NFS: Fix up the fsync code
close(2) became returning the non-zero value even if it went well.
nfs_file_fsync() should return 0 when "status" is positive.

Signed-off-by: J. R. Okajima <hooanon05@yahoo.co.jp>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-08-11 13:10:16 -04:00
Davidlohr Bueso
5d7ca35a18 nfs: Remove redundant NULL check upon kfree()
Signed-off-by: Davidlohr Bueso <dave@gnu.org>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-08-11 12:42:15 -04:00
Linus Torvalds
5af568cbd5 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
  isofs: Fix lseek() to position beyond 4 GB
  vfs: remove unused MNT_STRICTATIME
  vfs: show unreachable paths in getcwd and proc
  vfs: only add " (deleted)" where necessary
  vfs: add prepend_path() helper
  vfs: __d_path: dont prepend the name of the root dentry
  ia64: perfmon: add d_dname method
  vfs: add helpers to get root and pwd
  cachefiles: use path_get instead of lone dget
  fs/sysv/super.c: add support for non-PDP11 v7 filesystems
  V7: Adjust sanity checks for some volumes
  Add v7 alias
  v9fs: fixup for inode_setattr being removed

Manual merge to take Al's version of the fs/sysv/super.c file: it merged
cleanly, but Al had removed an unnecessary header include, so his side
was better.
2010-08-11 09:23:32 -07:00
Linus Torvalds
062e27ec1b Merge git://git.kernel.org/pub/scm/linux/kernel/git/pkl/squashfs-linus
* git://git.kernel.org/pub/scm/linux/kernel/git/pkl/squashfs-linus:
  Squashfs: fix checkpatch.pl warnings
  Squashfs: fix filename typo
  Squashfs: update Kconfig and documentation for LZO
  Squashfs: fix block size use in LZO decompressor
  Squashfs: Add LZO compression support
  squashfs: fix filename in header comment
  Squashfs: Make XATTR config name consistent with other file systems
  squashfs: fix compiler inline warning
2010-08-11 09:20:13 -07:00
Linus Torvalds
bf25db3654 Merge branch 'for-linus' of git://git.open-osd.org/linux-open-osd
* 'for-linus' of git://git.open-osd.org/linux-open-osd:
  exofs: Fix groups code when num_devices is not divisible by group_width
  exofs: Remove useless optimization
  exofs: exofs_file_fsync and exofs_file_flush correctness
  exofs: Remove superfluous dependency on buffer_head and writeback
2010-08-11 09:19:43 -07:00
Linus Torvalds
682c30ed21 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (39 commits)
  ceph: generalize mon requests, add pool op support
  ceph: only queue async writeback on cap revocation if there is dirty data
  ceph: do not ignore osd_idle_ttl mount option
  ceph: constify dentry_operations
  ceph: whitespace cleanup
  ceph: add flock/fcntl lock support
  ceph: define on-wire types, constants for file locking support
  ceph: add CEPH_FEATURE_FLOCK to the supported feature bits
  ceph: support v2 reconnect encoding
  ceph: support v2 client_caps encoding
  ceph: move AES iv definition to shared header
  ceph: fix decoding of pool snap info
  ceph: make ->sync_fs not wait if wait==0
  ceph: warn on missing snap realm
  ceph: print useful error message when crush rule not found
  ceph: use %pU to print uuid (fsid)
  ceph: sync header defs with server code
  ceph: clean up header guards
  ceph: strip misleading/obsolete version, feature info
  ceph: specify supported features in super.h
  ...
2010-08-11 09:18:32 -07:00
Lubomir Rintel
ab654bab04 fs/sysv/super.c: add support for non-PDP11 v7 filesystems
This adds byte order autodetection (of PDP-11 and LE filesystems).  No
attempt is made to detect big-endian filesystems -- were there any?
Tested with PDP-11 v7 filesystems and PC-IX maintenance floppy.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Lubomir Rintel <lkundrak@v3.sk>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-11 08:59:23 -07:00
Lubomir Rintel
0bcaa65a56 fs/sysv: v7: adjust sanity checks for some volumes
Newly mkfs-ed filesystems from Seventh Edition have last modification time
set to zero, but are otherwise perfectly valid.

Also, tighten up other sanity checks to filter out most filesystems with

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Lubomir Rintel <lkundrak@v3.sk>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-11 08:59:22 -07:00
Lubomir Rintel
a36517e930 fs/sysv: add v7 alias
So that the module gets autoloaded when a v7 filesystem is mounted.

Signed-off-by: Lubomir Rintel <lkundrak@v3.sk>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-11 08:59:22 -07:00
Dan Carpenter
bebf8cfaea afs: destroy work queue on init failure
We can clean up the work queue on this error path.  This function is
called from afs_init().

Signed-off-by: Dan Carpenter <error27@gmail.com>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-11 08:59:22 -07:00
Alexey Dobriyan
9c867fbe06 partitions: fix sometimes unreadable partition strings
Fix this garbage happening quite often:

==>	 sda:
	scsi 3:0:0:0: CD-ROM            TOSHIBA
==>	 sda1 sda2 sda3 sda4 <sr0: scsi3-mmc drive: 24x/24x writer dvd-ram cd/rw xa/form2 cdda tray
			    ^^^
	Uniform CD-ROM driver Revision: 3.20
	sr 3:0:0:0: Attached scsi CD-ROM sr0
==>	 sda5 sda6 sda7 >

Make "sda: sda1 ..." lines actually lines.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-11 08:59:20 -07:00
Robert P. J. Day
cfbef3cb16 procfs: simplify conditional processing of fs/proc.o.
Since the entire fs/proc directory is conditionally included based on
CONFIG_PROC_FS, it's redundant to check that same variable within that
directory.

Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-11 08:59:20 -07:00
Nathan Lynch
a2a20c412c signalfd: fill in ssi_int for posix timers and message queues
If signalfd is used to consume a signal generated by a POSIX interval
timer or POSIX message queue, the ssi_int field does not reflect the data
(sigevent->sigev_value) supplied to timer_create(2) or mq_notify(3).  (The
ssi_ptr field, however, is filled in.)

This behavior differs from signalfd's treatment of sigqueue-generated
signals -- see the default case in signalfd_copyinfo.  It also gives
results that differ from the case when a signal is handled conventionally
via a sigaction-registered handler.

So, set signalfd_siginfo->ssi_int in the remaining cases (__SI_TIMER,
__SI_MESGQ) where ssi_ptr is set.

akpm: a non-back-compatible change.  Merge into -stable to minimise the
number of kernels which are in the field and which miss this feature.

Signed-off-by: Nathan Lynch <ntl@pobox.com>
Acked-by: Davide Libenzi <davidel@xmailserver.org>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-11 08:59:20 -07:00
Chris Wright
b7300b78d1 blkdev: cgroup whitelist permission fix
The cgroup device whitelist code gets confused when trying to grant
permission to a disk partition that is not currently open.  Part of
blkdev_open() includes __blkdev_get() on the whole disk.

Basically, the only ways to reliably allow a cgroup access to a partition
on a block device when using the whitelist are to 1) also give it access
to the whole block device or 2) make sure the partition is already open in
a different context.

The patch avoids the cgroup check for the whole disk case when opening a
partition.

Addresses https://bugzilla.redhat.com/show_bug.cgi?id=589662

Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Acked-by: Serge E. Hallyn <serue@us.ibm.com>
Tested-by: Serge E. Hallyn <serue@us.ibm.com>
Reported-by: Vivek Goyal <vgoyal@redhat.com>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: "Daniel P. Berrange" <berrange@redhat.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-11 08:59:18 -07:00
Changli Gao
b3397ad544 reiserfs: remove unused local `wait'
Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-11 08:59:12 -07:00
Dan Carpenter
5fc79d85d2 autofs4: remove unneeded null check in try_to_fill_dentry()
After 97e7449a7a: "autofs4: fix indirect mount pending expire race" we no
longer assumed that "ino" can be null.  The other null checks got removed
but this was one was missed.

Signed-off-by: Dan Carpenter <error27@gmail.com>
Cc: Ian Kent <raven@themaw.net>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-11 08:59:06 -07:00
Changli Gao
a892e2d7dc vfs: use kmalloc() to allocate fdmem if possible
Use kmalloc() to allocate fdmem if possible.

vmalloc() is used as a fallback solution for fdmem allocation.  A new
helper function __free_fdtable() is introduced to reduce the lines of
code.

A potential bug, vfree() a memory allocated by kmalloc(), is fixed.

[akpm@linux-foundation.org: use __GFP_NOWARN, uninline alloc_fdmem() and free_fdmem()]
Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Avi Kivity <avi@redhat.com>
Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-11 08:59:02 -07:00
Dmitry Torokhov
06b1e104b7 vfs: clarify that nonseekable_open() will never fail
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: John Kacur <jkacur@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-11 08:59:02 -07:00
Wu Fengguang
454eedb890 vfs: O_* bit numbers uniqueness check
The O_* bit numbers are defined in 20+ arch/*, and can silently overlap.
Add a compile time check to ensure the uniqueness as suggested by David
Miller.

Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Cc: David Miller <davem@davemloft.net>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Eric Paris <eparis@redhat.com>
Cc: Roland Dreier <rdreier@cisco.com>
Cc: Jamie Lokier <jamie@shareable.org>
Cc: Andreas Schwab <schwab@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-11 08:59:02 -07:00
Tony Battersby
58939473ba vfs: improve comment describing fget_light()
Improve the description of fget_light(), which is currently incorrect
about needing a prior refcnt (judging by the way it is actually used).

Signed-off-by: Tony Battersby <tonyb@cybernetics.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-11 08:59:02 -07:00
Rusty Russell
9bbb9e5a33 param: use ops in struct kernel_param, rather than get and set fns directly
This is more kernel-ish, saves some space, and also allows us to
expand the ops without breaking all the callers who are happy for the
new members to be NULL.

The few places which defined their own param types are changed to the
new scheme (more which crept in recently fixed in following patches).

Since we're touching them anyway, we change get() and set() to take a
const struct kernel_param (which they really are).  This causes some
harmless warnings until we fix them (in following patches).

To reduce churn, module_param_call creates the ops struct so the callers
don't have to change (and casts the functions to reduce warnings).
The modern version which takes an ops struct is called module_param_cb.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Reviewed-by: Takashi Iwai <tiwai@suse.de>
Tested-by: Phil Carmody <ext-phil.2.carmody@nokia.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Ville Syrjala <syrjala@sci.fi>
Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Cc: Alessandro Rubini <rubini@ipvvis.unipv.it>
Cc: Michal Januszewski <spock@gentoo.org>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Neil Brown <neilb@suse.de>
Cc: linux-kernel@vger.kernel.org
Cc: linux-input@vger.kernel.org
Cc: linux-fbdev-devel@lists.sourceforge.net
Cc: linux-nfs@vger.kernel.org
Cc: netdev@vger.kernel.org
2010-08-11 23:04:13 +09:30
Jan Andres
66a362a2aa isofs: Fix lseek() to position beyond 4 GB
isofs supports files larger than 4 GB by using multi-extent files.
However an lseek() to a position beyond 4 GB in such a file will
fail with EINVAL, because s_maxbytes in the isofs superblock is
initialized to 2^32-1, and generic_file_llseek() checks against
that value.

I therefore suggest increasing the value of s_maxbytes to have
full support for large files in isofs. With multi-extent files, file
size is only limited by the maximum size of the file system (8 TB),
so this seems a reasonable value for s_maxbytes.

Signed-off-by: Jan Andres <jandres@gmx.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-11 00:29:47 -04:00
Miklos Szeredi
532490f0a5 vfs: remove unused MNT_STRICTATIME
Commit d0adde574b added MNT_STRICTATIME
but it isn't actually used (MS_STRICTATIME clears MNT_RELATIME and
MNT_NOATIME rather than setting any mount flag).

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-11 00:29:47 -04:00
Miklos Szeredi
8df9d1a414 vfs: show unreachable paths in getcwd and proc
Prepend "(unreachable)" to path strings if the path is not reachable
from the current root.

Two places updated are
 - the return string from getcwd()
 - and symlinks under /proc/$PID.

Other uses of d_path() are left unchanged (we know that some old
software crashes if /proc/mounts is changed).

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-11 00:29:47 -04:00
Miklos Szeredi
ffd1f4ed5b vfs: only add " (deleted)" where necessary
__d_path() has 4 callers:

  d_path()
  sys_getcwd()
  seq_path_root()
  tomoyo_realpath_from_path2()

Of these the only one which needs the " (deleted)" ending is d_path().

sys_getcwd() checks for existence before calling __d_path().

seq_path_root() is used to show the mountpoint path in
/proc/PID/mountinfo, which is always a positive.

And tomoyo doesn't want the deleted ending.

Create a helper "path_with_deleted()" as subsequent patches will need
this in multiple places.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-11 00:28:21 -04:00
Miklos Szeredi
f2eb6575d5 vfs: add prepend_path() helper
Split off prepend_path() from __d_path().  This new helper takes an
end-of-buffer pointer and buffer-length pointer just like the other
prepend_* functions.  Move the " (deleted)" postfix out to __d_path().

This patch doesn't change any functionality but paves the way for the
following patches.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-11 00:28:21 -04:00
Miklos Szeredi
98dc568bc2 vfs: __d_path: dont prepend the name of the root dentry
In the old times pseudo-filesystems set the name of theroot dentry to
some prefix like "pipe:" and the name of the child dentry to "[123]"
and relied on a hack in __d_path() to replace the preceding slash with
the root's name to get "pipe:[123]".

Then the d_dname() dentry operation was introduced which solved the
same problem without having to pre-fill the name in each dentry.

Currently the following pseudo filesystems exist in the kernel:

perfmon
mtd
anon_inode
bdev
pipe
socket

Of these only perfmon, anon_inode, pipe and socket create
sub-dentries, all of which have now been switched to using d_dname().

bdev and mtd only create inodes.

This means that now the hack to overwrite the slash can be removed, so
for unreachable paths (e.g. within a detached mount) the path string
won't be polluted with garbage.  For these cases a subsequent patch
will add a prefix, indicating that the path is unreachable.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-11 00:28:21 -04:00
Miklos Szeredi
f7ad3c6be9 vfs: add helpers to get root and pwd
Add three helpers that retrieve a refcounted copy of the root and cwd
from the supplied fs_struct.

 get_fs_root()
 get_fs_pwd()
 get_fs_root_and_pwd()

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-11 00:28:20 -04:00
Miklos Szeredi
542ce7a9bc cachefiles: use path_get instead of lone dget
Dentry references should not be acquired without a corresponding
vfsmount ref.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-11 00:28:20 -04:00
Lubomir Rintel
6d0b5456e1 fs/sysv/super.c: add support for non-PDP11 v7 filesystems
This adds byte order autodetection (of PDP-11 and LE filesystems).  No
attempt is made to detect big-endian filesystems -- were there any?
Tested with PDP-11 v7 filesystems and PC-IX maintenance floppy.

[akpm@linux-foundation.org: coding-style fixes]
[AV: parser.h inclusion was a rudiment of discarded stuff]
Signed-off-by: Lubomir Rintel <lkundrak@v3.sk>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-11 00:24:18 -04:00
Lubomir Rintel
496ee9b8f3 V7: Adjust sanity checks for some volumes
Newly mkfs-ed filesystems from Seventh Edition have last modification
time set to zero, but are otherwise perfectly valid.

Also, tighten up other sanity checks to filter out most filesystems with
different bytesex than we're using.

Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Lubomir Rintel <lkundrak@v3.sk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-11 00:18:43 -04:00
Lubomir Rintel
b76212d7f1 Add v7 alias
So that the module gets autoloaded when a v7 filesystem is mounted.

Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Lubomir Rintel <lkundrak@v3.sk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-11 00:18:42 -04:00
Stephen Rothwell
8cef9c6735 v9fs: fixup for inode_setattr being removed
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-11 00:08:00 -04:00
Dave Kleikamp
aca0fa34bd jfs: don't allow os2 xattr namespace overlap with others
It's currently possible to bypass xattr namespace access rules by
prefixing valid xattr names with "os2.", since the os2 namespace stores
extended attributes in a legacy format with no prefix.

This patch adds checking to deny access to any valid namespace prefix
following "os2.".

Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Reported-by: Sergey Vlasov <vsu@altlinux.ru>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-10 15:33:09 -07:00
Linus Torvalds
2f9e825d3e Merge branch 'for-2.6.36' of git://git.kernel.dk/linux-2.6-block
* 'for-2.6.36' of git://git.kernel.dk/linux-2.6-block: (149 commits)
  block: make sure that REQ_* types are seen even with CONFIG_BLOCK=n
  xen-blkfront: fix missing out label
  blkdev: fix blkdev_issue_zeroout return value
  block: update request stacking methods to support discards
  block: fix missing export of blk_types.h
  writeback: fix bad _bh spinlock nesting
  drbd: revert "delay probes", feature is being re-implemented differently
  drbd: Initialize all members of sync_conf to their defaults [Bugz 315]
  drbd: Disable delay probes for the upcomming release
  writeback: cleanup bdi_register
  writeback: add new tracepoints
  writeback: remove unnecessary init_timer call
  writeback: optimize periodic bdi thread wakeups
  writeback: prevent unnecessary bdi threads wakeups
  writeback: move bdi threads exiting logic to the forker thread
  writeback: restructure bdi forker loop a little
  writeback: move last_active to bdi
  writeback: do not remove bdi from bdi_list
  writeback: simplify bdi code a little
  writeback: do not lose wake-ups in bdi threads
  ...

Fixed up pretty trivial conflicts in drivers/block/virtio_blk.c and
drivers/scsi/scsi_error.c as per Jens.
2010-08-10 15:22:42 -07:00
Linus Torvalds
fc385c3132 Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty-2.6: (68 commits)
  U6715 16550A serial driver support
  Char: nozomi, set tty->driver_data appropriately
  Char: nozomi, fix tty->count counting
  serial: max3107: Fix gpiolib support
  hsu: call PCI pm hooks in suspend/resume function
  hsu: some code cleanup
  hsu: add a periodic timer to check dma rx channel
  hsu: driver for Medfield High Speed UART device
  mxser: remove unnesesary NULL check
  serial: add support for OX16PCI958 card
  serial: 68328serial.c: remove dead (ALMA_ANS | DRAGONIXVZ | M68EZ328ADS)
  timbuart: use __devinit and __devexit macros for probe and remove
  serial: MMIO32 support for 8250_early.c
  serial: mcf: don't take spinlocks in already protected functions
  serial: general fixes in the serial_rs485 structure
  serial: fix missing bit coverage of ASYNC_FLAGS
  serial: "altera_uart: simplify altera_uart_console_putc()" checkpatch fixes
  serial: crisv10: formatting of pointers in printk()
  vt: Fix warning: statement with no effect due to vt_kern.h
  tty_io: remove casts from void*
  ...
2010-08-10 15:03:42 -07:00
Yehuda Sadeh
e56fa10e92 ceph: generalize mon requests, add pool op support
Generalize the current statfs synchronous requests, and support pool_ops.

Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>
2010-08-10 14:41:25 -07:00
Patrick J. LoPresti
9b00c64318 nfs: Add "lookupcache" to displayed mount options
Running "cat /proc/mounts" fails to display the "lookupcache" option.
This oversight cost me a bunch of wasted time recently.

The following simple patch fixes it.

CC: stable <stable@kernel.org>
Signed-off-by: Patrick LoPresti <lopresti@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-08-10 17:28:01 -04:00
Linus Torvalds
7233e39276 Merge branch 'bkl/ioctl' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing
* 'bkl/ioctl' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing:
  staging: Pushdown bkl to easycap ioctl handlers
  autofs/autofs4: Move compat_ioctl handling into fs
  v4l: Convert v4l2-dev to unlocked_ioctl
  ia64/perfmon: Convert to unlocked_ioctl
  sunrpc: Remove duplicated #include
  ncpfs: Remove duplicated #include
2010-08-10 13:58:28 -07:00
hyc@symas.com
26df6d1340 tty: Add EXTPROC support for LINEMODE
This patch is against the 2.6.34 source.

Paraphrased from the 1989 BSD patch by David Borman @ cray.com:

     These are the changes needed for the kernel to support
     LINEMODE in the server.

     There is a new bit in the termios local flag word, EXTPROC.
     When this bit is set, several aspects of the terminal driver
     are disabled.  Input line editing, character echo, and mapping
     of signals are all disabled.  This allows the telnetd to turn
     off these functions when in linemode, but still keep track of
     what state the user wants the terminal to be in.

     New ioctl:
         TIOCSIG         Generate a signal to processes in the
                         current process group of the pty.

     There is a new mode for packet driver, the TIOCPKT_IOCTL bit.
     When packet mode is turned on in the pty, and the EXTPROC bit
     is set, then whenever the state of the pty is changed, the
     next read on the master side of the pty will have the TIOCPKT_IOCTL
     bit set.  This allows the process on the server side of the pty
     to know when the state of the terminal has changed; it can then
     issue the appropriate ioctl to retrieve the new state.

Since the original BSD patches accompanied the source code for telnet
I've left that reference here, but obviously the feature is useful for
any remote terminal protocol, including ssh.

The corresponding feature has existed in the BSD tty driver since 1989.
For historical reference, a good copy of the relevant files can be found
here:

http://anonsvn.mit.edu/viewvc/krb5/trunk/src/appl/telnet/?pathrev=17741

Signed-off-by: Howard Chu <hyc@symas.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-10 13:47:39 -07:00
Linus Torvalds
26b55633a8 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ecryptfs/ecryptfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ecryptfs/ecryptfs-2.6:
  ecryptfs: dont call lookup_one_len to avoid NULL nameidata
  fs/ecryptfs/file.c: introduce missing free
  ecryptfs: release reference to lower mount if interpose fails
  eCryptfs: Handle ioctl calls with unlocked and compat functions
  ecryptfs: Fix warning in ecryptfs_process_response()
2010-08-10 12:14:39 -07:00
Linus Torvalds
e8a89cebdb Merge git://git.infradead.org/mtd-2.6
* git://git.infradead.org/mtd-2.6: (79 commits)
  mtd: Remove obsolete <mtd/compatmac.h> include
  mtd: Update copyright notices
  jffs2: Update copyright notices
  mtd-physmap: add support users can assign the probe type in board files
  mtd: remove redwood map driver
  mxc_nand: Add v3 (i.MX51) Support
  mxc_nand: support 8bit ecc
  mxc_nand: fix correct_data function
  mxc_nand: add V1_V2 namespace to registers
  mxc_nand: factor out a check_int function
  mxc_nand: make some internally used functions overwriteable
  mxc_nand: rework get_dev_status
  mxc_nand: remove 0xe00 offset from registers
  mtd: denali: Add multi connected NAND support
  mtd: denali: Remove set_ecc_config function
  mtd: denali: Remove unuseful code in get_xx_nand_para functions
  mtd: denali: Remove device_info_tag structure
  mtd: m25p80: add support for the Winbond W25Q32 SPI flash chip
  mtd: m25p80: add support for the Intel/Numonyx {16,32,64}0S33B SPI flash chips
  mtd: m25p80: add support for the EON EN25P{32, 64} SPI flash chips
  ...

Fix up trivial conflicts in drivers/mtd/maps/{Kconfig,redwood.c} due to
redwood driver removal.
2010-08-10 11:49:21 -07:00