Commit Graph

695 Commits

Author SHA1 Message Date
Linus Torvalds 35fd3dc58d Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
Pull Ceph fixes form Sage Weil:
 "There are two fixes in the messenger code, one that can trigger a NULL
  dereference, and one that error in refcounting (extra put).  There is
  also a trivial fix that in the fs client code that is triggered by NFS
  reexport."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
  ceph: fix dentry reference leak in encode_fh()
  libceph: avoid NULL kref_put when osd reset races with alloc_msg
  rbd: reset BACKOFF if unable to re-queue
2012-10-29 08:49:25 -07:00
David Zafman 52eb5a900a ceph: fix dentry reference leak in encode_fh()
Call to d_find_alias() needs a corresponding dput()

This fixes http://tracker.newdream.net/issues/3271

Signed-off-by: David Zafman <david.zafman@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2012-10-29 08:17:10 -07:00
Hugh Dickins 35c2a7f490 tmpfs,ceph,gfs2,isofs,reiserfs,xfs: fix fh_len checking
Fuzzing with trinity oopsed on the 1st instruction of shmem_fh_to_dentry(),
	u64 inum = fid->raw[2];
which is unhelpfully reported as at the end of shmem_alloc_inode():

BUG: unable to handle kernel paging request at ffff880061cd3000
IP: [<ffffffff812190d0>] shmem_alloc_inode+0x40/0x40
Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
Call Trace:
 [<ffffffff81488649>] ? exportfs_decode_fh+0x79/0x2d0
 [<ffffffff812d77c3>] do_handle_open+0x163/0x2c0
 [<ffffffff812d792c>] sys_open_by_handle_at+0xc/0x10
 [<ffffffff83a5f3f8>] tracesys+0xe1/0xe6

Right, tmpfs is being stupid to access fid->raw[2] before validating that
fh_len includes it: the buffer kmalloc'ed by do_sys_name_to_handle() may
fall at the end of a page, and the next page not be present.

But some other filesystems (ceph, gfs2, isofs, reiserfs, xfs) are being
careless about fh_len too, in fh_to_dentry() and/or fh_to_parent(), and
could oops in the same way: add the missing fh_len checks to those.

Reported-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Sage Weil <sage@inktank.com>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-10-09 23:33:55 -04:00
Konstantin Khlebnikov 0b173bc4da mm: kill vma flag VM_CAN_NONLINEAR
Move actual pte filling for non-linear file mappings into the new special
vma operation: ->remap_pages().

Filesystems must implement this method to get non-linear mapping support,
if it uses filemap_fault() then generic_file_remap_pages() can be used.

Now device drivers can implement this method and obtain nonlinear vma support.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Carsten Otte <cotte@de.ibm.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>	#arch/tile
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Eric Paris <eparis@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Morris <james.l.morris@oracle.com>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Kentaro Takeda <takedakn@nttdata.co.jp>
Cc: Matt Helsley <matthltc@us.ibm.com>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Venkatesh Pallipadi <venki@google.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-09 16:22:17 +09:00
Linus Torvalds 7035cdf36d Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
Pull ceph updates from Sage Weil:
 "The bulk of this pull is a series from Alex that refactors and cleans
  up the RBD code to lay the groundwork for supporting the new image
  format and evolving feature set.  There are also some cleanups in
  libceph, and for ceph there's fixed validation of file striping
  layouts and a bugfix in the code handling a shrinking MDS cluster."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (71 commits)
  ceph: avoid 32-bit page index overflow
  ceph: return EIO on invalid layout on GET_DATALOC ioctl
  rbd: BUG on invalid layout
  ceph: propagate layout error on osd request creation
  libceph: check for invalid mapping
  ceph: convert to use le32_add_cpu()
  ceph: Fix oops when handling mdsmap that decreases max_mds
  rbd: update remaining header fields for v2
  rbd: get snapshot name for a v2 image
  rbd: get the snapshot context for a v2 image
  rbd: get image features for a v2 image
  rbd: get the object prefix for a v2 rbd image
  rbd: add code to get the size of a v2 rbd image
  rbd: lay out header probe infrastructure
  rbd: encapsulate code that gets snapshot info
  rbd: add an rbd features field
  rbd: don't use index in __rbd_add_snap_dev()
  rbd: kill create_snap sysfs entry
  rbd: define rbd_dev_image_id()
  rbd: define some new format constants
  ...
2012-10-08 06:38:18 +09:00
Alex Elder 6285bc2312 ceph: avoid 32-bit page index overflow
A pgoff_t is defined (by default) to have type (unsigned long).  On
architectures such as i686 that's a 32-bit type.  The ceph address
space code was attempting to produce 64 bit offsets by shifting a
page's index by PAGE_CACHE_SHIFT, but the result was not what was
desired because the shift occurred before the result got promoted
to 64 bits.

Fix this by converting all uses of page->index used in this way to
use the page_offset() macro, which ensures the 64-bit result has the
intended value.

This fixes http://tracker.newdream.net/issues/3112

Reported-by:  Mohamed Pakkeer <pakkeer.mohideen@realimage.com>
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2012-10-03 10:51:18 -05:00
Sage Weil 457712a0bc ceph: return EIO on invalid layout on GET_DATALOC ioctl
If the user calls GET_DATALOC on a file with an invalid (e.g.,
zeroed) layout, return EIO to userland.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Alex Elder <elder@inktank.com>
2012-10-03 10:51:17 -05:00
Kirill A. Shutemov 8c0a853770 fs: push rcu_barrier() from deactivate_locked_super() to filesystems
There's no reason to call rcu_barrier() on every
deactivate_locked_super().  We only need to make sure that all delayed rcu
free inodes are flushed before we destroy related cache.

Removing rcu_barrier() from deactivate_locked_super() affects some fast
paths.  E.g.  on my machine exit_group() of a last process in IPC
namespace takes 0.07538s.  rcu_barrier() takes 0.05188s of that time.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-10-02 21:35:55 -04:00
Sage Weil 6816282dab ceph: propagate layout error on osd request creation
If we are creating an osd request and get an invalid layout, return
an EINVAL to the caller.  We switch up the return to have an error
code instead of NULL implying -ENOMEM.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Alex Elder <elder@inktank.com>
2012-10-01 17:20:00 -05:00
Wei Yongjun b905a7f8b7 ceph: convert to use le32_add_cpu()
Convert cpu_to_le32(le32_to_cpu(E1) + E2) to use le32_add_cpu().

dpatch engine is used to auto generate this patch.
(https://github.com/weiyj/dpatch)

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Sage Weil <sage@inktank.com>
2012-10-01 14:30:54 -05:00
Yan, Zheng 3e8f43a089 ceph: Fix oops when handling mdsmap that decreases max_mds
When i >= newmap->m_max_mds, ceph_mdsmap_get_addr(newmap, i) return
NULL. Passing NULL to memcmp() triggers oops.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: Sage Weil <sage@inktank.com>
2012-10-01 14:30:54 -05:00
Alex Elder c98f533c94 ceph: let path portion of mount "device" be optional
A recent change to /sbin/mountall causes any trailing '/' character
in the "device" (or fs_spec) field in /etc/fstab to be stripped.  As
a result, an entry for a ceph mount that intends to mount the root
of the name space ends up with now path portion, and the ceph mount
option processing code rejects this.

That is, an entry in /etc/fstab like:
    cephserver:port:/ /mnt ceph defaults 0 0
provides to the ceph code just "cephserver:port:" as the "device,"
and that gets rejected.

Although this is a bug in /sbin/mountall, we can have the ceph mount
code support an empty/nonexistent path, interpreting it to mean the
root of the name space.

RFC 5952 offers recommendations for how to express IPv6 addresses,
and recommends the usage found in RFC 3986 (which specifies the
format for URI's) for representing both IPv4 and IPv6 addresses that
include port numbers.  (See in particular the definition of
"authority" found in the Appendix of RFC 3986.)

According to those standards, no host specification will ever
contain a '/' character.  As a result, it is sufficient to scan a
provided "device" from an /etc/fstab entry for the first '/'
character, and if it's found, treat that as the beginning of the
path.  If no '/' character is present, we can treat the entire
string as the monitor host specification(s), and assume the path
to be the root of the name space.  We'll still require a ':' to
separate the host portion from the (possibly empty) path portion.

This means that we can more formally define how ceph will interpret
the "device" it's provided when processing a mount request:

    "device" will look like:
        <server_spec>[,<server_spec>...]:[<path>]
    where
        <server_spec> is <ip>[:<port>]
        <path> is optional, but if present must begin with '/'

This addresses http://tracker.newdream.net/issues/2919

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Dan Mick <dan.mick@inktank.com>
2012-10-01 14:30:49 -05:00
Al Viro 2744c171db ceph: don't abuse d_delete() on failure exits
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-09-26 22:20:20 -04:00
Sage Weil 45f2e081f5 ceph: avoid divide by zero in __validate_layout()
If "l->stripe_unit" is zero the the mod on the next line will cause a
divide by zero bug.  This comes from the copy_from_user() in
ceph_ioctl_set_layout_policy().  Passing 0 is valid, though (it means
"do not change") so avoid the % check in that case.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Alex Elder <elder@inktank.com>
2012-08-21 15:55:28 -07:00
Sage Weil 6c5e50fa61 ceph: tolerate (and warn on) extraneous dentry from mds
If the MDS gives us a dentry and we weren't prepared to handle it,
WARN_ON_ONCE instead of crashing.

Reported-by: Yan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Alex Elder <elder@inktank.com>
2012-08-21 15:55:25 -07:00
Sage Weil d1c338a509 libceph: delay debugfs initialization until we learn global_id
The debugfs directory includes the cluster fsid and our unique global_id.
We need to delay the initialization of the debug entry until we have
learned both the fsid and our global_id from the monitor or else the
second client can't create its debugfs entry and will fail (and multiple
client instances aren't properly reflected in debugfs).

Reported by: Yan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>
2012-08-20 10:03:15 -07:00
Sage Weil 5ef50c3bec ceph: simplify+fix atomic_open
The initial ->atomic_open op was carried over from the old intent code,
which was incomplete and didn't really work.  Replace it with a fresh
method.  In particular:

 * always attempt to do an atomic open+lookup, both for the create case
   and for lookups of existing files.
 * fix symlink handling by returning 1 to the VFS so that we can follow
   the link to its destination. This fixes a longstanding ceph bug (#2392).

Signed-off-by: Sage Weil <sage@inktank.com>
2012-08-02 09:11:19 -07:00
Linus Torvalds a0e881b7c1 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull second vfs pile from Al Viro:
 "The stuff in there: fsfreeze deadlock fixes by Jan (essentially, the
  deadlock reproduced by xfstests 068), symlink and hardlink restriction
  patches, plus assorted cleanups and fixes.

  Note that another fsfreeze deadlock (emergency thaw one) is *not*
  dealt with - the series by Fernando conflicts a lot with Jan's, breaks
  userland ABI (FIFREEZE semantics gets changed) and trades the deadlock
  for massive vfsmount leak; this is going to be handled next cycle.
  There probably will be another pull request, but that stuff won't be
  in it."

Fix up trivial conflicts due to unrelated changes next to each other in
drivers/{staging/gdm72xx/usb_boot.c, usb/gadget/storage_common.c}

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (54 commits)
  delousing target_core_file a bit
  Documentation: Correct s_umount state for freeze_fs/unfreeze_fs
  fs: Remove old freezing mechanism
  ext2: Implement freezing
  btrfs: Convert to new freezing mechanism
  nilfs2: Convert to new freezing mechanism
  ntfs: Convert to new freezing mechanism
  fuse: Convert to new freezing mechanism
  gfs2: Convert to new freezing mechanism
  ocfs2: Convert to new freezing mechanism
  xfs: Convert to new freezing code
  ext4: Convert to new freezing mechanism
  fs: Protect write paths by sb_start_write - sb_end_write
  fs: Skip atime update on frozen filesystem
  fs: Add freezing handling to mnt_want_write() / mnt_drop_write()
  fs: Improve filesystem freezing handling
  switch the protection of percpu_counter list to spinlock
  nfsd: Push mnt_want_write() outside of i_mutex
  btrfs: Push mnt_want_write() outside of i_mutex
  fat: Push mnt_want_write() outside of i_mutex
  ...
2012-08-01 10:26:23 -07:00
Linus Torvalds cc8362b1f6 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
Pull Ceph changes from Sage Weil:
 "Lots of stuff this time around:

   - lots of cleanup and refactoring in the libceph messenger code, and
     many hard to hit races and bugs closed as a result.
   - lots of cleanup and refactoring in the rbd code from Alex Elder,
     mostly in preparation for the layering functionality that will be
     coming in 3.7.
   - some misc rbd cleanups from Josh Durgin that are finally going
     upstream
   - support for CRUSH tunables (used by newer clusters to improve the
     data placement)
   - some cleanup in our use of d_parent that Al brought up a while back
   - a random collection of fixes across the tree

  There is another patch coming that fixes up our ->atomic_open()
  behavior, but I'm going to hammer on it a bit more before sending it."

Fix up conflicts due to commits that were already committed earlier in
drivers/block/rbd.c, net/ceph/{messenger.c, osd_client.c}

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (132 commits)
  rbd: create rbd_refresh_helper()
  rbd: return obj version in __rbd_refresh_header()
  rbd: fixes in rbd_header_from_disk()
  rbd: always pass ops array to rbd_req_sync_op()
  rbd: pass null version pointer in add_snap()
  rbd: make rbd_create_rw_ops() return a pointer
  rbd: have __rbd_add_snap_dev() return a pointer
  libceph: recheck con state after allocating incoming message
  libceph: change ceph_con_in_msg_alloc convention to be less weird
  libceph: avoid dropping con mutex before fault
  libceph: verify state after retaking con lock after dispatch
  libceph: revoke mon_client messages on session restart
  libceph: fix handling of immediate socket connect failure
  ceph: update MAINTAINERS file
  libceph: be less chatty about stray replies
  libceph: clear all flags on con_close
  libceph: clean up con flags
  libceph: replace connection state bits with states
  libceph: drop unnecessary CLOSED check in socket state change callback
  libceph: close socket directly from ceph_con_close()
  ...
2012-07-31 14:35:28 -07:00
Alex Elder aa711ee340 ceph: define snap counts as u32 everywhere
There are two structures in which a count of snapshots are
maintained:

    struct ceph_snap_context {
	...
        u32 num_snaps;
	...
    }
and
    struct ceph_snap_realm {
	...
        u32 num_prior_parent_snaps;   /*  had prior to parent_since */
	...
        u32 num_snaps;
	...
    }

These fields never take on negative values (e.g., to hold special
meaning), and so are really inherently unsigned.  Furthermore they
take their value from over-the-wire or on-disk formatted 32-bit
values.

So change their definition to have type u32, and change some spots
elsewhere in the code to account for this change.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2012-07-30 18:15:47 -07:00
Alan Cox 21ec6ffa46 ceph: fix potential double free
We re-run the loop but we don't re-set the attrs pointer back to NULL.

Signed-off-by: Alan Cox <alan@linux.intel.com>
Reviewed-by: Alex Elder <elder@inktank.com>
2012-07-30 18:15:35 -07:00
Sage Weil a53aab645c ceph: close old con before reopening on mds reconnect
When we detect a mds session reset, close the old ceph_connection before
reopening it.  This ensures we clean up the old socket properly and keep
the ceph_connection state correct.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Alex Elder <elder@inktank.com>
Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>
2012-07-30 18:15:32 -07:00
Sage Weil 1fe60e51a3 libceph: move feature bits to separate header
This is simply cleanup that will keep things more closely synced with the
userland code.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Alex Elder <elder@inktank.com>
Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>
2012-07-30 16:23:22 -07:00
Jan Kara 3ca9c3bd8a ceph: Push file_update_time() into ceph_page_mkwrite()
CC: Sage Weil <sage@newdream.net>
CC: ceph-devel@vger.kernel.org
Acked-by: Sage Weil <sage@newdream.net>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-31 01:02:45 +04:00
Sage Weil 8842b3be96 ceph: clean up useless d_parent checks
d_parent is never NULL, and IS_ROOT() is the proper way to check for a
(non-self-referential) parent.

Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Sage Weil <sage@inktank.com>
2012-07-30 09:29:54 -07:00
David Howells 9249e17fe0 VFS: Pass mount flags to sget()
Pass mount flags to sget() so that it can use them in initialising a new
superblock before the set function is called.  They could also be passed to the
compare function.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:38:34 +04:00
Al Viro ebfc3b49a7 don't pass nameidata to ->create()
boolean "does it have to be exclusive?" flag is passed instead;
Local filesystem should just ignore it - the object is guaranteed
not to be there yet.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:34:47 +04:00
Al Viro 00cd8dd3bf stop passing nameidata to ->lookup()
Just the flags; only NFS cares even about that, but there are
legitimate uses for such argument.  And getting rid of that
completely would require splitting ->lookup() into a couple
of methods (at least), so let's leave that alone for now...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:34:32 +04:00
Al Viro 0b728e1911 stop passing nameidata * to ->d_revalidate()
Just the lookup flags.  Die, bastard, die...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:34:14 +04:00
Al Viro e45198a6ac make finish_no_open() return int
namely, 1 ;-)  That's what we want to return from ->atomic_open()
instances after finish_no_open().

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:33:45 +04:00
Al Viro 30d9049474 kill struct opendata
Just pass struct file *.  Methods are happier that way...
There's no need to return struct file * from finish_open() now,
so let it return int.  Next: saner prototypes for parts in
namei.c

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:33:39 +04:00
Al Viro d95852777b make ->atomic_open() return int
Change of calling conventions:
old		new
NULL		1
file		0
ERR_PTR(-ve)	-ve

Caller *knows* that struct file *; no need to return it.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:33:35 +04:00
Al Viro 47237687d7 ->atomic_open() prototype change - pass int * instead of bool *
... and let finish_open() report having opened the file via that sucker.
Next step: don't modify od->filp at all.

[AV: FILE_CREATE was already used by cifs; Miklos' fix folded]

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:33:31 +04:00
Miklos Szeredi 2d83bde9a1 ceph: implement i_op->atomic_open()
Add an ->atomic_open implementation which replaces the atomic lookup+open+create
operation implemented via ->lookup and ->create operations.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
CC: Sage Weil <sage@newdream.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:33:16 +04:00
Miklos Szeredi 3819219b59 ceph: remove unused arg from ceph_lookup_open()
What was the purpose of this?

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
CC: Sage Weil <sage@newdream.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:33:16 +04:00
Sage Weil b7a9e5dd40 libceph: set peer name on con_open, not init
The peer name may change on each open attempt, even when the connection is
reused.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-07-05 21:14:35 -07:00
Yan, Zheng 61600ef848 ceph: check PG_Private flag before accessing page->private
I got lots of NULL pointer dereference Oops when compiling kernel on ceph.
The bug is because the kernel page migration routine replaces some pages
in the page cache with new pages, these new pages' private can be non-zero.

Signed-off-by: Zheng Yan <zheng.z.yan@intel.com>
Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 28c0254ede)
2012-06-20 07:43:48 -05:00
Sage Weil 9a64e8e0ac Linux 3.5-rc1
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.18 (GNU/Linux)
 
 iQEcBAABAgAGBQJPyr4LAAoJEHm+PkMAQRiGhvMH/1uXaDmJiiyAtMhC9kQbLclK
 5RpUOV+ukRrPXBJhwWGEZvC9G/DiWAfZ/19Ee6qTGZbA46yxkgZklqO+bw7fuOLH
 dPf4MNXdhgOgbs0KkVAk6aXIYzIU836pcYg+LcapG8E8SZp3SWbJzrVbUPFwPM+m
 Sv11ZcpJfM2HH9wFRdKErUOiZHsMY+LZHcw0nx+BObytjgzBbzHNkpF57F714TUO
 QplYpIToO3XtGhIM1yRDxww+2zFlVNsCZ8IC57EDbLb8BMZWuyZoFgWZqLAnrU0u
 vy7CHLledMSvs855juJ9JxGo/EDnfwJpCnjmcp8BY+h4b5T/k5mGK6d9aeXYRf4=
 =CcWn
 -----END PGP SIGNATURE-----

Merge tag 'v3.5-rc1'

Linux 3.5-rc1

Conflicts:
	net/ceph/messenger.c
2012-06-15 12:32:04 -07:00
Alex Elder 1bfd89f4e6 libceph: fully initialize connection in con_init()
Move the initialization of a ceph connection's private pointer,
operations vector pointer, and peer name information into
ceph_con_init().  Rearrange the arguments so the connection pointer
is first.  Hide the byte-swapping of the peer entity number inside
ceph_con_init()

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2012-06-06 09:23:54 -05:00
Linus Torvalds 1193755ac6 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs changes from Al Viro.
 "A lot of misc stuff.  The obvious groups:
   * Miklos' atomic_open series; kills the damn abuse of
     ->d_revalidate() by NFS, which was the major stumbling block for
     all work in that area.
   * ripping security_file_mmap() and dealing with deadlocks in the
     area; sanitizing the neighborhood of vm_mmap()/vm_munmap() in
     general.
   * ->encode_fh() switched to saner API; insane fake dentry in
     mm/cleancache.c gone.
   * assorted annotations in fs (endianness, __user)
   * parts of Artem's ->s_dirty work (jff2 and reiserfs parts)
   * ->update_time() work from Josef.
   * other bits and pieces all over the place.

  Normally it would've been in two or three pull requests, but
  signal.git stuff had eaten a lot of time during this cycle ;-/"

Fix up trivial conflicts in Documentation/filesystems/vfs.txt (the
'truncate_range' inode method was removed by the VM changes, the VFS
update adds an 'update_time()' method), and in fs/btrfs/ulist.[ch] (due
to sparse fix added twice, with other changes nearby).

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (95 commits)
  nfs: don't open in ->d_revalidate
  vfs: retry last component if opening stale dentry
  vfs: nameidata_to_filp(): don't throw away file on error
  vfs: nameidata_to_filp(): inline __dentry_open()
  vfs: do_dentry_open(): don't put filp
  vfs: split __dentry_open()
  vfs: do_last() common post lookup
  vfs: do_last(): add audit_inode before open
  vfs: do_last(): only return EISDIR for O_CREAT
  vfs: do_last(): check LOOKUP_DIRECTORY
  vfs: do_last(): make ENOENT exit RCU safe
  vfs: make follow_link check RCU safe
  vfs: do_last(): use inode variable
  vfs: do_last(): inline walk_component()
  vfs: do_last(): make exit RCU safe
  vfs: split do_lookup()
  Btrfs: move over to use ->update_time
  fs: introduce inode operation ->update_time
  reiserfs: get rid of resierfs_sync_super
  reiserfs: mark the superblock as dirty a bit later
  ...
2012-06-01 10:34:35 -07:00
Alex Elder 15d9882c33 libceph: embed ceph messenger structure in ceph_client
A ceph client has a pointer to a ceph messenger structure in it.
There is always exactly one ceph messenger for a ceph client, so
there is no need to allocate it separate from the ceph client
structure.

Switch the ceph_client structure to embed its ceph_messenger
structure.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2012-06-01 08:37:56 -05:00
Yan, Zheng 28c0254ede ceph: check PG_Private flag before accessing page->private
I got lots of NULL pointer dereference Oops when compiling kernel on ceph.
The bug is because the kernel page migration routine replaces some pages
in the page cache with new pages, these new pages' private can be non-zero.

Signed-off-by: Zheng Yan <zheng.z.yan@intel.com>
Signed-off-by: Sage Weil <sage@inktank.com>
2012-06-01 08:06:29 -05:00
Linus Torvalds 08615d7d85 Merge branch 'akpm' (Andrew's patch-bomb)
Merge misc patches from Andrew Morton:

 - the "misc" tree - stuff from all over the map

 - checkpatch updates

 - fatfs

 - kmod changes

 - procfs

 - cpumask

 - UML

 - kexec

 - mqueue

 - rapidio

 - pidns

 - some checkpoint-restore feature work.  Reluctantly.  Most of it
   delayed a release.  I'm still rather worried that we don't have a
   clear roadmap to completion for this work.

* emailed from Andrew Morton <akpm@linux-foundation.org>: (78 patches)
  kconfig: update compression algorithm info
  c/r: prctl: add ability to set new mm_struct::exe_file
  c/r: prctl: extend PR_SET_MM to set up more mm_struct entries
  c/r: procfs: add arg_start/end, env_start/end and exit_code members to /proc/$pid/stat
  syscalls, x86: add __NR_kcmp syscall
  fs, proc: introduce /proc/<pid>/task/<tid>/children entry
  sysctl: make kernel.ns_last_pid control dependent on CHECKPOINT_RESTORE
  aio/vfs: cleanup of rw_copy_check_uvector() and compat_rw_copy_check_uvector()
  eventfd: change int to __u64 in eventfd_signal()
  fs/nls: add Apple NLS
  pidns: make killed children autoreap
  pidns: use task_active_pid_ns in do_notify_parent
  rapidio/tsi721: add DMA engine support
  rapidio: add DMA engine support for RIO data transfers
  ipc/mqueue: add rbtree node caching support
  tools/selftests: add mq_perf_tests
  ipc/mqueue: strengthen checks on mqueue creation
  ipc/mqueue: correct mq_attr_ok test
  ipc/mqueue: improve performance of send/recv
  selftests: add mq_open_tests
  ...
2012-05-31 18:10:18 -07:00
Xi Wang a3860c1c5d introduce SIZE_MAX
ULONG_MAX is often used to check for integer overflow when calculating
allocation size.  While ULONG_MAX happens to work on most systems, there
is no guarantee that `size_t' must be the same size as `long'.

This patch introduces SIZE_MAX, the maximum value of `size_t', to improve
portability and readability for allocation size validation.

Signed-off-by: Xi Wang <xi.wang@gmail.com>
Acked-by: Alex Elder <elder@dreamhost.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Pekka Enberg <penberg@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-31 17:49:26 -07:00
Linus Torvalds af56e0aa35 Merge git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
Pull ceph updates from Sage Weil:
 "There are some updates and cleanups to the CRUSH placement code, a bug
  fix with incremental maps, several cleanups and fixes from Josh Durgin
  in the RBD block device code, a series of cleanups and bug fixes from
  Alex Elder in the messenger code, and some miscellaneous bounds
  checking and gfp cleanups/fixes."

Fix up trivial conflicts in net/ceph/{messenger.c,osdmap.c} due to the
networking people preferring "unsigned int" over just "unsigned".

* git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (45 commits)
  libceph: fix pg_temp updates
  libceph: avoid unregistering osd request when not registered
  ceph: add auth buf in prepare_write_connect()
  ceph: rename prepare_connect_authorizer()
  ceph: return pointer from prepare_connect_authorizer()
  ceph: use info returned by get_authorizer
  ceph: have get_authorizer methods return pointers
  ceph: ensure auth ops are defined before use
  ceph: messenger: reduce args to create_authorizer
  ceph: define ceph_auth_handshake type
  ceph: messenger: check return from get_authorizer
  ceph: messenger: rework prepare_connect_authorizer()
  ceph: messenger: check prepare_write_connect() result
  ceph: don't set WRITE_PENDING too early
  ceph: drop msgr argument from prepare_write_connect()
  ceph: messenger: send banner in process_connect()
  ceph: messenger: reset connection kvec caller
  libceph: don't reset kvec in prepare_write_banner()
  ceph: ignore preferred_osd field
  ceph: fully initialize new layout
  ...
2012-05-30 11:17:19 -07:00
Sage Weil c862868bb4 ceph: move encode_fh to new API
Use parent_inode has a flag for whether nfsd wants a connectable fh, but
generate one opportunistically so that we can take advantage of the
additional info in there.

Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-05-29 23:28:33 -04:00
Al Viro b0b0382bb4 ->encode_fh() API change
pass inode + parent's inode or NULL instead of dentry + bool saying
whether we want the parent or not.

NOTE: that needs ceph fix folded in.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-05-29 23:28:33 -04:00
Alex Elder 8f43fb5389 ceph: use info returned by get_authorizer
Rather than passing a bunch of arguments to be filled in with the
content of the ceph_auth_handshake buffer now returned by the
get_authorizer method, just use the returned information in the
caller, and drop the unnecessary arguments.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2012-05-17 08:18:13 -05:00
Alex Elder a3530df33e ceph: have get_authorizer methods return pointers
Have the get_authorizer auth_client method return a ceph_auth
pointer rather than an integer, pointer-encoding any returned
error value.  This is to pave the way for making use of the
returned value in an upcoming patch.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2012-05-17 08:18:13 -05:00
Alex Elder a255651d4c ceph: ensure auth ops are defined before use
In the create_authorizer method for both the mds and osd clients,
the auth_client->ops pointer is blindly dereferenced.  There is no
obvious guarantee that this pointer has been assigned.  And
furthermore, even if the ops pointer is non-null there is definitely
no guarantee that the create_authorizer or destroy_authorizer
methods are defined.

Add checks in both routines to make sure they are defined (non-null)
before use.  Add similar checks in a few other spots in these files
while we're at it.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2012-05-17 08:18:13 -05:00