linux

Commit Graph

Author	SHA1	Message	Date
Vlad Apostolov	c319b58b13	[XFS] Make xfs_bulkstat() to report unlinked but referenced inodes We need xfs_bulkstat() to report inode stat for inodes with link count zero but reference count non zero. The fix here: http://oss.sgi.com/archives/xfs/2007-09/msg00266.html changed this behavior and made xfs_bulkstat() to filter all unlinked inodes including those that are not destroyed yet but held by reference. The attached patch returns back to the original behavior by marking the on-disk inode buffer "dirty" when di_mode is cleared (at that time both inode link and reference counter are zero). SGI-PV: 972004 SGI-Modid: xfs-linux-melb:xfs-kern:29914a Signed-off-by: Vlad Apostolov <vapo@sgi.com> Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-02-07 18:13:37 +11:00
Lachlan McIlroy	98ce2b5b1b	[XFS] 971186 Undo mod xfs-linux-melb:xfs-kern:29845a due to a regression SGI-PV: 971596 SGI-Modid: xfs-linux-melb:xfs-kern:29902a Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-02-07 18:13:27 +11:00
Eric Sandeen	bc58f9bb6b	[XFS] fix 32-bit compat ioctls for GETXFLAGS, SETXFLAGS, GETVERSION XFS_IOC_GETVERSION, XFS_IOC_GETXFLAGS and XFS_IOC_SETXFLAGS all take a "long" which changes size between 32 and 64 bit platforms. So, the ioctl cmds that come in from a 32-bit app aren't as expected, for example on GETXFLAGS, unknown cmd fd(3) cmd(80046601){t:'f';sz:4} due to the size mismatch. So, use instead the 32-bit version of the commands for compat ioctls, and other than that it doesn't take any more manipulation. Also, for both native and compat versions, just define them to the values as defined in fs.h SGI-PV: 971186 SGI-Modid: xfs-linux-melb:xfs-kern:29849a Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2008-02-07 18:13:17 +11:00
Eric Sandeen	d4f3cc016f	[XFS] lose xfs_hex_dump in favor of print_hex_dump No need for xfs to have its own hex dumping routine now that the kernel has one. SGI-PV: 971186 SGI-Modid: xfs-linux-melb:xfs-kern:29847a Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2008-02-07 18:13:05 +11:00
Christoph Hellwig	91906a882a	[XFS] kill XFS_INOBT_IS_FREE_DISK This macro is unused an all other acros in this family operate on native types, so we most likely won't grow a user either. SGI-PV: 971186 SGI-Modid: xfs-linux-melb:xfs-kern:29846a Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2008-02-07 18:12:41 +11:00
Christoph Hellwig	c40ea74101	[XFS] kill superflous buffer locking There is no need to lock any page in xfs_buf.c because we operate on our own address_space and all locking is covered by the buffer semaphore. If we ever switch back to main blockdeive address_space as suggested e.g. for fsblock with a similar scheme the locking will have to be totally revised anyway because the current scheme is neither correct nor coherent with itself. SGI-PV: 971186 SGI-Modid: xfs-linux-melb:xfs-kern:29845a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2008-02-07 18:12:07 +11:00
Eric Sandeen	0771fb4515	[XFS] Refactor xfs_mountfs Refactoring xfs_mountfs() to call sub-functions for logical chunks can help save a bit of stack, and can make it easier to read this long function. The mount path is one of the longest common callchains, easily getting to within a few bytes of the end of a 4k stack when over lvm, quotas are enabled, and quotacheck must be done. With this change on top of the other stack-related changes I've sent, I can get xfs to survive a normal xfsqa run on 4k stacks over lvm. SGI-PV: 971186 SGI-Modid: xfs-linux-melb:xfs-kern:29834a Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: Donald Douwsma <donaldd@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2008-02-07 18:11:56 +11:00
Christoph Hellwig	b53e675dc8	[XFS] xlog_rec_header/xlog_rec_ext_header endianess annotations Mostly trivial conversion with one exceptions: h_num_logops was kept in native endian previously and only converted to big endian in xlog_sync, but we always keep it big endian now. With todays cpus fast byteswap instructions that's not an issue but the new variant keeps the code clean and maintainable. SGI-PV: 971186 SGI-Modid: xfs-linux-melb:xfs-kern:29821a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2008-02-07 18:11:47 +11:00
Christoph Hellwig	67fcb7bfb6	[XFS] clean up some xfs_log_priv.h macros - the various assign lsn macros are replaced by a single inline, xlog_assign_lsn, which is equivalent to ASSIGN_ANY_LSN_HOST except for a more sane calling convention. ASSIGN_LSN_DISK is replaced by xlog_assign_lsn and a manual bytespap, and ASSIGN_LSN by the same, except we pass the cycle and block arguments explicitly instead of a log paramter. The latter two variants only had 2, respectively one user anyway. - the GET_CYCLE is replaced by a xlog_get_cycle inline with exactly the same calling conventions. - GET_CLIENT_ID is replaced by xlog_get_client_id which leaves away the unused arch argument. Instead of conditional defintions depending on host endianess we now do an unconditional swap and shift then, which generates equal code. - the unused XLOG_SET macro is removed. SGI-PV: 971186 SGI-Modid: xfs-linux-melb:xfs-kern:29820a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2008-02-07 18:11:38 +11:00
Christoph Hellwig	03bea6fe6c	[XFS] clean up some xfs_log_priv.h macros - the various assign lsn macros are replaced by a single inline, xlog_assign_lsn, which is equivalent to ASSIGN_ANY_LSN_HOST except for a more sane calling convention. ASSIGN_LSN_DISK is replaced by xlog_assign_lsn and a manual bytespap, and ASSIGN_LSN by the same, except we pass the cycle and block arguments explicitly instead of a log paramter. The latter two variants only had 2, respectively one user anyway. - the GET_CYCLE is replaced by a xlog_get_cycle inline with exactly the same calling conventions. - GET_CLIENT_ID is replaced by xlog_get_client_id which leaves away the unused arch argument. Instead of conditional defintions depending on host endianess we now do an unconditional swap and shift then, which generates equal code. - the unused XLOG_SET macro is removed. SGI-PV: 971186 SGI-Modid: xfs-linux-melb:xfs-kern:29819a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2008-02-07 18:10:31 +11:00
Christoph Hellwig	9909c4aa1a	[XFS] kill xfs_freeze. No need to have a wrapper just two call two more functions. SGI-PV: 971186 SGI-Modid: xfs-linux-melb:xfs-kern:29816a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Donald Douwsma <donaldd@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2008-02-07 18:09:56 +11:00
Christoph Hellwig	10090be25c	[XFS] cleanup vnode useage in xfs_iget.c Get rid of vnode useage in xfs_iget.c and pass Linux inode / xfs_inode where apropinquate. And kill some useless helpers while we're at it. SGI-PV: 971186 SGI-Modid: xfs-linux-melb:xfs-kern:29808a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2008-02-07 16:55:46 +11:00
Christoph Hellwig	6e7f75eafb	[XFS] cleanup vnode useage in xfs_ioctl.c xfs_ioctl.c passes around vnode pointers quite a lot, but all places already have the Linux inode which is identical to the vnode these days. Clean the code up to always use the Linux inode. SGI-PV: 971186 SGI-Modid: xfs-linux-melb:xfs-kern:29807a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2008-02-07 16:55:35 +11:00
Christoph Hellwig	4ca488eb45	[XFS] Kill off xfs_statvfs. We were already filling the Linux struct statfs anyway, and doing this trivial task directly in xfs_fs_statfs makes the code quite a bit cleaner. While I was at it I also moved copying attributes that don't change over the lifetime of the filesystem outside the superblock lock. xfs_fs_fill_super used to get the magic number and blocksize through xfs_statvfs, but assigning them directly is a lot cleaner and will save some stack space during mount. SGI-PV: 971186 SGI-Modid: xfs-linux-melb:xfs-kern:29802a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Tim Shimmin <tes@sgi.com>	2008-02-07 16:53:27 +11:00
Christoph Hellwig	c43f408795	[XFS] simplify xfs_vn_getattr Just fill in struct kstat directly from the xfs_inode instead of doing a detour through a bhv_vattr_t and xfs_getattr. SGI-PV: 970980 SGI-Modid: xfs-linux-melb:xfs-kern:29770a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2008-02-07 16:49:06 +11:00
Christoph Hellwig	613d70436c	[XFS] kill xfs_iocore_t xfs_iocore_t is a structure embedded in xfs_inode. Except for one field it just duplicates fields already in xfs_inode, and there is nothing this abstraction buys us on XFS/Linux. This patch removes it and shrinks source and binary size of xfs aswell as shrinking the size of xfs_inode by 60/44 bytes in debug/non-debug builds. SGI-PV: 970852 SGI-Modid: xfs-linux-melb:xfs-kern:29754a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2008-02-07 16:48:58 +11:00
Eric Sandeen	007c61c686	[XFS] Remove spin.h remove spinlock init abstraction macro in spin.h, remove the callers, and remove the file. Move no-op spinlock_destroy to xfs_linux.h Cleanup spinlock locals in xfs_mount.c SGI-PV: 970382 SGI-Modid: xfs-linux-melb:xfs-kern:29751a Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: Donald Douwsma <donaldd@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2008-02-07 16:47:45 +11:00
Eric Sandeen	36e41eebda	[XFS] Cleanup lock goop. Switch last couple lock_t's to spinlock_t's. Remove now-unused spinlock-related macros & types. SGI-PV: 970382 SGI-Modid: xfs-linux-melb:xfs-kern:29748a Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: Donald Douwsma <donaldd@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2008-02-07 16:47:35 +11:00
Eric Sandeen	3a0e487034	[XFS] ktrace kt_lock is unused, remove it. SGI-PV: 970382 SGI-Modid: xfs-linux-melb:xfs-kern:29747a Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: Donald Douwsma <donaldd@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2008-02-07 16:47:25 +11:00
Eric Sandeen	3685c2a1d7	[XFS] Unwrap XFS_SB_LOCK. Un-obfuscate XFS_SB_LOCK, remove XFS_SB_LOCK->mutex_lock->spin_lock macros, call spin_lock directly, remove extraneous cookie holdover from old xfs code, and change lock type to spinlock_t. SGI-PV: 970382 SGI-Modid: xfs-linux-melb:xfs-kern:29746a Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: Donald Douwsma <donaldd@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2008-02-07 16:47:15 +11:00
Eric Sandeen	ba74d0cba5	[XFS] Unwrap mru_lock. Un-obfuscate mru_lock, remove mutex_lock->spin_lock macros, call spin_lock directly, remove extraneous cookie holdover from old xfs code. SGI-PV: 970382 SGI-Modid: xfs-linux-melb:xfs-kern:29745a Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: Donald Douwsma <donaldd@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2008-02-07 16:47:01 +11:00
Eric Sandeen	703e1f0fd2	[XFS] Unwrap xfs_dabuf_global_lock Un-obfuscate dabuf_global_lock, remove mutex_lock->spin_lock macros, call spin_lock directly, remove extraneous cookie holdover from old xfs code, and change lock type to spinlock_t. SGI-PV: 970382 SGI-Modid: xfs-linux-melb:xfs-kern:29744a Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: Donald Douwsma <donaldd@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2008-02-07 16:46:48 +11:00
Eric Sandeen	64137e56d7	[XFS] Unwrap pagb_lock. Un-obfuscate pagb_lock, remove mutex_lock->spin_lock macros, call spin_lock directly, remove extraneous cookie holdover from old xfs code, and change lock type to spinlock_t. SGI-PV: 970382 SGI-Modid: xfs-linux-melb:xfs-kern:29743a Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: Donald Douwsma <donaldd@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2008-02-07 16:46:39 +11:00
Eric Sandeen	869b906078	[XFS] Unwrap XFS_DQ_PINUNLOCK. Un-obfuscate DQ_PINLOCK, remove DQ_PINLOCK->mutex_lock->spin_lock macros, call spin_lock directly, remove extraneous cookie holdover from old xfs code, and change lock type to spinlock_t. SGI-PV: 970382 SGI-Modid: xfs-linux-melb:xfs-kern:29742a Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: Donald Douwsma <donaldd@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2008-02-07 16:44:50 +11:00
Eric Sandeen	c8b5ea289f	[XFS] Unwrap GRANT_LOCK. Un-obfuscate GRANT_LOCK, remove GRANT_LOCK->mutex_lock->spin_lock macros, call spin_lock directly, remove extraneous cookie holdover from old xfs code, and change lock type to spinlock_t. SGI-PV: 970382 SGI-Modid: xfs-linux-melb:xfs-kern:29741a Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: Donald Douwsma <donaldd@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2008-02-07 16:44:41 +11:00
Eric Sandeen	b22cd72c95	[XFS] Unwrap LOG_LOCK. Un-obfuscate LOG_LOCK, remove LOG_LOCK->mutex_lock->spin_lock macros, call spin_lock directly, remove extraneous cookie holdover from old xfs code, and change lock type to spinlock_t. SGI-PV: 970382 SGI-Modid: xfs-linux-melb:xfs-kern:29740a Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: Donald Douwsma <donaldd@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2008-02-07 16:44:32 +11:00
Donald Douwsma	287f3dad14	[XFS] Unwrap AIL_LOCK SGI-PV: 970382 SGI-Modid: xfs-linux-melb:xfs-kern:29739a Signed-off-by: Donald Douwsma <donaldd@sgi.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: Tim Shimmin <tes@sgi.com>	2008-02-07 16:44:23 +11:00
Lachlan McIlroy	541d7d3c4b	[XFS] kill unnessecary ioops indirection Currently there is an indirection called ioops in the XFS data I/O path. Various functions are called by functions pointers, but there is no coherence in what this is for, and of course for XFS itself it's entirely unused. This patch removes it instead and significantly reduces source and binary size of XFS while making maintaince easier. SGI-PV: 970841 SGI-Modid: xfs-linux-melb:xfs-kern:29737a Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Tim Shimmin <tes@sgi.com>	2008-02-07 16:44:14 +11:00
Christoph Hellwig	21a62542b6	[XFS] simplify vn_revalidate No need to allocate a bhv_vattr_t on stack and call xfs_getattr to update a few fields in the Linux inode from the XFS inode, just do it directly. And yes, this function is in dire need of a better name and prototype, I'll do in a separate patch, though. SGI-PV: 970705 SGI-Modid: xfs-linux-melb:xfs-kern:29713a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2008-02-07 16:44:04 +11:00
Lachlan McIlroy	15947f2d4f	[XFS] more vnode/inode tracing fixes SGI-PV: 970335 SGI-Modid: xfs-linux-melb:xfs-kern:29697a Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: Tim Shimmin <tes@sgi.com>	2008-02-07 16:43:54 +11:00
Christoph Hellwig	7642861b7e	[XFS] kill BMAPI_UNWRITTEN There is no reason to go through xfs_iomap for the BMAPI_UNWRITTEN because it has nothing in common with the other cases. Instead check for the shutdown filesystem in xfs_end_bio_unwritten and perform a direct call to xfs_iomap_write_unwritten (which should be renamed to something more sensible one day) SGI-PV: 970241 SGI-Modid: xfs-linux-melb:xfs-kern:29681a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Donald Douwsma <donaldd@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2008-02-07 16:43:44 +11:00
Christoph Hellwig	6214ed4461	[XFS] kill BMAPI_DEVICE There is no reason to go into the iomap machinery just to get the right block device for an inode. Instead look at the realtime flag in the inode and grab the right device from the mount structure. I created a new helper, xfs_find_bdev_for_inode instead of opencoding it because I plan to use it in other places in the future. SGI-PV: 970240 SGI-Modid: xfs-linux-melb:xfs-kern:29680a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Donald Douwsma <donaldd@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2008-02-07 16:43:35 +11:00
Lachlan McIlroy	cf441eeb79	[XFS] clean up vnode/inode tracing Simplify vnode tracing calls by embedding function name & return addr in the calling macro. Also do a lot of vnode->inode renaming for consistency, while we're at it. SGI-PV: 970335 SGI-Modid: xfs-linux-melb:xfs-kern:29650a Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2008-02-07 16:42:19 +11:00
Lachlan McIlroy	44866d3928	[XFS] remove dead SYNC_BDFLUSH case in xfs_sync_inodes A large part of xfs_sync_inodes is conditional on the SYNC_BDFLUSH which is never passed to it. This patch removes it and adds an assert that triggers in case some new code tries to pass SYNC_BDFLUSH to it. SGI-PV: 970242 SGI-Modid: xfs-linux-melb:xfs-kern:29630a Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Tim Shimmin <tes@sgi.com>	2008-02-07 16:40:53 +11:00
Eric Van Hensbergen	8a0dc95fd9	9p: transport API reorganization This merges the mux.c (including the connection interface) with trans_fd in preparation for transport API changes. Ultimately, trans_fd will need to be rewritten to clean it up and simplify the implementation, but this reorganization is viewed as the first step. Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>	2008-02-06 19:25:03 -06:00
Eric Van Hensbergen	14b8869ff4	9p: fix mmap to be read-only v9fs was allowing writable mmap which could lead to kernel BUG() cases. This sets the mmap function to generic_file_readonly_mmap which (correctly) returns an error to applications which open mmap for writing. Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>	2008-02-06 19:25:05 -06:00
Anthony Liguori	d199d652c5	9p: add support for sticky bit GDM gets unhappy if /var/gdm doesn't have the sticky bit set. This patch adds support for the sticky bit in much the same way setuid/setgid is supported. With this patch, I can launch X from a v9fs rootfs (although I quickly run out of fds in the server once gnome starts up). Signed-off-by: Anthony Liguori <aliguori@us.ibm.com> Acked-by: Eric Van Hensbergen <ericvh@gmail.com>	2008-02-06 19:25:06 -06:00
Eric Van Hensbergen	c55703d807	9p: fix bug in attach-per-user When a new user attached at a directory other than the root, he would end up in the parent directory of the cwd. This was due to a logic error in the code which attaches the user at the mount point and walks back to the cwd. This patch fixes that. Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>	2008-02-06 19:25:08 -06:00
Joel Becker	d24fbcda0c	ocfs2: Negotiate locking protocol versions. Currently, when ocfs2 nodes connect via TCP, they advertise their compatibility level. If the versions do not match, two nodes cannot speak to each other and they disconnect. As a result, this provides no forward or backwards compatibility. This patch implements a simple protocol negotiation at the dlm level by introducing a major/minor version number scheme for entities that communicate. Specifically, o2dlm has a major/minor version for interaction with o2dlm on other nodes, and ocfs2 itself has a major/minor version for interacting with the filesystem on other nodes. This will allow rolling upgrades of ocfs2 clusters when changes to the locking or network protocols can be done in a backwards compatible manner. In those cases, only the minor number is changed and the negotatied protocol minor is returned from dlm join. In the far less likely event that a required protocol change makes backwards compatibility impossible, we simply bump the major number. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2008-02-06 16:11:29 -08:00
Linus Torvalds	3e6bdf473f	Merge git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86 * git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86: x86: fix deadlock, make pgd_lock irq-safe virtio: fix trivial build bug x86: fix mttr trimming x86: delay CPA self-test and repeat it x86: fix 64-bit sections generic: add __FINITDATA x86: remove suprious ifdefs from pageattr.c x86: mark the .rodata section also NX x86: fix iret exception recovery on 64-bit cpuidle: dubious one-bit signed bitfield in cpuidle.h x86: fix sparse warnings in powernow-k8.c x86: fix sparse error in traps_32.c x86: trivial sparse/checkpatch in quirks.c x86 ptrace: disallow null cs/ss MAINTAINERS: RDC R-321x SoC maintainer brk randomization: introduce CONFIG_COMPAT_BRK brk: check the lower bound properly x86: remove X2 workaround x86: make spurious fault handler aware of large mappings x86: make traps on entry code be debuggable in user space, 64-bit	2008-02-06 13:54:09 -08:00
Ingo Molnar	32a932332c	brk randomization: introduce CONFIG_COMPAT_BRK based on similar patch from: Pavel Machek <pavel@ucw.cz> Introduce CONFIG_COMPAT_BRK. If disabled then the kernel is free (but not obliged to) randomize the brk area. Heap randomization breaks ancient binaries, so we keep COMPAT_BRK enabled by default. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-02-06 22:39:44 +01:00
Jan Kara	bd1939de90	ext3: fix lock inversion in direct IO We cannot start transaction in ext3_direct_IO() and just let it last during the whole write because dio_get_page() acquires mmap_sem which ranks above transaction start (e.g. because we have dependency chain mmap_sem->PageLock->journal_start, or because we update atime while holding mmap_sem) and thus deadlocks could happen. We solve the problem by starting a transaction separately for each ext3_get_block() call. We could have a problem that we allocate a block and before its data are written out the machine crashes and thus we expose stale data. But that does not happen because for hole-filling generic code falls back to buffered writes and for file extension, we add inode to orphan list and thus in case of crash, journal replay will truncate inode back to the original size. [akpm@linux-foundation.org: build fix] Signed-off-by: Jan Kara <jack@suse.cz> Cc: <linux-ext4@vger.kernel.org> Cc: Zach Brown <zach.brown@oracle.com> Cc: Badari Pulavarty <pbadari@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-06 10:41:21 -08:00
Mariusz Kozlowski	e1d7ae24a2	ext3: remove unused code from ext3_find_entry() Signed-off-by: Mariusz Kozlowski <m.kozlowski@tuxland.pl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-06 10:41:21 -08:00
Akinobu Mita	859cb93679	ext[234]: cleanup ext[234]_bg_num_gdb() Use ext[234]_bg_has_super() to remove duplicate code. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-06 10:41:21 -08:00
Akinobu Mita	fb01bfdac7	ext[234]: remove unused argument for ext[234]_find_goal() The argument chain for ext[234]_find_goal() is not used. This patch removes it and fixes comment as well. Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-06 10:41:21 -08:00
Akinobu Mita	197cd65acc	ext[234]: use ext[234]_get_group_desc() Use ext[234]_get_group_desc() to get group descriptor from group number. Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-06 10:41:21 -08:00
Akinobu Mita	144704e522	ext[234]: fix comment for nonexistent variable The comment in ext[234]_new_blocks() describes about "i". But there is no local variable called "i" in that scope. I guess it has been renamed to group_no. Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-06 10:41:21 -08:00
Aneesh Kumar K.V	1eca93f9ca	ext3: change the default behaviour on error ext3 file system was by default ignoring errors and continuing. This is not a good default as continuing on error could lead to file system corruption. Change the default to mark the file system readonly. Debian and ubuntu already does this as the default in their fstab. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Cc: <linux-ext4@vger.kernel.org> Cc: Eric Sandeen <sandeen@redhat.com> Cc: Jan Kara <jack@ucw.cz> Cc: Dave Jones <davej@codemonkey.org.uk> Cc: Chuck Ebbert <cebbert@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-06 10:41:20 -08:00
Aneesh Kumar K.V	feda58d37a	ext3: return after ext3_error in case of failures This fixes some instances where we were continuing after calling ext3_error. ext3_error calls panic only if errors=panic mount option is set. So we need to make sure we return correctly after ext3_error call Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-06 10:41:20 -08:00
Adrian Bunk	533083836f	make jbd/journal.c:__journal_abort_hard() static __journal_abort_hard() can now become static. Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-06 10:41:20 -08:00
Andi Kleen	e86e14385d	BKL-removal: remove incorrect comment refering to lock_kernel() from jbd/jbd2 None of the callers of this function does actually take the BKL as far as I can see. So remove the comment refering to the BKL. Signed-off-by: Andi Kleen <ak@suse.de> Cc: <linux-ext4@vger.kernel.org> Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-06 10:41:20 -08:00
Andi Kleen	d71cadd6bc	BKL-removal: remove incorrect BKL comment in ext2 No BKL used anywhere, so don't mention it. Signed-off-by: Andi Kleen <ak@suse.de> Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-06 10:41:20 -08:00
Andi Kleen	14f9f7b28e	BKL-removal: convert ext2 over to use unlocked_ioctl I checked ext2_ioctl and could not find anything in there that would need the BKL. So convert it over to use unlocked_ioctl Signed-off-by: Andi Kleen <ak@suse.de> Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-06 10:41:20 -08:00
Aneesh Kumar K.V	f762e9054f	ext3: add block bitmap validation When a new block bitmap is read from disk in read_block_bitmap() there are a few bits that should ALWAYS be set. In particular, the blocks given corresponding to block bitmap, inode bitmap and inode tables. Validate the block bitmap against these blocks. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-06 10:41:20 -08:00
Aneesh Kumar K.V	01584fa645	ext2: add block bitmap validation When a new block bitmap is read from disk in read_block_bitmap() there are a few bits that should ALWAYS be set. In particular, the blocks given corresponding to block bitmap, inode bitmap and inode tables. Validate the block bitmap against these blocks. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-06 10:41:20 -08:00
Miklos Szeredi	d12def1bcb	fuse: limit queued background requests Libfuse basically creates a new thread for each new request. This is fine for synchronous requests, which are naturally limited. However background requests (especially writepage) can cause a thread creation storm. To avoid this, limit the number of background requests available to userspace. This is done by introducing another queue for background requests, and a counter for the number of "active" requests, which are currently available for userspace. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-06 10:41:13 -08:00
Miklos Szeredi	b57d426445	fuse: save space in struct fuse_req Move the fields 'dentry' and 'vfsmount' into the request specific union, since these are only used for the RELEASE request. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-06 10:41:13 -08:00
Miklos Szeredi	0952b2a4a8	fuse: fix attribute caching after create Invalidate attributes on create, since st_ctime is updated. Reported by Szabolcs Szakacsits. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-06 10:41:13 -08:00
Eric Sandeen	af440f5292	ecryptfs: check for existing key_tfm at mount time Jeff Moyer pointed out that a mount; umount loop of ecryptfs, with the same cipher & other mount options, created a new ecryptfs_key_tfm_cache item each time, and the cache could grow quite large this way. Looking at this with mhalcrow, we saw that ecryptfs_parse_options() unconditionally called ecryptfs_add_new_key_tfm(), which is what was adding these items. Refactor ecryptfs_get_tfm_and_mutex_for_cipher_name() to create a new helper function, ecryptfs_tfm_exists(), which checks for the cipher on the cached key_tfm_list, and sets a pointer to it if it exists. This can then be called from ecryptfs_parse_options(), and new key_tfm's can be added only when a cached one is not found. With list locking changes suggested by akpm. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Cc: Michael Halcrow <mhalcrow@us.ibm.com> Cc: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-06 10:41:13 -08:00
Trevor Highland	19e66a67e9	eCryptfs: change the type of cipher_code from u16 to u8 Only the lower byte of cipher_code is ever used, so it makes sense for its type to be u8. Signed-off-by: Trevor Highland <trevor.highland@gmail.com> Cc: Michael Halcrow <mhalcrow@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-06 10:41:13 -08:00
Michael Halcrow	25bd817403	eCryptfs: Minor fixes to printk messages The printk statements that result when the user does not have the proper key available could use some refining. Signed-off-by: Mike Halcrow <mhalcrow@us.ibm.com> Cc: Mike Halcrow <mhalcrow@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-06 10:41:12 -08:00
Eric Sandeen	2830bfd6cf	ecryptfs: remove debug as mount option, and warn if set via modprobe ecryptfs_debug really should not be a mount option; it is not per-mount, but rather sets a global "ecryptfs_verbosity" variable which affects all mounted filesysytems. It's already settable as a module load option, I think we can leave it at that. Also, if set, since secret values come out in debug messages, kick things off with a stern warning. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Acked-by: Mike Halcrow <mhalcrow@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-06 10:41:12 -08:00
Eric Sandeen	99db6e4a97	ecryptfs: make show_options reflect actual mount options Change ecryptfs_show_options to reflect the actual mount options in use. Note that this does away with the "dir=" output, which is not a valid mount option and appears to be unused. Mount options such as "ecryptfs_verbose" and "ecryptfs_xattr_metadata" are somewhat indeterminate for a given fs, but in any case the reported mount options can be used in a new mount command to get the same behavior. [akpm@linux-foundation.org: fix printk warning] Signed-off-by: Eric Sandeen <sandeen@redhat.com> Acked-by: Michael Halcrow <mhalcrow@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-06 10:41:12 -08:00
Trevor Highland	8e3a6f16ba	eCryptfs: set inode key only once per crypto operation There is no need to keep re-setting the same key for any given eCryptfs inode. This patch optimizes the use of the crypto API and helps performance a bit. Signed-off-by: Trevor Highland <trevor.highland@gmail.com> Signed-off-by: Michael Halcrow <mhalcrow@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-06 10:41:12 -08:00
Michael Halcrow	cc11beffdf	eCryptfs: track header bytes rather than extents Remove internal references to header extents; just keep track of header bytes instead. Headers can easily span multiple pages with the recent persistent file changes. Signed-off-by: Michael Halcrow <mhalcrow@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-06 10:41:12 -08:00
Adrian Bunk	7896b63182	fs/ecryptfs/: possible cleanups - make the following needlessly global code static: - crypto.c:ecryptfs_lower_offset_for_extent() - crypto.c:key_tfm_list - crypto.c:key_tfm_list_mutex - inode.c:ecryptfs_getxattr() - main.c:ecryptfs_init_persistent_file() - remove the no longer used mmap.c:ecryptfs_lower_page_cache - #if 0 the unused read_write.c:ecryptfs_read() Signed-off-by: Adrian Bunk <bunk@kernel.org> Cc: Michael Halcrow <mhalcrow@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-06 10:41:12 -08:00
Karsten Wiese	844fcc5396	make sys_poll() wait at least timeout ms schedule_timeout(jiffies) waits for at least jiffies - 1. Add 1 jiffie to the timeout_jiffies calculated in sys_poll() to wait at least timeout_msecs, like poll() manpage says. Signed-off-by: Karsten Wiese <fzu@wemgehoertderstaat.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-06 10:41:09 -08:00
Eric Dumazet	13f14b4d8b	Use ilog2() in fs/namespace.c We can use ilog2() in fs/namespace.c to compute hash_bits and hash_mask at compile time, not runtime. [akpm@linux-foundation.org: clean it all up] Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-06 10:41:09 -08:00
Thomas Bogendoerfer	b75cb06f72	partition: use DEFAULT_SGI_PARTITION for SGI_PARTION default Use DEFAULT_SGI_PARTITION for SGI_PARTION default Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-06 10:41:08 -08:00
Nick Piggin	a8e3eff466	ext2: xip check fix ext2 should not worry about checking sb->s_blocksize for XIP before the sb's blocksize actually gets set. Signed-off-by: Nick Piggin <npiggin@suse.de> Acked-by: Carsten Otte <cotte@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-06 10:41:08 -08:00
Denis Cheng	bcf11cbecc	fs/reiserfs/xattr.c: use LIST_HEAD instead of LIST_HEAD_INIT Signed-off-by: Denis Cheng <crquan@gmail.com> Cc: Chris Mason <chris.mason@oracle.com> Cc: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-06 10:41:08 -08:00
Jan Kara	941d2380e9	quota: improve inode list scanning in add_dquot_ref() We restarted scan of sb->s_inodes list whenever we had to drop inode_lock in add_dquot_ref(). This leads to overall quadratic running time and thus add_dquot_ref() can take several minutes when called on a life filesystem. We fix the problem by using the fact that inode cannot be removed from s_inodes list while we hold a reference to it and thus we can safely restart the scan if we don't drop the reference. Here we use the fact that inodes freshly added to s_inodes list are already guaranteed to have quotas properly initialized and the ordering of inodes on s_inodes list does not change so we cannot skip any inode. Thanks goes to Nick <gentuu@gmail.com> for analyzing the problem and testing the fix. [akpm@linux-foundation.org: iput(NULL) is legal] Signed-off-by: Jan Kara <jack@suse.cz> Cc: Nick <gentuu@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-06 10:41:07 -08:00
Nick Piggin	0d71bd5993	inotify: remove debug code The inotify debugging code is supposed to verify that the DCACHE_INOTIFY_PARENT_WATCHED scalability optimisation does not result in notifications getting lost nor extra needless locking generated. Unfortunately there are also some races in the debugging code. And it isn't very good at finding problems anyway. So remove it for now. Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: Robert Love <rlove@google.com> Cc: John McCutchan <ttb@tentacle.dhs.org> Cc: Jan Kara <jack@ucw.cz> Cc: Yan Zheng <yanzheng@21cn.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-06 10:41:07 -08:00
Nick Piggin	d599e36a9e	inotify: fix race There is a race between setting an inode's children's "parent watched" flag when placing the first watch on a parent, and instantiating new children of that parent: a child could miss having its flags set by set_dentry_child_flags, but then inotify_d_instantiate might still see !inotify_inode_watched. The solution is to set_dentry_child_flags after adding the watch. Locking is taken care of, because both set_dentry_child_flags and inotify_d_instantiate hold dcache_lock and child->d_locks. Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: Robert Love <rlove@google.com> Cc: John McCutchan <ttb@tentacle.dhs.org> Cc: Jan Kara <jack@ucw.cz> Cc: Yan Zheng <yanzheng@21cn.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-06 10:41:06 -08:00
Qi Yong	ba6f867f11	kill an unused PTR_ERR in bdev_cache_init() Signed-off-by: Qi Yong <qiyong@fc-cn.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-06 10:41:06 -08:00
Eric Dumazet	9cfe015aa4	get rid of NR_OPEN and introduce a sysctl_nr_open NR_OPEN (historically set to 10241024) actually forbids processes to open more than 10241024 handles. Unfortunatly some production servers hit the not so 'ridiculously high value' of 10241024 file descriptors per process. Changing NR_OPEN is not considered safe because of vmalloc space potential exhaust. This patch introduces a new sysctl (/proc/sys/fs/nr_open) wich defaults to 10241024, so that admins can decide to change this limit if their workload needs it. [akpm@linux-foundation.org: export it for sparc64] Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: "David S. Miller" <davem@davemloft.net> Cc: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-06 10:41:06 -08:00
Richard Knutsson	774ed22c21	reiserfs: complement va_start() with va_end(). Complement va_start() with va_end(). Signed-off-by: Richard Knutsson <ricknu-0@student.ltu.se> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-06 10:41:05 -08:00
Jan Kara	ece95912db	inotify: send IN_ATTRIB events when link count changes Currently, no notification event has been sent when inode's link count changed. This is inconvenient for the application in some cases: Suppose you have the following directory structure foo/test bar/ and you watch test. If someone does "mv foo/test bar/", you get event IN_MOVE_SELF and you know something has happened with the file "test". However if someone does "ln foo/test bar/test" and "rm foo/test" you get no inotify event for the file "test" (only directories "foo" and "bar" receive events). Furthermore it could be argued that link count belongs to file's metadata and thus IN_ATTRIB should be sent when it changes. The following patch implements sending of IN_ATTRIB inotify events when link count of the inode changes, i.e., when a hardlink to the inode is created or when it is removed. This event is sent in addition to all the events sent so far. In particular, when a last link to a file is removed, IN_ATTRIB event is sent in addition to IN_DELETE_SELF event. Signed-off-by: Jan Kara <jack@suse.cz> Acked-by: Morten Welinder <mwelinder@gmail.com> Cc: Robert Love <rlove@google.com> Cc: John McCutchan <ttb@tentacle.dhs.org> Cc: Steven French <sfrench@us.ibm.com> Cc: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-06 10:41:05 -08:00
Robert P. J. Day	14b30d6284	hfs: update comment to reflect actual init and exit routines Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca> Cc: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-06 10:41:05 -08:00
Eric Sandeen	55581d018e	address hfs on-disk corruption robustness review comments Address Roman's review comments for the previously sent on-disk corruption hfs robustness patch. - use 0 as a failure value, rather than making a new macro HFS_BAD_KEYLEN, and use a switch statement instead of if's. - Add new fail: target to __hfs_brec_find to skip assignments using bad values when exiting with a failure. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Eric Sandeen <sandeen@redhat.com> Cc: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-06 10:41:05 -08:00
Robert P. J. Day	7c28cbaed6	NCPFS: update diagnostic strings to match routine names. Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca> Cc: Petr Vandrovec <VANDROVE@vc.cvut.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-06 10:41:05 -08:00
Akinobu Mita	797074e44d	fs: use list_for_each_entry_reverse and kill sb_entry Use list_for_each_entry_reverse for super_blocks list and remove unused sb_entry macro. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-06 10:41:05 -08:00
Akinobu Mita	e8462caa91	fs: use hlist_unhashed Use hlist_unhashed() instead of opencoded equivalent. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-06 10:41:04 -08:00
Michal Schmidt	07a154b2bb	proc: loadavg reading race The avenrun[] values are supposed to be protected by xtime_lock. loadavg_read_proc does not use it. Theoretically this may result in an occasional glitch when the value read from /proc/loadavg would be as much as 1<<11 times higher than it should be. Signed-off-by: Michal Schmidt <mschmidt@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-06 10:41:04 -08:00
Jiri Olsa	0321155926	fs: remove dead config CONFIG_HAS_COMPAT_EPOLL_EVENT symbol Remove dead config CONFIG_HAS_COMPAT_EPOLL_EVENT symbol. Signed-off-by: Jiri Olsa <olsajiri@gmail.com> Cc: Davide Libenzi <davidel@xmailserver.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-06 10:41:03 -08:00
Adrian Bunk	7747cdb2f8	fs/eventfd.c should #include <linux/syscalls.h> Every file should include the headers containing the prototypes for its global functions (in this case sys_eventfd()). Signed-off-by: Adrian Bunk <bunk@kernel.org> Cc: Davide Libenzi <davidel@xmailserver.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-06 10:41:03 -08:00
Adrian Bunk	7ec37dfd4d	fs/signalfd.c should #include <linux/syscalls.h> Every file should include the headers containing the prototypes for its global functions (in this case sys_signalfd()). Signed-off-by: Adrian Bunk <bunk@kernel.org> Cc: Davide Libenzi <davidel@xmailserver.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-06 10:41:03 -08:00
Adrian Bunk	12c2ab5e8f	fs/utimes.c should #include <linux/syscalls.h> Every file should include the headers containing the prototypes for its global functions. Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-06 10:41:03 -08:00
Adrian Bunk	011e3fcd1e	proper prototype for get_filesystem_list() Ad a proper prototype for migration_init() in include/linux/fs.h Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-06 10:41:02 -08:00
Jeff Layton	ce88cc5ed8	smbfs: fix calculation of kernel_recvmsg size parameter in smb_receive() smb_receive calls kernel_recvmsg with a size that's the minimum of the amount of buffer space in the kvec passed in or req->rq_rlen (which represents the length of the response). This does not take into account any data that was read in a request earlier pass through smb_receive. If the first pass through smb_receive receives some but not all of the response, then the next pass can call kernel_recvmsg with a size field that's too big. kernel_recvmsg can overrun into the next response, throwing off the alignment and making it unrecognizable. This causes messages like this to pop up in the ring buffer: smb_get_length: Invalid NBT packet, code=69 as well as other errors indicating that the response is unrecognizable. Typically this is seen on a smbfs mount under heavy I/O. This patch changes the code to use (req->rq_rlen - req->rq_bytes_recvd) instead instead of just req->rq_rlen, since that should represent the amount of unread data in the response. I think this is correct, but an ACK or NACK from someone more familiar with this code would be appreciated... Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-06 10:41:02 -08:00
Vegard Nossum	b4cf9c342a	FAT: Fix printk format strings This makes sure printk format strings contain no more than a single line. Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com> [the message was tweaked.] Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-06 10:41:02 -08:00
Adrian Bunk	f74596d079	proper show_interrupts() prototype Add a proper prototype for show_interrupts() in include/linux/interrupt.h Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-06 10:41:02 -08:00
Andries E. Brouwer	0b03cfb25f	MNT_UNBINDABLE fix Some time ago ( http://lkml.org/lkml/2007/6/19/128 ) I wrote about MNT_UNBINDABLE that it felt like a bug that it is not reset by "mount --make-private". Today I happened to see mount(8) and Documentation/sharedsubtree.txt and both document the version obtained by applying the little patch given in the above (and again below). So, the present kernel code is not according to specs and must be regarded as buggy. Specification in Documentation/sharedsubtree.txt: See state diagram: unbindable should become private upon make-private. Specification in mount(8): ... It's also possible to set up uni-directional propagation (with --make- slave), to make a mount point unavailable for --bind/--rbind (with --make-unbindable), and to undo any of these (with --make-private). Repeat of old fix-shared-subtrees-make-private.patch (due to Dirk Gerrits, René Gabriëls, Peter Kooijmans): Acked-by: Ram Pai <linuxram@us.ibm.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-06 10:41:02 -08:00
Dmitry Antipov	bcfbf84d40	SIGIO-driven I/O with inotify queues Add SIGIO-driven I/O for descriptors returned by inotify_init(). The thing may be enabled by convenient fcntl (fd, F_SETFL, O_ASYNC) call. Signed-off-by: Dmitry Antipov <antipov@dev.rtsoft.ru> Cc: Robert Love <rlove@google.com> Cc: John McCutchan <ttb@tentacle.dhs.org> Cc: Michael Kerrisk <mtk-manpages@gmx.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-06 10:41:00 -08:00
Aneesh Kumar K.V	14e11e106b	ext2: change the default behaviour on error ext2 file system was by default ignoring errors and continuing. This is not a good default as continuing on error could lead to file system corruption. Change the default to mark the file system readonly. Debian and ubuntu already does this as the default in their fstab. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Cc: <linux-ext4@vger.kernel.org> Cc: Eric Sandeen <sandeen@redhat.com> Cc: Jan Kara <jack@ucw.cz> Cc: Dave Jones <davej@codemonkey.org.uk> Cc: Chuck Ebbert <cebbert@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-06 10:41:00 -08:00
Aneesh Kumar K.V	7f0adaeced	ext2: return after ext2_error in case of failures This fixes some instances where we were continuing after calling ext2_error. ext2_error call panic only if errors=panic mount option is set. So we need to make sure we return correctly after ext2_error call. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-06 10:41:00 -08:00
Yan Zheng	1c17d18e37	A potential bug in inotify_user.c Following comment is at fs/inotify_user.c:287 /* coalescing: drop this event if it is a dupe of the previous */ I think the previous event in the comment should be the last event in the link list. But inotify_dev_get_event return the first event in the list. In addition, it doesn't check whether the list is empty Signed-off-by: Yan Zheng<yanzheng@21cn.com> Acked-by: Robert Love <rlove@rlove.org> Cc: John McCutchan <ttb@tentacle.dhs.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-06 10:41:00 -08:00
Jan Engelhardt	19c561a60f	fs/fat/: refine chmod checks Prohibit mode changes in non-quiet mode that cannot be stored reliably with the on-disk format. Suppose a vfat filesystem is mounted with umask=0 and [not-quiet]. Then all files will have mode 0777. Trying to change the owner will fail, because fat does not know about owners or groups. chmod 0770, on the other hand, will succeed, even though fat does not know about the permission triplet [user/group/other]. So this patch changes fat's not-quiet behavior so that only UNIX modes are accepted that can be mapped lossless between the fat disk format and the local system. There is only one attribute, and that is the readonly attribute, which is mapped to the UNIX write permission bit(s). chmod 0555 is therefore valid (taking away the +w bits <=> setting the readonly attribute). Since chmod 0775 and chmod 0755 is an ambiguous case as to whether to set or clear the readonly bit, these modes are also denied. In quiet mode, chmod and chown will continue to "succeed" as they did before, meaning that a subsequent stat() will temporarily return the new mode as long as the inode is not reread from disk, and chown will silently do nothing, not even return the new uid/gid in stat(). Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de> Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-06 10:40:59 -08:00
Andrew Morton	c773633916	deprecate smbfs in favour of cifs smbfs is a bit buggy and has no maintainer. Change it to shout at the user on the first five mount attempts - tell them to switch to CIFS. Come December we'll mark it BROKEN and see what happens. [olecom@flower.upol.cz: documentation update] Cc: Urban Widmark <urban@teststation.com> Acked-by: Steven French <sfrench@us.ibm.com> Signed-off-by: Oleg Verych <olecom@flower.upol.cz> Cc: Jeff Layton <jlayton@redhat.com> Cc: Adrian Bunk <bunk@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-05 14:37:15 -08:00
Dominique Quatravaux	d7b88513c5	uml: fix hostfs tv_usec calculations To convert from tv_nsec to tv_usec, one needs to divide by 1000, not multiply. Signed-off-by: Dominique Quatravaux <dominique@quatravaux.org> Signed-off-by: Jeff Dike <jdike@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-05 09:44:30 -08:00
Andrew Morgan	e338d263a7	Add 64-bit capability support to the kernel The patch supports legacy (32-bit) capability userspace, and where possible translates 32-bit capabilities to/from userspace and the VFS to 64-bit kernel space capabilities. If a capability set cannot be compressed into 32-bits for consumption by user space, the system call fails, with -ERANGE. FWIW libcap-2.00 supports this change (and earlier capability formats) http://www.kernel.org/pub/linux/libs/security/linux-privs/kernel-2.6/ [akpm@linux-foundation.org: coding-syle fixes] [akpm@linux-foundation.org: use get_task_comm()] [ezk@cs.sunysb.edu: build fix] [akpm@linux-foundation.org: do not initialise statics to 0 or NULL] [akpm@linux-foundation.org: unused var] [serue@us.ibm.com: export __cap_ symbols] Signed-off-by: Andrew G. Morgan <morgan@kernel.org> Cc: Stephen Smalley <sds@tycho.nsa.gov> Acked-by: Serge Hallyn <serue@us.ibm.com> Cc: Chris Wright <chrisw@sous-sol.org> Cc: James Morris <jmorris@namei.org> Cc: Casey Schaufler <casey@schaufler-ca.com> Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-05 09:44:20 -08:00
David P. Quigley	4bea58053f	VFS: Reorder vfs_getxattr to avoid unnecessary calls to the LSM Originally vfs_getxattr would pull the security xattr variable using the inode getxattr handle and then proceed to clobber it with a subsequent call to the LSM. This patch reorders the two operations such that when the xattr requested is in the security namespace it first attempts to grab the value from the LSM directly. If it fails to obtain the value because there is no module present or the module does not support the operation it will fall back to using the inode getxattr operation. In the event that both are inaccessible it returns EOPNOTSUPP. Signed-off-by: David P. Quigley <dpquigl@tycho.nsa.gov> Cc: Stephen Smalley <sds@tycho.nsa.gov> Cc: Chris Wright <chrisw@sous-sol.org> Acked-by: James Morris <jmorris@namei.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Cc: Casey Schaufler <casey@schaufler-ca.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-05 09:44:20 -08:00
David P. Quigley	4249259404	VFS/Security: Rework inode_getsecurity and callers to return resulting buffer This patch modifies the interface to inode_getsecurity to have the function return a buffer containing the security blob and its length via parameters instead of relying on the calling function to give it an appropriately sized buffer. Security blobs obtained with this function should be freed using the release_secctx LSM hook. This alleviates the problem of the caller having to guess a length and preallocate a buffer for this function allowing it to be used elsewhere for Labeled NFS. The patch also removed the unused err parameter. The conversion is similar to the one performed by Al Viro for the security_getprocattr hook. Signed-off-by: David P. Quigley <dpquigl@tycho.nsa.gov> Cc: Stephen Smalley <sds@tycho.nsa.gov> Cc: Chris Wright <chrisw@sous-sol.org> Acked-by: James Morris <jmorris@namei.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Cc: Casey Schaufler <casey@schaufler-ca.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-05 09:44:20 -08:00
Fengguang Wu	8bc3be2751	writeback: speed up writeback of big dirty files After making dirty a 100M file, the normal behavior is to start the writeback for all data after 30s delays. But sometimes the following happens instead: - after 30s: ~4M - after 5s: ~4M - after 5s: all remaining 92M Some analyze shows that the internal io dispatch queues goes like this: s_io s_more_io ------------------------- 1) 100M,1K 0 2) 1K 96M 3) 0 96M 1) initial state with a 100M file and a 1K file 2) 4M written, nr_to_write <= 0, so write more 3) 1K written, nr_to_write > 0, no more writes(BUG) nr_to_write > 0 in (3) fools the upper layer to think that data have all been written out. The big dirty file is actually still sitting in s_more_io. We cannot simply splice s_more_io back to s_io as soon as s_io becomes empty, and let the loop in generic_sync_sb_inodes() continue: this may starve newly expired inodes in s_dirty. It is also not an option to draw inodes from both s_more_io and s_dirty, an let the loop go on: this might lead to live locks, and might also starve other superblocks in sync time(well kupdate may still starve some superblocks, that's another bug). We have to return when a full scan of s_io completes. So nr_to_write > 0 does not necessarily mean that "all data are written". This patch introduces a flag writeback_control.more_io to indicate that more io should be done. With it the big dirty file no longer has to wait for the next kupdate invokation 5s later. In sync_sb_inodes() we only set more_io on super_blocks we actually visited. This avoids the interaction between two pdflush deamons. Also in __sync_single_inode() we don't blindly keep requeuing the io if the filesystem cannot progress. Failing to do so may lead to 100% iowait. Tested-by: Mike Snitzer <snitzer@gmail.com> Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn> Cc: Michael Rubin <mrubin@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-05 09:44:19 -08:00
Qi Yong	2d544564f9	skip writing data pages when inode is under I_SYNC Since I_SYNC was split out from I_LOCK, the concern in commit `4b89eed93e` ("Write back inode data pages even when the inode itself is locked") is not longer valid. We should revert to the original behavior: in __writeback_single_inode(), when we find an I_SYNC-ed inode and we're not doing a data-integrity sync, skip writing entirely. Otherwise, we are double calling do_writepages() Signed-off-by: Qi Yong <qiyong@fc-cn.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Hugh Dickins <hugh@veritas.com> Cc: Joern Engel <joern@wohnheim.fh-wedel.de> Cc: WU Fengguang <wfg@mail.ustc.edu.cn> Cc: Michael Rubin <mrubin@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-05 09:44:18 -08:00
Andrea Arcangeli	7766755a2f	Fix /proc dcache deadlock in do_exit This patch fixes a sles9 system hang in start_this_handle from a customer with some heavy workload where all tasks are waiting on kjournald to commit the transaction, but kjournald waits on t_updates to go down to zero (it never does). This was reported as a lowmem shortage deadlock but when checking the debug data I noticed the VM wasn't under pressure at all (well it was really under vm pressure, because lots of tasks hanged in the VM prune_dcache methods trying to flush dirty inodes, but no task was hanging in GFP_NOFS mode, the holder of the journal handle should have if this was a vm issue in the first place). No task was apparently holding the leftover handle in the committing transaction, so I deduced t_updates was stuck to 1 because a journal_stop was never run by some path (this turned out to be correct). With a debug patch adding proper reverse links and stack trace logging in ext3 deployed in production, I found journal_stop is never run because mark_inode_dirty_sync is called inside release_task called by do_exit. (that was quite fun because I would have never thought about this subtleness, I thought a regular path in ext3 had a bug and it forgot to call journal_stop) do_exit->release_task->mark_inode_dirty_sync->schedule() (will never come back to run journal_stop) The reason is that shrink_dcache_parent is racy by design (feature not a bug) and it can do blocking I/O in some case, but the point is that calling shrink_dcache_parent at the last stage of do_exit isn't safe for self-reaping tasks. I guess the memory pressure of the unbalanced highmem system allowed to trigger this more easily. Now mainline doesn't have this line in iput (like sles9 has): if (inode->i_state & I_DIRTY_DELAYED) mark_inode_dirty_sync(inode); so it will probably not crash with ext3, but for example ext2 implements an I/O-blocking ext2_put_inode that will lead to similar screwups with ext2_free_blocks never coming back and it's definitely wrong to call blocking-IO paths inside do_exit. So this should fix a subtle bug in mainline too (not verified in practice though). The equivalent fix for ext3 is also not verified yet to fix the problem in sles9 but I don't have doubt it will (it usually takes days to crash, so it'll take weeks to be sure). An alternate fix would be to offload that work to a kernel thread, but I don't think a reschedule for this is worth it, the vm should be able to collect those entries for the synchronous release_task. Signed-off-by: Andrea Arcangeli <andrea@suse.de> Cc: Jan Kara <jack@ucw.cz> Cc: Ingo Molnar <mingo@elte.hu> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-05 09:44:18 -08:00
Matt Mackall	1e88328111	maps4: make page monitoring /proc file optional Make /proc/ page monitoring configurable This puts the following files under an embedded config option: /proc/pid/clear_refs /proc/pid/smaps /proc/pid/pagemap /proc/kpagecount /proc/kpageflags [akpm@linux-foundation.org: Kconfig fix] Signed-off-by: Matt Mackall <mpm@selenic.com> Cc: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-05 09:44:17 -08:00
Matt Mackall	304daa8132	maps4: add /proc/kpageflags interface This makes a subset of physical page flags available to userspace. Together with /proc/pid/kpagemap, this allows tracking of a wide variety of VM behaviors. Exported flags are decoupled from the kernel's internal flags. This allows us to reorder flag bits, and synthesize any bits that get redefined in terms of other bits. [akpm@linux-foundation.org: remove unneeded access_ok()] [akpm@linux-foundation.org: s/0/NULL/] Signed-off-by: Matt Mackall <mpm@selenic.com> Cc: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-05 09:44:17 -08:00
Matt Mackall	161f47bf41	maps4: add /proc/kpagecount interface This makes physical page map counts available to userspace. Together with /proc/pid/pagemap and /proc/pid/clear_refs, this can be used to monitor memory usage on a per-page basis. [akpm@linux-foundation.org: remove unneeded access_ok()] [bunk@stusta.de: make struct proc_kpagemap static] Signed-off-by: Matt Mackall <mpm@selenic.com> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Adrian Bunk <bunk@stusta.de> Cc: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-05 09:44:17 -08:00
Matt Mackall	85863e475e	maps4: add /proc/pid/pagemap interface This interface provides a mapping for each page in an address space to its physical page frame number, allowing precise determination of what pages are mapped and what pages are shared between processes. New in this version: - headers gone again (as recommended by Dave Hansen and Alan Cox) - 64-bit entries (as per discussion with Andi Kleen) - swap pte information exported (from Dave Hansen) - page walker callback for holes (from Dave Hansen) - direct put_user I/O (as suggested by Rusty Russell) This patch folds in cleanups and swap PTE support from Dave Hansen <haveblue@us.ibm.com>. Signed-off-by: Matt Mackall <mpm@selenic.com> Cc: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-05 09:44:16 -08:00
Matt Mackall	a6198797cc	maps4: regroup task_mmu by interface Reorder source so that all the code and data for each interface is together. Signed-off-by: Matt Mackall <mpm@selenic.com> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: David Rientjes <rientjes@google.com> Cc: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-05 09:44:16 -08:00
Matt Mackall	f248dcb34d	maps4: move clear_refs code to task_mmu.c This puts all the clear_refs code where it belongs and probably lets things compile on MMU-less systems as well. Signed-off-by: Matt Mackall <mpm@selenic.com> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: David Rientjes <rientjes@google.com> Cc: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-05 09:44:16 -08:00
Matt Mackall	4752c36978	maps4: simplify interdependence of maps and smaps This pulls the shared map display code out of show_map and puts it in show_smap where it belongs. Signed-off-by: Matt Mackall <mpm@selenic.com> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Acked-by: David Rientjes <rientjes@google.com> Cc: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-05 09:44:16 -08:00
Matt Mackall	b3ae5acbbb	maps4: use pagewalker in clear_refs and smaps Use the generic pagewalker for smaps and clear_refs Signed-off-by: Matt Mackall <mpm@selenic.com> Cc: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-05 09:44:16 -08:00
Fengguang Wu	ec4dd3eb35	maps4: add proportional set size accounting in smaps The "proportional set size" (PSS) of a process is the count of pages it has in memory, where each page is divided by the number of processes sharing it. So if a process has 1000 pages all to itself, and 1000 shared with one other process, its PSS will be 1500. - lwn.net: "ELC: How much memory are applications really using?" The PSS proposed by Matt Mackall is a very nice metic for measuring an process's memory footprint. So collect and export it via /proc/<pid>/smaps. Matt Mackall's pagemap/kpagemap and John Berthels's exmap can also do the job. They are comprehensive tools. But for PSS, let's do it in the simple way. Cc: John Berthels <jjberthels@gmail.com> Cc: Bernardo Innocenti <bernie@codewiz.org> Cc: Padraig Brady <P@draigBrady.com> Cc: Denys Vlasenko <vda.linux@googlemail.com> Cc: Balbir Singh <balbir@linux.vnet.ibm.com> Signed-off-by: Matt Mackall <mpm@selenic.com> Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn> Cc: Hugh Dickins <hugh@veritas.com> Cc: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-05 09:44:16 -08:00
Ken Chen	75897d60a5	hugetlb: allow sticky directory mount option Allow sticky directory mount option for hugetlbfs. This allows admin to create a shared hugetlbfs mount point for multiple users, while prevent accidental file deletion that users may step on each other. It is similiar to default tmpfs mount option, or typical option used on /tmp. Signed-off-by: Ken Chen <kenchen@google.com> Cc: Badari Pulavarty <pbadari@us.ibm.com> Cc: Adam Litke <agl@us.ibm.com> Cc: David Gibson <hermes@gibson.dropbear.id.au> Cc: William Lee Irwin III <wli@holomorphy.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-05 09:44:14 -08:00
Christoph Lameter	b98938c373	bufferhead: revert constructor removal The constructor for buffer_head slabs was removed recently. We need the constructor back in slab defrag in order to insure that slab objects always have a definite state even before we allocated them. I think we mistakenly merged the removal of the constuctor into a cleanup patch. You (ie: akpm) had a test that showed that the removal of the constructor led to a small regression. The prior state makes things easier for slab defrag. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Christoph Lameter <clameter@sgi.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-05 09:44:14 -08:00
Christoph Lameter	9e2779fa28	is_vmalloc_addr(): Check if an address is within the vmalloc boundaries Checking if an address is a vmalloc address is done in a couple of places. Define a common version in mm.h and replace the other checks. Again the include structures suck. The definition of VMALLOC_START and VMALLOC_END is not available in vmalloc.h since highmem.c cannot be included there. Signed-off-by: Christoph Lameter <clameter@sgi.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-05 09:44:14 -08:00
Christoph Lameter	eebd2aa355	Pagecache zeroing: zero_user_segment, zero_user_segments and zero_user Simplify page cache zeroing of segments of pages through 3 functions zero_user_segments(page, start1, end1, start2, end2) Zeros two segments of the page. It takes the position where to start and end the zeroing which avoids length calculations and makes code clearer. zero_user_segment(page, start, end) Same for a single segment. zero_user(page, start, length) Length variant for the case where we know the length. We remove the zero_user_page macro. Issues: 1. Its a macro. Inline functions are preferable. 2. The KM_USER0 macro is only defined for HIGHMEM. Having to treat this special case everywhere makes the code needlessly complex. The parameter for zeroing is always KM_USER0 except in one single case that we open code. Avoiding KM_USER0 makes a lot of code not having to be dealing with the special casing for HIGHMEM anymore. Dealing with kmap is only necessary for HIGHMEM configurations. In those configurations we use KM_USER0 like we do for a series of other functions defined in highmem.h. Since KM_USER0 is depends on HIGHMEM the existing zero_user_page function could not be a macro. zero_user_* functions introduced here can be be inline because that constant is not used when these functions are called. Also extract the flushing of the caches to be outside of the kmap. [akpm@linux-foundation.org: fix nfs and ntfs build] [akpm@linux-foundation.org: fix ntfs build some more] Signed-off-by: Christoph Lameter <clameter@sgi.com> Cc: Steven French <sfrench@us.ibm.com> Cc: Michael Halcrow <mhalcrow@us.ibm.com> Cc: <linux-ext4@vger.kernel.org> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Anton Altaparmakov <aia21@cantab.net> Cc: Mark Fasheh <mark.fasheh@oracle.com> Cc: David Chinner <dgc@sgi.com> Cc: Michael Halcrow <mhalcrow@us.ibm.com> Cc: Steven French <sfrench@us.ibm.com> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-05 09:44:13 -08:00
Davide Libenzi	4d672e7ac7	timerfd: new timerfd API This is the new timerfd API as it is implemented by the following patch: int timerfd_create(int clockid, int flags); int timerfd_settime(int ufd, int flags, const struct itimerspec utmr, struct itimerspec otmr); int timerfd_gettime(int ufd, struct itimerspec *otmr); The timerfd_create() API creates an un-programmed timerfd fd. The "clockid" parameter can be either CLOCK_MONOTONIC or CLOCK_REALTIME. The timerfd_settime() API give new settings by the timerfd fd, by optionally retrieving the previous expiration time (in case the "otmr" parameter is not NULL). The time value specified in "utmr" is absolute, if the TFD_TIMER_ABSTIME bit is set in the "flags" parameter. Otherwise it's a relative time. The timerfd_gettime() API returns the next expiration time of the timer, or {0, 0} if the timerfd has not been set yet. Like the previous timerfd API implementation, read(2) and poll(2) are supported (with the same interface). Here's a simple test program I used to exercise the new timerfd APIs: http://www.xmailserver.org/timerfd-test2.c [akpm@linux-foundation.org: coding-style cleanups] [akpm@linux-foundation.org: fix ia64 build] [akpm@linux-foundation.org: fix m68k build] [akpm@linux-foundation.org: fix mips build] [akpm@linux-foundation.org: fix alpha, arm, blackfin, cris, m68k, s390, sparc and sparc64 builds] [heiko.carstens@de.ibm.com: fix s390] [akpm@linux-foundation.org: fix powerpc build] [akpm@linux-foundation.org: fix sparc64 more] Signed-off-by: Davide Libenzi <davidel@xmailserver.org> Cc: Michael Kerrisk <mtk-manpages@gmx.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Davide Libenzi <davidel@xmailserver.org> Cc: Michael Kerrisk <mtk-manpages@gmx.net> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Davide Libenzi <davidel@xmailserver.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-05 09:44:07 -08:00
Oleg Nesterov	ed5d2cac11	exec: rework the group exit and fix the race with kill As Roland pointed out, we have the very old problem with exec. de_thread() sets SIGNAL_GROUP_EXIT, kills other threads, changes ->group_leader and then clears signal->flags. All signals (even fatal ones) sent in this window (which is not too small) will be lost. With this patch exec doesn't abuse SIGNAL_GROUP_EXIT. signal_group_exit(), the new helper, should be used to detect exit_group() or exec() in progress. It can have more users, but this patch does only strictly necessary changes. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Davide Libenzi <davidel@xmailserver.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Robin Holt <holt@sgi.com> Cc: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-05 09:44:07 -08:00
Andrew Morton	59714d65df	get_task_comm(): return the result It was dumb to make get_task_comm() return void. Change it to return a pointer to the resulting output for caller convenience. Cc: Ulrich Drepper <drepper@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-05 09:44:07 -08:00
Peter Zijlstra	0ccf831cbe	lockdep: annotate epoll On Sat, 2008-01-05 at 13:35 -0800, Davide Libenzi wrote: > I remember I talked with Arjan about this time ago. Basically, since 1) > you can drop an epoll fd inside another epoll fd 2) callback-based wakeups > are used, you can see a wake_up() from inside another wake_up(), but they > will never refer to the same lock instance. > Think about: > > dfd = socket(...); > efd1 = epoll_create(); > efd2 = epoll_create(); > epoll_ctl(efd1, EPOLL_CTL_ADD, dfd, ...); > epoll_ctl(efd2, EPOLL_CTL_ADD, efd1, ...); > > When a packet arrives to the device underneath "dfd", the net code will > issue a wake_up() on its poll wake list. Epoll (efd1) has installed a > callback wakeup entry on that queue, and the wake_up() performed by the > "dfd" net code will end up in ep_poll_callback(). At this point epoll > (efd1) notices that it may have some event ready, so it needs to wake up > the waiters on its poll wait list (efd2). So it calls ep_poll_safewake() > that ends up in another wake_up(), after having checked about the > recursion constraints. That are, no more than EP_MAX_POLLWAKE_NESTS, to > avoid stack blasting. Never hit the same queue, to avoid loops like: > > epoll_ctl(efd2, EPOLL_CTL_ADD, efd1, ...); > epoll_ctl(efd3, EPOLL_CTL_ADD, efd2, ...); > epoll_ctl(efd4, EPOLL_CTL_ADD, efd3, ...); > epoll_ctl(efd1, EPOLL_CTL_ADD, efd4, ...); > > The code "if (tncur->wq == wq \|\| ..." prevents re-entering the same > queue/lock. Since the epoll code is very careful to not nest same instance locks allow the recursion. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Tested-by: Stefan Richter <stefanr@s5r6.in-berlin.de> Acked-by: Davide Libenzi <davidel@xmailserver.org> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-05 09:44:07 -08:00
Linus Torvalds	f5bb3a5e9d	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial * git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial: (79 commits) Jesper Juhl is the new trivial patches maintainer Documentation: mention email-clients.txt in SubmittingPatches fs/binfmt_elf.c: spello fix do_invalidatepage() comment typo fix Documentation/filesystems/porting fixes typo fixes in net/core/net_namespace.c typo fix in net/rfkill/rfkill.c typo fixes in net/sctp/sm_statefuns.c lib/: Spelling fixes kernel/: Spelling fixes include/scsi/: Spelling fixes include/linux/: Spelling fixes include/asm-m68knommu/: Spelling fixes include/asm-frv/: Spelling fixes fs/: Spelling fixes drivers/watchdog/: Spelling fixes drivers/video/: Spelling fixes drivers/ssb/: Spelling fixes drivers/serial/: Spelling fixes drivers/scsi/: Spelling fixes ...	2008-02-04 07:58:52 -08:00
Linus Torvalds	9853832c49	Merge branch 'locks' of git://linux-nfs.org/~bfields/linux * 'locks' of git://linux-nfs.org/~bfields/linux: pid-namespaces-vs-locks-interaction file locks: Use wait_event_interruptible_timeout() locks: clarify posix_locks_deadlock	2008-02-04 07:58:03 -08:00
Nick Piggin	2f98735c9c	vm audit: add VM_DONTEXPAND to mmap for drivers that need it Drivers that register a ->fault handler, but do not range-check the offset argument, must set VM_DONTEXPAND in the vm_flags in order to prevent an expanding mremap from overflowing the resource. I've audited the tree and attempted to fix these problems (usually by adding VM_DONTEXPAND where it is not obvious). Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-04 07:55:38 -08:00
Vitaliy Gusev	ab1f161165	pid-namespaces-vs-locks-interaction fcntl(F_GETLK,..) can return pid of process for not current pid namespace (if process is belonged to the several namespaces). It is true also for pids in /proc/locks. So correct behavior is saving pointer to the struct pid of the process lock owner. Signed-off-by: Vitaliy Gusev <vgusev@openvz.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2008-02-03 17:51:36 -05:00
Matthew Wilcox	4321e01e7d	file locks: Use wait_event_interruptible_timeout() interruptible_sleep_on_locked() is just an open-coded wait_event_interruptible_timeout(), with the one difference that interruptible_sleep_on_locked() doesn't bother to check the condition on which it is waiting, depending instead on the BKL to avoid the case where it blocks after the wakeup has already been called. locks_block_on_timeout() is only used in one place, so it's actually simpler to inline it into its caller. Signed-off-by: Matthew Wilcox <willy@linux.intel.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2008-02-03 17:51:36 -05:00
J. Bruce Fields	b533184fc3	locks: clarify posix_locks_deadlock For such a short function (with such a long comment), posix_locks_deadlock() seems to cause a lot of confusion. Attempt to make it a bit clearer: - Remove the initial posix_same_owner() check, which can never pass (since this is only called in the case that block_fl and caller_fl conflict) - Use an explicit loop (and a helper function) instead of a goto. - Rewrite the comment, attempting a clearer explanation, and removing some uninteresting historical detail. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2008-02-03 17:51:36 -05:00
Ohad Ben-Cohen	09c6dd3c9d	fs/binfmt_elf.c: spello fix s/litle/little Signed-off-by: Ohad Ben-Cohen <ohad@bencohen.org> Signed-off-by: Adrian Bunk <bunk@kernel.org>	2008-02-03 18:05:15 +02:00
Joe Perches	c78bad11fb	fs/: Spelling fixes Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Adrian Bunk <bunk@kernel.org>	2008-02-03 17:33:42 +02:00
Paulius Zaleckas	efad798b9f	Spelling fixes: lenght->length Signed-off-by: Paulius Zaleckas <pauliusz@yahoo.com> Signed-off-by: Adrian Bunk <bunk@kernel.org>	2008-02-03 15:42:53 +02:00
Robert P. J. Day	e1b8513d21	Typoes: "whith" -> "with" Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca> Signed-off-by: Adrian Bunk <bunk@kernel.org>	2008-02-03 15:14:02 +02:00
Robert P. J. Day	14e4a0f2bb	Fix a small number of "memeber" typoes. Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca> Signed-off-by: Adrian Bunk <bunk@kernel.org>	2008-02-03 15:12:15 +02:00
David Woodhouse	c1f3ee120b	Merge git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git	2008-02-03 18:30:32 +11:00
Linus Torvalds	63e9b66e29	Merge branch 'for-linus' of git://linux-nfs.org/~bfields/linux * 'for-linus' of git://linux-nfs.org/~bfields/linux: (100 commits) SUNRPC: RPC program information is stored in unsigned integers SUNRPC: Move exported symbol definitions after function declaration part 2 NLM: tear down RPC clients in nlm_shutdown_hosts SUNRPC: spin svc_rqst initialization to its own function nfsd: more careful input validation in nfsctl write methods lockd: minor log message fix knfsd: don't bother mapping putrootfh enoent to eperm rdma: makefile rdma: ONCRPC RDMA protocol marshalling rdma: SVCRDMA sendto rdma: SVCRDMA recvfrom rdma: SVCRDMA Core Transport Services rdma: SVCRDMA Transport Module rdma: SVCRMDA Header File svc: Add svc_xprt_names service to replace svc_sock_names knfsd: Support adding transports by writing portlist file svc: Add svc API that queries for a transport instance svc: Add /proc/sys/sunrpc/transport files svc: Add transport hdr size for defer/revisit svc: Move the xprt independent code to the svc_xprt.c file ...	2008-02-02 14:31:28 +11:00
Jeff Layton	d801b86168	NLM: tear down RPC clients in nlm_shutdown_hosts It's possible for a RPC to outlive the lockd daemon that created it, so we need to make sure that all RPC's are killed when lockd is coming down. When nlm_shutdown_hosts is called, kill off all RPC tasks associated with the host. Since we need to wait until they have all gone away, we might as well just shut down the RPC client altogether. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2008-02-01 16:42:15 -05:00
J. Bruce Fields	87d26ea777	nfsd: more careful input validation in nfsctl write methods Neil Brown points out that we're checking buf[size-1] in a couple places without first checking whether size is zero. Actually, given the implementation of simple_transaction_get(), buf[-1] is zero, so in both of these cases the subsequent check of the value of buf[size-1] will catch this case. But it seems fragile to depend on that, so add explicit checks for this case. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Acked-by: NeilBrown <neilb@suse.de>	2008-02-01 16:42:15 -05:00
J. Bruce Fields	50431d94e7	lockd: minor log message fix Wendy Cheng noticed that function name doesn't agree here. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Cc: Wendy Cheng <wcheng@redhat.com>	2008-02-01 16:42:15 -05:00
J. Bruce Fields	f7b8066f9f	knfsd: don't bother mapping putrootfh enoent to eperm Neither EPERM and ENOENT map to valid errors for PUTROOTFH according to rfc 3530, and, if anything, ENOENT is likely to be slightly more informative; so don't bother mapping ENOENT to EPERM. (Probably this was originally done because one likely cause was that there is an fsid=0 export but that it isn't permitted to this particular client. Now that we allow WRONGSEC returns, this is somewhat less likely.) In the long term we should work to make this situation less likely, perhaps by turning off nfsv4 service entirely in the absence of the pseudofs root, or constructing a pseudofilesystem root ourselves in the kernel as necessary. Thanks to Benny Halevy <bhalevy@panasas.com> for pointing out this problem. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Cc: Benny Halevy <bhalevy@panasas.com>	2008-02-01 16:42:15 -05:00
Tom Tucker	9571af18fa	svc: Add svc_xprt_names service to replace svc_sock_names Create a transport independent version of the svc_sock_names function. The toclose capability of the svc_sock_names service can be implemented using the svc_xprt_find and svc_xprt_close services. Signed-off-by: Tom Tucker <tom@opengridcomputing.com> Acked-by: Neil Brown <neilb@suse.de> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Greg Banks <gnb@sgi.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2008-02-01 16:42:14 -05:00
Tom Tucker	a217813f90	knfsd: Support adding transports by writing portlist file Update the write handler for the portlist file to allow creating new listening endpoints on a transport. The general form of the string is: <transport_name><space><port number> For example: echo "tcp 2049" > /proc/fs/nfsd/portlist This is intended to support the creation of a listening endpoint for RDMA transports without adding #ifdef code to the nfssvc.c file. Transports can also be removed as follows: '-'<transport_name><space><port number> For example: echo "-tcp 2049" > /proc/fs/nfsd/portlist Attempting to add a listener with an invalid transport string results in EPROTONOSUPPORT and a perror string of "Protocol not supported". Attempting to remove an non-existent listener (.e.g. bad proto or port) results in ENOTCONN and a perror string of "Transport endpoint is not connected" Signed-off-by: Tom Tucker <tom@opengridcomputing.com> Acked-by: Neil Brown <neilb@suse.de> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Greg Banks <gnb@sgi.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2008-02-01 16:42:13 -05:00
Tom Tucker	7fcb98d58c	svc: Add svc API that queries for a transport instance Add a new svc function that allows a service to query whether a transport instance has already been created. This is used in lockd to determine whether or not a transport needs to be created when a lockd instance is brought up. Specifying 0 for the address family or port is effectively a wild-card, and will result in matching the first transport in the service's list that has a matching class name. Signed-off-by: Tom Tucker <tom@opengridcomputing.com> Acked-by: Neil Brown <neilb@suse.de> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Greg Banks <gnb@sgi.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2008-02-01 16:42:13 -05:00
Tom Tucker	7a18208383	svc: Make close transport independent Move sk_list and sk_ready to svc_xprt. This involves close because these lists are walked by svcs when closing all their transports. So I combined the moving of these lists to svc_xprt with making close transport independent. The svc_force_sock_close has been changed to svc_close_all and takes a list as an argument. This removes some svc internals knowledge from the svcs. This code races with module removal and transport addition. Thanks to Simon Holm Thøgersen for a compile fix. Signed-off-by: Tom Tucker <tom@opengridcomputing.com> Acked-by: Neil Brown <neilb@suse.de> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Greg Banks <gnb@sgi.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Cc: Simon Holm Thøgersen <odie@cs.aau.dk>	2008-02-01 16:42:11 -05:00
Tom Tucker	d7c9f1ed97	svc: Change services to use new svc_create_xprt service Modify the various kernel RPC svcs to use the svc_create_xprt service. Signed-off-by: Tom Tucker <tom@opengridcomputing.com> Acked-by: Neil Brown <neilb@suse.de> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Greg Banks <gnb@sgi.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2008-02-01 16:42:09 -05:00
Oleg Drokin	54ca95eb36	Leak in nlmsvc_testlock for async GETFL case Fix nlm_block leak for the case of supplied blocking lock info. Signed-off-by: Oleg Drokin <green@linuxhacker.ru> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2008-02-01 16:42:07 -05:00
J. Bruce Fields	8838dc43d6	nfsd4: clean up access_valid, deny_valid checks. Document these checks a little better and inline, as suggested by Neil Brown (note both functions have two callers). Remove an obviously bogus check while we're there (checking whether unsigned value is negative). Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Cc: Neil Brown <neilb@suse.de>	2008-02-01 16:42:07 -05:00
J. Bruce Fields	5c002b3bb2	nfsd: allow root to set uid and gid on create The server silently ignores attempts to set the uid and gid on create. Based on the comment, this appears to have been done to prevent some overly-clever IRIX client from causing itself problems. Perhaps we should remove that hack completely. For now, at least, it makes sense to allow root (when no_root_squash is set) to set uid and gid. While we're there, since nfsd_create and nfsd_create_v3 share the same logic, pull that out into a separate function. And spell out the individual modifications of ia_valid instead of doing them both at once inside a conditional. Thanks to Roger Willcocks <roger@filmlight.ltd.uk> for the bug report and original patch on which this is based. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2008-02-01 16:42:07 -05:00
Oleg Drokin	29dbf54615	lockd: fix a leak in nlmsvc_testlock asynchronous request handling Without the patch, there is a leakage of nlmblock structure refcount that holds a reference nlmfile structure, that holds a reference to struct file, when async GETFL is used (-EINPROGRESS return from file_ops->lock()), and also in some error cases. Fix up a style nit while we're here. Signed-off-by: Oleg Drokin <green@linuxhacker.ru> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2008-02-01 16:42:06 -05:00
Frank Filz	406a7ea97d	nfsd: Allow AIX client to read dir containing mountpoints This patch addresses a compatibility issue with a Linux NFS server and AIX NFS client. I have exported /export as fsid=0 with sec=krb5:krb5i I have mount --bind /home onto /export/home I have exported /export/home with sec=krb5i The AIX client mounts / -o sec=krb5:krb5i onto /mnt If I do an ls /mnt, the AIX client gets a permission error. Looking at the network traceIwe see a READDIR looking for attributes FATTR4_RDATTR_ERROR and FATTR4_MOUNTED_ON_FILEID. The response gives a NFS4ERR_WRONGSEC which the AIX client is not expecting. Since the AIX client is only asking for an attribute that is an attribute of the parent file system (pseudo root in my example), it seems reasonable that there should not be an error. In discussing this issue with Bruce Fields, I initially proposed ignoring the error in nfsd4_encode_dirent_fattr() if all that was being asked for was FATTR4_RDATTR_ERROR and FATTR4_MOUNTED_ON_FILEID, however, Bruce suggested that we avoid calling cross_mnt() if only these attributes are requested. The following patch implements bypassing cross_mnt() if only FATTR4_RDATTR_ERROR and FATTR4_MOUNTED_ON_FILEID are called. Since there is some complexity in the code in nfsd4_encode_fattr(), I didn't want to duplicate code (and introduce a maintenance nightmare), so I added a parameter to nfsd4_encode_fattr() that indicates whether it should ignore cross mounts and simply fill in the attribute using the passed in dentry as opposed to it's parent. Signed-off-by: Frank Filz <ffilzlnx@us.ibm.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2008-02-01 16:42:06 -05:00
J. Bruce Fields	39325bd03f	nfsd4: fix bad seqid on lock request incompatible with open mode The failure to return a stateowner from nfs4_preprocess_seqid_op() means in the case where a lock request is of a type incompatible with an open (due to, e.g., an application attempting a write lock on a file open for read), means that fs/nfsd/nfs4xdr.c:ENCODE_SEQID_OP_TAIL() never bumps the seqid as it should. The client, attempting to close the file afterwards, then gets an (incorrect) bad sequence id error. Worse, this prevents the open file from ever being closed, so we leak state. Thanks to Benny Halevy and Trond Myklebust for analysis, and to Steven Wilton for the report and extensive data-gathering. Cc: Benny Halevy <bhalevy@panasas.com> Cc: Steven Wilton <steven.wilton@team.eftel.com.au> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2008-02-01 16:42:06 -05:00
Oleg Drokin	b7e6b86948	lockd: fix reference count leaks in async locking case In a number of places where we wish only to translate nlm_drop_reply to rpc_drop_reply errors we instead return early with rpc_drop_reply, skipping some important end-of-function cleanup. This results in reference count leaks when lockd is doing posix locking on GFS2. Signed-off-by: Oleg Drokin <green@linuxhacker.ru> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2008-02-01 16:42:06 -05:00
J. Bruce Fields	404ec117be	nfsd4: recognize callback channel failure earlier When the callback channel fails, we inform the client of that by returning a cb_path_down error the next time it tries to renew its lease. If we wait most of a lease period before deciding that a callback has failed and that the callback channel is down, then we decrease the chances that the client will find out in time to do anything about it. So, mark the channel down as soon as we recognize that an rpc has failed. However, continue trying to recall delegations anyway, in hopes it will come back up. This will prevent more delegations from being given out, and ensure cb_path_down is returned to renew calls earlier, while still making the best effort to deliver recalls of existing delegations. Also fix a couple comments and remove a dprink that doesn't seem likely to be useful. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2008-02-01 16:42:06 -05:00
J. Bruce Fields	35bba9a37e	nfsd4: miscellaneous nfs4state.c style fixes Fix various minor style violations. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2008-02-01 16:42:06 -05:00
J. Bruce Fields	5ec7b46c2f	nfsd4: make current_clientid local Declare this variable in the one function where it's used, and clean up some minor style problems. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2008-02-01 16:42:06 -05:00
J. Bruce Fields	99d965eda7	nfsd: fix encode_entryplus_baggage() indentation Fix bizarre indentation. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2008-02-01 16:42:06 -05:00
J. Bruce Fields	366e0c1d91	nfsd4: kill unneeded cl_confirm check We generate a unique cl_confirm for every new client; so if we've already checked that this cl_confirm agrees with the cl_confirm of unconf, then we already know that it does not agree with the cl_confirm of conf. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2008-02-01 16:42:06 -05:00
J. Bruce Fields	f3aba4e5a1	nfsd4: remove unnecessary cl_verifier check from setclientid_confirm Again, the only way conf and unconf can have the same clientid is if they were created in the "probable callback update" case of setclientid, in which case we already know that the cl_verifier fields must agree. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2008-02-01 16:42:05 -05:00
J. Bruce Fields	f394baad13	nfsd4: kill unnecessary same_name() in setclientid_confirm If conf and unconf are both found in the lookup by cl_clientid, then they share the same cl_clientid. We always create a unique new cl_clientid field when creating a new client--the only exception is the "probable callback update" case in setclientid, where we copy the old cl_clientid from another clientid with the same name. Therefore two clients with the same cl_client field also always share the same cl_name field, and a couple of the checks here are redundant. Thanks to Simon Holm Thøgersen for a compile fix. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Cc: Simon Holm Thøgersen <odie@cs.aau.dk>	2008-02-01 16:42:05 -05:00
J. Bruce Fields	deda2faa8e	nfsd: uniquify cl_confirm values Using a counter instead of the nanoseconds value seems more likely to produce a unique cl_confirm. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2008-02-01 16:42:05 -05:00
J. Bruce Fields	49ba87811f	nfsd: eliminate final bogus case from setclientid logic We're supposed to generate a different cl_confirm verifier for each new client, so these to cl_confirm values should never be the same. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2008-02-01 16:42:05 -05:00
J. Bruce Fields	a186e76747	nfsd4: kill some unneeded setclientid comments Most of these comments just summarize the code. The matching of code to the cases described in the RFC may still be useful, though; add specific section references to make that easier to follow. Also update references to the outdated RFC 3010. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2008-02-01 16:42:05 -05:00
J. Bruce Fields	1f69f172c7	nfsd: minor fs/nfsd/auth.h cleanup While we're here, let's remove the redundant (and now wrong) pathname in the comment, and the #ifdef __KERNEL__'s. Acked-by: NeilBrown <neilb@suse.de> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2008-02-01 16:42:05 -05:00
J. Bruce Fields	2e8138a274	nfsd: move nfsd/auth.h into fs/nfsd This header is used only in a few places in fs/nfsd, so there seems to be little point to having it in include/. (Thanks to Robert Day for pointing this out.) Cc: Robert P. J. Day <rpjday@crashcourse.ca> Acked-by: NeilBrown <neilb@suse.de> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2008-02-01 16:42:05 -05:00
J. Bruce Fields	dbf847ecb6	knfsd: allow cache_register to return error on failure Newer server features such as nfsv4 and gss depend on proc to work, so a failure to initialize the proc files they need should be treated as fatal. Thanks to Andrew Morton for style fix and compile fix in case where CONFIG_NFSD_V4 is undefined. Cc: Andrew Morton <akpm@linux-foundation.org> Acked-by: NeilBrown <neilb@suse.de> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2008-02-01 16:42:05 -05:00
J. Bruce Fields	e331f606a8	nfsd: fail init on /proc/fs/nfs/exports creation failure I assume the reason failure of creation was ignored here was just to continue support embedded systems that want nfsd but not proc. However, in cases where proc is supported it would be clearer to fail entirely than to come up with some features disabled. Acked-by: NeilBrown <neilb@suse.de> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2008-02-01 16:42:04 -05:00
J. Bruce Fields	440bcc5920	nfsd: select CONFIG_PROC_FS in nfsv4 and gss server cases The server depends on upcalls under /proc to support nfsv4 and gss. Acked-by: NeilBrown <neilb@suse.de> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2008-02-01 16:42:04 -05:00
J. Bruce Fields	df95a9d4fb	knfsd: cache unregistration needn't return error There's really nothing much the caller can do if cache unregistration fails. And indeed, all any caller does in this case is print an error and continue. So just return void and move the printk's inside cache_unregister. Acked-by: NeilBrown <neilb@suse.de> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2008-02-01 16:42:04 -05:00
J. Bruce Fields	d5c3428b2c	nfsd: fail module init on reply cache init failure If the reply cache initialization fails due to a kmalloc failure, currently we try to soldier on with a reduced (or nonexistant) reply cache. Better to just fail immediately: the failure is then much easier to understand and debug, and it could save us complexity in some later code. (But actually, it doesn't help currently because the cache is also turned off in some odd failure cases; we should probably find a better way to handle those failure cases some day.) Fix some minor style problems while we're at it, and rename nfsd_cache_init() to remove the need for a comment describing it. Acked-by: NeilBrown <neilb@suse.de> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2008-02-01 16:42:04 -05:00
J. Bruce Fields	26808d3f10	nfsd: cleanup nfsd module initialization cleanup Handle the failure case here with something closer to the standard kernel style. Doesn't really matter for now, but I'd like to add a few more failure cases, and then this'll help. Acked-by: NeilBrown <neilb@suse.de> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2008-02-01 16:42:03 -05:00
J. Bruce Fields	46b2589576	knfsd: cleanup nfsd4 properly on module init failure We forgot to shut down the nfs4 state and idmapping code in this case. Acked-by: NeilBrown <neilb@suse.de> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2008-02-01 16:42:03 -05:00
J. Bruce Fields	ca2a05aa7c	nfsd: Fix handling of negative lengths in read_buf() The length "nbytes" passed into read_buf should never be negative, but we check only for too-large values of "nbytes", not for too-small values. Make nbytes unsigned, so it's clear that the former tests are sufficient. (Despite this read_buf() currently correctly returns an xdr error in the case of a negative length, thanks to an unsigned comparison with size_of() and bounds-checking in kmalloc(). This seems very fragile, though.) Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2008-02-01 16:42:03 -05:00
Chuck Lever	a628f66758	NFSD: Fix mixed sign comparison in nfs3svc_decode_symlinkargs Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Acked-By: NeilBrown <neilb@suse.de> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2008-02-01 16:42:03 -05:00
Chuck Lever	9c7544d3a1	NFSD: Use unsigned length argument for decode_pathname Clean up: path name lengths are unsigned on the wire, negative lengths are not meaningful natively either. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Acked-By: NeilBrown <neilb@suse.de> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2008-02-01 16:42:03 -05:00
Chuck Lever	5a022fc870	NFSD: Adjust filename length argument of nfsd_lookup Clean up: adjust the sign of the length argument of nfsd_lookup and nfsd_lookup_dentry, for consistency with recent changes. NFSD version 4 callers already pass an unsigned file name length. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Acked-By: NeilBrown <neilb@suse.de> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2008-02-01 16:42:03 -05:00
Chuck Lever	ee1a95b3b3	NFSD: Use unsigned length argument for decode_filename Clean up: file name lengths are unsigned on the wire, negative lengths are not meaningful natively either. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Acked-By: NeilBrown <neilb@suse.de> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2008-02-01 16:42:02 -05:00
Chuck Lever	48df020aa1	NLM: Fix sign of length of NLM variable length strings According to The Open Group's NLM specification, NLM callers are variable length strings. XDR variable length strings use an unsigned 32 bit length. And internally, negative string lengths are not meaningful for the Linux NLM implementation. Clean up: Make nlm_lock.len and nlm_reboot.len unsigned integers. This makes the sign of NLM string lengths consistent with the sign of xdr_netobj lengths. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Acked-By: NeilBrown <neilb@suse.de> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2008-02-01 16:42:02 -05:00
J. Bruce Fields	d4395e03fe	knfsd: fix broken length check in nfs4idmap.c Obviously at some point we thought "error" represented the length when positive. This appears to be a long-standing typo. Thanks to Prasad Potluri <pvp@us.ibm.com> for finding the problem and proposing an earlier version of this patch. Cc: Steve French <smfltc@us.ibm.com> Cc: Prasad V Potluri <pvp@us.ibm.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2008-02-01 16:42:01 -05:00
Prasad P	aefa89d178	nfsd: Fix inconsistent assignment Dereferenced pointer "dentry" without checking and assigned to inode in the declaration. (We could just delete the NULL checks that follow instead, as we never get to the encode function in this particular case. But it takes a little detective work to verify that fact, so it's probably safer to leave the checks in place.) Cc: Steve French <smfltc@us.ibm.com> Signed-off-by: Prasad V Potluri <pvp@us.ibm.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2008-02-01 16:42:01 -05:00
J. Bruce Fields	63c86716ea	nfsd: move callback rpc_client creation into separate thread The whole reason to move this callback-channel probe into a separate thread was because (for now) we don't have an easy way to create the rpc_client asynchronously. But I forgot to move the rpc_create() to the spawned thread. Doh! Fix that. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2008-02-01 16:42:01 -05:00
J. Bruce Fields	46f8a64bae	nfsd4: probe callback channel only once Our callback code doesn't actually handle concurrent attempts to probe the callback channel. Some rethinking of the locking may be required. However, we can also just move the callback probing to this case. Since this is the only time a client is "confirmed" (and since that can only happen once in the lifetime of a client), this ensures we only probe once. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2008-02-01 16:42:01 -05:00
Al Viro	0c11b9428f	[PATCH] switch audit_get_loginuid() to task_struct * all callers pass something->audit_context Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-02-01 14:04:59 -05:00
Jens Axboe	8084870854	splice: always updated atime in direct splice Andre Majorel <aym-xunil@teaser.fr> points out that if we only updated the atime when we transfer some data, we deviate from the standard of always updating the atime. So change splice to always call file_accessed() even if splice_direct_to_actor() didn't transfer any data. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-02-01 09:26:32 +01:00
Linus Torvalds	75659ca0c1	Merge branch 'task_killable' of git://git.kernel.org/pub/scm/linux/kernel/git/willy/misc * 'task_killable' of git://git.kernel.org/pub/scm/linux/kernel/git/willy/misc: (22 commits) Remove commented-out code copied from NFS NFS: Switch from intr mount option to TASK_KILLABLE Add wait_for_completion_killable Add wait_event_killable Add schedule_timeout_killable Use mutex_lock_killable in vfs_readdir Add mutex_lock_killable Use lock_page_killable Add lock_page_killable Add fatal_signal_pending Add TASK_WAKEKILL exit: Use task_is_* signal: Use task_is_* sched: Use task_contributes_to_load, TASK_ALL and TASK_NORMAL ptrace: Use task_is_* power: Use task_is_* wait: Use TASK_NORMAL proc/base.c: Use task_is_* proc/array.c: Use TASK_REPORT perfmon: Use task_is_* ... Fixed up conflicts in NFS/sunrpc manually..	2008-02-01 11:45:47 +11:00
Linus Torvalds	f389e9fcec	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/dlm * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/dlm: (21 commits) dlm: static initialization improvements dlm: clean ups dlm: Sanity check namelen before copying it dlm: keep cached master rsbs during recovery dlm: change error message to debug dlm: fix possible use-after-free dlm: limit dir lookup loop dlm: reject normal unlock when lock is waiting for lookup dlm: validate messages before processing dlm: reject messages from non-members dlm: another call to confirm_master in receive_request_reply dlm: recover locks waiting for overlap replies dlm: clear ast_type when removing from astqueue dlm: use fixed errno values in messages dlm: swap bytes for rcom lock reply dlm: align midcomms message buffer dlm: close othercons dlm: use dlm prefix on alloc and free functions dlm: don't print common non-errors dlm: proper prototypes ...	2008-01-31 09:29:31 +11:00
Denis Cheng	0fe410d3f3	dlm: static initialization improvements also change name_prefix from char pointer to char array. Signed-off-by: Denis Cheng <crquan@gmail.com> Signed-off-by: David Teigland <teigland@redhat.com>	2008-01-30 11:04:43 -06:00
David Teigland	dbcfc34733	dlm: clean ups A couple small clean-ups. Remove unnecessary wrapper-functions in rcom.c, and remove unnecessary casting and an unnecessary ASSERT in util.c. Signed-off-by: David Teigland <teigland@redhat.com>	2008-01-30 11:04:43 -06:00
Patrick Caulfeld	2a79289e87	dlm: Sanity check namelen before copying it The 32/64 compatibility code in the DLM does not check the validity of the lock name length passed into it, so it can easily overwrite memory if the value is rubbish (as early versions of libdlm can cause with unlock calls, it doesn't zero the field). This patch restricts the length of the name to the amount of data actually passed into the call. Signed-off-by: Patrick Caulfield <pcaulfie@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2008-01-30 11:04:43 -06:00
David Teigland	85f0379aa0	dlm: keep cached master rsbs during recovery To prevent the master of an rsb from changing rapidly, an unused rsb is kept on the "toss list" for a period of time to be reused. The toss list was being cleared completely for each recovery, which is unnecessary. Much of the benefit of the toss list can be maintained if nodes keep rsb's in their toss list that they are the master of. These rsb's need to be included when the resource directory is rebuilt during recovery. Signed-off-by: David Teigland <teigland@redhat.com>	2008-01-30 11:04:43 -06:00
David Teigland	594199ebaa	dlm: change error message to debug The invalid lockspace messages are normal and can appear relatively often. They should be suppressed without debugging enabled. Signed-off-by: David Teigland <teigland@redhat.com>	2008-01-30 11:04:43 -06:00
David Teigland	ce5246b972	dlm: fix possible use-after-free The dlm_put_lkb() can free the lkb and its associated ua structure, so we can't depend on using the ua struct after the put. Signed-off-by: David Teigland <teigland@redhat.com>	2008-01-30 11:04:42 -06:00
David Teigland	755b5eb8ba	dlm: limit dir lookup loop In a rare case we may need to repeat a local resource directory lookup due to a race with removing the rsb and removing the resdir record. We'll never need to do more than a single additional lookup, though, so the infinite loop around the lookup can be removed. In addition to being unnecessary, the infinite loop is dangerous since some other unknown condition may appear causing the loop to never break. Signed-off-by: David Teigland <teigland@redhat.com>	2008-01-30 11:04:42 -06:00
David Teigland	42dc1601a9	dlm: reject normal unlock when lock is waiting for lookup Non-forced unlocks should be rejected if the lock is waiting on the rsb_lookup list for another lock to establish the master node. Signed-off-by: David Teigland <teigland@redhat.com>	2008-01-30 11:04:42 -06:00
David Teigland	c54e04b00f	dlm: validate messages before processing There was some hit and miss validation of messages that has now been cleaned up and unified. Before processing a message, the new validate_message() function checks that the lkb is the appropriate type, process-copy or master-copy, and that the message is from the correct nodeid for the the given lkb. Other checks and assertions on the lkb type and nodeid have been removed. The assertions were particularly bad since they would panic the machine instead of just ignoring the bad message. Although other recent patches have made processing old message unlikely, it still may be possible for an old message to be processed and caught by these checks. Signed-off-by: David Teigland <teigland@redhat.com>	2008-01-30 11:04:42 -06:00
David Teigland	46b43eed70	dlm: reject messages from non-members Messages from nodes that are no longer members of the lockspace should be ignored. When nodes are removed from the lockspace, recovery can sometimes complete quickly enough that messages arrive from a removed node after recovery has completed. When processed, these messages would often cause an error message, and could in some cases change some state, causing problems. Signed-off-by: David Teigland <teigland@redhat.com>	2008-01-30 11:04:42 -06:00
David Teigland	aec64e1be2	dlm: another call to confirm_master in receive_request_reply When a failed request (EBADR or ENOTBLK) is unlocked/canceled instead of retried, there may be other lkb's waiting on the rsb_lookup list for it to complete. A call to confirm_master() is needed to move on to the next waiting lkb since the current one won't be retried. Signed-off-by: David Teigland <teigland@redhat.com>	2008-01-30 11:04:42 -06:00
David Teigland	601342ce02	dlm: recover locks waiting for overlap replies When recovery looks at locks waiting for replies, it fails to consider locks that have already received a reply for their first remote operation, but not received a reply for secondary, overlapping unlock/cancel. The appropriate stub reply needs to be called for these waiters. Appears when we start doing recovery in the presence of a many overlapping unlock/cancel ops. Signed-off-by: David Teigland <teigland@redhat.com>	2008-01-30 11:04:42 -06:00
David Teigland	8a358ca8e7	dlm: clear ast_type when removing from astqueue The lkb_ast_type field indicates whether the lkb is on the astqueue list. When clearing locks for a process, lkb's were being removed from the astqueue list without clearing the field. If release_lockspace then happened immediately afterward, it could try to remove the lkb from the list a second time. Appears when process calls libdlm dlm_release_lockspace() which first closes the ls dev triggering clear_proc_locks, and then removes the ls (a write to control dev) causing release_lockspace(). Signed-off-by: David Teigland <teigland@redhat.com>	2008-01-30 11:04:42 -06:00
David Teigland	861e2369e9	dlm: use fixed errno values in messages Some errno values differ across platforms. So if we return things like -EINPROGRESS from one node it can get misinterpreted or rejected on another one. This patch fixes up the errno values passed on the wire so that they match the x86 ones (so as not to break the protocol), and re-instates the platform-specific ones at the other end. Many thanks to Fabio for testing this patch. Initial patch from Patrick. Signed-off-by: Patrick Caulfield <pcaulfie@redhat.com> Signed-off-by: Fabio M. Di Nitto <fabbione@ubuntu.com> Signed-off-by: David Teigland <teigland@redhat.com>	2008-01-30 11:04:42 -06:00
Fabio M. Di Nitto	550283e30c	dlm: swap bytes for rcom lock reply DLM_RCOM_LOCK_REPLY messages need byte swapping. Signed-off-by: Fabio M. Di Nitto <fabbione@ubuntu.com> Signed-off-by: David Teigland <teigland@redhat.com>	2008-01-30 11:04:41 -06:00

... 2 3 4 5 6 ...

7927 Commits