linux

Commit Graph

Author	SHA1	Message	Date
Eric Sandeen	256bdb497c	ext4: allocate struct ext4_allocation_context from a kmem cache struct ext4_allocation_context is rather large, and this bloats the stack of many functions which use it. Allocating it from a named slab cache will alleviate this. For example, with this change (on top of the noinline patch sent earlier): -ext4_mb_new_blocks 200 +ext4_mb_new_blocks 40 -ext4_mb_free_blocks 344 +ext4_mb_free_blocks 168 -ext4_mb_release_inode_pa 216 +ext4_mb_release_inode_pa 40 -ext4_mb_release_group_pa 192 +ext4_mb_release_group_pa 24 Most of these stack-allocated structs are actually used only for mballoc history; and in those cases often a smaller struct would do. So changing that may be another way around it, at least for those functions, if preferred. For now, in those cases where the ac is only for history, an allocation failure simply skips the history recording, and does not cause any other failures. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2008-02-10 01:13:33 -05:00
Jan Kara	7fb5409df0	ext4: Fix Direct I/O locking We cannot start transaction in ext4_direct_IO() and just let it last during the whole write because dio_get_page() acquires mmap_sem which ranks above transaction start (e.g. because we have dependency chain mmap_sem->PageLock->journal_start, or because we update atime while holding mmap_sem) and thus deadlocks could happen. We solve the problem by starting a transaction separately for each ext4_get_block() call. We could have a problem that we allocate a block and before its data are written out the machine crashes and thus we expose stale data. But that does not happen because for hole-filling generic code falls back to buffered writes and for file extension, we add inode to orphan list and thus in case of crash, journal replay will truncate inode back to the original size. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2008-02-10 01:08:38 -05:00
Aneesh Kumar K.V	8009f9fb30	ext4: Fix circular locking dependency with migrate and rm. In order to prevent a circular locking dependency when an unlink operation is racing with an ext4 migration, we delay taking i_data_sem until just before switch the inode format, and use i_mutex to prevent writes and truncates during the first part of the migration operation. Acked-by: Jan Kara <jack@suse.cz> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2008-02-10 01:20:05 -05:00
Eric Sandeen	0040d9875d	allow in-inode EAs on ext4 root inode The ext3 root inode was treated specially with respect to in-inode extended attributes, for reasons detailed in the removed comment below. The first mkfs-created inodes would not get extra_i_size or the EXT3_STATE_XATTR flag set in ext3_read_inode, which disallowed reading or setting in-inode EAs on the root. However, in ext4, ext4_mark_inode_dirty calls ext4_expand_extra_isize for all inodes; once this is done EAs may be placed in the root ext4 inode body. But for reasons above, it won't be found after a reboot. testcase: setfattr -n user.name -v value mntpt/ setfattr -n user.name2 -v value2 mntpt/ umount mntpt/; remount mntpt/ getfattr -d mntpt/ name2/value2 has gone missing; debugfs shows it in the inode body, but it is not found there by getattr. The following fixes it up; newer mkfs appears to properly zero the inodes, so this workaround isn't needed for ext4. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2008-02-05 22:36:43 -05:00
Aneesh Kumar K.V	42a10add85	ext4: Fix null bh pointer dereference in mballoc Repoted by Adrian Bunk <bunk@kernel.org>: The Coverity checker spotted the following NULL dereference: static int ext4_mb_mark_diskspace_used { ... if (!bitmap_bh) goto out_err; ... out_err: sb->s_dirt = 1; put_bh(bitmap_bh); ... Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com>	2008-02-10 01:07:28 -05:00
Valerie Clement	b8356c465b	ext4: Don't set EXTENTS_FL flag for fast symlinks For fast symbolic links, the file content is stored in the i_block[] array, which is not compatible with the new file extents format. e2fsck reports error on such files because EXTENTS_FL is set. Don't set the EXTENTS_FL flag when creating fast symlinks. In the case of file migration, skip fast symbolic links. Signed-off-by: Valerie Clement <valerie.clement@bull.net> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2008-02-05 10:56:37 -05:00
David Howells	1d1fe1ee02	iget: stop EXT4 from using iget() and read_inode() Stop the EXT4 filesystem from using iget() and read_inode(). Replace ext4_read_inode() with ext4_iget(), and call that instead of iget(). ext4_iget() then uses iget_locked() directly and returns a proper error code instead of an inode in the event of an error. ext4_fill_super() returns any error incurred when getting the root inode instead of EINVAL. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: "Theodore Ts'o" <tytso@mit.edu> Acked-by: Jan Kara <jack@suse.cz> Cc: <linux-ext4@vger.kernel.org> Acked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-07 08:42:27 -08:00
Akinobu Mita	859cb93679	ext[234]: cleanup ext[234]_bg_num_gdb() Use ext[234]_bg_has_super() to remove duplicate code. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-06 10:41:21 -08:00
Akinobu Mita	fb01bfdac7	ext[234]: remove unused argument for ext[234]_find_goal() The argument chain for ext[234]_find_goal() is not used. This patch removes it and fixes comment as well. Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-06 10:41:21 -08:00
Akinobu Mita	197cd65acc	ext[234]: use ext[234]_get_group_desc() Use ext[234]_get_group_desc() to get group descriptor from group number. Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-06 10:41:21 -08:00
Akinobu Mita	144704e522	ext[234]: fix comment for nonexistent variable The comment in ext[234]_new_blocks() describes about "i". But there is no local variable called "i" in that scope. I guess it has been renamed to group_no. Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-06 10:41:21 -08:00
Christoph Lameter	eebd2aa355	Pagecache zeroing: zero_user_segment, zero_user_segments and zero_user Simplify page cache zeroing of segments of pages through 3 functions zero_user_segments(page, start1, end1, start2, end2) Zeros two segments of the page. It takes the position where to start and end the zeroing which avoids length calculations and makes code clearer. zero_user_segment(page, start, end) Same for a single segment. zero_user(page, start, length) Length variant for the case where we know the length. We remove the zero_user_page macro. Issues: 1. Its a macro. Inline functions are preferable. 2. The KM_USER0 macro is only defined for HIGHMEM. Having to treat this special case everywhere makes the code needlessly complex. The parameter for zeroing is always KM_USER0 except in one single case that we open code. Avoiding KM_USER0 makes a lot of code not having to be dealing with the special casing for HIGHMEM anymore. Dealing with kmap is only necessary for HIGHMEM configurations. In those configurations we use KM_USER0 like we do for a series of other functions defined in highmem.h. Since KM_USER0 is depends on HIGHMEM the existing zero_user_page function could not be a macro. zero_user_* functions introduced here can be be inline because that constant is not used when these functions are called. Also extract the flushing of the caches to be outside of the kmap. [akpm@linux-foundation.org: fix nfs and ntfs build] [akpm@linux-foundation.org: fix ntfs build some more] Signed-off-by: Christoph Lameter <clameter@sgi.com> Cc: Steven French <sfrench@us.ibm.com> Cc: Michael Halcrow <mhalcrow@us.ibm.com> Cc: <linux-ext4@vger.kernel.org> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Anton Altaparmakov <aia21@cantab.net> Cc: Mark Fasheh <mark.fasheh@oracle.com> Cc: David Chinner <dgc@sgi.com> Cc: Michael Halcrow <mhalcrow@us.ibm.com> Cc: Steven French <sfrench@us.ibm.com> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-05 09:44:13 -08:00
Aneesh Kumar K.V	b939e3766e	ext4: Use the ext4_ext_actual_len() helper function ext4 uses the high bit of the extent length to encode whether the extent is intialized or not. The helper function ext4_ext_get_actual_len should be used to get the actual length of the extent. This addresses the kernel bug documented here: http://bugzilla.kernel.org/show_bug.cgi?id=9732 kernel BUG at fs/ext4/extents.c:1056! .... Call Trace: [<ffffffff88366073>] :ext4dev:ext4_ext_get_blocks+0x5ba/0x8c1 [<ffffffff81053c91>] lock_release_holdtime+0x27/0x49 [<ffffffff812748f6>] _spin_unlock+0x17/0x20 [<ffffffff883400a6>] :jbd2:start_this_handle+0x4e0/0x4fe [<ffffffff88366564>] :ext4dev:ext4_fallocate+0x175/0x39a [<ffffffff81053c91>] lock_release_holdtime+0x27/0x49 [<ffffffff81056480>] __lock_acquire+0x4e7/0xc4d [<ffffffff81053c91>] lock_release_holdtime+0x27/0x49 [<ffffffff810a8de7>] sys_fallocate+0xe4/0x10d [<ffffffff8100c043>] tracesys+0xd5/0xda Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2008-01-28 23:58:27 -05:00
Dmitry Monakhov	dbf9d7da33	ext4: fix uniniatilized extent splitting error Fix bug reported by Dmitry Monakhov caused by lost error code Testcase: blksize = 0x1000; fd = open(argv[1], O_RDWR\|O_CREAT, 0700); unsigned long long sz = 0x10000000UL; /* allocating big blocks chunk / syscall(__NR_fallocate, fd, 0, 0UL, sz) / grab all other available filesystem space / tfd = open("tmp", O_RDWR\|O_CREAT\|O_DIRECT, 0700); while( write(tfd, buf, 4096) > 0); / loop untill ENOSPC / fsync(fd); / just in case / while (pos < sz) { / each seek+ write operation result in splits uninitialized extent in three extents. Splitting may result in new extent allocation which probably will fail because of ENOSPC/ lseek(fd, blksize2 -1, SEEK_CUR); if ((ret = write(fd, 'a', 1)) != 1) exit(1); pos += blksize * 2; } Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2008-01-28 23:58:27 -05:00
Aneesh Kumar K.V	ce40733ce9	ext4: Check for return value from sb_set_blocksize sb_set_blocksize validates whether the specfied block size can be used by the file system. Make sure we fail mounting the file system if the blocksize specfied cannot be used. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com>	2008-01-28 23:58:27 -05:00
Miklos Szeredi	cb45bbe44b	ext4: Add stripe= option to /proc/mounts Add stripe= option to /proc/mounts for ext4 filesystems. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Mingming Cao <cmm@us.ibm.com>	2008-01-28 23:58:27 -05:00
Aneesh Kumar K.V	3dbd0ede4d	ext4: Enable the multiblock allocator by default Enable the multiblock allocator by default. Fix ext4_show_options() so if it is not enabled, the nomballoc option included in /proc/mounts. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Acked-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2008-01-28 23:58:26 -05:00
Alex Tomas	c9de560ded	ext4: Add multi block allocator for ext4 Signed-off-by: Alex Tomas <alex@clusterfs.com> Signed-off-by: Andreas Dilger <adilger@clusterfs.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2008-01-29 00:19:52 -05:00
Alex Tomas	1988b51e47	ext4: Add new functions for searching extent tree Add the functions ext4_ext_search_left() and ext4_ext_search_right(), which are used by mballoc during ext4_ext_get_blocks to decided whether to merge extent information. Signed-off-by: Alex Tomas <alex@clusterfs.com> Signed-off-by: Andreas Dilger <adilger@clusterfs.com> Signed-off-by: Johann Lombardi <johann@clusterfs.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2008-01-28 23:58:27 -05:00
Eric Sandeen	c549a95d40	ext4: fix up EXT4FS_DEBUG builds Builds with EXT4FS_DEBUG defined (to enable ext4_debug()) fail without these changes. Clean up some format warnings too. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com>	2008-01-28 23:58:27 -05:00
Aneesh Kumar K.V	aa22df2cc8	ext4: Fix ext4_show_options to show the correct mount options. We need to look at the default value and make sure the mount options are not set via default value before showing them via ext4_show_options Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>	2008-01-28 23:58:27 -05:00
Aneesh Kumar K.V	c14c6fd5c5	ext4: Add EXT4_IOC_MIGRATE ioctl The below patch add ioctl for migrating ext3 indirect block mapped inode to ext4 extent mapped inode. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>	2008-01-28 23:58:26 -05:00
Jean Noel Cordenner	25ec56b518	ext4: Add inode version support in ext4 This patch adds 64-bit inode version support to ext4. The lower 32 bits are stored in the osd1.linux1.l_i_version field while the high 32 bits are stored in the i_version_hi field newly created in the ext4_inode. This field is incremented in case the ext4_inode is large enough. A i_version mount option has been added to enable the feature. Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: Andreas Dilger <adilger@clusterfs.com> Signed-off-by: Kalpak Shah <kalpak@clusterfs.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Jean Noel Cordenner <jean-noel.cordenner@bull.net>	2008-01-28 23:58:27 -05:00
Girish Shilamkar	818d276ceb	ext4: Add the journal checksum feature The journal checksum feature adds two new flags i.e JBD2_FEATURE_INCOMPAT_ASYNC_COMMIT and JBD2_FEATURE_COMPAT_CHECKSUM. JBD2_FEATURE_CHECKSUM flag indicates that the commit block contains the checksum for the blocks described by the descriptor blocks. Due to checksums, writing of the commit record no longer needs to be synchronous. Now commit record can be sent to disk without waiting for descriptor blocks to be written to disk. This behavior is controlled using JBD2_FEATURE_ASYNC_COMMIT flag. Older kernels/e2fsck should not be able to recover the journal with _ASYNC_COMMIT hence it is made incompat. The commit header has been extended to hold the checksum along with the type of the checksum. For recovery in pass scan checksums are verified to ensure the sanity and completeness(in case of _ASYNC_COMMIT) of every transaction. Signed-off-by: Andreas Dilger <adilger@clusterfs.com> Signed-off-by: Girish Shilamkar <girish@clusterfs.com> Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com>	2008-01-28 23:58:27 -05:00
Aneesh Kumar K.V	4df3d265bf	ext4: Take read lock during overwrite case. When we are overwriting a file and not actually allocating new file system blocks we need to take only the read lock on i_data_sem. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>	2008-01-28 23:58:29 -05:00
Aneesh Kumar K.V	0e855ac8b1	ext4: Convert truncate_mutex to read write semaphore. We are currently taking the truncate_mutex for every read. This would have performance impact on large CPU configuration. Convert the lock to read write semaphore and take read lock when we are trying to read the file. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>	2008-01-28 23:58:26 -05:00
Aneesh Kumar K.V	c278bfeceb	ext4: Make ext4_get_blocks_wrap take the truncate_mutex early. When doing a migrate from ext3 to ext4 inode we need to make sure the test for inode type and walking inode data happens inside lock. To make this happen move truncate_mutex early before checking the i_flags. This actually should enable us to remove the verify_chain(). Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>	2008-01-28 23:58:27 -05:00
Mariusz Kozlowski	01f4adc044	ext4: remove unused code from ext4_find_entry() The unused code found in ext3_find_entry() is also present (and still unused) in the ext4_find_entry() code. This patch removes it. Signed-off-by: Mariusz Kozlowski <m.kozlowski@tuxland.pl> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2008-01-28 23:58:27 -05:00
Aneesh Kumar K.V	221879c927	ext4: Check for the correct error return from ext4_ext_get_blocks returns negative values on error. We should check for <= 0 Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>	2008-01-28 23:58:27 -05:00
Aneesh Kumar K.V	abcb2947c9	ext4: add block bitmap validation When a new block bitmap is read from disk in read_block_bitmap() there are a few bits that should ALWAYS be set. In particular, the blocks given corresponding to block bitmap, inode bitmap and inode tables. Validate the block bitmap against these blocks. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>	2008-01-28 23:58:27 -05:00
Aneesh Kumar K.V	bb4f397a1a	ext4: Change the default behaviour on error ext4 file system was by default ignoring errors and continuing. This is not a good default as continuing on error could lead to file system corruption. Change the default to mark the file system readonly. Debian and ubuntu already does this as the default in their fstab. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Acked-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com>	2008-01-28 23:58:26 -05:00
Eric Sandeen	e7c9559300	ext4: fix oops on corrupted ext4 mount When mounting an ext4 filesystem with corrupted s_first_data_block, things can go very wrong and oops. Because blocks_count in ext4_fill_super is a u64, and we must use do_div, the calculation of db_count is done differently than on ext4. If first_data_block is corrupted such that it is larger than ext4_blocks_count, for example, then the intermediate blocks_count value may go negative, but sign-extend to a very large value: blocks_count = (ext4_blocks_count(es) - le32_to_cpu(es->s_first_data_block) + EXT4_BLOCKS_PER_GROUP(sb) - 1); This is then assigned to s_groups_count which is an unsigned long: sbi->s_groups_count = blocks_count; This may result in a value of 0xFFFFFFFF which is then used to compute db_count: db_count = (sbi->s_groups_count + EXT4_DESC_PER_BLOCK(sb) - 1) / EXT4_DESC_PER_BLOCK(sb); and in this case db_count will wind up as 0 because the addition overflows 32 bits. This in turn causes the kmalloc for group_desc to be of 0 size: sbi->s_group_desc = kmalloc(db_count * sizeof (struct buffer_head *), GFP_KERNEL); and eventually in ext4_check_descriptors, dereferencing sbi->s_group_desc[desc_block] will result in a NULL pointer dereference. The simplest test seems to be to sanity check s_first_data_block, EXT4_BLOCKS_PER_GROUP, and ext4_blocks_count values to be sure their combination won't result in a bad intermediate value for blocks_count. We could just check for db_count == 0, but catching it at the root cause seems like it provides more info. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Mingming Cao <cmm@us.ibm.com>	2008-01-28 23:58:27 -05:00
Adrian Bunk	07620f69ef	ext4/super.c: fix #ifdef's (CONFIG_EXT4_* -> CONFIG_EXT4DEV_*) Based on a report by Robert P. J. Day. Signed-off-by: Adrian Bunk <bunk@kernel.org>	2008-01-28 23:58:27 -05:00
Aneesh Kumar K.V	cb47dce791	ext4: Return after ext4_error in case of failures This fix some instances where we were continuing after calling ext4_error. ext4_error call panic only if errors=panic mount option is set. So we need to make sure we return correctly after ext4_error call Reported by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>	2008-01-28 23:58:27 -05:00
Eric Sandeen	e2b4657453	ext4: store maxbytes for bitmapped files and return EFBIG as appropriate Calculate & store the max offset for bitmapped files, and catch too-large seeks, truncates, and writes in ext4, shortening or rejecting as appropriate. Signed-off-by: Eric Sandeen <sandeen@redhat.com>	2008-01-28 23:58:27 -05:00
Eric Sandeen	cd2291a463	ext4: different maxbytes functions for bitmap & extent files use 2 different maxbytes functions for bitmapped & extent-based files. Signed-off-by: Eric Sandeen <sandeen@redhat.com>	2008-01-28 23:58:27 -05:00
Aneesh Kumar K.V	8180a5627d	ext4: Support large files This patch converts ext4_inode i_blocks to represent total blocks occupied by the inode in file system block size. Earlier the variable used to represent this in 512 byte block size. This actually limited the total size of the file. The feature is enabled transparently when we write an inode whose i_blocks cannot be represnted as 512 byte units in a 48 bit variable. inode flag EXT4_HUGE_FILE_FL Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>	2008-01-28 23:58:27 -05:00
Aneesh Kumar K.V	0fc1b45147	ext4: Add support for 48 bit inode i_blocks. Use the __le16 l_i_reserved1 field of the linux2 struct of ext4_inode to represet the higher 16 bits for i_blocks. With this change max_file size becomes (2*48 -1 ) 512 bytes. We add a RO_COMPAT feature to the super block to indicate that inode have i_blocks represented as a split 48 bits. Super block with this feature set cannot be mounted read write on a kernel with CONFIG_LSF disabled. Super block flag EXT4_FEATURE_RO_COMPAT_HUGE_FILE Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>	2008-01-28 23:58:26 -05:00
Aneesh Kumar K.V	a48380f769	ext4: Rename i_dir_acl to i_size_high Rename ext4_inode.i_dir_acl to i_size_high drop ext4_inode_info.i_dir_acl as it is not used Rename ext4_inode.i_size to ext4_inode.i_size_lo Add helper function for accessing the ext4_inode combined i_size. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>	2008-01-28 23:58:27 -05:00
Aneesh Kumar K.V	7973c0c19e	ext4: Rename i_file_acl to i_file_acl_lo Rename i_file_acl to i_file_acl_lo. This helps in finding bugs where we use i_file_acl instead of the combined i_file_acl_lo and i_file_acl_high Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>	2008-01-28 23:58:27 -05:00
Aneesh Kumar K.V	1d03ec984c	ext4: Fix sparse warnings. Fix sparse warnings related to static functions and local variables. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>	2008-01-28 23:58:27 -05:00
Aneesh Kumar K.V	99e6f829a8	ext4: Introduce ext4_update__feature Introduce ext4_update__feature and use them instead of opencoding. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>	2008-01-28 23:58:27 -05:00
Avantika Mathur	2aa9fc4c40	ext4: fixes block group number being set to a negative value This patch fixes various places where the group number is set to a negative value. Signed-off-by: Avantika Mathur <mathur@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2008-01-28 23:58:27 -05:00
Avantika Mathur	fd2d42912f	ext4: add ext4_group_t, and change all group variables to this type. In many places variables for block group are of type int, which limits the maximum number of block groups to 2^31. Each block group can have up to 2^15 blocks, with a 4K block size, and the max filesystem size is limited to 2^31 * (2^15 * 2^12) = 2^58 -- or 256 PB This patch introduces a new type ext4_group_t, of type unsigned long, to represent block group numbers in ext4. All occurrences of block group variables are converted to type ext4_group_t. Signed-off-by: Avantika Mathur <mathur@us.ibm.com>	2008-01-28 23:58:27 -05:00
Eric Sandeen	bba907433b	ext4 extents: remove unneeded casts There are many casts in extents.c which are not needed, as the variables are already the type of the cast, or are being promoted for no particular reason in printk's. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com>	2008-01-28 23:58:27 -05:00
Aneesh Kumar K.V	725d26d3f0	ext4: Introduce ext4_lblk_t This patch adds a new data type ext4_lblk_t to represent the logical file blocks. This is the preparatory patch to support large files in ext4 The follow up patch with convert the ext4_inode i_blocks to represent the number of blocks in file system block size. This changes makes it possible to have a block number 2**32 -1 which will result in overflow if the block number is represented by signed long. This patch convert all the block number to type ext4_lblk_t which is typedef to __u32 Also remove dead code ext4_ext_walk_space Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: Eric Sandeen <sandeen@redhat.com>	2008-01-28 23:58:27 -05:00
Jan Kara	a72d7f834e	ext4: Avoid rec_len overflow with 64KB block size With 64KB blocksize, a directory entry can have size 64KB which does not fit into 16 bits we have for entry lenght. So we store 0xffff instead and convert value when read from / written to disk. The patch also converts some places to use ext4_next_entry() when we are changing them anyway. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Mingming Cao <cmm@us.ibm.com>	2008-01-28 23:58:27 -05:00
Takashi Sato	afc7cbca5b	ext4: Support large blocksize up to PAGESIZE This patch set supports large block size(>4k, <=64k) in ext4, just enlarging the block size limit. But it is NOT possible to have 64kB blocksize on ext4 without some changes to the directory handling code. The reason is that an empty 64kB directory block would have a rec_len == (__u16)2^16 == 0, and this would cause an error to be hit in the filesystem. The proposed solution is treat 64k rec_len with a an impossible value like rec_len = 0xffff to handle this. The Patch-set consists of the following 2 patches. [1/2] ext4: enlarge blocksize - Allow blocksize up to pagesize [2/2] ext4: fix rec_len overflow - prevent rec_len from overflow with 64KB blocksize Now on 64k page ppc64 box runs with this patch set we could create a 64k block size ext4dev, and able to handle empty directory block. Signed-off-by: Takashi Sato <sho@tnes.nec.co.jp> Signed-off-by: Mingming Cao <cmm@us.ibm.com>	2008-01-28 23:58:27 -05:00
Andries E. Brouwer	b47b6f38e5	ext3, ext4: avoid divide by zero As it turns out, the kernel divides by EXT3_INODES_PER_GROUP(s) when mounting an ext3 filesystem. If that number is zero, a crash follows. Below a patch. This crash was reported by Joeri de Ruiter, Carst Tankink and Pim Vullers. Cc: <linux-ext4@vger.kernel.org> Acked-by: Alan Cox <alan@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-12-17 19:28:16 -08:00
Jan Kara	e47776a0a4	Forbid user to change file flags on quota files Forbid user from changing file flags on quota files. User has no bussiness in playing with these flags when quota is on. Furthermore there is a remote possibility of deadlock due to a lock inversion between quota file's i_mutex and transaction's start (i_mutex for quota file is locked only when trasaction is started in quota operations) in ext3 and ext4. Signed-off-by: Jan Kara <jack@suse.cz> Cc: LIOU Payphone <lioupayphone@gmail.com> Cc: <linux-ext4@vger.kernel.org> Acked-by: Dave Kleikamp <shaggy@austin.ibm.com> Cc: <reiserfs-dev@namesys.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-11-14 18:45:38 -08:00
Linus Torvalds	0b832a4b93	Revert "ext2/ext3/ext4: add block bitmap validation" This reverts commit `7c9e69faa2`, fixing up conflicts in fs/ext4/balloc.c manually. The cost of doing the bitmap validation on each lookup - even when the bitmap is cached - is absolutely prohibitive. We could, and probably should, do it only when adding the bitmap to the buffer cache. However, right now we are better off just reverting it. Peter Zijlstra measured the cost of this extra validation as a 85% decrease in cached iozone, and while I had a patch that took it down to just 17% by not being _quite_ so stupid in the validation, it was still a big slowdown that could have been avoided by just doing it right. Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com> Cc: Andreas Dilger <adilger@clusterfs.com> Cc: Mingming Cao <cmm@us.ibm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-11-13 08:09:11 -08:00
Christoph Hellwig	3965516440	exportfs: make struct export_operations const Now that nfsd has stopped writing to the find_exported_dentry member we an mark the export_operations const Signed-off-by: Christoph Hellwig <hch@lst.de> Cc: Neil Brown <neilb@suse.de> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: <linux-ext4@vger.kernel.org> Cc: Dave Kleikamp <shaggy@austin.ibm.com> Cc: Anton Altaparmakov <aia21@cantab.net> Cc: David Chinner <dgc@sgi.com> Cc: Timothy Shimmin <tes@sgi.com> Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Cc: Hugh Dickins <hugh@veritas.com> Cc: Chris Mason <mason@suse.com> Cc: Jeff Mahoney <jeffm@suse.com> Cc: "Vladimir V. Saveliev" <vs@namesys.com> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Mark Fasheh <mark.fasheh@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-22 08:13:21 -07:00
Christoph Hellwig	1b961ac05a	ext4: new export ops Trivial switch over to the new generic helpers. Signed-off-by: Christoph Hellwig <hch@lst.de> Cc: Neil Brown <neilb@suse.de> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-22 08:13:20 -07:00
Eric Sandeen	149041070d	ext4: lighten up resize transaction requirements When resizing online, setup_new_group_blocks attempts to reserve a potentially very large transaction, depending on the current filesystem geometry. For some journal sizes, there may not be enough room for this transaction, and the online resize will fail. The patch below resizes & restarts the transaction as necessary while setting up the new group, and should work with even the smallest journal. Tested with something like: [root@newbox ~]# dd if=/dev/zero of=fsfile bs=1024 count=32768 [root@newbox ~]# mkfs.ext3 -b 1024 fsfile 16384 [root@newbox ~]# mount -o loop fsfile mnt/ [root@newbox ~]# resize2fs /dev/loop0 resize2fs 1.40.2 (12-Jul-2007) Filesystem at /dev/loop0 is mounted on /root/mnt; on-line resizing required old desc_blocks = 1, new_desc_blocks = 1 Performing an on-line resize of /dev/loop0 to 32768 (1k) blocks. resize2fs: No space left on device While trying to add group #2 [root@newbox ~]# dmesg \| tail -n 1 JBD: resize2fs wants too many credits (258 > 256) [root@newbox ~]# With the below change, it works. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Acked-by: Andreas Dilger <adilger@clusterfs.com>	2007-10-17 18:50:04 -04:00
Eric Sandeen	5b615287b3	ext4: fix setup_new_group_blocks locking setup_new_group_blocks() manipulates the group descriptor block bh under the block_bitmap bh's lock. It shouldn't matter since nobody but resize should be touching these blocks, but it's worth fixing up. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com>	2007-10-17 18:50:04 -04:00
Aneesh Kumar K.V	ac39849ddc	ext4: sparse fixes Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>	2007-10-17 18:50:03 -04:00
Aneesh Kumar K.V	d8dd0b4543	ext4: Convert ext4_extent_idx.ei_leaf to ext4_extent_idx.ei_leaf_lo Convert ext4_extent_idx.ei_leaf ext4_extent_idx.ei_leaf_lo This helps in finding BUGs due to direct partial access of these split 48 bit values. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>	2007-10-17 18:50:03 -04:00
Aneesh Kumar K.V	b377611d11	ext4: Convert ext4_extent.ee_start to ext4_extent.ee_start_lo Convert ext4_extent.ee_start to ext4_extent.ee_start_lo This helps in finding BUGs due to direct partial access of these split 48 bit values Also fix direct partial access in ext4 code Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>	2007-10-17 18:50:03 -04:00
Aneesh Kumar K.V	308ba3ece7	ext4: Convert s_r_blocks_count and s_free_blocks_count Convert s_r_blocks_count and s_free_blocks_count to s_r_blocks_count_lo and s_free_blocks_count_lo This helps in finding BUGs due to direct partial access of these split 64 bit values Also fix direct partial access in ext4 code Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2007-10-17 18:50:02 -04:00
Aneesh Kumar K.V	6bc9feff14	ext4: Convert s_blocks_count to s_blocks_count_lo Convert s_blocks_count to s_blocks_count_lo This helps in finding BUGs due to direct partial access of these split 64 bit values Also fix direct partial access in ext4 code Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>	2007-10-17 18:50:02 -04:00
Aneesh Kumar K.V	5272f83727	ext4: Convert bg_inode_bitmap and bg_inode_table Convert bg_inode_bitmap and bg_inode_table to bg_inode_bitmap_lo and bg_inode_table_lo. This helps in finding BUGs due to direct partial access of these split 64 bit values Also fix one direct partial access Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>	2007-10-17 18:50:02 -04:00
Aneesh Kumar K.V	3a14589cce	ext4: Convert bg_block_bitmap to bg_block_bitmap_lo Convert bg_block_bitmap to bg_block_bitmap_lo This helps in catching some BUGS due to direct partial access of these split fields. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>	2007-10-17 18:50:01 -04:00
Jose R. Santos	ce42158179	ext4: FLEX_BG Kernel support v2. This feature relaxes check restrictions on where each block groups meta data is located within the storage media. This allows for the allocation of bitmaps or inode tables outside the block group boundaries in cases where bad blocks forces us to look for new blocks which the owning block group can not satisfy. This will also allow for new meta-data allocation schemes to improve performance and scalability. Signed-off-by: Jose R. Santos <jrs@us.ibm.com> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2007-10-17 18:50:01 -04:00
Aneesh Kumar K.V	c1bddad949	ext4: Fix sparse warnings Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2007-10-17 18:50:01 -04:00
Andreas Dilger	717d50e497	Ext4: Uninitialized Block Groups In pass1 of e2fsck, every inode table in the fileystem is scanned and checked, regardless of whether it is in use. This is this the most time consuming part of the filesystem check. The unintialized block group feature can greatly reduce e2fsck time by eliminating checking of uninitialized inodes. With this feature, there is a a high water mark of used inodes for each block group. Block and inode bitmaps can be uninitialized on disk via a flag in the group descriptor to avoid reading or scanning them at e2fsck time. A checksum of each group descriptor is used to ensure that corruption in the group descriptor's bit flags does not cause incorrect operation. The feature is enabled through a mkfs option mke2fs /dev/ -O uninit_groups A patch adding support for uninitialized block groups to e2fsprogs tools has been posted to the linux-ext4 mailing list. The patches have been stress tested with fsstress and fsx. In performance tests testing e2fsck time, we have seen that e2fsck time on ext3 grows linearly with the total number of inodes in the filesytem. In ext4 with the uninitialized block groups feature, the e2fsck time is constant, based solely on the number of used inodes rather than the total inode count. Since typical ext4 filesystems only use 1-10% of their inodes, this feature can greatly reduce e2fsck time for users. With performance improvement of 2-20 times, depending on how full the filesystem is. The attached graph shows the major improvements in e2fsck times in filesystems with a large total inode count, but few inodes in use. In each group descriptor if we have EXT4_BG_INODE_UNINIT set in bg_flags: Inode table is not initialized/used in this group. So we can skip the consistency check during fsck. EXT4_BG_BLOCK_UNINIT set in bg_flags: No block in the group is used. So we can skip the block bitmap verification for this group. We also add two new fields to group descriptor as a part of uninitialized group patch. __le16 bg_itable_unused; /* Unused inodes count / __le16 bg_checksum; / crc16(sb_uuid+group+desc) */ bg_itable_unused: If we have EXT4_BG_INODE_UNINIT not set in bg_flags then bg_itable_unused will give the offset within the inode table till the inodes are used. This can be used by fsck to skip list of inodes that are marked unused. bg_checksum: Now that we depend on bg_flags and bg_itable_unused to determine the block and inode usage, we need to make sure group descriptor is not corrupt. We add checksum to group descriptor to detect corruption. If the descriptor is found to be corrupt, we mark all the blocks and inodes in the group used. Signed-off-by: Avantika Mathur <mathur@us.ibm.com> Signed-off-by: Andreas Dilger <adilger@clusterfs.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>	2007-10-17 18:50:00 -04:00
Eric Sandeen	4074fe3736	ext4: remove #ifdef CONFIG_EXT4_INDEX CONFIG_EXT4_INDEX is not an exposed config option in the kernel, and it is unconditionally defined in ext4_fs.h. tune2fs is already able to turn off dir indexing, so at this point it's just cluttering up the code. Remove it. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2007-10-17 18:50:00 -04:00
Coly Li	f077d0d7ea	ext4: Remove (partial, never completed) fragment support Fragment support in ext2/3/4 was never implemented, and it probably will never be implemented. So remove it from ext4. Signed-off-by: Coly Li <coyli@suse.de> Acked-by: Andreas Dilger <adilger@clusterfs.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2007-10-17 18:49:59 -04:00
Mingming Cao	cd02ff0b14	jbd2: JBD_XXX to JBD2_XXX naming cleanup change JBD_XXX macros to JBD2_XXX in JBD2/Ext4 Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2007-10-17 18:49:58 -04:00
Mingming Cao	d802ffa885	JBD2/Ext4: Convert kmalloc to kzalloc in jbd2/ext4 Convert kmalloc to kzalloc() and get rid of the memset(). Signed-off-by: Mingming Cao <cmm@us.ibm.com>	2007-10-17 18:49:57 -04:00
Mathieu Desnoyers	2b47c3611d	Fix f_version type: should be u64 instead of unsigned long Fix f_version type: should be u64 instead of long There is a type inconsistency between struct inode i_version and struct file f_version. fs.h: struct inode u64 i_version; and struct file unsigned long f_version; Users do: fs/ext3/dir.c: if (filp->f_version != inode->i_version) { So why isn't f_version a u64 ? It becomes a problem if versions gets higher than 2^32 and we are on an architecture where longs are 32 bits. This patch changes the f_version type to u64, and updates the users accordingly. It applies to 2.6.23-rc2-mm2. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Cc: Martin Bligh <mbligh@google.com> Cc: "Randy.Dunlap" <rdunlap@xenotime.net> Cc: Al Viro <viro@ftp.linux.org.uk> Cc: <linux-ext4@vger.kernel.org> Cc: Mark Fasheh <mark.fasheh@oracle.com> Cc: Christoph Hellwig <hch@lst.de> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-17 08:42:53 -07:00
vignesh babu	d8ea6cf899	ext2/4: use is_power_of_2() Replace n & (n - 1) with is_power_of_2(n) Signed-off-by: vignesh babu <vignesh.babu@wipro.com> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-17 08:42:53 -07:00
Aneesh Kumar K.V	7c9e69faa2	ext2/ext3/ext4: add block bitmap validation When a new block bitmap is read from disk in read_block_bitmap() there are a few bits that should ALWAYS be set. In particular, the blocks given by ext4_blk_bitmap, ext4_inode_bitmap and ext4_inode_table. Validate the block bitmap against these blocks. [akpm@linux-foundation.org: cleanups] Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Andreas Dilger <adilger@clusterfs.com> Acked-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-17 08:42:52 -07:00
Eric Sandeen	ef2fb67989	remove unused bh in calls to ext234_get_group_desc ext[234]_get_group_desc never tests the bh argument, and only sets it if it is passed in; it is perfectly happy with a NULL bh argument. But, many callers send one in and never use it. May as well call with NULL like other callers who don't use the bh. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-17 08:42:49 -07:00
Miklos Szeredi	d9c9bef134	ext4: show all mount options Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-17 08:42:49 -07:00
Fengguang Wu	e57aa839ce	convert ill defined log2() to ilog2() It's wrong to have #define log2(n) ffz(~(n)) It should be reversed: #define log2(n) flz(~(n)) or #define log2(n) fls(n) or just use ilog2(n) defined in linux/log2.h. This patch follows the last solution, recommended by Andrew Morton. Cc: <linux-ext4@vger.kernel.org> Cc: Mingming Cao <cmm@us.ibm.com> Cc: Bjorn Helgaas <bjorn.helgaas@hp.com> Cc: Chris Ahna <christopher.j.ahna@intel.com> Cc: David Mosberger-Tang <davidm@hpl.hp.com> Cc: Kyle McMartin <kyle@parisc-linux.org> Cc: Dave Airlie <airlied@linux.ie> Cc: Dave Jones <davej@codemonkey.org.uk> Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-17 08:42:48 -07:00
Philippe De Muyter	febfcf9115	fs: mark nibblemap const Signed-off-by: Philippe De Muyter <phdm@macqel.be> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-17 08:42:47 -07:00
Christoph Lameter	4ba9b9d0ba	Slab API: remove useless ctor parameter and reorder parameters Slab constructors currently have a flags parameter that is never used. And the order of the arguments is opposite to other slab functions. The object pointer is placed before the kmem_cache pointer. Convert ctor(void object, struct kmem_cache s, unsigned long flags) to ctor(struct kmem_cache s, void object) throughout the kernel [akpm@linux-foundation.org: coupla fixes] Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-17 08:42:45 -07:00
Peter Zijlstra	833f4077bf	lib: percpu_counter_init error handling alloc_percpu can fail, propagate that error. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-17 08:42:44 -07:00
Peter Zijlstra	52d9f3b409	lib: percpu_counter_sum_positive s/percpu_counter_sum/&_positive/ Because its consitent with percpu_counter_read* Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-17 08:42:44 -07:00
Peter Zijlstra	3cb4f9fa0c	lib: percpu_counter_sub Hugh spotted that some code does: percpu_counter_add(&counter, -unsignedlong) which, when the amount argument is of type s32, sort-of works thanks to two's-complement. However when we'd change the type to s64 this breaks on 32bit machines, because the promotion rules zero extend the unsigned number. Provide percpu_counter_sub() to hide the s64 cast. That is: percpu_counter_sub(&counter, foo) is equal to: percpu_counter_add(&counter, -(s64)foo); Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-17 08:42:44 -07:00
Peter Zijlstra	aa0dff2d09	lib: percpu_counter_add s/percpu_counter_mod/percpu_counter_add/ Because its a better name, _mod implies modulo. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-17 08:42:44 -07:00
Nick Piggin	bfc1af650a	ext4: convert to new aops Convert ext4 to use write_begin()/write_end() methods. Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com> Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: Dmitriy Monakhov <dmonakhov@sw.ru> Cc: Mark Fasheh <mark.fasheh@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-16 09:42:55 -07:00
Fengguang Wu	f4e6b498d6	readahead: combine file_ra_state.prev_index/prev_offset into prev_pos Combine the file_ra_state members unsigned long prev_index unsigned int prev_offset into loff_t prev_pos It is more consistent and better supports huge files. Thanks to Peter for the nice proposal! [akpm@linux-foundation.org: fix shift overflow] Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-16 09:42:52 -07:00
Eric Sandeen	ef2b02d3e6	ext34: ensure do_split leaves enough free space in both blocks The do_split() function for htree dir blocks is intended to split a leaf block to make room for a new entry. It sorts the entries in the original block by hash value, then moves the last half of the entries to the new block - without accounting for how much space this actually moves. (IOW, it moves half of the entry count not half of the entry space). If by chance we have both large & small entries, and we move only the smallest entries, and we have a large new entry to insert, we may not have created enough space for it. The patch below stores each record size when calculating the dx_map, and then walks the hash-sorted dx_map, calculating how many entries must be moved to more evenly split the existing entries between the old block and the new block, guaranteeing enough space for the new entry. The dx_map "offs" member is reduced to u16 so that the overall map size does not change - it is temporarily stored at the end of the new block, and if it grows too large it may be overwritten. By making offs and size both u16, we won't grow the map size. Also add a few comments to the functions involved. This fixes the testcase reported by hooanon05@yahoo.co.jp on the linux-ext4 list, "ext3 dir_index causes an error" Thanks to Andreas Dilger for discussing the problem & solution with me. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Andreas Dilger <adilger@clusterfs.com> Tested-by: Junjiro Okajima <hooanon05@yahoo.co.jp> Cc: Theodore Ts'o <tytso@mit.edu> Cc: <linux-ext4@vger.kernel.org> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-09-19 11:24:18 -07:00
Eric Sandeen	3d82abae95	dir_index: error out instead of BUG on corrupt dx dirs Convert asserts (BUGs) in dx_probe from bad on-disk data to recoverable errors with helpful warnings. With help catching other asserts from Duane Griffin <duaneg@dghda.com> Signed-off-by: Eric Sandeen <sandeen@redhat.com> Acked-by: Duane Griffin <duaneg@dghda.com> Acked-by: Theodore Ts'o <tytso@mit.edu> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-09-19 11:24:18 -07:00
Jan Kara	9c3013e9b9	quota: fix infinite loop If we fail to start a transaction when releasing dquot, we have to call dquot_release() anyway to mark dquot structure as inactive. Otherwise we end in an infinite loop inside dqput(). Signed-off-by: Jan Kara <jack@suse.cz> Cc: xb <xavier.bru@bull.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-09-11 17:21:19 -07:00
Mingming Cao	dd54567a83	"ext4_ext_put_in_cache" uses __u32 to receive physical block number Yan Zheng wrote: > I think I found a bug in ext4/extents.c, "ext4_ext_put_in_cache" uses > "__u32" to receive physical block number. "ext4_ext_put_in_cache" is > used in "ext4_ext_get_blocks", it sets ext4 inode's extent cache > according most recently tree lookup (higher 16 bits of saved physical > block number are always zero). when serving a mapping request, > "ext4_ext_get_blocks" first check whether the logical block is in > inode's extent cache. if the logical block is in the cache and the > cached region isn't a gap, "ext4_ext_get_blocks" gets physical block > number by using cached region's physical block number and offset in > the cached region. as described above, "ext4_ext_get_blocks" may > return wrong result when there are physical block numbers bigger than > 0xffffffff. > You are right. Thanks for reporting this! Signed-off-by: Mingming Cao <cmm@us.ibm.com> Cc: Yan Zheng <yanzheng@21cn.com> Cc: <stable@kernel.org> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-31 15:39:37 -07:00
Eric Sandeen	780dcdb211	fix inode_table test in ext234_check_descriptors ext[234]_check_descriptors sanity checks block group descriptor geometry at mount time, testing whether the block bitmap, inode bitmap, and inode table reside wholly within the blockgroup. However, the inode table test is off by one so that if the last block in the inode table resides on the last block of the block group, the test incorrectly fails. This is because it tests the last block as (start + length) rather than (start + length - 1). This can be seen by trying to mount a filesystem made such as: mkfs.ext2 -F -b 1024 -m 0 -g 256 -N 3744 fsfile 1024 which yields: EXT2-fs error (device loop0): ext2_check_descriptors: Inode table for group 0 not in group (block 101)! EXT2-fs: group descriptors corrupted! There is a similar bug in e2fsprogs, patch already sent for that. (I wonder if inside(), outside(), and/or in_range() should someday be used in this and other tests throughout the ext filesystems...) Signed-off-by: Eric Sandeen <sandeen@redhat.com> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-26 11:35:17 -07:00
Paul Mundt	20c2df83d2	mm: Remove slab destructors from kmem_cache_create(). Slab destructors were no longer supported after Christoph's `c59def9f22` change. They've been BUGs for both slab and slub, and slob never supported them either. This rips out support for the dtor pointer from kmem_cache_create() completely and fixes up every single callsite in the kernel (there were about 224, not including the slab allocator definitions themselves, or the documentation references). Signed-off-by: Paul Mundt <lethal@linux-sh.org>	2007-07-20 10:11:58 +09:00
Mingming Cao	b38bd33a6b	fix ext4/JBD2 build warnings Looking at the current linus-git tree jbd_debug() define in include/linux/jbd2.h extern u8 journal_enable_debug; #define jbd_debug(n, f, a...) \ do { \ if ((n) <= journal_enable_debug) { \ printk (KERN_DEBUG "(%s, %d): %s: ", \ __FILE__, __LINE__, __FUNCTION__); \ printk (f, ## a); \ } \ } while (0) > fs/ext4/inode.c: In function âext4_write_inodeâ: > fs/ext4/inode.c:2906: warning: comparison is always true due to limited > range of data type > > fs/jbd2/recovery.c: In function âjbd2_journal_recoverâ: > fs/jbd2/recovery.c:254: warning: comparison is always true due to > limited range of data type > fs/jbd2/recovery.c:257: warning: comparison is always true due to > limited range of data type > > fs/jbd2/recovery.c: In function âjbd2_journal_skip_recoveryâ: > fs/jbd2/recovery.c:301: warning: comparison is always true due to > limited range of data type > Noticed all warnings are occurs when the debug level is 0. Then found the "jbd2: Move jbd2-debug file to debugfs" patch http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=0f49d5d019afa4e94253bfc92f0daca3badb990b changed the jbd2_journal_enable_debug from int type to u8, makes the jbd_debug comparision is always true when the debugging level is 0. Thus the compile warning occurs. Thought about changing the jbd2_journal_enable_debug data type back to int, but can't, because the jbd2-debug is moved to debug fs, where calling debugfs_create_u8() to create the debugfs entry needs the value to be u8 type. Even if we changed the data type back to int, the code is still buggy, kernel should not print jbd2 debug message if the jbd2_journal_enable_debug is set to 0. But this is not the case. The fix is change the level of debugging to 1. The same should fixed in ext3/JBD, but currently ext3 jbd-debug via /proc fs is broken, so we probably should fix it all together. Signed-off-by: Mingming Cao <cmm@us.ibm.com> Cc: Jeff Garzik <jeff@garzik.org> Cc: Theodore Tso <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:47 -07:00
Rusty Russell	cf914a7d65	readahead: split ondemand readahead interface into two functions Split ondemand readahead interface into two functions. I think this makes it a little clearer for non-readahead experts (like Rusty). Internally they both call ondemand_readahead(), but the page argument is changed to an obvious boolean flag. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:44 -07:00
Fengguang Wu	dc7868fcb9	readahead: convert ext3/ext4 invocations Convert ext3/ext4 dir reads to use on-demand readahead. Readahead for dirs operates _not_ on file level, but on blockdev level. This makes a difference when the data blocks are not continuous. And the read routine is somehow opaque: there's no handy info about the status of current page. So a simplified call scheme is employed: to call into readahead whenever the current page falls out of readahead windows. Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn> Cc: Steven Pratt <slpratt@austin.ibm.com> Cc: Ram Pai <linuxram@us.ibm.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:44 -07:00
Dmitry Monakhov	e9f410b1c0	ext4: extent macros cleanup Use the EXT_LAST_INDEX macro; that's what it's there for. Clean up ext4_ext_ext_grow_indepth() so the correct EXT_FIRST_INDEX or EXT_FIRST_MACRO is used as necessary. The two macros are equivalent, so the C will collapse the if statement out, but it makes the code much more readable. Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Acked-by: Alex Tomas <alex@clusterfs.com> Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com> Singed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2007-07-18 09:09:15 -04:00
Dmitry Monakhov	26d535ed24	Fix compilation with EXT_DEBUG, also fix leXX_to_cpu conversions. Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Acked-by: Alex Tomas <alex@clusterfs.com> Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2007-07-18 08:33:37 -04:00
Dave Hansen	d699594dc1	ext4: remove extra IS_RDONLY() check ext4_change_inode_journal_flag() is only called from one location: ext4_ioctl(EXT3_IOC_SETFLAGS). That ioctl case already has a IS_RDONLY() call in it so this one is superfluous. Signed-off-by: Dave Hansen <haveblue@us.ibm.com> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2007-07-18 08:33:51 -04:00
Vignesh Babu	1330593eb2	ext4: Use is_power_of_2() Replace (n & (n-1)) in the context of power of 2 checks with is_power_of_2() Signed-off-by: Vignesh Babu <vignesh.babu@wipro.com> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2007-07-18 09:11:02 -04:00
Eric Sandeen	fc0e15a667	Use zero_user_page() in ext4 where possible Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2007-07-18 09:20:44 -04:00
Andreas Dilger	f8628a14a2	ext4: Remove 65000 subdirectory limit This patch adds support to ext4 for allowing more than 65000 subdirectories. Currently the maximum number of subdirectories is capped at 32000. If we exceed 65000 subdirectories in an htree directory it sets the inode link count to 1 and no longer counts subdirectories. The directory link count is not actually used when determining if a directory is empty, as that only counts subdirectories and not regular files that might be in there. A EXT4_FEATURE_RO_COMPAT_DIR_NLINK flag has been added and it is set if the subdir count for any directory crosses 65000. A later fsck will clear EXT4_FEATURE_RO_COMPAT_DIR_NLINK if there are no longer any directory with >65000 subdirs. Signed-off-by: Andreas Dilger <adilger@clusterfs.com> Signed-off-by: Kalpak Shah <kalpak@clusterfs.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2007-07-18 08:38:01 -04:00
Kalpak Shah	6dd4ee7cab	ext4: Expand extra_inodes space per the s_{want,min}_extra_isize fields We need to make sure that existing ext3 filesystems can also avail the new fields that have been added to the ext4 inode. We use s_want_extra_isize and s_min_extra_isize to decide by how much we should expand the inode. If EXT4_FEATURE_RO_COMPAT_EXTRA_ISIZE feature is set then we expand the inode by max(s_want_extra_isize, s_min_extra_isize , sizeof(ext4_inode) - EXT4_GOOD_OLD_INODE_SIZE) bytes. Actually it is still an open question about whether users should be able to set s_*_extra_isize smaller than the known fields or not. This patch also adds the functionality to expand inodes to include the newly added fields. We start by trying to expand by s_want_extra_isize bytes and if its fails we try to expand by s_min_extra_isize bytes. This is done by changing the i_extra_isize if enough space is available in the inode and no EAs are present. If EAs are present and there is enough space in the inode then the EAs in the inode are shifted to make space. If enough space is not available in the inode due to the EAs then 1 or more EAs are shifted to the external EA block. In the worst case when even the external EA block does not have enough space we inform the user that some EA would need to be deleted or s_min_extra_isize would have to be reduced. Signed-off-by: Andreas Dilger <adilger@clusterfs.com> Signed-off-by: Kalpak Shah <kalpak@clusterfs.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2007-07-18 09:19:57 -04:00
Kalpak Shah	ef7f38359e	ext4: Add nanosecond timestamps This patch adds nanosecond timestamps for ext4. This involves adding *time_extra fields to the ext4_inode to extend the timestamps to 64-bits. Creation time is also added by this patch. These extended fields will fit into an inode if the filesystem was formatted with large inodes (-I 256 or larger) and there are currently no EAs consuming all of the available space. For new inodes we always reserve enough space for the kernel's known extended fields, but for inodes created with an old kernel this might not have been the case. So this patch also adds the EXT4_FEATURE_RO_COMPAT_EXTRA_ISIZE feature flag(ro-compat so that older kernels can't create inodes with a smaller extra_isize). which indicates if the fields fitting inside s_min_extra_isize are available or not. If the expansion of inodes if unsuccessful then this feature will be disabled. This feature is only enabled if requested by the sysadmin. None of the extended inode fields is critical for correct filesystem operation. Signed-off-by: Andreas Dilger <adilger@clusterfs.com> Signed-off-by: Kalpak Shah <kalpak@clusterfs.com> Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2007-07-18 09:15:20 -04:00

1 2 3 4 5

232 Commits