linux

Commit Graph

Author	SHA1	Message	Date
Jiaying Zhang	3889fd57ea	ext4: flush the i_completed_io_list during ext4_truncate Ted first found the bug when running 2.6.36 kernel with dioread_nolock mount option that xfstests #13 complained about wrong file size during fsck. However, the bug exists in the older kernels as well although it is somehow harder to trigger. The problem is that ext4_end_io_work() can happen after we have truncated an inode to a smaller size. Then when ext4_end_io_work() calls ext4_convert_unwritten_extents(), we may reallocate some blocks that have been truncated, so the inode size becomes inconsistent with the allocated blocks. The following patch flushes the i_completed_io_list during truncate to reduce the risk that some pending end_io requests are executed later and convert already truncated blocks to initialized. Note that although the fix helps reduce the problem a lot there may still be a race window between vmtruncate() and ext4_end_io_work(). The fundamental problem is that if vmtruncate() is called without either i_mutex or i_alloc_sem held, it can race with an ongoing write request so that the io_end request is processed later when the corresponding blocks have been truncated. Ted and I have discussed the problem offline and we saw a few ways to fix the race completely: a) We guarantee that i_mutex lock and i_alloc_sem write lock are both hold whenever vmtruncate() is called. The i_mutex lock prevents any new write requests from entering writeback and the i_alloc_sem prevents the race from ext4_page_mkwrite(). Currently we hold both locks if vmtruncate() is called from do_truncate(), which is probably the most common case. However, there are places where we may call vmtruncate() without holding either i_mutex or i_alloc_sem. I would like to ask for other people's opinions on what locks are expected to be held before calling vmtruncate(). There seems a disagreement among the callers of that function. b) We change the ext4 write path so that we change the extent tree to contain the newly allocated blocks and update i_size both at the same time --- when the write of the data blocks is completed. c) We add some additional locking to synchronize vmtruncate() and ext4_end_io_work(). This approach may have performance implications so we need to be careful. All of the above proposals may require more substantial changes, so we may consider to take the following patch as a bandaid. Signed-off-by: Jiaying Zhang <jiayingz@google.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2011-01-10 12:47:05 -05:00
Theodore Ts'o	b05e6ae58a	ext4: drop ec_type from the ext4_ext_cache structure We can encode the ec_type information by using ee_len == 0 to denote EXT4_EXT_CACHE_NO, ee_start == 0 to denote EXT4_EXT_CACHE_GAP, and if neither is true, then the cache type must be EXT4_EXT_CACHE_EXTENT. This allows us to reduce the size of ext4_ext_inode by another 8 bytes. (ec_type is 4 bytes, plus another 4 bytes of padding) Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2011-01-10 12:13:26 -05:00
Theodore Ts'o	01f49d0b9d	ext4: use ext4_lblk_t instead of sector_t for logical blocks This fixes a number of places where we used sector_t instead of ext4_lblk_t for logical blocks, which for ext4 are still 32-bit data types. No point wasting space in the ext4_inode_info structure, and requiring 64-bit arithmetic on 32-bit systems, when it isn't necessary. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2011-01-10 12:13:03 -05:00
Kazuya Mio	ad4fb9cafe	ext4: fix 32bit overflow in ext4_ext_find_goal() ext4_ext_find_goal() returns an ideal physical block number that the block allocator tries to allocate first. However, if a required file offset is smaller than the existing extent's one, ext4_ext_find_goal() returns a wrong block number because it may overflow at "block - le32_to_cpu(ex->ee_block)". This patch fixes the problem. ext4_ext_find_goal() will also return a wrong block number in case a file offset of the existing extent is too big. In this case, the ideal physical block number is fixed in ext4_mb_initialize_context(), so it's no problem. reproduce: # dd if=/dev/zero of=/mnt/mp1/tmp bs=127M count=1 oflag=sync # dd if=/dev/zero of=/mnt/mp1/file bs=512K count=1 seek=1 oflag=sync # filefrag -v /mnt/mp1/file Filesystem type is: ef53 File size of /mnt/mp1/file is 1048576 (256 blocks, blocksize 4096) ext logical physical expected length flags 0 128 67456 128 eof /mnt/mp1/file: 2 extents found # rm -rf /mnt/mp1/tmp # echo $((512*4096)) > /sys/fs/ext4/loop0/mb_stream_req # dd if=/dev/zero of=/mnt/mp1/file bs=512K count=1 oflag=sync conv=notrunc result (linux-2.6.37-rc2 + ext4 patch queue): # filefrag -v /mnt/mp1/file Filesystem type is: ef53 File size of /mnt/mp1/file is 1048576 (256 blocks, blocksize 4096) ext logical physical expected length flags 0 0 33280 128 1 128 67456 33407 128 eof /mnt/mp1/file: 2 extents found result(apply this patch): # filefrag -v /mnt/mp1/file Filesystem type is: ef53 File size of /mnt/mp1/file is 1048576 (256 blocks, blocksize 4096) ext logical physical expected length flags 0 0 66560 128 1 128 67456 66687 128 eof /mnt/mp1/file: 2 extents found Signed-off-by: Kazuya Mio <k-mio@sx.jp.nec.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2011-01-10 12:12:28 -05:00
Theodore Ts'o	a107e5a3a4	Merge branch 'next' into upstream-merge Conflicts: fs/ext4/inode.c fs/ext4/mballoc.c include/trace/events/ext4.h	2010-10-27 23:44:47 -04:00
Kazuya Mio	a6371b636f	ext4: fix compile error in ext4_fallocate() When I compiled 2.6.36-rc3 kernel with EXT4FS_DEBUG definition, I got the following compile error. CC [M] fs/ext4/extents.o fs/ext4/extents.c: In function 'ext4_fallocate': fs/ext4/extents.c:3772: error: 'block' undeclared (first use in this function) fs/ext4/extents.c:3772: error: (Each undeclared identifier is reported only once fs/ext4/extents.c:3772: error: for each function it appears in.) make[2]: *** [fs/ext4/extents.o] Error 1 The patch fixes this problem. Signed-off-by: Kazuya Mio <k-mio@sx.jp.nec.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2010-10-27 21:30:15 -04:00
Theodore Ts'o	bf89d16f6e	ext4: rename {ext,idx}_pblock and inline small extent functions Cleanup namespace leaks from fs/ext4 and the inline trivial functions ext4_{ext,idx}_pblock() and ext4_{ext,idx}_store_pblock() since the code size actually shrinks when we make these functions inline, they're so trivial. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2010-10-27 21:30:14 -04:00
Theodore Ts'o	1f109d5a17	ext4: make various ext4 functions be static These functions have no need to be exported beyond file context. No functions needed to be moved for this commit; just some function declarations changed to be static and removed from header files. (A similar patch was submitted by Eric Sandeen, but I wanted to handle code movement in separate patches to make sure code changes didn't accidentally get dropped.) Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2010-10-27 21:30:14 -04:00
Theodore Ts'o	bd2d0210cf	ext4: use bio layer instead of buffer layer in mpage_da_submit_io Call the block I/O layer directly instad of going through the buffer layer. This should give us much better performance and scalability, as well as lowering our CPU utilization when doing buffered writeback. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2010-10-27 21:30:10 -04:00
Lukas Czerner	2407518de6	ext4: use sb_issue_zeroout in ext4_ext_zeroout Change ext4_ext_zeroout to use sb_issue_zeroout instead of its own approach to zero out extents. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2010-10-27 21:30:06 -04:00
Theodore Ts'o	58590b06d7	ext4: fix EOFBLOCKS_FL handling It turns out we have several problems with how EOFBLOCKS_FL is handled. First of all, there was a fencepost error where we were not clearing the EOFBLOCKS_FL when fill in the last uninitialized block, but rather when we allocate the next block _after_ the uninitalized block. Secondly we were not testing to see if we needed to clear the EOFBLOCKS_FL when writing to the file O_DIRECT or when were converting an uninitialized block (which is the most common case). Google-Bug-Id: 2928259 Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2010-10-27 21:23:12 -04:00
Linus Torvalds	09dc942c2a	Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (40 commits) ext4: Adding error check after calling ext4_mb_regular_allocator() ext4: Fix dirtying of journalled buffers in data=journal mode ext4: re-inline ext4_rec_len_(to\|from)_disk functions jbd2: Remove t_handle_lock from start_this_handle() jbd2: Change j_state_lock to be a rwlock_t jbd2: Use atomic variables to avoid taking t_handle_lock in jbd2_journal_stop ext4: Add mount options in superblock ext4: force block allocation on quota_off ext4: fix freeze deadlock under IO ext4: drop inode from orphan list if ext4_delete_inode() fails ext4: check to make make sure bd_dev is set before dereferencing it jbd2: Make barrier messages less scary ext4: don't print scary messages for allocation failures post-abort ext4: fix EFBIG edge case when writing to large non-extent file ext4: fix ext4_get_blocks references ext4: Always journal quota file modifications ext4: Fix potential memory leak in ext4_fill_super ext4: Don't error out the fs if the user tries to make a file too big ext4: allocate stripe-multiple IOs on stripe boundaries ext4: move aio completion after unwritten extent conversion ... Fix up conflicts in fs/ext4/inode.c as per Ted. Fix up xfs conflicts as per earlier xfs merge.	2010-08-07 13:03:53 -07:00
Theodore Ts'o	c398eda0e4	ext4: Pass line numbers to ext4_error() and friends Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2010-07-27 11:56:40 -04:00
Jiri Kosina	f1bbbb6912	Merge branch 'master' into for-next	2010-06-16 18:08:13 +02:00
Uwe Kleine-König	421f91d21a	fix typos concerning "initiali[zs]e" Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2010-06-16 18:05:05 +02:00
Andi Kleen	5a0790c2c4	ext4: remove initialized but not read variables No real bugs found, just removed some dead code. Found by gcc 4.6's new warnings. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2010-06-14 13:28:03 -04:00
Theodore Ts'o	60e6679e28	ext4: Drop whitespace at end of lines This patch was generated using: #!/usr/bin/perl -i while (<>) { s/[ ]+$//; print; } Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2010-05-17 07:00:00 -04:00
Dmitry Monakhov	0617b83fa2	ext4: restart ext4_ext_remove_space() after transaction restart If i_data_sem was internally dropped due to transaction restart, it is necessary to restart path look-up because extents tree was possibly modified by ext4_get_block(). https://bugzilla.kernel.org/show_bug.cgi?id=15827 Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Acked-by: Jan Kara <jack@suse.cz>	2010-05-17 01:00:00 -04:00
Theodore Ts'o	786ec7915e	ext4: Clear the EXT4_EOFBLOCKS_FL flag only when warranted Dimitry Monakhov discovered an edge case where it was possible for the EXT4_EOFBLOCKS_FL flag could get cleared unnecessarily. This is true; I have a test case that can be exercised via downloading and decompressing the file: wget ftp://ftp.kernel.org/pub/linux/kernel/people/tytso/ext4-testcases/eofblocks-fl-test-case.img.bz2 bunzip2 eofblocks-fl-test-case.img dd if=/dev/zero of=eofblocks-fl-test-case.img bs=1k seek=17925 bs=1k count=1 conv=notrunc However, triggering it in real life is highly unlikely since it requires an extremely fragmented sparse file with a hole in exactly the right place in the extent tree. (It actually took quite a bit of work to generate this test case.) Still, it's nice to get even extreme corner cases to be correct, so this patch makes sure that we don't clear the EXT4_EOFBLOCKS_FL incorrectly even in this corner case. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2010-05-17 00:00:00 -04:00
Theodore Ts'o	f70f362b4a	ext4: Avoid crashing on NULL ptr dereference on a filesystem error If the EOFBLOCK_FL flag is set when it should not be and the inode is zero length, then eh_entries is zero, and ex is NULL, so dereferencing ex to print ex->ee_block causes a kernel OOPS in ext4_ext_map_blocks(). On top of that, the error message which is printed isn't very helpful. So we fix this by printing something more explanatory which doesn't involve trying to print ex->ee_block. Addresses-Google-Bug: #2655740 Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2010-05-16 23:00:00 -04:00
Dmitry Monakhov	12e9b89200	ext4: Use bitops to read/modify i_flags in struct ext4_inode_info At several places we modify EXT4_I(inode)->i_flags without holding i_mutex (ext4_do_update_inode, ...). These modifications are racy and we can lose updates to i_flags. So convert handling of i_flags to use bitops which are atomic. https://bugzilla.kernel.org/show_bug.cgi?id=15792 Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2010-05-16 22:00:00 -04:00
Theodore Ts'o	24676da469	ext4: Convert calls of ext4_error() to EXT4_ERROR_INODE() EXT4_ERROR_INODE() tends to provide better error information and in a more consistent format. Some errors were not even identifying the inode or directory which was corrupted, which made them not very useful. Addresses-Google-Bug: #2507977 Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2010-05-16 21:00:00 -04:00
Theodore Ts'o	2ed886852a	ext4: Convert callers of ext4_get_blocks() to use ext4_map_blocks() This saves a huge amount of stack space by avoiding unnecesary struct buffer_head's from being allocated on the stack. In addition, to make the code easier to understand, collapse and refactor ext4_get_block(), ext4_get_block_write(), noalloc_get_block_write(), into a single function. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2010-05-16 20:00:00 -04:00
Theodore Ts'o	e35fd6609b	ext4: Add new abstraction ext4_map_blocks() underneath ext4_get_blocks() Jack up ext4_get_blocks() and add a new function, ext4_map_blocks() which uses a much smaller structure, struct ext4_map_blocks which is 20 bytes, as opposed to a struct buffer_head, which nearly 5 times bigger on an x86_64 machine. By switching things to use ext4_map_blocks(), we can save stack space by using ext4_map_blocks() since we can avoid allocating a struct buffer_head on the stack. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2010-05-16 19:00:00 -04:00
Nikanth Karthikesan	6d19c42b7c	ext4: Prevent creation of files larger than RLIMIT_FSIZE using fallocate Currently using posix_fallocate one can bypass an RLIMIT_FSIZE limit and create a file larger than the limit. Add a check for that. Signed-off-by: Nikanth Karthikesan <knikanth@suse.de> Signed-off-by: Amit Arora <aarora@in.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2010-05-16 14:00:00 -04:00
Dmitry Monakhov	21ca087a38	ext4: Do not zero out uninitialized extents beyond i_size The extents code will sometimes zero out blocks and mark them as initialized instead of splitting an extent into several smaller ones. This optimization however, causes problems if the extent is beyond i_size because fsck will complain if there are uninitialized blocks after i_size as this can not be distinguished from an inode that has an incorrect i_size field. https://bugzilla.kernel.org/show_bug.cgi?id=15742 Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2010-05-16 06:00:00 -04:00
Jing Zhang	b720303df7	ext4: fix memory leaks in error path handling of ext4_ext_zeroout() When EIO occurs after bio is submitted, there is no memory free operation for bio, which results in memory leakage. And there is also no check against bio_alloc() for bio. Acked-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com> Signed-off-by: Jing Zhang <zj.barak@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2010-05-12 00:00:00 -04:00
Curt Wohlgemuth	fd2dd9fbaf	ext4: Fix buffer head leaks after calls to ext4_get_inode_loc() Calls to ext4_get_inode_loc() returns with a reference to a buffer head in iloc->bh. The callers of this function in ext4_write_inode() when in no journal mode and in ext4_xattr_fiemap() don't release the buffer head after using it. Addresses-Google-Bug: #2548165 Signed-off-by: Curt Wohlgemuth <curtw@google.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2010-04-03 17:44:16 -04:00
Akinobu Mita	731eb1a03a	ext4: consolidate in_range() definitions There are duplicate macro definitions of in_range() in mballoc.h and balloc.c. This consolidates these two definitions into ext4.h, and changes extents.c to use in_range() as well. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: Andreas Dilger <adilger@sun.com>	2010-03-03 23:55:01 -05:00
Akinobu Mita	5661bd6861	ext4: cleanup to use ext4_group_first_block_no() This is a cleanup and simplification patch which takes some open-coded calculations to calculate the first block number of a group and converts them to use the (already defined) ext4_group_first_block_no() function. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: Andreas Dilger <adilger@sun.com>	2010-03-03 23:53:39 -05:00
Frank Mayhar	273df556b6	ext4: Convert BUG_ON checks to use ext4_error() instead Convert a bunch of BUG_ONs to emit a ext4_error() message and return EIO. This is a first pass and most notably does _not_ cover mballoc.c, which is a morass of void functions. Signed-off-by: Frank Mayhar <fmayhar@google.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2010-03-02 11:46:09 -05:00
Jiaying Zhang	744692dc05	ext4: use ext4_get_block_write in buffer write Allocate uninitialized extent before ext4 buffer write and convert the extent to initialized after io completes. The purpose is to make sure an extent can only be marked initialized after it has been written with new data so we can safely drop the i_mutex lock in ext4 DIO read without exposing stale data. This helps to improve multi-thread DIO read performance on high-speed disks. Skip the nobh and data=journal mount cases to make things simple for now. Signed-off-by: Jiaying Zhang <jiayingz@google.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2010-03-04 16:14:02 -05:00
Jiaying Zhang	c7064ef13b	ext4: mechanical rename some of the direct I/O get_block's identifiers This commit renames some of the direct I/O's block allocation flags, variables, and functions introduced in Mingming's "Direct IO for holes and fallocate" patches so that they can be used by ext4's buffered write path as well. Also changed the related function comments accordingly to cover both direct write and buffered write cases. Signed-off-by: Jiaying Zhang <jiayingz@google.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2010-03-02 13:28:44 -05:00
Jiaying Zhang	c8d46e41bc	ext4: Add flag to files with blocks intentionally past EOF fallocate() may potentially instantiate blocks past EOF, depending on the flags used when it is called. e2fsck currently has a test for blocks past i_size, and it sometimes trips up - noticeably on xfstests 013 which runs fsstress. This patch from Jiayang does fix it up - it (along with e2fsprogs updates and other patches recently from Aneesh) has survived many fsstress runs in a row. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Jiaying Zhang <jiayingz@google.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2010-02-24 09:52:53 -05:00
Leonard Michlmayr	aca92ff6f5	ext4: correctly calculate number of blocks for fiemap ext4_fiemap() rounds the length of the requested range down to blocksize, which is is not the true number of blocks that cover the requested region. This problem is especially impressive if the user requests only the first byte of a file: not a single extent will be reported. We fix this by calculating the last block of the region and then subtract to find the number of blocks in the extents. Signed-off-by: Leonard Michlmayr <leonard.michlmayr@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2010-03-04 17:07:28 -05:00
Eric Sandeen	12062dddda	ext4: move __func__ into a macro for ext4_warning, ext4_error Just a pet peeve of mine; we had a mishash of calls with either __func__ or "function_name" and the latter tends to get out of sync. I think it's easier to just hide the __func__ in a macro, and it'll be consistent from then on. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2010-02-15 14:19:27 -05:00
Theodore Ts'o	19f5fb7ad6	ext4: Use bitops to read/modify EXT4_I(inode)->i_state At several places we modify EXT4_I(inode)->i_state without holding i_mutex (ext4_release_file, ext4_bmap, ext4_journalled_writepage, ext4_do_update_inode, ...). These modifications are racy and we can lose updates to i_state. So convert handling of i_state to use bitops which are atomic. Cc: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2010-01-24 14:34:07 -05:00
Eric Sandeen	a1de02dccf	ext4: fix async i/o writes beyond 4GB to a sparse file The "offset" member in ext4_io_end holds bytes, not blocks, so ext4_lblk_t is wrong - and too small (u32). This caused the async i/o writes to sparse files beyond 4GB to fail when they wrapped around to 0. Also fix up the type of arguments to ext4_convert_unwritten_extents(), it gets ssize_t from ext4_end_aio_dio_nolock() and ext4_ext_direct_IO(). Reported-by: Giel de Nijs <giel@vectorwise.com> Signed-off-by: Eric Sandeen <sandeen@redhat.com>	2010-02-04 23:58:38 -05:00
Aneesh Kumar K.V	1296cc85c2	ext4: Drop EXT4_GET_BLOCKS_UPDATE_RESERVE_SPACE flag We should update reserve space if it is delalloc buffer and that is indicated by EXT4_GET_BLOCKS_DELALLOC_RESERVE flag. So use EXT4_GET_BLOCKS_DELALLOC_RESERVE in place of EXT4_GET_BLOCKS_UPDATE_RESERVE_SPACE Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>	2010-01-15 01:27:59 -05:00
Aneesh Kumar K.V	5f634d064c	ext4: Fix quota accounting error with fallocate When we fallocate a region of the file which we had recently written, and which is still in the page cache marked as delayed allocated blocks we need to make sure we don't do the quota update on writepage path. This is because the needed quota updated would have already be done by fallocate. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>	2010-01-25 04:00:31 -05:00
Theodore Ts'o	9d0be50230	ext4: Calculate metadata requirements more accurately In the past, ext4_calc_metadata_amount(), and its sub-functions ext4_ext_calc_metadata_amount() and ext4_indirect_calc_metadata_amount() badly over-estimated the number of metadata blocks that might be required for delayed allocation blocks. This didn't matter as much when functions which managed the reserved metadata blocks were more aggressive about dropping reserved metadata blocks as delayed allocation blocks were written, but unfortunately they were too aggressive. This was fixed in commit 0637c6f, but as a result the over-estimation by ext4_calc_metadata_amount() would lead to reserving 2-3 times the number of pending delayed allocation blocks as potentially required metadata blocks. So if there are 1 megabytes of blocks which have been not yet been allocation, up to 3 megabytes of space would get reserved out of the user's quota and from the file system free space pool until all of the inode's data blocks have been allocated. This commit addresses this problem by much more accurately estimating the number of metadata blocks that will be required. It will still somewhat over-estimate the number of blocks needed, since it must make a worst case estimate not knowing which physical blocks will be needed, but it is much more accurate than before. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2010-01-01 02:41:30 -05:00
Aneesh Kumar K.V	515f41c33a	ext4: Ensure zeroout blocks have no dirty metadata This fixes a bug (found by Curt Wohlgemuth) in which new blocks returned from an extent created with ext4_ext_zeroout() can have dirty metadata still associated with them. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Curt Wohlgemuth <curtw@google.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-12-29 23:39:06 -05:00
Surbhi Palande	034fb4c95f	ext4: replace BUG() with return -EIO in ext4_ext_get_blocks This patch fixes the Kernel BZ #14286. When the address of an extent corresponding to a valid block is corrupted, a -EIO should be reported instead of a BUG(). This situation should not normally not occur except in the case of a corrupted filesystem. If however it does, then the system should not panic directly but depending on the mount time options appropriate action should be taken. If the mount options so permit, the I/O should be gracefully aborted by returning a -EIO. http://bugzilla.kernel.org/show_bug.cgi?id=14286 Signed-off-by: Surbhi Palande <surbhi.palande@canonical.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-12-14 09:53:52 -05:00
Theodore Ts'o	fab3a549e2	ext4: Fix potential fiemap deadlock (mmap_sem vs. i_data_sem) Fix the following potential circular locking dependency between mm->mmap_sem and ei->i_data_sem: ======================================================= [ INFO: possible circular locking dependency detected ] 2.6.32-04115-gec044c5 #37 ------------------------------------------------------- ureadahead/1855 is trying to acquire lock: (&mm->mmap_sem){++++++}, at: [<ffffffff81107224>] might_fault+0x5c/0xac but task is already holding lock: (&ei->i_data_sem){++++..}, at: [<ffffffff811be1fd>] ext4_fiemap+0x11b/0x159 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&ei->i_data_sem){++++..}: [<ffffffff81099bfa>] __lock_acquire+0xb67/0xd0f [<ffffffff81099e7e>] lock_acquire+0xdc/0x102 [<ffffffff81516633>] down_read+0x51/0x84 [<ffffffff811a2414>] ext4_get_blocks+0x50/0x2a5 [<ffffffff811a3453>] ext4_get_block+0xab/0xef [<ffffffff81154f39>] do_mpage_readpage+0x198/0x48d [<ffffffff81155360>] mpage_readpages+0xd0/0x114 [<ffffffff811a104b>] ext4_readpages+0x1d/0x1f [<ffffffff810f8644>] __do_page_cache_readahead+0x12f/0x1bc [<ffffffff810f86f2>] ra_submit+0x21/0x25 [<ffffffff810f0cfd>] filemap_fault+0x19f/0x32c [<ffffffff81107b97>] __do_fault+0x55/0x3a2 [<ffffffff81109db0>] handle_mm_fault+0x327/0x734 [<ffffffff8151aaa9>] do_page_fault+0x292/0x2aa [<ffffffff81518205>] page_fault+0x25/0x30 [<ffffffff812a34d8>] clear_user+0x38/0x3c [<ffffffff81167e16>] padzero+0x20/0x31 [<ffffffff81168b47>] load_elf_binary+0x8bc/0x17ed [<ffffffff81130e95>] search_binary_handler+0xc2/0x259 [<ffffffff81166d64>] load_script+0x1b8/0x1cc [<ffffffff81130e95>] search_binary_handler+0xc2/0x259 [<ffffffff8113255f>] do_execve+0x1ce/0x2cf [<ffffffff81027494>] sys_execve+0x43/0x5a [<ffffffff8102918a>] stub_execve+0x6a/0xc0 -> #0 (&mm->mmap_sem){++++++}: [<ffffffff81099aa4>] __lock_acquire+0xa11/0xd0f [<ffffffff81099e7e>] lock_acquire+0xdc/0x102 [<ffffffff81107251>] might_fault+0x89/0xac [<ffffffff81139382>] fiemap_fill_next_extent+0x95/0xda [<ffffffff811bcb43>] ext4_ext_fiemap_cb+0x138/0x157 [<ffffffff811be069>] ext4_ext_walk_space+0x178/0x1f1 [<ffffffff811be21e>] ext4_fiemap+0x13c/0x159 [<ffffffff811390e6>] do_vfs_ioctl+0x348/0x4d6 [<ffffffff811392ca>] sys_ioctl+0x56/0x79 [<ffffffff81028cb2>] system_call_fastpath+0x16/0x1b other info that might help us debug this: 1 lock held by ureadahead/1855: #0: (&ei->i_data_sem){++++..}, at: [<ffffffff811be1fd>] ext4_fiemap+0x11b/0x159 stack backtrace: Pid: 1855, comm: ureadahead Not tainted 2.6.32-04115-gec044c5 #37 Call Trace: [<ffffffff81098c70>] print_circular_bug+0xa8/0xb7 [<ffffffff81099aa4>] __lock_acquire+0xa11/0xd0f [<ffffffff8102f229>] ? sched_clock+0x9/0xd [<ffffffff81099e7e>] lock_acquire+0xdc/0x102 [<ffffffff81107224>] ? might_fault+0x5c/0xac [<ffffffff81107251>] might_fault+0x89/0xac [<ffffffff81107224>] ? might_fault+0x5c/0xac [<ffffffff81124b44>] ? __kmalloc+0x13b/0x18c [<ffffffff81139382>] fiemap_fill_next_extent+0x95/0xda [<ffffffff811bcb43>] ext4_ext_fiemap_cb+0x138/0x157 [<ffffffff811bca0b>] ? ext4_ext_fiemap_cb+0x0/0x157 [<ffffffff811be069>] ext4_ext_walk_space+0x178/0x1f1 [<ffffffff811be21e>] ext4_fiemap+0x13c/0x159 [<ffffffff81107224>] ? might_fault+0x5c/0xac [<ffffffff811390e6>] do_vfs_ioctl+0x348/0x4d6 [<ffffffff8129f6d0>] ? __up_read+0x8d/0x95 [<ffffffff81517fb5>] ? retint_swapgs+0x13/0x1b [<ffffffff811392ca>] sys_ioctl+0x56/0x79 [<ffffffff81028cb2>] system_call_fastpath+0x16/0x1b Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-12-09 21:30:02 -05:00
Jan Kara	b436b9bef8	ext4: Wait for proper transaction commit on fsync We cannot rely on buffer dirty bits during fsync because pdflush can come before fsync is called and clear dirty bits without forcing a transaction commit. What we do is that we track which transaction has last changed the inode and which transaction last changed allocation and force it to disk on fsync. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-12-08 23:51:10 -05:00
Dmitry Monakhov	5aca07eb7d	ext4: quota macros cleanup Currently all quota block reservation macros contains hard-coded "2" aka MAXQUOTAS value. This is no good because in some places it is not obvious to understand what does this digit represent. Let's introduce new macro with self descriptive name. Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Acked-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-12-08 22:42:15 -05:00
Theodore Ts'o	e6362609b6	ext4: call ext4_forget() from ext4_free_blocks() Add the facility for ext4_forget() to be called from ext4_free_blocks(). This simplifies the code in a large number of places, and centralizes most of the work of calling ext4_forget() into a single place. Also fix a bug in the extents migration code; it wasn't calling ext4_forget() when releasing the indirect blocks during the conversion. As a result, if the system cashed during or shortly after the extents migration, and the released indirect blocks get reused as data blocks, the journal replay would corrupt the data blocks. With this new patch, fixing this bug was as simple as adding the EXT4_FREE_BLOCKS_FORGET flags to the call to ext4_free_blocks(). Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>	2009-11-23 07:17:05 -05:00
Theodore Ts'o	5068969686	ext4: make sure directory and symlink blocks are revoked When an inode gets unlinked, the functions ext4_clear_blocks() and ext4_remove_blocks() call ext4_forget() for all the buffer heads corresponding to the deleted inode's data blocks. If the inode is a directory or a symlink, the is_metadata parameter must be non-zero so ext4_forget() will revoke them via jbd2_journal_revoke(). Otherwise, if these blocks are reused for a data file, and the system crashes before a journal checkpoint, the journal replay could end up corrupting these data blocks. Thanks to Curt Wohlgemuth for pointing out potential problems in this area. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@kernel.org	2009-11-23 07:17:34 -05:00
Mingming	ba230c3f6d	ext4: Fix return value of ext4_split_unwritten_extents() to fix direct I/O To prepare for a direct I/O write, we need to split the unwritten extents before submitting the I/O. When no extents needed to be split, ext4_split_unwritten_extents() was incorrectly returning 0 instead of the size of uninitialized extents. This bug caused the wrong return value sent back to VFS code when it gets called from async IO path, leading to an unnecessary fall back to buffered IO. This bug also hid the fact that the check to see whether or not a split would be necessary was incorrect; we can only skip splitting the extent if the write completely covers the uninitialized extent. Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-11-06 04:01:23 -05:00
Mingming	5f5249507e	ext4: skip conversion of uninit extents after direct IO if there isn't any At the end of direct I/O operation, ext4_ext_direct_IO() always called ext4_convert_unwritten_extents(), regardless of whether there were any unwritten extents involved in the I/O or not. This commit adds a state flag so that ext4_ext_direct_IO() only calls ext4_convert_unwritten_extents() when necessary. Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-11-10 10:48:04 -05:00

1 2 3 4

172 Commits