We found a problem with buffer head reference leaks when using an ext4
partition without a journal. In particular, calls to ext4_forget() would
not to a brelse() on the input buffer head, which will cause pages they
belong to to not be reclaimable.
Further investigation showed that all places where ext4_journal_forget() and
ext4_journal_revoke() are called are subject to the same problem. The patch
below changes __ext4_journal_forget/__ext4_journal_revoke to do an explicit
release of the buffer head when the journal handle isn't valid.
Signed-off-by: Curt Wohlgemuth <curtw@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
If there is no journal, ext4_should_writeback_data() should return
TRUE. This will fix ext4_set_aops() to set ext4_da_ops in the case of
delayed allocation; otherwise ext4_journaled_aops gets used by
default, which doesn't handle delayed allocation properly.
The advantage of using ext4_should_writeback_data() approach is that
it should handle nobh better as well.
Thanks to Curt Wohlgemuth for investigating this problem, and Aneesh
Kumar for suggesting this approach.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
This mount option is largely superfluous, and in fact the way it was
implemented was buggy; if a filesystem which did not have the extents
feature flag was mounted -o extents, the filesystem would attempt to
create and use extents-based file even though the extents feature flag
was not eabled. The simplest thing to do is to nuke the mount option
entirely. It's not all that useful to force the non-creation of new
extent-based files if the filesystem can support it.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
A few weeks ago I posted a patch for discussion that allowed ext4 to run
without a journal. Since that time I've integrated the excellent
comments from Andreas and fixed several serious bugs. We're currently
running with this patch and generating some performance numbers against
both ext2 (with backported reservations code) and ext4 with and without
a journal. It just so happens that running without a journal is
slightly faster for most everything.
We did
iozone -T -t 4 s 2g -r 256k -T -I -i0 -i1 -i2
which creates 4 threads, each of which create and do reads and writes on
a 2G file, with a buffer size of 256K, using O_DIRECT for all file opens
to bypass the page cache. Results:
ext2 ext4, default ext4, no journal
initial writes 13.0 MB/s 15.4 MB/s 15.7 MB/s
rewrites 13.1 MB/s 15.6 MB/s 15.9 MB/s
reads 15.2 MB/s 16.9 MB/s 17.2 MB/s
re-reads 15.3 MB/s 16.9 MB/s 17.2 MB/s
random readers 5.6 MB/s 5.6 MB/s 5.7 MB/s
random writers 5.1 MB/s 5.3 MB/s 5.4 MB/s
So it seems that, so far, this was a useful exercise.
Signed-off-by: Frank Mayhar <fmayhar@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
When considering how many journal credits are needed for modifying a
chunk of data, we need to account for the super block, inode block,
quota blocks and xattr block, indirect/index blocks, also, group bitmap
and group descriptor blocks for new allocation (including data and
indirect/index blocks). There are many places in ext4 do the calculation
on their own and often missed one or two meta blocks, and often they
assume single block allocation, and did not considering the multile
chunk of allocation case.
This patch is trying to cleanup current journal credit code, provides
some common helper funtion to calculate the journal credits, to be used
for writepage, writepages, DIO, fallocate, migration, defrag, and for
both nonextent and extent files.
This patch modified the writepage/write_begin credit caculation for
nonextent files, to use the new helper function. It also fixed the
problem that writepage on nonextent files did not consider the case
blocksize <pagesize, thus could possibelly need multiple block
allocation in a single transaction.
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
This patch makes ext4 use inode-based implementation of data=ordered mode
in JBD2. It allows us to unify some data=ordered and data=writeback paths
(especially writepage since we don't have to start a transaction anymore)
and remove some buffer walking.
Updated fix from Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
to fix file system hang due to corrupt jinode values.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Move ext4 headers out of include/linux. This is just the trivial move,
there's some more thing that could be done later.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>