linux

Commit Graph

Author	SHA1	Message	Date
Fabian Frederick	27ff6a0f7f	GFS2: fs/gfs2/rgrp.c: kernel-doc warning fixes Cc: cluster-devel@redhat.com Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2014-07-18 11:15:14 +01:00
Benjamin Marzinski	24972557b1	GFS2: remove transaction glock GFS2 has a transaction glock, which must be grabbed for every transaction, whose purpose is to deal with freezing the filesystem. Aside from this involving a large amount of locking, it is very easy to make the current fsfreeze code hang on unfreezing. This patch rewrites how gfs2 handles freezing the filesystem. The transaction glock is removed. In it's place is a freeze glock, which is cached (but not held) in a shared state by every node in the cluster when the filesystem is mounted. This lock only needs to be grabbed on freezing, and actions which need to be safe from freezing, like recovery. When a node wants to freeze the filesystem, it grabs this glock exclusively. When the freeze glock state changes on the nodes (either from shared to unlocked, or shared to exclusive), the filesystem does a special log flush. gfs2_log_flush() does all the work for flushing out the and shutting down the incore log, and then it tries to grab the freeze glock in a shared state again. Since the filesystem is stuck in gfs2_log_flush, no new transaction can start, and nothing can be written to disk. Unfreezing the filesytem simply involes dropping the freeze glock, allowing gfs2_log_flush() to grab and then release the shared lock, so it is cached for next time. However, in order for the unfreezing ioctl to occur, gfs2 needs to get a shared lock on the filesystem root directory inode to check permissions. If that glock has already been grabbed exclusively, fsfreeze will be unable to get the shared lock and unfreeze the filesystem. In order to allow the unfreeze, this patch makes gfs2 grab a shared lock on the filesystem root directory during the freeze, and hold it until it unfreezes the filesystem. The functions which need to grab a shared lock in order to allow the unfreeze ioctl to be issued now use the lock grabbed by the freeze code instead. The freeze and unfreeze code take care to make sure that this shared lock will not be dropped while another process is using it. Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2014-05-14 10:04:34 +01:00
Joe Perches	d77d1b58aa	GFS2: Use pr_<level> more consistently Add pr_fmt, remove embedded "GFS2: " prefixes. This now consistently emits lower case "gfs2: " for each message. Other miscellanea around these changes: o Add missing newlines o Coalesce formats o Realign arguments Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2014-03-07 09:30:51 +00:00
Fabian Frederick	fc554ed3d8	GFS2: global conversion to pr_foo() -All printk(KERN_foo converted to pr_foo(). -Messages updated to fit in 80 columns. -fs_macros converted as well. -fs_printk removed. Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2014-03-06 17:34:06 +00:00
Rashika Kheria	c2b0b30edd	GFS2: Mark functions as static in gfs2/rgrp.c Mark functions as static in gfs2/rgrp.c because they are not used outside this file. This eliminates the following warning in gfs2/rgrp.c: fs/gfs2/rgrp.c:1092:5: warning: no previous prototype for ‘gfs2_rgrp_bh_get’ [-Wmissing-prototypes] fs/gfs2/rgrp.c:1157:5: warning: no previous prototype for ‘update_rgrp_lvb’ [-Wmissing-prototypes] Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2014-02-10 12:29:16 +00:00
Steven Whitehouse	b2c8b3ea87	GFS2: Allocate block for xattr at inode alloc time, if required This is another step towards improving the allocation of xattr blocks at inode allocation time. Here we take advantage of Christoph's recent work on ACLs to allocate a block for the xattrs early if we know that we will be adding ACLs to the inode later on. The advantage of that is that it is much more likely that we'll get a contiguous run of two blocks where the first is the inode and the second is the xattr block. We still have to fall back to the original system in case we don't get the requested two contiguous blocks, or in case the ACLs are too large to fit into the block. Future patches will move more of the ACL setting code further up the gfs2_inode_create() function. Also, I'd like to be able to do the same thing with the xattrs from LSMs in due course, too. That way we should be able to slowly reduce the number of independent transactions, at least in the most common cases. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2014-02-04 15:45:11 +00:00
Bob Peterson	8b127d0494	GFS2: Small cleanup This is a small cleanup to function gfs2_rgrp_go_lock so that it uses rgd instead of its more complicated twin. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2014-01-16 14:22:12 +00:00
Steven Whitehouse	ac3beb6a5d	GFS2: Don't use ENOBUFS when ENOMEM is the correct error code Al Viro has tactfully pointed out that we are using the incorrect error code in some cases. This patch fixes that, and also removes the (unused) return value for glock dumping. > * gfs2_iget() - ENOBUFS instead of ENOMEM. ENOBUFS is > "No buffer space available (POSIX.1 (XSI STREAMS option))" and since > we don't support STREAMS it's probably fair game, but... what the hell? Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Al Viro <viro@ZenIV.linux.org.uk>	2014-01-16 10:31:13 +00:00
Steven Whitehouse	7005c3e4ae	GFS2: Use range based functions for rgrp sync/invalidation Each rgrp header is represented as a single extent on disk, so we can calculate the position within the address space, since we are using address spaces mapped 1:1 to the disk. This means that it is possible to use the range based versions of filemap_fdatawrite/wait and for invalidating the page cache. Our eventual intent is to then be able to merge the address spaces used for rgrps into a single address space, rather than to have one for each glock, saving memory and reducing complexity. Since during umount, the rgrp structures are disposed of before the glocks, we need to store the extent information in the glock so that is is available for a final invalidation. This patch uses a field which is otherwise unused in rgrp glocks to do that, so that we do not have to expand the size of a glock. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2014-01-03 10:00:31 +00:00
Steven Whitehouse	7de41d36ff	GFS2: Remove test which is always true Since gfs2_inplace_reserve() is always called with a valid alloc parms structure, there is no need to test for this within the function itself - and in any case, after we've all ready dereferenced it anyway. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2014-01-03 09:59:30 +00:00
Bob Peterson	5ea5050cec	GFS2: Implement a "rgrp has no extents longer than X" scheme With the preceding patch, we started accepting block reservations smaller than the ideal size, which requires a lot more parsing of the bitmaps. To reduce the amount of bitmap searching, this patch implements a scheme whereby each rgrp keeps track of the point at this multi-block reservations will fail. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2014-01-03 09:58:08 +00:00
Bob Peterson	1330edbeaf	GFS2: Drop inadequate rgrps from the reservation tree This is just basically a resend of a patch I posted earlier. It didn't change from its original, except in diff offsets, etc: This patch fixes a bug in the GFS2 block allocation code. The problem starts if a process already has a multi-block reservation, but for some reason, another process disqualifies it from further allocations. For example, the other process might set on the GFS2_RDF_ERROR bit. The process holding the reservation jumps to label skip_rgrp, but that label comes after the code that removes the reservation from the tree. Therefore, the no longer usable reservation is not removed from the rgrp's reservations tree; it's lost. Eventually, the lost reservation causes the count of reserved blocks to get off, and eventually that causes a BUG_ON(rs->rs_rbm.rgd->rd_reserved < rs->rs_free) to trigger. This patch moves the call to after label skip_rgrp so that the disqualified reservation is properly removed from the tree, thus keeping the rgrp rd_reserved count sane. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2014-01-03 09:57:31 +00:00
Bob Peterson	5ce13431dd	GFS2: If requested is too large, use the largest extent in the rgrp Here is a second try at a patch I posted earlier, which also implements suggestions Steve made: Before this patch, GFS2 would keep searching through all the rgrps until it found one that had a chunk of free blocks big enough to satisfy the size hint, which is based on the file write size, regardless of whether the chunk was big enough to perform the write. However, when doing big writes there may not be a large enough chunk of free blocks in any rgrp, due to file system fragmentation. The largest chunk may be big enough to satisfy the write request, but it may not meet the ideal reservation size from the "size hint". The writes would slow to a crawl because every write would search every rgrp, then finally give up and default to a single-block write. In my case, performance would drop from 425MB/s to 18KB/s, or 24000 times slower. This patch basically makes it so that if we can't find a contiguous chunk of blocks big enough to satisfy the sizehint, we'll use the largest chunk of blocks we found that will still contain the write. It does so by keeping track of the largest run of blocks within the rgrp. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2014-01-03 09:57:02 +00:00
Al Viro	951b4bd553	gfs2: endianness misannotations Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-11-15 22:04:16 -05:00
Steven Whitehouse	9e07f2cb3d	GFS2: Speed up starting point selection for block allocation When setting the starting point for block allocation, there were calls to both gfs2_rbm_to_block() and gfs2_rbm_from_block() in the common case of there being an active reservation. The gfs2_rbm_from_block() function can be quite slow, and since the two conversions were effectively a no-op, it makes sense to avoid them entirely in this case. There is no functional change here, but the code should be a bit more efficient after this patch. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2013-10-02 14:42:45 +01:00
Steven Whitehouse	7b9cff4671	GFS2: Add allocation parameters structure This patch adds a structure to contain allocation parameters with the intention of future expansion of this structure. The idea is that we should be able to add more information about the allocation in the future in order to allow the allocator to make a better job of placing the requests on-disk. There is no functional difference from applying this patch. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2013-10-02 11:13:25 +01:00
Steven Whitehouse	af5c269799	GFS2: Clean up reservation removal The reservation for an inode should be cleared when it is truncated so that we can start again at a different offset for future allocations. We could try and do better than that, by resetting the search based on where the truncation started from, but this is only a first step. In addition, there are three callers of gfs2_rs_delete() but only one of those should really be testing the value of i_writecount. While we get away with that in the other cases currently, I think it would be better if we made that test specific to the one case which requires it. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2013-09-27 12:49:33 +01:00
Bob Peterson	149ed7f51e	GFS2: new function gfs2_rbm_incr Since the previous patch eliminated bi in favor of bii, this follow-on patch needed to be adjusted accordingly. Here is the revised version. This patch adds a new function, gfs2_rbm_incr, which increments an rbm structure. This is more efficient than calling gfs2_rbm_to_block, incrementing, then calling gfs2_rbm_from_block. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2013-09-18 10:40:38 +01:00
Bob Peterson	e579ed4f44	GFS2: Introduce rbm field bii This is a respin of the original patch. As Steve pointed out, the introduction of field bii makes it easy to eliminate bi itself. This revised patch does just that, replacing bi with bii. This patch adds a new field to the rbm structure, called bii, which is an index into the array of bitmaps for an rgrp. This replaces *bi which was a pointer to the bitmap. This is being done for further optimizations. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2013-09-18 10:39:53 +01:00
Bob Peterson	b870890519	GFS2: Do not reset flags on active reservations When we used try locks for rgrps on block allocations, it was important to clear the flags field so that we used a blocking hold on the glock. Now that we're not doing try locks, clearing flags is unnecessary, and a waste of time. In fact, it's probably doing the wrong thing because it clears the GL_SKIP bit that was set for the lvb tracking purposes. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2013-09-17 10:19:29 +01:00
Bob Peterson	7e230f5774	GFS2: introduce bi_blocks for optimization This patch introduces a new field in the bitmap structure called bi_blocks. Its purpose is to save us from constantly multiplying bi_len by the constant GFS2_NBBY. It also paves the way for more optimization in a future patch. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2013-09-17 10:15:13 +01:00
Bob Peterson	6aa7640f30	GFS2: optimize rbm_from_block wrt bi_start In function gfs2_rbm_from_block, it starts by checking if the block falls within the first bitmap. It does so by checking if the rbm's offset is less than (rbm->bi->bi_start + rbm->bi->bi_len) * GFS2_NBBY. However, the first bitmap will always have bi_start==0. Therefore this is an unnecessary calculation in a function that gets called billions of times. This patch removes the reference to bi_start. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2013-09-17 10:14:39 +01:00
Abhijith Das	6a98c333ed	GFS2: Fix fstrim boundary conditions This patch correctly distinguishes two boundary conditions: 1. When the given range is entire within the unaccounted space between two rgrps, and 2. The range begins beyond the end of the filesystem Also fix the unit of the returned value r.len (total trimming) to be in bytes instead of the (incorrect) 512 byte blocks With this patch, GFS2 passes multiple iterations of all the relevant xfstests (251, 260, 288) with different fs block sizes. Signed-off-by: Abhi Das <adas@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2013-06-19 21:41:26 +01:00
Bob Peterson	2b3dcf3581	GFS2: Increase i_writecount during gfs2_setattr_size This patch calls get_write_access in a few functions. This merely increases inode->i_writecount for the duration of the function. That will ensure that any file closes won't delete the inode's multi-block reservation while the function is running. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2013-06-03 16:38:58 +01:00
Bob Peterson	af21ca8ed5	GFS2: Use single-block reservations for directories This patch changes the multi-block allocation code, such that directory inodes only get a single block reserved in the bitmap. That way, the bitmaps are more tightly packed together, and there are fewer spans of free blocks for in-use block reservations. This means it takes less time to find a free span of blocks in the bitmap, which speeds things up. This increases the performance of some workloads by almost 2X. In Nate's mockup.py script (which does (1) create dir, (2) create dir in dir, (3) create file in that dir) the test executes in 23 steps rather than 43 steps, a 47% performance improvement. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2013-05-24 13:47:32 +01:00
Bob Peterson	20095218fb	GFS2: Remove vestigial parameter ip from function rs_deltree The functions that delete block reservations from the rgrp block reservations rbtree no longer use the ip parameter. This patch eliminates the parameter. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2013-04-08 08:41:04 +01:00
Steven Whitehouse	fd4b4e042c	GFS2: Clean up inode creation path This patch cleans up the inode creation code path in GFS2. After the Orlov allocator was merged, a number of potential improvements are now possible, and this is a first set of these. The quota handling is now updated so that it matches the point in the code where the allocation takes place. This means that the one exception in gfs2_alloc_blocks relating to quota is now no longer required, and we can use the generic code everywhere. In addition the call to figure out whether we need to allocate any extra blocks in order to add a directory entry is moved higher up gfs2_create_inode. This means that if it returns an error, we can deal with that at a stage where it is easier to handle that case. The returned status cannot change during the function since we hold an exclusive lock on the directory. Two calls to gfs2_rindex_update have been changed to one, again at the top of gfs2_create_inode to simplify error handling. The time stamps are also now initialised earlier in the creation process, this is gradually moving towards being able to remove the call to gfs2_refresh_inode in gfs2_inode_create once we have all the fields covered. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2013-04-08 08:39:56 +01:00
Bob Peterson	b2c87cae0e	GFS2: Issue discards in 512b sectors This patch changes GFS2's discard issuing code so that it calls function sb_issue_discard rather than blkdev_issue_discard. The code was calling blkdev_issue_discard and specifying the correct sector offset and sector size, but blkdev_issue_discard expects these values to be in terms of 512 byte sectors, even if the native sector size for the device is different. Calling sb_issue_discard with the BLOCK size instead ensures the correct block-to-512b-sector translation. I verified that "minlen" is specified in blocks, so comparing it to a number of blocks is correct. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2013-04-05 17:55:13 +01:00
Wei Yongjun	441362d06b	GFS2: return error if malloc failed in gfs2_rs_alloc() The error code in gfs2_rs_alloc() is set to ENOMEM when error but never be used, instead, gfs2_rs_alloc() always return 0. Fix to return 'error'. Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2013-04-04 09:53:10 +01:00
Linus Torvalds	d895cb1af1	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs pile (part one) from Al Viro: "Assorted stuff - cleaning namei.c up a bit, fixing ->d_name/->d_parent locking violations, etc. The most visible changes here are death of FS_REVAL_DOT (replaced with "has ->d_weak_revalidate()") and a new helper getting from struct file to inode. Some bits of preparation to xattr method interface changes. Misc patches by various people sent this cycle and ocfs2 fixes from several cycles ago that should've been upstream right then. PS: the next vfs pile will be xattr stuff." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (46 commits) saner proc_get_inode() calling conventions proc: avoid extra pde_put() in proc_fill_super() fs: change return values from -EACCES to -EPERM fs/exec.c: make bprm_mm_init() static ocfs2/dlm: use GFP_ATOMIC inside a spin_lock ocfs2: fix possible use-after-free with AIO ocfs2: Fix oops in ocfs2_fast_symlink_readpage() code path get_empty_filp()/alloc_file() leave both ->f_pos and ->f_version zero target: writev() on single-element vector is pointless export kernel_write(), convert open-coded instances fs: encode_fh: return FILEID_INVALID if invalid fid_type kill f_vfsmnt vfs: kill FS_REVAL_DOT by adding a d_weak_revalidate dentry op nfsd: handle vfs_getattr errors in acl protocol switch vfs_getattr() to struct path default SET_PERSONALITY() in linux/elf.h ceph: prepopulate inodes only when request is aborted d_hash_and_lookup(): export, switch open-coded instances 9p: switch v9fs_set_create_acl() to inode+fid, do it before d_instantiate() 9p: split dropping the acls from v9fs_set_create_acl() ...	2013-02-26 20:16:07 -08:00
Al Viro	496ad9aa8e	new helper: file_inode(file) Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-02-22 23:31:31 -05:00
Steven Whitehouse	350a9b0a72	GFS2: Split gfs2_trans_add_bh() into two There is little common content in gfs2_trans_add_bh() between the data and meta classes by the time that the functions which it calls are taken into account. The intent here is to split this into two separate functions. Stage one is to introduce gfs2_trans_add_data() and gfs2_trans_add_meta() and update the callers accordingly. Later patches will then pull in the content of gfs2_trans_add_bh() and its dependent functions in order to clean up the code in this area. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2013-01-29 10:28:04 +00:00
Bob Peterson	13d2eb0129	GFS2: Reset rd_last_alloc when it reaches the end of the rgrp In function rg_mblk_search, it's searching for multiple blocks in a given state (e.g. "free"). If there's an active block reservation its goal is the next free block of that. If the resource group contains the dinode's goal block, that's used for the search. But if neither is the case, it uses the rgrp's last allocated block. That way, consecutive allocations appear after one another on media. The problem comes in when you hit the end of the rgrp; it would never start over and search from the beginning. This became a problem, since if you deleted all the files and data from the rgrp, it would never start over and find free blocks. So it had to keep searching further out on the media to allocate blocks. This patch resets the rd_last_alloc after it does an unsuccessful search at the end of the rgrp. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2013-01-02 10:05:27 +00:00
Bob Peterson	15bd50ad82	GFS2: Stop looking for free blocks at end of rgrp This patch adds a return code check after calling function gfs2_rbm_from_block while determining the free extent size. That way, when the end of an rgrp is reached, it won't try to process unaligned blocks after the end. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2013-01-02 10:05:10 +00:00
Abhijith Das	f1213cacc7	GFS2: Fix race in gfs2_rs_alloc QE aio tests uncovered a race condition in gfs2_rs_alloc where it's possible to come out of the function with a valid ip->i_res allocation but it gets freed before use resulting in a NULL ptr dereference. This patch envelopes the initial short-circuit check for non-NULL ip->i_res into the mutex lock. With this patch, I was able to successfully run the reproducer test multiple times. Resolves: rhbz#878476 Signed-off-by: Abhi Das <adas@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2013-01-02 10:04:53 +00:00
David Teigland	4e2f8849de	GFS2: remove redundant lvb pointer The lksb struct already contains a pointer to the lvb, so another directly from the glock struct is not needed. Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2012-11-15 10:17:22 +00:00
Steven Whitehouse	aa8920c968	GFS2: Fix one RG corner case For filesystems with only a single resource group, we need to be careful that the allocation loop will not land up with a NULL resource group. This fixes a bug in a previous patch where the gfs2_rgrpd_get_next() function was being used instead of gfs2_rgrpd_get_first() Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2012-11-13 14:50:35 +00:00
Steven Whitehouse	9dbe9610b9	GFS2: Add Orlov allocator Just like ext3, this works on the root directory and any directory with the +T flag set. Also, just like ext3, any subdirectory created in one of the just mentioned cases will be allocated to a random resource group (GFS2 equivalent of a block group). If you are creating a set of directories, each of which will contain a job running on a different node, then by setting +T on the parent directory before creating the subdirectories, each will land up in a different resource group, and thus resource group contention between nodes will be kept to a minimum. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2012-11-07 13:33:17 +00:00
Steven Whitehouse	bcd97c0630	GFS2: Add test for resource group congestion status This patch uses information gathered by the recent glock statistics patch in order to derrive a boolean verdict on the congestion status of a resource group. This is then used when making decisions on which resource group to choose during block allocation. The aim is to avoid resource groups which are heavily contended by other nodes, while still ensuring locality of access wherever possible. Once a reservation has been made in a particular resource group we continue to use that resource group until a new reservation is required. This should help to ensure that we do not change resource groups too often. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2012-11-07 13:32:21 +00:00
Bob Peterson	a68a0a352a	GFS2: Speed up gfs2_rbm_from_block This patch is a rewrite of function gfs2_rbm_from_block. Rather than looping to find the right bitmap, the code now does a few simple math calculations. I compared the performance of both algorithms side by side and the new algorithm is noticeably faster. Sample instrumentation output from a "fast" machine: 5 million calls: millisec spent: Orig: 166 New: 113 5 million calls: millisec spent: Orig: 189 New: 114 In addition, I ran postmark (on a somewhat slowr CPU) before the after the new algorithm was put in place and postmark showed a decent improvement: Before the new algorithm: ------------------------- Time: 645 seconds total 584 seconds of transactions (171 per second) Files: 150087 created (232 per second) Creation alone: 100000 files (2083 per second) Mixed with transactions: 50087 files (85 per second) 49995 read (85 per second) 49991 appended (85 per second) 150087 deleted (232 per second) Deletion alone: 100174 files (7705 per second) Mixed with transactions: 49913 files (85 per second) Data: 273.42 megabytes read (434.08 kilobytes per second) 852.13 megabytes written (1.32 megabytes per second) With the new algorithm: ----------------------- Time: 599 seconds total 530 seconds of transactions (188 per second) Files: 150087 created (250 per second) Creation alone: 100000 files (1886 per second) Mixed with transactions: 50087 files (94 per second) 49995 read (94 per second) 49991 appended (94 per second) 150087 deleted (250 per second) Deletion alone: 100174 files (6260 per second) Mixed with transactions: 49913 files (94 per second) Data: 273.42 megabytes read (467.42 kilobytes per second) 852.13 megabytes written (1.42 megabytes per second) Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2012-11-07 13:31:36 +00:00
Lukas Czerner	076f0faa76	GFS2: Fix FITRIM argument handling Currently implementation in gfs2 uses FITRIM arguments as it were in file system blocks units which is wrong. The FITRIM arguments (fstrim_range.start, fstrim_range.len and fstrim_range.minlen) are actually in bytes. Moreover, check for start argument beyond the end of file system, len argument being smaller than file system block and minlen argument being bigger than biggest resource group were missing. This commit converts the code to convert FITRIM argument to file system blocks and also adds appropriate checks mentioned above. All the problems were recognised by xfstests 251 and 260. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2012-11-07 09:41:58 +00:00
Lukas Czerner	3a238adefb	GFS2: Require user to provide argument for FITRIM When the fstrim_range argument is not provided by user in FITRIM ioctl we should just return EFAULT and not promoting bad behaviour by filling the structure in kernel. Let the user deal with it. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2012-11-07 09:41:37 +00:00
Andrew Price	cd0ed19fb6	GFS2: Fix possible null pointer deref in gfs2_rs_alloc Despite the return value from kmem_cache_zalloc() being checked, the error wasn't being returned until after a possible null pointer dereference. This patch returns the error immediately, allowing the removal of the error variable. Signed-off-by: Andrew Price <anprice@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2012-11-07 09:40:39 +00:00
Bob Peterson	3701530aed	GFS2: Fix infinite loop in rbm_find This patch fixes an infinite loop in gfs2_rbm_find that was introduced by the previous patch. The problem occurred when the length was less than 3 but the rbm block was byte-aligned, causing it to improperly return a extent length of zero, which caused it to spin. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Tested-by: Bob Peterson <rpeterso@redhat.com> Tested-by: Barry Marson <bmarson@redhat.com>	2012-09-24 10:47:27 +01:00
Steven Whitehouse	ff7f4cb461	GFS2: Consolidate free block searching functions With the recently added block reservation code, an additional function was added to search for free blocks. This had a restriction of only being able to search for aligned extents of free blocks. As a result the allocation patterns when reserving blocks were suboptimal when the existing allocation of blocks for an inode was not aligned to the same boundary. This patch resolves that problem by adding the ability for gfs2_rbm_find to search for extents of a particular minimum size. We can then use gfs2_rbm_find for both looking for reservations, and also looking for free blocks on an individual basis when we actually come to do the allocation later on. As a result we only need a single set of code to deal with both situations. The function gfs2_rbm_from_block() is moved up rgrp.c so that it occurs before all of its callers. Many thanks are due to Bob for helping track down the final issue in this patch. That fix to the rb_tree traversal and to not share block reservations from a dirctory to its children is included here. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>	2012-09-24 10:47:26 +01:00
Bob Peterson	0688a5ecea	GFS2: Stop block extents at the end of bitmaps This patch stops multiple block allocations if a nonzero return code is received from gfs2_rbm_from_block. Without this patch, if enough pressure is put on the file system, you get a kernel warning quickly followed by: BUG: unable to handle kernel NULL pointer dereference at (null) IP: [<ffffffffa04f47e8>] gfs2_alloc_blocks+0x2c8/0x880 [gfs2] With this patch, things run normally. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2012-09-24 10:47:23 +01:00
Steven Whitehouse	c743ffd09f	GFS2: Fix unclaimed_blocks() wrapping bug and clean up When rgd->rd_free_clone is less than rgd->rd_reserved, the unclaimed_blocks() calculation would wrap and produce incorrect results. This patch checks for this condition when this function is called from gfs2_mblk_search() In addition, the use of this particular function in other places in the code has been dropped by means of a general clean up of gfs2_inplace_reserve(). This function is now much easier to follow. Also the setting of the rgd->rd_last_alloc field is corrected. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2012-09-24 10:47:21 +01:00
Steven Whitehouse	9e733d3923	GFS2: Improve block reservation tracing This patch improves the tracing of block reservations by removing some corner cases and also providing more useful detail in the traces. A new field is added to the reservation structure to contain the inode number. This is used since in certain contexts it is not possible to access the inode itself to obtain this information. As a result we can then display the inode number for all tracepoints and also in case we dump the resource group. The "del" tracepoint operation has been removed. This could be called with the reservation rgrp set to NULL. That resulted in not printing the device number, and thus making the information largely useless anyway. Also, the conditional on the rgrp being NULL can then be removed from the tracepoint. After this change, all the block reservation tracepoint calls will be called with the rgrp information. The existing ins,clm and tdel calls to the block reservation tracepoint are sufficient to track the entire life of the block reservation. In gfs2_block_alloc() the error detection is updated to print out the inode number of the problematic inode. This can then be compared against the information in the glock dump,tracepoints, etc. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2012-09-24 10:47:20 +01:00
Steven Whitehouse	137834a696	GFS2: Fall back to ignoring reservations, if there are no other blocks left When we get to the stage of allocating blocks, we know that the resource group in question must contain enough free blocks, otherwise gfs2_inplace_reserve() would have failed. So if we are left with only free blocks which are reserved, then we must use those. This can happen if another node has sneeked in and use some blocks reserved on this node, for example. Generally this will happen very rarely and only when the resouce group is nearly full. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2012-09-24 10:47:19 +01:00
Steven Whitehouse	3e6339dd28	GFS2: Use rbm for gfs2_setbit() Use the rbm structure for gfs2_setbit() in order to simplify the arguments to the function. We have to add a bool to control whether the clone bitmap should be updated (if it exists) but otherwise it is a more or less direct substitution. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2012-09-24 10:47:16 +01:00

1 2 3 4 5

218 Commits