Commit Graph

2917 Commits

Author SHA1 Message Date
Zhi Yong Wu c2bfbc9b48 xfs: fix the comment of xfs_mountfs()
Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-08-20 15:53:07 -05:00
Zhi Yong Wu 2533787a43 xfs: fix the comment of xfs_sb_quiet_read_verify()
Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-08-20 15:51:49 -05:00
Zhi Yong Wu 8ba701ee9e xfs: fix the comment of xlog_recover_do_dquot_buffer()
Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-08-20 15:49:06 -05:00
Zhi Yong Wu 8e159e72e2 xfs: fix the comment of xfs_log_unmount_write()
Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-08-20 15:47:18 -05:00
Zhi Yong Wu 0b8182dba6 xfs: fix the comment of xfs_ifree_cluster()
Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-08-20 15:44:36 -05:00
Zhi Yong Wu 2f21ff1cca xfs: fix the comment of xfs_ialloc_ag_select()
Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-08-20 15:42:41 -05:00
Zhi Yong Wu b3c496343b xfs: fix the comment of xfs_extent_busy_update_extent()
Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-08-20 15:41:08 -05:00
Zhi Yong Wu 8b4ad79cc6 xfs: fix the comment of xfs_setsize_buftarg_early()
Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-08-20 15:40:39 -05:00
Zhi Yong Wu ad4809bf22 xfs: fix the comment of xfs_bmap_punch_delalloc_range()
Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-08-20 15:40:16 -05:00
Zhi Yong Wu 02bb4873db xfs: fix the comment of xfs_bmap_last_before()
Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-08-20 15:36:28 -05:00
Zhi Yong Wu a97f4df7b5 xfs: fix the comment of xfs_bmap_validate_ret()
Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-08-20 15:35:55 -05:00
Zhi Yong Wu 8be11e92b6 xfs: fix the comment of xfs_bmap_count_tree()
Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-08-20 15:35:32 -05:00
Zhi Yong Wu c7c1a7d8bb xfs: rename bio_add_buffer() to xfs_bio_add_buffer()
Follow up with xfs naming style.

Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-08-20 15:35:00 -05:00
Zhi Yong Wu 0a94da24b9 xfs: fix the comment of xlog_find_head()
Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-08-20 15:34:34 -05:00
Zhi Yong Wu 34be5ff378 xfs: fix the comment of xlog_recover_buffer_pass2()
Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-08-20 15:30:53 -05:00
Zhi Yong Wu 5c75390924 xfs: remove two unused macro definitions in xfs_linux.h
Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-08-20 15:30:23 -05:00
Zhi Yong Wu 1cb9386354 xfs: fix the comment of xfs_btree_get_iroot()
Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-08-20 15:17:35 -05:00
Zhi Yong Wu f6c2734924 xfs: fix the comment of xfs_iroot_realloc()
Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-08-20 15:17:11 -05:00
Zhi Yong Wu 7c3e664051 xfs: remove one blank line in xfs_btree_make_block_unfull()
Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-08-20 15:16:07 -05:00
Zhi Yong Wu ac0e300fa5 xfs: fix the comment of xlog_write_setup_copy()
Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-08-20 14:59:31 -05:00
Zhi Yong Wu 99e738b783 xfs: fix the comment of xfs_mod_incore_sb_unlocked()
Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-08-20 14:59:05 -05:00
Zhi Yong Wu 49d3da149c xfs: fix the comment of xfs_btree_lookup()
Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-08-20 14:45:23 -05:00
Zhi Yong Wu b46fe8259b xfs: fix the comment of xfs_buf_free()
Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-08-20 14:44:39 -05:00
Zhi Yong Wu 0471f62e38 xfs: fix the comment of xfs_check_sizes()
Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-08-20 14:42:16 -05:00
Dave Chinner 2ad01f53dc xfs: use reference counts to free clean buffer items
When a transaction is cancelled and the buffer log item is clean in
the transaction, the buffer log item is unconditionally freed. If
the log item is in the AIL, however, this leads to a use after free
condition as the item still has other users.

In this case, xfs_buf_item_relse() should only be called on clean
buffer items if the reference count has dropped to zero. This
ensures only the last user frees the item.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-08-15 16:42:29 -05:00
Dwight Engen 8c567a7fab xfs: add capability check to free eofblocks ioctl
Check for CAP_SYS_ADMIN since the caller can truncate preallocated
blocks from files they do not own nor have write access to. A more
fine grained access check was considered: require the caller to
specify their own uid/gid and to use inode_permission to check for
write, but this would not catch the case of an inode not reachable
via path traversal from the callers mount namespace.

Add check for read-only filesystem to free eofblocks ioctl.

Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Dwight Engen <dwight.engen@oracle.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-08-15 14:25:01 -05:00
Dwight Engen b9fe505258 xfs: create internal eofblocks structure with kuid_t types
Have eofblocks ioctl convert uid_t to kuid_t into internal structure.
Update internal filter matching to compare ids with kuid_t types.

Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Dwight Engen <dwight.engen@oracle.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-08-15 14:24:10 -05:00
Dwight Engen 7aab1b2887 xfs: convert kuid_t to/from uid_t for internal structures
Use uint32 from init_user_ns for xfs internal uid/gid
representation in xfs_icdinode, xfs_dqid_t.

Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Dwight Engen <dwight.engen@oracle.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-08-15 14:22:40 -05:00
Dwight Engen fd5e2aa865 xfs: ioctl check for capabilities in the current user namespace
Use inode_capable() to check if SUID|SGID bits should be cleared to match
similar check in inode_change_ok().

The check for CAP_LINUX_IMMUTABLE was not modified since all other file
systems also check against init_user_ns rather than current_user_ns.

Only allow changing of projid from init_user_ns.

Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Dwight Engen <dwight.engen@oracle.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-08-15 14:19:25 -05:00
Dwight Engen 288bbe0eeb xfs: convert kuid_t to/from uid_t in ACLs
Change permission check for setting ACL to use inode_owner_or_capable()
which will additionally allow a CAP_FOWNER user in a user namespace to
be able to set an ACL on an inode covered by the user namespace mapping.

Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Dwight Engen <dwight.engen@oracle.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-08-15 14:18:31 -05:00
Dwight Engen c5eeb7ec3e xfs: create wrappers for converting kuid_t to/from uid_t
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Dwight Engen <dwight.engen@oracle.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-08-15 14:17:34 -05:00
Dave Chinner 4bb928cdb9 xfs: split the CIL lock
The xc_cil_lock is used for two purposes - to protect the CIL
itself, and to protect the push/commit state and lists. These are
two logically separate structures and operations, so can have their
own locks. This means that pushing on the CIL and the commit wait
ordering won't contend for a lock with other transactions that are
completing concurrently. As the CIL insertion is the hottest path
throught eh CIL, this is a big win.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-08-13 16:21:21 -05:00
Dave Chinner 991aaf65ff xfs: Combine CIL insert and prepare passes
Now that all the log item preparation and formatting is done under
the CIL lock, we can get rid of the intermediate log vector chain
used to track items to be inserted into the CIL.

We can already find all the items to be committed from the
transaction handle, so as long as we attach the log vectors to the
item before we insert the items into the CIL, we don't need to
create a log vector chain to pass around.

This means we can move all the item insertion code into and optimise
it into a pair of simple passes across all the items in the
transaction. The first pass does the formatting and accounting, the
second inserts them all into the CIL.

We keep this two pass split so that we can separate the CIL
insertion - which must be done under the CIL spinlock - from the
formatting. We could insert each item into the CIL with a single
pass, but that massively increases the number of times we have to
grab the CIL spinlock. It is much more efficient (and hence
scalable) to do a batch operation and insert all objects in a single
lock grab.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-08-13 16:20:09 -05:00
Dave Chinner f5baac354d xfs: avoid CIL allocation during insert
Now that we have the size of the log vector that has been allocated,
we can determine if we need to allocate a new log vector for
formatting and insertion. We only need to allocate a new vector if
it won't fit into the existing buffer.

However, we need to hold the CIL context lock while we do this so
that we can't race with a push draining the currently queued log
vectors. It is safe to do this as long as we do GFP_NOFS allocation
to avoid avoid memory allocation recursing into the filesystem.
Hence we can safely overwrite the existing log vector on the CIL if
it is large enough to hold all the dirty regions of the current
item.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-08-13 16:19:03 -05:00
Dave Chinner 7492c5b42d xfs: Reduce allocations during CIL insertion
Now that we have the size of the object before the formatting pass
is called, we can allocation the log vector and it's buffer in a
single allocation rather than two separate allocations.

Store the size of the allocated buffer in the log vector so that
we potentially avoid allocation for future modifications of the
object.

While touching this code, remove the IOP_FORMAT definition.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-08-13 16:12:30 -05:00
Dave Chinner 166d13688a xfs: return log item size in IOP_SIZE
To begin optimising the CIL commit process, we need to have IOP_SIZE
return both the number of vectors and the size of the data pointed
to by the vectors. This enables us to calculate the size ofthe
memory allocation needed before the formatting step and reduces the
number of memory allocations per item by one.

While there, kill the IOP_SIZE macro.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-08-13 16:10:21 -05:00
Eric Sandeen 050a1952c3 xfs:free bp in xlog_find_tail() error path
xlog_find_tail() currently leaks a bp on one error path.

There is no error target, so manually free the bp before
returning the error.

Found by Coverity.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-08-13 15:49:51 -05:00
Eric Sandeen 5d0a654974 xfs: free bp in xlog_find_zeroed() error path
xlog_find_zeroed() currently leaks a bp on one error path.

Using the bp_err: target resolves this.

Found by Coverity.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-08-13 15:48:51 -05:00
Eric Sandeen 6dd93e9e5e xfs: avoid double-free in xfs_attr_node_addname
xfs_attr_node_addname()'s error handling tests whether it
should free "state" in the out: error handling label:

out:
        if (state)
                xfs_da_state_free(state);

but an earlier free doesn't set state to NULL afterwards; this
could lead to a double free.  Fix it by setting state to NULL
after it's freed.

This was found by Coverity.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-08-13 15:48:01 -05:00
Jie Liu 2c2bcc0735 xfs: call roundup_64() to calculate the min_logblks
Replace roundup() with roundup_64() as we calculate min_logblks
with 64-bit divisions.  Hence, call roundup() will cause the
following error while compiling a 32-bit kernel:

fs/built-in.o: In function `xfs_log_calc_minimum_size':
fs/xfs/xfs_log_rlimit.c:140: undefined reference to `__udivdi3'

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Cc: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Jie Liu <jeff.liu@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-08-13 14:19:11 -05:00
Jie Liu 3e7b91cf8c xfs: Validate log space at mount time
Validate log space during log mount stage, the underlying function
will drop a warning message via syslog in critical level if the log
space is too small or too large.

[ dchinner: For CRC enable filesystems, abort the mounting of the
filesystem as mkfs should never make a log too small for the given
filesystem configuration. ]

[ dchinner: make a note of the fact that the log size limits in
block counts are in units of filesystem blocks, not basic blocks. ]

Signed-off-by: Jie Liu <jeff.liu@oracle.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-08-12 17:50:35 -05:00
Jie Liu 5a96a94547 xfs: Add xfs_log_rlimit.c
Add source files for xfs_log_rlimit.c The new file is used for log
size calculations and validation shared with userspace.

[dchinner: xfs_log_calc_max_attrsetm_res() does not modify the
tr_attrsetm reservation, just calculates the maximum. ]

[dchinner: rework loop in xfs_log_get_max_trans_res() ]

Signed-off-by: Jie Liu <jeff.liu@oracle.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-08-12 17:49:38 -05:00
Jie Liu e773fc934f xfs: Refactor xfs_ticket_alloc() to extract a new helper
Refactor xlog_ticket_alloc() to extract a new helper, i.e.
xfs_log_calc_unit_res().

This helper would be used to calculate the total log reservation
size by adding extra log operation/transation headers for a new
log ticket.

Signed-off-by: Jie Liu <jeff.liu@oracle.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-08-12 17:49:02 -05:00
Jie Liu f749278c5a xfs: Get rid of all XFS_XXX_LOG_RES() macro
Get rid of all XFS_XXX_LOG_RES() macros since they are obsoleted now.

Signed-off-by: Jie Liu <jeff.liu@oracle.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-08-12 17:48:08 -05:00
Jie Liu 3d3c8b5222 xfs: refactor xfs_trans_reserve() interface
With the new xfs_trans_res structure has been introduced, the log
reservation size, log count as well as log flags are pre-initialized
at mount time.  So it's time to refine xfs_trans_reserve() interface
to be more neat.

Also, introduce a new helper M_RES() to return a pointer to the
mp->m_resv structure to simplify the input.

Signed-off-by: Jie Liu <jeff.liu@oracle.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-08-12 17:47:34 -05:00
Jie Liu 783cb6d172 xfs: Make writeid transaction use tr_writeid
tr_writeid is defined at mp->m_resv structure, however, it does not
really being used when it should be..

This patch changes it to tr_writeid to fetch the correct log
reservation size.

Signed-off-by: Jie Liu <jeff.liu@oracle.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-08-12 17:46:59 -05:00
Jie Liu 20996c9326 xfs: Introduce tr_fsyncts to m_reservation
A preparation step.

For now fsync_ts transaction use the pre-calculated log reservation
size of tr_swrite.  This patch introduce a new item tr_fsyncts to
mp->m_reservations structure so that we can fetch the log
reservation value for it in a same manner to others.

Signed-off-by: Jie Liu <jeff.liu@oracle.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-08-12 17:46:29 -05:00
Jie Liu 0eadd10288 xfs: Introduce a new structure to hold transaction reservation items
Introduce a new structure xfs_trans_res to hold transaction
reservation item info per log ticket.

We also need to improve xfs_trans_resv_calc() by initializing the
log count as well as log flags for permanent log reservation.

Signed-off-by: Jie Liu <jeff.liu@oracle.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-08-12 17:45:49 -05:00
Dave Chinner 9356fe22af xfs: make struct xfs_perag kernel only
The struct xfs_perag has many kernel-only definitions in it,
requiring a __KERNEL__ guard so userspace can use it to. Move it to
xfs_mount.h so that it it kernel-only, and let userspace redefine
it's own version of the structure containing only what it needs.
This gets rid of another __KERNEL__ check in the XFS header files.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-08-12 17:44:36 -05:00
Dave Chinner 4f3d71f68b xfs: move kernel specific type definitions to xfs.h
xfs_types.h is shared with userspace, so having kernel specific
types defined in it is problematic. Move all the kernel specific
defines to xfs_linux.h so we can remove the __KERNEL__ guards from
xfs_types.h.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-08-12 17:04:08 -05:00
Dave Chinner 9b90b0d9da xfs: xfs_filestreams.h doesn't need __KERNEL__
Because it is only used within the kernel.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-08-12 17:00:11 -05:00
Dave Chinner cb9eabff58 xfs: remove __KERNEL__ check from xfs_dir2_leaf.c
It's actually an ifndef section, which means it is only included in
userspace. however, it's deep within the libxfs code, so it's
unlikely that the condition checked in userspace can actually occur
(search an empty leaf) through the libxfs interfaces. i.e. if it can
happen in usrspace, it can happen in the kernel, so remove it from
userspace too....

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-08-12 16:59:14 -05:00
Dave Chinner b49a0c1883 xfs: remove __KERNEL__ from debug code
There is no reason the remaining kernel-only debug code needs to
remain kernel-only. Kill the __KERNEL__ part of the defines, and let
userspace handle the debug code appropriately.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-08-12 16:58:37 -05:00
Dave Chinner 63d20d6e36 xfs: kill __KERNEL__ check for debug code in allocation code
Userspace running debug builds is relatively rare, so there's need
to special case the allocation algorithm code coverage debug switch.
As it is, userspace defines random numbers to 0, so  invert the
logic of the switch so it is effectively a no-op in userspace.
This kills another couple of __KERNEL__ users.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-08-12 16:57:51 -05:00
Dave Chinner 94b406091b xfs: don't special case shared superblock mounts
Neither kernel or userspace support shared read-only mounts, so
don't bother special casing the support check to be different
between kernel and userspace. The same check can be used as neither
like it...

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-08-12 16:57:16 -05:00
Dave Chinner a133d952b4 xfs: consolidate extent swap code
So we don't need xfs_dfrag.h in userspace anymore, move the extent
swap ioctl structure definition to xfs_fs.h where most of the other
ioctl structure definitions are.

Now that we don't need separate files for extent swapping, separate
the basic file descriptor checking code to xfs_ioctl.c, and the code
that does the extent swap operation to xfs_bmap_util.c.  This
cleanly separates the user interface code from the physical
mechanism used to do the extent swap.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-08-12 16:56:06 -05:00
Dave Chinner e546cb79ef xfs: consolidate xfs_utils.c
There are a few small helper functions in xfs_util, all related to
xfs_inode modifications. Move them all to xfs_inode.c so all
xfs_inode operations are consiolidated in the one place.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-08-12 16:55:17 -05:00
Dave Chinner f6bba2017a xfs: consolidate xfs_rename.c
Move the rename code to xfs_inode.c to continue consolidating
all the kernel xfs_inode operations in the one place.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-08-12 16:54:09 -05:00
Dave Chinner c24b5dfadc xfs: kill xfs_vnodeops.[ch]
Now we have xfs_inode.c for holding kernel-only XFS inode
operations, move all the inode operations from xfs_vnodeops.c to
this new file as it holds another set of kernel-only inode
operations. The name of this file traces back to the days of Irix
and it's vnodes which we don't have anymore.

Essentially this move consolidates the inode locking functions
and a bunch of XFS inode operations into the one file. Eventually
the high level functions will be merged into the VFS interface
functions in xfs_iops.c.

This leaves only internal preallocation, EOF block manipulation and
hole punching functions in vnodeops.c. Move these to xfs_bmap_util.c
where we are already consolidating various in-kernel physical extent
manipulation and querying functions.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-08-12 16:53:39 -05:00
Dave Chinner 836a94ad59 xfs: fix issues that cause userspace warnings
Some of the code shared with userspace causes compilation warnings
from things turned off in the kernel code, such as differences in
variable signedness. Fix those issues.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-08-12 16:52:54 -05:00
Dave Chinner c5c249b424 xfs: minor cleanups
These come from syncing the shared userspace and kernel code. Small
whitespace and trivial cleanups.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-08-12 16:46:08 -05:00
Dave Chinner 6898811459 xfs: create xfs_bmap_util.[ch]
There is a bunch of code in xfs_bmap.c that is kernel specific and
not shared with userspace. To minimise the difference between the
kernel and userspace code, shift this unshared code to
xfs_bmap_util.c, and the declarations to xfs_bmap_util.h.

The biggest issue here is xfs_bmap_finish() - userspace has it's own
definition of this function, and so we need to move it out of
xfs_bmap.[ch]. This means several other files need to include
xfs_bmap_util.h as well.

It also introduces and interesting dance for the stack switching
code in xfs_bmapi_allocate(). The stack switching/workqueue code is
actually moved to xfs_bmap_util.c, so that userspace can simply use
a #define in a header file to connect the dots without needing to
know about the stack switch code at all.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-08-12 16:45:17 -05:00
Dave Chinner ff55068c20 xfs: introduce xfs_sb.c for sharing with libxfs
xfs_mount.c is shared with userspace, but the only functions that
are shared are to do with physical superblock manipulations. This
means that less than 25% of the xfs_mount.c code is actually shared
with userspace. Move all the superblock functions to xfs_sb.c and
share that instead with libxfs.

Note that this will leave all the in-core transaction related
superblock counter modifications in xfs_mount.c as none of that is
shared with userspace. With a few more small changes, xfs_mount.h
won't need to be shared with userspace anymore, either.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-08-12 16:44:11 -05:00
Dave Chinner 1fb7e48db6 xfs: split out the remote symlink handling
The remote symlink format definition and manipulation needs to be
shared with userspace, but the in-kernel interfaces do not. Split
the remote symlink format handling out into xfs_symlink_remote.[ch]
fo it can easily be shared with userspace.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-08-12 16:43:38 -05:00
Dave Chinner fde2227ce1 xfs: split out attribute fork truncation code into separate file
The attribute inactivation code is not used by userspace, so like
the attribute listing, split it out into a separate file to minimise
the differences between the filesystem shared with libxfs in
userspace.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-08-12 16:42:30 -05:00
Dave Chinner abec5f2bf9 xfs: split out attribute listing code into separate file
The attribute listing code is not used by userspace, so like the
directory readdir code, split it out into a separate file to
minimise the differences between the filesystem shared with libxfs
in userspace.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-08-12 16:41:29 -05:00
Dave Chinner 2b9ab5ab9c xfs: reshuffle dir2 definitions around for userspace
Many of the definitions within xfs_dir2_priv.h are needed in
userspace outside libxfs. Definitions within xfs_dir2_priv.h are
wholly contained within libxfs, so we need to shuffle some of the
definitions around to keep consistency across files shared between
user and kernel space.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-08-12 16:40:57 -05:00
Dave Chinner 4a8af273de xfs: move getdents code into it's own file
The directory readdir code is not used by userspace, but it is
intermingled with files that are shared with userspace. This makes
it difficult to compare the differences between the userspac eand
kernel files are the userspace files don't have the getdents code in
them. Move all the kernel getdents code to a separate file to bring
the shared content between userspace and kernel files closer
together.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-08-12 16:39:56 -05:00
Dave Chinner 1fd7115eda xfs: introduce xfs_inode_buf.c for inode buffer operations
The only thing remaining in xfs_inode.[ch] are the operations that
read, write or verify physical inodes in their underlying buffers.
Move all this code to xfs_inode_buf.[ch] and so we can stop sharing
xfs_inode.[ch] with userspace.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-08-12 16:39:05 -05:00
Dave Chinner 7bb85ef360 xfs: move unrelated definitions out of xfs_inode.h
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-08-12 16:37:57 -05:00
Dave Chinner 5c4d97d01a xfs: move inode fork definitions to a new header file
The inode fork definitions are a combination of on-disk format
definition and in-memory tracking and manipulation. They are both
shared with userspace, so move them all into their own file so
sharing is easy to do and track.  This removes all inode fork
related information from xfs_inode.h.

Do the same for the all the C code that currently resides in
xfs_inode.c for the same reason.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-08-12 16:37:32 -05:00
Dave Chinner 7fd36c4418 xfs: split out transaction reservation code
The transaction reservation size calculations is used by both kernel
and userspace, but most of the transaction code in xfs_trans.c is
kernel specific. Split all the transaction reservation code out into
it's own files to make sharing with userspace simpler. This just
leaves kernel-only definitions in xfs_trans.h, so it doesn't need to
be shared with userspace anymore, either.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-08-12 16:36:16 -05:00
Dave Chinner d386b32b55 xfs: sync minor header differences needed by userspace.
Little things like exported functions, __KERNEL__ protections, and
so on that ensure user and kernel shared headers are identical.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-08-12 16:35:41 -05:00
Dave Chinner 76456fc2a6 xfs: introduce xfs_quota_defs.h
There are a lot of quota flag definitions that are shared by user
and kernel space. Move them all to xfs_quota_defs.h so we can
unshare xfs_quota.h and remove the __KERNEL__ regions from it.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-08-12 16:20:18 -05:00
Dave Chinner c7298202e5 xfs: introduce xfs_rtalloc_defs.h
There are quite a few realtime device definitions shared with
userspace. Move them from xfs_rtalloc.h to xfs_rt_alloc_defs.h
so we don't need to share xfs_rtalloc.h with userspace anymore.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-08-12 16:13:10 -05:00
Dave Chinner 2a3c0acc35 xfs: split out on-disk transaction definitions
There's a bunch of definitions in xfs_trans.h that define on-disk
formats - transaction headers that get written into the log, log
item type definitions, etc. Split out everything into a separate
file so that all which remains in xfs_trans.h are kernel only
definitions.

Also, remove the duplicate magic number definitions for
XFS_TRANS_MAGIC...

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-08-12 16:11:57 -05:00
Dave Chinner 9cd047f3a3 xfs: separate icreate log format definitions from xfs_icreate_item.h
The on disk log format definitions for the icreate log item are
intertwined with the kernel-only in-memory log item definitions.
Separate the log format definitions out into their own header file
so they can easily be shared with userspace.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-08-12 16:10:35 -05:00
Dave Chinner 6ca1c9063d xfs: separate dquot on disk format definitions out of xfs_quota.h
The on disk format definitions of the on-disk dquot, log formats and
quota off log formats are all intertwined with other definitions for
quotas. Separate them out into their own header file so they can
easily be shared with userspace.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-08-12 16:09:52 -05:00
Dave Chinner 9fbe24d95e xfs: split out EFI/EFD log item format definition
The EFI/EFD item format definitions are shared with userspace. Split
the out of header files that contain kernel only defintions to make
it simple to shared them.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-08-12 16:07:13 -05:00
Dave Chinner a8da0da25c xfs: split out buf log item format definitions
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-08-12 16:06:37 -05:00
Dave Chinner 69432832fd xfs: split out inode log item format definition
The log item format definitions are shared with userspace. Split
them out of header files that contain kernel only defintions to make
it simple to shared them.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-08-12 16:05:19 -05:00
Dave Chinner fc06c6d064 xfs: separate out log format definitions
The on-disk format definitions for the log are spread randoms
through a couple of header files. Consolidate it all in a single
file that can be shared easily with userspace. This means that
xfs_log.h and xfs_log_priv.h no longer need to be shared with
userspace.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-08-12 16:03:51 -05:00
Tejun Heo 7a378c9aea xfs: WQ_NON_REENTRANT is meaningless and going away
dbf2576e37 ("workqueue: make all workqueues non-reentrant") made
WQ_NON_REENTRANT no-op and the flag is going away.  Remove its usages.

This patch doesn't introduce any behavior changes.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Ben Myers <bpm@sgi.com>
Cc: Alex Elder <elder@kernel.org>
Cc: xfs@oss.sgi.com
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-07-30 13:11:17 -05:00
Dave Chinner e1b4271ac2 xfs: di_flushiter considered harmful
When we made all inode updates transactional, we no longer needed
the log recovery detection for inodes being newer on disk than the
transaction being replayed - it was redundant as replay of the log
would always result in the latest version of the inode would be on
disk. It was redundant, but left in place because it wasn't
considered to be a problem.

However, with the new "don't read inodes on create" optimisation,
flushiter has come back to bite us. Essentially, the optimisation
made always initialises flushiter to zero in the create transaction,
and so if we then crash and run recovery and the inode already on
disk has a non-zero flushiter it will skip recovery of that inode.
As a result, log recovery does the wrong thing and we end up with a
corrupt filesystem.

Because we have to support old kernel to new kernel upgrades, we
can't just get rid of the flushiter support in log recovery as we
might be upgrading from a kernel that doesn't have fully transactional
inode updates.  Unfortunately, for v4 superblocks there is no way to
guarantee that log recovery knows about this fact.

We cannot add a new inode format flag to say it's a "special inode
create" because it won't be understood by older kernels and so
recovery could do the wrong thing on downgrade. We cannot specially
detect the combination of zero mode/non-zero flushiter on disk to
non-zero mode, zero flushiter in the log item during recovery
because wrapping of the flushiter can result in false detection.

Hence that makes this "don't use flushiter" optimisation limited to
a disk format that guarantees that we don't need it. And that means
the only fix here is to limit the "no read IO on create"
optimisation to version 5 superblocks....

Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>

(cherry picked from commit e60896d8f2)
2013-07-25 10:41:42 -05:00
Dave Chinner e60896d8f2 xfs: di_flushiter considered harmful
When we made all inode updates transactional, we no longer needed
the log recovery detection for inodes being newer on disk than the
transaction being replayed - it was redundant as replay of the log
would always result in the latest version of the inode would be on
disk. It was redundant, but left in place because it wasn't
considered to be a problem.

However, with the new "don't read inodes on create" optimisation,
flushiter has come back to bite us. Essentially, the optimisation
made always initialises flushiter to zero in the create transaction,
and so if we then crash and run recovery and the inode already on
disk has a non-zero flushiter it will skip recovery of that inode.
As a result, log recovery does the wrong thing and we end up with a
corrupt filesystem.

Because we have to support old kernel to new kernel upgrades, we
can't just get rid of the flushiter support in log recovery as we
might be upgrading from a kernel that doesn't have fully transactional
inode updates.  Unfortunately, for v4 superblocks there is no way to
guarantee that log recovery knows about this fact.

We cannot add a new inode format flag to say it's a "special inode
create" because it won't be understood by older kernels and so
recovery could do the wrong thing on downgrade. We cannot specially
detect the combination of zero mode/non-zero flushiter on disk to
non-zero mode, zero flushiter in the log item during recovery
because wrapping of the flushiter can result in false detection.

Hence that makes this "don't use flushiter" optimisation limited to
a disk format that guarantees that we don't need it. And that means
the only fix here is to limit the "no read IO on create"
optimisation to version 5 superblocks....

Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-07-24 12:15:23 -05:00
Chandra Seetharaman d892d5864f xfs: Start using pquotaino from the superblock.
Start using pquotino and define a macro to check if the
superblock has pquotino.

Keep backward compatibilty by alowing mount of older superblock
with no separate pquota inode.

Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-07-22 14:46:26 -05:00
Chandra Seetharaman 0102629776 xfs: Initialize all quota inodes to be NULLFSINO
mkfs doesn't initialize the quota inodes to NULLFSINO as it does for the
other internal inodes. This leads to two in-core values (0 and NULLFSINO)
to be checked against, to make sure if a quota inode is valid.

Solve that problem by initializing the in-core values of all quotaino
values to NULLFSINO if they are 0 in the disk.

Note that these values are not written back to on-disk superblock unless
some quota is enabled on the filesystem. Even in that case sb_pquotino is
written to disk only if the on-disk superblock supports pquotino

Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-07-22 14:10:53 -05:00
Chandra Seetharaman 297aa63769 xfs: Fix a deadlock in xfs_log_commit_cil() code path
While testing and rearranging pquota/gquota code, I stumbled
on a xfs_shutdown() during a mount. But the mount just hung.

Debugged and found that there is a deadlock involving
&log->l_cilp->xc_ctx_lock.

It is in a code path where &log->l_cilp->xc_ctx_lock is first
acquired in read mode and some levels down the same semaphore
is being acquired in write mode causing a deadlock.

This is the stack:
xfs_log_commit_cil -> acquires &log->l_cilp->xc_ctx_lock in read mode
  xlog_print_tic_res
    xfs_force_shutdown
      xfs_log_force_umount
        xlog_cil_force
          xlog_cil_force_lsn
            xlog_cil_push_foreground
              xlog_cil_push - tries to acquire same semaphore in write mode

This patch fixes the deadlock by changing the reason code for
xfs_force_shutdown in xlog_print_tic_res() to SHUTDOWN_LOG_IO_ERROR.

SHUTDOWN_LOG_IO_ERROR is the right reason code to be set since
we are in the log path.

Thanks to Dave for suggesting this solution.

Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-07-22 13:58:10 -05:00
Jie Liu 58e59854a3 xfs: fix assertion failure in xfs_vm_write_failed()
In xfs_vm_write_failed(), we evaluate the block_offset of pos with
PAGE_MASK which is an unsigned long.  That is fine on 64-bit platforms
regardless of whether the request pos is 32-bit or 64-bit.  However, on
32-bit platforms the value is 0xfffff000 and so the high 32 bits in it
will be masked off with (pos & PAGE_MASK) for a 64-bit pos.

As a result, the evaluated block_offset is incorrect which will cause
this failure ASSERT(block_offset + from == pos); and potentially pass
the wrong block to xfs_vm_kill_delalloc_range().

In this case, we can get a kernel panic if CONFIG_XFS_DEBUG is enabled:

XFS: Assertion failed: block_offset + from == pos, file: fs/xfs/xfs_aops.c, line: 1504

------------[ cut here ]------------
 kernel BUG at fs/xfs/xfs_message.c:100!
 invalid opcode: 0000 [#1] SMP
 ........
 Pid: 4057, comm: mkfs.xfs Tainted: G           O 3.9.0-rc2 #1
 EIP: 0060:[<f94a7e8b>] EFLAGS: 00010282 CPU: 0
 EIP is at assfail+0x2b/0x30 [xfs]
 EAX: 00000056 EBX: f6ef28a0 ECX: 00000007 EDX: f57d22a4
 ESI: 1c2fb000 EDI: 00000000 EBP: ea6b5d30 ESP: ea6b5d1c
 DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
 CR0: 8005003b CR2: 094f3ff4 CR3: 2bcb4000 CR4: 000006f0
 DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
 DR6: ffff0ff0 DR7: 00000400
 Process mkfs.xfs (pid: 4057, ti=ea6b4000 task=ea5799e0 task.ti=ea6b4000)
 Stack:
 00000000 f9525c48 f951fa80 f951f96b 000005e4 ea6b5d7c f9494b34 c19b0ea2
 00000066 f3d6c620 c19b0ea2 00000000 e9a91458 00001000 00000000 00000000
 00000000 c15c7e89 00000000 1c2fb000 00000000 00000000 1c2fb000 00000080
 Call Trace:
 [<f9494b34>] xfs_vm_write_failed+0x74/0x1b0 [xfs]
 [<c15c7e89>] ? printk+0x4d/0x4f
 [<f9494d7d>] xfs_vm_write_begin+0x10d/0x170 [xfs]
 [<c110a34c>] generic_file_buffered_write+0xdc/0x210
 [<f949b669>] xfs_file_buffered_aio_write+0xf9/0x190 [xfs]
 [<f949b7f3>] xfs_file_aio_write+0xf3/0x160 [xfs]
 [<c115e504>] do_sync_write+0x94/0xd0
 [<c115ed1f>] vfs_write+0x8f/0x160
 [<c115e470>] ? wait_on_retry_sync_kiocb+0x50/0x50
 [<c115f017>] sys_write+0x47/0x80
 [<c15d860d>] sysenter_do_call+0x12/0x28
 .............
 EIP: [<f94a7e8b>] assfail+0x2b/0x30 [xfs] SS:ESP 0068:ea6b5d1c
 ---[ end trace cdd9af4f4ecab42f ]---
 Kernel panic - not syncing: Fatal exception

In order to avoid this, we can evaluate the block_offset of the start
of the page by using shifts rather than masks the mismatch problem.

Thanks Dave Chinner for help finding and fixing this bug.

Reported-by: Michael L. Semon <mlsemon35@gmail.com>
Reviewed-by: Dave Chinner <david@fromorbit.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Jie Liu <jeff.liu@oracle.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-07-22 13:12:19 -05:00
Linus Torvalds 239dab4636 xfs: update (#2) for 3.11-rc1
- fix for xfs_fsr returning -EINVAL
 - cleanup in xfs_bulkstat
 - cleanup in xfs_open_by_handle
 - update mount options documentation
 - clean up local format handling in xfs_bmapi_write
 - fix dquot log reservations which were too small
 - fix sgid inheritance for subdirectories when default acls are in use
 - add project quota fields to various structures
 - fix teardown of quotainfo structures when quotas are turned off
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.10 (GNU/Linux)
 
 iQIcBAABAgAGBQJR4FDjAAoJENaLyazVq6ZOhZIQAMLWC4Wz+3PpRLkIlXZ83wdG
 LYDNxdYntSVDXNLCrfIhuavgW1yhseLcQZD34g0hgOLQRsmjvUw2ikO6u7qlyoV1
 GZZVwdVhsZNicycvuEE0Sva9Jmjgxe0XUOCxJBNAWN0fm5Jzg7w0OGWyzxn2obRX
 L4LNPqnQS/phCrtNYfUnXpCuUcy0KbpX4GYrdt5tThIWHm7AcyRnOArEkFvVuwdf
 N3OSN/jSMaEF/GsAqmYFYSXhuL9P1vyBSlyFW82YbyFFd4FKZJbRiFbOsrgbFSkA
 Ssum7N+rfMc9DCbJsrztgxFaYpj42JR5eCm+jvTejx8nJWKiGjMVtzjq4QtwQQ6e
 vby7MzdjZ+l2oJclA0y8hOjeg0R7sPVP7xZziZRuK4PHsjtBH3N2FtCOeBtlGhyW
 14LK+z+5YXU/gEwmxV5LaknODb2mxvWycf70jaQ6bvrQRUiFnPNIxYKvgAx8YJxl
 jgYSassHHKtLg0S54P/T91tRsyDVOhy5lqgeogzK5uYa+v+xlMloG+fLzb9GmIgS
 hXgUIAo+lNlHZkw1FdD4aRgh3OMiUvLQN6woBMbbXfS5XaNpF1UG30YAXeeIqV5e
 cLChzY+jQiCsmcktb3YQs9C5yfxEciFFSYmZOaKCTgQRWWnyI4/lAu1gd2KtTYUt
 ZfV0niME4wp0kBWZHOEH
 =QiZy
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-v3.11-rc1-2' of git://oss.sgi.com/xfs/xfs

Pull more xfs updates from Ben Myers:
 "Here are a fix for xfs_fsr, a cleanup in bulkstat, a cleanup in
  xfs_open_by_handle, updated mount options documentation, a cleanup in
  xfs_bmapi_write, a fix for the size of dquot log reservations, a fix
  for sgid inheritance when acls are in use, a fix for cleaning up
  quotainfo structures, and some more of the work which allows group and
  project quotas to be used together.

  We had a few more in this last quota category that we might have liked
  to get in, but it looks there are still a few items that need to be
  addressed.

   - fix for xfs_fsr returning -EINVAL
   - cleanup in xfs_bulkstat
   - cleanup in xfs_open_by_handle
   - update mount options documentation
   - clean up local format handling in xfs_bmapi_write
   - fix dquot log reservations which were too small
   - fix sgid inheritance for subdirectories when default acls are in use
   - add project quota fields to various structures
   - fix teardown of quotainfo structures when quotas are turned off"

* tag 'for-linus-v3.11-rc1-2' of git://oss.sgi.com/xfs/xfs:
  xfs: Fix the logic check for all quotas being turned off
  xfs: Add pquota fields where gquota is used.
  xfs: fix sgid inheritance for subdirectories inheriting default acls [V3]
  xfs: dquot log reservations are too small
  xfs: remove local fork format handling from xfs_bmapi_write()
  xfs: update mount options documentation
  xfs: use get_unused_fd_flags(0) instead of get_unused_fd()
  xfs: clean up unused codes at xfs_bulkstat()
  xfs: use XFS_BMAP_BMDR_SPACE vs. XFS_BROOT_SIZE_ADJ
2013-07-13 11:40:24 -07:00
Chandra Seetharaman c31ad439e8 xfs: Fix the logic check for all quotas being turned off
During the review of seperate pquota inode patches, David noticed
that the test to detect all quotas being turned off was
incorrect, and hence the block was not freeing all the quota
information.

The check made sense in Irix, but in Linux, quota is turned off
one at a time, which makes the test invalid for Linux.

This problem existed since XFS was ported to Linux.

David suggested to fix the problem by detecting when all quotas are
turned off by checking m_qflags.

Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-07-11 16:49:10 -05:00
Chandra Seetharaman 92f8ff73f1 xfs: Add pquota fields where gquota is used.
Add project quota changes to all the places where group quota field
is used:
   * add separate project quota members into various structures
   * split project quota and group quotas so that instead of overriding
     the group quota members incore, the new project quota members are
     used instead
   * get rid of usage of the OQUOTA flag incore, in favor of separate
     group and project quota flags.
   * add a project dquot argument to various functions.

Not using the pquotino field from superblock yet.

Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-07-11 10:35:32 -05:00
Carlos Maiolino 42c49d7f24 xfs: fix sgid inheritance for subdirectories inheriting default acls [V3]
XFS removes sgid bits of subdirectories under a directory containing a default
acl.

When a default acl is set, it implies xfs to call xfs_setattr_nonsize() in its
code path. Such function is shared among mkdir and chmod system calls, and
does some checks unneeded by mkdir (calling inode_change_ok()). Such checks
remove sgid bit from the inode after it has been granted.

With this patch, we extend the meaning of XFS_ATTR_NOACL flag to avoid these
checks when acls are being inherited (thanks hch).

Also, xfs_setattr_mode, doesn't need to re-check for group id and capabilities
permissions, this only implies in another try to remove sgid bit from the
directories. Such check is already done either on inode_change_ok() or
xfs_setattr_nonsize().

Changelog:

V2: Extends the meaning of XFS_ATTR_NOACL instead of wrap the tests into another
    function

V3: Remove S_ISDIR check in xfs_setattr_nonsize() from the patch

Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-07-10 10:21:51 -05:00
Dave Chinner b0a9dab78a xfs: dquot log reservations are too small
During review of the separate project quota inode patches, it became
obvious that the dquot log reservation calculation underestimated
the number dquots that can be modified in a transaction. This has
it's roots way back in the Irix quota implementation.

That is, when quotas were first implemented in XFS, it only
supported user and project quotas as Irix did not have group quotas.
Hence the worst case operation involving dquot modification was
calculated to involve 2 user dquots and 1 project dquot or 1 user
dequot and 2 project dquots. i.e. 3 dquots. This was determined back
in 1996, and has remained unchanged ever since.

However, back in 2001, the Linux XFS port dropped all support for
project quota and implmented group quotas over the top. This was
effectively done with a search-and-replace of project with group,
and as such the log reservation was not changed. However, with the
advent of group quotas, chmod and rename now could modify more than
3 dquots in a single transaction - both could modify 4 dquots. Hence
this log reservation has been wrong for a long time.

In 2005, project quota support was reintroduced into Linux, but it
was implemented to be mutually exclusive to group quotas and so this
didn't add any new changes to the dquot log reservation. Hence when
project quotas were in use (rather than group quotas) the log
reservation was again valid, just like in the Irix days.

Now, with the addition of the separate project quota inode, group
and project quotas are no longer mutually exclusive, and hence
operations can now modify three dquots per inode where previously it
was only two. The worst case here is the rename transaction, which
can allocate/free space on two different directory inodes, and if
they have different uid/gid/prid configurations and are world
writeable, then rename can actually modify 6 different dquots now.

Further, the dquot log reservation doesn't take into account the
space used by the dquot log format structure that precedes the dquot
that is logged, and hence further underestimates the worst case
log space required by dquots during a transaction. This has been
missing since the first commit in 1996.

Hence the worst case log reservation needs to be increased from 3 to
6, and it needs to take into account a log format header for each of
those dquots.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-07-09 16:43:16 -05:00
Dave Chinner f3508bcddf xfs: remove local fork format handling from xfs_bmapi_write()
The conversion from local format to extent format requires
interpretation of the data in the fork being converted, so it cannot
be done in a generic way. It is up to the caller to convert the fork
format to extent format before calling into xfs_bmapi_write() so
format conversion can be done correctly.

The code in xfs_bmapi_write() to convert the format is used
implicitly by the attribute and directory code, but they
specifically zero the fork size so that the conversion does not do
any allocation or manipulation. Move this conversion into the
shortform to leaf functions for the dir/attr code so the conversions
are explicitly controlled by all callers.

Now we can remove the conversion code in xfs_bmapi_write.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-07-09 16:40:22 -05:00
Yann Droneaud 862a62937e xfs: use get_unused_fd_flags(0) instead of get_unused_fd()
Macro get_unused_fd() is used to allocate a file descriptor with
default flags. Those default flags (0) can be "unsafe":
O_CLOEXEC must be used by default to not leak file descriptor
across exec().

Instead of macro get_unused_fd(), functions anon_inode_getfd()
or get_unused_fd_flags() should be used with flags given by userspace.
If not possible, flags should be set to O_CLOEXEC to provide userspace
with a default safe behavor.

In a further patch, get_unused_fd() will be removed so that
new code start using anon_inode_getfd() or get_unused_fd_flags()
with correct flags.

This patch replaces calls to get_unused_fd() with equivalent call to
get_unused_fd_flags(0) to preserve current behavor for existing code.

The hard coded flag value (0) should be reviewed on a per-subsystem basis,
and, if possible, set to O_CLOEXEC.

Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-07-09 15:53:57 -05:00
Jie Liu 9cee4c5b7b xfs: clean up unused codes at xfs_bulkstat()
There are some unused codes at xfs_bulkstat():

- Variable bp is defined to point to the on-disk inode cluster
  buffer, but it proved to be of no practical help.

- We process the chunks of good inodes which were fetched by iterating
  btree records from an AG.  When processing inodes from each chunk,
  the code recomputing agbno if run into the first inode of a cluster,
  however, the agbno is not being used thereafter.

This patch tries to clean up those things.

Signed-off-by: Jie Liu <jeff.liu@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-07-09 15:36:21 -05:00
Linus Torvalds da89bd213f xfs: update for 3.11-rc1
- part of the work to allow project quotas and group quotas to
   be used together
 - inode change count
 - inode create transaction
 - block queue plugging in buffer readahead and bulkstat
 - ordered log vector support
 - removal of dead code in and around xfs_sync_inode_grab,
   xfs_ialloc_get_rec, XFS_MOUNT_RETERR, XFS_ALLOCFREE_LOG_RES,
   XFS_DIROP_LOG_RES, xfs_chash, ctl_table, and xfs_growfs_data_private
 - don't keep silent if sunit/swidth can not be changed via mount
 - fix a leak of remote symlink blocks into the filesystem when
   xattrs are used on symlinks
 - fix for fiemap to return FIEMAP_EXTENT_UNKOWN flag on delay extents
 - part of a fix for xfs_fsr
 - disable speculative preallocation with small files
 - performance improvements for inode creates and deletes
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.10 (GNU/Linux)
 
 iQIcBAABAgAGBQJR3F9pAAoJENaLyazVq6ZOiKAP/jyfPbVj5AOiLVtTLhJUQ3Qf
 urCiMjl87BToixFxa/yxeOBrUbBiOgUQ/2om4b5cryYhN2RtQWiEi/iVdeBZw3rR
 1J6VxH09R25GVRobIj2AwJ87eXyqKvi2DaVBrQbgva0BH8fmWKhQISfwLTPUIXcK
 GTrVpS1amTWxh69/5p5bxaxo4u2y7DbZwYC4xQOcPeSrfxizMQmN2JqtUjtLDyKp
 J08Md04vNcxJvfbFopcSFAncStIr6xnKMiaqvFttybLJf9HL9MqoaO9Xp+6aX1QI
 Pibae3oFctH1tGOmjgyg30AbPjzAHaGDSw9vuRaommQZbMwiZXAV2VGuJ5QWtAbi
 wXh5GIzaap1Z0EYuD9qY0hsFizOpmL+1YH+F20qIqa3i5CiDMeYMWlmzhfN63zH9
 Zk2j1YUkbxLmKMEgl8s++eLfT1yenuzVAp5zUodApOPp161FOMJlVhhD55WDvKtP
 2/3Ig25fz5CLwcIZT1MoZ9B7UP9dfrAAK30AcUmO3Iumj6b8bsxh4pzSpJ02U1cG
 cMIe+OhxC6KqDZlXeLmDXodnOo9Wc4glwdivcynxCNFv6WS5Ez/ufm+4oN+OVUo7
 PwyNtoJfsptSYZyIRorXNez54ww3cOvvqjifUasTZTGkTSqCbf1LclzuMxcXDlyO
 3YzE/DCK8ZWwJc/ysbn/
 =v6KW
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-v3.11-rc1' of git://oss.sgi.com/xfs/xfs

Pull xfs update from Ben Myers:
 "This includes several bugfixes, part of the work for project quotas
  and group quotas to be used together, performance improvements for
  inode creation/deletion, buffer readahead, and bulkstat,
  implementation of the inode change count, an inode create transaction,
  and the removal of a bunch of dead code.

  There are also some duplicate commits that you already have from the
  3.10-rc series.

   - part of the work to allow project quotas and group quotas to be
     used together
   - inode change count
   - inode create transaction
   - block queue plugging in buffer readahead and bulkstat
   - ordered log vector support
   - removal of dead code in and around xfs_sync_inode_grab,
     xfs_ialloc_get_rec, XFS_MOUNT_RETERR, XFS_ALLOCFREE_LOG_RES,
     XFS_DIROP_LOG_RES, xfs_chash, ctl_table, and
     xfs_growfs_data_private
   - don't keep silent if sunit/swidth can not be changed via mount
   - fix a leak of remote symlink blocks into the filesystem when xattrs
     are used on symlinks
   - fix for fiemap to return FIEMAP_EXTENT_UNKOWN flag on delay extents
   - part of a fix for xfs_fsr
   - disable speculative preallocation with small files
   - performance improvements for inode creates and deletes"

* tag 'for-linus-v3.11-rc1' of git://oss.sgi.com/xfs/xfs: (61 commits)
  xfs: Remove incore use of XFS_OQUOTA_ENFD and XFS_OQUOTA_CHKD
  xfs: Change xfs_dquot_acct to be a 2-dimensional array
  xfs: Code cleanup and removal of some typedef usage
  xfs: Replace macro XFS_DQ_TO_QIP with a function
  xfs: Replace macro XFS_DQUOT_TREE with a function
  xfs: Define a new function xfs_is_quota_inode()
  xfs: implement inode change count
  xfs: Use inode create transaction
  xfs: Inode create item recovery
  xfs: Inode create transaction reservations
  xfs: Inode create log items
  xfs: Introduce an ordered buffer item
  xfs: Introduce ordered log vector support
  xfs: xfs_ifree doesn't need to modify the inode buffer
  xfs: don't do IO when creating an new inode
  xfs: don't use speculative prealloc for small files
  xfs: plug directory buffer readahead
  xfs: add pluging for bulkstat readahead
  xfs: Remove dead function prototype xfs_sync_inode_grab()
  xfs: Remove the left function variable from xfs_ialloc_get_rec()
  ...
2013-07-09 12:29:12 -07:00
Eric Sandeen a69c7c0772 xfs: use XFS_BMAP_BMDR_SPACE vs. XFS_BROOT_SIZE_ADJ
XFS_BROOT_SIZE_ADJ is an undocumented macro which accounts for
the difference in size between the on-disk and in-core btree
root.  It's much clearer to just use the newly-added
XFS_BMAP_BMDR_SPACE macro which gives us the on-disk size
directly.

In one case, we must test that the if_broot exists before
applying the macro, however.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-07-09 13:21:15 -05:00
Linus Torvalds 790eac5640 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull second set of VFS changes from Al Viro:
 "Assorted f_pos race fixes, making do_splice_direct() safe to call with
  i_mutex on parent, O_TMPFILE support, Jeff's locks.c series,
  ->d_hash/->d_compare calling conventions changes from Linus, misc
  stuff all over the place."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (63 commits)
  Document ->tmpfile()
  ext4: ->tmpfile() support
  vfs: export lseek_execute() to modules
  lseek_execute() doesn't need an inode passed to it
  block_dev: switch to fixed_size_llseek()
  cpqphp_sysfs: switch to fixed_size_llseek()
  tile-srom: switch to fixed_size_llseek()
  proc_powerpc: switch to fixed_size_llseek()
  ubi/cdev: switch to fixed_size_llseek()
  pci/proc: switch to fixed_size_llseek()
  isapnp: switch to fixed_size_llseek()
  lpfc: switch to fixed_size_llseek()
  locks: give the blocked_hash its own spinlock
  locks: add a new "lm_owner_key" lock operation
  locks: turn the blocked_list into a hashtable
  locks: convert fl_link to a hlist_node
  locks: avoid taking global lock if possible when waking up blocked waiters
  locks: protect most of the file_lock handling with i_lock
  locks: encapsulate the fl_link list handling
  locks: make "added" in __posix_lock_file a bool
  ...
2013-07-03 09:10:19 -07:00