Commit Graph

870 Commits

Author SHA1 Message Date
Milosz Tanski 5a6f282a20 fscache: Netfs function for cleanup post readpages
Currently the fscache code expect the netfs to call fscache_readpages_or_alloc
inside the aops readpages callback.  It marks all the pages in the list
provided by readahead with PG_private_2.  In the cases that the netfs fails to
read all the pages (which is legal) it ends up returning to the readahead and
triggering a BUG.  This happens because the page list still contains marked
pages.

This patch implements a simple fscache_readpages_cancel function that the netfs
should call before returning from readpages.  It will revoke the pages from the
underlying cache backend and unmark them.

The problem was originally worked out in the Ceph devel tree, but it also
occurs in CIFS.  It appears that NFS, AFS and 9P are okay as read_cache_pages()
will clean up the unprocessed pages in the case of an error.

This can be used to address the following oops:

[12410647.597278] BUG: Bad page state in process petabucket  pfn:3d504e
[12410647.597292] page:ffffea000f541380 count:0 mapcount:0 mapping:
	(null) index:0x0
[12410647.597298] page flags: 0x200000000001000(private_2)

...

[12410647.597334] Call Trace:
[12410647.597345]  [<ffffffff815523f2>] dump_stack+0x19/0x1b
[12410647.597356]  [<ffffffff8111def7>] bad_page+0xc7/0x120
[12410647.597359]  [<ffffffff8111e49e>] free_pages_prepare+0x10e/0x120
[12410647.597361]  [<ffffffff8111fc80>] free_hot_cold_page+0x40/0x170
[12410647.597363]  [<ffffffff81123507>] __put_single_page+0x27/0x30
[12410647.597365]  [<ffffffff81123df5>] put_page+0x25/0x40
[12410647.597376]  [<ffffffffa02bdcf9>] ceph_readpages+0x2e9/0x6e0 [ceph]
[12410647.597379]  [<ffffffff81122a8f>] __do_page_cache_readahead+0x1af/0x260
[12410647.597382]  [<ffffffff81122ea1>] ra_submit+0x21/0x30
[12410647.597384]  [<ffffffff81118f64>] filemap_fault+0x254/0x490
[12410647.597387]  [<ffffffff8113a74f>] __do_fault+0x6f/0x4e0
[12410647.597391]  [<ffffffff810125bd>] ? __switch_to+0x16d/0x4a0
[12410647.597395]  [<ffffffff810865ba>] ? finish_task_switch+0x5a/0xc0
[12410647.597398]  [<ffffffff8113d856>] handle_pte_fault+0xf6/0x930
[12410647.597401]  [<ffffffff81008c33>] ? pte_mfn_to_pfn+0x93/0x110
[12410647.597403]  [<ffffffff81008cce>] ? xen_pmd_val+0xe/0x10
[12410647.597405]  [<ffffffff81005469>] ? __raw_callee_save_xen_pmd_val+0x11/0x1e
[12410647.597407]  [<ffffffff8113f361>] handle_mm_fault+0x251/0x370
[12410647.597411]  [<ffffffff812b0ac4>] ? call_rwsem_down_read_failed+0x14/0x30
[12410647.597414]  [<ffffffff8155bffa>] __do_page_fault+0x1aa/0x550
[12410647.597418]  [<ffffffff8108011d>] ? up_write+0x1d/0x20
[12410647.597422]  [<ffffffff8113141c>] ? vm_mmap_pgoff+0xbc/0xe0
[12410647.597425]  [<ffffffff81143bb8>] ? SyS_mmap_pgoff+0xd8/0x240
[12410647.597427]  [<ffffffff8155c3ae>] do_page_fault+0xe/0x10
[12410647.597431]  [<ffffffff81558818>] page_fault+0x28/0x30

Signed-off-by: Milosz Tanski <milosz@adfin.com>
Signed-off-by: David Howells <dhowells@redhat.com>
2013-09-06 09:17:30 +01:00
David Howells 696f69b6b0 FS-Cache: Fix heading in documentation
Fix a heading in the documentation to make it consistent with the contents
list.

Signed-off-by: David Howells <dhowells@redhat.com>
2013-09-06 09:17:30 +01:00
David Howells da9803bc88 FS-Cache: Add interface to check consistency of a cached object
Extend the fscache netfs API so that the netfs can ask as to whether a cache
object is up to date with respect to its corresponding netfs object:

	int fscache_check_consistency(struct fscache_cookie *cookie)

This will call back to the netfs to check whether the auxiliary data associated
with a cookie is correct.  It returns 0 if it is and -ESTALE if it isn't; it
may also return -ENOMEM and -ERESTARTSYS.

The backends now have to implement a mandatory operation pointer:

	int (*check_consistency)(struct fscache_object *object)

that corresponds to the above API call.  FS-Cache takes care of pinning the
object and the cookie in memory and managing this call with respect to the
object state.

Original-author: Hongyi Jia <jiayisuse@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Hongyi Jia <jiayisuse@gmail.com>
cc: Milosz Tanski <milosz@adfin.com>
2013-09-06 09:17:30 +01:00
Linus Torvalds 239dab4636 xfs: update (#2) for 3.11-rc1
- fix for xfs_fsr returning -EINVAL
 - cleanup in xfs_bulkstat
 - cleanup in xfs_open_by_handle
 - update mount options documentation
 - clean up local format handling in xfs_bmapi_write
 - fix dquot log reservations which were too small
 - fix sgid inheritance for subdirectories when default acls are in use
 - add project quota fields to various structures
 - fix teardown of quotainfo structures when quotas are turned off
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.10 (GNU/Linux)
 
 iQIcBAABAgAGBQJR4FDjAAoJENaLyazVq6ZOhZIQAMLWC4Wz+3PpRLkIlXZ83wdG
 LYDNxdYntSVDXNLCrfIhuavgW1yhseLcQZD34g0hgOLQRsmjvUw2ikO6u7qlyoV1
 GZZVwdVhsZNicycvuEE0Sva9Jmjgxe0XUOCxJBNAWN0fm5Jzg7w0OGWyzxn2obRX
 L4LNPqnQS/phCrtNYfUnXpCuUcy0KbpX4GYrdt5tThIWHm7AcyRnOArEkFvVuwdf
 N3OSN/jSMaEF/GsAqmYFYSXhuL9P1vyBSlyFW82YbyFFd4FKZJbRiFbOsrgbFSkA
 Ssum7N+rfMc9DCbJsrztgxFaYpj42JR5eCm+jvTejx8nJWKiGjMVtzjq4QtwQQ6e
 vby7MzdjZ+l2oJclA0y8hOjeg0R7sPVP7xZziZRuK4PHsjtBH3N2FtCOeBtlGhyW
 14LK+z+5YXU/gEwmxV5LaknODb2mxvWycf70jaQ6bvrQRUiFnPNIxYKvgAx8YJxl
 jgYSassHHKtLg0S54P/T91tRsyDVOhy5lqgeogzK5uYa+v+xlMloG+fLzb9GmIgS
 hXgUIAo+lNlHZkw1FdD4aRgh3OMiUvLQN6woBMbbXfS5XaNpF1UG30YAXeeIqV5e
 cLChzY+jQiCsmcktb3YQs9C5yfxEciFFSYmZOaKCTgQRWWnyI4/lAu1gd2KtTYUt
 ZfV0niME4wp0kBWZHOEH
 =QiZy
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-v3.11-rc1-2' of git://oss.sgi.com/xfs/xfs

Pull more xfs updates from Ben Myers:
 "Here are a fix for xfs_fsr, a cleanup in bulkstat, a cleanup in
  xfs_open_by_handle, updated mount options documentation, a cleanup in
  xfs_bmapi_write, a fix for the size of dquot log reservations, a fix
  for sgid inheritance when acls are in use, a fix for cleaning up
  quotainfo structures, and some more of the work which allows group and
  project quotas to be used together.

  We had a few more in this last quota category that we might have liked
  to get in, but it looks there are still a few items that need to be
  addressed.

   - fix for xfs_fsr returning -EINVAL
   - cleanup in xfs_bulkstat
   - cleanup in xfs_open_by_handle
   - update mount options documentation
   - clean up local format handling in xfs_bmapi_write
   - fix dquot log reservations which were too small
   - fix sgid inheritance for subdirectories when default acls are in use
   - add project quota fields to various structures
   - fix teardown of quotainfo structures when quotas are turned off"

* tag 'for-linus-v3.11-rc1-2' of git://oss.sgi.com/xfs/xfs:
  xfs: Fix the logic check for all quotas being turned off
  xfs: Add pquota fields where gquota is used.
  xfs: fix sgid inheritance for subdirectories inheriting default acls [V3]
  xfs: dquot log reservations are too small
  xfs: remove local fork format handling from xfs_bmapi_write()
  xfs: update mount options documentation
  xfs: use get_unused_fd_flags(0) instead of get_unused_fd()
  xfs: clean up unused codes at xfs_bulkstat()
  xfs: use XFS_BMAP_BMDR_SPACE vs. XFS_BROOT_SIZE_ADJ
2013-07-13 11:40:24 -07:00
Dave Chinner 3e5b7d8b49 xfs: update mount options documentation
Because it's horribly out of date.

And mark various deprecated options as deprecated and give them a
removal date.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-07-09 16:26:18 -05:00
Linus Torvalds 80cc38b163 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
Pull trivial tree updates from Jiri Kosina:
 "The usual stuff from trivial tree"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (34 commits)
  treewide: relase -> release
  Documentation/cgroups/memory.txt: fix stat file documentation
  sysctl/net.txt: delete reference to obsolete 2.4.x kernel
  spinlock_api_smp.h: fix preprocessor comments
  treewide: Fix typo in printk
  doc: device tree: clarify stuff in usage-model.txt.
  open firmware: "/aliasas" -> "/aliases"
  md: bcache: Fixed a typo with the word 'arithmetic'
  irq/generic-chip: fix a few kernel-doc entries
  frv: Convert use of typedef ctl_table to struct ctl_table
  sgi: xpc: Convert use of typedef ctl_table to struct ctl_table
  doc: clk: Fix incorrect wording
  Documentation/arm/IXP4xx fix a typo
  Documentation/networking/ieee802154 fix a typo
  Documentation/DocBook/media/v4l fix a typo
  Documentation/video4linux/si476x.txt fix a typo
  Documentation/virtual/kvm/api.txt fix a typo
  Documentation/early-userspace/README fix a typo
  Documentation/video4linux/soc-camera.txt fix a typo
  lguest: fix CONFIG_PAE -> CONFIG_x86_PAE in comment
  ...
2013-07-04 11:40:58 -07:00
Mel Gorman 543cc11533 documentation: document the is_dirty_writeback aops callback
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-07-03 16:07:40 -07:00
Mel Gorman 26c0c5bf38 documentation: update address_space_operations
The documentation for address_space_operations is partially out of date.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-07-03 16:07:40 -07:00
Pavel Emelyanov 0f8975ec4d mm: soft-dirty bits for user memory changes tracking
The soft-dirty is a bit on a PTE which helps to track which pages a task
writes to.  In order to do this tracking one should

  1. Clear soft-dirty bits from PTEs ("echo 4 > /proc/PID/clear_refs)
  2. Wait some time.
  3. Read soft-dirty bits (55'th in /proc/PID/pagemap2 entries)

To do this tracking, the writable bit is cleared from PTEs when the
soft-dirty bit is.  Thus, after this, when the task tries to modify a
page at some virtual address the #PF occurs and the kernel sets the
soft-dirty bit on the respective PTE.

Note, that although all the task's address space is marked as r/o after
the soft-dirty bits clear, the #PF-s that occur after that are processed
fast.  This is so, since the pages are still mapped to physical memory,
and thus all the kernel does is finds this fact out and puts back
writable, dirty and soft-dirty bits on the PTE.

Another thing to note, is that when mremap moves PTEs they are marked
with soft-dirty as well, since from the user perspective mremap modifies
the virtual memory at mremap's new address.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Cc: Matt Mackall <mpm@selenic.com>
Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Cc: Glauber Costa <glommer@parallels.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-07-03 16:07:26 -07:00
Linus Torvalds 790eac5640 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull second set of VFS changes from Al Viro:
 "Assorted f_pos race fixes, making do_splice_direct() safe to call with
  i_mutex on parent, O_TMPFILE support, Jeff's locks.c series,
  ->d_hash/->d_compare calling conventions changes from Linus, misc
  stuff all over the place."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (63 commits)
  Document ->tmpfile()
  ext4: ->tmpfile() support
  vfs: export lseek_execute() to modules
  lseek_execute() doesn't need an inode passed to it
  block_dev: switch to fixed_size_llseek()
  cpqphp_sysfs: switch to fixed_size_llseek()
  tile-srom: switch to fixed_size_llseek()
  proc_powerpc: switch to fixed_size_llseek()
  ubi/cdev: switch to fixed_size_llseek()
  pci/proc: switch to fixed_size_llseek()
  isapnp: switch to fixed_size_llseek()
  lpfc: switch to fixed_size_llseek()
  locks: give the blocked_hash its own spinlock
  locks: add a new "lm_owner_key" lock operation
  locks: turn the blocked_list into a hashtable
  locks: convert fl_link to a hlist_node
  locks: avoid taking global lock if possible when waking up blocked waiters
  locks: protect most of the file_lock handling with i_lock
  locks: encapsulate the fl_link list handling
  locks: make "added" in __posix_lock_file a bool
  ...
2013-07-03 09:10:19 -07:00
Al Viro 48bde8d362 Document ->tmpfile()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-07-03 16:23:29 +04:00
Linus Torvalds 3f490f7f99 This patch-set includes the following major enhancement patches.
o remount_fs callback function
  o restore parent inode number to enhance the fsync performance
  o xattr security labels
  o reduce the number of redundant lock/unlock data pages
  o avoid frequent write_inode calls
 
 The other minor bug fixes are as follows.
  o endian conversion bugs
  o various bugs in the roll-forward recovery routine
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJR0h8YAAoJEEAUqH6CSFDSJNMP/1V+e/PaLa8YsHuw5eWPT3Fe
 RYFXJv7SXadHqgQInjwD7JOF8BC9DlUyknYUiXFUZgvKsgMHz+OTCNl+1hLTbUXt
 e6Rhn7vU/fMu3TEtj+v6g8JbpXdXH8TSbtvh9LpoczRS98GYpZuckP0BtQsVkTL3
 jIq6WD8JRkb2DvpDl7viTT0Eq0T61CSmtOwOIvfhhiVxggvDWcnR1mTM9Tymdi4J
 NKtFkjCsKP0Z/7IZZUJAczGEHjsT+O6JDwp8+KVWuZT4BSuchoX4MYAY5wX527Ne
 rZvkolbbfnAFCC3BtETr0DPOkpxnHmJ6dEveIxjZ9cux12CAFA/Ww1QAL4ygiDkd
 Avn8BBEJnfnuzeOUkE0by+9hdF9LOU3CVSxiDhWJegYB16z+c9pSBD9xvlKhKk5B
 QNsjptB6m0CftAq7vIDsryL60uJ2cSHFxPqfwAHEpNngiU/NohTFSZE0sUMbLUNh
 FI6NrHoT7yld1HcB6cvL1lnUqIENFbNhDSSDcTdlN49IJJap4oqtgCnmMMIwbUCO
 vR2/26k5W7NwG+K6XN2IX13AsayzQahxTv8in/+LV0bfjHo6/1VzzGcqAmXJbDQw
 dLrNAeWaaIJi8J/zJiENMbFKXTj8Wax9jxKsW+4towQuyEt4ADvyt1c5gX3VR42T
 x8+YEargsdBf6FAhtN+H
 =qFcy
 -----END PGP SIGNATURE-----

Merge tag 'for-f2fs-3.11' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs

Pull f2fs updates from Jaegeuk Kim:
 "This patch-set includes the following major enhancement patches:
   - remount_fs callback function
   - restore parent inode number to enhance the fsync performance
   - xattr security labels
   - reduce the number of redundant lock/unlock data pages
   - avoid frequent write_inode calls

  The other minor bug fixes are as follows.
   - endian conversion bugs
   - various bugs in the roll-forward recovery routine"

* tag 'for-f2fs-3.11' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (56 commits)
  f2fs: fix to recover i_size from roll-forward
  f2fs: remove the unused argument "sbi" of func destroy_fsync_dnodes()
  f2fs: remove reusing any prefree segments
  f2fs: code cleanup and simplify in func {find/add}_gc_inode
  f2fs: optimize the init_dirty_segmap function
  f2fs: fix an endian conversion bug detected by sparse
  f2fs: fix crc endian conversion
  f2fs: add remount_fs callback support
  f2fs: recover wrong pino after checkpoint during fsync
  f2fs: optimize do_write_data_page()
  f2fs: make locate_dirty_segment() as static
  f2fs: remove unnecessary parameter "offset" from __add_sum_entry()
  f2fs: avoid freqeunt write_inode calls
  f2fs: optimise the truncate_data_blocks_range() range
  f2fs: use the F2FS specific flags in f2fs_ioctl()
  f2fs: sync dir->i_size with its block allocation
  f2fs: fix i_blocks translation on various types of files
  f2fs: set sb->s_fs_info before calling parse_options()
  f2fs: support xattr security labels
  f2fs: fix iget/iput of dir during recovery
  ...
2013-07-02 09:42:38 -07:00
Linus Torvalds 9e239bb939 Lots of bug fixes, cleanups and optimizations. In the bug fixes
category, of note is a fix for on-line resizing file systems where the
 block size is smaller than the page size (i.e., file systems 1k blocks
 on x86, or more interestingly file systems with 4k blocks on Power or
 ia64 systems.)
 
 In the cleanup category, the ext4's punch hole implementation was
 significantly improved by Lukas Czerner, and now supports bigalloc
 file systems.  In addition, Jan Kara significantly cleaned up the
 write submission code path.  We also improved error checking and added
 a few sanity checks.
 
 In the optimizations category, two major optimizations deserve
 mention.  The first is that ext4_writepages() is now used for
 nodelalloc and ext3 compatibility mode.  This allows writes to be
 submitted much more efficiently as a single bio request, instead of
 being sent as individual 4k writes into the block layer (which then
 relied on the elevator code to coalesce the requests in the block
 queue).  Secondly, the extent cache shrink mechanism, which was
 introduce in 3.9, no longer has a scalability bottleneck caused by the
 i_es_lru spinlock.  Other optimizations include some changes to reduce
 CPU usage and to avoid issuing empty commits unnecessarily.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABCAAGBQJR0XhgAAoJENNvdpvBGATwMXkQAJwTPk5XYLqtAwLziFLvM6wG
 0tWa1QAzTNo80tLyM9iGqI6x74X5nddLw5NMICUmPooOa9agMuA4tlYVSss5jWzV
 yyB7vLzsc/2eZJusuVqfTKrdGybE+M766OI6VO9WodOoIF1l51JXKjktKeaWegfv
 NkcLKlakD4V+ZASEDB/cOcR/lTwAs9dQ89AZzgPiW+G8Do922QbqkENJB8mhalbg
 rFGX+lu9W0f3fqdmT3Xi8KGn3EglETdVd6jU7kOZN4vb5LcF5BKHQnnUmMlpeWMT
 ksOVasb3RZgcsyf5ZOV5feXV601EsNtPBrHAmH22pWQy3rdTIvMv/il63XlVUXZ2
 AXT3cHEvNQP0/yVaOTCZ9xQVxT8sL4mI6kENP9PtNuntx7E90JBshiP5m24kzTZ/
 zkIeDa+FPhsDx1D5EKErinFLqPV8cPWONbIt/qAgo6663zeeIyMVhzxO4resTS9k
 U2QEztQH+hDDbjgABtz9M/GjSrohkTYNSkKXzhTjqr/m5huBrVMngjy/F4/7G7RD
 vSEx5aXqyagnrUcjsupx+biJ1QvbvZWOVxAE/6hNQNRGDt9gQtHAmKw1eG2mugHX
 +TFDxodNE4iWEURenkUxXW3mDx7hFbGZR0poHG3M/LVhKMAAAw0zoKrrUG5c70G7
 XrddRLGlk4Hf+2o7/D7B
 =SwaI
 -----END PGP SIGNATURE-----

Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 update from Ted Ts'o:
 "Lots of bug fixes, cleanups and optimizations.  In the bug fixes
  category, of note is a fix for on-line resizing file systems where the
  block size is smaller than the page size (i.e., file systems 1k blocks
  on x86, or more interestingly file systems with 4k blocks on Power or
  ia64 systems.)

  In the cleanup category, the ext4's punch hole implementation was
  significantly improved by Lukas Czerner, and now supports bigalloc
  file systems.  In addition, Jan Kara significantly cleaned up the
  write submission code path.  We also improved error checking and added
  a few sanity checks.

  In the optimizations category, two major optimizations deserve
  mention.  The first is that ext4_writepages() is now used for
  nodelalloc and ext3 compatibility mode.  This allows writes to be
  submitted much more efficiently as a single bio request, instead of
  being sent as individual 4k writes into the block layer (which then
  relied on the elevator code to coalesce the requests in the block
  queue).  Secondly, the extent cache shrink mechanism, which was
  introduce in 3.9, no longer has a scalability bottleneck caused by the
  i_es_lru spinlock.  Other optimizations include some changes to reduce
  CPU usage and to avoid issuing empty commits unnecessarily."

* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (86 commits)
  ext4: optimize starting extent in ext4_ext_rm_leaf()
  jbd2: invalidate handle if jbd2_journal_restart() fails
  ext4: translate flag bits to strings in tracepoints
  ext4: fix up error handling for mpage_map_and_submit_extent()
  jbd2: fix theoretical race in jbd2__journal_restart
  ext4: only zero partial blocks in ext4_zero_partial_blocks()
  ext4: check error return from ext4_write_inline_data_end()
  ext4: delete unnecessary C statements
  ext3,ext4: don't mess with dir_file->f_pos in htree_dirblock_to_tree()
  jbd2: move superblock checksum calculation to jbd2_write_superblock()
  ext4: pass inode pointer instead of file pointer to punch hole
  ext4: improve free space calculation for inline_data
  ext4: reduce object size when !CONFIG_PRINTK
  ext4: improve extent cache shrink mechanism to avoid to burn CPU time
  ext4: implement error handling of ext4_mb_new_preallocation()
  ext4: fix corruption when online resizing a fs with 1K block size
  ext4: delete unused variables
  ext4: return FIEMAP_EXTENT_UNKNOWN for delalloc extents
  jbd2: remove debug dependency on debug_fs and update Kconfig help text
  jbd2: use a single printk for jbd_debug()
  ...
2013-07-02 09:39:34 -07:00
Jeff Layton 7b2296afb3 locks: give the blocked_hash its own spinlock
There's no reason we have to protect the blocked_hash and file_lock_list
with the same spinlock. With the tests I have, breaking it in two gives
a barely measurable performance benefit, but it seems reasonable to make
this locking as granular as possible.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29 12:57:46 +04:00
Jeff Layton 3999e49364 locks: add a new "lm_owner_key" lock operation
Currently, the hashing that the locking code uses to add these values
to the blocked_hash is simply calculated using fl_owner field. That's
valid in most cases except for server-side lockd, which validates the
owner of a lock based on fl_owner and fl_pid.

In the case where you have a small number of NFS clients doing a lot
of locking between different processes, you could end up with all
the blocked requests sitting in a very small number of hash buckets.

Add a new lm_owner_key operation to the lock_manager_operations that
will generate an unsigned long to use as the key in the hashtable.
That function is only implemented for server-side lockd, and simply
XORs the fl_owner and fl_pid.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Acked-by: J. Bruce Fields <bfields@fieldses.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29 12:57:45 +04:00
Jeff Layton 1c8c601a8c locks: protect most of the file_lock handling with i_lock
Having a global lock that protects all of this code is a clear
scalability problem. Instead of doing that, move most of the code to be
protected by the i_lock instead. The exceptions are the global lists
that the ->fl_link sits on, and the ->fl_block list.

->fl_link is what connects these structures to the
global lists, so we must ensure that we hold those locks when iterating
over or updating these lists.

Furthermore, sound deadlock detection requires that we hold the
blocked_list state steady while checking for loops. We also must ensure
that the search and update to the list are atomic.

For the checking and insertion side of the blocked_list, push the
acquisition of the global lock into __posix_lock_file and ensure that
checking and update of the  blocked_list is done without dropping the
lock in between.

On the removal side, when waking up blocked lock waiters, take the
global lock before walking the blocked list and dequeue the waiters from
the global list prior to removal from the fl_block list.

With this, deadlock detection should be race free while we minimize
excessive file_lock_lock thrashing.

Finally, in order to avoid a lock inversion problem when handling
/proc/locks output we must ensure that manipulations of the fl_block
list are also protected by the file_lock_lock.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29 12:57:42 +04:00
Linus Torvalds da53be12bb Don't pass inode to ->d_hash() and ->d_compare()
Instances either don't look at it at all (the majority of cases) or
only want it to find the superblock (which can be had as dentry->d_sb).
A few cases that want more are actually safe with dentry->d_inode -
the only precaution needed is the check that it hadn't been replaced with
NULL by rmdir() or by overwriting rename(), which case should be simply
treated as cache miss.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29 12:57:36 +04:00
Al Viro 2233f31aad [readdir] ->readdir() is gone
everything's converted to ->iterate()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29 12:57:04 +04:00
Al Viro 5c0ba4e076 [readdir] introduce iterate_dir() and dir_context
iterate_dir(): new helper, replacing vfs_readdir().

struct dir_context: contains the readdir callback (and will get more stuff
in it), embedded into whatever data that callback wants to deal with;
eventually, we'll be passing it to ->readdir() replacement instead of
(data,filldir) pair.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29 12:46:46 +04:00
Namjae Jeon 696c018c77 f2fs: add remount_fs callback support
Add the f2fs_remount function call which will be used
during the filesystem remounting. This function
will help us to change the mount options specific to
f2fs.

Also modify the f2fs background_gc mount option, which
will allow the user to dynamically trun on/off the
garbage collection in f2fs based on the background_gc
value. If background_gc=on, Garbage collection will
be turned off & if background_gc=off, Garbage collection
will be truned on.

By default the garbage collection is on in f2fs.

Change Log:
v2: Incorporated the review comments by Gu Zheng.
    Removing the restore part for VFS flags
    Updating comments with proper flag conditions
    Display GC background option as ON/OFF
    Revised conditions to stop GC in case of remount

v1: Initial changes for adding remount_fs callback
support.

Cc: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Pankaj Kumar <pankaj.km@samsung.com>
Reviewed-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
[Jaegeuk Kim: change /** with /* for the coding style]
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-06-17 21:17:55 +09:00
Dave Chinner f763fd440e xfs: disable noattr2/attr2 mount options for CRC enabled filesystems
attr2 format is always enabled for v5 superblock filesystems, so the
mount options to enable or disable it need to be cause mount errors.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>

(cherry picked from commit d3eaace84e)
2013-06-06 10:51:34 -05:00
Dave Chinner d3eaace84e xfs: disable noattr2/attr2 mount options for CRC enabled filesystems
attr2 format is always enabled for v5 superblock filesystems, so the
mount options to enable or disable it need to be cause mount errors.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-06-05 11:21:06 -05:00
Anatol Pomozov f884ab15af doc: fix misspellings with 'codespell' tool
Signed-off-by: Anatol Pomozov <anatol.pomozov@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2013-05-28 12:02:12 +02:00
Lukas Czerner d47992f86b mm: change invalidatepage prototype to accept length
Currently there is no way to truncate partial page where the end
truncate point is not at the end of the page. This is because it was not
needed and the functionality was enough for file system truncate
operation to work properly. However more file systems now support punch
hole feature and it can benefit from mm supporting truncating page just
up to the certain point.

Specifically, with this functionality truncate_inode_pages_range() can
be changed so it supports truncating partial page at the end of the
range (currently it will BUG_ON() if 'end' is not at the end of the
page).

This commit changes the invalidatepage() address space operation
prototype to accept range to be invalidated and update all the instances
for it.

We also change the block_invalidatepage() in the same way and actually
make a use of the new length argument implementing range invalidation.

Actual file system implementations will follow except the file systems
where the changes are really simple and should not change the behaviour
in any way .Implementation for truncate_page_range() which will be able
to accept page unaligned ranges will follow as well.

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Hugh Dickins <hughd@google.com>
2013-05-21 23:17:23 -04:00
Linus Torvalds 983a5f84a4 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs update from Chris Mason:
 "These are mostly fixes.  The biggest exceptions are Josef's skinny
  extents and Jan Schmidt's code to rebuild our quota indexes if they
  get out of sync (or you enable quotas on an existing filesystem).

  The skinny extents are off by default because they are a new variation
  on the extent allocation tree format.  btrfstune -x enables them, and
  the new format makes the extent allocation tree about 30% smaller.

  I rebased this a few days ago to rework Dave Sterba's crc checks on
  the super block, but almost all of these go back to rc6, since I
  though 3.9 was due any minute.

  The biggest missing fix is the tracepoint bug that was hit late in
  3.9.  I ran into problems with that in overnight testing and I'm still
  tracking it down.  I'll definitely have that fixed for rc2."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (101 commits)
  Btrfs: allow superblock mismatch from older mkfs
  btrfs: enhance superblock checks
  btrfs: fix misleading variable name for flags
  btrfs: use unsigned long type for extent state bits
  Btrfs: improve the loop of scrub_stripe
  btrfs: read entire device info under lock
  btrfs: remove unused gfp mask parameter from release_extent_buffer callchain
  btrfs: handle errors returned from get_tree_block_key
  btrfs: make static code static & remove dead code
  Btrfs: deal with errors in write_dev_supers
  Btrfs: remove almost all of the BUG()'s from tree-log.c
  Btrfs: deal with free space cache errors while replaying log
  Btrfs: automatic rescan after "quota enable" command
  Btrfs: rescan for qgroups
  Btrfs: split btrfs_qgroup_account_ref into four functions
  Btrfs: allocate new chunks if the space is not enough for global rsv
  Btrfs: separate sequence numbers for delayed ref tracking and tree mod log
  btrfs: move leak debug code to functions
  Btrfs: return free space in cow error path
  Btrfs: set UUID in root_item for created trees
  ...
2013-05-09 13:07:40 -07:00
Linus Torvalds 942d33da99 f2fs updates for v3.10
This patch-set includes the following major enhancement patches.
 o introduce a new gloabl lock scheme
 o add tracepoints on several major functions
 o fix the overall cleaning process focused on victim selection
 o apply the block plugging to merge IOs as much as possible
 o enhance management of free nids and its list
 o enhance the readahead mode for node pages
 o address several cretical deadlock conditions
 o reduce lock_page calls
 
 The other minor bug fixes and enhancements are as follows.
 o calculation mistakes: overflow
 o bio types: READ, READA, and READ_SYNC
 o fix the recovery flow, data races, and null pointer errors
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJRijCLAAoJEEAUqH6CSFDSg9kQAIqxmQzCUvCN3HcyVe8bGhKz
 8xhKrAY6ySRCKMuBbFRQsNrXUhckE3A44DgzYm5/gQikr/c8zhbqPVrtZ968eCKb
 wm3J+Re/uwZr5eOXlJEaHIiSkMDtERN7Cu2oYJWZi2B9wCSZcgvoWQ3c3LUVk6yF
 GFdi1Y00ll5tFKbEGbXSsfdul9P8jp0MmuMnWBBQZF3TrjETXMdThA5FXN0yTf9s
 XkcGE9vTCCPk8p7P3YmGGw6CwlaL8oallm0//iL4nMNpJzveq2C09IlY2BNrxU3L
 iTNXeIBdbhwXpnh2zq26Cy+cIEDIp0oXYui5BYdr/LWyWU3T/INa+hjUUszsESxF
 51LIUA1rA9nX/BSmj2QomswZ3lt4u5jl6rSBFKv3NG1KsFrAdb8S4tHukRSTSxAJ
 gzpY6kLT1+bgciA16F5W4yhzMYPN5hPa8s6hx4LHlpoqQICQsurjtS9KW7vncLFt
 ttmCMn8ehHcTzKRNNqYaBerCtSB3Z3G/uAy1y+DB7Zx2h2mqhCBXRalyRvs7RKvK
 d5OyYCpHntxuzDwVuivnr9Ddp30LUP1WqexxK+ykn99Ji3leMmffHP8Oari8w96b
 RxSbjoo8hOgoS5xZ4v3AaqtLDlBpxC6oWJzDaq/fJeKxOx22Z5BDFUM9mBGxrouJ
 AATl8b+cW/aTZ4l7WOPU
 =Hqii
 -----END PGP SIGNATURE-----

Merge tag 'f2fs-for-v3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs

Pull f2fs updates from Jaegeuk Kim:
 "This patch-set includes the following major enhancement patches.
   - introduce a new gloabl lock scheme
   - add tracepoints on several major functions
   - fix the overall cleaning process focused on victim selection
   - apply the block plugging to merge IOs as much as possible
   - enhance management of free nids and its list
   - enhance the readahead mode for node pages
   - address several cretical deadlock conditions
   - reduce lock_page calls

  The other minor bug fixes and enhancements are as follows.
   - calculation mistakes: overflow
   - bio types: READ, READA, and READ_SYNC
   - fix the recovery flow, data races, and null pointer errors"

* tag 'f2fs-for-v3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (68 commits)
  f2fs: cover free_nid management with spin_lock
  f2fs: optimize scan_nat_page()
  f2fs: code cleanup for scan_nat_page() and build_free_nids()
  f2fs: bugfix for alloc_nid_failed()
  f2fs: recover when journal contains deleted files
  f2fs: continue to mount after failing recovery
  f2fs: avoid deadlock during evict after f2fs_gc
  f2fs: modify the number of issued pages to merge IOs
  f2fs: remove useless #include <linux/proc_fs.h> as we're now using sysfs as debug entry.
  f2fs: fix inconsistent using of NM_WOUT_THRESHOLD
  f2fs: check truncation of mapping after lock_page
  f2fs: enhance alloc_nid and build_free_nids flows
  f2fs: add a tracepoint on f2fs_new_inode
  f2fs: check nid == 0 in add_free_nid
  f2fs: add REQ_META about metadata requests for submit
  f2fs: give a chance to merge IOs by IO scheduler
  f2fs: avoid frequent background GC
  f2fs: add tracepoints to debug checkpoint request
  f2fs: add tracepoints for write page operations
  f2fs: add tracepoints to debug the block allocation
  ...
2013-05-08 15:11:48 -07:00
Eric Sandeen c854a9909a btrfs: document mount options in Documentation/fs/btrfs.txt
Document all current btrfs mount options.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-05-06 15:54:26 -04:00
Linus Torvalds 1db772216f Merge branch 'for-3.10' of git://linux-nfs.org/~bfields/linux
Pull nfsd changes from J Bruce Fields:
 "Highlights include:

   - Some more DRC cleanup and performance work from Jeff Layton

   - A gss-proxy upcall from Simo Sorce: currently krb5 mounts to the
     server using credentials from Active Directory often fail due to
     limitations of the svcgssd upcall interface.  This replacement
     lifts those limitations.  The existing upcall is still supported
     for backwards compatibility.

   - More NFSv4.1 support: at this point, if a user with a current
     client who upgrades from 4.0 to 4.1 should see no regressions.  In
     theory we do everything a 4.1 server is required to do.  Patches
     for a couple minor exceptions are ready for 3.11, and with those
     and some more testing I'd like to turn 4.1 on by default in 3.11."

Fix up semantic conflict as per Stephen Rothwell and linux-next:

Commit 030d794bf4 ("SUNRPC: Use gssproxy upcall for server RPCGSS
authentication") adds two new users of "PDE(inode)->data", but we're
supposed to use "PDE_DATA(inode)" instead since commit d9dda78bad
("procfs: new helper - PDE_DATA(inode)").

The old PDE() macro is no longer available since commit c30480b92c
("proc: Make the PROC_I() and PDE() macros internal to procfs")

* 'for-3.10' of git://linux-nfs.org/~bfields/linux: (60 commits)
  NFSD: SECINFO doesn't handle unsupported pseudoflavors correctly
  NFSD: Simplify GSS flavor encoding in nfsd4_do_encode_secinfo()
  nfsd: make symbol nfsd_reply_cache_shrinker static
  svcauth_gss: fix error return code in rsc_parse()
  nfsd4: don't remap EISDIR errors in rename
  svcrpc: fix gss-proxy to respect user namespaces
  SUNRPC: gssp_procedures[] can be static
  SUNRPC: define {create,destroy}_use_gss_proxy_proc_entry in !PROC case
  nfsd4: better error return to indicate SSV non-support
  nfsd: fix EXDEV checking in rename
  SUNRPC: Use gssproxy upcall for server RPCGSS authentication.
  SUNRPC: Add RPC based upcall mechanism for RPCGSS auth
  SUNRPC: conditionally return endtime from import_sec_context
  SUNRPC: allow disabling idle timeout
  SUNRPC: attempt AF_LOCAL connect on setup
  nfsd: Decode and send 64bit time values
  nfsd4: put_client_renew_locked can be static
  nfsd4: remove unused macro
  nfsd4: remove some useless code
  nfsd4: implement SEQ4_STATUS_RECALLABLE_STATE_REVOKED
  ...
2013-05-03 10:59:39 -07:00
Linus Torvalds c8d8566952 xfs: update for v3.10-rc1
For 3.10-rc1 we have a number of bug fixes and cleanups and a currently
 experimental feature from David Chinner, CRCs protection for metadata.
 CRCs are enabled by using mkfs.xfs to create a filesystem with the
 feature bits set.
 
 * numerous fixes for speculative preallocation
 * don't verify buffers on IO errors
 * rename of random32 to prandom32
 * refactoring/rearrangement in xfs_bmap.c
 * removal of unused m_inode_shrink in struct xfs_mount
 * fix error handling of xfs_bufs and readahead
 * quota driven preallocation throttling
 * fix WARN_ON in xfs_vm_releasepage
 * add ratelimited printk for different alert levels
 * fix spurious forced shutdowns due to freed Extent Free Intents
 * remove some obsolete XLOG_CIL_HARD_SPACE_LIMIT() macros
 * remove some obsoleted comments
 * (experimental) CRC support for metadata
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.10 (GNU/Linux)
 
 iQIcBAABAgAGBQJRgocuAAoJENaLyazVq6ZOXHUP+wbTG7P1cX33AeG9PErEJduU
 dpwzDmDn1n41pA5AY3/i5sm67qYSpF793LgI95n+wPxOh0LTDdKPqrLSEh5GAQ2c
 bAaax4UTmwFI8bnnHC2zcexyZX0tKDgfW8pxQe8i8xEh/bJalLFOLq7wTFfhAcQX
 8NqI1BXp6bN7arm37rsUXpOS+mNIXxc5UtpKhREQ1zDQ/J+tQ3dGjnUmmMj4CX7F
 iRKzezT/5YpuWPX0MGgfuUAbSnNDQ9d4tPumTHEuTYuDtVWrea8ZRGMnXs+dEd4l
 NoFpeo1R0XaGtWx/4jOnWnmt3D+O3/k03jrFLmoZQKSuBW27jDkE7RRIq7OPmEo2
 WVhDOO3I3CzoTGWfQ3BZ78dWF6rU/a5baxPmnkla4o4GIxyycgtARvsQWF97aeKO
 ImISIIBrBoifaElKOA+bDyP57EMe5DHSHAiMXGxuo/+djhTAxn5GugLwbes0u/sS
 95DAsGy4PPOKcFJfHJvS0i64+lw0yFmeGqfcQ9GwsXALvl2QmA79O9wB9qN+AaXY
 AwC7eeC3xWnG86aPtxmnK8vduEFXWdBZ2ZPZjtr2wVo+FC/46pRhUqK1cUyDQxXH
 jx5CIyxe+8snRs8eGYu4k6lwVbCH6ICzRhNMtOCB6e4c+bXBun5eAoGP6jln2g3F
 z+CvMq0/WBFJ+86wzJqz
 =hHSs
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-v3.10-rc1' of git://oss.sgi.com/xfs/xfs

Pull xfs update from Ben Myers:
 "For 3.10-rc1 we have a number of bug fixes and cleanups and a
  currently experimental feature from David Chinner, CRCs protection for
  metadata.  CRCs are enabled by using mkfs.xfs to create a filesystem
  with the feature bits set.

   - numerous fixes for speculative preallocation
   - don't verify buffers on IO errors
   - rename of random32 to prandom32
   - refactoring/rearrangement in xfs_bmap.c
   - removal of unused m_inode_shrink in struct xfs_mount
   - fix error handling of xfs_bufs and readahead
   - quota driven preallocation throttling
   - fix WARN_ON in xfs_vm_releasepage
   - add ratelimited printk for different alert levels
   - fix spurious forced shutdowns due to freed Extent Free Intents
   - remove some obsolete XLOG_CIL_HARD_SPACE_LIMIT() macros
   - remove some obsoleted comments
   - (experimental) CRC support for metadata"

* tag 'for-linus-v3.10-rc1' of git://oss.sgi.com/xfs/xfs: (46 commits)
  xfs: fix da node magic number mismatches
  xfs: Remote attr validation fixes and optimisations
  xfs: Teach dquot recovery about CONFIG_XFS_QUOTA
  xfs: add metadata CRC documentation
  xfs: implement extended feature masks
  xfs: add CRC checks to the superblock
  xfs: buffer type overruns blf_flags field
  xfs: add buffer types to directory and attribute buffers
  xfs: add CRC protection to remote attributes
  xfs: split remote attribute code out
  xfs: add CRCs to attr leaf blocks
  xfs: add CRCs to dir2/da node blocks
  xfs: shortform directory offsets change for dir3 format
  xfs: add CRC checking to dir2 leaf blocks
  xfs: add CRC checking to dir2 data blocks
  xfs: add CRC checking to dir2 free blocks
  xfs: add CRC checks to block format directory blocks
  xfs: add CRC checks to remote symlinks
  xfs: split out symlink code into it's own file.
  xfs: add version 3 inode format with CRCs
  ...
2013-05-02 14:49:33 -07:00
Linus Torvalds 149b306089 Mostly performance and bug fixes, plus some cleanups. The one new
feature this merge window is a new ioctl EXT4_IOC_SWAP_BOOT which
 allows installation of a hidden inode designed for boot loaders.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABCAAGBQJRfpDwAAoJENNvdpvBGATwsjwP/17V3AN6XTEhZK80p3/qN5YD
 N2QIHeyYIqCGpczLs2TQEkxWX6nqpDggAPXY956wvgeEMQV+pQ+DLO4Ol9+p5WD2
 hrklleYhtOjFQ3Xh4lqrEi5FzKVzWagVDLqgUjALJ+D+hkDB7ZQT/fm2sH45rzot
 xBp3aVqANU8GqAAbEW4/Ng9ZGMx0dpANiU2svbjM71sv2dCLFmWAkz+GgZsMbuJZ
 vnKIZP6I6plwP3LuZzEbVCA7F2PzC4ywEOJKjIEvgHpX6uMDR3FX8pD5Dlo/o6e2
 eP+KLnD43mJMxBmTn22x5Sm0N6DUzJCEELRJWB9wCZoLdEvbEWRxT3qsPXfLWelG
 2jj4bImXF2CqYEsJww5FV2WdXXdnuM57pZym5vMZGAFyKPSCJobA4Y3XRdXkBfXf
 Gq/cFoPYv2EcBIhz3zrRj+tbY8esbO9wOnF6+x+AF10BspD2V7nuoVdWVhOf0A3v
 i9ifGPwLk3e3xHr9oXheo7IWn52oviZeyD77d7D7MLhgn+xU4LaVhW3R63Q+mI4D
 0TXG25R1CVcE7wyFy3gqSVXSCDO0JcQBL5LgcL+wAGXcHPAXqBpN2DFTPo+9fJH2
 g3YMwr+wMbci1XRVQ2vdTt/nBZYjOCh6PgRmg3KjTz11Ra5EsjQvYjKWYwqf2RGn
 QhCgbzd/qtZfNJztLvr7
 =GCT2
 -----END PGP SIGNATURE-----

Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 updates from Ted Ts'o:
 "Mostly performance and bug fixes, plus some cleanups.  The one new
  feature this merge window is a new ioctl EXT4_IOC_SWAP_BOOT which
  allows installation of a hidden inode designed for boot loaders."

* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (50 commits)
  ext4: fix type-widening bug in inode table readahead code
  ext4: add check for inodes_count overflow in new resize ioctl
  ext4: fix Kconfig documentation for CONFIG_EXT4_DEBUG
  ext4: fix online resizing for ext3-compat file systems
  jbd2: trace when lock_buffer in do_get_write_access takes a long time
  ext4: mark metadata blocks using bh flags
  buffer: add BH_Prio and BH_Meta flags
  ext4: mark all metadata I/O with REQ_META
  ext4: fix readdir error in case inline_data+^dir_index.
  ext4: fix readdir error in the case of inline_data+dir_index
  jbd2: use kmem_cache_zalloc instead of kmem_cache_alloc/memset
  ext4: mext_insert_extents should update extent block checksum
  ext4: move quota initialization out of inode allocation transaction
  ext4: reserve xattr index for Rich ACL support
  jbd2: reduce journal_head size
  ext4: clear buffer_uninit flag when submitting IO
  ext4: use io_end for multiple bios
  ext4: make ext4_bio_write_page() use BH_Async_Write flags
  ext4: Use kstrtoul() instead of parse_strtoul()
  ext4: defragmentation code cleanup
  ...
2013-05-01 08:04:12 -07:00
Namjae Jeon 27cf10e133 Documentation: update nfs option in filesystem/vfat.txt
Add descriptions about 'stale_rw' and 'nostale_ro' nfs options in
filesystem/vfat.txt

Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Ravishankar N <ravi.n1@samsung.com>
Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com>
Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Acked-by: Rob Landley <rob@landley.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29 18:28:41 -07:00
Dave Chinner dccc3f447a xfs: add metadata CRC documentation
Add some documentation about the self describing metadata and the
code templates used to implement it.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-04-27 13:27:43 -05:00
Simo Sorce 030d794bf4 SUNRPC: Use gssproxy upcall for server RPCGSS authentication.
The main advantge of this new upcall mechanism is that it can handle
big tickets as seen in Kerberos implementations where tickets carry
authorization data like the MS-PAC buffer with AD or the Posix Authorization
Data being discussed in IETF on the krbwg working group.

The Gssproxy program is used to perform the accept_sec_context call on the
kernel's behalf. The code is changed to also pass the input buffer straight
to upcall mechanism to avoid allocating and copying many pages as tokens can
be as big (potentially more in future) as 64KiB.

Signed-off-by: Simo Sorce <simo@redhat.com>
[bfields: containerization, negotiation api]
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-04-26 11:41:28 -04:00
Lukas Czerner 27dd438542 ext4: introduce reserved space
Currently in ENOSPC condition when writing into unwritten space, or
punching a hole, we might need to split the extent and grow extent tree.
However since we can not allocate any new metadata blocks we'll have to
zero out unwritten part of extent or punched out part of extent, or in
the worst case return ENOSPC even though use actually does not allocate
any space.

Also in delalloc path we do reserve metadata and data blocks for the
time we're going to write out, however metadata block reservation is
very tricky especially since we expect that logical connectivity implies
physical connectivity, however that might not be the case and hence we
might end up allocating more metadata blocks than previously reserved.
So in future, metadata reservation checks should be removed since we can
not assure that we do not under reserve.

And this is where reserved space comes into the picture. When mounting
the file system we slice off a little bit of the file system space (2%
or 4096 clusters, whichever is smaller) which can be then used for the
cases mentioned above to prevent costly zeroout, or unexpected ENOSPC.

The number of reserved clusters can be set via sysfs, however it can
never be bigger than number of free clusters in the file system.

Note that this patch fixes the failure of xfstest 274 as expected.

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
2013-04-09 22:11:22 -04:00
Dr. Tilmann Bubeck 393d1d1d76 ext4: implementation of a new ioctl called EXT4_IOC_SWAP_BOOT
Add a new ioctl, EXT4_IOC_SWAP_BOOT which swaps i_blocks and
associated attributes (like i_blocks, i_size, i_flags, ...) from the
specified inode with inode EXT4_BOOT_LOADER_INO (#5). This is
typically used to store a boot loader in a secure part of the
filesystem, where it can't be changed by a normal user by accident.
The data blocks of the previous boot loader will be associated with
the given inode.

This usercode program is a simple example of the usage:

int main(int argc, char *argv[])
{
  int fd;
  int err;

  if ( argc != 2 ) {
    printf("usage: ext4-swap-boot-inode FILE-TO-SWAP\n");
    exit(1);
  }

  fd = open(argv[1], O_WRONLY);
  if ( fd < 0 ) {
    perror("open");
    exit(1);
  }

  err = ioctl(fd, EXT4_IOC_SWAP_BOOT);
  if ( err < 0 ) {
    perror("ioctl");
    exit(1);
  }

  close(fd);
  exit(0);
}

[ Modified by Theodore Ts'o to fix a number of bugs in the original code.]

Signed-off-by: Dr. Tilmann Bubeck <t.bubeck@reinform.de>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2013-04-08 12:54:05 -04:00
Changman Lee 1571f84a1f f2fs: update f2fs.txt related with discard at mkfs
o mkfs.f2fs supports no discard option.
 o fixed volume label size in 512 bytes.

Signed-off-by: Changman Lee <cm224.lee@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-04-03 17:27:52 +09:00
Linus Torvalds d895cb1af1 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs pile (part one) from Al Viro:
 "Assorted stuff - cleaning namei.c up a bit, fixing ->d_name/->d_parent
  locking violations, etc.

  The most visible changes here are death of FS_REVAL_DOT (replaced with
  "has ->d_weak_revalidate()") and a new helper getting from struct file
  to inode.  Some bits of preparation to xattr method interface changes.

  Misc patches by various people sent this cycle *and* ocfs2 fixes from
  several cycles ago that should've been upstream right then.

  PS: the next vfs pile will be xattr stuff."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (46 commits)
  saner proc_get_inode() calling conventions
  proc: avoid extra pde_put() in proc_fill_super()
  fs: change return values from -EACCES to -EPERM
  fs/exec.c: make bprm_mm_init() static
  ocfs2/dlm: use GFP_ATOMIC inside a spin_lock
  ocfs2: fix possible use-after-free with AIO
  ocfs2: Fix oops in ocfs2_fast_symlink_readpage() code path
  get_empty_filp()/alloc_file() leave both ->f_pos and ->f_version zero
  target: writev() on single-element vector is pointless
  export kernel_write(), convert open-coded instances
  fs: encode_fh: return FILEID_INVALID if invalid fid_type
  kill f_vfsmnt
  vfs: kill FS_REVAL_DOT by adding a d_weak_revalidate dentry op
  nfsd: handle vfs_getattr errors in acl protocol
  switch vfs_getattr() to struct path
  default SET_PERSONALITY() in linux/elf.h
  ceph: prepopulate inodes only when request is aborted
  d_hash_and_lookup(): export, switch open-coded instances
  9p: switch v9fs_set_create_acl() to inode+fid, do it before d_instantiate()
  9p: split dropping the acls from v9fs_set_create_acl()
  ...
2013-02-26 20:16:07 -08:00
Jeff Layton ecf3d1f1aa vfs: kill FS_REVAL_DOT by adding a d_weak_revalidate dentry op
The following set of operations on a NFS client and server will cause

    server# mkdir a
    client# cd a
    server# mv a a.bak
    client# sleep 30  # (or whatever the dir attrcache timeout is)
    client# stat .
    stat: cannot stat `.': Stale NFS file handle

Obviously, we should not be getting an ESTALE error back there since the
inode still exists on the server. The problem is that the lookup code
will call d_revalidate on the dentry that "." refers to, because NFS has
FS_REVAL_DOT set.

nfs_lookup_revalidate will see that the parent directory has changed and
will try to reverify the dentry by redoing a LOOKUP. That of course
fails, so the lookup code returns ESTALE.

The problem here is that d_revalidate is really a bad fit for this case.
What we really want to know at this point is whether the inode is still
good or not, but we don't really care what name it goes by or whether
the dcache is still valid.

Add a new d_op->d_weak_revalidate operation and have complete_walk call
that instead of d_revalidate. The intent there is to allow for a
"weaker" d_revalidate that just checks to see whether the inode is still
good. This is also gives us an opportunity to kill off the FS_REVAL_DOT
special casing.

[AV: changed method name, added note in porting, fixed confusion re
having it possibly called from RCU mode (it won't be)]

Cc: NeilBrown <neilb@suse.de>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-26 02:46:09 -05:00
Huajun Li 9268cc3523 f2fs: update f2fs document to reflect SIT/NAT layout correctly
document to reflect the layout generated by mkfs.f2fs .

Signed-off-by: Huajun Li <huajun.li.lee@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-01-04 09:42:59 +09:00
Linus Torvalds 1f0377ff08 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull VFS update from Al Viro:
 "fscache fixes, ESTALE patchset, vmtruncate removal series, assorted
  misc stuff."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (79 commits)
  vfs: make lremovexattr retry once on ESTALE error
  vfs: make removexattr retry once on ESTALE
  vfs: make llistxattr retry once on ESTALE error
  vfs: make listxattr retry once on ESTALE error
  vfs: make lgetxattr retry once on ESTALE
  vfs: make getxattr retry once on an ESTALE error
  vfs: allow lsetxattr() to retry once on ESTALE errors
  vfs: allow setxattr to retry once on ESTALE errors
  vfs: allow utimensat() calls to retry once on an ESTALE error
  vfs: fix user_statfs to retry once on ESTALE errors
  vfs: make fchownat retry once on ESTALE errors
  vfs: make fchmodat retry once on ESTALE errors
  vfs: have chroot retry once on ESTALE error
  vfs: have chdir retry lookup and call once on ESTALE error
  vfs: have faccessat retry once on an ESTALE error
  vfs: have do_sys_truncate retry once on an ESTALE error
  vfs: fix renameat to retry on ESTALE errors
  vfs: make do_unlinkat retry once on ESTALE errors
  vfs: make do_rmdir retry once on ESTALE errors
  vfs: add a flags argument to user_path_parent
  ...
2012-12-20 18:14:31 -08:00
Al Viro 21e89c0c48 Merge branch 'fscache' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs into for-linus 2012-12-20 18:49:14 -05:00
Marco Stornelli b9f61c3c0c documentation: drop vmtruncate
Removed vmtruncate

Signed-off-by: Marco Stornelli <marco.stornelli@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-12-20 18:47:08 -05:00
Linus Torvalds 982197277c Merge branch 'for-3.8' of git://linux-nfs.org/~bfields/linux
Pull nfsd update from Bruce Fields:
 "Included this time:

   - more nfsd containerization work from Stanislav Kinsbursky: we're
     not quite there yet, but should be by 3.9.

   - NFSv4.1 progress: implementation of basic backchannel security
     negotiation and the mandatory BACKCHANNEL_CTL operation.  See

       http://wiki.linux-nfs.org/wiki/index.php/Server_4.0_and_4.1_issues

     for remaining TODO's

   - Fixes for some bugs that could be triggered by unusual compounds.
     Our xdr code wasn't designed with v4 compounds in mind, and it
     shows.  A more thorough rewrite is still a todo.

   - If you've ever seen "RPC: multiple fragments per record not
     supported" logged while using some sort of odd userland NFS client,
     that should now be fixed.

   - Further work from Jeff Layton on our mechanism for storing
     information about NFSv4 clients across reboots.

   - Further work from Bryan Schumaker on his fault-injection mechanism
     (which allows us to discard selective NFSv4 state, to excercise
     rarely-taken recovery code paths in the client.)

   - The usual mix of miscellaneous bugs and cleanup.

  Thanks to everyone who tested or contributed this cycle."

* 'for-3.8' of git://linux-nfs.org/~bfields/linux: (111 commits)
  nfsd4: don't leave freed stateid hashed
  nfsd4: free_stateid can use the current stateid
  nfsd4: cleanup: replace rq_resused count by rq_next_page pointer
  nfsd: warn on odd reply state in nfsd_vfs_read
  nfsd4: fix oops on unusual readlike compound
  nfsd4: disable zero-copy on non-final read ops
  svcrpc: fix some printks
  NFSD: Correct the size calculation in fault_inject_write
  NFSD: Pass correct buffer size to rpc_ntop
  nfsd: pass proper net to nfsd_destroy() from NFSd kthreads
  nfsd: simplify service shutdown
  nfsd: replace boolean nfsd_up flag by users counter
  nfsd: simplify NFSv4 state init and shutdown
  nfsd: introduce helpers for generic resources init and shutdown
  nfsd: make NFSd service structure allocated per net
  nfsd: make NFSd service boot time per-net
  nfsd: per-net NFSd up flag introduced
  nfsd: move per-net startup code to separated function
  nfsd: pass net to __write_ports() and down
  nfsd: pass net to nfsd_set_nrthreads()
  ...
2012-12-20 14:04:11 -08:00
David Howells ef778e7ae6 FS-Cache: Provide proper invalidation
Provide a proper invalidation method rather than relying on the netfs retiring
the cookie it has and getting a new one.  The problem with this is that isn't
easy for the netfs to make sure that it has completed/cancelled all its
outstanding storage and retrieval operations on the cookie it is retiring.

Instead, have the cache provide an invalidation method that will cancel or wait
for all currently outstanding operations before invalidating the cache, and
will cause new operations to queue up behind that.  Whilst invalidation is in
progress, some requests will be rejected until the cache can stack a barrier on
the operation queue to cause new operations to be deferred behind it.

Signed-off-by: David Howells <dhowells@redhat.com>
2012-12-20 22:04:07 +00:00
David Howells 9f10523f89 FS-Cache: Fix operation state management and accounting
Fix the state management of internal fscache operations and the accounting of
what operations are in what states.

This is done by:

 (1) Give struct fscache_operation a enum variable that directly represents the
     state it's currently in, rather than spreading this knowledge over a bunch
     of flags, who's processing the operation at the moment and whether it is
     queued or not.

     This makes it easier to write assertions to check the state at various
     points and to prevent invalid state transitions.

 (2) Add an 'operation complete' state and supply a function to indicate the
     completion of an operation (fscache_op_complete()) and make things call
     it.  The final call to fscache_put_operation() can then check that an op
     in the appropriate state (complete or cancelled).

 (3) Adjust the use of object->n_ops, ->n_in_progress, ->n_exclusive to better
     govern the state of an object:

	(a) The ->n_ops is now the number of extant operations on the object
	    and is now decremented by fscache_put_operation() only.

	(b) The ->n_in_progress is simply the number of objects that have been
	    taken off of the object's pending queue for the purposes of being
	    run.  This is decremented by fscache_op_complete() only.

	(c) The ->n_exclusive is the number of exclusive ops that have been
	    submitted and queued or are in progress.  It is decremented by
	    fscache_op_complete() and by fscache_cancel_op().

     fscache_put_operation() and fscache_operation_gc() now no longer try to
     clean up ->n_exclusive and ->n_in_progress.  That was leading to double
     decrements against fscache_cancel_op().

     fscache_cancel_op() now no longer decrements ->n_ops.  That was leading to
     double decrements against fscache_put_operation().

     fscache_submit_exclusive_op() now decides whether it has to queue an op
     based on ->n_in_progress being > 0 rather than ->n_ops > 0 as the latter
     will persist in being true even after all preceding operations have been
     cancelled or completed.  Furthermore, if an object is active and there are
     runnable ops against it, there must be at least one op running.

 (4) Add a remaining-pages counter (n_pages) to struct fscache_retrieval and
     provide a function to record completion of the pages as they complete.

     When n_pages reaches 0, the operation is deemed to be complete and
     fscache_op_complete() is called.

     Add calls to fscache_retrieval_complete() anywhere we've finished with a
     page we've been given to read or allocate for.  This includes places where
     we just return pages to the netfs for reading from the server and where
     accessing the cache fails and we discard the proposed netfs page.

The bugs in the unfixed state management manifest themselves as oopses like the
following where the operation completion gets out of sync with return of the
cookie by the netfs.  This is possible because the cache unlocks and returns
all the netfs pages before recording its completion - which means that there's
nothing to stop the netfs discarding them and returning the cookie.


FS-Cache: Cookie 'NFS.fh' still has outstanding reads
------------[ cut here ]------------
kernel BUG at fs/fscache/cookie.c:519!
invalid opcode: 0000 [#1] SMP
CPU 1
Modules linked in: cachefiles nfs fscache auth_rpcgss nfs_acl lockd sunrpc

Pid: 400, comm: kswapd0 Not tainted 3.1.0-rc7-fsdevel+ #1090                  /DG965RY
RIP: 0010:[<ffffffffa007050a>]  [<ffffffffa007050a>] __fscache_relinquish_cookie+0x170/0x343 [fscache]
RSP: 0018:ffff8800368cfb00  EFLAGS: 00010282
RAX: 000000000000003c RBX: ffff880023cc8790 RCX: 0000000000000000
RDX: 0000000000002f2e RSI: 0000000000000001 RDI: ffffffff813ab86c
RBP: ffff8800368cfb50 R08: 0000000000000002 R09: 0000000000000000
R10: ffff88003a1b7890 R11: ffff88001df6e488 R12: ffff880023d8ed98
R13: ffff880023cc8798 R14: 0000000000000004 R15: ffff88003b8bf370
FS:  0000000000000000(0000) GS:ffff88003bd00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00000000008ba008 CR3: 0000000023d93000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process kswapd0 (pid: 400, threadinfo ffff8800368ce000, task ffff88003b8bf040)
Stack:
 ffff88003b8bf040 ffff88001df6e528 ffff88001df6e528 ffffffffa00b46b0
 ffff88003b8bf040 ffff88001df6e488 ffff88001df6e620 ffffffffa00b46b0
 ffff88001ebd04c8 0000000000000004 ffff8800368cfb70 ffffffffa00b2c91
Call Trace:
 [<ffffffffa00b2c91>] nfs_fscache_release_inode_cookie+0x3b/0x47 [nfs]
 [<ffffffffa008f25f>] nfs_clear_inode+0x3c/0x41 [nfs]
 [<ffffffffa0090df1>] nfs4_evict_inode+0x2f/0x33 [nfs]
 [<ffffffff810d8d47>] evict+0xa1/0x15c
 [<ffffffff810d8e2e>] dispose_list+0x2c/0x38
 [<ffffffff810d9ebd>] prune_icache_sb+0x28c/0x29b
 [<ffffffff810c56b7>] prune_super+0xd5/0x140
 [<ffffffff8109b615>] shrink_slab+0x102/0x1ab
 [<ffffffff8109d690>] balance_pgdat+0x2f2/0x595
 [<ffffffff8103e009>] ? process_timeout+0xb/0xb
 [<ffffffff8109dba3>] kswapd+0x270/0x289
 [<ffffffff8104c5ea>] ? __init_waitqueue_head+0x46/0x46
 [<ffffffff8109d933>] ? balance_pgdat+0x595/0x595
 [<ffffffff8104bf7a>] kthread+0x7f/0x87
 [<ffffffff813ad6b4>] kernel_thread_helper+0x4/0x10
 [<ffffffff81026b98>] ? finish_task_switch+0x45/0xc0
 [<ffffffff813abcdd>] ? retint_restore_args+0xe/0xe
 [<ffffffff8104befb>] ? __init_kthread_worker+0x53/0x53
 [<ffffffff813ad6b0>] ? gs_change+0xb/0xb

Signed-off-by: David Howells <dhowells@redhat.com>
2012-12-20 21:58:26 +00:00
Linus Torvalds a13eea6bd9 Introduce a new file system, Flash-Friendly File System (F2FS), to Linux 3.8.
Highlights:
 - Add initial f2fs source codes
 - Fix an endian conversion bug
 - Fix build failures on random configs
 - Fix the power-off-recovery routine
 - Minor cleanup, coding style, and typos patches
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJQxuJcAAoJEEAUqH6CSFDSq80QAI3i7NgUkx4h225MnbJdEKRb
 YX1MfSPmgE0q/15XS2qQu/s9NGJmXLV1IR9EtRSBlCQjwWhbx9Q9URktGkWslFnx
 6mBLy8EvVKDMVdwoUS8ZY6IjfKbmSnoIHTZrGaT9+9d7k8nlOQLaj3qQF4wBuw1+
 +qhJQV642v8qw7JiVVFgxcBSLpAS9cbdOA0vxfWncMwmRLaEO45W5+rob8ZN8WaS
 BUiYIiue8vlB0VDIYfpl/sSPJC/Bn1XsLKZoS2WJl8CKioE1ptLjT3acUBbabUxp
 hNLl8Ae0PylDMFpH8hrBXhleznrVqEMOTos/Z80/UyBny2sCxJFnaQ60TayUo2l2
 hYk5Wbyj8K7IBJEke23Fepild2PnGz22zf2v+tLxxVgPH5j7/l2XHfy9gPvDbd1P
 4ENiJUC3LE49Mi4TvEIFqhbrcJfD9C+v3bxpWGe8CevrpYZaB8tv/6nQXJCC/Ixp
 tMWqLKlHyXGmk5DZpiSFaj0/GbTPT0UGqZVRzzSXQpKqxJU6eTnXDa6aLUEYH8fH
 grOCriaJrd8SgL3l7RokQSQEyRHuNjMm1tlUQWOObE+y0nJjWb9Amwn1yUtJuNzx
 Np4nnlMhxwJ48P3LeeheSCuOUbxJtOzOR8MVXm7deYiGQbYaqB1/+9TbjOZBSX4O
 fpbCXrmqe1pUBukftZsL
 =iMoX
 -----END PGP SIGNATURE-----

Merge tag 'for-3.8-merge' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs

Pull new F2FS filesystem from Jaegeuk Kim:
 "Introduce a new file system, Flash-Friendly File System (F2FS), to
  Linux 3.8.

  Highlights:
   - Add initial f2fs source codes
   - Fix an endian conversion bug
   - Fix build failures on random configs
   - Fix the power-off-recovery routine
   - Minor cleanup, coding style, and typos patches"

From the Kconfig help text:

  F2FS is based on Log-structured File System (LFS), which supports
  versatile "flash-friendly" features. The design has been focused on
  addressing the fundamental issues in LFS, which are snowball effect
  of wandering tree and high cleaning overhead.

  Since flash-based storages show different characteristics according to
  the internal geometry or flash memory management schemes aka FTL, F2FS
  and tools support various parameters not only for configuring on-disk
  layout, but also for selecting allocation and cleaning algorithms.

and there's an article by Neil Brown about it on lwn.net:

  http://lwn.net/Articles/518988/

* tag 'for-3.8-merge' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (36 commits)
  f2fs: fix tracking parent inode number
  f2fs: cleanup the f2fs_bio_alloc routine
  f2fs: introduce accessor to retrieve number of dentry slots
  f2fs: remove redundant call to f2fs_put_page in delete entry
  f2fs: make use of GFP_F2FS_ZERO for setting gfp_mask
  f2fs: rewrite f2fs_bio_alloc to make it simpler
  f2fs: fix a typo in f2fs documentation
  f2fs: remove unused variable
  f2fs: move error condition for mkdir at proper place
  f2fs: remove unneeded initialization
  f2fs: check read only condition before beginning write out
  f2fs: remove unneeded memset from init_once
  f2fs: show error in case of invalid mount arguments
  f2fs: fix the compiler warning for uninitialized use of variable
  f2fs: resolve build failures
  f2fs: adjust kernel coding style
  f2fs: fix endian conversion bugs reported by sparse
  f2fs: remove unneeded version.h header file from f2fs.h
  f2fs: update the f2fs document
  f2fs: update Kconfig and Makefile
  ...
2012-12-20 13:54:52 -08:00
Cyrill Gorcunov e71ec59320 docs: update documentation about /proc/<pid>/fdinfo/<fd> fanotify output
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andrey Vagin <avagin@openvz.org>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: James Bottomley <jbottomley@parallels.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Matthew Helsley <matt.helsley@gmail.com>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@onelan.co.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-17 17:15:28 -08:00
Cyrill Gorcunov f1d8c16298 docs: add documentation about /proc/<pid>/fdinfo/<fd> output
[akpm@linux-foundation.org: tweak documentation]
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andrey Vagin <avagin@openvz.org>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: James Bottomley <jbottomley@parallels.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Matthew Helsley <matt.helsley@gmail.com>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@onelan.co.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-17 17:15:28 -08:00
Kees Cook 2f4b3bf6b2 /proc/pid/status: add "Seccomp" field
It is currently impossible to examine the state of seccomp for a given
process.  While attaching with gdb and attempting "call
prctl(PR_GET_SECCOMP,...)" will work with some situations, it is not
reliable.  If the process is in seccomp mode 1, this query will kill the
process (prctl not allowed), if the process is in mode 2 with prctl not
allowed, it will similarly be killed, and in weird cases, if prctl is
filtered to return errno 0, it can look like seccomp is disabled.

When reviewing the state of running processes, there should be a way to
externally examine the seccomp mode.  ("Did this build of Chrome end up
using seccomp?" "Did my distro ship ssh with seccomp enabled?")

This adds the "Seccomp" line to /proc/$pid/status.

Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: James Morris <jmorris@namei.org>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-17 17:15:22 -08:00
Cyrill Gorcunov 834f82e2aa procfs: add VmFlags field in smaps output
During c/r sessions we've found that there is no way at the moment to
fetch some VMA associated flags, such as mlock() and madvise().

This leads us to a problem -- we don't know if we should call for mlock()
and/or madvise() after restore on the vma area we're bringing back to
life.

This patch intorduces a new field into "smaps" output called VmFlags,
where all set flags associated with the particular VMA is shown as two
letter mnemonics.

[ Strictly speaking for c/r we only need mlock/madvise bits but it has been
  said that providing just a few flags looks somehow inconsistent.  So all
  flags are here now. ]

This feature is made available on CONFIG_CHECKPOINT_RESTORE=n kernels, as
other applications may start to use these fields.

The data is encoded in a somewhat awkward two letters mnemonic form, to
encourage userspace to be prepared for fields being added or removed in
the future.

[a.p.zijlstra@chello.nl: props to use for_each_set_bit]
[sfr@canb.auug.org.au: props to use array instead of struct]
[akpm@linux-foundation.org: overall redesign and simplification]
[akpm@linux-foundation.org: remove unneeded braces per sfr, avoid using bloaty for_each_set_bit()]
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-17 17:15:22 -08:00