[PATCH] VFS: update overview document
This patch updates the Documentation/filesystems/vfs.txt document. I rearranged and rewrote parts of the introduction chapter and added better headings for each section. I also added a description for the inode rename() operation which was missing and added links to some useful external VFS documentation. Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This commit is contained in:
parent
b8887e6e8c
commit
cc7d1f8f96
|
@ -3,7 +3,7 @@
|
||||||
|
|
||||||
Original author: Richard Gooch <rgooch@atnf.csiro.au>
|
Original author: Richard Gooch <rgooch@atnf.csiro.au>
|
||||||
|
|
||||||
Last updated on August 25, 2005
|
Last updated on October 28, 2005
|
||||||
|
|
||||||
Copyright (C) 1999 Richard Gooch
|
Copyright (C) 1999 Richard Gooch
|
||||||
Copyright (C) 2005 Pekka Enberg
|
Copyright (C) 2005 Pekka Enberg
|
||||||
|
@ -11,62 +11,61 @@
|
||||||
This file is released under the GPLv2.
|
This file is released under the GPLv2.
|
||||||
|
|
||||||
|
|
||||||
What is it?
|
Introduction
|
||||||
===========
|
============
|
||||||
|
|
||||||
The Virtual File System (otherwise known as the Virtual Filesystem
|
The Virtual File System (also known as the Virtual Filesystem Switch)
|
||||||
Switch) is the software layer in the kernel that provides the
|
is the software layer in the kernel that provides the filesystem
|
||||||
filesystem interface to userspace programs. It also provides an
|
interface to userspace programs. It also provides an abstraction
|
||||||
abstraction within the kernel which allows different filesystem
|
within the kernel which allows different filesystem implementations to
|
||||||
implementations to coexist.
|
coexist.
|
||||||
|
|
||||||
|
VFS system calls open(2), stat(2), read(2), write(2), chmod(2) and so
|
||||||
|
on are called from a process context. Filesystem locking is described
|
||||||
|
in the document Documentation/filesystems/Locking.
|
||||||
|
|
||||||
|
|
||||||
A Quick Look At How It Works
|
Directory Entry Cache (dcache)
|
||||||
============================
|
------------------------------
|
||||||
|
|
||||||
In this section I'll briefly describe how things work, before
|
The VFS implements the open(2), stat(2), chmod(2), and similar system
|
||||||
launching into the details. I'll start with describing what happens
|
calls. The pathname argument that is passed to them is used by the VFS
|
||||||
when user programs open and manipulate files, and then look from the
|
to search through the directory entry cache (also known as the dentry
|
||||||
other view which is how a filesystem is supported and subsequently
|
cache or dcache). This provides a very fast look-up mechanism to
|
||||||
mounted.
|
translate a pathname (filename) into a specific dentry. Dentries live
|
||||||
|
in RAM and are never saved to disc: they exist only for performance.
|
||||||
|
|
||||||
|
The dentry cache is meant to be a view into your entire filespace. As
|
||||||
|
most computers cannot fit all dentries in the RAM at the same time,
|
||||||
|
some bits of the cache are missing. In order to resolve your pathname
|
||||||
|
into a dentry, the VFS may have to resort to creating dentries along
|
||||||
|
the way, and then loading the inode. This is done by looking up the
|
||||||
|
inode.
|
||||||
|
|
||||||
|
|
||||||
Opening a File
|
The Inode Object
|
||||||
--------------
|
----------------
|
||||||
|
|
||||||
The VFS implements the open(2), stat(2), chmod(2) and similar system
|
An individual dentry usually has a pointer to an inode. Inodes are
|
||||||
calls. The pathname argument is used by the VFS to search through the
|
filesystem objects such as regular files, directories, FIFOs and other
|
||||||
directory entry cache (dentry cache or "dcache"). This provides a very
|
beasts. They live either on the disc (for block device filesystems)
|
||||||
fast look-up mechanism to translate a pathname (filename) into a
|
or in the memory (for pseudo filesystems). Inodes that live on the
|
||||||
specific dentry.
|
disc are copied into the memory when required and changes to the inode
|
||||||
|
are written back to disc. A single inode can be pointed to by multiple
|
||||||
|
dentries (hard links, for example, do this).
|
||||||
|
|
||||||
An individual dentry usually has a pointer to an inode. Inodes are the
|
To look up an inode requires that the VFS calls the lookup() method of
|
||||||
things that live on disc drives, and can be regular files (you know:
|
the parent directory inode. This method is installed by the specific
|
||||||
those things that you write data into), directories, FIFOs and other
|
filesystem implementation that the inode lives in. Once the VFS has
|
||||||
beasts. Dentries live in RAM and are never saved to disc: they exist
|
the required dentry (and hence the inode), we can do all those boring
|
||||||
only for performance. Inodes live on disc and are copied into memory
|
things like open(2) the file, or stat(2) it to peek at the inode
|
||||||
when required. Later any changes are written back to disc. The inode
|
data. The stat(2) operation is fairly simple: once the VFS has the
|
||||||
that lives in RAM is a VFS inode, and it is this which the dentry
|
dentry, it peeks at the inode data and passes some of it back to
|
||||||
points to. A single inode can be pointed to by multiple dentries
|
userspace.
|
||||||
(think about hardlinks).
|
|
||||||
|
|
||||||
The dcache is meant to be a view into your entire filespace. Unlike
|
|
||||||
Linus, most of us losers can't fit enough dentries into RAM to cover
|
|
||||||
all of our filespace, so the dcache has bits missing. In order to
|
|
||||||
resolve your pathname into a dentry, the VFS may have to resort to
|
|
||||||
creating dentries along the way, and then loading the inode. This is
|
|
||||||
done by looking up the inode.
|
|
||||||
|
|
||||||
To look up an inode (usually read from disc) requires that the VFS
|
The File Object
|
||||||
calls the lookup() method of the parent directory inode. This method
|
---------------
|
||||||
is installed by the specific filesystem implementation that the inode
|
|
||||||
lives in. There will be more on this later.
|
|
||||||
|
|
||||||
Once the VFS has the required dentry (and hence the inode), we can do
|
|
||||||
all those boring things like open(2) the file, or stat(2) it to peek
|
|
||||||
at the inode data. The stat(2) operation is fairly simple: once the
|
|
||||||
VFS has the dentry, it peeks at the inode data and passes some of it
|
|
||||||
back to userspace.
|
|
||||||
|
|
||||||
Opening a file requires another operation: allocation of a file
|
Opening a file requires another operation: allocation of a file
|
||||||
structure (this is the kernel-side implementation of file
|
structure (this is the kernel-side implementation of file
|
||||||
|
@ -74,51 +73,39 @@ descriptors). The freshly allocated file structure is initialized with
|
||||||
a pointer to the dentry and a set of file operation member functions.
|
a pointer to the dentry and a set of file operation member functions.
|
||||||
These are taken from the inode data. The open() file method is then
|
These are taken from the inode data. The open() file method is then
|
||||||
called so the specific filesystem implementation can do it's work. You
|
called so the specific filesystem implementation can do it's work. You
|
||||||
can see that this is another switch performed by the VFS.
|
can see that this is another switch performed by the VFS. The file
|
||||||
|
structure is placed into the file descriptor table for the process.
|
||||||
The file structure is placed into the file descriptor table for the
|
|
||||||
process.
|
|
||||||
|
|
||||||
Reading, writing and closing files (and other assorted VFS operations)
|
Reading, writing and closing files (and other assorted VFS operations)
|
||||||
is done by using the userspace file descriptor to grab the appropriate
|
is done by using the userspace file descriptor to grab the appropriate
|
||||||
file structure, and then calling the required file structure method
|
file structure, and then calling the required file structure method to
|
||||||
function to do whatever is required.
|
do whatever is required. For as long as the file is open, it keeps the
|
||||||
|
dentry in use, which in turn means that the VFS inode is still in use.
|
||||||
For as long as the file is open, it keeps the dentry "open" (in use),
|
|
||||||
which in turn means that the VFS inode is still in use.
|
|
||||||
|
|
||||||
All VFS system calls (i.e. open(2), stat(2), read(2), write(2),
|
|
||||||
chmod(2) and so on) are called from a process context. You should
|
|
||||||
assume that these calls are made without any kernel locks being
|
|
||||||
held. This means that the processes may be executing the same piece of
|
|
||||||
filesystem or driver code at the same time, on different
|
|
||||||
processors. You should ensure that access to shared resources is
|
|
||||||
protected by appropriate locks.
|
|
||||||
|
|
||||||
|
|
||||||
Registering and Mounting a Filesystem
|
Registering and Mounting a Filesystem
|
||||||
-------------------------------------
|
=====================================
|
||||||
|
|
||||||
If you want to support a new kind of filesystem in the kernel, all you
|
To register and unregister a filesystem, use the following API
|
||||||
need to do is call register_filesystem(). You pass a structure
|
functions:
|
||||||
describing the filesystem implementation (struct file_system_type)
|
|
||||||
which is then added to an internal table of supported filesystems. You
|
|
||||||
can do:
|
|
||||||
|
|
||||||
% cat /proc/filesystems
|
#include <linux/fs.h>
|
||||||
|
|
||||||
to see what filesystems are currently available on your system.
|
extern int register_filesystem(struct file_system_type *);
|
||||||
|
extern int unregister_filesystem(struct file_system_type *);
|
||||||
|
|
||||||
When a request is made to mount a block device onto a directory in
|
The passed struct file_system_type describes your filesystem. When a
|
||||||
your filespace the VFS will call the appropriate method for the
|
request is made to mount a device onto a directory in your filespace,
|
||||||
specific filesystem. The dentry for the mount point will then be
|
the VFS will call the appropriate get_sb() method for the specific
|
||||||
updated to point to the root inode for the new filesystem.
|
filesystem. The dentry for the mount point will then be updated to
|
||||||
|
point to the root inode for the new filesystem.
|
||||||
|
|
||||||
It's now time to look at things in more detail.
|
You can see all filesystems that are registered to the kernel in the
|
||||||
|
file /proc/filesystems.
|
||||||
|
|
||||||
|
|
||||||
struct file_system_type
|
struct file_system_type
|
||||||
=======================
|
-----------------------
|
||||||
|
|
||||||
This describes the filesystem. As of kernel 2.6.13, the following
|
This describes the filesystem. As of kernel 2.6.13, the following
|
||||||
members are defined:
|
members are defined:
|
||||||
|
@ -197,8 +184,14 @@ A fill_super() method implementation has the following arguments:
|
||||||
int silent: whether or not to be silent on error
|
int silent: whether or not to be silent on error
|
||||||
|
|
||||||
|
|
||||||
|
The Superblock Object
|
||||||
|
=====================
|
||||||
|
|
||||||
|
A superblock object represents a mounted filesystem.
|
||||||
|
|
||||||
|
|
||||||
struct super_operations
|
struct super_operations
|
||||||
=======================
|
-----------------------
|
||||||
|
|
||||||
This describes how the VFS can manipulate the superblock of your
|
This describes how the VFS can manipulate the superblock of your
|
||||||
filesystem. As of kernel 2.6.13, the following members are defined:
|
filesystem. As of kernel 2.6.13, the following members are defined:
|
||||||
|
@ -286,9 +279,9 @@ or bottom half).
|
||||||
a superblock. The second parameter indicates whether the method
|
a superblock. The second parameter indicates whether the method
|
||||||
should wait until the write out has been completed. Optional.
|
should wait until the write out has been completed. Optional.
|
||||||
|
|
||||||
write_super_lockfs: called when VFS is locking a filesystem and forcing
|
write_super_lockfs: called when VFS is locking a filesystem and
|
||||||
it into a consistent state. This function is currently used by the
|
forcing it into a consistent state. This method is currently
|
||||||
Logical Volume Manager (LVM).
|
used by the Logical Volume Manager (LVM).
|
||||||
|
|
||||||
unlockfs: called when VFS is unlocking a filesystem and making it writable
|
unlockfs: called when VFS is unlocking a filesystem and making it writable
|
||||||
again.
|
again.
|
||||||
|
@ -317,8 +310,14 @@ field. This is a pointer to a "struct inode_operations" which
|
||||||
describes the methods that can be performed on individual inodes.
|
describes the methods that can be performed on individual inodes.
|
||||||
|
|
||||||
|
|
||||||
|
The Inode Object
|
||||||
|
================
|
||||||
|
|
||||||
|
An inode object represents an object within the filesystem.
|
||||||
|
|
||||||
|
|
||||||
struct inode_operations
|
struct inode_operations
|
||||||
=======================
|
-----------------------
|
||||||
|
|
||||||
This describes how the VFS can manipulate an inode in your
|
This describes how the VFS can manipulate an inode in your
|
||||||
filesystem. As of kernel 2.6.13, the following members are defined:
|
filesystem. As of kernel 2.6.13, the following members are defined:
|
||||||
|
@ -394,51 +393,62 @@ otherwise noted.
|
||||||
will probably need to call d_instantiate() just as you would
|
will probably need to call d_instantiate() just as you would
|
||||||
in the create() method
|
in the create() method
|
||||||
|
|
||||||
|
rename: called by the rename(2) system call to rename the object to
|
||||||
|
have the parent and name given by the second inode and dentry.
|
||||||
|
|
||||||
readlink: called by the readlink(2) system call. Only required if
|
readlink: called by the readlink(2) system call. Only required if
|
||||||
you want to support reading symbolic links
|
you want to support reading symbolic links
|
||||||
|
|
||||||
follow_link: called by the VFS to follow a symbolic link to the
|
follow_link: called by the VFS to follow a symbolic link to the
|
||||||
inode it points to. Only required if you want to support
|
inode it points to. Only required if you want to support
|
||||||
symbolic links. This function returns a void pointer cookie
|
symbolic links. This method returns a void pointer cookie
|
||||||
that is passed to put_link().
|
that is passed to put_link().
|
||||||
|
|
||||||
put_link: called by the VFS to release resources allocated by
|
put_link: called by the VFS to release resources allocated by
|
||||||
follow_link(). The cookie returned by follow_link() is passed to
|
follow_link(). The cookie returned by follow_link() is passed
|
||||||
to this function as the last parameter. It is used by filesystems
|
to to this method as the last parameter. It is used by
|
||||||
such as NFS where page cache is not stable (i.e. page that was
|
filesystems such as NFS where page cache is not stable
|
||||||
installed when the symbolic link walk started might not be in the
|
(i.e. page that was installed when the symbolic link walk
|
||||||
page cache at the end of the walk).
|
started might not be in the page cache at the end of the
|
||||||
|
walk).
|
||||||
|
|
||||||
truncate: called by the VFS to change the size of a file. The i_size
|
truncate: called by the VFS to change the size of a file. The
|
||||||
field of the inode is set to the desired size by the VFS before
|
i_size field of the inode is set to the desired size by the
|
||||||
this function is called. This function is called by the truncate(2)
|
VFS before this method is called. This method is called by
|
||||||
system call and related functionality.
|
the truncate(2) system call and related functionality.
|
||||||
|
|
||||||
permission: called by the VFS to check for access rights on a POSIX-like
|
permission: called by the VFS to check for access rights on a POSIX-like
|
||||||
filesystem.
|
filesystem.
|
||||||
|
|
||||||
setattr: called by the VFS to set attributes for a file. This function is
|
setattr: called by the VFS to set attributes for a file. This method
|
||||||
called by chmod(2) and related system calls.
|
is called by chmod(2) and related system calls.
|
||||||
|
|
||||||
getattr: called by the VFS to get attributes of a file. This function is
|
getattr: called by the VFS to get attributes of a file. This method
|
||||||
called by stat(2) and related system calls.
|
is called by stat(2) and related system calls.
|
||||||
|
|
||||||
setxattr: called by the VFS to set an extended attribute for a file.
|
setxattr: called by the VFS to set an extended attribute for a file.
|
||||||
Extended attribute is a name:value pair associated with an inode. This
|
Extended attribute is a name:value pair associated with an
|
||||||
function is called by setxattr(2) system call.
|
inode. This method is called by setxattr(2) system call.
|
||||||
|
|
||||||
getxattr: called by the VFS to retrieve the value of an extended attribute
|
getxattr: called by the VFS to retrieve the value of an extended
|
||||||
name. This function is called by getxattr(2) function call.
|
attribute name. This method is called by getxattr(2) function
|
||||||
|
call.
|
||||||
|
|
||||||
listxattr: called by the VFS to list all extended attributes for a given
|
listxattr: called by the VFS to list all extended attributes for a
|
||||||
file. This function is called by listxattr(2) system call.
|
given file. This method is called by listxattr(2) system call.
|
||||||
|
|
||||||
removexattr: called by the VFS to remove an extended attribute from a file.
|
removexattr: called by the VFS to remove an extended attribute from
|
||||||
This function is called by removexattr(2) system call.
|
a file. This method is called by removexattr(2) system call.
|
||||||
|
|
||||||
|
|
||||||
|
The Address Space Object
|
||||||
|
========================
|
||||||
|
|
||||||
|
The address space object is used to identify pages in the page cache.
|
||||||
|
|
||||||
|
|
||||||
struct address_space_operations
|
struct address_space_operations
|
||||||
===============================
|
-------------------------------
|
||||||
|
|
||||||
This describes how the VFS can manipulate mapping of a file to page cache in
|
This describes how the VFS can manipulate mapping of a file to page cache in
|
||||||
your filesystem. As of kernel 2.6.13, the following members are defined:
|
your filesystem. As of kernel 2.6.13, the following members are defined:
|
||||||
|
@ -502,8 +512,14 @@ struct address_space_operations {
|
||||||
it. An example implementation can be found in fs/ext2/xip.c.
|
it. An example implementation can be found in fs/ext2/xip.c.
|
||||||
|
|
||||||
|
|
||||||
|
The File Object
|
||||||
|
===============
|
||||||
|
|
||||||
|
A file object represents a file opened by a process.
|
||||||
|
|
||||||
|
|
||||||
struct file_operations
|
struct file_operations
|
||||||
======================
|
----------------------
|
||||||
|
|
||||||
This describes how the VFS can manipulate an open file. As of kernel
|
This describes how the VFS can manipulate an open file. As of kernel
|
||||||
2.6.13, the following members are defined:
|
2.6.13, the following members are defined:
|
||||||
|
@ -661,7 +677,7 @@ of child dentries. Child dentries are basically like files in a
|
||||||
directory.
|
directory.
|
||||||
|
|
||||||
|
|
||||||
Directory Entry Cache APIs
|
Directory Entry Cache API
|
||||||
--------------------------
|
--------------------------
|
||||||
|
|
||||||
There are a number of functions defined which permit a filesystem to
|
There are a number of functions defined which permit a filesystem to
|
||||||
|
@ -880,3 +896,22 @@ Papers and other documentation on dcache locking
|
||||||
1. Scaling dcache with RCU (http://linuxjournal.com/article.php?sid=7124).
|
1. Scaling dcache with RCU (http://linuxjournal.com/article.php?sid=7124).
|
||||||
|
|
||||||
2. http://lse.sourceforge.net/locking/dcache/dcache.html
|
2. http://lse.sourceforge.net/locking/dcache/dcache.html
|
||||||
|
|
||||||
|
|
||||||
|
Resources
|
||||||
|
=========
|
||||||
|
|
||||||
|
(Note some of these resources are not up-to-date with the latest kernel
|
||||||
|
version.)
|
||||||
|
|
||||||
|
Creating Linux virtual filesystems. 2002
|
||||||
|
<http://lwn.net/Articles/13325/>
|
||||||
|
|
||||||
|
The Linux Virtual File-system Layer by Neil Brown. 1999
|
||||||
|
<http://www.cse.unsw.edu.au/~neilb/oss/linux-commentary/vfs.html>
|
||||||
|
|
||||||
|
A tour of the Linux VFS by Michael K. Johnson. 1996
|
||||||
|
<http://www.tldp.org/LDP/khg/HyperNews/get/fs/vfstour.html>
|
||||||
|
|
||||||
|
A small trail through the Linux kernel by Andries Brouwer. 2001
|
||||||
|
<http://www.win.tue.nl/~aeb/linux/vfs/trail.html>
|
||||||
|
|
Loading…
Reference in New Issue